re:Invent Amazon has introduced a new generation of SageMaker at the re:Invent conference in Las Vegas, bringing together analytics and AI, though with some confusion thanks to the variety of services that bear the SageMaker name.
SageMaker Unified Studio, now in preview, covers model development, data, analytics, and building generative AI applications.
However, the old SageMaker remains, now renamed SageMaker AI, and it has a studio of its own, distinct from the new one, as well as a classic version that is still available. The difference is that SageMaker AI has a narrower focus: building and training ML models. That said, SageMaker AI is also considered part of Unified Studio, as is Bedrock, a tool for building generative AI applications. Unified Studio can also be used programmatically, via the DataZone API.
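For a flavor of what that programmatic route might look like, here is a minimal sketch using the `datazone` client in boto3. It assumes that Unified Studio domains and projects are visible through the DataZone `ListDomains` and `ListProjects` operations; the helper name `list_projects_by_domain` is our own invention, and pagination is ignored for brevity.

```python
from typing import Any


def list_projects_by_domain(datazone: Any) -> dict[str, list[str]]:
    """Map each DataZone domain name to the names of its projects.

    `datazone` is expected to be a boto3 client created with
    boto3.client("datazone"). Only the first page of each listing is
    read; real code would follow `nextToken`.
    """
    projects_by_domain: dict[str, list[str]] = {}
    for domain in datazone.list_domains().get("items", []):
        # Projects are scoped to a domain, so list them per domain ID.
        response = datazone.list_projects(domainIdentifier=domain["id"])
        projects_by_domain[domain["name"]] = [
            project["name"] for project in response.get("items", [])
        ]
    return projects_by_domain
```

With AWS credentials configured, calling `list_projects_by_domain(boto3.client("datazone"))` should return a dictionary keyed by domain name.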
Further capabilities for Unified Studio are planned, including access to streaming data such as that from Amazon Kinesis, and integration with Amazon QuickSight business intelligence and with OpenSearch search analytics (Amazon’s fork of Elasticsearch and Kibana).
According to G2 Krishnamoorthy, VP AWS database services, the core of the next-generation SageMaker is Lakehouse, a service introduced here at re:Invent. “We have built an open interoperable data foundation that is very easy for customers to manage,” Krishnamoorthy told us.
SageMaker Lakehouse combines data in S3 data lakes and Redshift (the AWS data warehouse) so it can be queried with SQL as an Apache Iceberg database using tools including Amazon Athena or Apache Spark. Lakehouse also supports connections to DynamoDB, Google BigQuery, MySQL, PostgreSQL and Snowflake. Data can be imported or analyzed in place. Via Lakehouse and Unified Studio, the same data can be used for analytics as well as for machine learning and developing generative AI applications.
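To make the SQL route concrete, the following is a minimal sketch of running a query through Athena with boto3. The Athena calls (`start_query_execution`, `get_query_execution`, `get_query_results`) are standard, but the function name `run_athena_query` is ours, and the assumption that Lakehouse tables appear under a catalog and database you can name here is just that, an assumption. Only the first page of results is read.

```python
import time
from typing import Any


def run_athena_query(
    athena: Any,
    sql: str,
    database: str,
    output_location: str,
    catalog: str = "AwsDataCatalog",
    poll_seconds: float = 1.0,
) -> list[dict[str, Any]]:
    """Run `sql` via an Athena client and return rows as dictionaries."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Catalog": catalog, "Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # For SELECT statements the first returned row holds the column names.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    cells = [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
    if not cells:
        return []
    header, *records = cells
    return [dict(zip(header, record)) for record in records]
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT * FROM orders LIMIT 10", "sales", "s3://my-bucket/athena-results/")`, where the table, database, and bucket names are placeholders.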
Brian Ross, AWS head of engineering: analytics builder experience, said at a session attended by The Register: “Customers say that their analytics workloads are getting bigger, their machine learning workloads are getting bigger, now their generative AI workloads are getting bigger, and they’re also starting to converge.”
The same data is used for analytics, training models, and building knowledge bases for generative AI. “The big challenge with data is trying to find it. It sits somewhere within the organization but where is it? How do I get access to it?” said Ross. He reckons customers tended to build their own enterprise data platforms to solve this problem, using AWS services and tools, but this was costly, whereas the new SageMaker offers “a single end to end experience” that supports all these different uses.
SageMaker includes low-code and no-code tools but it is still aimed at what AWS terms “builders” rather than business users. The latter are directed towards Amazon Q Business apps and Amazon QuickSight dashboards, Krishnamoorthy told us.
SageMaker capabilities introduced at re:Invent also include flexible training plans for HyperPod, a service introduced a year ago that manages the infrastructure for training models. Using flexible training plans, the user specifies the accelerated compute resources required and the start and end date limits. HyperPod will then propose a detailed schedule and calculate the cost.
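As a sketch of how that request might be expressed in code, the function below searches for training plan offerings and picks the one with the lowest upfront fee. It assumes the SageMaker API's `SearchTrainingPlanOfferings` operation, exposed in boto3 as `search_training_plan_offerings`, accepts the parameters shown and returns an `UpfrontFee` string per offering; that is our reading of the API and should be checked against current documentation. The helper name `cheapest_training_plan_offering` is hypothetical.

```python
from datetime import datetime
from decimal import Decimal
from typing import Any, Optional


def cheapest_training_plan_offering(
    sagemaker: Any,
    instance_type: str,
    instance_count: int,
    start_after: datetime,
    end_before: datetime,
    duration_hours: int,
) -> Optional[dict[str, Any]]:
    """Return the offering with the lowest upfront fee, or None if none match."""
    response = sagemaker.search_training_plan_offerings(
        InstanceType=instance_type,
        InstanceCount=instance_count,
        StartTimeAfter=start_after,    # earliest acceptable start
        EndTimeBefore=end_before,      # latest acceptable finish
        DurationHours=duration_hours,
        TargetResources=["hyperpod-cluster"],
    )
    offerings = response.get("TrainingPlanOfferings", [])
    if not offerings:
        return None
    # UpfrontFee is assumed to arrive as a decimal string, e.g. "1234.50".
    return min(offerings, key=lambda offering: Decimal(offering["UpfrontFee"]))
```

The `sagemaker` argument would be a client created with `boto3.client("sagemaker")`; the instance type, count, and dates are whatever the training job needs.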
Demand for accelerated compute appears to be high, and re:Invent attendees were told that HyperPod is the best way to secure these resources, since it takes account of periods of lower usage.
Q Developer, Amazon’s AI assistant, is embedded into SageMaker Unified Studio. AWS has also added Q Developer to SageMaker Canvas, a SageMaker AI tool for building ML models, giving it a chat-based user interface for selecting a model type, uploading and preparing data, and testing and deploying the model.
Pricing follows the typical AWS model. There is no charge for using SageMaker Unified Studio itself, but most actions consume other AWS resources, which are charged at their usual rates, though some have a free tier, as shown on the SageMaker pricing page. There is some risk, perhaps, that careless experimentation will run up a large bill.
Amazon SageMaker was first introduced seven years ago as a service for data scientists and developers, part of the AWS Management Console. SageMaker offered a simple user interface for selecting training data, selecting a machine learning model, training the model, and deploying it to a cluster of Amazon EC2 instances.
Today’s SageMaker not only has more features but also a wider scope. The naming can be confusing, with the overall SageMaker platform including products that are also well known in their own right. Why is it all called SageMaker?
“The world of analytics and AI is coming together. So we thought it’s fitting for us to say that the new expanded SageMaker platform is the product or product suite for all data analytics and AI … so that’s the naming confusion,” said Krishnamoorthy. “The alternative would have been to come up with a new name, as Microsoft did with Fabric, and then you have to teach everybody all the components that are in there.” ®