What is AWS SageMaker?
Amazon SageMaker is an end-to-end, cloud-based machine learning service. It allows you to:
- Build and train machine learning models
- Deploy them into a production-ready hosted environment
- Monitor and manage the model during its lifecycle
SageMaker integrates with Jupyter Notebook, letting you set up managed notebook instances and easily connect them to data sources for exploration. It provides many common machine learning algorithms optimized to run efficiently in a distributed environment, and also lets you bring your own algorithms and the frameworks you already use.
SageMaker provides a web-based interface called SageMaker Studio that lets you manage the entire machine learning workflow for your models, and the SageMaker console, which lets you create new notebook instances and operate the SageMaker service.
We’ll provide a tutorial showing how to bring your own algorithm and deploy it via the SageMaker console, without using SageMaker Studio.
Related content: Read our guide to AWS SageMaker
In this article:
- How Does Amazon SageMaker Work?
  - Prepare and Build AI Models
  - Train and Tune
  - Deploy and Analyze
- AWS SageMaker Tutorial: Training Your Own Model and Deploying to Production
  - Step 1: Create SageMaker Notebook Instance and Prepare Data
  - Step 2: Train the XGBoost Model
  - Step 3: Deploy XGBoost Model
- Machine Learning Resource Orchestration with Run.AI
How Does Amazon SageMaker Work?
Before we dive into the tutorial, it can be useful to understand the main building blocks of AWS SageMaker: preparation, training and deployment.
Prepare and Build AI Models
Amazon SageMaker creates managed compute instances in the Elastic Compute Cloud (EC2), pre-configured for machine learning projects. These instances support the open source Jupyter Notebook application, which is commonly used by data scientists to author and share code for their models.
SageMaker notebook instances come with everything you need to connect to your machine learning toolset, including drivers, packages and libraries for deep learning and machine learning frameworks. The default version of the notebook instance comes with many common algorithms including statistical models, natural language processing (NLP) and computer vision. You can customize the instance’s configuration to suit your specific needs.
A useful capability of SageMaker instances is that they can easily accept code previously developed in a supported machine learning framework. Developers can package the code in a Docker container and add it seamlessly to the instance. SageMaker can also pull data from Amazon S3, making it possible to work with datasets of any size.
Related content: Read our guide to SageMaker Notebooks
Train and Tune
To train a model, you specify the location of the training data in an Amazon S3 bucket and choose a preferred instance type. SageMaker then starts the training process, and can transform the data to facilitate feature engineering. SageMaker automatic model tuning (also called hyperparameter tuning) runs multiple training jobs to find the hyperparameter values that maximize model performance.
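To make the idea of hyperparameter tuning concrete, here is a minimal sketch of random search in plain Python. It does not use SageMaker's tuning API. The `toy_validation_score` function is a hypothetical stand-in for "train a model with these hyperparameters and return a validation metric", and the parameter names and ranges are assumptions chosen for illustration.

```python
import random

# Hypothetical search space: (low, high) ranges for two XGBoost-style hyperparameters.
SEARCH_SPACE = {"eta": (0.01, 0.5), "max_depth": (2, 10)}


def toy_validation_score(params):
    """Stand-in for a real training job: the score peaks at eta=0.2, max_depth=5.
    A real tuner would train a model and return its validation metric."""
    return -((params["eta"] - 0.2) ** 2) - 0.01 * (params["max_depth"] - 5) ** 2


def sample_params(rng):
    """Draw one random candidate from the search space."""
    return {
        "eta": rng.uniform(*SEARCH_SPACE["eta"]),
        "max_depth": rng.randint(*SEARCH_SPACE["max_depth"]),
    }


def random_search(score_fn, n_trials=50, seed=0):
    """Evaluate n_trials random candidates and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(toy_validation_score)
print(best_params, best_score)
```

A managed tuning service works on the same principle, but each "trial" is a full training job, and the search strategy may be smarter than uniform random sampling.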
Deploy and Analyze
When a model is trained and ready for inference, SageMaker automatically deploys it on Amazon infrastructure and scales it as needed. Amazon provides several SageMaker instance types with graphics processing units (GPUs) optimized for machine learning inference.
SageMaker takes care of several operational concerns for ML models in production:
- Deploying the model across multiple availability zones for high availability
- Performing health checks and recovering failed instances
- Applying security patches
- Performing autoscaling
- Providing secure HTTPS endpoints that applications can use to connect to the model
- Triggering alarms on changes in production performance via Amazon CloudWatch metrics
After a model is running in production, SageMaker allows you to fine-tune and improve the model in a continuous cycle:
- Collect “ground truth” based on user interaction with the model
- Monitor model performance to identify drift (gradual deterioration of model predictive performance)
- Update training data to include newly collected ground truth
- Retrain the model with the new data
- Deploy the new version of the model to production
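The drift-monitoring step in the cycle above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not SageMaker's monitoring API: the `DriftMonitor` class, its window size, and its tolerance are all hypothetical choices. It compares accuracy over a sliding window of recent labelled predictions against a baseline accuracy measured at deployment time.

```python
from collections import deque


class DriftMonitor:
    """Tracks accuracy over a sliding window of labelled predictions and
    flags drift when it falls too far below a baseline accuracy."""

    def __init__(self, baseline_accuracy, window_size=100, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window_size)  # 1 = correct, 0 = wrong

    def record(self, predicted_label, ground_truth_label):
        """Store whether one prediction matched the collected ground truth."""
        self.outcomes.append(1 if predicted_label == ground_truth_label else 0)

    def window_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_detected(self):
        """Only report drift once the window is full, to avoid noisy early alarms."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.window_accuracy() < self.baseline_accuracy - self.tolerance


monitor = DriftMonitor(baseline_accuracy=0.9, window_size=10)
for _ in range(10):
    monitor.record(predicted_label=1, ground_truth_label=1)
healthy = monitor.drift_detected()

for _ in range(5):
    monitor.record(predicted_label=1, ground_truth_label=0)
drifted = monitor.drift_detected()
print(healthy, drifted)
```

When drift is flagged, the remaining steps of the cycle apply: add the newly collected ground truth to the training data, retrain, and redeploy.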
Related content: Read our guide to SageMaker Pipelines
AWS SageMaker Tutorial: Training Your Own Model and Deploying to Production
In this tutorial, we’ll show how to use Amazon SageMaker to build, train, and deploy a machine learning (ML) model using the XGBoost ML algorithm. XGBoost is an ensemble algorithm based on decision trees and a gradient boosting framework. It builds on traditional decision tree and random forest models.
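For readers unfamiliar with gradient boosting, the following is a minimal sketch of the core idea in plain Python: each new tree (here a one-split "stump") is fitted to the residual errors of the ensemble built so far. This is plain gradient boosting for squared-error regression on a single feature, not XGBoost itself, which adds regularization and many optimizations. All function names and the toy data are illustrative.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals
    (lowest squared error). Returns (threshold, left_value, right_value)."""
    best = None
    # Skip the largest x: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def fit_boosted_stumps(xs, ys, rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    total = model["base"]
    for threshold, left_value, right_value in model["stumps"]:
        total += model["learning_rate"] * (left_value if x <= threshold else right_value)
    return total


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
baseline = fit_boosted_stumps(xs, ys, rounds=0)   # predicts the mean only
boosted = fit_boosted_stumps(xs, ys, rounds=20)
print(mean_squared_error(baseline, xs, ys), mean_squared_error(boosted, xs, ys))
```

Each round shrinks the remaining error by a fraction controlled by the learning rate, which plays the same role as the `eta` hyperparameter in XGBoost.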
This tutorial is abbreviated from the official SageMaker Hands-on Tutorial.
Step 1: Create SageMaker Notebook Instance and Prepare Data
In the Amazon SageMaker console, select a region and click Create notebook instance.
Select your instance size, and under the Permissions and encryption section, under IAM role, click Create a new role and select Any S3 bucket.
Once your new notebook instance starts, click Open Jupyter and select conda_python3.
In your Jupyter notebook, add code cells that perform the following preparation steps:
- Import the required libraries
- Define IAM role
- Create an S3 bucket to store your data
- Download the test data CSV for the tutorial
- Reformat the header and first column of the CSV and then load the data from the S3 bucket (this is required for the SageMaker pre-built XGBoost algorithm)
Get the code for these preparation steps in the full tutorial, step 2.
Step 2: Train the XGBoost Model
We’ll create a new code cell in the Jupyter notebook and copy this code, which converts the CSV into a format suitable for training the model:
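As a rough illustration of this conversion, the sketch below moves the label column to the first position and drops the header row, which is the CSV layout the pre-built XGBoost algorithm expects. It is not the tutorial's exact code: the function names, the sample data, and the `y_yes` label column name are assumptions, and the upload helper needs `boto3` plus valid AWS credentials to actually run.

```python
import csv
import io


def to_xgboost_csv(raw_csv_text, label_column):
    """Return CSV text with the label as the first column and no header row."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    feature_columns = [name for name in reader.fieldnames if name != label_column]
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for row in reader:
        writer.writerow([row[label_column]] + [row[name] for name in feature_columns])
    return output.getvalue()


def upload_training_csv(csv_text, bucket_name, key):
    """Upload the reformatted CSV to S3 and return its S3 URI.
    bucket_name and key are placeholders you would supply."""
    import boto3  # imported lazily so the reformatting step runs without AWS access

    boto3.client("s3").put_object(
        Bucket=bucket_name, Key=key, Body=csv_text.encode("utf-8")
    )
    return "s3://{}/{}".format(bucket_name, key)


# Tiny made-up sample: two feature columns and a binary label column.
sample = "age,balance,y_yes\n35,1200,0\n52,300,1\n"
print(to_xgboost_csv(sample, label_column="y_yes"))
```

In the notebook, the uploaded S3 location would then be wrapped as the training channel passed to `fit` (the `s3_input_train` variable used later); in version 2 of the SageMaker Python SDK this is done with `sagemaker.inputs.TrainingInput(s3_data=..., content_type="csv")`.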
We’ll create another code cell and copy this code, which sets up an XGBoost estimator and defines its hyperparameters. In a real project you can tweak hyperparameters to see which give you the best results.
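A minimal sketch of such an estimator setup is shown below, assuming version 2 of the SageMaker Python SDK. The helper names, the container version string, and the hyperparameter values are illustrative assumptions rather than recommended settings, and `role`, `region`, and `output_path` are placeholders you would supply from the preparation steps. Calling `build_xgboost_estimator` requires the `sagemaker` package and AWS credentials.

```python
def xgboost_hyperparameters():
    """Illustrative starting values for a binary classifier; tune for your data."""
    return {
        "objective": "binary:logistic",  # output the probability of the positive class
        "num_round": 100,                # number of boosting rounds
        "max_depth": 5,                  # maximum depth of each tree
        "eta": 0.2,                      # learning rate
        "gamma": 4,                      # minimum loss reduction needed to split
        "min_child_weight": 6,
        "subsample": 0.8,                # fraction of rows sampled per tree
    }


def build_xgboost_estimator(role, region, output_path, instance_type="ml.m4.xlarge"):
    """Create an estimator for the pre-built XGBoost container.
    The SDK is imported lazily so the rest of this cell runs without it."""
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    # Look up the XGBoost container image for the chosen region.
    container = image_uris.retrieve("xgboost", region, version="1.5-1")
    estimator = Estimator(
        image_uri=container,
        role=role,
        instance_count=1,
        instance_type=instance_type,
        output_path=output_path,
    )
    estimator.set_hyperparameters(**xgboost_hyperparameters())
    return estimator
```

In the notebook you would assign the result to `xgb`, for example `xgb = build_xgboost_estimator(role, region, output_path)`, so that the training and deployment calls that follow can use it.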
We’ll create another code cell and copy this code to start the training:
xgb.fit({'train': s3_input_train})
Step 3: Deploy XGBoost Model
We now have a trained XGBoost model, ready to deploy to production. We’ll create another code cell and use this code to create a SageMaker endpoint running our newly trained model:
xgb_predictor = xgb.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
Finally, let’s create one more code cell that uses the XGBoost model to predict whether customers listed in the CSV file will buy a bank product or not:
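The sketch below shows one way such a prediction step might look. It assumes the endpoint accepts CSV text and returns a comma- or newline-separated list of scores between 0 and 1. The helper names and the 0.5 threshold are illustrative, and `FakePredictor` is a made-up offline stand-in so the example runs without a deployed endpoint.

```python
def rows_to_csv_payload(rows):
    """Serialize feature rows (lists of numbers) as CSV text, one row per line."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def parse_scores(response):
    """Parse an endpoint response of comma- or newline-separated scores."""
    text = response.decode("utf-8") if isinstance(response, bytes) else response
    return [float(token) for token in text.replace("\n", ",").split(",") if token.strip()]


def predict_purchases(predictor, rows, threshold=0.5):
    """Send rows to the endpoint and convert each score to a 0/1 prediction."""
    scores = parse_scores(predictor.predict(rows_to_csv_payload(rows)))
    return [1 if score >= threshold else 0 for score in scores]


class FakePredictor:
    """Offline stand-in for the deployed predictor, for illustration only:
    it returns a high score whenever the first feature exceeds 40."""

    def predict(self, payload):
        scores = [
            "0.9" if float(line.split(",")[0]) > 40 else "0.1"
            for line in payload.split("\n")
        ]
        return ",".join(scores).encode("utf-8")


print(predict_purchases(FakePredictor(), [[35, 1200], [52, 300]]))
```

With the real `xgb_predictor`, you would pass it in place of `FakePredictor()`. You would likely also need to set its serializer to `CSVSerializer()` from `sagemaker.serializers` so the request is sent with a CSV content type.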
That’s it! We launched a SageMaker notebook instance, loaded training data, trained a model and deployed it to production.
Machine Learning Resource Orchestration with Run.AI
When using AWS SageMaker, your organization might run a large number of machine learning experiments requiring massive amounts of computing resources. Run:AI automates resource management and orchestration for machine learning infrastructure. With Run:AI, you can automatically run as many compute intensive experiments and inference workloads as needed.
Here are some of the capabilities you gain when using Run:AI:
- Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
- No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
- A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.
Run:AI simplifies machine learning infrastructure pipelines, helping data scientists increase their productivity and improve the quality of their models.
Learn more about the Run.ai GPU virtualization platform.