The Machine Learning process is iterative and consists of several steps:
- Identifying a business problem and the related Machine Learning problem
- Data ingestion, integration and preparation
- Data visualization and analysis, feature engineering, model training and model evaluation
- Model deployment, followed by monitoring and debugging of the deployed model
These steps are generally repeated multiple times to better meet business goals, for example following changes in the source data or a decrease in model performance.
The process can be represented with the following diagram:
After a model has been deployed, we might want to integrate it with our own application to provide insights to our end users.
In this workshop we will go through the steps required to build a fully-fledged machine learning application on AWS. We will execute an iteration of the Machine Learning process to build, train and deploy a model using Amazon SageMaker and AWS Glue, and then we will add inference capabilities to a demo application by deploying a REST API with Amazon API Gateway.
The final architecture will be:
We have been provided with a dataset (stored in an Amazon S3 bucket) containing data collected in a wind turbine plant, where each example includes several sensor measurements and a status indicating whether the plant was healthy or not.
⚠️ Note: this is a synthetic dataset that oversimplifies the task of Predictive Maintenance in order to keep this workshop easy to execute.
Our goal is to build a simple Machine Learning model that, given new sensor data, predicts whether the plant requires maintenance, allowing maintenance to be carried out before a breakdown happens (Predictive Maintenance).
Following is an excerpt from the dataset:
turbine_id | turbine_type | wind_speed | rpm_blade | oil_temperature | ... | breakdown |
---|---|---|---|---|---|---|
TID003 | HAWT | 85 | 78 | 36.0 | ... | yes |
TID009 | HAWT | 80 | 25 | 37.0 | ... | no |
TID005 | HAWT | 36 | 32 | 40.0 | ... | no |
Our target variable is the breakdown attribute; since it is binary, this suggests implementing a binary classification model.
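For illustration only, here is a minimal sketch, assuming a hypothetical file name, of how the yes/no target could be mapped to the 0/1 labels that a binary classifier expects:

```python
import pandas as pd

# Load the raw sensor data; the file name is hypothetical.
df = pd.read_csv("windturbine_raw_data.csv")

# Map the yes/no breakdown target to 1/0 for binary classification.
df["breakdown"] = df["breakdown"].map({"yes": 1, "no": 0})

# Inspect the class balance of the target.
print(df["breakdown"].value_counts())
```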
After building the model, we will have to host it and expose it as a REST API so that inferences can be executed from client-side applications.
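To make the architecture concrete, here is a minimal sketch of the kind of AWS Lambda handler that sits between Amazon API Gateway and the SageMaker endpoint; the endpoint name and environment variable are assumptions, not the workshop's actual code:

```python
import os
import boto3

# SageMaker runtime client used to call the hosted model.
runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint name, read from an environment variable.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "predmaint-endpoint")

def lambda_handler(event, context):
    # API Gateway passes the client's CSV-formatted sensor readings
    # in the request body.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=event["body"],
    )
    # The endpoint returns the prediction as a UTF-8 string.
    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": prediction}
```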
This workshop consists of six modules:
- Module 01 - Creating an Amazon SageMaker managed Jupyter notebook instance and an Amazon S3 bucket that will be used for storing data, models and code.
- Module 02 - Using AWS Glue and Amazon Athena for data preparation and exploration, and then executing feature engineering using SparkML.
- Module 03 - Training a binary classification model using the Amazon SageMaker built-in XGBoost algorithm, which will predict whether a wind turbine plant requires maintenance (a minimal sketch of the training call follows this list).
- Module 04 - Deploying the feature engineering and ML models as a pipeline using Amazon SageMaker hosting.
- Module 05 - Building a REST API using Amazon API Gateway and implementing an AWS Lambda function that will invoke the Amazon SageMaker endpoint for inference.
- Module 06 - Using a single-page demo application to invoke the REST API and get inferences.
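As a preview of Module 03, here is a minimal sketch, assuming hypothetical bucket and IAM role names, of how a training job for the built-in XGBoost algorithm might be launched with the SageMaker Python SDK; this is illustrative, not the workshop's exact code:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role
bucket = "my-workshop-bucket"                                   # hypothetical bucket

# Retrieve the container image of the built-in XGBoost algorithm.
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

xgb = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/output",
    sagemaker_session=session,
)

# Binary classification objective for the yes/no breakdown target.
xgb.set_hyperparameters(objective="binary:logistic", num_round=100)

# Train on CSV data previously uploaded to S3.
xgb.fit({"train": TrainingInput(f"s3://{bucket}/train", content_type="text/csv")})
```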
You must complete the modules in order, since the outputs of each module are inputs to the following one.
This workshop has been designed assuming that each participant is using an AWS account that has been provided and pre-configured by the workshop instructor(s). However, you can also use your own AWS account; in that case, you'll have to execute some preliminary configuration steps as described here.
Once you are ready to go, please start with Module 01.
The contents of this workshop are licensed under the Apache 2.0 License.
Giuseppe A. Porcelli - Principal, ML Specialist Solutions Architect - Amazon Web Services EMEA
Antonio Duma - Solutions Architect - Amazon Web Services EMEA