Join 30K+ developers in learning how to responsibly deliver value with ML.
Learn how to apply ML to build a production-grade product and deliver value.
- Lessons: https://madewithml.com/courses/mlops/
- Code: GokuMohandas/MLOps
If you need to refresh yourself on ML algorithms, check out our Made With ML repository.
| 📦 Product | 📝 Scripting | ♻️ Reproducibility |
| :-- | :-- | :-- |
| Objective | Organization | Git |
| Solution | Packaging | Pre-commit |
| Iteration | Documentation | Versioning |
| **🔢 Data** | Styling | Docker |
| Labeling | Makefile | **🚀 Production** |
| Preprocessing | Logging | Dashboard |
| Exploratory data analysis | **📦 Interfaces** | CI/CD workflows |
| Splitting | Command-line | Infrastructure |
| Augmentation | RESTful API | Monitoring |
| **📈 Modeling** | **✅ Testing** | Pipelines |
| Evaluation | Code | Feature store |
| Experiment tracking | Data | |
| Optimization | Models | |
📆 New lessons every month!
Subscribe for our monthly updates on new content.
```
app/
├── api.py       - FastAPI app
├── cli.py       - CLI app
└── schemas.py   - API model schemas
tagifai/
├── config.py    - configuration setup
├── data.py      - data processing components
├── eval.py      - evaluation components
├── main.py      - training/optimization pipelines
├── models.py    - model architectures
├── predict.py   - inference components
├── train.py     - training components
└── utils.py     - supplementary utilities
```
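To make the layout above concrete, `app/schemas.py` holds the API's model schemas. A minimal sketch of what a request/response schema pair could look like; the real app uses Pydantic models with FastAPI, and the class and field names below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical request/response shapes for a tag-prediction API.
# The actual project defines these as Pydantic models in app/schemas.py.

@dataclass
class PredictPayload:
    """Incoming request: one or more texts to tag."""
    texts: List[str] = field(default_factory=list)

@dataclass
class PredictionResult:
    """Outgoing response: predicted tags for one input text."""
    input_text: str
    predicted_tags: List[str]

payload = PredictPayload(texts=["Transfer learning with BERT"])
result = PredictionResult(
    input_text=payload.texts[0],
    predicted_tags=["transfer-learning", "bert"],  # illustrative output
)
print(result.predicted_tags)
```

Typed schemas like these let the framework validate inputs before any model code runs.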
Documentation for this application can be found here.
- Set up environment.
```bash
make venv
source venv/bin/activate
```
- Get data.
```bash
# Download to data/
tagifai download-data

# or pull from DVC
dvc init
dvc remote add -d storage stores/blob
dvc pull
```
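Under the hood, DVC tracks each large file with a small metafile that records a content hash, while the bytes themselves live in the remote (here, `stores/blob`). A rough stdlib sketch of that idea; the real `.dvc` metafile format has more fields and uses YAML, and the file names below are illustrative:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def content_hash(path: Path) -> str:
    """MD5 of a file's bytes, similar to the hash DVC records."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def write_metafile(path: Path) -> Path:
    """Write a tiny JSON metafile next to the tracked file."""
    meta = {"path": path.name, "md5": content_hash(path)}
    metafile = path.with_name(path.name + ".dvc.json")  # illustrative name
    metafile.write_text(json.dumps(meta, indent=2))
    return metafile

# Simulate tracking a data file.
tmp = Path(tempfile.mkdtemp())
data = tmp / "projects.json"
data.write_text('[{"id": 1}]')
meta = json.loads(write_metafile(data).read_text())
print(meta["md5"])
```

Because only the tiny metafile is committed to git, `dvc pull` can later fetch the matching bytes from the remote by hash.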
- Compute features.
```bash
tagifai compute-features
```
- Optimize using the distributions specified in `tagifai.main.objective`. This also writes the best model's params to `config/params.json`.
```bash
tagifai optimize \
    --params-fp config/params.json \
    --study-name optimization \
    --num-trials 100
```
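The project drives this with Optuna trials over the distributions declared in `tagifai.main.objective`. As a hedged, stdlib-only sketch of the same idea: sample hyperparameters from distributions, score each trial, and keep the best params. The search space and objective below are stand-ins, not the project's:

```python
import random

random.seed(42)

# Stand-in search space; the real distributions live in tagifai.main.objective.
SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -1),  # log-uniform
    "dropout": lambda: random.uniform(0.0, 0.8),
}

def objective(params: dict) -> float:
    """Toy validation loss; lower is better (stand-in for real training)."""
    return (params["learning_rate"] - 1e-3) ** 2 + (params["dropout"] - 0.5) ** 2

def optimize(num_trials: int = 100) -> dict:
    """Random search: keep the params of the best-scoring trial."""
    best_params, best_loss = None, float("inf")
    for _ in range(num_trials):
        params = {name: sample() for name, sample in SPACE.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params  # the project writes these to config/params.json

best = optimize()
print(best)
```

Optuna replaces the naive random sampling here with smarter samplers and pruning of unpromising trials.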
You can use your own on-prem GPUs or infrastructure from cloud providers (AWS, GCP, Azure, etc.), or check out the optimize.ipynb notebook for how to train on Google Colab and transfer trained artifacts to your local machine.
- Train a model (and save all of its artifacts) using params from `config/params.json` and publish metrics to `model/performance.json`. You can view the entire run's details inside `experiments/{experiment_id}/{run_id}` or via the API (`GET /runs/{run_id}`).
```bash
tagifai train-model \
    --params-fp config/params.json \
    --model-dir model \
    --experiment-name best \
    --run-name model
```
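Each run's artifacts land under `experiments/{experiment_id}/{run_id}`. A sketch of reading a saved run back with the stdlib; the file names mirror `params.json` and `performance.json`, but the exact layout inside a run directory is an assumption here:

```python
import json
import tempfile
from pathlib import Path

def load_run(run_dir: Path) -> dict:
    """Collect a run's saved params and metrics into one dict."""
    return {
        "params": json.loads((run_dir / "params.json").read_text()),
        "performance": json.loads((run_dir / "performance.json").read_text()),
    }

# Simulate a saved run for the sketch.
run_dir = Path(tempfile.mkdtemp()) / "experiments" / "1" / "abc123"
run_dir.mkdir(parents=True)
(run_dir / "params.json").write_text(json.dumps({"num_trials": 100}))
(run_dir / "performance.json").write_text(json.dumps({"f1": 0.85}))

run = load_run(run_dir)
print(run["performance"]["f1"])
```

The API's `GET /runs/{run_id}` endpoint would serve the same information over HTTP instead of the filesystem.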
- Predict tags for an input sentence. It'll use the best model saved from `train-model`, but you can also specify a `run-id` to choose a specific model.
  - Command-line app
    ```bash
    tagifai predict-tags --text "Transfer learning with BERT"
    ```
  - FastAPI
    ```bash
    uvicorn app.api:app \
        --host 0.0.0.0 \
        --port 5000 \
        --reload \
        --reload-dir tagifai \
        --reload-dir app
    ```
- View improvements. Once you're done training the best model using the current data version, best hyperparameters, etc., you can view the performance differences.
```bash
tagifai diff
```
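The gist of `tagifai diff` can be sketched as a per-metric comparison of two `performance.json`-style payloads; the metric names and values below are illustrative:

```python
def diff_performance(old: dict, new: dict) -> dict:
    """Per-metric deltas between two flat performance dicts."""
    return {
        metric: round(new[metric] - old[metric], 4)
        for metric in old.keys() & new.keys()  # only metrics present in both
    }

old = {"precision": 0.82, "recall": 0.74, "f1": 0.78}
new = {"precision": 0.85, "recall": 0.79, "f1": 0.82}
print(diff_performance(old, new))
```

A real diff would also compare data versions and hyperparameters, not just metrics.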
- Push versioned assets.
```bash
dvc add data/projects.json
dvc add data/tags.json
dvc add data/features.json
dvc add data/features.parquet
dvc push
```
- Commit to git. This will clean and update versioned assets (data, experiments), run tests, apply styling, etc.
```bash
git add .
git commit -m ""
git tag -a <TAG_NAME> -m ""
git push origin <BRANCH_NAME>
```
```bash
python -m pip install -e . --no-cache-dir          # prod
python -m pip install -e ".[test]" --no-cache-dir  # test
python -m pip install -e ".[docs]" --no-cache-dir  # docs
python -m pip install -e ".[dev]" --no-cache-dir   # dev
pre-commit install
pre-commit autoupdate
```
```bash
docker build -t tagifai:latest -f Dockerfile .
docker run -p 5000:5000 --name tagifai tagifai:latest
```
```bash
uvicorn app.api:app --host 0.0.0.0 --port 5000 --reload --reload-dir tagifai --reload-dir app  # dev
gunicorn -c config/gunicorn.py -k uvicorn.workers.UvicornWorker app.api:app                    # prod
```
```bash
streamlit run streamlit/app.py
```
```bash
mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri stores/model/
```
```bash
export AIRFLOW_HOME=${PWD}/airflow
AIRFLOW_VERSION=2.0.1
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
airflow db init
airflow users create \
    --username admin \
    --firstname Goku \
    --lastname Mohandas \
    --role Admin \
    --email [email protected]
airflow webserver --port 8080
```
```bash
# In a new terminal
airflow scheduler
```
```bash
feast init --minimal --template local features
touch features/features.py
cd features
feast apply
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```
```bash
python -m mkdocs serve
```
- Great Expectations checkpoints (read more here):
```bash
great_expectations checkpoint run projects
great_expectations checkpoint run tags
```
- Full coverage testing:
```bash
pytest --cov tagifai --cov app --cov-report html
```
- Testing only the non-training components:
```bash
pytest -m "not training"
```
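The `-m "not training"` filter relies on tests being tagged with a custom `training` marker. A minimal sketch of how such a marker is applied; the test names here are hypothetical, and in practice the marker would also be registered in the pytest config to avoid warnings:

```python
import pytest

@pytest.mark.training  # deselected by: pytest -m "not training"
def test_train_model_convergence():
    """Hypothetical slow test that actually trains a model."""
    assert True

def test_preprocess_text():
    """Hypothetical fast test; still runs under -m "not training"."""
    assert True
```

Separating slow training tests from fast unit tests keeps the default feedback loop quick.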
```bash
python -m ipykernel install --user --name=tagifai
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyterlab/toc
jupyter lab
```
You can also run all notebooks on Google Colab.
- **Software engineers** looking to learn ML and become even better software engineers.
- **Data scientists** who want to learn how to responsibly deliver value with ML.
- **College graduates** looking to learn the practical skills they'll need for the industry.
- **Product managers** who want to develop a technical foundation for ML applications.
Lessons will be released weekly and each one will include:
- **Intuition**: high-level overview of the concepts that will be covered and how it all fits together.
- **Code**: simple code examples to illustrate the concept.
- **Application**: applying the concept to our specific task.
- **Extensions**: a brief look at other tools and techniques that will be useful for different situations.
- **Hands-on**: If you search production ML or MLOps online, you'll find great blog posts and tweets. But in order to really understand these concepts, you need to implement them. Unfortunately, you don't see a lot of the inner workings of running production ML because of scale, proprietary content, and expensive tools. However, Made With ML is free, open, and live, which makes it a perfect learning opportunity for the community.
- **Intuition-first**: We will never jump straight to code. In every lesson, we will develop intuition for the concepts and think about them from a product perspective.
- **Software engineering**: This course isn't just about ML. In fact, it's mostly about clean software engineering! We'll cover important concepts like versioning, testing, logging, etc. that make a production-grade product.
- **Focused yet holistic**: For every concept, we'll not only cover what's most important for our specific task (the case study aspect) but also related methods (the guide aspect) which may prove useful in other situations.
- I've deployed large-scale ML systems at Apple as well as smaller, constrained systems at startups, and I want to share the common principles I've learned.
- Connect with me on Twitter and LinkedIn
While this content is for everyone, it's especially targeted towards people who don't have as much opportunity to learn. I believe that creativity and intelligence are randomly distributed while opportunities are siloed. I want to enable more people to create and contribute to innovation.
To cite this course, please use:
```bibtex
@article{madewithml,
    title  = "Applied ML - Made With ML",
    author = "Goku Mohandas",
    url    = "https://madewithml.com/courses/mlops/",
    year   = "2021",
}
```