
Applied ML · MLOps · Production
Join 30K+ developers in learning how to responsibly deliver value with ML.

🔥 Among the top MLOps repositories on GitHub


MLOps

Learn how to apply ML to build a production-grade product that delivers value.

If you need to refresh yourself on ML algorithms, check out our Made With ML repository.

| 📦 Product | 📝 Scripting | ♻️ Reproducibility |
| :-- | :-- | :-- |
| Objective | Organization | Git |
| Solution | Packaging | Pre-commit |
| Iteration | Documentation | Versioning |
| 🔢 Data | Styling | Docker |
| Labeling | Makefile | 🚀 Production |
| Preprocessing | Logging | Dashboard |
| Exploratory data analysis | 📦 Interfaces | CI/CD workflows |
| Splitting | Command-line | Infrastructure |
| Augmentation | RESTful API | Monitoring |
| 📈 Modeling | ✅ Testing | Pipelines |
| Evaluation | Code | Feature store |
| Experiment tracking | Data | |
| Optimization | Models | |

📆 New lessons every month!
Subscribe to receive monthly updates on new content.


Directory structure

app/
├── api.py        - FastAPI app
├── cli.py        - CLI app
└── schemas.py    - API model schemas
tagifai/
├── config.py     - configuration setup
├── data.py       - data processing components
├── eval.py       - evaluation components
├── main.py       - training/optimization pipelines
├── models.py     - model architectures
├── predict.py    - inference components
├── train.py      - training components
└── utils.py      - supplementary utilities

Documentation for this application can be found here.

Workflows

  1. Set up environment.
make venv
source venv/bin/activate
  2. Get data.
# Download to data/
tagifai download-data

# or Pull from DVC
dvc init
dvc remote add -d storage stores/blob
dvc pull
  3. Compute features.
tagifai compute-features
  4. Optimize using the distributions specified in tagifai.main.objective. This also writes the best model's params to config/params.json.
tagifai optimize \
    --params-fp config/params.json \
    --study-name optimization \
    --num-trials 100

You can use your own on-prem GPUs, infrastructure from cloud providers (AWS, GCP, Azure, etc.) or check out the optimize.ipynb notebook for how to train on Google Colab and transfer trained artifacts to your local machine.
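The --study-name and --num-trials flags suggest an Optuna-style study under the hood. As a rough, illustrative sketch only (the actual search space lives in tagifai.main.objective; the parameter names and ranges below are placeholders), an objective of this shape would look like:

# Illustrative sketch of an Optuna-style objective (not the repo's actual code)
import optuna

def train_model(params: dict) -> dict:
    # Stand-in for the real training routine; returns metrics for the trial
    ...
    return {"metrics": {"f1": 0.0}}

def objective(trial: optuna.Trial) -> float:
    # Sample hyperparameters from the distributions defined for the study
    params = {
        "dropout_p": trial.suggest_uniform("dropout_p", 0.3, 0.8),  # placeholder range
        "lr": trial.suggest_loguniform("lr", 5e-5, 5e-4),           # placeholder range
    }
    artifacts = train_model(params)
    return artifacts["metrics"]["f1"]  # metric the study optimizes

study = optuna.create_study(study_name="optimization", direction="maximize")
study.optimize(objective, n_trials=100)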

  5. Train a model (and save all its artifacts) using params from config/params.json and publish metrics to model/performance.json. You can view the entire run's details inside experiments/{experiment_id}/{run_id} or via the API (GET /runs/{run_id}).
tagifai train-model \
    --params-fp config/params.json \
    --model-dir model \
    --experiment-name best \
    --run-name model
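Once the API is running (see step 6 and the Application commands below), a run's details can also be fetched programmatically through the GET /runs/{run_id} endpoint mentioned above; a minimal sketch (substitute a real run id and adjust the host/port to wherever the API is served):

import requests

# Fetch a run's details from the API (assumes it is served locally on port 5000)
response = requests.get("http://localhost:5000/runs/<RUN_ID>")
print(response.json())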
  6. Predict tags for an input sentence. It'll use the best model saved from train-model, but you can also specify a run-id to choose a specific model.

    • Command-line app

      tagifai predict-tags --text "Transfer learning with BERT"
    • FastAPI

      uvicorn app.api:app \
          --host 0.0.0.0 \
          --port 5000 \
          --reload \
          --reload-dir tagifai \
          --reload-dir app
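      With the server up, predictions can also be requested over HTTP. The route and payload below are hypothetical; check app/api.py for the actual endpoint:

      import requests

      # Hypothetical route/payload; the real ones are defined in app/api.py
      response = requests.post(
          "http://localhost:5000/predict",
          json={"text": "Transfer learning with BERT"},
      )
      print(response.json())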
  7. View improvements. Once you're done training the best model with the current data version, best hyperparameters, etc., you can view the performance difference.

tagifai diff
  8. Push versioned assets.
# Push
dvc add data/projects.json
dvc add data/tags.json
dvc add data/features.json
dvc add data/features.parquet
dvc push
  9. Commit to git. This will clean and update versioned assets (data, experiments), run tests, apply styling, etc.
git add .
git commit -m ""
git tag -a <TAG_NAME> -m ""
git push origin <BRANCH_NAME>

Commands

Environments

python -m pip install -e . --no-cache-dir  # prod
python -m pip install -e ".[test]" --no-cache-dir  # test
python -m pip install -e ".[docs]" --no-cache-dir  # docs
python -m pip install -e ".[dev]" --no-cache-dir  # dev
pre-commit install
pre-commit autoupdate
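These editable installs assume the package declares optional extras named test, docs, and dev. A minimal sketch of how that could look in setup.py (the dependency lists here are illustrative, not the project's actual ones):

# setup.py (illustrative): extras enable pip install -e ".[test]" / ".[docs]" / ".[dev]"
from setuptools import find_packages, setup

setup(
    name="tagifai",
    packages=find_packages(),
    install_requires=["fastapi"],  # illustrative core dependencies
    extras_require={
        "test": ["pytest", "pytest-cov"],                          # illustrative
        "docs": ["mkdocs"],                                        # illustrative
        "dev": ["pytest", "pytest-cov", "mkdocs", "pre-commit"],   # illustrative
    },
)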

Docker

docker build -t tagifai:latest -f Dockerfile .
docker run -p 5000:5000 --name tagifai tagifai:latest

Application

uvicorn app.api:app --host 0.0.0.0 --port 5000 --reload --reload-dir tagifai --reload-dir app  # dev
gunicorn -c config/gunicorn.py -k uvicorn.workers.UvicornWorker app.api:app  # prod
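The prod command reads its settings from config/gunicorn.py. Gunicorn config files are plain Python modules, so a minimal sketch could look like this (values are illustrative):

# config/gunicorn.py (illustrative): Gunicorn reads these module-level variables
bind = "0.0.0.0:5000"  # address:port to serve on
workers = 4            # number of worker processes
timeout = 60           # seconds before an unresponsive worker is restarted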

Streamlit

streamlit run streamlit/app.py

MLFlow

mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri stores/model/

Airflow

export AIRFLOW_HOME=${PWD}/airflow
AIRFLOW_VERSION=2.0.1
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
airflow db init
airflow users create \
    --username admin \
    --firstname Goku \
    --lastname Mohandas \
    --role Admin \
    --email [email protected]
airflow webserver --port 8080

# In new terminal
airflow scheduler
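Airflow picks up DAG modules from $AIRFLOW_HOME/dags. As an illustrative sketch only (task names, commands, and schedule are placeholders, not the repo's actual DAGs), a pipeline driving the tagifai CLI might look like:

# airflow/dags/tagifai.py (illustrative sketch)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="tagifai",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@weekly",  # placeholder cadence
    catchup=False,
) as dag:
    compute_features = BashOperator(
        task_id="compute_features",
        bash_command="tagifai compute-features",
    )
    train = BashOperator(
        task_id="train_model",
        bash_command="tagifai train-model --params-fp config/params.json",
    )
    compute_features >> train  # compute features before training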

Feature store

feast init --minimal --template local features
touch features/features.py
cd features
feast apply
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
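The features/features.py created above holds the definitions that feast apply registers. A rough sketch in the style of the Feast 0.10-era API (the entity, feature names, and source path are illustrative, and the exact imports vary by Feast version):

# features/features.py (illustrative; API details depend on your Feast version)
from google.protobuf.duration_pb2 import Duration
from feast import Entity, Feature, FeatureView, ValueType
from feast.data_source import FileSource

# Offline source of precomputed features (illustrative path)
project_details = FileSource(
    path="data/features.parquet",
    event_timestamp_column="created_on",
)

# Entity the features are keyed on
project = Entity(name="id", value_type=ValueType.INT64, description="project id")

# Feature view registered by feast apply for offline/online retrieval
project_details_view = FeatureView(
    name="project_details",
    entities=["id"],
    ttl=Duration(seconds=86400 * 365),
    features=[Feature(name="text", dtype=ValueType.STRING)],
    online=True,
    input=project_details,
    tags={},
)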

Mkdocs

python -m mkdocs serve

Testing

  • Great Expectations checkpoints (read more here)

    great_expectations checkpoint run projects
    great_expectations checkpoint run tags
  • Full coverage testing

    pytest --cov tagifai --cov app --cov-report html
  • Testing only the non-training components

    pytest -m "not training"
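
    The -m "not training" filter relies on a custom pytest marker. A minimal sketch of how such a marker is applied (the marker also needs to be registered, e.g. in pyproject.toml or pytest.ini, to avoid warnings):

    # tests/test_example.py (illustrative): tag slow training tests with a marker
    import pytest

    @pytest.mark.training
    def test_full_training_run():
        # Slow end-to-end test; excluded by pytest -m "not training"
        ...

    def test_preprocessing():
        # Fast unit test; always runs
        ...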

Jupyterlab

python -m ipykernel install --user --name=tagifai
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyterlab/toc
jupyter lab

You can also run all notebooks on Google Colab.

FAQ

Who is this content for?

  • Software engineers looking to learn ML and become even better software engineers.
  • Data scientists who want to learn how to responsibly deliver value with ML.
  • College graduates looking to learn the practical skills they'll need for the industry.
  • Product Managers who want to develop a technical foundation for ML applications.

What is the structure?

Lessons will be released weekly and each one will include:

  • intuition: high level overview of the concepts that will be covered and how it all fits together.
  • code: simple code examples to illustrate the concept.
  • application: applying the concept to our specific task.
  • extensions: brief look at other tools and techniques that will be useful for different situations.

What makes this content unique?

  • hands-on: If you search production ML or MLOps online, you'll find great blog posts and tweets. But in order to really understand these concepts, you need to implement them. Unfortunately, you don’t see a lot of the inner workings of running production ML because of scale, proprietary content & expensive tools. However, Made With ML is free, open and live which makes it a perfect learning opportunity for the community.
  • intuition-first: We will never jump straight to code. In every lesson, we will develop intuition for the concepts and think about it from a product perspective.
  • software engineering: This course isn't just about ML. In fact, it's mostly about clean software engineering! We'll cover important concepts like versioning, testing, logging, etc. that really make something a production-grade product.
  • focused yet holistic: For every concept, we'll not only cover what's most important for our specific task (this is the case study aspect) but we'll also cover related methods (this is the guide aspect) which may prove to be useful in other situations.

Who is the author?

  • I've deployed large-scale ML systems at Apple as well as smaller systems with constraints at startups, and I want to share the common principles I've learned.
  • Connect with me on Twitter and LinkedIn

Why is this free?

While this content is for everyone, it's especially targeted towards people who don't have as much opportunity to learn. I believe that creativity and intelligence are randomly distributed while opportunities are siloed. I want to enable more people to create and contribute to innovation.


To cite this course, please use:
@article{madewithml,
    title  = "Applied ML - Made With ML",
    author = "Goku Mohandas",
    url    = "https://madewithml.com/courses/mlops/",
    year   = "2021",
}
