Join 30K+ developers in learning how to responsibly deliver value with ML.
Learn how to apply ML to build a production-grade product and deliver value.
- Lessons: https://madewithml.com/courses/mlops/
- Code: GokuMohandas/MLOps
If you need to refresh yourself on ML algorithms, check out our Made With ML repository.
| 📦 Product | 📝 Scripting | ♻️ Reproducibility |
| :-- | :-- | :-- |
| Objective | Organization | Git |
| Solution | Packaging | Pre-commit |
| Iteration | Documentation | Versioning |
| **🔢 Data** | Styling | Docker |
| Labeling | Makefile | **🚀 Production** |
| Preprocessing | Logging | Dashboard |
| Exploratory data analysis | **📦 Interfaces** | CI/CD workflows |
| Splitting | Command-line | Infrastructure |
| Augmentation | RESTful API | Monitoring |
| **📈 Modeling** | **✅ Testing** | Pipelines |
| Evaluation | Code | Feature store |
| Experiment tracking | Data | |
| Optimization | Models | |
📆 New lessons every month!
Subscribe for our monthly updates on new content.
```
app/
├── api.py       - FastAPI app
├── cli.py       - CLI app
└── schemas.py   - API model schemas
tagifai/
├── config.py    - configuration setup
├── data.py      - data processing components
├── eval.py      - evaluation components
├── main.py      - training/optimization pipelines
├── models.py    - model architectures
├── predict.py   - inference components
├── train.py     - training components
└── utils.py     - supplementary utilities
```
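To make the layout above concrete, `app/schemas.py` holds the API's model schemas. A minimal sketch of what a request/response schema pair could look like; the real app uses Pydantic models with FastAPI, and the class and field names below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical request/response shapes for a tag-prediction API.
# The actual project defines these as Pydantic models in app/schemas.py.

@dataclass
class PredictPayload:
    """Incoming request: one or more texts to tag."""
    texts: List[str] = field(default_factory=list)

@dataclass
class PredictionResult:
    """Outgoing response: predicted tags for one input text."""
    input_text: str
    predicted_tags: List[str]

payload = PredictPayload(texts=["Transfer learning with BERT"])
result = PredictionResult(
    input_text=payload.texts[0],
    predicted_tags=["transfer-learning", "bert"],  # illustrative output
)
print(result.predicted_tags)
```

Typed schemas like these let the framework validate inputs before any model code runs.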
Documentation for this application can be found here.
- Set up environment.
```bash
make venv
source venv/bin/activate
```
- Get data.
```bash
# Download to data/
tagifai download-data

# or pull from DVC
dvc init
dvc remote add -d storage stores/blob
dvc pull
```
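Under the hood, DVC tracks each large file with a small metafile that records a content hash, while the bytes themselves live in the remote (here, `stores/blob`). A rough stdlib sketch of that idea; the real `.dvc` metafile format has more fields and uses YAML, and the file names below are illustrative:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def content_hash(path: Path) -> str:
    """MD5 of a file's bytes, similar to the hash DVC records."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def write_metafile(path: Path) -> Path:
    """Write a tiny JSON metafile next to the tracked file."""
    meta = {"path": path.name, "md5": content_hash(path)}
    metafile = path.with_name(path.name + ".dvc.json")  # illustrative name
    metafile.write_text(json.dumps(meta, indent=2))
    return metafile

# Simulate tracking a data file.
tmp = Path(tempfile.mkdtemp())
data = tmp / "projects.json"
data.write_text('[{"id": 1}]')
meta = json.loads(write_metafile(data).read_text())
print(meta["md5"])
```

Because only the tiny metafile is committed to git, `dvc pull` can later fetch the matching bytes from the remote by hash.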
- Compute features.
```bash
tagifai compute-features
```
- Optimize using the distributions specified in `tagifai.main.objective`. This also writes the best model's params to `config/params.json`.
```bash
tagifai optimize \
    --params-fp config/params.json \
    --study-name optimization \
    --num-trials 100
```
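The project drives this with Optuna trials over the distributions declared in `tagifai.main.objective`. As a hedged, stdlib-only sketch of the same idea: sample hyperparameters from distributions, score each trial, and keep the best params. The search space and objective below are stand-ins, not the project's:

```python
import random

random.seed(42)

# Stand-in search space; the real distributions live in tagifai.main.objective.
SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-5, -1),  # log-uniform
    "dropout": lambda: random.uniform(0.0, 0.8),
}

def objective(params: dict) -> float:
    """Toy validation loss; lower is better (stand-in for real training)."""
    return (params["learning_rate"] - 1e-3) ** 2 + (params["dropout"] - 0.5) ** 2

def optimize(num_trials: int = 100) -> dict:
    """Random search: keep the params of the best-scoring trial."""
    best_params, best_loss = None, float("inf")
    for _ in range(num_trials):
        params = {name: sample() for name, sample in SPACE.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params  # the project writes these to config/params.json

best = optimize()
print(best)
```

Optuna replaces the naive random sampling here with smarter samplers and pruning of unpromising trials.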
You can use your own on-prem GPUs or infrastructure from cloud providers (AWS, GCP, Azure, etc.), or check out the optimize.ipynb notebook for how to train on Google Colab and transfer trained artifacts to your local machine.
- Train a model (and save all of its artifacts) using params from `config/params.json` and publish metrics to `model/performance.json`. You can view the entire run's details inside `experiments/{experiment_id}/{run_id}` or via the API (`GET /runs/{run_id}`).
```bash
tagifai train-model \
    --params-fp config/params.json \
    --model-dir model \
    --experiment-name best \
    --run-name model
```
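Each run's artifacts land under `experiments/{experiment_id}/{run_id}`. A sketch of reading a saved run back with the stdlib; the file names mirror `params.json` and `performance.json`, but the exact layout inside a run directory is an assumption here:

```python
import json
import tempfile
from pathlib import Path

def load_run(run_dir: Path) -> dict:
    """Collect a run's saved params and metrics into one dict."""
    return {
        "params": json.loads((run_dir / "params.json").read_text()),
        "performance": json.loads((run_dir / "performance.json").read_text()),
    }

# Simulate a saved run for the sketch.
run_dir = Path(tempfile.mkdtemp()) / "experiments" / "1" / "abc123"
run_dir.mkdir(parents=True)
(run_dir / "params.json").write_text(json.dumps({"num_trials": 100}))
(run_dir / "performance.json").write_text(json.dumps({"f1": 0.85}))

run = load_run(run_dir)
print(run["performance"]["f1"])
```

The API's `GET /runs/{run_id}` endpoint would serve the same information over HTTP instead of the filesystem.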
- Predict tags for an input sentence. It'll use the best model saved from `train-model`, but you can also specify a `run-id` to choose a specific model.
  - Command-line app
    ```bash
    tagifai predict-tags --text "Transfer learning with BERT"
    ```
  - FastAPI
    ```bash
    uvicorn app.api:app \
        --host 0.0.0.0 \
        --port 5000 \
        --reload \
        --reload-dir tagifai \
        --reload-dir app
    ```
- View improvements. Once you're done training the best model using the current data version, best hyperparameters, etc., you can view the performance differences.
```bash
tagifai diff
```
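The gist of `tagifai diff` can be sketched as a per-metric comparison of two `performance.json`-style payloads; the metric names and values below are illustrative:

```python
def diff_performance(old: dict, new: dict) -> dict:
    """Per-metric deltas between two flat performance dicts."""
    return {
        metric: round(new[metric] - old[metric], 4)
        for metric in old.keys() & new.keys()  # only metrics present in both
    }

old = {"precision": 0.82, "recall": 0.74, "f1": 0.78}
new = {"precision": 0.85, "recall": 0.79, "f1": 0.82}
print(diff_performance(old, new))
```

A real diff would also compare data versions and hyperparameters, not just metrics.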
- Push versioned assets.
```bash
dvc add data/projects.json
dvc add data/tags.json
dvc add data/features.json
dvc add data/features.parquet
dvc push
```
- Commit to git. This will clean and update versioned assets (data, experiments), run tests, apply styling, etc.
```bash
git add .
git commit -m ""
git tag -a <TAG_NAME> -m ""
git push origin <BRANCH_NAME>
```
```bash
python -m pip install -e . --no-cache-dir          # prod
python -m pip install -e ".[test]" --no-cache-dir  # test
python -m pip install -e ".[docs]" --no-cache-dir  # docs
python -m pip install -e ".[dev]" --no-cache-dir   # dev
pre-commit install
pre-commit autoupdate
```
```bash
docker build -t tagifai:latest -f Dockerfile .
docker run -p 5000:5000 --name tagifai tagifai:latest
```
```bash
uvicorn app.api:app --host 0.0.0.0 --port 5000 --reload --reload-dir tagifai --reload-dir app  # dev
gunicorn -c config/gunicorn.py -k uvicorn.workers.UvicornWorker app.api:app                    # prod
```
```bash
streamlit run streamlit/app.py
```
```bash
mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri stores/model/
```
```bash
export AIRFLOW_HOME=${PWD}/airflow
AIRFLOW_VERSION=2.0.1
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
airflow db init
airflow users create \
    --username admin \
    --firstname Goku \
    --lastname Mohandas \
    --role Admin \
    --email [email protected]
airflow webserver --port 8080
```
```bash
# In a new terminal
airflow scheduler
```
```bash
feast init --minimal --template local features
touch features/features.py
cd features
feast apply
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```
```bash
python -m mkdocs serve
```
- Great Expectations checkpoints (read more here):
```bash
great_expectations checkpoint run projects
great_expectations checkpoint run tags
```
- Full coverage testing:
```bash
pytest --cov tagifai --cov app --cov-report html
```
- Testing only the non-training components:
```bash
pytest -m "not training"
```
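The `-m "not training"` filter relies on tests being tagged with a custom `training` marker. A minimal sketch of how such a marker is applied; the test names here are hypothetical, and in practice the marker would also be registered in the pytest config to avoid warnings:

```python
import pytest

@pytest.mark.training  # deselected by: pytest -m "not training"
def test_train_model_convergence():
    """Hypothetical slow test that actually trains a model."""
    assert True

def test_preprocess_text():
    """Hypothetical fast test; still runs under -m "not training"."""
    assert True
```

Separating slow training tests from fast unit tests keeps the default feedback loop quick.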
```bash
python -m ipykernel install --user --name=tagifai
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyterlab/toc
jupyter lab
```
You can also run all notebooks on Google Colab.
- **Software engineers** looking to learn ML and become even better software engineers.
- **Data scientists** who want to learn how to responsibly deliver value with ML.
- **College graduates** looking to learn the practical skills they'll need for the industry.
- **Product managers** who want to develop a technical foundation for ML applications.
Lessons will be released weekly and each one will include:
- **Intuition**: high-level overview of the concepts that will be covered and how it all fits together.
- **Code**: simple code examples to illustrate the concept.
- **Application**: applying the concept to our specific task.
- **Extensions**: a brief look at other tools and techniques that will be useful for different situations.
- **Hands-on**: If you search production ML or MLOps online, you'll find great blog posts and tweets. But in order to really understand these concepts, you need to implement them. Unfortunately, you don't see a lot of the inner workings of running production ML because of scale, proprietary content, and expensive tools. However, Made With ML is free, open, and live, which makes it a perfect learning opportunity for the community.
- **Intuition-first**: We will never jump straight to code. In every lesson, we will develop intuition for the concepts and think about them from a product perspective.
- **Software engineering**: This course isn't just about ML. In fact, it's mostly about clean software engineering! We'll cover important concepts like versioning, testing, logging, etc. that make a production-grade product.
- **Focused yet holistic**: For every concept, we'll not only cover what's most important for our specific task (the case study aspect) but also related methods (the guide aspect) which may prove useful in other situations.
- I've deployed large-scale ML systems at Apple as well as smaller, constrained systems at startups, and I want to share the common principles I've learned.
- Connect with me on Twitter and LinkedIn
While this content is for everyone, it's especially targeted towards people who don't have as much opportunity to learn. I believe that creativity and intelligence are randomly distributed while opportunities are siloed. I want to enable more people to create and contribute to innovation.
To cite this course, please use:
```bibtex
@article{madewithml,
    title  = "Applied ML - Made With ML",
    author = "Goku Mohandas",
    url    = "https://madewithml.com/courses/mlops/",
    year   = "2021",
}
```