What's New

Breaking Change: JsonInput migrating to batch API (#860, #953)

We are officially changing JsonInput to use the batch-oriented syntax. As of release 0.8.4, all input adapters in BentoML have migrated to this design. The main difference is that the input parameter of the user-defined API function is now a list of JSON-serializable objects (Dict, List, Integer, Float, Str) instead of a single JSON-serializable object, and the function is expected to return an Iterable of exactly the same length. This makes it possible for API endpoints using the JsonInput adapter to take advantage of BentoML's adaptive micro-batching capability.

Here is an example of how JsonInput (formerly JsonHandler) used to work:

        @bentoml.api(input=LegacyJsonInput())
        def predict(self, parsed_json):
            results = self.artifacts.classifier([parsed_json['text']])
            return results[0]

And here is an example with the new JsonInput class:

        @bentoml.api(input=JsonInput())
        def predict(self, parsed_json_list):
            texts = [j['text'] for j in parsed_json_list]
            return self.artifacts.classifier(texts)

The old non-batching JsonInput is still available to help with the transition: simply use `from bentoml.adapters import LegacyJsonInput as JsonInput` to replace the JsonInput or JsonHandler used in code written before BentoML 0.8.4. LegacyJsonInput behaves exactly the same as JsonInput did in previous releases. We will keep supporting it until BentoML version 1.0.
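
For services with existing single-item logic, one possible migration path is to wrap that logic so it runs once per item in the batch. The sketch below is plain Python with no BentoML dependency; `batchify` and `predict_one` are hypothetical names used only for illustration, and it assumes each parsed JSON object is a dict with a 'text' key, as in the examples above.

```python
from typing import Any, Callable, List


def batchify(predict_one: Callable[[Any], Any]) -> Callable[[List[Any]], List[Any]]:
    """Wrap a single-item function so it accepts a list and returns a list.

    The returned list always has the same length as the input list, which is
    what the batch-oriented contract described above expects.
    """

    def predict_batch(parsed_json_list: List[Any]) -> List[Any]:
        return [predict_one(parsed_json) for parsed_json in parsed_json_list]

    return predict_batch


def predict_one(parsed_json: dict) -> int:
    # Stand-in for real model logic: count the words in the 'text' field.
    return len(parsed_json["text"].split())


predict_batch = batchify(predict_one)
results = predict_batch([{"text": "hello world"}, {"text": "one two three"}])
```

A per-item loop like this satisfies the new signature but does not by itself benefit from micro-batching; passing the whole list to the model in one call, as in the new JsonInput example above, is what lets batching pay off.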

Custom Web UI support in API Server (#839)

Custom web UI can be added to your API server now! Here is an example project: https://github.com/bentoml/gallery/tree/master/scikit-learn/iris-classifier

Add your web frontend project directory to your BentoService class and BentoML will automatically bundle all the web UI files and host them when starting the API server:

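from bentoml import env, artifacts, api, BentoService, web_static_content
from bentoml.adapters import DataframeInput
from bentoml.artifact import SklearnModelArtifact

@env(auto_pip_dependencies=True)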
@artifacts([SklearnModelArtifact('model')])
@web_static_content('./static')
class IrisClassifier(BentoService):

    @api(input=DataframeInput())
    def predict(self, df):
        return self.artifacts.model.predict(df)

Artifact packing & loading workflow (#911, #921, #949)

We have refactored the Artifact API, which brings more flexibility to how users package their trained models with BentoML's API.

The most noticeable change is that users can now separate the model training job from BentoService development: use the Artifact API to save a trained model from the training job, then load it later when creating the BentoService class for model serving. For example:

Step 1: Model training:

from sklearn import svm
from sklearn import datasets
from bentoml.artifact import SklearnModelArtifact

if __name__ == "__main__":
    # Load training data
    iris = datasets.load_iris()
    X, y = iris.data, iris.target

    # Model Training
    clf = svm.SVC(gamma='scale')
    clf.fit(X, y)

    # Save just the trained model with the SklearnModelArtifact to a specific directory
    btml_model_artifact = SklearnModelArtifact('model')
    btml_model_artifact.pack(clf)
    btml_model_artifact.save('/tmp/temp_bentoml_artifact')

Step 2: Build BentoService class with the saved artifact:

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.artifact import SklearnModelArtifact

@env(auto_pip_dependencies=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):

    @api(input=DataframeInput())
    def predict(self, df):
        # Optional pre-processing, post-processing code goes here
        return self.artifacts.model.predict(df)

if __name__ == "__main__":
    # Create an iris classifier service instance
    iris_classifier_service = IrisClassifier()

    # Load the previously saved artifact
    iris_classifier_service.artifacts.get('model').load('/tmp/temp_bentoml_artifact')

    saved_path = iris_classifier_service.save()

This workflow makes developing and debugging BentoService code a lot easier: users no longer need to retrain their model every time they change something in the BentoService class definition and want to try it out.

Note that the old BentoService class method 'pack' is deprecated as of this release (#915).

Add `bentoml containerize` command (#847,#884,#941)

$ bentoml containerize --help
Usage: bentoml containerize [OPTIONS] BENTO

  Containerizes given Bento into a ready-to-use Docker image.

Options:
  -p, --push
  -t, --tag TEXT       Optional image tag. If not specified, Bento will
                       generate one from the name of the Bento.

Support multiple images in the same request (#828)

A new input adapter class, MultiImageInput (https://docs.bentoml.org/en/latest/api/adapters.html#multiimageinput), has been added. It is designed for prediction services that require multiple image files as their input:

import bentoml
from bentoml import BentoService
from bentoml.adapters import MultiImageInput

class MyService(BentoService):

    @bentoml.api(input=MultiImageInput(input_names=('imageX', 'imageY')))
    def predict(self, image_groups):
        for image_group in image_groups:
            image_array_x = image_group['imageX']
            image_array_y = image_group['imageY']
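
The loop above stops short of producing a result. As a rough illustration of the one-result-per-request shape a batch API function is expected to return, here is a dependency-free sketch. It is not BentoML code: `shape` and `compare_shapes` are hypothetical helpers, and nested lists stand in for the decoded image arrays.

```python
from typing import Dict, List, Sequence, Tuple

# Stand-in for a decoded image: rows of pixel values. In a real service these
# would be image arrays; nested lists keep this sketch dependency-free.
Image = Sequence[Sequence[int]]


def shape(image: Image) -> Tuple[int, int]:
    """Return (height, width) of a nested-list image."""
    return (len(image), len(image[0]) if image else 0)


def compare_shapes(image_groups: List[Dict[str, Image]]) -> List[bool]:
    """Return one boolean per group: do imageX and imageY have the same shape?"""
    return [shape(g["imageX"]) == shape(g["imageY"]) for g in image_groups]


groups = [
    {"imageX": [[0, 1], [2, 3]], "imageY": [[9, 9], [9, 9]]},
    {"imageX": [[0, 1, 2]], "imageY": [[0], [1], [2]]},
]
matches = compare_shapes(groups)
```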

Add FileInput adapter (#734)

A new input adapter class, FileInput, has been added for handling arbitrary binary files as the input for your prediction service: https://github.com/bentoml/BentoML/blob/v0.8.4/bentoml/adapters/file_input.py#L33
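
A rough sketch of what the body of such an API function might look like, assuming it receives a list of file-like objects (one per request) in line with the batch-oriented design described earlier in these notes. This is plain Python rather than BentoML code; `describe_files` is a hypothetical name, and `io.BytesIO` stands in for the uploaded files.

```python
import hashlib
import io
from typing import BinaryIO, Dict, List


def describe_files(file_streams: List[BinaryIO]) -> List[Dict[str, object]]:
    """Return one result per input stream: its size and SHA-256 digest."""
    results = []
    for stream in file_streams:
        data = stream.read()  # raw bytes of the uploaded file
        results.append(
            {"size": len(data), "sha256": hashlib.sha256(data).hexdigest()}
        )
    return results


streams = [io.BytesIO(b"abc"), io.BytesIO(b"")]
summaries = describe_files(streams)
```

In a real service, the per-file step would typically decode the bytes (an image, audio clip, or document) before handing a batch to the model.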

Added Ngrok support (#917)

Expose your local development model API server over a public URL endpoint, using Ngrok under the hood. To try it out, simply add the --run-with-ngrok flag to your bentoml serve CLI command, e.g.:

bentoml serve IrisClassifier:latest --run-with-ngrok

Add support for CoreML (#939)

Serving CoreML models on macOS is now supported! Users can also convert models trained with other frameworks to the CoreML format for better performance on macOS platforms. Here's an example of serving a PyTorch model with CoreML and BentoML:

import torch
from torch import nn

class PytorchModel(nn.Module):
    def __init__(self):
        super().__init__()

        self.linear = nn.Linear(5, 1, bias=False)
        torch.nn.init.ones_(self.linear.weight)

    def forward(self, x):
        x = self.linear(x)

        return x

# ------

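import numpy
import pandas as pd

import coremltools as ct  # pylint: disable=import-error
from coremltools.models import MLModel  # pylint: disable=import-error

import bentoml
from bentoml.adapters import DataframeInput
from bentoml.artifact import CoreMLModelArtifact

# Sample input used for tracing. Not defined in the original snippet; assumed
# here to be a single row of five features, matching nn.Linear(5, 1) above.
test_df = pd.DataFrame([[1] * 5])
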

@bentoml.env(auto_pip_dependencies=True)
@bentoml.artifacts([CoreMLModelArtifact('model')])
class CoreMLClassifier(bentoml.BentoService):
    @bentoml.api(input=DataframeInput())
    def predict(self, df: pd.DataFrame) -> float:
        model: MLModel = self.artifacts.model
        input_data = df.to_numpy().astype(numpy.float32)
        output = model.predict({"input": input_data})
        return next(iter(output.values())).item()


def convert_pytorch_to_coreml(pytorch_model: PytorchModel) -> ct.models.MLModel:
    """CoreML is not for training ML models but rather for converting pretrained models
    and running them on Apple devices. Therefore, in this example we convert the
    PytorchModel defined above into a CoreML model."""
    pytorch_model.eval()
    traced_pytorch_model = torch.jit.trace(pytorch_model, torch.Tensor(test_df.values))
    model: MLModel = ct.convert(
        traced_pytorch_model, inputs=[ct.TensorType(name="input", shape=test_df.shape)]
    )
    return model


# ------

if __name__ == '__main__':
    svc = CoreMLClassifier()
    pytorch_model = PytorchModel()
    model = convert_pytorch_to_coreml(pytorch_model)
    svc.pack('model', model)
    svc.save()

Breaking Change: Remove CLI --with-conda option (#898)

Running inference jobs within an automatically generated conda environment seemed like a good idea at first, but we realized it introduces more problems than it solves. We are removing this option and encourage users to use Docker for running inference jobs instead.

Improvements:

#966, #968 Faster save by improving python local module parsing code
#878, #879 Faster import bentoml with lazy module loader
#872 Add BentoService API name validation
#887 Set a smaller page limit for bentoml list
#916 Do not cache pip requirements in Dockerfile
#918 Improve error handling when micro batching service is unavailable
#925 Artifact refactoring: set_dependencies method
#932 Add warning for SavedBundle Python version mismatch
#904 JsonInput handle AWS Lambda event should ignore content type header
#951 Add openjdk to H2O artifact default conda dependencies
#958 Fix typo in cli default argument help message

Bug fixes:

#864 Fix decode headers with latin1
#867 Fix DataframeInput passing NaN values over HTTP JSON request
#869 Change the default mb_max_latency value to avoid flaky micro-batching initialization
#897 Fix yatai web client import
#907 Fix CORS option in AWS Lambda SAM config
#922 Fix lambda deployment when using AWS assumed-role ARN
#959 Fix RecursionError: maximum recursion depth exceeded when saving BentoService bundle
#969 Fix error in CLI command bentoml --version

Internal & Testing

#870 Add docs for using BentoML's built-in benchmark client
#855, #871, #877 Add integration tests for dockerized BentoML API server workflow
#876, #937 Add integration test for Tensorflow SavedModel artifact
#951 H2O artifact integration test
#939 CoreML artifact integration test
#865 Add Makefile for BentoML developers
#868 API Server "/feedback" endpoint refactor
#908 BentoService base class refactoring and docstring improvements
#909 Refactor API Server startup
#910 Refactor API server performance tracing
#906 Fix yatai web ui startup script
#875 Increase micro batching server test coverage
#935 Fix list deployments error response

Community Announcements:

We have enabled the GitHub Discussions feature 🎉: https://github.com/bentoml/BentoML/discussions

This will be a new place for community members to connect, ask questions, and share anything related to model serving and BentoML.

Contributors

Thank you, everyone, for contributing to this amazing release loaded with new features and improvements! @bojiang @joshuacwnewton @guy4261 @Sharathmk99 @co42 @jackyzha0 @Korusuke @akainth015 @omrihar @yubozhao
