
BentoML-0.9.0

@parano released this 25 Sep 03:28 (commit 6ad3649)

What's New

TL;DR

  • New input/output adapter design that lets users choose between a batch and a non-batch implementation
  • Faster Docker image builds for the API model server
  • Changed the recommended import path of artifact classes: they should now be imported from bentoml.frameworks.*
  • Improved Python pip package management
  • Huggingface/Transformers support!!
  • Manage packaged models with the Labels API
  • Support for GCS (Google Cloud Storage) as a model storage backend in YataiService
  • Current Roadmap for feedback: #1128

New Input/Output adapter design

A massive refactoring of BentoML's inference API and a redesign of its input/output adapters, led by @bojiang with help from @akainth015.

BREAKING CHANGE: API definitions now need to declare whether the API is a batch or a non-batch API:

```python
from typing import List

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import JsonSerializable  # type annotations are optional

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('classifier')])
class MyPredictionService(BentoService):

    @api(input=JsonInput(), batch=True)
    def predict_batch(self, parsed_json_list: List[JsonSerializable]):
        # One model call for the whole batch; returns one result per input item
        results = self.artifacts.classifier.predict([j['text'] for j in parsed_json_list])
        return results

    @api(input=JsonInput())  # default batch=False
    def predict_non_batch(self, parsed_json: JsonSerializable):
        # Wrap the single input in a list, then unwrap the single result
        results = self.artifacts.classifier.predict([parsed_json['text']])
        return results[0]
```

For APIs with batch=True, the user-defined API function must process a list of input items at a time and return a list of results of the same length. By contrast, @api uses batch=False by default, which processes one input item at a time. Implementing a batch API allows your workload to benefit from BentoML's adaptive micro-batching mechanism when serving online traffic, and also speeds up offline batch inference jobs. We recommend using batch=True if performance and throughput are a concern. Non-batch APIs are usually easier to implement, and are a good fit for quick POCs, simple use cases, and deployment on serverless platforms such as AWS Lambda, Azure Functions, and Knative.
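To make the two calling conventions concrete, here is a plain-Python sketch that does not use BentoML at all. The helpers `call_batch_api` and `call_non_batch_api` and the toy `classify_many` model are hypothetical; they only mimic the contract described above: a batch API receives a list and must return a list of the same length, while a non-batch API is invoked once per item.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")  # input item type
R = TypeVar("R")  # result type


def call_batch_api(fn: Callable[[List[T]], List[R]], items: List[T]) -> List[R]:
    """Hypothetical helper: call a batch=True style function once for the whole list."""
    results = list(fn(items))
    # Batch contract: exactly one result per input item, in the same order.
    if len(results) != len(items):
        raise ValueError(f"batch API returned {len(results)} results for {len(items)} inputs")
    return results


def call_non_batch_api(fn: Callable[[T], R], items: List[T]) -> List[R]:
    """Hypothetical helper: call a batch=False style function once per item."""
    return [fn(item) for item in items]


def classify_many(texts: List[str]) -> List[int]:
    """Toy stand-in for a model: 1 if the text mentions "good", else 0."""
    return [int("good" in t) for t in texts]


inputs = [{"text": "good movie"}, {"text": "bad movie"}]
print(call_batch_api(lambda js: classify_many([j["text"] for j in js]), inputs))  # [1, 0]
print(call_non_batch_api(lambda j: classify_many([j["text"]])[0], inputs))  # [1, 0]
```

Both produce the same answers; the difference is that the batch version makes one model call for N inputs, which is what gives micro-batching room to amortize per-call overhead.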

Read more about this change and example usage here: https://docs.bentoml.org/en/latest/api/adapters.html

BREAKING CHANGE: DataframeInput and TfTensorInput users must now add batch=True

DataframeInput and TfTensorInput are special input types that only accept a batch of inputs at a time.

Input data validation while handling batch input

When an API function receives a list of inputs, it can now reject a subset of that input and return an error code to the client if the data is invalid or malformed. Use the InferenceTask#discard API to do this, for example:

```python
from typing import List

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import JsonSerializable, InferenceTask  # type annotations are optional

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('classifier')])
class MyPredictionService(BentoService):

    @api(input=JsonInput(), batch=True)
    def predict_batch(self, parsed_json_list: List[JsonSerializable], tasks: List[InferenceTask]):
        model_input = []
        for parsed_json, task in zip(parsed_json_list, tasks):
            if "text" in parsed_json:
                model_input.append(parsed_json['text'])
            else:
                # Reject only this item; the client receives a 400 for it
                task.discard(http_status=400, err_msg="input json must contain `text` field")

        if not model_input:
            return []  # every task was discarded, nothing to predict

        # One result per task that was not discarded
        results = self.artifacts.classifier.predict(model_input)
        return results
```

The number of discarded tasks plus the length of the returned results list must equal the length of the input list. This allows BentoML to match each result back to a task that has not been discarded.
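The bookkeeping described above can be sketched in plain Python. `Task` and `match_results` below are hypothetical stand-ins, not BentoML's InferenceTask internals: each task records whether it was discarded, and the returned results are assigned, in order, to the tasks that were not.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Task:
    """Hypothetical stand-in for an inference task (not BentoML's InferenceTask)."""
    discarded: bool = False
    http_status: int = 200
    err_msg: str = ""
    result: Optional[Any] = None

    def discard(self, http_status: int, err_msg: str) -> None:
        self.discarded = True
        self.http_status = http_status
        self.err_msg = err_msg


def match_results(tasks: List[Task], results: List[Any]) -> None:
    """Assign results, in order, to the tasks that were not discarded."""
    remaining = [t for t in tasks if not t.discarded]
    # Same check as "discarded count + len(results) == len(tasks)".
    if len(results) != len(remaining):
        raise ValueError(f"expected {len(remaining)} results, got {len(results)}")
    for task, result in zip(remaining, results):
        task.result = result


inputs = [{"text": "a"}, {"no_text": 1}, {"text": "b"}]
tasks = [Task() for _ in inputs]
model_input = []
for parsed_json, task in zip(inputs, tasks):
    if "text" in parsed_json:
        model_input.append(parsed_json["text"])
    else:
        task.discard(http_status=400, err_msg="input json must contain `text` field")

match_results(tasks, [s.upper() for s in model_input])  # toy "model"
for t in tasks:
    print(t.http_status, t.err_msg if t.discarded else t.result)
# 200 A
# 400 input json must contain `text` field
# 200 B
```

Because results are matched positionally, returning one result too many or too few would silently misalign every later task; the length check turns that into an explicit error.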

Fine-grained control of the HTTP response, CLI inference job output, etc. is also possible, e.g.:

```python
import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import InferenceResult, InferenceError  # used to build responses
from bentoml.types import JsonSerializable, InferenceTask  # type annotations are optional

@bentoml.artifacts([SklearnModelArtifact('model')])
class MyService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=False)
    def predict(self, parsed_json: JsonSerializable, task: InferenceTask) -> InferenceResult:
        if task.http_headers['Accept'] == "application/json":
            predictions = self.artifacts.model.predict([parsed_json])
            return InferenceResult(
                data=predictions[0],
                http_status=200,
                http_headers={"Content-Type": "application/json"},
            )
        else:
            return InferenceError(err_msg="application/json output only", http_status=400)
```

Or when batch=True:

```python
from typing import List

import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import InferenceResult, InferenceError  # used to build responses
from bentoml.types import JsonSerializable, InferenceTask  # type annotations are optional

@bentoml.artifacts([SklearnModelArtifact('model')])
class MyService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=True)
    def predict(self, parsed_json_list: List[JsonSerializable], tasks: List[InferenceTask]) -> List[InferenceResult]:
        rv = []
        predictions = self.artifacts.model.predict(parsed_json_list)
        for task, prediction in zip(tasks, predictions):
            if task.http_headers['Accept'] == "application/json":
                rv.append(
                    InferenceResult(
                        data=prediction,
                        http_status=200,
                        http_headers={"Content-Type": "application/json"},
                    )
                )
            else:
                rv.append(InferenceError(err_msg="application/json output only", http_status=400))
                # Alternatively, call task.discard(err_msg=..., http_status=400) instead of appending
        return rv
```
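Each per-task InferenceResult or InferenceError ultimately becomes an HTTP response. The plain-Python sketch below models that step with hypothetical `Result` and `Error` dataclasses that only mirror the field names used above (`data`, `http_status`, `http_headers`, `err_msg`). It is not BentoML's serialization code; the plain-text error body and the default 500 status are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple, Union


@dataclass
class Result:
    """Hypothetical stand-in mirroring the InferenceResult fields used above."""
    data: Any
    http_status: int = 200
    http_headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Error:
    """Hypothetical stand-in mirroring the InferenceError fields used above."""
    err_msg: str
    http_status: int = 500  # assumed default


def to_http_response(outcome: Union[Result, Error]) -> Tuple[int, Dict[str, str], str]:
    """Turn one per-task outcome into (status, headers, body)."""
    if isinstance(outcome, Error):
        # Assumption: errors are sent back as plain text
        return outcome.http_status, {"Content-Type": "text/plain"}, outcome.err_msg
    headers = {"Content-Type": "application/json", **outcome.http_headers}
    return outcome.http_status, headers, json.dumps(outcome.data)


def negotiate(accept: str, prediction: Any) -> Union[Result, Error]:
    """Same Accept-header check as the services above, for one prediction."""
    if accept == "application/json":
        return Result(data=prediction, http_headers={"Content-Type": "application/json"})
    return Error(err_msg="application/json output only", http_status=400)


print(to_http_response(negotiate("application/json", [0.1, 0.9])))
# (200, {'Content-Type': 'application/json'}, '[0.1, 0.9]')
print(to_http_response(negotiate("text/csv", [0.1, 0.9])))
# (400, {'Content-Type': 'text/plain'}, 'application/json output only')
```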

Other adapter changes:

Docker Build Improvements

  • Optimize Docker image build time (#1081), kudos to @ZeyadYasser!!
  • Per-Python-minor-version base images to speed up image builds (#1101, #1096), thanks @gregd33!!
  • Add a "latest" tag to all user-facing Docker base images (#1046)

Improved pip package management

Setting pip install options in BentoService @env specification

As suggested in #1036 (comment). Thanks @danield137 for suggesting the pip_extra_index_url option!

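```python
from bentoml import env, artifacts, BentoService
from bentoml.frameworks.sklearn import SklearnModelArtifact

@env(
    infer_pip_packages=True,  # formerly auto_pip_dependencies
    pip_index_url='my_pypi_host_url',
    pip_trusted_host='my_pypi_host_url',
    pip_extra_index_url='extra_pypi_index_url',
)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):
    ...
```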

BREAKING CHANGE: Because of this change, we have removed the previous Docker build args PIP_INDEX_URL and PIP_TRUSTED_HOST, since they may conflict with settings in the base image (#1036).
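The three @env options mirror pip's own `--index-url`, `--trusted-host`, and `--extra-index-url` flags. As a rough sketch, assuming the settings are eventually handed to pip as command-line arguments (BentoML may instead write them into a generated requirements file), the hypothetical helper `pip_install_args` shows the correspondence:

```python
from typing import List, Optional


def pip_install_args(
    packages: List[str],
    index_url: Optional[str] = None,
    trusted_host: Optional[str] = None,
    extra_index_url: Optional[str] = None,
) -> List[str]:
    """Hypothetical: build a pip install command line from @env-style options."""
    args = ["pip", "install"]
    if index_url:
        args += ["--index-url", index_url]  # replaces the default PyPI index
    if trusted_host:
        args += ["--trusted-host", trusted_host]  # allow plain HTTP / unverified TLS for this host
    if extra_index_url:
        args += ["--extra-index-url", extra_index_url]  # searched in addition to the main index
    return args + packages


print(" ".join(pip_install_args(
    ["scikit-learn"],
    index_url="my_pypi_host_url",
    trusted_host="my_pypi_host_url",
    extra_index_url="extra_pypi_index_url",
)))
# pip install --index-url my_pypi_host_url --trusted-host my_pypi_host_url --extra-index-url extra_pypi_index_url scikit-learn
```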

  • Support passing a conda environment.yml file to @env, as suggested in #725

  • When no version is specified for a package in the pip_packages list, BentoML pins it to the version found in the current Python session. This now also applies to packages added by adapter and artifact classes.

  • Package requirement ranges can now be specified, e.g.:

@env(pip_packages=["abc==1.3", "foo>1.2,<=1.4"])

Any pip requirement specifier is accepted: https://pip.pypa.io/en/stable/reference/pip_install/#requirement-specifiers

  • Renamed pip_dependencies to pip_packages and auto_pip_dependencies to infer_pip_packages. The old names still work but will eventually be deprecated.
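The pin-to-current-version rule can be sketched with the standard library's `importlib.metadata.version`. `pin_requirements` is a hypothetical helper, not BentoML's implementation: bare package names get pinned with `==`, while entries that already carry a specifier (such as `foo>1.2,<=1.4`) pass through unchanged. Treating anything other than a bare name as "already specified" is a simplifying assumption.

```python
import re
from importlib import metadata
from typing import Callable, List

# A bare project name with no version specifier, e.g. "pandas" or "scikit-learn".
_BARE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")


def pin_requirements(
    reqs: List[str], lookup: Callable[[str], str] = metadata.version
) -> List[str]:
    """Hypothetical: pin bare package names to the locally installed version."""
    pinned = []
    for req in (r.strip() for r in reqs):
        if not _BARE_NAME.match(req):
            pinned.append(req)  # has a specifier, extras, or markers: keep as-is
            continue
        try:
            pinned.append(f"{req}=={lookup(req)}")
        except metadata.PackageNotFoundError:
            pinned.append(req)  # not installed in this session: leave unpinned
    return pinned


print(pin_requirements(["abc==1.3", "foo>1.2,<=1.4", "bar"], lookup=lambda name: "2.0"))
# ['abc==1.3', 'foo>1.2,<=1.4', 'bar==2.0']
```

Passing `lookup` explicitly keeps the example deterministic; by default it queries the packages installed in the running interpreter.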

GCS support in YataiService

Added Google Cloud Storage (GCS) support as a storage backend in YataiService, as an alternative to AWS S3, MinIO, or a POSIX file system (#1017). Thank you @Korusuke and @PrabhanshuAttri for creating the GCS support!
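A common way to support several storage backends is to pick one from the repository URL's scheme. The sketch below is hypothetical: `storage_backend_for` and the URL forms it accepts are illustrative assumptions, not YataiService's actual configuration. It uses only the standard library's `urllib.parse.urlparse`.

```python
from urllib.parse import urlparse


def storage_backend_for(repo_url: str) -> str:
    """Hypothetical: choose a storage backend name from a repository URL."""
    scheme = urlparse(repo_url).scheme
    if scheme == "s3":
        return "s3"  # AWS S3, or an S3-compatible server such as MinIO
    if scheme == "gs":
        return "gcs"  # Google Cloud Storage
    if scheme in ("", "file"):
        return "local"  # POSIX file system path
    raise ValueError(f"unsupported repository URL scheme: {scheme!r}")


print(storage_backend_for("gs://my-bucket/bentos"))  # gcs
print(storage_backend_for("/home/user/bentoml/repository"))  # local
```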

YataiService Labels API for model management

Manage packaged models in YataiService with the Labels API, implemented in #1064:

  1. Add labels to BentoService.save
    svc = MyBentoService()
    svc.save(labels={'my_key': 'my_value', 'test': 'passed'})
  2. Add label queries to CLI commands:
  • bentoml get BENTO_NAME, bentoml list, bentoml deployment list, bentoml lambda list, bentoml sagemaker list, bentoml azure-functions list

  • Label queries support the =, !=, In, NotIn, Exists, and DoesNotExist operators

    • e.g. key1=value1, key2!=value2, env In (prod, staging), Key Exists, Another_key DoesNotExist

Screenshots (2020-09-03) illustrating label queries: a simple key/value label selector, the Exists operator, the DoesNotExist operator, the In operator, and multiple label queries combined.

  3. Roadmap: add a web UI for filtering and searching with the Labels API
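The selector syntax above resembles Kubernetes-style label selectors. Below is a hedged, self-contained sketch of how such a selector string could be parsed and evaluated against a bundle's labels. `parse_selector` and `matches` are hypothetical helpers, and the handling of missing keys (`!=` and `NotIn` succeed when the key is absent) is an assumption borrowed from Kubernetes, not confirmed for YataiService.

```python
import re
from typing import Any, Dict, List, Tuple

# One clause: "k=v", "k!=v", "k In (a, b)", "k NotIn (a, b)", "k Exists", "k DoesNotExist"
_CLAUSE = re.compile(
    r"\s*(?P<key>[\w.-]+)\s*"
    r"(?:(?P<op>!=|=)\s*(?P<value>[\w.-]+)"
    r"|(?P<setop>NotIn|In)\s*\((?P<values>[^)]*)\)"
    r"|(?P<existop>Exists|DoesNotExist))"
    r"\s*(?:,|$)"
)


def parse_selector(selector: str) -> List[Tuple[str, str, Any]]:
    """Split a selector string into (key, operator, value) clauses."""
    selector = selector.strip()
    clauses: List[Tuple[str, str, Any]] = []
    pos = 0
    while pos < len(selector):
        m = _CLAUSE.match(selector, pos)
        if m is None:
            raise ValueError(f"cannot parse label selector at {selector[pos:]!r}")
        key = m.group("key")
        if m.group("op"):
            clauses.append((key, m.group("op"), m.group("value")))
        elif m.group("setop"):
            values = {v.strip() for v in m.group("values").split(",") if v.strip()}
            clauses.append((key, m.group("setop"), values))
        else:
            clauses.append((key, m.group("existop"), None))
        pos = m.end()
    return clauses


def matches(labels: Dict[str, str], selector: str) -> bool:
    """True if the labels satisfy every clause of the selector."""
    for key, op, value in parse_selector(selector):
        actual = labels.get(key)  # None when the key is absent
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value  # assumption: an absent key satisfies "!="
        elif op == "In":
            ok = actual in value
        elif op == "NotIn":
            ok = actual not in value  # assumption: an absent key satisfies "NotIn"
        elif op == "Exists":
            ok = key in labels
        else:  # "DoesNotExist"
            ok = key not in labels
        if not ok:
            return False
    return True


labels = {"my_key": "my_value", "test": "passed", "env": "prod"}
print(matches(labels, "my_key=my_value, test!=failed"))  # True
print(matches(labels, "env In (prod, staging), test Exists"))  # True
print(matches(labels, "env NotIn (prod)"))  # False
print(matches(labels, "owner DoesNotExist"))  # True
```

Clauses are combined with AND, which matches how a comma-separated selector such as `key1=value1, key2!=value2` reads.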

New framework support: Huggingface/Transformers

#1090, #1094: thanks @vedashree29296 for contributing this!

Usage & docs: https://docs.bentoml.org/en/stable/frameworks.html#transformers

Bug Fixes:

  • Fixed #1030 - bentoml serve fails when packaged on Windows and deployed on Linux #1044
  • Handle missing region during SageMaker deployment updates #1049

Internal & Testing:

  • Re-organize artifacts related modules #1082, #1085
  • Refactoring & improvements around dependency management #1084, #1086
  • [TEST/CI] Add tests covering XgboostModelArtifact (#1079)
  • [TEST/CI] Fix AWS moto related unit tests (#1077)
  • Lock SQLAlchemy-utils version (#1078)

Contributors to the 0.9.0 release

Thank you all for contributing to this release!! @danield137 @ericmand @ssakhavi @aviaviavi @dinakar29 @umihui @vedashree29296 @joerg84 @gregd33 @mayurnewase @narennadig @akainth015 @yubozhao @bojiang