BentoML-0.9.0
What's New
TLDR;
- New input/output adapter design that lets users choose between a batch and a non-batch implementation
- Speed up the API model server docker image build time
- Changed the recommended import path of artifact classes; artifact classes should now be imported from bentoml.frameworks.*
- Improved python pip package management
- Huggingface/Transformers support!!
- Manage packaged models with the Labels API
- Support GCS (Google Cloud Storage) as a model storage backend in YataiService
- Current Roadmap for feedback: #1128
New Input/Output adapter design
A major refactoring of BentoML's inference API and a redesign of the input/output adapters, led by @bojiang with help from @akainth015.
BREAKING CHANGE: API definition now requires declaring if it is a batch API or non-batch API:
from typing import List

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import JsonSerializable  # type annotations are optional

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('classifier')])
class MyPredictionService(BentoService):

    @api(input=JsonInput(), batch=True)
    def predict_batch(self, parsed_json_list: List[JsonSerializable]):
        # Batch API: receives a list of inputs, returns a list of the same length
        results = self.artifacts.classifier.predict([j['text'] for j in parsed_json_list])
        return results

    @api(input=JsonInput())  # default batch=False
    def predict_non_batch(self, parsed_json: JsonSerializable):
        # Non-batch API: receives one input, returns one result
        results = self.artifacts.classifier.predict([parsed_json['text']])
        return results[0]
For APIs with batch=True, the user-defined API function is required to process a list of input items at a time and return a list of results of the same length. In contrast, @api by default uses batch=False, which processes one input item at a time. Implementing a batch API allows your workload to benefit from BentoML's adaptive micro-batching mechanism when serving online traffic, and also speeds up offline batch inference jobs. We recommend using batch=True if performance and throughput are a concern. Non-batch APIs are usually easier to implement and are a good fit for quick POCs, simple use cases, and deployment on serverless platforms such as AWS Lambda, Azure Functions, and Google Knative.
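To make the batch contract concrete, here is a minimal sketch in plain Python with no BentoML imports. The names `predict_batch`, `predict_non_batch`, and `micro_batch` are hypothetical and only illustrate the idea of grouping individual requests into one batch call; BentoML's actual adaptive micro-batching happens at serving time and is not reproduced here.

```python
from typing import Callable, List


def predict_batch(texts: List[str]) -> List[int]:
    """Batch-style function: takes a list of inputs, returns one result per input."""
    # Stand-in for a model call; the "prediction" is just the text length.
    return [len(t) for t in texts]


def predict_non_batch(text: str) -> int:
    """Non-batch-style function: takes one input, returns one result."""
    return predict_batch([text])[0]


def micro_batch(
    requests: List[str],
    batch_fn: Callable[[List[str]], List[int]],
    max_batch_size: int = 3,
) -> List[int]:
    """Toy dispatcher: group requests into chunks and call batch_fn once per chunk."""
    results: List[int] = []
    for start in range(0, len(requests), max_batch_size):
        chunk = requests[start:start + max_batch_size]
        chunk_results = batch_fn(chunk)
        # The batch contract: output length must equal input length.
        assert len(chunk_results) == len(chunk)
        results.extend(chunk_results)
    return results
```

With four requests and `max_batch_size=3`, the batch function is called twice instead of four times, which is where the throughput benefit of a batch API would come from.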
Read more about this change and example usage here: https://docs.bentoml.org/en/latest/api/adapters.html
BREAKING CHANGE: For DataframeInput and TfTensorInput users, it is now required to add batch=True.
DataframeInput and TfTensorInput are special input types that only support accepting a batch of input at one time.
Input data validation while handling batch input
When the API function receives a list of inputs, it is now possible to reject a subset of the input data and return an error code to the client if that input data is invalid or malformed. Users can do this via the InferenceTask#discard API. Here's an example:
from typing import List

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import JsonSerializable, InferenceTask  # type annotations are optional

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('classifier')])
class MyPredictionService(BentoService):

    @api(input=JsonInput(), batch=True)
    def predict_batch(self, parsed_json_list: List[JsonSerializable], tasks: List[InferenceTask]):
        model_input = []
        for json, task in zip(parsed_json_list, tasks):
            if "text" in json:
                model_input.append(json['text'])
            else:
                # Reject only this item; the rest of the batch is still processed
                task.discard(http_status=400, err_msg="input json must contain `text` field")

        results = self.artifacts.classifier.predict(model_input)
        return results
The number of discarded tasks plus the length of the returned results array should equal the length of the input list. This allows BentoML to match the results back to the tasks that have not been discarded.
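The length rule above can be illustrated with a small plain-Python sketch. `ToyTask` and `match_results` are hypothetical stand-ins written for this example; they are not BentoML's InferenceTask or its internal matching logic, only one way such matching could work.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ToyTask:
    """Hypothetical stand-in for an inference task; not BentoML's InferenceTask."""
    data: Any
    is_discarded: bool = False
    error: Optional[str] = None

    def discard(self, err_msg: str) -> None:
        self.is_discarded = True
        self.error = err_msg


def match_results(tasks: List[ToyTask], results: List[Any]) -> List[Any]:
    """Pair each non-discarded task with the next result, in input order."""
    kept = [t for t in tasks if not t.is_discarded]
    if len(kept) != len(results):
        raise ValueError("discarded tasks + results must equal the number of inputs")
    remaining = iter(results)
    # Discarded tasks yield their error message; the others consume the next result.
    return [t.error if t.is_discarded else next(remaining) for t in tasks]
```

For three inputs where the second is discarded, two results are enough to produce one response per input, in the original order. If the counts do not add up, there is no unambiguous way to assign results, which is why the sketch raises an error.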
Allow fine-grained control of the HTTP response, CLI inference job output, etc. E.g.:
import bentoml
from bentoml.adapters import JsonInput
from bentoml.types import (  # type annotations are optional
    InferenceError,
    InferenceResult,
    InferenceTask,
    JsonSerializable,
)

class MyService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=False)
    def predict(self, parsed_json: JsonSerializable, task: InferenceTask) -> InferenceResult:
        if task.http_headers['Accept'] == "application/json":
            predictions = self.artifacts.model.predict([parsed_json])
            return InferenceResult(
                data=predictions[0],
                http_status=200,
                http_headers={"Content-Type": "application/json"},
            )
        else:
            return InferenceError(err_msg="application/json output only", http_status=400)
Or when batch=True:
from typing import List

import bentoml
from bentoml.adapters import JsonInput
from bentoml.types import (  # type annotations are optional
    InferenceError,
    InferenceResult,
    InferenceTask,
    JsonSerializable,
)

class MyService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=True)
    def predict(self, parsed_json_list: List[JsonSerializable], tasks: List[InferenceTask]) -> List[InferenceResult]:
        rv = []
        predictions = self.artifacts.model.predict(parsed_json_list)
        for task, prediction in zip(tasks, predictions):
            if task.http_headers['Accept'] == "application/json":
                rv.append(
                    InferenceResult(
                        data=prediction,
                        http_status=200,
                        http_headers={"Content-Type": "application/json"},
                    ))
            else:
                rv.append(InferenceError(err_msg="application/json output only", http_status=400))
                # or task.discard(err_msg="application/json output only", http_status=400)
        return rv
Other adapter changes:
- Added 3 base adapters for implementing advanced adapters: FileInput, StringInput, MultiFileInput
- Implementing new adapters that support micro-batching is a lot easier now: https://github.com/bentoml/BentoML/blob/v0.9.0.pre/bentoml/adapters/base_input.py
- Per inference task prediction log #1089
- More adapters now support launching batch inference jobs from the BentoML CLI run command. See the API reference for detailed examples: https://docs.bentoml.org/en/latest/api/adapters.html
Docker Build Improvements
- Optimize docker image build time (#1081) kudos to @ZeyadYasser!!
- Per python minor version base image to speed up image building #1101 #1096, thanks @gregd33!!
- Add "latest" tag to all user-facing docker base images (#1046)
Improved pip package management
Setting pip install options in the BentoService @env specification
As suggested here: #1036 (comment). Thanks @danield137 for suggesting the pip_extra_index_url option!
@env(
    auto_pip_dependencies=True,
    pip_index_url='my_pypi_host_url',
    pip_trusted_host='my_pypi_host_url',
    pip_extra_index_url='extra_pypi_index_url'
)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):
    ...
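These option names mirror pip's own `--index-url`, `--trusted-host`, and `--extra-index-url` flags. The sketch below shows how such settings could translate into a pip command line. The helper `pip_install_args` is hypothetical and written only for illustration; how BentoML applies the options internally is not shown here.

```python
from typing import List, Optional


def pip_install_args(
    packages: List[str],
    index_url: Optional[str] = None,
    trusted_host: Optional[str] = None,
    extra_index_url: Optional[str] = None,
) -> List[str]:
    """Build a `pip install` argument list from the index-related options."""
    args = ["pip", "install"]
    if index_url:
        args += ["--index-url", index_url]
    if trusted_host:
        args += ["--trusted-host", trusted_host]
    if extra_index_url:
        args += ["--extra-index-url", extra_index_url]
    return args + packages
```

Options left unset add nothing to the command, so pip falls back to its own defaults or configuration.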
BREAKING CHANGE: Due to this change, the previous docker build args PIP_INDEX_URL and PIP_TRUSTED_HOST have been removed, because they may conflict with settings in the base image. #1036
- Support passing a conda environment.yml file to @env, as suggested in #725
- When a version is not specified in the pip_packages list, the package is pinned to the version found in the current Python session. The same now applies to packages added from adapter and artifact classes
- Support specifying a package requirement range, e.g.:
    @env(pip_packages=["abc==1.3", "foo>1.2,<=1.4"])
  It can be any pip version requirement specifier: https://pip.pypa.io/en/stable/reference/pip_install/#requirement-specifiers
- Renamed pip_dependencies to pip_packages and auto_pip_dependencies to infer_pip_packages. The old API still works but will eventually be deprecated.
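The pinning behaviour described above (an entry without a version is pinned to what is installed in the current session) can be sketched with the standard library. This is an illustration under assumptions, not BentoML's implementation: `pin_unversioned` and its `lookup` parameter are hypothetical names, and the check for an existing specifier is deliberately simplistic.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, List


def pin_unversioned(packages: List[str], lookup: Callable[[str], str] = version) -> List[str]:
    """Append '==<installed version>' to entries that carry no version specifier."""
    pinned = []
    for requirement in packages:
        # Any of these characters means the entry already has a specifier.
        if any(ch in requirement for ch in "=<>!~"):
            pinned.append(requirement)
            continue
        try:
            pinned.append(f"{requirement}=={lookup(requirement)}")
        except PackageNotFoundError:
            # Not installed in this session: leave the entry unpinned.
            pinned.append(requirement)
    return pinned
```

Entries such as "abc==1.3" or "foo>1.2,<=1.4" pass through unchanged, while a bare name is pinned to whatever `importlib.metadata.version` reports for it.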
GCS support in YataiService
Added Google Cloud Storage (GCS) support in YataiService as a storage backend. This is an alternative to AWS S3, MinIO, or a POSIX file system. #1017 - Thank you @Korusuke @PrabhanshuAttri for creating the GCS support!
YataiService Labels API for model management
Manage packaged models in YataiService with the labels API implemented in #1064
- Add labels to BentoService.save
    svc = MyBentoService()
    svc.save(labels={'my_key': 'my_value', 'test': 'passed'})
- Add label query for CLI commands
  - bentoml get BENTO_NAME, bentoml list, bentoml deployment list, bentoml lambda list, bentoml sagemaker list, bentoml azure-functions list
  - label query supports the =, !=, In, NotIn, Exists, and DoesNotExist operators
    - e.g. key1=value1, key2!=value2, env In (prod, staging), Key Exists, Another_key DoesNotExist
Simple key/value label selector
- Roadmap - add web UI for filtering and searching with labels API
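To show how a label query with these operators could be evaluated against a set of labels, here is a minimal plain-Python sketch. It is not YataiService's implementation: `clause_matches` and `labels_match` are hypothetical helpers, and the handling of missing keys (for example, `!=` and `NotIn` matching when the key is absent) is an assumption modelled on common label-selector semantics.

```python
import re
from typing import Dict

_KEY = r"([\w.-]+)"
_SET = re.compile(rf"^{_KEY}\s+(In|NotIn)\s*\((.*)\)$")
_EXISTS = re.compile(rf"^{_KEY}\s+(Exists|DoesNotExist)$")
_EQUALITY = re.compile(rf"^{_KEY}\s*(!=|=)\s*([\w.-]+)$")


def clause_matches(clause: str, labels: Dict[str, str]) -> bool:
    """Evaluate one clause such as 'env In (prod, staging)' against a label dict."""
    m = _SET.match(clause)
    if m:
        key, op, raw = m.groups()
        values = {v.strip() for v in raw.split(",")}
        present = labels.get(key) in values
        return present if op == "In" else not present
    m = _EXISTS.match(clause)
    if m:
        key, op = m.groups()
        return (key in labels) if op == "Exists" else (key not in labels)
    m = _EQUALITY.match(clause)
    if m:
        key, op, value = m.groups()
        return (labels.get(key) == value) if op == "=" else (labels.get(key) != value)
    raise ValueError(f"unrecognized label clause: {clause!r}")


def labels_match(query: str, labels: Dict[str, str]) -> bool:
    """True if every comma-separated clause in the query holds for the labels."""
    # Split on commas that are not inside parentheses.
    clauses = re.split(r",\s*(?![^()]*\))", query)
    return all(clause_matches(c.strip(), labels) for c in clauses if c.strip())
```

All clauses are combined with a logical AND, so a saved bundle would be selected only if its labels satisfy every part of the query.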
New framework support: Huggingface/Transformers
#1090 #1094 thanks @vedashree29296 for contributing this!
Usage & docs: https://docs.bentoml.org/en/stable/frameworks.html#transformers
Bug Fixes:
- Fixed #1030 - bentoml serve fails when packaged on Windows and deployed on Linux #1044
- Handle missing region during SageMaker deployment updates #1049
Internal & Testing:
- Re-organize artifacts related modules #1082, #1085
- Refactoring & improvements around dependency management #1084, #1086
- [TEST/CI] Add tests covering XgboostModelArtifact (#1079)
- [TEST/CI] Fix AWS moto related unit tests (#1077)
- Lock SQLAlchemy-utils version (#1078)
Contributors of 0.9.0 release
Thank you all for contributing to this release!! @danield137 @ericmand @ssakhavi @aviaviavi @dinakar29 @umihui @vedashree29296 @joerg84 @gregd33 @mayurnewase @narennadig @akainth015 @yubozhao @bojiang