TensorFlow Serving

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It deals with the inference aspect of machine learning, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.

A few notable features:

Can serve multiple models, or multiple versions of the same model simultaneously
Exposes both gRPC and HTTP inference endpoints
Allows deployment of new model versions without changing any client code
Supports canarying new versions and A/B testing experimental models
Adds minimal latency to inference time due to its efficient, low-overhead implementation
Features a scheduler that groups individual inference requests into batches for joint execution on GPU, with configurable latency controls
Supports many servables: TensorFlow models, embeddings, vocabularies, feature transformations and even non-TensorFlow-based machine learning models
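The batching scheduler listed above is switched on in the model server with `--enable_batching=true` and tuned through a text-format file passed via `--batching_parameters_file`; `batch_timeout_micros` is the latency control that bounds how long the scheduler waits to fill a batch. The sketch below writes such a file. The numeric values are placeholders rather than tuned recommendations, and the file name `batching.config` and the `MODEL_DIR` path in the launch comment are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative batching parameters in protobuf text format.
# The values are placeholders to tune for your model and hardware.
BATCHING_FILE="${BATCHING_FILE:-batching.config}"
cat > "$BATCHING_FILE" <<'EOF'
max_batch_size { value: 32 }
batch_timeout_micros { value: 2000 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
EOF

echo "wrote $BATCHING_FILE"

# To use it, mount the file into the container and pass the flags through
# to tensorflow_model_server (MODEL_DIR is a hypothetical host path):
#   docker run -p 8501:8501 \
#     -v "$PWD/batching.config:/models/batching.config" \
#     -v "$MODEL_DIR:/models/my_model" \
#     -e MODEL_NAME=my_model tensorflow/serving \
#     --enable_batching=true --batching_parameters_file=/models/batching.config
```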

Serve a TensorFlow model in 60 seconds

# Download the TensorFlow Serving Docker image and repo
docker pull tensorflow/serving

git clone https://github.com/tensorflow/serving
# Location of demo models
TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"

# Start TensorFlow Serving container and open the REST API port
docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving &

# Query the model using the predict API
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
    -X POST http://localhost:8501/v1/models/half_plus_two:predict

# Returns => { "predictions": [2.5, 3.0, 4.5] }
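Besides `:predict`, the REST API exposes a model status endpoint and a metadata endpoint under the same `/v1/models/<name>` prefix. The script below is a small sketch that derives these URLs for the demo model and queries them only if a server is answering. The `TFS_HOST` and `TFS_REST_PORT` variables are hypothetical conveniences for the script, not settings the server reads.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical overrides; the defaults match the quickstart above.
HOST="${TFS_HOST:-localhost}"
REST_PORT="${TFS_REST_PORT:-8501}"
MODEL_NAME="half_plus_two"

BASE="http://${HOST}:${REST_PORT}/v1/models/${MODEL_NAME}"
STATUS_URL="${BASE}"
METADATA_URL="${BASE}/metadata"
PREDICT_URL="${BASE}:predict"

printf 'status:   %s\nmetadata: %s\npredict:  %s\n' \
  "$STATUS_URL" "$METADATA_URL" "$PREDICT_URL"

# Only send requests if the status endpoint answers (the container is up).
if curl -sf "$STATUS_URL" > /dev/null 2>&1; then
  curl -s "$STATUS_URL"      # per-version state, e.g. "AVAILABLE"
  curl -s "$METADATA_URL"    # signature definitions of the loaded model
  curl -s -d '{"instances": [1.0, 2.0, 5.0]}' -X POST "$PREDICT_URL"
fi
```

A specific version can be addressed by inserting `/versions/<n>` (or `/labels/<label>`) after the model name in any of these URLs.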

End-to-End Training & Serving Tutorial

Refer to the official TensorFlow documentation site for a complete tutorial on training and serving a TensorFlow model.

Documentation

Set up

The easiest and most straightforward way of using TensorFlow Serving is with Docker images. We highly recommend this route unless you have specific needs that are not addressed by running in a container.

Install TensorFlow Serving using Docker (Recommended)
Install TensorFlow Serving without Docker (Not Recommended)
Build TensorFlow Serving from Source with Docker
Deploy TensorFlow Serving on Kubernetes
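Because the source build later in this README enables CUDA, a GPU-enabled container is the natural counterpart to the CPU quickstart. The sketch below assumes the upstream `tensorflow/serving:latest-gpu` image, Docker 19.03 or later with the NVIDIA Container Toolkit installed (needed for `--gpus`), and a hypothetical model directory `MODEL_DIR`. It prints the command by default and only runs it when `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs: adjust to your model location and name.
MODEL_DIR="${MODEL_DIR:-$PWD/models/my_model}"
MODEL_NAME="${MODEL_NAME:-my_model}"
IMAGE="${IMAGE:-tensorflow/serving:latest-gpu}"

# Build the command as an array so paths with spaces stay intact.
# 8500 is the gRPC port and 8501 the REST port.
cmd=(docker run --rm --gpus all
     -p 8500:8500 -p 8501:8501
     -v "${MODEL_DIR}:/models/${MODEL_NAME}"
     -e "MODEL_NAME=${MODEL_NAME}"
     "$IMAGE")

if [[ "${DRY_RUN:-1}" == "0" ]]; then
  "${cmd[@]}"
else
  # Dry run: print a shell-quoted version of the command instead.
  printf '%q ' "${cmd[@]}"
  echo
fi
```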

Use

Export your TensorFlow model

In order to serve a TensorFlow model, simply export a SavedModel from your TensorFlow program. SavedModel is a language-neutral, recoverable, hermetic serialization format that enables higher-level systems and tools to produce, consume, and transform TensorFlow models.

Please refer to the TensorFlow documentation for detailed instructions on how to export SavedModels.
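TensorFlow Serving expects each model's base path to contain one subdirectory per version, named with an integer (for example `/models/my_model/1/saved_model.pb`), and by default serves the highest-numbered version it finds. The helper below is a hypothetical sketch that copies an already-exported SavedModel into that layout; it does not export the model itself, and the example paths are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# stage_savedmodel SRC MODEL_BASE VERSION
# Hypothetical helper: copies an exported SavedModel directory SRC into
# MODEL_BASE/VERSION, the numbered-version layout TensorFlow Serving scans.
stage_savedmodel() {
  local src="$1" base="$2" version="$3"
  # A SavedModel directory holds saved_model.pb (or .pbtxt) plus variables/.
  if [[ ! -f "$src/saved_model.pb" && ! -f "$src/saved_model.pbtxt" ]]; then
    echo "no saved_model.pb(txt) in $src" >&2
    return 1
  fi
  # Version directories must be named with an integer.
  if [[ ! "$version" =~ ^[0-9]+$ ]]; then
    echo "version must be an integer, got: $version" >&2
    return 1
  fi
  mkdir -p "$base/$version"
  cp -R "$src/." "$base/$version/"
}

# Example (hypothetical paths):
#   stage_savedmodel /tmp/export/my_model "$PWD/models/my_model" 1
#   -> $PWD/models/my_model/1/saved_model.pb and .../1/variables/
```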

Configure and Use TensorFlow Serving
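To serve several models, or to pin and label versions for canarying, the server can read a model config file in protobuf text format via `--model_config_file` instead of relying on `MODEL_NAME`. The sketch below writes one such file. The model names and base paths are hypothetical, and by default a version label (`stable`, `canary`) can only point at a version that is actually loaded.

```shell
#!/usr/bin/env bash
set -euo pipefail

CONFIG_FILE="${CONFIG_FILE:-models.config}"

# Two hypothetical models; base_path is a path inside the container.
cat > "$CONFIG_FILE" <<'EOF'
model_config_list {
  config {
    name: "half_plus_two"
    base_path: "/models/half_plus_two"
    model_platform: "tensorflow"
  }
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    # Keep versions 1 and 2 loaded and label them for canarying.
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
    version_labels {
      key: "stable"
      value: 1
    }
    version_labels {
      key: "canary"
      value: 2
    }
  }
}
EOF

echo "wrote $CONFIG_FILE"

# Example launch after moving models.config into ./models (hypothetical paths):
#   docker run -p 8500:8500 -p 8501:8501 \
#     -v "$PWD/models:/models" tensorflow/serving \
#     --model_config_file=/models/models.config
```

Clients can then target `/v1/models/my_model/labels/canary:predict` while most traffic stays on the `stable` label.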

Extend

TensorFlow Serving's architecture is highly modular. You can use some parts individually (e.g. batch scheduling) and/or extend it to serve new use cases.

Contribute

If you'd like to contribute to TensorFlow Serving, be sure to review the contribution guidelines.

For more information

Please refer to the official TensorFlow website for more information.

Build from source

# Configure the CUDA/cuDNN/TensorRT build; adjust the versions and the compute
# capability to match your GPU and toolkit installation.
# Note: -march=native ties the resulting binary to CPUs like the build host's.
export TF_ENABLE_XLA=1
export TF_CUDA_COMPUTE_CAPABILITIES=8.6
export TF_NCCL_VERSION=2
export TF_NEED_HDFS=0
export TF_CUDNN_VERSION=8
export TF_TENSORRT_VERSION=8
export TF_CUDA_VERSION=11.4
export TF_NEED_CUDA=1
export TF_CUBLAS_VERSION=11
export TF_CUDA_PATHS=/usr,/usr/local/cuda
export TF_NEED_TENSORRT=1
export CC_OPT_FLAGS="-march=native -mtune=native"

# Build the model server binary with CUDA support
bazel build --config=cuda --copt="-fPIC" --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --verbose_failures tensorflow_serving/model_servers:tensorflow_model_server

# Install the binary (writing to /usr/local/bin may require sudo)
cp bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /usr/local/bin/

# Build the pip package builder
bazel build --verbose_failures --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" tensorflow_serving/tools/pip_package:build_pip_package

# Write the pip package to /tmp/pip
bazel-bin/tensorflow_serving/tools/pip_package/build_pip_package /tmp/pip
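The Bazel build above depends on the exported `TF_*` variables, and a missing one may only surface well into a long build. A small, hypothetical preflight helper such as the one below can check them first. The variable list is an assumption drawn from the exports above, not an authoritative list of everything the build reads.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Variables taken from the exports above; extend as needed.
REQUIRED_VARS=(TF_NEED_CUDA TF_CUDA_VERSION TF_CUDNN_VERSION
               TF_CUDA_COMPUTE_CAPABILITIES TF_CUDA_PATHS
               TF_NEED_TENSORRT TF_TENSORRT_VERSION)

# Prints each unset or empty variable to stderr; returns 1 if any are missing.
check_build_env() {
  local missing=0 var
  for var in "${REQUIRED_VARS[@]}"; do
    if [[ -z "${!var:-}" ]]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: run before starting the build, e.g.
#   check_build_env && bazel build --config=cuda ... tensorflow_serving/model_servers:tensorflow_model_server
```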

TJU-NSL/nvidia-serving

Folders and files

Latest commit

History

Repository files navigation

TensorFlow Serving

Serve a Tensorflow model in 60 seconds

End-to-End Training & Serving Tutorial

Documentation

Set up

Use

Export your Tensorflow model

Configure and Use Tensorflow Serving

Extend

Contribute

For more information

Build from source

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages