
✨ Helm Chart for OpenVINO vLLM #403

Open · wants to merge 45 commits into main

Conversation


@krish918 commented Sep 5, 2024:

Description

This PR introduces OpenVINO vLLM as a new LLM inference microservice. It integrates with the ChatQnA application, allowing it to use either TGI or OpenVINO vLLM as the underlying inference engine. The choice between TGI and OpenVINO vLLM is made while installing the Helm chart, by turning off the flag corresponding to TGI and turning on the flag corresponding to the vLLM service.

Example:

helm install chatqna chatqna --set tgi.enabled=false --set vllm.enabled=true

In the above command, the tgi.enabled flag is turned off, which the chart honors by not deploying the TGI microservice. At the same time, the vllm.enabled flag is set to true, which enables deployment of the vLLM service.

To use the OpenVINO-optimized vLLM, a separate values file is provided for deploying the vLLM OpenVINO microservice. When using this OpenVINO values file, the tgi and vllm flags do not need to be switched manually, as they are already set inside vllm-openvino-values.yaml.

Example:

helm install chatqna chatqna -f ./chatqna/vllm-openvino-values.yaml
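
For illustration, such an OpenVINO values file might look roughly like the sketch below; the exact keys are assumptions based on the flags and image discussed in this PR, not necessarily the file's literal contents:

```yaml
# vllm-openvino-values.yaml (sketch)
tgi:
  enabled: false          # do not deploy the TGI microservice
vllm:
  enabled: true           # deploy the vLLM microservice instead
  openvino_enabled: true  # select the OpenVINO build of vLLM
  image:
    repository: opea/vllm-openvino
```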

Issues

No known issues yet.

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

The following two Docker images are required as new dependencies:

  • vllm:openvino: if not available, the Docker image can be built from the official vLLM repo.
  • opea/llm-vllm:latest: this image is publicly available on Docker Hub.

Tests

Test pods for both new microservices are included in the newly added Helm charts. These pods run basic curl tests that check connectivity and verify a proper response from each microservice. After spinning up the services, we can run the following commands, with appropriate values for the placeholders, to sanity-check the services:

helm install <release-name> <chart-name>
helm test <release-name>
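
For context, a Helm test pod of this kind is typically shaped like the sketch below; the template helper names, port value, and probed endpoint are assumptions for illustration, not the exact manifests added in this PR:

```yaml
# templates/tests/test-connection.yaml (sketch)
apiVersion: v1
kind: Pod
metadata:
  name: {{ include "vllm-openvino.fullname" . }}-test-connection
  annotations:
    # Only run this pod when `helm test <release-name>` is invoked
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl
      # -f makes curl exit non-zero on HTTP errors, failing the test
      command: ['curl', '-sf']
      args: ['http://{{ include "vllm-openvino.fullname" . }}:{{ .Values.service.port }}/v1/models']
```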

@lianhao (Collaborator) left a comment:

Thanks for contributing this. It seems to need some fixes to pass the CI. Please also see my comments.

Since @yongfengdu is also working on a vLLM-related Helm chart, I'll let him comment more on this.

object:
  metric:
    # VLLM time metrics are in seconds
    name: vllm_ov_request_latency

Collaborator:

Where does this metric come from?

Author:

Removed this resource, as this metric was not set up. In any case, these vllm-openvino resources will not be used and will be replaced by an OpenVINO-specific values file.

  labels:
    {{- include "vllm-openvino.labels" . | nindent 4 }}
data:
  MODEL_ID: {{ .Values.global.LLM_MODEL_ID | quote }}

Collaborator:

I don't think using global.LLM_MODEL_ID is a good idea. The charts in the common directory are used as components to construct the e2e AI workload, so the same vllm-openvino chart could be used twice in an e2e Helm chart (e.g. chatqna) with 2 different models. So each vllm-openvino chart should have its own LLM_MODEL_ID setting.
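
A minimal sketch of what this suggestion amounts to (the key names and the second-instance alias are assumptions for illustration, not the exact change made in this PR):

```yaml
# Subchart configmap template: read a chart-local value instead of the global one
MODEL_ID: {{ .Values.LLM_MODEL_ID | quote }}
```

```yaml
# e2e chart values.yaml: each included instance can then carry its own model
vllm-openvino:
  LLM_MODEL_ID: <first-model-id>
vllm-openvino-second:   # a second instance, included via a Chart.yaml alias
  LLM_MODEL_ID: <second-model-id>
```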

Author:

Sure. I've updated this in the new commits.

PORT: {{ .Values.service.port | quote }}
HF_TOKEN: {{ .Values.global.HUGGINGFACEHUB_API_TOKEN | quote}}
VLLM_CPU_KVCACHE_SPACE: {{ .Values.VLLM_CPU_KVCACHE_SPACE | quote }}
HABANA_VISIBLE_DEVICES : {{ .Values.HABANA_VISIBLE_DEVICES | quote }}

Collaborator:

Who is using HABANA_VISIBLE_DEVICES? On K8s, it's the Habana k8s-device-plugin that allocates the Habana device to the container.

Author:

Removed this variable after confirming it is not being used.

no_proxy: {{ .Values.global.no_proxy | quote }}
HABANA_LOGS: "/tmp/habana_logs"
NUMBA_CACHE_DIR: "/tmp"
TRANSFORMERS_CACHE: "/tmp/transformers_cache"

Collaborator:

We don't use TRANSFORMERS_CACHE any more for TGI, but is it required for OpenVINO? For the environment in the configmap, please make sure that everything you set here actually plays a role for the application in the k8s pod.

Author:

Removed this deprecated variable.

replicaCount: 1

image:
  repository: vllm

Collaborator:

Where does this image come from? Could you please provide a link to this image on Docker Hub?

Author:

This image is not yet available on Docker Hub. The URL of the Dockerfile is added in the description: vLLM OpenVINO Dockerfile.

Collaborator:

In that case, we should defer this until the container image is available; otherwise, people cannot use the Helm chart directly.

TRANSFORMERS_CACHE: "/tmp/transformers_cache"
HF_HOME: "/tmp/.cache/huggingface"
{{- if .Values.MAX_INPUT_LENGTH }}
MAX_INPUT_LENGTH: {{ .Values.MAX_INPUT_LENGTH | quote }}

Collaborator:

Based on the OpenVINO documentation about environment variables, I don't think it recognizes this MAX_INPUT_LENGTH env.

Author:

Removed this variable.

MAX_INPUT_LENGTH: {{ .Values.MAX_INPUT_LENGTH | quote }}
{{- end }}
{{- if .Values.MAX_TOTAL_TOKENS }}
MAX_TOTAL_TOKENS: {{ .Values.MAX_TOTAL_TOKENS | quote }}

Collaborator:

ditto as MAX_INPUT_LENGTH

Author:

Removed.

@yongfengdu (Collaborator) left a comment:

For the backend components, I integrated the vllm Helm charts just one or two days ago; I think they should be able to support the vllm-openvino case with just a replacement of the Docker image (in this case, we can provide a vllm-openvino.yaml to specify different values there). Could you check this? (https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/common/vllm)

For the llm-vllm microservice, since it's at the same layer as the existing llm-uservice and provides the same function, I am looking to see whether it's possible to use just one. Going one step further, we could use just one llm Docker image to support both the TGI and vLLM backends (now we have 2 images: llm-tgi and llm-vllm). This may need llm.py changes before we can merge, so I'm OK for now with one more llm-vllm-uservice.

It would be great to support more cases/scenarios with fewer Helm charts to reduce the maintenance effort.
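
In other words, the suggested reuse could look roughly like this (a sketch; the override file name and keys are assumptions):

```yaml
# helm-charts/common/vllm/vllm-openvino-values.yaml (sketch)
# Only the image differs from the default vllm chart deployment
image:
  repository: opea/vllm-openvino
```

which would then be installed with something like:

helm install vllm ./helm-charts/common/vllm -f vllm-openvino-values.yaml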

@@ -9,9 +9,27 @@ dependencies:
  - name: tgi
    version: 0.9.0

Collaborator:

The latest version is updated to 1.0.0, for all components.

Author:

Sure, updated all the versions.


Helm chart for deploying a microservice which facilitates connections and handles responses from OpenVINO vLLM microservice.

`llm-vllm-uservice` depends on OpenVINO vLLM. You should properly set `vLLM_ENDPOINT` as the host URI of the vLLM microservice. If not set, it will use the default value: `http://<helm-release-name>-vllm-openvino:80`

Collaborator:

I remember hitting a limitation of 15 characters for the chart name; maybe this is no longer a limitation.

Author:

I was able to work with longer names, so it doesn't seem to be a limitation any more.
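
For reference, the default endpoint described in the README above could be expressed in the subchart's configmap template along these lines (a sketch; the actual template in this PR may differ):

```yaml
# Fall back to the in-cluster service name of the vLLM OpenVINO subchart when no endpoint is set
vLLM_ENDPOINT: {{ .Values.vLLM_ENDPOINT | default (printf "http://%s-vllm-openvino:80" .Release.Name) | quote }}
```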

@@ -0,0 +1,68 @@
# OpenVINO vLLM

Helm chart for deploying OpenVINO optimized vLLM Inference service.

Collaborator:

As mentioned earlier, please try using the vllm helm charts with vllm-openvino.yaml values.

Author:

Added a separate values file for OpenVINO.

@lianhao (Collaborator) commented Sep 11, 2024:

@krish918 Can you confirm that the vllm container image used by the `vllm-openvino` chart is publicly available? Otherwise I don't think it's the right time to add a Helm chart which can NOT be installed. We should either ask the vLLM upstream to release their container image, or have our OPEA community release the corresponding container image. K8s is not like docker-compose: we normally don't control the nodes where vllm-openvino will run, so we can't do the manual container image build process as in the docker-compose usage scenario, unless we log into all the k8s cluster nodes and build the container image on every node.

@yongfengdu (Collaborator) commented:

@krish918 Please refer to these 2 commits for my suggestions on reusing the vllm and llm-uservice Helm charts for OpenVINO support.
yongfengdu@b991659
yongfengdu@4cdb585

@chensuyue (Collaborator) commented:

Uploaded the images into the local registry and retriggered the test.

@krish918 (Author) commented:

@lianhao @yongfengdu @chensuyue The opea/vllm-openvino image is now available and accessible. All the CI checks are passing. Also, a separate OpenVINO values file has been added to chatqna as well as to the vllm subchart for independent deployment. Please have a look.

@yongfengdu (Collaborator) left a comment:

Refer to #473
The CI was modified to test only with ci-*values.yaml files.
By default the new vllm-openvino-values.yaml will not be tested. It's up to you to decide whether to enable the vllm-openvino test or not. If you want to enable it, add a link with `ln -s vllm-openvino-values.yaml ci-vllm-openvino-values.yaml`.

For llm-ctrl-uservice, we have a plan to merge this with llm-uservice (and also merge tei/teirerank), but I'm OK to add this for now.

- name: vllm
  version: 1.0.0
  repository: file://../vllm
  condition: autodependency.enabled

Collaborator:

autodependency.enabled is no longer used, use vllm.enabled instead.

Author:

updated.

export https_proxy=<your_https_proxy>

helm dependency update
helm install llm-ctrl-uservice . --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME} --set autodependency.enabled=true --set global.http_proxy=${http_proxy} --set global.https_proxy=${https_proxy} --wait

Collaborator:

vllm.enabled

Author:

done
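
Presumably the install command above now reads as follows (a sketch with only the flag renamed, other options unchanged):

helm install llm-ctrl-uservice . --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set LLM_MODEL_ID=${MODELNAME} --set vllm.LLM_MODEL_ID=${MODELNAME} --set vllm.enabled=true --set global.http_proxy=${http_proxy} --set global.https_proxy=${https_proxy} --wait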

# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

autodependency:

Collaborator:

s/autodependency/vllm/

Author:

done

@krish918 (Author) commented:

@yongfengdu - enabled CI for the new values file.

Comment on lines +26 to +33
- name: llm-uservice
  version: 1.0.0
  repository: "file://../common/llm-uservice"
  condition: tgi.enabled
- name: llm-ctrl-uservice
  version: 1.0.0
  repository: "file://../common/llm-ctrl-uservice"
  condition: vllm.enabled

Contributor:

Why are you adding wrappers?

They were removed over a month ago for v1.1 (#474), are unnecessary, and the LLM wrapper uses a langserve component with a problematic license (opea-project/GenAIComps#264).

Comment on lines +13 to +22
For LLM inference, two more microservices will be required. We can either use [TGI](https://github.com/huggingface/text-generation-inference) or [vLLM](https://github.com/vllm-project/vllm) as our LLM backend. Depending on that, we will have following microservices as part of dependencies for ChatQnA application.

1. For using **TGI** as an inference service, following 2 microservices will be required:

- [llm-uservice](../common/llm-uservice/README.md)
- [tgi](../common/tgi/README.md)

2. For using **vLLM** as an inference service, following 2 microservices would be required:

- [llm-ctrl-uservice](../common/llm-ctrl-uservice/README.md)

Contributor:

Ditto, why add wrappers?

Collaborator:

This PR is from the 1.0 release time, so it contains some old code.
I think it's better to merge this with #610, or just make simple changes to support OpenVINO after #610 gets merged.

Contributor:

Sounds good, but note that I'm testing my PR only with the vLLM Gaudi version.

I.e. currently both CPU and GPU/OpenVINO support need to be added/tested after it.

That PR also has quite a few comment TODOs about vLLM options where some feedback would be needed/appreciated.

- [llm-ctrl-uservice](../common/llm-ctrl-uservice/README.md)
- [vllm](../common/vllm/README.md)

> **_NOTE :_** We shouldn't have both inference engine deployed. It is required to only setup either of them. To achieve this, conditional flags are added in the chart dependency. We will be switching off flag corresponding to one service and switching on the other, in order to have a proper setup of all ChatQnA dependencies.

Contributor:

Why could there not be multiple inferencing engines?

ChatQnA has 4 inferencing subservices for which it is already using 2 inferencing engines, TEI and TGI.

And I do not see why it could not use e.g. TEI for embed + rerank, TGI for guardrails, and vLLM for LLM.

Please rephrase.

Comment on lines +95 to +96
2. Please set `http_proxy`, `https_proxy` and `no_proxy` values while installing chart, if you are behind a proxy.

Contributor:

IMHO duplicating general information into application READMEs is not maintainable; there are too many of them. Instead you could include a link to the general options (helm-charts/README.md).

curl http://localhost:8888/v1/chatqna \
-X POST \

Contributor:

Why add redundant POST? -d already implies that (see man curl).


image:
  repository: opea/vllm-openvino
  pullPolicy: IfNotPresent

Contributor:

Drop the value, it breaks CI testing for latest tag (see #587).


image:
  repository: opea/llm-vllm
  pullPolicy: IfNotPresent

Contributor:

Drop the value, it breaks CI testing for latest tag (see #587).

openvino_enabled: true
image:
  repository: opea/vllm-openvino
  pullPolicy: IfNotPresent

Contributor:

Drop the value, it breaks CI testing for latest tag (see #587).

Comment on lines +47 to +50
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.

@eero-t (Contributor) commented Nov 26, 2024:

This comment is obsolete. The resource request should match the actual usage (plus some headroom for growth), but there are some complications. See the discussion in #431.

helm install chatqna chatqna --set global.HUGGINGFACEHUB_API_TOKEN=${HFTOKEN} --set global.modelUseHostPath=${MODELDIR} --set tgi.LLM_MODEL_ID=${MODELNAME}
```

```bash
# To use Gaudi device

Contributor:

Now that there's support for both TGI and vLLM, all these comments here could state which one is used, e.g. like this:

Suggested change:
- # To use Gaudi device
+ # To use Gaudi device for TGI
