WIP: Add vLLM support to ChatQnA + DocSum Helm charts #610
base: main
Conversation
Setting as draft. I have tested that DocSum with Gaudi vLLM works, and that the ChatQnA Helm chart can be installed, but because v1.1 image pulls are currently taking so long on my test node, I haven't been able to test ChatQnA with Gaudi vLLM properly yet. vLLM CPU version testing would also be needed before merging this (I'm hoping somebody else here could check at least DocSum with CPU vLLM).
CI issues:
This overlaps partly with #403.
Signed-off-by: Eero Tamminen <[email protected]>
Otherwise the service throws an exception due to a None variable value. Signed-off-by: Eero Tamminen <[email protected]>
Added HPA support for ChatQnA / vLLM.
Signed-off-by: Eero Tamminen <[email protected]>
Signed-off-by: Eero Tamminen <[email protected]>
For now vLLM replaces just TGI, but since it also supports embedding, TEI-embed/-rerank may be replaceable later on as well. Signed-off-by: Eero Tamminen <[email protected]>
Signed-off-by: Eero Tamminen <[email protected]>
Signed-off-by: Eero Tamminen <[email protected]>
Description
Add vLLM support to ChatQnA + DocSum Helm app charts.
Similarly to how it's already done in the Agent component, these charts now have `tgi.enabled` and `vllm.enabled` flags for selecting which LLM serving backend will be used.
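For illustration only, a minimal sketch of how these flags might be set at install time; the release names and chart paths below are assumptions for the example, not taken from this PR:

```sh
# Sketch: select vLLM instead of TGI via the new flags (chart paths and
# release names are assumed here purely for illustration).
helm install chatqna ./helm-charts/chatqna \
  --set tgi.enabled=false \
  --set vllm.enabled=true

helm install docsum ./helm-charts/docsum \
  --set tgi.enabled=false \
  --set vllm.enabled=true
```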
Notes:
DocSum needs the `llm-docsum-vllm` wrapper image from the GenAIComps repo, as that image is currently missing from DockerHub
Issues
Fixes #608 partially.
Type of change
New dependencies
The `opea/llm-docsum-vllm:latest` image is currently missing from the CI & DockerHub registries: https://github.com/opea-project/GenAIComps/tree/main/comps/llms/summarization/vllm/langchain/
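As a hedged sketch, one way to build that wrapper image locally from the GenAIComps repo until it is published; the Dockerfile path is an assumption inferred from the directory linked above:

```sh
# Sketch: build the missing opea/llm-docsum-vllm image locally.
# The Dockerfile location is assumed from the linked directory, not verified.
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
docker build -t opea/llm-docsum-vllm:latest \
  -f comps/llms/summarization/vllm/langchain/Dockerfile .
```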
Tests
Manual testing on top of "main" HEAD / v1.1 images.
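For context, a rough sketch of the kind of manual sanity check implied here; the service name, port, and request payload are assumptions, not taken from the charts:

```sh
# Sketch: verify a deployed ChatQnA release (service name/port/payload assumed).
kubectl get pods                          # wait until all pods are Ready
kubectl port-forward svc/chatqna 8888:8888 &
curl http://localhost:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is OPEA?"}'
```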