WIP: Add vLLM support to ChatQnA + DocSum Helm charts #610

Draft: wants to merge 7 commits into base: main
Commits on Nov 29, 2024

  1. Add monitoring support for the vLLM component

    Signed-off-by: Eero Tamminen <[email protected]>
    eero-t committed Nov 29, 2024
    SHA: 45c17a2
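    This commit adds Prometheus scraping for the vLLM component. A minimal sketch of what such monitoring support typically looks like with the Prometheus Operator (the label and port names here are assumptions, not taken from the charts):

    ```yaml
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: vllm
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: vllm   # assumed Service label
      endpoints:
        - port: vllm        # assumed port name on the vLLM Service
          path: /metrics    # vLLM exposes Prometheus metrics on this path
    ```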
  2. Fix llm-uservice: LLM_MODEL => LLM_MODEL_ID

    Otherwise the service throws an exception due to a None variable value.
    
    Signed-off-by: Eero Tamminen <[email protected]>
    eero-t committed Nov 29, 2024
    SHA: 312a1b7
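    The rename means the chart must set LLM_MODEL_ID rather than LLM_MODEL. A hypothetical values fragment illustrating the corrected variable (the model name is only an example, not from this PR):

    ```yaml
    llm-uservice:
      # Was LLM_MODEL, which the service never read, leaving the value None.
      LLM_MODEL_ID: "Intel/neural-chat-7b-v3-3"
    ```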
  3. Add vLLM support for DocSum

    Signed-off-by: Eero Tamminen <[email protected]>
    eero-t committed Nov 29, 2024
    SHA: 79a1b22
  4. Example for checking that vLLM metrics work too

    Signed-off-by: Eero Tamminen <[email protected]>
    eero-t committed Nov 29, 2024
    SHA: e851b19
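    One way such a metrics check can be sketched in-cluster is a throwaway pod that probes the vLLM metrics endpoint (the service URL below is a placeholder, not the chart's actual name):

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: vllm-metrics-check
    spec:
      restartPolicy: Never
      containers:
        - name: check
          image: curlimages/curl
          # Exits non-zero unless /metrics responds with HTTP 2xx.
          args: ["-sf", "http://chatqna-vllm/metrics"]
    ```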
  5. Initial vLLM support for ChatQnA

    For now vLLM replaces only TGI, but since vLLM also supports embeddings,
    TEI-embed/-rerank may become replaceable later on.
    
    Signed-off-by: Eero Tamminen <[email protected]>
    eero-t committed Nov 29, 2024
    SHA: d9498ec
  6. Fix HPA comments in tgi/tei/teirerank values files

    Signed-off-by: Eero Tamminen <[email protected]>
    eero-t committed Nov 29, 2024
    SHA: ff58fae
  7. Add HPA scaling support for ChatQnA / vLLM

    Signed-off-by: Eero Tamminen <[email protected]>
    eero-t committed Nov 29, 2024
    SHA: 1c56d39
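A generic sketch of what HPA scaling for the vLLM deployment could look like (the deployment name and the CPU-utilization metric are assumptions; the actual chart may scale on a vLLM-specific custom metric instead):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vllm
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vllm            # assumed deployment name
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```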