Running with gemma 2 in vllm gives a chat template error #1384
Replies: 5 comments
-
Hey, were you able to solve it?
-
Relevant issue: #1386
-
I am facing the same issue, any idea?
-
Could you try adding
-
Use vLLM v0.6.2, something like:
docker run \
  --runtime nvidia \
  --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e VLLM_ATTENTION_BACKEND=FLASHINFER \
  -p 8000:8000 \
  --env "HUGGING_FACE_HUB_TOKEN=your_hf_token" \
  --env "VLLM_ALLOW_LONG_MAX_MODEL_LEN=1" \
  --ipc=host \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  vllm/vllm-openai:v0.6.2 \
  --model "hugging-quants/gemma-2-9b-it-AWQ-INT4" \
  --max-model-len 8192 \
  --enable-chunked-prefill True \
  --max-num-batched-tokens 256 \
  --max-num-seqs 8 \
  --gpu-memory-utilization 0.9
FYI: Gemma doesn't support a system prompt, only user prompts.
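Since the Gemma chat template rejects the system role, make sure clients only send user/assistant turns. A minimal smoke test against the container above (assuming it is reachable on localhost:8000) could look like:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "hugging-quants/gemma-2-9b-it-AWQ-INT4",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
      }'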
-
Hello,
I'm running chat-ui and trying out some models. With Phi-3 and Llama I had no problem, but when I run Gemma 2 in vLLM I can't make any successful API request.
In .env.local:
{
  "name": "google/gemma-2-2b-it",
  "id": "google/gemma-2-2b-it",
  "chatPromptTemplate": "{{#each messages}}{{#ifUser}}<start_of_turn>user\n{{#if @FIRST}}{{#if @root.preprompt}}{{@root.preprompt}}\n{{/if}}{{/if}}{{content}}<end_of_turn>\n<start_of_turn>model\n{{/ifUser}}{{#ifAssistant}}{{content}}<end_of_turn>\n{{/ifAssistant}}{{/each}}",
  "parameters": {
    "temperature": 0.1,
    "top_p": 0.95,
    "repetition_penalty": 1.2,
    "top_k": 50,
    "truncate": 1000,
    "max_new_tokens": 2048,
    "stop": ["<end_of_turn>"]
  },
  "endpoints": [
    {
      "type": "openai",
      "baseURL": "http://127.0.0.1:8000/v1"
    }
  ]
}
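For what it's worth, assuming the chatPromptTemplate above renders as intended, a single user turn with a preprompt should produce plain Gemma-format text with no system turn, roughly:
<start_of_turn>user
{preprompt}
{user message}<end_of_turn>
<start_of_turn>model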
But I always get the same response from the vLLM server:
ERROR 08-05 12:39:06 serving_chat.py:118] Error in applying chat template from request: System role not supported
INFO: 127.0.0.1:42142 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
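For reference, the chat template bundled with google/gemma-2-2b-it raises exactly this error whenever the request contains a system-role message, so any request shaped like the illustrative one below gets the same 400:
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "google/gemma-2-2b-it",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hi"}
        ]
      }'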
Does someone know if I have to change the chat template, and how to change it? Is this a vLLM problem or a chat-ui problem?
Thank you!