
[Bug]: Inference output is all exclamation marks when launching qwen2.5-32b-instruct with Xinference vLLM #1038

Open
andylzming opened this issue Oct 28, 2024 · 3 comments
Labels: duplicate, help wanted


@andylzming

Model Series

Qwen2.5

What are the models used?

Qwen2.5-32B-Instruct

What is the scenario where the problem happened?

Xinference

Is this a known issue?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find an answer there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is no similar one.

Information about environment

System Info

(xinference) [root@gpu-server xinference_160]# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Mon_Oct_24_19:12:58_PDT_2022
Cuda compilation tools, release 12.0, V12.0.76
Build cuda_12.0.r12.0/compiler.31968024_0
(xinference) [root@gpu-server xinference_160]# python --version
Python 3.10.6
(xinference) [root@gpu-server xinference_160]# pip list | grep torch
torch                             2.3.0+cu121
torchaudio                        2.3.0+cu121
torchvision                       0.18.0+cu121
(xinference) [root@gpu-server xinference_160]# pip list | grep vllm
vllm                              0.4.2
vllm-nccl-cu12                    2.18.1.0.4.0
(xinference) [root@gpu-server xinference_160]# pip list | grep transformer
ctransformers                     0.2.27
sentence-transformers             2.7.0
transformers                      4.43.1
transformers-stream-generator     0.0.4

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

(xinference) [root@gpu-server xinference_160]# pip list | grep xinference
xinference                        0.16.0
xinference-client                 0.16.0

The command used to start Xinference

nohup xinference-local -H 172.22.149.188 -p 59997 &

Reproduction

  • Launched the qwen2.5-32b-instruct model via vLLM and chatted on the test page provided by xinference; the inference output is shown in the screenshot below.

[Screenshot: the chat response consists only of exclamation marks]
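The report does not include the exact command used to launch the model. A typical launch through the xinference CLI with the vLLM engine might look like the sketch below; the endpoint comes from the startup command above, while the model name, format, and size flags are assumptions rather than values taken from this issue:

# Hedged sketch of launching the model with the vLLM engine via the xinference CLI.
# Model name/format are assumptions; the endpoint matches the xinference-local command above.
xinference launch \
  -e http://172.22.149.188:59997 \
  --model-name qwen2.5-instruct \
  --model-engine vllm \
  --model-format pytorch \
  --size-in-billions 32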

Expected behavior

Normal inference results.

Log output

The inference output is nothing but !!!!


@jklj077
Collaborator

jklj077 commented Oct 29, 2024

Not following the issue template.

The vLLM version is too old. Try disabling custom all-reduce if it is enabled and you are using PCIe cards.
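For reference, one way to check whether custom all-reduce is the culprit, independent of Xinference, is to start vLLM's OpenAI-compatible server directly with the feature disabled; the model path and tensor-parallel size below are placeholders, not values from this issue:

# Hedged sketch: run vLLM directly with custom all-reduce disabled.
# /path/to/Qwen2.5-32B-Instruct and the tensor-parallel size are placeholders.
python -m vllm.entrypoints.openai.api_server \
  --model /path/to/Qwen2.5-32B-Instruct \
  --tensor-parallel-size 2 \
  --disable-custom-all-reduce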

@andylzming
Author

andylzming commented Oct 29, 2024

> Not following the issue template.
>
> The vLLM version is too old. Try disabling custom all-reduce if it is enabled and you are using PCIe cards.

The vLLM version has been upgraded to 0.5.1, but the issue still persists.

(xinference) [root@gpu-server ~]# pip list | grep vllm
vllm                              0.5.1
vllm-flash-attn                   2.5.9
vllm-nccl-cu12                    2.18.1.0.4.0
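For completeness, one way to confirm that the running Xinference worker actually picks up the upgraded vLLM is to check the version inside the same environment and then restart the service; the commands below are illustrative:

# Check the vLLM version in the environment Xinference runs in.
python -c "import vllm; print(vllm.__version__)"
# Stop the previously nohup'd process and start it again so the worker reloads vLLM.
pkill -f xinference-local
nohup xinference-local -H 172.22.149.188 -p 59997 &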

@jklj077
Collaborator

jklj077 commented Oct 29, 2024

Can you please follow the issue template? What is your driver version? What is your card? Did you use multiple cards? How did you start vLLM? And so on. Why does xinference show custom-qwen25-32-instruct? How can this actually be reproduced?
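For anyone hitting the same problem, the information asked for above can be collected roughly like this (illustrative commands):

# Driver version and GPU model
nvidia-smi
# Interconnect topology: shows whether multi-GPU traffic goes over PCIe or NVLink
nvidia-smi topo -m
# Relevant package versions
pip list | grep -E 'vllm|xinference|transformers'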
