Help with basic api implementation running unstable #2636

Open
brauliobo opened this issue Nov 23, 2024 · 0 comments

Hi all, I've modified the quantized-t5 example to provide a very basic API to be used by other applications.
The code is here: brauliobo@d579b3b

The issue appears when making multiple serial calls to the API: the output more or less alternates between a correct translation and an empty string. See below:

braulio @ whitebeast ➜  candle git:(main) ✗  nice cargo run --release --example quantized-t5 --features cuda -- --model-id google/madlad400-7b-mt-bt --weight-file madlad400-7b-bt-model-q4k.gguf --temperature 0

   Compiling candle-examples v0.8.0 (/srv/candle/candle-examples)
    Finished `release` profile [optimized] target(s) in 3.64s
     Running `target/release/examples/quantized-t5 --model-id google/madlad400-7b-mt-bt --weight-file madlad400-7b-bt-model-q4k.gguf --temperature 0`
Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...

1 tokens generated (12.86 token/s)

Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...

1 tokens generated (12.74 token/s)

Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...
 O que você está fazendo?
7 tokens generated (20.01 token/s)

Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...

1 tokens generated (12.98 token/s)

Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...
 O que você está fazendo?
7 tokens generated (20.01 token/s)

This output is triggered by the following curl calls:

braulio @ whitebeast ➜  candle git:(main) ✗  curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":""}%
braulio @ whitebeast ➜  candle git:(main) ✗  curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":""}%
braulio @ whitebeast ➜  candle git:(main) ✗  curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":" O que você está fazendo?"}%
braulio @ whitebeast ➜  candle git:(main) ✗  curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":""}%
braulio @ whitebeast ➜  candle git:(main) ✗  curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":" O que você está fazendo?"}%

I suspect the issue is caused by reusing the model or the tokenizer across requests. Any help is appreciated.
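One plausible cause for this pattern is per-sequence decoder state (such as the attention KV cache) surviving from one request to the next, so every other call starts from stale state and immediately emits end-of-sequence, matching the "1 tokens generated" runs above. Below is a minimal, self-contained sketch of that failure mode and of resetting the state between requests; the `Decoder` struct and its `clear_kv_cache` method are illustrative stand-ins, not the actual candle API (though candle's T5 model does expose a cache-clearing method, which would be the thing to call before each new prompt if this is indeed the cause).

```rust
// Hypothetical reproduction of the suspected bug: a decoder that keeps
// per-sequence state across calls. With a fresh cache it "generates" the
// full answer; with a stale cache it returns an empty output, mirroring
// the alternating behavior in the logs above. All names are illustrative.
struct Decoder {
    kv_cache: Vec<u32>, // stands in for the per-sequence attention cache
}

impl Decoder {
    fn new() -> Self {
        Self { kv_cache: Vec::new() }
    }

    fn generate(&mut self, prompt: &str) -> String {
        if !self.kv_cache.is_empty() {
            // Stale state from the previous request: immediate EOS,
            // i.e. an empty completion.
            return String::new();
        }
        // Simulate filling the cache while decoding this sequence.
        self.kv_cache.extend(prompt.bytes().map(u32::from));
        " O que você está fazendo?".to_string()
    }

    // Reset per-sequence state; the real fix would be the equivalent
    // call on the model before handling each new request.
    fn clear_kv_cache(&mut self) {
        self.kv_cache.clear();
    }
}

fn main() {
    let mut dec = Decoder::new();
    let first = dec.generate("<2pt> What are you doing?");
    // Second request without clearing: sees stale cache, empty output.
    let stale = dec.generate("<2pt> What are you doing?");
    // Clearing between requests restores correct behavior.
    dec.clear_kv_cache();
    let fresh = dec.generate("<2pt> What are you doing?");
    assert_eq!(first, " O que você está fazendo?");
    assert!(stale.is_empty());
    assert_eq!(fresh, first);
    println!("ok");
}
```

If the real model behaves like this, calling its cache-reset method at the top of the request handler (before encoding the new prompt) should make every curl call return the translation.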
