Help with basic api implementation running unstable #2636

Open
brauliobo opened this issue Nov 23, 2024 · 0 comments

Hi all, I've modified the quantized-t5 example to provide a very basic API to be used by other applications.
The code is here: brauliobo@d579b3b

The issue appears when making multiple serial calls to the API: the output more or less alternates between a correct translation and an empty string. See below:

braulio @ whitebeast ➜  candle git:(main) ✗  nice cargo run --release --example quantized-t5 --features cuda -- --model-id google/madlad400-7b-mt-bt --weight-file madlad400-7b-bt-model-q4k.gguf --temperature 0

   Compiling candle-examples v0.8.0 (/srv/candle/candle-examples)
    Finished `release` profile [optimized] target(s) in 3.64s
     Running `target/release/examples/quantized-t5 --model-id google/madlad400-7b-mt-bt --weight-file madlad400-7b-bt-model-q4k.gguf --temperature 0`
Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...

1 tokens generated (12.86 token/s)

Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...

1 tokens generated (12.74 token/s)

Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...
 O que você está fazendo?
7 tokens generated (20.01 token/s)

Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...

1 tokens generated (12.98 token/s)

Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...
 O que você está fazendo?
7 tokens generated (20.01 token/s)

This output is triggered by the following curl calls:

braulio @ whitebeast ➜  candle git:(main) ✗  curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":""}%
braulio @ whitebeast ➜  candle git:(main) ✗  curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":""}%
braulio @ whitebeast ➜  candle git:(main) ✗  curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":" O que você está fazendo?"}%
braulio @ whitebeast ➜  candle git:(main) ✗  curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":""}%
braulio @ whitebeast ➜  candle git:(main) ✗  curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":" O que você está fazendo?"}%

I suspect the issue is caused by reusing the model or the tokenizer across requests. Any help is appreciated.
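One plausible cause for this pattern is per-sequence decoder state (such as the attention KV cache) surviving from one request to the next, so every other call starts from stale state and immediately emits end-of-sequence, matching the "1 tokens generated" runs above. Below is a minimal, self-contained sketch of that failure mode and of resetting the state between requests; the `Decoder` struct and its `clear_kv_cache` method are illustrative stand-ins, not the actual candle API (though candle's T5 model does expose a cache-clearing method, which would be the thing to call before each new prompt if this is indeed the cause).

```rust
// Hypothetical reproduction of the suspected bug: a decoder that keeps
// per-sequence state across calls. With a fresh cache it "generates" the
// full answer; with a stale cache it returns an empty output, mirroring
// the alternating behavior in the logs above. All names are illustrative.
struct Decoder {
    kv_cache: Vec<u32>, // stands in for the per-sequence attention cache
}

impl Decoder {
    fn new() -> Self {
        Self { kv_cache: Vec::new() }
    }

    fn generate(&mut self, prompt: &str) -> String {
        if !self.kv_cache.is_empty() {
            // Stale state from the previous request: immediate EOS,
            // i.e. an empty completion.
            return String::new();
        }
        // Simulate filling the cache while decoding this sequence.
        self.kv_cache.extend(prompt.bytes().map(u32::from));
        " O que você está fazendo?".to_string()
    }

    // Reset per-sequence state; the real fix would be the equivalent
    // call on the model before handling each new request.
    fn clear_kv_cache(&mut self) {
        self.kv_cache.clear();
    }
}

fn main() {
    let mut dec = Decoder::new();
    let first = dec.generate("<2pt> What are you doing?");
    // Second request without clearing: sees stale cache, empty output.
    let stale = dec.generate("<2pt> What are you doing?");
    // Clearing between requests restores correct behavior.
    dec.clear_kv_cache();
    let fresh = dec.generate("<2pt> What are you doing?");
    assert_eq!(first, " O que você está fazendo?");
    assert!(stale.is_empty());
    assert_eq!(fresh, first);
    println!("ok");
}
```

If the real model behaves like this, calling its cache-reset method at the top of the request handler (before encoding the new prompt) should make every curl call return the translation.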
