Hi all, I've modified the quantized-t5 example to expose a very basic HTTP API that other applications can use.
The code is in commit brauliobo@d579b3b.
The issue appears when the API receives multiple serial calls: the output roughly alternates between a correct translation and an empty one. See below:
braulio @ whitebeast ➜ candle git:(main) ✗ nice cargo run --release --example quantized-t5 --features cuda -- --model-id google/madlad400-7b-mt-bt --weight-file madlad400-7b-bt-model-q4k.gguf --temperature 0
Compiling candle-examples v0.8.0 (/srv/candle/candle-examples)
Finished `release` profile [optimized] target(s) in 3.64s
Running `target/release/examples/quantized-t5 --model-id google/madlad400-7b-mt-bt --weight-file madlad400-7b-bt-model-q4k.gguf --temperature 0`
Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...
1 tokens generated (12.86 token/s)
Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...
1 tokens generated (12.74 token/s)
Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...
O que você está fazendo?
7 tokens generated (20.01 token/s)
Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...
1 tokens generated (12.98 token/s)
Generating text with prompt: <2pt> What are you doing?
Encoding input tokens...
O que você está fazendo?
7 tokens generated (20.01 token/s)
This output is triggered by the following curl calls:
braulio @ whitebeast ➜ candle git:(main) ✗ curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":""}%
braulio @ whitebeast ➜ candle git:(main) ✗ curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":""}%
braulio @ whitebeast ➜ candle git:(main) ✗ curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":" O que você está fazendo?"}%
braulio @ whitebeast ➜ candle git:(main) ✗ curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":""}%
braulio @ whitebeast ➜ candle git:(main) ✗ curl -X POST localhost:10201/completions -H "Content-Type: application/json" -d '{"prompt": "<2pt> What are you doing?"}'
{"content":" O que você está fazendo?"}%
My guess is that the issue comes from reusing the model or the tokenizer between calls, but I'm not sure. Any help is appreciated.
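One hypothesis consistent with these symptoms (not confirmed here) is that per-request decoder state, such as a KV cache, survives between requests when the same model instance is reused. A stale cache would make each request depend on the previous one, although that alone does not explain the exact alternation. The toy sketch uses a hypothetical `ToyDecoder` type, not candle's API, to show the pattern of clearing that state at the start of every request. If the model type in use exposes a method such as `clear_kv_cache` (worth checking in the candle-transformers version being used), calling it before each generation would be the analogous step.

```rust
// Hypothetical stand-in for a stateful decoder. This is NOT candle's API;
// it only mimics a decoder that appends to a KV cache on every forward call.
struct ToyDecoder {
    // Cached entries: here just the positions already processed.
    kv_cache: Vec<usize>,
}

impl ToyDecoder {
    fn new() -> Self {
        Self { kv_cache: Vec::new() }
    }

    // Returns the position the decoder believes the current token is at.
    // If the cache still holds entries from an earlier request, this
    // position is offset, which is the kind of state leak suspected above.
    fn forward(&mut self) -> usize {
        let pos = self.kv_cache.len();
        self.kv_cache.push(pos);
        pos
    }

    // Drops all cached state so the next request starts from position 0.
    fn clear_kv_cache(&mut self) {
        self.kv_cache.clear();
    }
}

// Runs `n` decoding steps and returns the positions seen.
// With `reset = true`, per-request state is cleared before generating.
fn generate(model: &mut ToyDecoder, n: usize, reset: bool) -> Vec<usize> {
    if reset {
        model.clear_kv_cache();
    }
    (0..n).map(|_| model.forward()).collect()
}

fn main() {
    // Reusing one model without a reset: the second request continues
    // from where the first one ended.
    let mut model = ToyDecoder::new();
    let a = generate(&mut model, 3, false);
    let b = generate(&mut model, 3, false);
    println!("no reset:   {:?} then {:?}", a, b);

    // Reusing one model with a reset at the start of each request.
    let mut model = ToyDecoder::new();
    let c = generate(&mut model, 3, true);
    let d = generate(&mut model, 3, true);
    println!("with reset: {:?} then {:?}", c, d);
}
```

Running it prints `[0, 1, 2] then [3, 4, 5]` without the reset and `[0, 1, 2] then [0, 1, 2]` with it, i.e. only the reset version gives identical results for identical serial requests.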