Eval time: llama.cpp vs. Ollama (80 tokens/s vs. 800 tokens/s) #10495
Unanswered
michaellin99999 asked this question in Q&A · 1 comment, 2 replies
- Why is there such a difference, given that Ollama is a wrapper around llama.cpp? With the context length set to 2048 in both, I get a prompt eval rate of roughly 800 tokens/s with Ollama, whereas llama.cpp only reaches about 80 tokens/s. Ollama is running on the CPU (an AMD Ryzen AI 9) and llama.cpp on the iGPU, both on Windows. Is there something I'm missing?
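For reference, one way to make the comparison apples-to-apples is to pin the same settings on both sides and read each tool's own timing output. A minimal sketch, assuming recent builds of both tools; `./model.gguf` and the `llama3` tag are placeholders for whatever model is actually being run:

```sh
# llama.cpp: fix the context at 2048 and offload as many layers as fit
# to the GPU (-ngl 99); prompt-eval and eval timings are printed at the
# end of the run.
llama-cli -m ./model.gguf -c 2048 -ngl 99 -p "Hello" -n 128

# Ollama: --verbose makes `ollama run` print its prompt eval rate and
# eval rate after the response.
ollama run llama3 --verbose "Hello"
```

If llama.cpp is launched without `-ngl` (or with a build that has no GPU backend), the "iGPU" run may in fact be executing entirely on the CPU, which alone could explain a 10x gap.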
- Could it be because one is built for Vulkan and the other is not?
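If the build backend is the suspect, it is cheap to test by rebuilding llama.cpp with the Vulkan backend enabled. A sketch, assuming a current source tree; note the CMake option name has changed across versions (older trees used `LLAMA_VULKAN`):

```sh
# Configure with the Vulkan backend and build the release binaries.
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
```

A Vulkan-enabled binary reports the detected Vulkan device(s) in its startup log, which is an easy way to confirm the iGPU is actually being used.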