Eval time: llama.cpp vs. Ollama (80 tokens/s vs. 800 tokens/s) #10495
Unanswered
michaellin99999 asked this question in Q&A · 1 comment, 2 replies
- Why is there such a difference, given that Ollama is a wrapper around llama.cpp? With the context length set to 2048 in both, I get a prompt eval rate of roughly 800 tokens/s with Ollama, whereas llama.cpp only reaches about 80 tokens/s. Ollama is running on the CPU (an AMD Ryzen AI 9) and llama.cpp on the iGPU, both on Windows. Is there something I'm missing?
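For reference, one way to make the comparison apples-to-apples is to pin the same settings on both sides and read each tool's own timing output. A minimal sketch, assuming recent builds of both tools; `./model.gguf` and the `llama3` tag are placeholders for whatever model is actually being run:

```sh
# llama.cpp: fix the context at 2048 and offload as many layers as fit
# to the GPU (-ngl 99); prompt-eval and eval timings are printed at the
# end of the run.
llama-cli -m ./model.gguf -c 2048 -ngl 99 -p "Hello" -n 128

# Ollama: --verbose makes `ollama run` print its prompt eval rate and
# eval rate after the response.
ollama run llama3 --verbose "Hello"
```

If llama.cpp is launched without `-ngl` (or with a build that has no GPU backend), the "iGPU" run may in fact be executing entirely on the CPU, which alone could explain a 10x gap.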
- Could it be because one is built for Vulkan and the other is not?
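If the build backend is the suspect, it is cheap to test by rebuilding llama.cpp with the Vulkan backend enabled. A sketch, assuming a current source tree; note the CMake option name has changed across versions (older trees used `LLAMA_VULKAN`):

```sh
# Configure with the Vulkan backend and build the release binaries.
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
```

A Vulkan-enabled binary reports the detected Vulkan device(s) in its startup log, which is an easy way to confirm the iGPU is actually being used.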