
Bug: flash-attn can't use #10378

Open
Tangzhongyi834 opened this issue Nov 18, 2024 · 2 comments
Labels
bug-unconfirmed, low severity (used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches)

Comments

@Tangzhongyi834

What happened?

I want to quantize the KV cache as q8_0, but the following error occurs:

llama_new_context_with_model: V cache quantization requires flash_attn
common_init_from_params: failed to create context with model '/home/albert/work/code/models/chatglm4-9B.guff'
main: error: unable to load model

After installing the flash-attn Python package, the error still occurs.

How do I deal with this problem?

Name and Version

command: ./llama-cli -m ~/work/code/models/chatglm4-9B.guff -b 1024 -ctk q8_0 -ctv q8_0 -ngl 256 -p 给我讲个笑话吧 (the prompt means "tell me a joke")
torch version: 2.5.1
CUDA version: 12.4
flash-attn version: 2.7.0.post2

What operating system are you seeing the problem on?

No response

Relevant log output

No response

Tangzhongyi834 added the bug-unconfirmed and low severity labels on Nov 18, 2024
@wooooyeahhhh

llama.cpp doesn't use Python. Try the -fa argument to enable flash attention.
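
For example, adding -fa to the command from the issue (a sketch using the same model path and settings as above; flash attention in llama.cpp also requires a backend/GPU that supports it) would look like:

./llama-cli -m ~/work/code/models/chatglm4-9B.guff -fa -b 1024 -ctk q8_0 -ctv q8_0 -ngl 256 -p 给我讲个笑话吧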

@Tangzhongyi834
Author

llama.cpp doesn't use Python. Try the -fa argument to enable flash attention.

Got it, thank you.
