Thanks for the great work!
https://github.com/cuda-mode/ring-attention/blob/main/ring-llama/test.ipynb
I can load the model with `LlamaRingFlashAttention` and move it to the device, but when I run `y = model.generate` I see:

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

What did I miss? Thanks in advance!
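For context, a minimal sketch of what I believe the error is asking for: ring attention shards the sequence across `torch.distributed` ranks, so its collectives need a default process group even for a local test. The `MASTER_ADDR`/`MASTER_PORT` values and `world_size=1` below are placeholder assumptions for a single-process smoke test, not something taken from the notebook:

```python
# Sketch (assumption): initialize a default process group before generate.
# A real ring-attention run would normally be launched with torchrun
# across several GPUs; this only creates a trivial single-rank group.
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

if not dist.is_initialized():
    dist.init_process_group(backend="nccl", rank=0, world_size=1)

# y = model.generate can then be retried with the process group in place.
```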
Also, I was seeing

RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 32 but got size 4 for tensor number 1 in the list.

with another Llama-based model, `01-ai/Yi-6B-200K`, when using `LlamaRingFlashAttention`; it worked fine with `LlamaFlashAttention2`.
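A hedged guess at this second error: the 32-vs-4 mismatch looks like grouped-query attention, where the model has fewer key/value heads than query heads, and the ring attention path may not be repeating the KV heads the way `LlamaFlashAttention2` does. One quick way to check the head counts (the comments reflect my expectation, not verified output):

```python
# Hypothetical check (not from the issue): inspect the head counts of the
# failing model to see whether grouped-query attention explains the
# size-32-vs-size-4 mismatch.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("01-ai/Yi-6B-200K")
print("num_attention_heads:", cfg.num_attention_heads)  # query heads
print("num_key_value_heads:", cfg.num_key_value_heads)  # key/value heads
```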