wrong parameter configuration #307

Open
Zeyu-W opened this issue Nov 12, 2024 · 0 comments

Zeyu-W commented Nov 12, 2024

Line 376 of `cudaTensorCoreGemm.cu`:
`float *tile_ptr = shmem_warp_tile_ptr + i * SHMEM_STRIDE * K + j * N;`
should be changed to:
`float *tile_ptr = shmem_warp_tile_ptr + i * SHMEM_STRIDE * M + j * N;`
The same fix applies to the TF32 and double-precision versions.
When the result fragments are streamed from registers to shared memory, each output tile is M rows tall, so the row offset of tile `i` must be a multiple of M. The K dimension plays no part in the layout of the result matrix.
