wrong parameter configuration #307

Zeyu-W · 2024-11-12T10:52:00Z

Line 376 of "cudaTensorCoreGemm.cu" currently reads:
"float *tile_ptr = shmem_warp_tile_ptr + i * SHMEM_STRIDE * K + j * N;"
It should be changed to "float *tile_ptr = shmem_warp_tile_ptr + i * SHMEM_STRIDE * M + j * N;"
The same fix applies to the tf32 and double-precision versions.
The result matrix tile is M x N, so the K dimension plays no role when storing the accumulator fragments to shared memory. The row offset between consecutive tiles should therefore be a multiple of M, not K.
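To make the effect concrete, here is a minimal host-side sketch of the two offset formulas. The values M = 16, N = 16, K = 8 and SHMEM_STRIDE = 64 are assumptions chosen for illustration (a tile shape where K differs from M), and the helper names are hypothetical; they are not taken from the sample.

```cpp
// Assumed tile shape for illustration: each accumulator tile is M x N,
// K is the reduction depth. These values are NOT read from the sample.
constexpr int M = 16;
constexpr int N = 16;
constexpr int K = 8;
constexpr int SHMEM_STRIDE = 64;  // assumed shared-memory row stride, in floats

// Offset of tile (i, j) as the line is currently written (uses K).
constexpr int offset_with_k(int i, int j) {
  return i * SHMEM_STRIDE * K + j * N;
}

// Offset of tile (i, j) with the proposed change (uses M).
constexpr int offset_with_m(int i, int j) {
  return i * SHMEM_STRIDE * M + j * N;
}

// Shared-memory row at which a tile starts, given its float offset.
constexpr int start_row(int offset) { return offset / SHMEM_STRIDE; }

// Tile (0, 0) occupies rows 0 .. M-1. With K in the formula, tile (1, 0)
// starts at row 8 and overlaps it; with M it starts at row 16, right after.
static_assert(start_row(offset_with_k(1, 0)) == 8, "K-based offset overlaps tile 0");
static_assert(start_row(offset_with_m(1, 0)) == 16, "M-based offset follows tile 0");
```

When M equals K the two formulas give the same offsets, so the difference only shows up for tile shapes where K differs from M.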
