2-transpose

Matrix Transpose

Experimental Environment

  • GPU: NVIDIA GeForce RTX 3080
  • CPU: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz x 40
  • CUDA/NVCC: 11.7
  • OS: Ubuntu 20.04
  • Host Compiler: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    • Compiler Options: -O3 -std=c++17

Performance Data

Input size: [10240, 1024]

Version                           Time (us)   Speedup
transpose_cpu                     75764       1
transpose_naive_read_coalesced    480         157.8
transpose_naive_write_coalesced   228         332.3
transpose_tiled                   234         323.7
transpose_no_bank_conflict        175         432.8
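
For orientation, the CPU baseline in the first row can be pictured as a plain out-of-place transpose. The sketch below is not the repository's code: the names `transpose_cpu_sketch` and `speedup` are hypothetical, and a row-major `float` layout is assumed. The speedup helper reflects how the table's last column appears to be derived (baseline time divided by the version's time).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical baseline (not the repository's implementation): out-of-place
// transpose of a row-major rows x cols matrix into a row-major cols x rows one.
std::vector<float> transpose_cpu_sketch(const std::vector<float>& in,
                                        std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Element (r, c) of the input becomes element (c, r) of the output.
            out[c * rows + r] = in[r * cols + c];
        }
    }
    return out;
}

// Assumed definition of the "Speedup" column: baseline time / version time.
double speedup(double baseline_us, double version_us) {
    return baseline_us / version_us;
}
```

With the input size above, such a function would be called as `transpose_cpu_sketch(data, 10240, 1024)`. Reads from `in` are contiguous while writes to `out` are strided by `rows`, the same read-versus-write access trade-off that the two naive GPU variants in the table are named after.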

Algorithm Description

TODO

References