forked from LLNL/SW4CK
-
Notifications
You must be signed in to change notification settings - Fork 0
/
INSTALL.txt
103 lines (75 loc) · 3.67 KB
/
INSTALL.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
SW4CK : SW4 Curvilinear Kernels
CUDA Build & Run:
with nvcc and recent g++ in the path, do:
make -f Makefile.cuda
lrun -T1 ./sw4ck sw4ck.in ( On LC machines )
HIP Build:
with hipcc in the path and rocm>=3.9.0
make -f Makefile.hip
srun -n1 ./sw4ck sw4ck.in
Building using Cmake ( Experimental)
CUDA:
mkdir build && cd build
cmake -C ../host-configs/lassen.cmake -DENABLE_CUDA=On ../
lrun -T1 bin/sw4ck ../src/sw4ck.in
CUDA + RAJA:
mkdir build && cd build
cmake -C ../host-configs/lassen.cmake -DENABLE_CUDA=On -DENABLE_RAJA=On -DCUDA_ARCH=sm_70 ../
lrun -T1 bin/sw4ck ../src/sw4ck.in
HIP:
mkdir build && cd build
cmake -C ../host-configs/corona.cmake -DENABLE_HIP=On ../
make -j 4
srun -n 1 -p mi60 bin/sw4ck ../src/sw4ck.in
HIP + RAJA :
mkdir build && cd build
cmake -C ../host-configs/corona.cmake -DENABLE_HIP=On -DENABLE_RAJA=On ../
make -j 4
srun -n 1 -p mi60 bin/sw4ck ../src/sw4ck.in
***** BUILDING ON RZWHAMO ***************
CUDA:
mkdir build && cd build
cmake -C ../host-configs/whamo_cuda.cmake -DENABLE_CUDA=On ../
make -j 4
srun -n 1 -p nvidia bin/sw4ck ../src/sw4ck.in
CUDA + RAJA :
mkdir build && cd build
cmake -C ../host-configs/whamo_cuda.cmake -DENABLE_CUDA=On -DENABLE_RAJA=On -DCUDA_ARCH=sm_70 ../
make -j 4
srun -n 1 -p nvidia bin/sw4ck ../src/sw4ck.in
HIP:
mkdir build && cd build
cmake -C ../host-configs/whamo_hip.cmake -DENABLE_HIP=On ../
make -j 4
srun -n 1 -p amd bin/sw4ck ../src/sw4ck.in
HIP + RAJA :
mkdir build && cd build
cmake -C ../host-configs/whamo_hip.cmake -DENABLE_HIP=On -DENABLE_RAJA=On ../
make -j 4
srun -n 1 -p amd bin/sw4ck ../src/sw4ck.in
****** MEASURING REGISTER USAGE AND SPILLAGE ****************************8
CUDA + RAJA :
mkdir build && cd build
cmake -C ../host-configs/whamo_cuda.cmake -DENABLE_CUDA=On -DENABLE_RAJA=On -DCUDA_ARCH=sm_70 -DENABLE_SPILL_WARNS=On ..
make
Sample output during build:
ptxas info : Used 254 registers, 616 bytes cmem[0]
ptxas info : Compiling entry function '_ZN4RAJA8internal23CudaKernelLauncherFixedILm256ENS0_8LoopDataIN4camp5tupleIJNS_4SpanINS_9Iterators16numeric_iteratorIllPlEElEESA_SA_EEENS4_IJEEEJZ17curvilinear4sg_ciiiiiiiPdSD_SD_SD_SD_SD_PiSD_SD_SD_SD_SD_SD_SD_icEUliiiE2_EEENS0_25CudaStatementListExecutorISG_NS3_4listIJNS_9statement4TileILl0ENS_10tile_fixedILl4EEENS_19cuda_block_xyz_loopILi2EEEJNSK_ILl1ESM_NSN_ILi1EEEJNSK_ILl2ENSL_ILl16EEENSN_ILi0EEEJNSJ_3ForILl0ENS_22cuda_thread_xyz_directILi2EEEJNSS_ILl1ENST_ILi1EEEJNSS_ILl2ENST_ILi0EEEJNSJ_6LambdaILl0EJEEEEEEEEEEEEEEEEEEEEEEEENS0_9LoopTypesINSI_IJvvvEEES17_EEEEEEvT0_' for 'sm_70'
ptxas info : Function properties for _ZN4RAJA8internal23CudaKernelLauncherFixedILm256ENS0_8LoopDataIN4camp5tupleIJNS_4SpanINS_9Iterators16numeric_iteratorIllPlEElEESA_SA_EEENS4_IJEEEJZ17curvilinear4sg_ciiiiiiiPdSD_SD_SD_SD_SD_PiSD_SD_SD_SD_SD_SD_SD_icEUliiiE2_EEENS0_25CudaStatementListExecutorISG_NS3_4listIJNS_9statement4TileILl0ENS_10tile_fixedILl4EEENS_19cuda_block_xyz_loopILi2EEEJNSK_ILl1ESM_NSN_ILi1EEEJNSK_ILl2ENSL_ILl16EEENSN_ILi0EEEJNSJ_3ForILl0ENS_22cuda_thread_xyz_directILi2EEEJNSS_ILl1ENST_ILi1EEEJNSS_ILl2ENST_ILi0EEEJNSJ_6LambdaILl0EJEEEEEEEEEEEEEEEEEEEEEEEENS0_9LoopTypesINSI_IJvvvEEES17_EEEEEEvT0_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
HIP+RAJA:
mkdir build && cd build
cmake -C ../host-configs/whamo_hip.cmake -DENABLE_HIP=On -DENABLE_RAJA=On -DENABLE_SPILL_WARNS=On ../
make (Build will fail at link )
grep vgpr_ src/curvilinear4sgc-hip-amdgcn-amd-amdhsa-gfx906.s
Sample output:
.vgpr_count: 256
.vgpr_spill_count: 37
.vgpr_count: 256
.vgpr_spill_count: 648
.vgpr_count: 256
.vgpr_spill_count: 772
.vgpr_count: 256
.vgpr_spill_count: 624
.vgpr_count: 256
.vgpr_spill_count: 42