Timing Tables
The timing tables below show processor layouts and sample timings from runs of the coupled FV3GFS-MOM6-CICE5 system. Note that the system is largely unoptimized, is not load balanced, and does not have threading enabled, so these numbers are only initial baselines that will be improved during the optimization phase.
In the tables below, Layout is the domain decomposition of each tile of the FV3 cubed-sphere grid. For example, a layout of 8,12 decomposes each of the 6 tiles into 8x12 chunks, so the total number of FV3 forecast tasks is 8x12x6=576. FV3 is also given additional tasks for the asynchronous write component, and these must be included in the total PEs assigned to the atmosphere.
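The task arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `fv3_atm_tasks` is hypothetical and not part of FV3 or CIME, and the write-task count of 48 is taken from the tables below.

```python
def fv3_atm_tasks(layout_x, layout_y, write_tasks, tiles=6):
    """Return (forecast_tasks, total_atm_tasks) for a cubed-sphere layout.

    layout_x, layout_y: per-tile decomposition (e.g. 8,12)
    write_tasks: tasks reserved for the asynchronous write component
    tiles: number of cubed-sphere tiles (6 for the FV3 grid)
    """
    forecast_tasks = layout_x * layout_y * tiles
    return forecast_tasks, forecast_tasks + write_tasks


# Layout 8,12 with 48 write tasks (the A100 configuration in the tables)
forecast, total = fv3_atm_tasks(8, 12, write_tasks=48)
print(forecast, total)  # 576 624
```

The second value is the number that has to be assigned to the atmosphere as NTASKS, which is why the A100 run lists 624 rather than 576 ATM tasks.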
Software environment: intel/19.0.2, mpt/2.19, netcdf-mpi/4.6.1, pnetcdf/1.11.0, and an optimized build of ESMF 8.0.0 Beta Snapshot 32 (compiled with intel/19.0.2 and mpt/2.19).
Run ID | Unit / Note | REF | A50 | A100 | A100_O50 | A100_WR50 |
---|---|---|---|---|---|---|
Description | | baseline | 50% increase in ATM PET | 100% increase in ATM PET | 100% increase in ATM PET and 50% increase in OCN PET | 50% increase in write tasks |
PE Layout | NTASKS in ATM/OCN/ICE order | 336 360 360 | 480 360 360 | 624 360 360 | 624 540 360 | 648 360 360 |
ROOT PE | ROOTPE in ATM/OCN/ICE order | 0 336 696 | 0 480 840 | 0 624 984 | 0 624 1164 | 0 648 1008 |
# Nodes | count | 30 | 34 | 38 | 43 | 38 |
Layout | | 6,8 | 6,12 | 8,12 | 8,12 | 8,12 |
Write Tasks | | 48 | 48 | 48 | 48 | 72 |
Cost | pe-hours/simulated_years | 134882.51 | 125445.73 | 126210.27 | 139095.31 | 125343.2 |
Throughput | simulated_years/day | 0.19 | 0.23 | 0.26 | 0.27 | 0.26 |
Init | s | 112.092 | 105.705 | 113.107 | 115.48 | 117.376 |
Run | s | 1231.804 | 1010.844 | 909.952 | 886.24 | 903.7 |
Finalize | s | 85.181 | 87.868 | 87.472 | 87.76 | 48.389 |
CPL | s | 55.804 | 60.557 | 57.507 | 60.761 | 61.33 |
ATM | s | 802.943 | 565.16 | 466.63 | 468.548 | 463.049 |
ICE | s | 92.018 | 103.581 | 106.74 | 118.715 | 105.496 |
OCN | s | 251.659 | 253.444 | 252.385 | 216.019 | 253.800 |
Relative Speed-up against REF | (only run phase) | - | 0.18 | 0.26 | 0.28 | 0.27 |
Detailed information about NCAR's Cheyenne supercomputing system can be found here.
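The Cost and Throughput rows can be reproduced from the Run time and node count. The sketch below is an inference from the table values, not a documented formula: it assumes a 365-day year, that whole nodes are charged, and 36 cores per Cheyenne node. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86400
SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365  # assumption: no leap days


def throughput_sypd(run_seconds, simulated_days=1.0):
    """Simulated years per wall-clock day, based on the run phase only."""
    wall_days = run_seconds / SECONDS_PER_DAY
    simulated_years = simulated_days / DAYS_PER_YEAR
    return simulated_years / wall_days


def cost_pe_hours_per_year(run_seconds, nodes, cores_per_node, simulated_days=1.0):
    """PE-hours per simulated year, charging every core on every node."""
    pe_hours = nodes * cores_per_node * run_seconds / SECONDS_PER_HOUR
    simulated_years = simulated_days / DAYS_PER_YEAR
    return pe_hours / simulated_years


# REF run: 1231.804 s run phase on 30 nodes (assumed 36 cores each)
print(round(throughput_sypd(1231.804), 2))
print(round(cost_pe_hours_per_year(1231.804, nodes=30, cores_per_node=36), 2))
```

Under these assumptions the REF values come out within rounding of the tabulated 0.19 simulated years per day and 134882.51 pe-hours per simulated year.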
Software environment: intel/18.0.2, impi/18.0.2, netcdf/4.6.2, and an optimized build of ESMF 8.0.0 Beta Snapshot 38 (compiled with intel/18.0.2 and impi/18.0.2).
Run ID | Unit / Note | REF | A50 | A100 | A100_O50 | A100_WR50 |
---|---|---|---|---|---|---|
Description | | baseline | 50% increase in ATM PET | 100% increase in ATM PET | 100% increase in ATM PET and 50% increase in OCN PET | 50% increase in write tasks |
PE Layout | NTASKS in ATM/OCN/ICE order | 336 360 360 | 480 360 360 | 624 360 360 | 624 540 360 | 648 360 360 |
ROOT PE | ROOTPE in ATM/OCN/ICE order | 0 336 696 | 0 480 840 | 0 624 984 | 0 624 1164 | 0 648 1008 |
# Nodes | count | 22 | 25 | 28 | 32 | 29 |
Layout | | 6,8 | 6,12 | 8,12 | 8,12 | 8,12 |
Write Tasks | | 48 | 48 | 48 | 48 | 72 |
Cost | pe-hours/simulated_years | 123128.03 | 112318.32 | 112561.58 | 116570.2 | 112596.74 |
Throughput | simulated_years/day | 0.21 | 0.26 | 0.29 | 0.32 | 0.30 |
Init | s | 102.997 | 102.909 | 99.95 | 104.418 | 100.769 |
Run | s | 1150.013 | 923.164 | 826.039 | 748.524 | 797.804 |
Finalize | s | 54.516 | 54.366 | 53.989 | 55.755 | 54.221 |
CPL | s | 104.495 | 20.503 | 64.245 | 54.457 | 102.791 |
ATM | s | 747.12 | 521.465 | 402.536 | 403.285 | 408.365 |
ICE | s | 97.894 | 100.396 | 99.376 | 83.09 | 85.31 |
OCN | s | 182.298 | 181.046 | 182.321 | 140.239 | 182.432 |
Relative Speed-up against REF | (only run phase) | - | 0.20 | 0.28 | 0.35 | 0.31 |
Detailed information about TACC's Stampede2 supercomputing system can be found here.
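The relative speed-up row in both tables is consistent with the fractional reduction in run-phase time with respect to REF. The sketch below checks this against the Stampede2 run times; the definition is inferred from the tabulated values, and the function name is hypothetical.

```python
def relative_speedup(ref_seconds, run_seconds):
    """Fractional reduction in run-phase time relative to the REF run."""
    return (ref_seconds - run_seconds) / ref_seconds


# Run-phase times (s) from the Stampede2 table
stampede2_run = {
    "REF": 1150.013,
    "A50": 923.164,
    "A100": 826.039,
    "A100_O50": 748.524,
    "A100_WR50": 797.804,
}

for run_id, seconds in stampede2_run.items():
    if run_id != "REF":
        speedup = relative_speedup(stampede2_run["REF"], seconds)
        print(f"{run_id}: {speedup:.2f}")
```

With this definition a value of 0.35 means the run phase took 35% less wall-clock time than REF, not that it ran 1.35 times faster.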
The test case used for the benchmark simulations is a 1-day run with January 2012 initial conditions. The reference simulation can be created using the following commands:
./create_newcase --compset UFS_S2S --res C384_t025 --case ufs.s2s.c384_t025.jan --driver nuopc --run-unsupported
cd ufs.s2s.c384_t025.jan/
./case.setup
./xmlchange DOUT_S=FALSE
./xmlchange STOP_N=1
./xmlchange RUN_REFDATE=2012-01-01
./xmlchange RUN_STARTDATE=2012-01-01
./xmlchange JOB_WALLCLOCK_TIME=00:30:00
qcmd -- ./case.build
# Edit user_nl_cice and add the following line to use the correct ice initial condition.
ice_ic = "$ENV{UGCSINPUTPATH}/cice5_model.res_2012010100.nc"
./case.submit
For example, to create a test with a different PE layout configuration (the A100_WR50 run in the benchmark tables):
# The following commands double the number of forecast PETs used by the ATM component (8,12 layout) and leave room for 72 write tasks
./xmlchange NTASKS_CPL=648
./xmlchange NTASKS_ATM=648
./xmlchange NTASKS_OCN=360
./xmlchange NTASKS_ICE=360
./xmlchange ROOTPE_CPL=0
./xmlchange ROOTPE_ATM=0
./xmlchange ROOTPE_OCN=648
./xmlchange ROOTPE_ICE=1008
# Add the following line to user_nl_fv3gfs (in the case directory) to increase the number of I/O (write) tasks by 50%
write_tasks_per_group = 72
CIME (Common Infrastructure for Modeling the Earth) automatically calculates the FV3GFS layout and updates the corresponding namelist parameter in input.nml.
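The NTASKS and ROOTPE values for the other runs in the tables follow the same pattern as the example above. As a rough sketch, the helper below generates the corresponding `./xmlchange` commands as strings without running them. It assumes the arrangement used in the tables: ATM starts at PE 0, OCN and ICE follow on consecutive PE ranges, and CPL shares the ATM PEs. The function name `pe_layout_commands` is hypothetical and not part of CIME.

```python
def pe_layout_commands(ntasks_atm, ntasks_ocn, ntasks_ice):
    """Build xmlchange commands for ATM, OCN, ICE on consecutive PE ranges.

    Assumes CPL runs on the same PEs as ATM, as in the A100_WR50 example.
    """
    rootpe_atm = 0
    rootpe_ocn = rootpe_atm + ntasks_atm
    rootpe_ice = rootpe_ocn + ntasks_ocn
    settings = [
        ("NTASKS_CPL", ntasks_atm),
        ("NTASKS_ATM", ntasks_atm),
        ("NTASKS_OCN", ntasks_ocn),
        ("NTASKS_ICE", ntasks_ice),
        ("ROOTPE_CPL", rootpe_atm),
        ("ROOTPE_ATM", rootpe_atm),
        ("ROOTPE_OCN", rootpe_ocn),
        ("ROOTPE_ICE", rootpe_ice),
    ]
    return [f"./xmlchange {name}={value}" for name, value in settings]


# A100_O50 run: 624 ATM, 540 OCN and 360 ICE tasks
for command in pe_layout_commands(624, 540, 360):
    print(command)
```

The write-task count is not covered here; it still has to be set through `write_tasks_per_group` in user_nl_fv3gfs when it differs from the default used in the REF run.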