Failing tests for izumi_nag with mpi-serial in cesm3_0_alpha02a #2654

Open
ekluzek opened this issue Jul 17, 2024 · 3 comments
Labels:
bug - something is working incorrectly
done - Issues whose closing PR is done but not yet merged (pending test re-run ok)
investigation - Needs to be verified and more investigation into what's going on.
priority: low - Background task that doesn't need to be done right away.

Comments

@ekluzek
Collaborator

ekluzek commented Jul 17, 2024

Brief summary of bug

@fischer-ncar found the following two tests failing in cesm3_0_alpha02a (which uses ctsm5.2.009 with ccs_config_cesm1.0.0 and cime6.1.0):

ERS_D_Ld5_Mmpi-serial.1x1_vancouverCAN.I1PtClm50SpRs.izumi_nag.clm-CLM1PTStartDate
SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA
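For readers less familiar with CIME test names: each name is dot-separated as TESTTYPE[_MODIFIERS].GRID.COMPSET.MACHINE_COMPILER[.TESTMODS], and in both failing tests it is the `_Mmpi-serial` modifier that selects the mpi-serial MPI library. The Python sketch below splits a name on that assumed layout and decodes only the `D` (debug), `L` (run length) and `M` (MPI library) modifiers. `parse_test_name` and `CimeTestName` are hypothetical names, this is not CIME's own parser, and the run-length unit mapping is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed single-letter run-length units for the L modifier (e.g. Ld5 = 5 days).
_STOP_UNITS = {"d": "ndays", "m": "nmonths", "y": "nyears"}


@dataclass
class CimeTestName:
    testtype: str
    modifiers: List[str]
    grid: str
    compset: str
    machine: str
    compiler: str
    testmods: Optional[str] = None
    decoded: Dict[str, str] = field(default_factory=dict)


def parse_test_name(name: str) -> CimeTestName:
    """Split a test name laid out as
    TESTTYPE[_MODIFIERS].GRID.COMPSET.MACHINE_COMPILER[.TESTMODS]."""
    parts = name.split(".")
    if len(parts) not in (4, 5):
        raise ValueError(f"unexpected number of fields in {name!r}")
    head, grid, compset, mach_comp = parts[:4]
    testmods = parts[4] if len(parts) == 5 else None

    testtype, *modifiers = head.split("_")
    # Assumes the compiler name itself contains no underscore.
    machine, compiler = mach_comp.rsplit("_", 1)

    decoded: Dict[str, str] = {}
    for mod in modifiers:
        if mod == "D":
            decoded["DEBUG"] = "TRUE"
        elif mod.startswith("M") and len(mod) > 1:
            decoded["MPILIB"] = mod[1:]
        elif mod.startswith("L") and len(mod) > 2 and mod[2:].isdigit():
            decoded["STOP_OPTION"] = _STOP_UNITS.get(mod[1], mod[1])
            decoded["STOP_N"] = mod[2:]
        # Any other modifier is kept in `modifiers` but left undecoded.

    return CimeTestName(testtype, list(modifiers), grid, compset,
                        machine, compiler, testmods, decoded)


if __name__ == "__main__":
    for test in (
        "ERS_D_Ld5_Mmpi-serial.1x1_vancouverCAN.I1PtClm50SpRs.izumi_nag.clm-CLM1PTStartDate",
        "SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA",
    ):
        t = parse_test_name(test)
        print(t.testtype, t.machine, t.compiler, t.decoded)
```

For both tests this reports MPILIB=mpi-serial and DEBUG=TRUE, which lines up with MPILIB="mpi-serial" and DEBUG="TRUE" on the gmake command in the build log.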

General bug information

CTSM version you are using: ctsm5.2.009

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: mpi-serial with nag on Izumi

Details of bug

cesm3_0_alpha02a uses ctsm5.2.009 with:
ccs_config_cesm1.0.0
MPIserial_2.5.0
cime6.1.0

Note that these two tests pass in the ctsm5.2.009 baselines and in the CTSM tests run since then.

I also reproduced the failure Chris saw in cesm3_0_alpha02a.

Important output or errors that show the problem

mpi-serial.bld.log:

gmake --output-sync -f /scratch/cluster/erik/SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA.20240717_091501_ir4w0h/Tools/Makefile  -C /scratch/cluster/erik/sharedlibroot.20240717_091501_ir4w0h/nag/mpi-serial/debug/nothreads/mpi-serial  CIME_MODEL=cesm  SMP=FALSE CASEROOT="/scratch/cluster/erik/SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA.20240717_091501_ir4w0h" CASETOOLS="/scratch/cluster/erik/SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA.20240717_091501_ir4w0h/Tools" CIMEROOT="/scratch/cluster/erik/cesm3_0_alpha02a/cime" SRCROOT="/scratch/cluster/erik/cesm3_0_alpha02a" COMP_INTERFACE="nuopc" COMPILER="nag" DEBUG="TRUE" EXEROOT="/scratch/cluster/erik/SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA.20240717_091501_ir4w0h/bld" RUNDIR="/scratch/cluster/erik/SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA.20240717_091501_ir4w0h/run" INCROOT="/scratch/cluster/erik/SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA.20240717_091501_ir4w0h/bld/lib/include" LIBROOT="/scratch/cluster/erik/SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA.20240717_091501_ir4w0h/bld/lib" MACH="izumi" MPILIB="mpi-serial" NINST_VALUE="c1a1l1" OS="LINUX" PIO_VERSION=2 SHAREDLIBROOT="/scratch/cluster/erik/sharedlibroot.20240717_091501_ir4w0h" BUILD_THREADED="FALSE" USE_ESMF_LIB="TRUE" USE_MOAB="FALSE" COMP_ATM="datm" COMP_ICE="sice" COMP_GLC="sglc" COMP_LND="clm" COMP_OCN="socn" COMP_ROF="srof" COMP_WAV="swav" USE_TRILINOS="FALSE" USE_ALBANY="FALSE" USE_PETSC="FALSE"  COMP_NAME=mpi-serial /scratch/cluster/erik/sharedlibroot.20240717_091501_ir4w0h/nag/mpi-serial/debug/nothreads/mpi-serial/Makefile.conf
gmake: Entering directory '/scratch/cluster/erik/sharedlibroot.20240717_091501_ir4w0h/nag/mpi-serial/debug/nothreads/mpi-serial'
SHAREDLIBROOT |/scratch/cluster/erik/sharedlibroot.20240717_091501_ir4w0h| SHAREDPATH |nag/mpi-serial/debug/nothreads|
/scratch/cluster/erik/SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA.20240717_091501_ir4w0h/Tools/Makefile:635: recipe for target '/scratch/cluster/erik/sharedlibroot.20240717_091501_ir4w0h/nag/mpi-serial/debug/nothreads/mpi-serial/Makefile.conf' failed
gmake: Leaving directory '/scratch/cluster/erik/sharedlibroot.20240717_091501_ir4w0h/nag/mpi-serial/debug/nothreads/mpi-serial'
/bin/sh: line 1: /scratch/cluster/erik/cesm3_0_alpha02a/libraries/mpi-serial/configure: No such file or directory
gmake: *** [/scratch/cluster/erik/sharedlibroot.20240717_091501_ir4w0h/nag/mpi-serial/debug/nothreads/mpi-serial/Makefile.conf] Error 127
ERROR: /bin/sh: line 1: /scratch/cluster/erik/cesm3_0_alpha02a/libraries/mpi-serial/configure: No such file or directory
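The shell exit status 127 together with "No such file or directory" means the build tried to run `libraries/mpi-serial/configure` under SRCROOT and that script did not exist. The Python sketch below is a quick pre-build check for that condition. It assumes the mpi-serial sources live at <SRCROOT>/libraries/mpi-serial, as in the log above; `check_mpi_serial_configure` is a hypothetical helper, and the hint about generating `configure` from an autoconf input is an assumption, not the actual fix.

```python
import os
from pathlib import Path
from typing import List


def check_mpi_serial_configure(srcroot: str) -> List[str]:
    """Return problems that would make the mpi-serial build fail the way
    the log above does (sh exit 127: configure not found)."""
    libdir = Path(srcroot) / "libraries" / "mpi-serial"
    problems: List[str] = []
    if not libdir.is_dir():
        problems.append(f"missing directory: {libdir}")
        return problems

    configure = libdir / "configure"
    if not configure.is_file():
        problems.append(f"missing configure script: {configure}")
        # Assumption: an autoconf input means configure has to be generated first.
        for name in ("configure.ac", "configure.in"):
            if (libdir / name).is_file():
                problems.append(f"found {name}; configure may need to be generated with autoconf")
    elif not os.access(configure, os.X_OK):
        problems.append(f"configure script is not executable: {configure}")
    return problems


if __name__ == "__main__":
    import sys
    # e.g. /scratch/cluster/erik/cesm3_0_alpha02a from the log above
    srcroot = sys.argv[1] if len(sys.argv) > 1 else "."
    issues = check_mpi_serial_configure(srcroot)
    print("\n".join(issues) if issues else "mpi-serial configure script looks present")
```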
ekluzek added the investigation and bug labels Jul 17, 2024
ekluzek added this to the cesm3_0_beta02 milestone Jul 17, 2024
@ekluzek
Collaborator Author

ekluzek commented Jul 17, 2024

#2545 has the mpi-serial izumi_nag tests passing with the same ccs_config version and cime6.0.246.

However, it uses the mpi-serial bundled under MCT (MCT_2.11.0), and the contents differ between the two. That said, the MPIserial_2.5.0 tag is itself reasonably old, dating from last December.
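To see concretely how the mpi-serial bundled with MCT_2.11.0 differs from the MPIserial_2.5.0 tag (for example, whether a `configure` script exists in one tree but not the other), the two checkouts can be compared directly. The Python sketch below uses the standard library's `filecmp.dircmp`; the two paths are whatever local checkouts you have, and `summarize_tree_diff` is a hypothetical helper. Note that `dircmp` compares files shallowly by default (size and modification time first), and it skips `.git` directories.

```python
import filecmp
from typing import Dict, List


def summarize_tree_diff(left: str, right: str) -> Dict[str, List[str]]:
    """Recursively compare two source trees; report relative paths that exist
    only on one side or whose contents differ."""
    result: Dict[str, List[str]] = {"only_left": [], "only_right": [], "differ": []}

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        result["only_left"] += [prefix + n for n in cmp.left_only]
        result["only_right"] += [prefix + n for n in cmp.right_only]
        result["differ"] += [prefix + n for n in cmp.diff_files]
        # Descend into directories present in both trees.
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix + name + "/")

    walk(filecmp.dircmp(left, right), "")
    for key in result:
        result[key].sort()
    return result


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python compare.py <MCT mpi-serial dir> <MPIserial_2.5.0 checkout>
    diff = summarize_tree_diff(sys.argv[1], sys.argv[2])
    for key, paths in diff.items():
        print(f"{key}: {paths}")
```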

@ekluzek
Collaborator Author

ekluzek commented Nov 5, 2024

I had forgotten about this but ran into it again in #2853.

@jedwards4b figured out the fix in mpi-serial here:

ESMCI/mpi-serial#30

@ekluzek
Collaborator Author

ekluzek commented Nov 8, 2024

Jim found that another fix to ccs_config is required:

diff --git a/machines/izumi/izumi.cmake b/machines/izumi/izumi.cmake
index 8894b6a..386fd48 100644
--- a/machines/izumi/izumi.cmake
+++ b/machines/izumi/izumi.cmake
@@ -6,4 +6,4 @@ if (MPILIB STREQUAL mvapich2)
   set(MPI_LIB_NAME "mpich")
 endif()
 set(NETCDF_PATH "$ENV{NETCDF_PATH}")
-string(APPEND SLIBS " -L${NETCDF_PATH}/lib -lnetcdff -lnetcdf")
+string(APPEND SLIBS " -L${NETCDF_PATH}/lib -lnetcdff -lnetcdf -Wl,-Wl,,-rpath,$(NETCDF_PATH)/lib")
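The diff appends an `-rpath` linker option pointing at the NetCDF lib directory (apparently wrapped as `-Wl,-Wl,,` so it is forwarded through the NAG compiler driver), presumably so the executable can locate libnetcdff/libnetcdf at run time without relying on LD_LIBRARY_PATH. Once a build succeeds, one way to confirm the option took effect is to look for that directory in the executable's dynamic section. The Python sketch below assumes GNU `readelf` is on PATH and that the executable is `cesm.exe` under EXEROOT; `rpath_entries` and `has_rpath` are hypothetical helpers. It accepts both RPATH and RUNPATH entries, since which one the linker writes depends on its defaults.

```python
import re
import shutil
import subprocess
from typing import List

# Matches GNU readelf -d lines such as "(RUNPATH)  Library runpath: [/a:/b]".
_RPATH_RE = re.compile(r"\((?:RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]")


def rpath_entries(readelf_dynamic_output: str) -> List[str]:
    """Extract the RPATH/RUNPATH directories from `readelf -d` output."""
    dirs: List[str] = []
    for line in readelf_dynamic_output.splitlines():
        m = _RPATH_RE.search(line)
        if m:
            dirs.extend(p for p in m.group(1).split(":") if p)
    return dirs


def has_rpath(executable: str, libdir: str) -> bool:
    """Run GNU readelf on the executable and check whether libdir is embedded."""
    if shutil.which("readelf") is None:
        raise RuntimeError("readelf not found on PATH")
    out = subprocess.run(["readelf", "-d", executable],
                         capture_output=True, text=True, check=True).stdout
    wanted = libdir.rstrip("/")
    return any(d.rstrip("/") == wanted for d in rpath_entries(out))


if __name__ == "__main__":
    import os
    import sys
    # Hypothetical usage: python check_rpath.py <EXEROOT>/cesm.exe
    print(has_rpath(sys.argv[1], os.environ["NETCDF_PATH"] + "/lib"))
```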

I'm having trouble testing it, as there are some issues with the autoconf version on Izumi.

ekluzek added the done label Nov 26, 2024
Projects
Status: Done
Development

No branches or pull requests

1 participant