Update submodules to ones that are based on cesm3_0_alpha04a #2853

Merged
merged 33 commits into from
Nov 21, 2024

ekluzek
Collaborator

@ekluzek ekluzek commented Oct 30, 2024

Description of changes

Update the submodules to something close to cesm3_0_alpha04a. I needed to update cime and ccs_config beyond that tag to get the PF Unit testing working.

Specific notes

Remove mct from submodules
Add mpi-serial to submodules
Update the PF unit testing to use the full ESMF library (which will enable wider testing). This also required bringing in the NetCDF and PIO libraries, which we shouldn't actively use but which may allow us to write fewer stub modules for I/O.

Contributors other than yourself, if any: @jedwards4b

CTSM Issues Fixed (include github issue #):
Fixes #2640
Fixes #2375
Fixes #2654
Fixes #2871
Finishes resolving #2294

Are answers expected to change (and if so in what way)?
I'm actually not sure yet. I think answers might possibly change for compsets with active CISM.

Any User Interface Changes (namelist or namelist defaults changes)? No

Does this create a need to change or add documentation? Did you do so? No No

Testing performed, if any: will do regular and ctsm_sci
So far I've done PF unit testing and tested two simple cases.
I haven't tested this for LILAC, and I wonder if it will fail.

@ekluzek ekluzek added the enhancement, priority: high, and code health labels Oct 30, 2024
@ekluzek ekluzek added this to the cesm3_0_beta05 milestone Oct 30, 2024
@ekluzek ekluzek self-assigned this Oct 30, 2024
@ekluzek
Collaborator Author

ekluzek commented Oct 30, 2024

@billsacks and @jedwards4b could you review this for the cmake changes I made for the PF unit testing? I learned more about cmake as a result of getting this to work, but I'd like to have it reviewed by the two of you with more knowledge/skill in using cmake.

Contributor

@jedwards4b jedwards4b left a comment

I would remove the lines in CMakeLists.txt that you have commented out, but otherwise LGTM.

Member

@billsacks billsacks left a comment

Thanks for getting this working @ekluzek! A couple of questions here....

@ekluzek
Collaborator Author

ekluzek commented Oct 30, 2024

OK, my suspicion about LILAC was correct. Running the LILAC test, I get a failure:

    Case dir: /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/LILACSMOKE_D_Ld2.f10_f10_mg37.I2000Ctsm50NwpSpAsRs.derecho_intel.clm-lilac.20241030_152038_a3kydm
    Errors were:
        Building test for LILACSMOKE in directory /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/LILACSMOKE_D_Ld2.f10_f10_mg37.I2000Ctsm50NwpSpAsRs.derecho_intel.clm-lilac.20241030_152038_a3kydm
        Traceback (most recent call last):
          File "/glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/LILACSMOKE_D_Ld2.f10_f10_mg37.I2000Ctsm50NwpSpAsRs.derecho_intel.clm-lilac.20241030_152038_a3kydm/./case.build", line 267, in <module>
            _main_func(__doc__)
          File "/glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/LILACSMOKE_D_Ld2.f10_f10_mg37.I2000Ctsm50NwpSpAsRs.derecho_intel.clm-lilac.20241030_152038_a3kydm/./case.build", line 226, in _main_func
            test = find_system_test(testname, case)(case)
          File "/glade/work/erik/ctsm_worktrees/external_updates/cime/CIME/utils.py", line 2272, in find_system_test
            mod = import_module(path)
          File "/glade/u/apps/derecho/23.09/opt/._view/yazo4iwystz7p2hxu5ukzrw3xa24ksen/lib/python3.10/importlib/__init__.py", line 126, in import_module
            return _bootstrap._gcd_import(name[level:], package, level)
          File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
          File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
          File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
          File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
          File "<frozen importlib._bootstrap_external>", line 883, in exec_module
          File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
          File "/glade/work/erik/ctsm_worktrees/external_updates/cime_config/SystemTests/lilacsmoke.py", line 28, in <module>
            from CIME.utils import run_cmd, run_cmd_no_fail, symlink_force, new_lid, safe_copy, append_testlog
        ImportError: cannot import name 'append_testlog' from 'CIME.utils' (/glade/work/erik/ctsm_worktrees/external_updates/cime/CIME/utils.py)

Waiting for tests to finish
FAIL LILACSMOKE_D_Ld2.f10_f10_mg37.I2000Ctsm50NwpSpAsRs.derecho_intel.clm-lilac (phase SHAREDLIB_BUILD)
    Case dir: /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/LILACSMOKE_D_Ld2.f10_f10_mg37.I2000Ctsm50NwpSpAsRs.derecho_intel.clm-lilac.20241030_152038_a3kydm
Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
test-scheduler took 7.439677476882935 seconds
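
The traceback above comes down to a name that the checked-out cime no longer exports from `CIME.utils`. One generic way to keep a test script working across such a move is to try several candidate modules in turn. The sketch below is a minimal illustration using only the standard library. The helper name `import_first_available` is hypothetical, and the fix actually adopted in this PR is not shown in this thread.

```python
import importlib


def import_first_available(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    tried = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            tried.append(f"{path} (module not importable)")
            continue
        if hasattr(module, name):
            return getattr(module, name)
        tried.append(f"{path} (no attribute {name!r})")
    raise ImportError(f"cannot import {name!r}; tried: " + ", ".join(tried))


# Stand-in demonstration with standard-library modules:
# `sqrt` is not in `os.path`, so the helper falls through to `math`.
sqrt = import_first_available("sqrt", ["os.path", "math"])
print(sqrt(16.0))
```

For the failure here, the candidate list would start with `CIME.utils`. Where `append_testlog` lives in the updated cime is something to confirm against that checkout.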

@ekluzek
Collaborator Author

ekluzek commented Oct 31, 2024

I've fixed the LILAC problem with a combination of a minor update here and something I need to add to cime:

ESMCI/cime#4703

Now, I'm wondering what will happen with run_neon?

Member

@billsacks billsacks left a comment

Thanks for your explanations to my questions, @ekluzek ! I'm satisfied with this now.

@jedwards4b
Contributor

Hi Erik,

I found that the configure script was not included in the mpi-serial distribution, so I have added it in ESMCI/mpi-serial#30. I think that this will allow you to update to the latest tags.

@ekluzek
Collaborator Author

ekluzek commented Nov 5, 2024

Excellent, thanks @jedwards4b!

Yeah, that is one of the things I saw. I thought it might be generated as part of the build process, but obviously not. Thanks for figuring that out. I'll try with that branch and make sure it works.

@ekluzek
Collaborator Author

ekluzek commented Nov 5, 2024

With the new branch, I'm rerunning all of the tests that failed. A few of them have already built and now pass with the mpi-serial update, so it's looking good so far.

@ekluzek
Collaborator Author

ekluzek commented Nov 6, 2024

The mpi-serial update gets more mpi-serial tests to pass. The following 17 now pass:

ERS_D_Ld7_Mmpi-serial.1x1_smallvilleIA.IHistClm50BgcCropRs.izumi_intel.clm-decStart1851_noinitial
ERS_Lm20_Mmpi-serial.1x1_smallvilleIA.I1850Clm50BgcCrop.izumi_gnu.clm-cropMonthlyNoinitial
ERS_Lm40_Mmpi-serial.1x1_numaIA.I2000Clm50BgcCropQianRs.izumi_gnu.clm-cropMonthlyNoinitial
ERS_Lm54_Mmpi-serial.1x1_numaIA.I2000Clm50BgcCropQianRs.izumi_intel.clm-cropMonthlyNoinitial
ERS_Ly20_Mmpi-serial.1x1_numaIA.I2000Clm50BgcCropQianRs.izumi_intel.clm-cropMonthlyNoinitial
ERS_Ly20_Mmpi-serial.1x1_numaIA.I2000Clm50BgcCropQianRs.izumi_intel.clm-cropMonthlyNoinitial--clm-matrixcnOn
ERS_Ly3_Mmpi-serial.1x1_smallvilleIA.IHistClm50BgcCropQianRs.izumi_gnu.clm-cropMonthOutput
ERS_Ly5_Mmpi-serial.1x1_smallvilleIA.I1850Clm50BgcCrop.izumi_gnu.clm-ciso_monthly
ERS_Ly5_Mmpi-serial.1x1_smallvilleIA.I1850Clm50BgcCrop.izumi_gnu.clm-ciso_monthly--clm-matrixcnOn
ERS_Ly6_Mmpi-serial.1x1_smallvilleIA.IHistClm50BgcCropQianRs.izumi_intel.clm-cropMonthOutput
ERS_Ly6_Mmpi-serial.1x1_smallvilleIA.IHistClm50BgcCropQianRs.izumi_intel.clm-cropMonthOutput--clm-matrixcnOn_ignore_warnings
SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_gnu.clm-ptsRLA
SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_gnu.clm-ptsROA
SMS_D_Ly6_Mmpi-serial.1x1_smallvilleIA.IHistClm45BgcCropQianRs.izumi_intel.clm-cropMonthOutput
SMS_Ld5_Mmpi-serial.1x1_brazil.IHistClm60Bgc.izumi_gnu.clm-mimics
SMS_Ly3_Mmpi-serial.1x1_numaIA.I2000Clm50BgcDvCropQianRs.izumi_gnu.clm-ignor_warn_cropMonthOutputColdStart
SMS_Ly5_Mmpi-serial.1x1_brazil.IHistClm50BgcQianRs.izumi_intel.clm-newton_krylov_spinup

While the following 8 are failing:

ERS_D_Ld5_Mmpi-serial.1x1_vancouverCAN.I1PtClm50SpRs.izumi_nag.clm-CLM1PTStartDate (NLCOMP RUN)
ERS_D_Mmpi-serial_Ld5.1x1_brazil.I2000Clm50FatesRs.izumi_nag.clm-FatesCold (NLCOMP RUN)
ERS_Lm13.f10_f10_mg37.I1850Clm60Bgc.izumi_intel.clm-monthly_matrixcn_fast_spinup (NLCOMP RUN)
SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA (NLCOMP RUN)
SMS_D_Mmpi-serial_Ld5.5x5_amazon.I2000Clm60FatesRs.izumi_nag.clm-FatesCold (NLCOMP RUN)
SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60SpRs.izumi_nag.clm-default--clm-NEON-TOOL (NLCOMP RUN)
SMS_Ly5_Mmpi-serial.1x1_smallvilleIA.IHistClm60BgcCropQianRs.izumi_gnu.clm-gregorian_cropMonthOutput (NLCOMP MODEL_BUILD)
SSPMATRIXCN_Ly5_Mmpi-serial.1x1_numaIA.I2000Clm50BgcCropQianRs.izumi_intel.clm-ciso_monthly (SHAREDLIB_BUILD NLCOMP SUBMIT)

There are both DEBUG and production tests in both lists, as well as both the intel and gnu compilers, and both combinations of compiler with DEBUG or production for those two. The nag compiler fails consistently, though, and we only run nag with DEBUG on.
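
A breakdown like the one described above can be tallied mechanically from the test names, which follow the CIME pattern of dot-separated fields (test type and modifiers, grid, compset, machine_compiler, testmods). The sketch below is a minimal illustration, not a tool used in this PR. The function names `parse_test_name` and `tally` are hypothetical, and it assumes every name carries the machine_compiler field, as all the names in these lists do.

```python
from collections import Counter


def parse_test_name(test_name):
    """Split a CIME-style test name into its dot-separated fields."""
    # Drop a trailing phase list such as " (NLCOMP RUN)" if present.
    fields = test_name.split()[0].split(".")
    test_type, *modifiers = fields[0].split("_")
    machine, _, compiler = fields[3].rpartition("_")
    return {
        "test_type": test_type,
        "debug": "D" in modifiers,  # the _D modifier turns DEBUG on
        "grid": fields[1],
        "compset": fields[2],
        "machine": machine,
        "compiler": compiler,
        "testmods": fields[4] if len(fields) > 4 else None,
    }


def tally(test_names):
    """Count tests per (compiler, DEBUG or production) combination."""
    counts = Counter()
    for name in test_names:
        info = parse_test_name(name)
        mode = "DEBUG" if info["debug"] else "production"
        counts[(info["compiler"], mode)] += 1
    return counts


sample = [
    "ERS_D_Ld7_Mmpi-serial.1x1_smallvilleIA.IHistClm50BgcCropRs.izumi_intel.clm-decStart1851_noinitial",
    "SMS_Ld5_Mmpi-serial.1x1_brazil.IHistClm60Bgc.izumi_gnu.clm-mimics",
    "SMS_D_Ld1_Mmpi-serial.f45_f45_mg37.I2000Clm50SpRs.izumi_nag.clm-ptsRLA (NLCOMP RUN)",
]
for combo, count in sorted(tally(sample).items()):
    print(combo, count)
```

Feeding each full list through `tally` gives the per-compiler, per-mode counts for the passing and failing sets.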

The failures have to do with using the shared NetCDF library, like so:

/scratch/cluster/erik/tests_ctsm539erikb4bacl/SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60SpRs.izumi_nag.clm-default--clm-NEON-TOOL.GC.ctsm539erikb4bacl_nag/bld/cesm.exe: error while loading shared libraries: libnetcdf.so.13: cannot open shared object file: No such file or directory
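
A loader error like this one can be narrowed down by checking which shared libraries the executable fails to resolve. The sketch below is a minimal illustration that scans `ldd`-style output for entries marked "not found". The function name `missing_shared_libraries` is hypothetical, and the sample text is illustrative, not captured from the failing build.

```python
def missing_shared_libraries(ldd_output):
    """Return the library names that `ldd` reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


# Illustrative `ldd` output, not captured from the failing build.
example = """\
\tlinux-vdso.so.1 (0x00007ffc00000000)
\tlibnetcdf.so.13 => not found
\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)
"""
print(missing_shared_libraries(example))
```

In practice the input would be the output of `ldd` run on `cesm.exe` on the node where the test executes, since the library search path can differ between the build and run environments.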

@ekluzek
Collaborator Author

ekluzek commented Nov 8, 2024

Jim provided an update to mpi-serial that fixes some of this. We also found that Izumi has some nodes with different versions of autoconf so I've opened a ticket to straighten that out. There's also a change to ccs_config that needs to come in as well.

I put in 36 year tests rather than 3 years.
One ERP test I made for 762 days rather than 3 years to shorten it.
762 allows even months for one year being a leap year.
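
One reading of the 762-day figure is that it spans two years, one of them a leap year, plus one 31-day month, so the run ends on a month boundary. The sketch below checks that arithmetic with the standard library. The start date is an assumption chosen for illustration, and the calendar the test actually uses is not stated here.

```python
from datetime import date, timedelta

# Assumed start date for illustration; 2000 is a leap year.
start = date(2000, 1, 1)
end = start + timedelta(days=762)

# 366 (leap year) + 365 (regular year) + 31 (January) = 762
print(end)  # 2002-02-01
```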
@ekluzek
Collaborator Author

ekluzek commented Nov 20, 2024

OK, good news. The aux_clm testing passes as expected on both Izumi and Derecho, and so does the ctsm_sci testing on Derecho.

…th days so that they won't be on an exact year/month boundary
@ekluzek ekluzek merged commit b8a59c3 into ESCOMP:cesm3_0_beta04_changes Nov 21, 2024
2 checks passed
@ekluzek ekluzek deleted the cesm30b04submodules branch November 21, 2024 00:28