Zoltan ERROR #1387
-
My latest attempt at running Prototype-P8 failed with the following:
[349] Zoltan ERROR in Zoltan_RB_Box_Assign (line 97 of /work2/02441/bcash/frontera/spack-stack/cache/build_stage/spack-stage-esmf-8.3.0b09-ggdjsv3qppwk5n2bedpyvhrznl2evmfm/spack-src/src/Infrastructure/Mesh/src/Zoltan/box_assign.c): No RCB tree saved; Must set parameter KEEP_CUTS to 1.
@rsdunlapiv @climbfuji - Since this is in the spack-stack ESMF module, I was hoping one of you might have some insight into what happened here and where this parameter KEEP_CUTS might be set.
Replies: 6 comments 9 replies
-
@benjamin-cash Can you please send the ESMF PET log files and/or stack trace for the error? Also, please send the hash of the ufs-weather-model.
-
Yes, that is the correct way.
On 8/26/22 11:18, benjamin-cash wrote:
Is it just setting logKindFlag: ESMF_LOGKIND_MULTI in nems.configure?
-
Well, it isn't exactly the same error, but it did fail with a message about Zoltan again. Oddly enough, there is nothing in the PET logs that I have been able to find so far.
Zoltan_Realloc (from /work2/02441/bcash/frontera/spack-stack/cache/build_stage/spack-stage-esmf-8.3.0b09-ggdjsv3qppwk5n2bedpyvhrznl2evmfm/spack-src/src/Infrastructure/Mesh/src/Zoltan/shared.c,87) No space on proc 406 - number of bytes requested = 589832
I am running the Prototype-P8 tag of the weather model.
-
I wouldn’t expect it to be caught and written to the logs. This part was written before the mesh was integrated into the ESMF error logging system, and it looks like the person who wrote it didn’t even connect the Zoltan error returns into the old mesh error handling (maybe because Zoltan does its own error output?). We should definitely check the return codes and connect them to ESMF error logging.
On Aug 26, 2022, at 2:03 PM, Rocky Dunlap wrote:
The Zoltan lib inside ESMF has its own internal error handling, and that error handler writes to stdout and then returns an error code. It might be that the error code is not caught within ESMF and propagated to the ESMF PET logs. @oehmke <https://github.com/oehmke> should be able to say whether he expects the error to be caught at the ESMF level and written to the log. Ideally it would be.
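Since these messages go to stdout rather than the PET logs, one way to collect them is to scan the job's captured stdout. The following is a minimal sketch, assuming the message layout matches the lines quoted in this thread; the names `ZoltanError`, `parse_zoltan_error`, and `find_zoltan_errors` are hypothetical helpers, not part of ESMF or Zoltan.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class ZoltanError(NamedTuple):
    """One parsed Zoltan error line (hypothetical helper type)."""
    rank: int
    function: str
    line: int
    source_file: str
    message: str


# Pattern modeled on the messages quoted in this thread, for example:
# "[406] Zoltan ERROR in Zoltan_LB (line 388 of <path>/lb_balance.c): <message>"
# This layout is an assumption based on those samples only.
_ZOLTAN_ERROR = re.compile(
    r"\[(?P<rank>\d+)\] Zoltan ERROR in (?P<function>\w+) "
    r"\(line (?P<line>\d+) of (?P<source_file>[^)]+)\): (?P<message>.*)"
)


def parse_zoltan_error(text: str) -> Optional[ZoltanError]:
    """Return the parsed error, or None if the line is not a Zoltan error."""
    match = _ZOLTAN_ERROR.search(text)
    if match is None:
        return None
    return ZoltanError(
        rank=int(match.group("rank")),
        function=match.group("function"),
        line=int(match.group("line")),
        source_file=match.group("source_file"),
        message=match.group("message").strip(),
    )


def find_zoltan_errors(lines: Iterable[str]) -> List[ZoltanError]:
    """Collect every Zoltan error found in an iterable of stdout lines."""
    errors = []
    for text in lines:
        parsed = parse_zoltan_error(text)
        if parsed is not None:
            errors.append(parsed)
    return errors
```

To use it, pass an open file handle for the job's stdout capture to `find_zoltan_errors` and group the results by `rank` to see which processes reported errors.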
-
The error vanished upon switching to the Intel compiler, so I will not pursue this further and am marking it answered.
-
Is this a particularly large grid or mesh that you’re regridding? We usually don’t see errors like this from Zoltan.
- Bob
On Aug 26, 2022, at 12:57 PM, benjamin-cash wrote:
Well, it isn't exactly the same error, but it did fail with a message about Zoltan again. Oddly enough, there is nothing in the PET logs that I have been able to find so far.
Zoltan_Realloc (from /work2/02441/bcash/frontera/spack-stack/cache/build_stage/spack-stage-esmf-8.3.0b09-ggdjsv3qppwk5n2bedpyvhrznl2evmfm/spack-src/src/Infrastructure/Mesh/src/Zoltan/shared.c,87) No space on proc 406 - number of bytes requested = 589832
[406] Zoltan ERROR in Zoltan_RB_Build_Structure (line 92 of /work2/02441/bcash/frontera/spack-stack/cache/build_stage/spack-stage-esmf-8.3.0b09-ggdjsv3qppwk5n2bedpyvhrznl2evmfm/spack-src/src/Infrastructure/Mesh/src/Zoltan/shared.c): Insufficient memory.
[406] Zoltan ERROR in Zoltan_RCB_Build_Structure (line 91 of /work2/02441/bcash/frontera/spack-stack/cache/build_stage/spack-stage-esmf-8.3.0b09-ggdjsv3qppwk5n2bedpyvhrznl2evmfm/spack-src/src/Infrastructure/Mesh/src/Zoltan/rcb_util.c): Error returned from Zoltan_RB_Build_Structure.
[406] Zoltan ERROR in rcb_fn (line 440 of /work2/02441/bcash/frontera/spack-stack/cache/build_stage/spack-stage-esmf-8.3.0b09-ggdjsv3qppwk5n2bedpyvhrznl2evmfm/spack-src/src/Infrastructure/Mesh/src/Zoltan/rcb.c): Error returned from Zoltan_RCB_Build_Structure.
[406] Zoltan ERROR in Zoltan_LB (line 388 of /work2/02441/bcash/frontera/spack-stack/cache/build_stage/spack-stage-esmf-8.3.0b09-ggdjsv3qppwk5n2bedpyvhrznl2evmfm/spack-src/src/Infrastructure/Mesh/src/Zoltan/lb_balance.c): Partitioning routine returned code -2.
I am running the Prototype-P8 tag of the weather model.
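For reading the "returned code -2" message above, here is a small sketch that maps the numeric code to a symbolic name. The table reflects the return codes as I understand them from Zoltan's public header (`zoltan_types.h`); treat it as an assumption and check it against the Zoltan copy bundled with your ESMF build. `describe_return_code` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed mapping of Zoltan return codes to their symbolic names.
# Verify against zoltan_types.h in the Zoltan source you are building.
ZOLTAN_RETURN_CODES = {
    0: "ZOLTAN_OK",
    1: "ZOLTAN_WARN",
    -1: "ZOLTAN_FATAL",
    -2: "ZOLTAN_MEMERR",
}


def describe_return_code(message: str) -> Optional[str]:
    """Extract 'returned code N' from a message and name the code.

    Returns None when the message carries no return code.
    """
    match = re.search(r"returned code (-?\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return ZOLTAN_RETURN_CODES.get(code, f"unknown ({code})")
```

Under that mapping, -2 is a memory error, which would be consistent with the "Insufficient memory" and "No space on proc 406" messages earlier in the same trace.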