WeeklyTelcon_20230214
Geoffrey Paulsen edited this page Feb 28, 2023 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Attendance not captured.
- Geoffrey Paulsen (IBM)
- Jeff Squyres (Cisco)
- Austen Lauria (IBM)
- Brendan Cunningham (Cornelis Networks)
- Brian Barrett (Amazon)
- Edgar Gabriel (AMD)
- Josh Fisher (Cornelis Networks)
- Josh Hursey (IBM)
- Luke Robison (Amazon)
- Matthew Dosanjh (Sandia)
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- Tomislav Janjusic (nVidia)
- William Zhang (AWS)
- Howard Pritchard (LANL)
- Joseph Schuchart (UTK)
- David Bernholdt
-
Mellanox CI is failing. This may be similar to a configure issue Edgar is seeing, where PRRTE tries to build as external but none is installed on the machine.
- Sometimes this happens if there's one in the prefix for OMPI.
- Edgar will debug a bit on his side, and Tommy will
-
New - 32bit (64bit came out 20 years ago)
- Debian noticed that Open MPI fails in configure on 32bit builds.
- This breaks a bunch of other packages that can't be built.
- But are there real users? Or just inertia?
- Looks like inertia, but for example the Boost library could just turn off MPI support for 32bit builds.
- They're sticking with Open MPI v4.1.x for immediate need.
- Let's check back in a week on an estimate for 32bit scoping.
- We do have 32bit testing that's turned off, so if we decide to test, it's easy to re-enable.
-
Issue #11347 Versioning is wrong in v5.0.x
- We agreed v4.0.x -> v4.1.x -> v5.0.x should be ABI compatible.
- Compile an MPI application with v4.0.x, then rm -rf the Open MPI installation, install v5.0.0 into the same location, and the application should just work.
- Did we figure out the Fortran ABI break?
- From memory: yes, we did break the Fortran ABI.
- We broke ABI in a very narrow case: when you compile Fortran with 8byte ints and C with 4byte ints.
- Two other things may or may not break ABI.
- We did some work with intents and asyncs, and went from named interfaces to unnamed.
- Unsure if this affects ABI.
- ABI mostly just cares about C and mpif.h.
- Fortran library has different .so versioning.
- Blocker for next v5.0.0rc - get library versioning correct.
- When we talk about ABI - Fortran will be nuanced.
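To make the library versioning discussion concrete, here is a minimal Python sketch of how libtool-style `current:revision:age` triples map to a Linux `.so` suffix, and when one build can be dropped in place of another. The class name, the helper `can_replace`, and every version number below are hypothetical illustrations, not Open MPI's actual values.

```python
from typing import NamedTuple


class LibtoolVersion(NamedTuple):
    """A libtool -version-info triple (current:revision:age)."""
    current: int
    revision: int
    age: int

    def soname_major(self) -> int:
        # On Linux, libtool derives the soname major as current - age.
        return self.current - self.age

    def linux_suffix(self) -> str:
        # Full file suffix on Linux: (current - age).age.revision
        return f"{self.soname_major()}.{self.age}.{self.revision}"

    def supports(self, interface: int) -> bool:
        # The library implements interface numbers current-age .. current.
        return self.soname_major() <= interface <= self.current


def can_replace(old: LibtoolVersion, new: LibtoolVersion) -> bool:
    """True if an app linked against `old` could load `new` from the same path."""
    return old.soname_major() == new.soname_major() and new.supports(old.current)


# Hypothetical triples, chosen only for illustration.
old_release = LibtoolVersion(current=60, revision=2, age=20)
compatible = LibtoolVersion(current=70, revision=0, age=30)
age_reset = LibtoolVersion(current=70, revision=0, age=0)

print(old_release.linux_suffix())            # 40.20.2
print(can_replace(old_release, compatible))  # True: same soname major
print(can_replace(old_release, age_reset))   # False: soname major changed
```

Under these assumptions, resetting `age` to 0 changes the soname major, so an application linked against the older library would fail to find the new one at load time. That is the kind of mistake a versioning check before the next rc would catch.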
-
Comm Size Issue. Issue #11373.
- Edgar created a PR to fix Comm Size to be same as v4.1.x to maintain backward compatibility for v5.0.0 from v4.1.x built apps.
-
Austen said he'd try to find time to run the
- Some GNU ABI checker tool might help.
- Need to pull in a PMIx v3.1.
- Fix CUDA issue caused by a bad cherry-pick from earlier.
- Reworking a PR, in progress.
- Made a minor change for another rc. Trying to get rc built.
-
Romio issue not
-
RC from last week, got pushed to this week.
- Still waiting on https://github.com/open-mpi/ompi/issues/11354
- Maybe enable the dso option?
- Accelerator initially picks CUDA and then disqualifies it, but at teardown it tries to tear down CUDA.
- The reason is that CUDA now uses delayed startup, so it will still be enabled.
- Possible fix: another variable tracking whether CUDA was initialized.
- Should also be on main (comment saying otherwise
- Howard said after the call that this isn't a blocker for rc10
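The teardown problem described above (a component is selected, disqualified, and still torn down because startup is delayed) can be modeled with a separate "was the runtime actually initialized" flag. The following is a toy Python sketch of that idea only; `LazyAccelerator` and all of its methods are hypothetical names and do not correspond to Open MPI's accelerator framework or to any CUDA API.

```python
class LazyAccelerator:
    """Toy model of a component with delayed (lazy) runtime startup."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.selected = False
        # Tracked separately from `selected`: did we ever start the runtime?
        self.runtime_initialized = False
        self.log: list[str] = []

    def select(self) -> None:
        self.selected = True

    def disqualify(self) -> None:
        self.selected = False

    def ensure_initialized(self) -> None:
        # Delayed startup: the runtime starts on first real use only.
        if not self.runtime_initialized:
            self.log.append("init")
            self.runtime_initialized = True

    def finalize(self) -> None:
        # Guard: never tear down a runtime that was never started.
        if not self.runtime_initialized:
            self.log.append("skip teardown")
            return
        self.log.append("teardown")
        self.runtime_initialized = False


# Case 1: picked, then disqualified, never used.
unused = LazyAccelerator("cuda")
unused.select()
unused.disqualify()
unused.finalize()
print(unused.log)  # ['skip teardown']

# Case 2: the application really uses the accelerator.
used = LazyAccelerator("cuda")
used.select()
used.ensure_initialized()
used.finalize()
print(used.log)  # ['init', 'teardown']
```

The design point is that selection state and initialization state are two separate facts, so teardown keys off the second one rather than the first.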
-
Howard has had an issue using external compilers with the accelerator
- Issue #11354
-
Cuda Framework #11354 - Howard is working on it.
- If you disable building SM-CUDA, the problem doesn't occur.
- With --enable-so, we don't see this.
- Don't see it if the app initializes CUDA before MPI_Init (maybe).
- It takes a number of factors to see this.
- If the application is actually using CUDA, then everything works.
- The problem is when the app doesn't use CUDA, but SM-CUDA initializes anyway (even though the application doesn't need CUDA).
- Calls into framework, to
- At Finalize it makes calls into the accelerator and gets CUDA runtime errors.
- We think we want SM-CUDA when running on a single node.
- Was it just the IPC or also something else? Believe it was IPC stuff.
- No IPC support to Accelerator framework. Just direct dependency on cuda.
- For collective CUDA: it never directly uses CUDA buffers; it just checks and then memcopies into host.
- All of this does use accelerator framework.
- These three components added a direct CUDA dependency because they call CUDA directly, instead of calling through framework.
- BTL-SM-CUDA
- Rcache-something-sm
- Rcache-gpu-sm
-
ROMIO isn't included in packaging properly.
- Issue #11364 Austen is taking a look. Might have missed something.
-
Waiting on PMIx and PRRTE submodule update.
- Ralph pestered us to please merge it.
- Just merged on main.
- Merged, will make rc10.
-
Need documentation for v5.0.0
-
Manpages need an audit before release.
- Double check --prefix behavior.
- Not the same behavior as v4.1.x
-
What is status of HAN?
- Priority bump of HAN PR #11362 to main, need one to v5.0.x
- Joseph pushed a bunch of data, but not on the call. Go read this.
- Joseph had some more experiments. HAN collective component with shared memory PR, we were pretty good compared to tuned and another
- Comparing HAN with shared Mem component.
- How many ppr? Between 2ppr and 64ppr
- Better numbers, would be good to document this.
- In OSU there's always a barrier before the operation. If Barrier and operation match up well, you get lower latency.
- We'd talked about supplying some docs about how HAN is great, and why we're enabling it for v5.0.0 by default.
- Like to include instructions on how to reproduce as well for users.
- document in ECP -
- Our current resolution is to enable it as is, and fix current regressions in future releases.
- What else is needed to enable it by default?
- Just need to flip a switch.
- The module that Joseph has for shared memory for HAN at the moment would need some work to add additional collectives.
- It also relies on xpmem being available.
- So for now just enable HAN for collectives we have, and later enable for other collectives.
- George would like to re-use what tuned does without reimplementing everything. A shared memory component is the better choice, but it is more work.
- If we don't enable HAN by default now, it's v5.1 (best case) before it's enabled.
- The trade offs lean toward turning it on and fixing whatever problems might be there.
- There is a PR for tuned that increases the default segment size and changes the algorithms tuned uses for shared memory.
- Need to start moving forward, rather than doing more analysis.