WeeklyTelcon_20200714

Open MPI Weekly Telecon ---

Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

Jeff Squyres (Cisco)
Artem Polyakov (nVidia/Mellanox)
Aurelien Bouteiller (UTK)
Austen Lauria (IBM)
Brian Barrett (AWS)
Brendan Cunningham (Intel)
Christoph Niethammer (HLRS)
Edgar Gabriel (UH)
Geoffrey Paulsen (IBM)
George Bosilca (UTK)
Howard Pritchard (LANL)
Joseph Schuchart
Josh Hursey (IBM)
Joshua Ladd (nVidia/Mellanox)
Matthew Dosanjh (Sandia)
Noah Evans (Sandia)
Ralph Castain (Intel)
Thomas Naughton III (ORNL)
Todd Kordenbrock (Sandia)
Tomislav Janjusic
William Zhang (AWS)

not there today (I keep this for easy cut-n-paste for future notes)

Akshay Venkatesh (NVIDIA)
Brandon Yates (Intel)
Charles Shereda (LLNL)
David Bernhold (ORNL)
Erik Zeiske
Geoffroy Vallee (ARM)
Harumi Kuno (HPE)
Mark Allen (IBM)
Matias Cabral (Intel)
Michael Heinz (Intel)
Nathan Hjelm (Google)
Scott Breyer (Sandia?)
Shintaro Iwasaki
William Zhang (AWS)
Xin Zhao (nVidia/Mellanox)
mohan (AWS)

New

Annual review of OMPI

Jeff will send it out.

MPI Forum was last week.

Sessions is now in.
Partitioned communication was voted in.

Thread local storage issue

OpalTSDCreate takes a thread-local storage key that would be tracked locally in OPAL.
- But when we go to delete it, it's not actually being deleted.
- But we want the flexibility to destroy it on our own or explicitly.
- George's view of the mode we have today: it tracks all keys so they can be released by the main thread.
- George thinks Artem's approach is the correct approach.
We would have to change the way that keys are USED, since different components use them in different ways.
Something similar should be done in other places.
If you do it just for UCX, then others can see how you did it and check their own code.
So we think the current PR is good, but it leaves both the old API and the new API in place.
- But it might be better to remove the OLD way and make broken components do SOMETHING to update their code.
- Should be easy for components to add explicit cleanup calls.
Master branch only.
A new Pull Request was opened yesterday that addresses the problem as discussed last week.
Tracking of TLS in common code.
- Have low-level thread-specific keys (very simple, based on the thread implementation).
- Tracked key: probably what you want to use if you need to ensure all TLS is accounted for and released when the key is destroyed.
- Tommy changed all of the places in OMPI where those keys are used to just use the tracked key instead of the regular key.
- Changed set_specific and get_specific to just set and get.
- Please review and give suggestions.
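A minimal C sketch of the tracked-key idea, assuming POSIX threads underneath. All names (`tracked_key_t`, `tracked_key_set`, `tracked_key_delete`, and so on) are hypothetical and are not the OPAL API from the PR. The sketch only shows how a thin pthread key can be wrapped so that every value stored by any thread is recorded. Values from threads that exit early are released by the normal pthread destructor, and whatever is left is released when the key itself is destroyed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; a sketch, not the OPAL API. */
struct tracked_key;
typedef struct tls_entry {
    void *value;                    /* this thread's value */
    struct tracked_key *owner;      /* key the entry belongs to */
    struct tls_entry *prev, *next;  /* links in the owner's list */
} tls_entry_t;

typedef struct tracked_key {
    pthread_key_t key;              /* low-level per-thread slot */
    pthread_mutex_t lock;           /* protects head */
    tls_entry_t *head;              /* every live entry, from any thread */
    void (*destructor)(void *);     /* releases one stored value */
} tracked_key_t;

/* Called by pthreads when a thread exits while holding an entry. */
static void on_thread_exit(void *p) {
    tls_entry_t *e = p;
    tracked_key_t *k = e->owner;
    pthread_mutex_lock(&k->lock);
    if (e->prev) e->prev->next = e->next; else k->head = e->next;
    if (e->next) e->next->prev = e->prev;
    pthread_mutex_unlock(&k->lock);
    if (k->destructor && e->value) k->destructor(e->value);
    free(e);
}

int tracked_key_create(tracked_key_t *k, void (*destructor)(void *)) {
    k->head = NULL;
    k->destructor = destructor;
    if (pthread_mutex_init(&k->lock, NULL) != 0) return -1;
    return pthread_key_create(&k->key, on_thread_exit) == 0 ? 0 : -1;
}

/* Overwrites this thread's value; the previous value is NOT released. */
int tracked_key_set(tracked_key_t *k, void *value) {
    tls_entry_t *e = pthread_getspecific(k->key);
    if (e == NULL) {                /* first set on this thread: register */
        e = calloc(1, sizeof(*e));
        if (e == NULL) return -1;
        if (pthread_setspecific(k->key, e) != 0) { free(e); return -1; }
        e->owner = k;
        pthread_mutex_lock(&k->lock);
        e->next = k->head;
        if (k->head) k->head->prev = e;
        k->head = e;
        pthread_mutex_unlock(&k->lock);
    }
    e->value = value;
    return 0;
}

void *tracked_key_get(tracked_key_t *k) {
    tls_entry_t *e = pthread_getspecific(k->key);
    return e ? e->value : NULL;
}

/* Releases values of threads that never exited (e.g. the main thread).
   Call only once no other thread is concurrently using the key. */
void tracked_key_delete(tracked_key_t *k) {
    pthread_key_delete(k->key);     /* POSIX runs no destructors here */
    for (tls_entry_t *e = k->head, *next; e != NULL; e = next) {
        next = e->next;
        if (k->destructor && e->value) k->destructor(e->value);
        free(e);
    }
    k->head = NULL;
    pthread_mutex_destroy(&k->lock);
}

/* ---- Usage sketch: one value from a worker thread, one from main ---- */
static int released;                /* counts destructor calls */
static void release_value(void *v) { (void)v; released++; }

static void *worker(void *arg) {
    static int worker_value = 2;
    tracked_key_set(arg, &worker_value);
    return NULL;                    /* thread exit triggers on_thread_exit */
}

int tracked_key_demo(void) {
    static int main_value = 1;
    tracked_key_t k;
    pthread_t t;
    released = 0;
    if (tracked_key_create(&k, release_value) != 0) return -1;
    tracked_key_set(&k, &main_value);
    pthread_create(&t, NULL, worker, &k);
    pthread_join(t, NULL);          /* worker's value released at its exit */
    assert(tracked_key_get(&k) == &main_value);
    tracked_key_delete(&k);         /* main thread's value released here */
    return released;                /* one release per thread that stored */
}
```

The design point is that `tracked_key_delete` releases values for threads that never exited, including the main thread. A plain `pthread_key_delete` does not do this, because POSIX never runs destructors when a key is deleted.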
Does it even make sense to do TLS in OPAL at all?
- May indicate that we have an abstraction wrong somewhere.
- If MPI depends on this in OPAL, does it then depend on it in PMIx and other layers?
- Not sure if there is a problem, but at a high level, it sounds problematic.
Baking in pthread assumptions in general is not a good idea.
- That's what this PR does: it abstracts pthread semantics.
There may be some confusion; there is no problem with porting this API anywhere.
- The issue raised before is about relying on a certain type of thread in the MPI layer.
- But we don't, because there's a framework.
- But the application is linked against PMIx and libevent, so using other threading models is dangerous.
  - To make this work, you have to make changes to event polling, etc.
Not saying we shouldn't take these patches; they make things better.
- But we do have a problem: other thread components aren't going to "just work", because PMIx and libevent use pthreads, which conflicts with other threading models.
  - Argobots actually uses pthreads; not sure about qthreads.
  - Working on a way to configure libevent to make this combo work.

There was a PR that made a change

Austen will revert.

C11 atomic usage is a mess

Last week:
- George needs some input on the PR.
- We don't need _Atomic in most cases; we just need volatile.
- Patch linked to the issue PR7914.
- We're not breaking things; we just get a lot of valid complaints from the Intel compiler.
  - STDOUT of make is ~16 MB due to all the Intel compiler warnings without this fix.
There is a PR pending.
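A minimal C sketch of the distinction behind the `_Atomic` vs. `volatile` discussion. It assumes the intent is that atomicity should come from explicit atomic operations rather than from the `_Atomic` type qualifier. Note that `volatile` by itself provides neither atomicity nor memory ordering in C11, so the explicit operations are still required. The GCC/Clang-style `__atomic` builtins are an assumption about the toolchain. The names `counter_a`, `counter_b`, and `bump_both` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Style 1: C11 _Atomic type. Every access, even a plain "counter_a++",
   becomes an implicitly sequentially consistent atomic operation. */
static _Atomic int counter_a;

/* Style 2: ordinary volatile object. Atomicity comes only from the
   explicit __atomic builtins (GCC/Clang extension), not from the type. */
static volatile int counter_b;

/* Increment both counters atomically and return their sum. */
int bump_both(void) {
    atomic_fetch_add_explicit(&counter_a, 1, memory_order_relaxed);
    __atomic_fetch_add(&counter_b, 1, __ATOMIC_RELAXED);
    return atomic_load_explicit(&counter_a, memory_order_relaxed)
         + __atomic_load_n(&counter_b, __ATOMIC_RELAXED);
}
```

With style 2, a stray plain access to `counter_b` is simply non-atomic. With style 1, every access is silently atomic, which is why `_Atomic` tends to spread through the code once it is introduced.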

Discuss Open-MPI binding when direct-launched

Schizo SLURM binding detection - Might not need a solution on v4.0.x
PRs have gone into v4.0.x and v4.1.x

Release Branches

Blockers All Open Blockers

Review v4.0.x Milestones v4.0.5

Discussing CUDA init in UCX PML PR 7898
- Looks like a bugfix, so should be okay to put into a release branch.
- Is there a better place to initialize the CUDA hooks?
- If we request a BTL or PML to be loaded and we're configured with CUDA, the CUDA library is loaded by the BTL that requires it.
- Some questions about possibly making it more generic for all PMLs that use CUDA.
  - Don't want to load CUDA if only using TCP or shared memory.
- We'll take this PR once it passes CI and is reviewed.
v4.0.5 schedule: End of July
- Will create RC1 today after PR7898 goes in.
- Two potential drivers for a quick v4.0.5 turn-around.
  - OSC RDMA Bug - May drive a v4.0.5 release.
  - Program Aborts on detach.

Review v4.1.x Milestones v4.1.0

Schedule: Want to release end-of-July
Posted a v4.1.0 rc1 to go through mechanisms to ensure we can release.
Release Engineers: Brian Barrett (AWS), Jeff Squyres (Cisco)
George found an SM BTL issue at Init on master. Jeff filed Issue 7937.
- The cacheline size is set very late, after the modex, which affects everything that uses the cacheline size before the modex.
- This matters because we align some structs based on that value.
  - Setting it would be associated with getting the topology (but the topology isn't retrieved until after the modex).
  - Only the CUDA BTL calls the function directly; everyone else extracts it from PMIx.
    - What we ought to do: there's no harm in getting the topology earlier; we just need to ensure PMIx is initialized.
    - On v4.1, we don't get the topology until someone requests it much later.
      - Must also affect v4.0.x.
  - George put a fix into master, but a better change would be to load it as soon as PMIx is initialized.
    - Con: if we're not in a PMIx environment that can share this pointer, then every process will do this discovery itself, even if it doesn't need it later.
    - Problem is that the process that creates the backing file creates it very early.
- Someone should review all the branches to see whether we get the topology before anyone uses the cacheline size.
- George saw it in the SM BTL structures. Deadlock.
- This isn't tested by our CI infrastructure.
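One way to make the ordering problem above impossible by construction is to avoid exposing the cacheline size as a plain global that gets filled in late. Instead, every consumer calls a getter that performs discovery on first use. Below is a minimal C sketch under that assumption. `get_cacheline_size`, `alloc_cacheline_aligned`, and `FALLBACK_CACHELINE` are hypothetical names. `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)` is a glibc extension used here only as a stand-in for the topology lookup discussed above.

```c
#define _GNU_SOURCE                 /* for _SC_LEVEL1_DCACHE_LINESIZE on glibc */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical fallback when the line size cannot be discovered. */
#define FALLBACK_CACHELINE 64

static size_t cacheline_size;       /* 0 until discovery has run */
static pthread_once_t cacheline_once = PTHREAD_ONCE_INIT;

static void discover_cacheline(void) {
    long sz = -1;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    sz = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);  /* stand-in for a topology query */
#endif
    /* posix_memalign needs a power of two that is a multiple of sizeof(void *) */
    if (sz < (long)sizeof(void *) || (sz & (sz - 1)) != 0)
        sz = FALLBACK_CACHELINE;
    cacheline_size = (size_t)sz;
}

/* Every consumer goes through this getter, so nobody can read the size
   before it has been discovered, no matter how early they run. */
size_t get_cacheline_size(void) {
    pthread_once(&cacheline_once, discover_cacheline);
    return cacheline_size;
}

/* Allocate a buffer aligned to the (lazily discovered) cacheline size. */
void *alloc_cacheline_aligned(size_t bytes) {
    void *p = NULL;
    if (posix_memalign(&p, get_cacheline_size(), bytes) != 0) return NULL;
    return p;
}
```

This has the same trade-off as the con raised in the meeting. With lazy discovery, each process that calls the getter does its own discovery unless the value is shared by some other means, such as PMIx.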
Still want:
- George's Collectives
  - George is still working on master version of coll
  - Next thing he's working on today.
- Tunings for tuned coll
  - Nothing to discuss today.
- AVX
  - Went in this morning.
- UCX PRs awaiting review.
Past: We've reached consensus on a v4.1.0 release.
- Need include/exclude selection; worried about consistent selection.
- A lot of PRs are outstanding, but they can't be merged until:
  - The patch for the OFI stuff messed up the v4.1.x branch.
  - Howard has a fix PR; Jeff is looking at it.
- Howard changed the new OFI BTL parameters to be consistent with the MTL.
- Not breaking ABI or backwards compatibility.
- The v4.1.x branch was branched from the v4.0.4 tag.
- NOT touching runtime!!!
- Not going to be pulling in a new PMIx version.
All MTT is online on the v4.1.x branch.
Not compiling under the SLURM EFA test (OFI BTL issue).

Review v5.0.0 Milestones v5.0.0

No update this week other than master discussion.
Need to put OSC pt2pt
- OSC RDMA requires a single BTL that can contact every single process.
  - This didn't use to be the case (comment in the code).
We can't use the OSC pt2pt.
- It is not thread safe. Doesn't conform to the MPI-4 standard. Not safe.
- This is just a testing fallacy. Could add tests to show this, but we'd still be in the same boat.
- Either product A or B is broken and we need to fix it.
RDMA one-sided should fall back to "my atomics" because TCP will never have RDMA atomics.
- The idea was to put the atomics into the BTL base, which could do all of the one-sided atomics under the covers.
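A minimal C sketch of the fallback idea, not the actual BTL interface. All names (`transport_t`, `base_fetch_add`, `rdma_like`, `tcp_like`) are hypothetical. The sketch shows capability checking living in one shared "base" layer, so that a transport without hardware atomics (TCP in the discussion above) still gets working one-sided atomics. Only fetch-and-add is shown; compare-and-swap would follow the same pattern.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport descriptor: fetch_add is NULL when the network
   (e.g. TCP) offers no RDMA atomics. */
typedef struct {
    const char *name;
    int64_t (*fetch_add)(int64_t *target, int64_t operand);
} transport_t;

/* Stand-in for a NIC-offloaded atomic. */
static int64_t native_fetch_add(int64_t *target, int64_t operand) {
    return __atomic_fetch_add(target, operand, __ATOMIC_SEQ_CST);
}

/* Software emulation. In a real system this would be an active message
   applied at the target process; a local mutex stands in for that here. */
static pthread_mutex_t emulation_lock = PTHREAD_MUTEX_INITIALIZER;
static int64_t emulated_fetch_add(int64_t *target, int64_t operand) {
    pthread_mutex_lock(&emulation_lock);
    int64_t old = *target;
    *target = old + operand;
    pthread_mutex_unlock(&emulation_lock);
    return old;
}

/* "Base" entry point: one-sided code calls this and never checks what the
   transport supports. Caveat: native and emulated atomics are not atomic
   with respect to each other, so one target must stick to a single path. */
int64_t base_fetch_add(const transport_t *t, int64_t *target, int64_t operand) {
    if (t->fetch_add != NULL)
        return t->fetch_add(target, operand);
    return emulated_fetch_add(target, operand);
}

const transport_t rdma_like = { "rdma-like", native_fetch_add };
const transport_t tcp_like = { "tcp-like", NULL };
```

The caveat in the comment matters for one-sided windows. If some processes reach a target through hardware atomics and others through emulation, the two paths can interleave unsafely. In practice, the choice would have to be made per window rather than per operation.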
Jeff will close the PR, and Jeff/Nathan will follow up on fetch, get, and compare-and-swap.
Two new PRs for MPI 4.0 error handling, from Aurelien Bouteiller.
Does UCX support iWARP?
- Does libFabric support iWARP via the verbs provider?
- https://github.com/openucx/ucx/issues/2507 suggests it doesn't.
- Brian thinks that libFabric
- OFI can support iWARP; we just need to specify the provider in the include list.
- The person who's asking is a partner, not a customer.
PMIX
- Working on PMIx v4.0.0, which is what Open MPI v5.0 will use.
- Sessions needs something from PMIx v4.
- ULFM - not sure if it needs PMIx; we think it needs PRRTE changes.
- PPN scaling issue - simple algorithmic issue in one function.
  - PMIx talked about it. Artem might know someone who might be interested in working on it.
  - The algorithm behind one of the interfaces doesn't scale well.
  - Not a regression. Above ~4K nodes, it becomes quadratic.
PRRTE
- Nothing's happening there.

master

Mostly discussed above.

Face to face

Many companies are not allowing face-to-face travel until 2021 due to COVID-19.
- Instead, let's do a series of virtual face-to-faces?
Yes, this summer, to discuss v5.0.
- Maybe we can do it by topic?
- Maybe not 4- or 8-hour blocks.
Different topics on different days.
Do a Doodle poll of least-worst days in late July/August.
- August 10th-14th - 3-hour block of time, 8-11 Pacific time.
- Jeff will do another Doodle for days of the week (vote for 2).
Start a list of topics.

Super Computing Birds-of-a-feather

George and Jeff will help plan and come to the community.
- Done / Submitted.
The Supercomputing conference may not happen at ALL this year.
Many other projects are doing a virtual "state of the union" type meeting to cover what they'd usually do in a Birds-of-a-Feather session.
This works pretty well, and could be done a couple of times a year.
Not constrained to Supercomputing.
Almost certain that it will be virtual.
- Not sure of the cost.
- Ralph and Jeff have been doing the ABCs of Open MPI - SO many people. Done 2 of 3 sessions (each went 1.5 hours, lots of questions).
  - Slides and YouTube recordings are on the website; will send a link to the user list.
  - Part 3 is August 5th.
- Also want an in-depth walk-through of PMIx initialization / wireup.
