WeeklyTelcon_20180807
Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision
- Dialup Info: (Do not post to public mailing list or public wiki)
- Jeff Squyres
- Howard
- Nathan Hjelm
- Geoff Paulsen
- Peter Gottesman (Cisco)
- Thomas Naughton
- Todd Kordenbrock
- Xin Zhao
- Brian
- akshay
- Geoffroy Vallee
- Matthew Dosanjh
- Ralph Castain
- Joshua Ladd
- Josh Hursey
- Matias Cabral
- Howard.
- Edgar Gabriel
- Akvenkatesh (nVidia)
- Howard Pritchard
- Dan Topa (LANL)
- David Bernholdt
-
NEW: info.c warning - Jeff thought we'd fixed it, but Ralph saw it on Cray.
-
Nathan is requesting comments on:
- C11 integration into master. PR 5445
- Eliminate all of our own atomics in favor of C11 atomics.
- ACTION: Please review and comment on code.
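For reviewers less familiar with C11 atomics, here is a minimal sketch of what the standard `<stdatomic.h>` interface looks like. It is not taken from PR 5445; the `demo_counter_*` names are hypothetical and exist only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter type for illustration; not an Open MPI structure. */
typedef struct {
    _Atomic int32_t value;
} demo_counter_t;

/* Initialize the counter before it is shared between threads. */
static void demo_counter_init(demo_counter_t *c, int32_t start)
{
    atomic_init(&c->value, start);
}

/* Atomically add delta and return the new value. */
static int32_t demo_counter_add(demo_counter_t *c, int32_t delta)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(&c->value, delta) + delta;
}

/* Compare-and-swap: store desired only if the current value equals expected. */
static bool demo_counter_cas(demo_counter_t *c, int32_t expected, int32_t desired)
{
    return atomic_compare_exchange_strong(&c->value, &expected, desired);
}
```

These calls use the default sequentially consistent ordering; the `_explicit` variants take a `memory_order` argument where a weaker ordering is sufficient.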
-
ORTE discussion went well; Geoffroy Vallee wrote up a summary and posted it to devel-core on Jul 24th.
- ACTION: Everyone please read and reply to devel-core with your thoughts.
Review All Open Blockers
Review v2.x Milestones v2.1.4
- v2.1.4 - put out an RC1 v2.1.4
- Always used to have a src RPM as part of RC.
- Jeff had some problems using the Python script to upload 2.1.4 tarballs built on AWS to S3.
- Typo fix for PMIx (MB prefix), but not upgrading because 2.1.4 is the end of the 2.x stream.
- Peter filed Issue 5520
- Thread-multiple warnings when exiting on an error. Doesn't block.
- Aug 10th is release date.
- Test RC, get feedback.
Review v3.0.x Milestones v3.0.3
- Schedule:
- PR 5437 - George reviewed.
- PR 5484 - want into RC1, but Gilles is on vacation. - Nathan can test
- Want RC1 next week.
- v3.0.3 - targeting Sept 1st (start RCs when 2.1 wraps up).
- Anticipate RC1 after the Aug 10th release of v2.1.4.
- Good progress in reviews.
Review v3.1.x Milestones v3.1.0
- v3.1.2 release process, starts after Sept 1st release of v3.0.3
- Lots of PRs multiple 5485
- ucx segfault
- 5083 - we just need some update. Xin Zhao will update issue.
- Schedule: branch: July 18. release: Sept 17
- Date for first RC - Aug 13 (after sunset of 2.1.4)
- Cuda support:
- Does nVidia want if --with-cuda, then openib included by default?
- Yes, because at this moment UCX is not on par, but still want to migrate to ucx cuda.
- Warning message will mention deprecating openib vs. ucx
- Has this work been done???
- NEWS - Deprecate MPIR message for NEWS - Ralph can help with this.
- PR 5497 - ROMIO - wait for Gilles to review.
- PR 5472 - joint effort of 4 commits - Jeff to review
- PR 5504 - Please ensure bug fixes only, and separate commits to allow us to consider them separately.
- Geoff and Howard will build test suites with v3.1.x and run with master/v4.0 to see if anything breaks.
- ORTE/PRTE - Geoffroy Vallee sent out a document with a summary to devel-core. Everyone please read and reply.
- Want to make sure that there are very good alternatives to whatever orte is turning into that will use PMIx.
- Replacing framework and calling PMIx directly is a really good idea.
- Will mess up if there is no native support for PMIx.
- in Open MPI v5.0.x timeframe.
- From last week:
- MTT License discussion - MTT needs to be de-GPL-ified.
- Everyone go try the Python client. - All the GPL is in the Perl modules (using Python works around that).
- Ralph started a PR, and it is now in limbo. Need to get this done by end of 2018.
- Main concern is python is in a repo with no GPL code.
- Could delete Perl altogether, but may need to just move Perl to a different repo for a period of time, until everyone can move off of Perl.
- Has Cisco found an alternative to Perl funclets?
- Python ini execution is different from Perl's.
- Cisco has one Perl ini for each branch, and under that, 20-30 MPI installs.
- Probably will go with a template and stamp out 20-30 times
Review Master Master Pull Requests
- PR for setting VERSION on master. Have we broken any VERSIONs?
Review Master MTT testing
-
Hope to have better Cisco MTT in a week or two
- Peter is going through, and he found a few failures, some of which have been posted.
- one-sided - Nathan's looking at.
- some more coming.
- osc_pt2pt will exclude itself in an MT run.
- One of Cisco's MTTs runs with an env setting to turn all MPI_Init into MPI_Init_thread (even though it is a single-threaded run).
- Now that osc_pt2pt is ineligible, many tests fail.
- on Master, this will fix itself 'soon'
- BLOCKER for v4.0 for this work so we'll have vader and something for osc_pt2pt.
- Probably an issue on v3.x also.
- Did this for release branches, Nathan's not sure if on Master. - v4.0.x has RMA capable vader. Once
-
OSHMEM v1.4 - cleanup work
- How do we look for test coverage of this? Right now just basic API tests.
-
Next Face to Face?
- When? Double dip with MPI Forum early December. Oct, Nov, 1st week of Dec (Dec 3).
- Where? San Jose - Cisco yes and maybe depending on date
Albuquerque - Sandia (believe it's okay, but need to verify) - ACTION: Geoff will Doodle this
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu
- Amazon,
- Cisco, ORNL, UTK, NVIDIA