
WeeklyTelcon_20180424

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen

  • Jeff Squyres

  • Brian

  • Joshua Ladd

  • Nathan Hjelm

  • Thomas Naughton

  • Todd Kordenbrock

  • Xin Zhao

  • Murali Emani

  • Josh Hursey

Agenda/New Business

Minutes

Review v2.x Milestones v2.1.4

  • v2.1.4 - Targeting Oct 15th.
  • Lower priority than v3.0 and v3.1.
  • No new news on v2.1.x

Review v3.0.x Milestones v3.0.2

  • Schedule:
    • Quick turnaround on this; shooting for May 1st.
  • v3.0.2 open for bugfixes.
    • Will pre-emptively fix PMIx compatibility pieces to pick up PMIx v1.2.5 clients.
    • This will bring in PMIx compatibility with the OMPI client (mpirun/orted/libmpi) from OMPI v2.1.3.
  • The memkind disable needs to get into v3.0.2; it is either already taken care of or waiting to be taken care of.

Review v3.1.x Milestones v3.1.0

  • Schedule - ASAP - but blockers keep getting filed.
    • No one seems particularly eager to get it out.
    • Not getting any
  • blockers
    • Issue 5048

      • Feature that works in v3.0 but doesn't work in v3.1.
      • Also broken in master.
      • Gilles confirmed this is not a problem in the v3.0.x series.
      • This is not due to mpirun/orte differences; it is just a library issue.
      • Gilles' fix does not go back to PMIx v1.2.
      • Are there any use cases where someone wants this to work?
        • We made statements about compatibility.
        • We're not sure what the container folks want/need.
      • Suggestion: with OMPI v3.1.x, users should switch to PMIx v2.x.
      • Suggestion: rename this release to v4.0, but then the fall release would be v5.0.
    • UCX OSC https://github.com/open-mpi/ompi/issues/5083

      • Mellanox has a fix for this, will review and have a PR in next few days.
      • IBM will review this when posted.
    • Cisco SLURM dies in weird ways for both v3.0.x and v3.1.x

    • SLURM 16.15 with OMPI v3.1 and an external PMIx: this won't work.

      • Broken on master also
    • No one seems to have resources to fix this.

Review Master Master Pull Requests

  • OSHMEM v1.4 - not sure if we have to drop the deprecated APIs; curious whether OMPI is dropping deprecated APIs...
    • Only remove things that were removed from the OSHMEM standard, not things that are merely deprecated; "deprecated" means an API will be removed in a future version of the standard. If some APIs were removed from the standard, ask the OSHMEM mailing list for their thoughts.

Other topics

  • v4.0 release manager
    • Howard and Geoff have volunteered, but we can have other volunteers.
    • Start talking about it now. Plan to branch mid-July.
    • Voted for Geoff and Howard for release manager for v4.0 mid-July branch, release mid-Sept?
  • OMPI testing of PMIx compatibility
    • Late August would be a good time for this OMPI testing of PMIx compatibility to come online.
    • Open MPI should test in order to hold upstream PMIx compatibility to a high bar.
    • We need test coverage for PMIx cross-version compatibility.
      • What to test? Launch, simple communication, abort.
      • Other PMIx features? Debugger attach, others?

MTT / Jenkins Testing Dev

  • Get a copy of the Perl JSON module and put it on MTT.

When should we branch v4.0?

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
