WeeklyTelcon_20240130
- Dialup Info: (Do not post to public mailing list or public wiki)
- Tommy Janjusic (NVIDIA)
- Christoph Niethammer (HLRS)
- Joseph Schuchart (UTK)
- Todd Kordenbrock
- Luke Robison (AWS)
- Jeff Squyres (CISCO)
- Edgar Gabriel (AMD)
- Thomas Huber (Cornelis)
- Wenduo Wang (AWS)
- David Bernholdt (ORNL)
- Thomas Naughton (ORNL)
- v4.1.x Issues; v4.1.x Questions
- 12270: btl smcuda hang in v4.1.5
- v4.1.x Open PRs
- Next release end of Q1.
- New issues since v5.0.1
- Open PRs since v5.0.1
- Closed/Merged PRs since v5.0.1
- v5.0.x Open PRs
- Non-critical changes will be held back until 5.0.2 release
- 12135, will merge after 5.0.2.
- v5.0.x Issues; v5.0.x Questions
- v5.0.2 delayed
- Jeff: Progress on testing automation that closes tickets where we need the user to reply, but they never do
- Use two different bots:
- Requirements:
- Only do this automation on GitHub issues where we apply a specific label (e.g., "needs-reply")
- If anyone replies who does not have write perms to the repo, automation removes the "needs-reply" label
- If X time goes by with no replies, bot adds a polite comment "We're still waiting for a reply; if Y more time goes by with no reply, we'll close this issue"
- If Y more time goes by with no replies, bot adds a polite comment "We're assuming this issue has been abandoned, and will close it", and then actually closes the issue.
- Testing on Jeff's fork.
- Issue with no label, so it won't go stale/close
- Issue with label, so it should get stale and eventually close
- Issue showing automation remove the label when user replied to the issue -- should go stale/auto-close because I re-applied the `needs-reply` label
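
The four requirements above amount to a small decision procedure per issue. A minimal sketch of that logic follows, in Python, not tied to either of the bots under test. Everything named here is hypothetical: the `Issue` fields, the `Action` values, and the 14-day placeholders for "X" and "Y" (the notes do not give actual durations). It also assumes that a reply from someone with write perms neither removes the label nor resets the clock, which the requirements do not spell out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

NEEDS_REPLY = "needs-reply"

# Placeholder durations: the notes only call these "X" and "Y".
STALE_AFTER = timedelta(days=14)  # "X" (assumed value)
CLOSE_AFTER = timedelta(days=14)  # "Y" (assumed value)


class Action(Enum):
    NONE = "none"                  # leave the issue alone
    REMOVE_LABEL = "remove-label"  # user replied; drop "needs-reply"
    WARN = "warn"                  # post the "still waiting" comment
    CLOSE = "close"                # post the "abandoned" comment and close


@dataclass
class Issue:
    """Hypothetical snapshot of the issue state the automation needs."""
    labels: set
    label_applied_at: datetime
    last_reply_at: Optional[datetime] = None
    last_reply_has_write_access: bool = False
    # Assumed to be reset to None whenever the label is (re)applied.
    warned_at: Optional[datetime] = None


def next_action(issue: Issue, now: datetime) -> Action:
    """Return the single step the automation would take at time `now`."""
    # Requirement 1: only issues carrying the label are considered.
    if NEEDS_REPLY not in issue.labels:
        return Action.NONE

    # Requirement 2: a reply from someone without write perms
    # (after the label was applied) removes the label.
    replied_since_label = (
        issue.last_reply_at is not None
        and issue.last_reply_at > issue.label_applied_at
    )
    if replied_since_label and not issue.last_reply_has_write_access:
        return Action.REMOVE_LABEL

    # Requirement 3: after X with no reply, post the polite warning.
    if issue.warned_at is None:
        if now - issue.label_applied_at >= STALE_AFTER:
            return Action.WARN
        return Action.NONE

    # Requirement 4: after Y more with no reply, comment and close.
    if now - issue.warned_at >= CLOSE_AFTER:
        return Action.CLOSE
    return Action.NONE
```

The three test issues on Jeff's fork map onto this sketch directly: an unlabeled issue always yields `Action.NONE`, a labeled issue progresses from `WARN` to `CLOSE`, and a user reply yields `REMOVE_LABEL` until the label is re-applied.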
- Luke: Looking for ideas on how we can cache SMSC endpoints for use in hierarchy-aware collectives. Draft PR
- Edgar: face-to-face engineering meeting in Spring in Austin?
- Need to discuss MPI 4.1
- Need to discuss MPI 4.0
- Discuss and set v5.0.2 timeline
- Discuss v5.1.x feature list and timeline
- Dig into libcudadir
- F2F
- Slow MPI_Group_difference (Issue #12286, open-mpi/ompi): short explanation, no promises.
- v4.1 btl_smcuda hang: reproduce in main? Luke is looking into it next week. Maybe Edgar can also reproduce; will check.
- [v4.1.x] fix atomic operations opal_atomic_compare_exchange_strong_ in arm64 by yuncliu (Pull Request #12005, open-mpi/ompi): keep open until resolution on main.
- ^ critical, but not marked as blocker yet
- Config/opal_check_cuda: check for with-cuda-libdir
- smsc/xpmem: Refactor reg-cache to use tree's find() instead of iterate() by gkatev (Pull Request #11358, open-mpi/ompi): probably needs to be merged, and maybe solves WIP: Add smsc endpoints to HAN by lrbison (Pull Request #12272, open-mpi/ompi). Luke will take a look at smsc reg.