WeeklyTelcon_20200804
        Geoffrey Paulsen edited this page Jan 19, 2021 
        ·
        2 revisions
      
    - Dialup Info: (Do not post to public mailing list or public wiki)
Did not capture attendance accurately -- this may not be fully correct. I put a "yes" next to the people I know were there today.
- NOT-YET-UPDATED
Blockers All Open Blockers
Review v4.0.x Milestones v4.0.5
- Still waiting on blocker (also v4.1): cache line stuff
- Why is this a correctness issue (not just a performance optimization)?
- We align the data in the shared memory stuff to be on cache line sizes
- We start the ring every 128 bytes (i.e., local rank 0)
- Other processes then find out the real cache line size of 64.
- Then other processes attach to shared memory, and use the cache line size/alignment of 64.
- First message will get sent, but then the 2nd message will never be received (and/or it's reading corrupt data because it's reading at offset 64 instead of 128).
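The failure mode above can be sketched with a little offset arithmetic. This is only an illustration, not the smcuda code: `align_up`, `slot_offset`, and the 16-byte payload are hypothetical names and values chosen for the example. It assumes each slot in the shared segment is padded up to the cache line size that the computing process believes in.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `align` (must be a power of two). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Byte offset of slot `index` when every slot is padded to `line_size`.
 * Hypothetical layout: slots are laid out back to back from offset 0. */
size_t slot_offset(size_t index, size_t payload_size, size_t line_size)
{
    return index * align_up(payload_size, line_size);
}
```

With a 16-byte payload, `slot_offset(0, 16, 128)` and `slot_offset(0, 16, 64)` are both 0, so the first message lines up for both sides. For the second slot, the process that laid out the segment with 128 computes offset 128, while a process using 64 computes offset 64 and looks in the wrong place.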
 
- How is this not happening anywhere else?
- Previously, cache line size was set up very, very late (after all the shmem stuff was set up -- even the non-local-process-rank-0). I.e., we got lucky.
- I.e., we brought the hwloc initialization forward at some point and broke this.
- This only happens in the smcuda BTL (and possibly only in single-node runs, because other BTLs/PMLs may have been selected).
- The plain sm and vader BTLs do this differently.
- Meaning: this is a very specific corner case.
 
- Solutions?
- Trivial fix: just have everyone use a fixed value (e.g., 128 or 64).
- Pretty simple: modex-send the size to be used from local rank 0 to the others. The others modex recv the value and use it.
- A little more complicated: also add code to smcuda to read the Linux /proc / /sys / whatever to get the cache line size.
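As a rough sketch of the "a little more complicated" option, the cache line size can be queried on Linux without hwloc. This is not the patch discussed in the meeting. The function name `query_cache_line_size` and the caller-supplied fallback are assumptions for illustration. `_SC_LEVEL1_DCACHE_LINESIZE` is a glibc extension and may be missing or return 0 on some systems, so the sketch also tries the sysfs `coherency_line_size` file for cpu0 before giving up.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Return the L1 data cache line size in bytes, or `fallback` if it
 * cannot be determined. Hypothetical helper, not an Open MPI API. */
long query_cache_line_size(long fallback)
{
    long size = -1;
    FILE *fp;

    assert(fallback > 0);

#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    /* glibc extension; may return 0 or -1 when the value is unknown. */
    size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (size > 0) {
        return size;
    }
#endif

    /* Linux sysfs: coherency line size of the first cache level on cpu0. */
    fp = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
    if (fp != NULL) {
        if (fscanf(fp, "%ld", &size) != 1) {
            size = -1;
        }
        fclose(fp);
    }

    return size > 0 ? size : fallback;
}
```

Calling `query_cache_line_size(128)` gives every process a positive value, but processes only agree if they all take the same path. That is why the fixed-value and modex options above avoid the problem more directly.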
 
- There's a PR for master that does the fix -- but in a way that will kill scalability.
- Once Brian's configury fixes are in, this is easy to fix on master.
- Or it could be done the "A little more complicated" way, above. Neither is difficult.
 
- For 4.0 and 4.1: George will make one-liner patch to make everyone use a fixed value.
- This clears the blocker.
 
 
- https://github.com/open-mpi/ompi/issues/7968: added a note to the README for v4.0: there's a known issue when using UCX with very, very old IB hardware (pre-ConnectX) -- it will segfault. According to Mellanox, UCX 1.10 will fix this issue.
Review v4.1.x Milestones v4.1.0
- Same cache line blocker as v4.0.
- https://github.com/open-mpi/ompi/issues/7982: OFI BTL and FI_DELIVERY_COMPLETE. This only matters for MPI one-sided.
- EFA and other providers are misbehaving
- https://github.com/open-mpi/ompi/pull/7973: PR for fix: Disable EFA provider
- ...but then later discovered that other providers also misbehave in the same way.
 
- AWS proposal: extend #7973 to exclude other providers that misbehave.
- Meaning: if you're using libfabric over verbs, the OFI BTL won't be used.
- In v4.0.x, there is no OFI BTL. So this is not an issue.
- In v4.1 this is a minor inconvenience because we still have osc/pt2pt. I.e., OMPI will automatically fall back to osc/pt2pt.
- This is unfortunately a big problem for master/v5.0. Need to figure this out -- i.e., coordinate with libfabric community.
- NOTE: This is a different code path than the MPI-one-sided problem Cisco MTT discovered when we removed osc/rdma (and all MPI_WIN_CREATE operations failed).
- Looks like Cisco MTT is still failing one-sided tests -- need to follow up with Nathan.
 
 
- Howard asks: how can I see this problem?
- Anything with MPI_PUT. E.g., IBM one-sided tests.
 
 
- ADAPT / HAN.
- Need to test and produce some documentation for ADAPT and HAN.
 
Review v5.0.0 Milestones v5.0.0
- No update this week other than master discussion.
- osc/pt2pt removal on master
- George: There are many machines where osc/pt2pt is the only mechanism, and it was the most performant.
- Brian: osc/pt2pt wasn't removed because it wasn't needed; it was removed because it's very buggy (including having no good path to becoming multi-thread safe) and "unrecoverably broken" (Brian's words! And he wrote it!), and no one will take ownership of fixing it.
- ...so if someone wants to take ownership of fixing it, they can!
 
- Ralph points out:
- AWS MTT builds for SLURM, need to fix up the compiles for external hwloc/libevent. Brian+William will talk internally.
- Java: builds failing from Aurelien PR. He'll have a look.
 
- It's after July, so Jeff will go de-activate people.
- Brian will go do it today.
 
- Agenda items for next week.
- Talk through MPI-4 features.  Howard will make a list of big-ticket MPI-4 features (from MPI-4 changelog).
- Sessions
- Default error handler
- ...etc.
 
- Walk through PRRTE issues.
- Figure out: which are blockers for v5.0? (etc.)
 
- With these two, we're good enough for Monday's meeting.
- Please add any other items to the wiki.
- We'll evaluate if we still need Tuesday's meeting.
 
 