ch4: implement dynamic netmod av entry size #7204

Open
wants to merge 4 commits into
base: main

Conversation

hzhou
Contributor

@hzhou hzhou commented Nov 9, 2024

Pull Request Description

With support for multiple nics and multiple vcis configured at build time, each av (address vector) entry is expanded into [MAX_NICS][MAX_VCIS], even though most applications don't use all the nics and all the vcis in each process, especially in high-PPN cases. This may cause significant memory pressure when launching jobs with high PPN and a large number of nodes.
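For a rough sense of scale, the footprint can be sketched as below. The helper `av_table_bytes` and the numbers in the usage note are hypothetical, chosen only for illustration; they are not measurements or defaults taken from this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes of av table held by ONE process when every
 * entry reserves nics * vcis address slots of slot_size bytes each. */
static size_t av_table_bytes(size_t num_ranks, size_t nics, size_t vcis,
                             size_t slot_size)
{
    assert(slot_size > 0);
    return num_ranks * nics * vcis * slot_size;
}
```

Assuming 8-byte address slots and 1000 peer ranks, `av_table_bytes(1000, 8, 64, 8)` gives 4096000 bytes per process, while `av_table_bytes(1000, 1, 1, 8)` gives 8000 bytes. The per-process cost is then multiplied by the PPN on each node.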

To avoid wasting memory and to adapt the av entry size to runtime variables such as MPIR_CVAR_CH4_OFI_MAX_NICS and MPIR_CVAR_CH4_NUM_VCIS, let the netmod report the needed av_entry_size during `MPIDI_NM_init_local` and use pointer arithmetic to look up av entries.
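A minimal sketch of the idea, not the actual MPICH code: every `sketch_`/`SKETCH_` name is hypothetical, and the table layout (one flat byte array indexed by rank) is an assumption made for illustration. The entry is declared with a single slot, the real size is computed once at init from the runtime nic and vci counts, and lookups multiply the rank by that runtime size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a netmod address entry. It is declared with
 * one slot; the real slot count is decided at init time. Indexing past
 * dest[0] relies on the common "struct hack" idiom. */
typedef struct {
    uint64_t dest[1];
} sketch_addr_t;

/* Hypothetical global state, loosely modeled on MPIDI_global. */
static struct {
    size_t av_entry_size;       /* bytes per av entry, set once at init */
    int num_vcis;               /* runtime vci count */
    char *table;                /* num_ranks entries, av_entry_size bytes each */
} sketch_global;

/* Size each entry for the runtime nic/vci counts and allocate the table. */
static int sketch_init(int num_nics, int num_vcis, int num_ranks)
{
    assert(num_nics > 0 && num_vcis > 0 && num_ranks > 0);
    sketch_global.num_vcis = num_vcis;
    sketch_global.av_entry_size =
        sizeof(uint64_t) * (size_t) num_nics * (size_t) num_vcis;
    sketch_global.table = calloc((size_t) num_ranks, sketch_global.av_entry_size);
    return sketch_global.table ? 0 : -1;
}

/* Look up an entry with pointer arithmetic; the stride is a runtime value,
 * so this costs one extra load compared with a compile-time sizeof. */
static sketch_addr_t *sketch_av_entry(int rank)
{
    return (sketch_addr_t *) (sketch_global.table +
                              (size_t) rank * sketch_global.av_entry_size);
}

/* Flattened [nic][vci] access into the entry's slots. */
#define SKETCH_AV_ADDR(av, nic, vci) \
    ((av)->dest[(nic) * sketch_global.num_vcis + (vci)])
```

For example, after `sketch_init(2, 4, 16)` each entry occupies `2 * 4 * sizeof(uint64_t)` bytes, and `SKETCH_AV_ADDR(sketch_av_entry(3), 1, 2)` addresses slot 6 of rank 3's entry.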

Also, set the default MPIR_CVAR_CH4_OFI_MAX_NICS to 1 to avoid initializing extra nics in each process. Apparently, libfabric may use a lot of memory for its av table, more so than MPICH's av table. Avoiding initializing extra nics greatly saves memory.

Concern

AV lookup is on the hot path. This patch adds a memory load for MPIDI_global.av_entry_size to each lookup.

TODO:

  • Measure the performance impact.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

@hzhou
Contributor Author

hzhou commented Nov 10, 2024

test:mpich/ch4/most

@hzhou
Contributor Author

hzhou commented Nov 22, 2024

test:mpich/custom
netmod: ch4:ofi
config: nohwloc

Let netmod set the size of MPIDI_av_entry_t during init_local. This
allows the entry size to be adapted to the maximum number of nics and
vcis that the user sets at runtime.

This avoids wasting memory in very large jobs when the MAX_NICS and
MAX_VCIS set at build time are too large.

Wrap the access to the actual av entry in the macro MPIDI_OFI_AV_ADDR.
Add MPIDI_OFI_global.max_vcis, which reflects the max vcis from runtime
CVARs such as MPIR_CVAR_CH4_NUM_VCIS.

This prepares for switching to a dynamic av entry size in the next commit.

Declare MPIDI_OFI_addr_t as dest[1] and set av_entry_size at init time
based on runtime settings.

Assume the typical way for apps to utilize multiple nics is to launch
multiple processes per node, each bound to a different nic. Setting
MPIR_CVAR_CH4_OFI_MAX_NICS to 1 should therefore be sufficient, and it
saves init time and resources.

Applications that want to use the striping mode to send very large
messages should manually set MPIR_CVAR_CH4_OFI_MAX_NICS to a higher value.
@hzhou
Contributor Author

hzhou commented Nov 22, 2024

test:mpich/ch4/most

test:mpich/custom
netmod: ch4:ofi
config: nohwloc
