Add a sample multi-GPU benchmark #5753
base: main
Conversation
!test
Review updated until commit 5da9fb0
Description
Relevant files:
- Enhancement
- Tests
- Configuration changes
PR Reviewer Guide
Here are some key observations to aid the review process:
- 🧪 PR contains tests
- ⚡ Recommended focus areas for review: Constructor ordering
Greptile Summary
Refactored the test infrastructure by extracting a MultiDeviceFixture that MultiDeviceTest now builds on alongside NVFuserTest (see the sequence diagram below).
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
participant TestRunner as Test Runner
participant MDT as MultiDeviceTest
participant NVFT as NVFuserTest
participant MDF as MultiDeviceFixture
participant Comm as Communicator
TestRunner->>MDT: Construct MultiDeviceTest
MDT->>NVFT: Call NVFuserTest()
NVFT-->>MDT: Base initialized
MDT->>MDF: Call MultiDeviceFixture()
MDF->>Comm: getInstance()
Comm-->>MDF: communicator instance
MDF->>MDF: Setup tensor_options_
MDF->>MDF: Setup debug_print
MDF-->>MDT: Fixture initialized
MDT->>MDT: Setup disable_skip
MDT-->>TestRunner: Test object ready
TestRunner->>MDT: SetUp()
MDT->>NVFT: NVFuserTest::SetUp()
NVFT-->>MDT: Base setup complete
MDT->>MDT: Check communicator availability
MDT-->>TestRunner: Setup complete
TestRunner->>MDT: Run test
Note over MDT,MDF: Test uses communicator_,<br/>tensor_options_ from fixture
MDT-->>TestRunner: Test complete
TestRunner->>MDT: Destroy MultiDeviceTest
MDT->>MDF: Call ~MultiDeviceFixture()
MDF->>Comm: barrier() if available
MDF-->>MDT: Cleanup complete
MDT-->>TestRunner: Destroyed
```
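To make the construction order concrete, here is a rough C++ sketch of the class relationships the diagram implies. The names (NVFuserTest, MultiDeviceFixture, MultiDeviceTest, Communicator, tensor_options_, debug_print) come from the diagram; member types, method signatures, and everything else are assumptions rather than the PR's actual code.

```cpp
// Sketch only: mirrors the construction/destruction order in the diagram.
class MultiDeviceFixture {
 protected:
  MultiDeviceFixture()
      // MDF -> Comm: getInstance()
      : communicator_(&Communicator::getInstance()),
        // MDF -> MDF: Setup tensor_options_ (device choice is an assumption)
        tensor_options_(
            at::TensorOptions().device(at::kCUDA, communicator_->deviceId())),
        // MDF -> MDF: Setup debug_print (how it is toggled is an assumption)
        debug_print_(false) {}

  ~MultiDeviceFixture() {
    // MDF -> Comm: barrier() if available
    if (communicator_ != nullptr && communicator_->is_available()) {
      communicator_->barrier();
    }
  }

  Communicator* communicator_ = nullptr;
  at::TensorOptions tensor_options_;
  bool debug_print_ = false;
};

// The test class inherits the gtest base first, then the fixture, so the
// bases are constructed in the order shown in the diagram.
class MultiDeviceTest : public NVFuserTest, public MultiDeviceFixture {
 protected:
  void SetUp() override {
    NVFuserTest::SetUp();
    // MDT -> MDT: Check communicator availability
    if (!communicator_->is_available()) {
      GTEST_SKIP() << "Communicator not available.";
    }
  }
};
```

The base-class declaration order (NVFuserTest before MultiDeviceFixture) determines which constructor runs first, which is presumably what the "Constructor ordering" focus area above refers to.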
```
$ mpirun -np 2 -output-filename /tmp/test_multidevice bin/test_multidevice --benchmarks=all
$ cat /tmp/test_multidevice/1/rank.0/stdout
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
MultiDeviceBenchmark/Reduction/4/iterations:10   20128420 ns     16788148 ns           10
MultiDeviceBenchmark/Reduction/8/iterations:10     100694 ns       100708 ns           10
```
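The benchmark names in this output (e.g. MultiDeviceBenchmark/Reduction/4/iterations:10) follow Google Benchmark's fixture naming convention. A minimal, hypothetical registration that would produce names of this shape looks roughly like the following; the fixture body and arguments are assumptions, not the code this PR actually adds.

```cpp
#include <benchmark/benchmark.h>

// Hypothetical fixture; the real one would presumably reuse the multi-device
// state (communicator, tensor options) that the test fixture provides.
class MultiDeviceBenchmark : public benchmark::Fixture {};

BENCHMARK_DEFINE_F(MultiDeviceBenchmark, Reduction)(benchmark::State& state) {
  const auto size = state.range(0);  // 4 or 8 in the output above
  for (auto _ : state) {
    // The real benchmark would build and run the reduction fusion across
    // ranks here; this placeholder only keeps `size` alive.
    benchmark::DoNotOptimize(size);
  }
}

// Produces names like MultiDeviceBenchmark/Reduction/4/iterations:10.
BENCHMARK_REGISTER_F(MultiDeviceBenchmark, Reduction)
    ->Arg(4)
    ->Arg(8)
    ->Iterations(10);
```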
```cpp
  testing::InitGoogleTest(&argc, argv);
  testing::AddGlobalTestEnvironment(new nvfuser::MultiDeviceTestEnvironment());

  if (wantsBenchmarks(argc, argv)) {
```
Benchmarks tend to run longer and don't need to run as frequently as tests, so it's worth separating benchmarks from (correctness) tests.
The question though is how.
- In this version, the benchmarks are defined in the same set of files as the tests, and I'm reusing the same main function, which detects flags like --benchmarks (a minimal sketch of this shared entry point follows this list).
- Alternatively, I could write two main functions (one for tests and the other for benchmarks) and link them into different binaries (test_multidevice vs benchmark_multidevice).
- Furthermore, I could even split the test files from the benchmark files. Code reuse is harder this way; for example, a FusionDefinition would need to be DRY'ed so it can be both tested and benchmarked.
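For reference, here is a minimal sketch of what such a shared entry point could look like under the first option. wantsBenchmarks and the --benchmarks flag appear in the diff excerpt above; the helper's body and the rest of the wiring are assumptions, not the PR's actual implementation.

```cpp
#include <cstring>

#include <benchmark/benchmark.h>
#include <gtest/gtest.h>

// Hypothetical helper: true if any argument starts with --benchmarks.
// The real wantsBenchmarks in this PR may be implemented differently.
bool wantsBenchmarks(int argc, char** argv) {
  for (int i = 1; i < argc; i++) {
    if (std::strncmp(argv[i], "--benchmarks", 12) == 0) {
      return true;
    }
  }
  return false;
}

int main(int argc, char** argv) {
  testing::InitGoogleTest(&argc, argv);
  // The real main also registers a MultiDeviceTestEnvironment (see the diff
  // excerpt above); omitted here for brevity.

  if (wantsBenchmarks(argc, argv)) {
    // Drop the custom flag so Google Benchmark doesn't reject it as an
    // unrecognized command-line option.
    int kept = 1;
    for (int i = 1; i < argc; i++) {
      if (std::strncmp(argv[i], "--benchmarks", 12) != 0) {
        argv[kept++] = argv[i];
      }
    }
    argc = kept;
    argv[argc] = nullptr;

    // Benchmark mode: hand control to Google Benchmark instead of gtest.
    benchmark::Initialize(&argc, argv);
    benchmark::RunSpecifiedBenchmarks();
    benchmark::Shutdown();
    return 0;
  }

  // Default mode: run the correctness tests.
  return RUN_ALL_TESTS();
}
```

Keeping one binary means the fusion definitions stay in one place and CI can opt into benchmarks with a flag, at the cost of a larger binary and a small amount of custom flag handling in main.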
We've been adding C++ multi-GPU benchmarks organically, e.g.,
- Fuser/benchmarks/cpp/p2p_communication.cpp (line 25 in d395676)
- Fuser/tests/cpp/test_multidevice_lower_communication_cuda.cpp (line 80 in d395676)
This PR adds a sample multi-GPU benchmark based on https://github.com/google/benchmark, which nvFuser already uses for its single-GPU benchmarks.
The repro command and its output are shown in the code block earlier in this conversation.