Remote Python Support for Benchmark Runner #1174

SujeethJinesh · 2025-01-17T01:00:31Z

Description

Adds a recipe for testing and eventually benchmarking the Pathways Remote Python feature.

This is an initial PR that will set the precedent for future recipes. It is designed to be shareable and have common functionality (like flag parsing) that can be shared across recipes, but flexible and standard so that multiple team members can simply run the file with minimal configuration changes. It's designed to avoid superfluous flag adding and continue pythonically using configs.

This recipe in particular is for testing functionality for remote python.

Tests

Tested this change on a test remote python sidecar server on a v6e and a v5e shared test cluster.

Sibling XPK change: AI-Hypercomputer/xpk#326

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed.

benchmarks/recipes/pw_remote_python_recipe.py

Obliviour · 2025-01-30T18:56:22Z

benchmarks/maxtext_trillium_model_configs.py

+        "enable_checkpointing": False,
+        # "profiler": "xplane",
+
+        # Additional tuning params for pathways long running test.


I am wondering if we can create a programmatic way of adding pathways params without needing to duplicate the config - for example default_basic_1 is sharable with mcjax and pathways and when the user has a pathways_config applied, it sets the pathways specific flags? I think the main goal to help solve is share a config between mcjax and pathways - and avoid the need to duplicate updates.

Curious what you think of this idea?

For right now there are only a few flags luckily that are pathways specific (e.g. enable_pathways_goodput), but we do often test using a lot more features enabled. A rough ideas I have is to separate the specific model tuning params from the workload params - e.g. like goodput, checkpointing, metrics, etc. That way the models can be the same, but the workload params can be suited to the specific type. Maybe worth expanding on if you think it's a good initial idea?

Remote Python Support for Benchmark Runner Add Remote Python Recipe Clean up Add Delete Capabilities Minor edits Fix Fix problems Changes Changes Fixes Undo Weird Merge Save work Break out args helper Fix file imports Move around stuff cleanup Fix Clean up comment whitespace Fix issues Fix lint

SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch 3 times, most recently from fd538eb to 2f8124f Compare January 23, 2025 23:41

SujeethJinesh changed the title ~~[Draft] Remote Python Support for Benchmark Runner~~ Remote Python Support for Benchmark Runner Jan 24, 2025

SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch from c9df323 to 443586b Compare January 25, 2025 01:08

SujeethJinesh marked this pull request as ready for review January 25, 2025 01:08

SujeethJinesh requested review from gobbleturk, khatwanimohit, bvandermoon, vipannalla and RissyRan as code owners January 25, 2025 01:08

SujeethJinesh requested review from shauryagup and Obliviour January 25, 2025 01:08

SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch from 60a5bd5 to 443586b Compare January 26, 2025 18:05

SujeethJinesh mentioned this pull request Jan 26, 2025

[DRAFT] Add PW/McJAX Benchmark Comparison & Long Running Test Recipes #1201

Closed

4 tasks

SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch 3 times, most recently from 059e78e to 75167f0 Compare January 28, 2025 22:52

shauryagup reviewed Jan 29, 2025

View reviewed changes

benchmarks/recipes/pw_remote_python_recipe.py Outdated Show resolved Hide resolved

benchmarks/recipes/pw_remote_python_recipe.py Show resolved Hide resolved

SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch from 75167f0 to 6427792 Compare January 29, 2025 01:10

shauryagup approved these changes Jan 30, 2025

View reviewed changes

Obliviour reviewed Jan 30, 2025

View reviewed changes

Obliviour approved these changes Jan 30, 2025

View reviewed changes

SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch 2 times, most recently from ddfe758 to 895e7c7 Compare January 30, 2025 21:20

gobbleturk approved these changes Jan 30, 2025

View reviewed changes

github-actions bot added the pull ready label Jan 30, 2025

SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch from 895e7c7 to e268cc4 Compare January 30, 2025 23:28

SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch from e268cc4 to f18828a Compare January 30, 2025 23:29

copybara-service bot merged commit 26fa593 into main Jan 30, 2025
16 checks passed

copybara-service bot deleted the sujinesh/remote_python_benchmark_runner branch January 30, 2025 23:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remote Python Support for Benchmark Runner #1174

Remote Python Support for Benchmark Runner #1174

SujeethJinesh commented Jan 17, 2025 •

edited

Loading

Obliviour Jan 30, 2025

SujeethJinesh Jan 30, 2025

Remote Python Support for Benchmark Runner #1174

Remote Python Support for Benchmark Runner #1174

Conversation

SujeethJinesh commented Jan 17, 2025 • edited Loading

Description

Tests

Checklist

Obliviour Jan 30, 2025

Choose a reason for hiding this comment

SujeethJinesh Jan 30, 2025

Choose a reason for hiding this comment

SujeethJinesh commented Jan 17, 2025 •

edited

Loading