Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote Python Support for Benchmark Runner #1174

Merged
merged 1 commit into from
Jan 30, 2025

Conversation

SujeethJinesh
Copy link
Collaborator

@SujeethJinesh SujeethJinesh commented Jan 17, 2025

Description

Adds a recipe for testing and eventually benchmarking the Pathways Remote Python feature.

This is an initial PR that will set the precedent for future recipes. It is designed to be shareable and have common functionality (like flag parsing) that can be shared across recipes, but flexible and standard so that multiple team members can simply run the file with minimal configuration changes. It's designed to avoid superfluous flag adding and continue pythonically using configs.

This recipe in particular is for testing functionality for remote python.

Tests

Tested this change on a test remote python sidecar server on a v6e and a v5e shared test cluster.

Sibling XPK change: AI-Hypercomputer/xpk#326

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@SujeethJinesh SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch 3 times, most recently from fd538eb to 2f8124f Compare January 23, 2025 23:41
@SujeethJinesh SujeethJinesh changed the title [Draft] Remote Python Support for Benchmark Runner Remote Python Support for Benchmark Runner Jan 24, 2025
@SujeethJinesh SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch from c9df323 to 443586b Compare January 25, 2025 01:08
@SujeethJinesh SujeethJinesh marked this pull request as ready for review January 25, 2025 01:08
@SujeethJinesh SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch from 60a5bd5 to 443586b Compare January 26, 2025 18:05
@SujeethJinesh SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch 3 times, most recently from 059e78e to 75167f0 Compare January 28, 2025 22:52
@SujeethJinesh SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch from 75167f0 to 6427792 Compare January 29, 2025 01:10
"enable_checkpointing": False,
# "profiler": "xplane",

# Additional tuning params for pathways long running test.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we can create a programmatic way of adding pathways params without needing to duplicate the config - for example default_basic_1 is sharable with mcjax and pathways and when the user has a pathways_config applied, it sets the pathways specific flags? I think the main goal to help solve is share a config between mcjax and pathways - and avoid the need to duplicate updates.

Curious what you think of this idea?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For right now there are only a few flags luckily that are pathways specific (e.g. enable_pathways_goodput), but we do often test using a lot more features enabled. A rough ideas I have is to separate the specific model tuning params from the workload params - e.g. like goodput, checkpointing, metrics, etc. That way the models can be the same, but the workload params can be suited to the specific type. Maybe worth expanding on if you think it's a good initial idea?

@SujeethJinesh SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch 2 times, most recently from ddfe758 to 895e7c7 Compare January 30, 2025 21:20
@SujeethJinesh SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch from 895e7c7 to e268cc4 Compare January 30, 2025 23:28
Remote Python Support for Benchmark Runner

Add Remote Python Recipe

Clean up

Add Delete Capabilities

Minor edits

Fix

Fix problems

Changes

Changes

Fixes

Undo Weird Merge

Save work

Break out args helper

Fix file imports

Move around stuff

cleanup

Fix

Clean up comment

whitespace

Fix issues

Fix lint
@SujeethJinesh SujeethJinesh force-pushed the sujinesh/remote_python_benchmark_runner branch from e268cc4 to f18828a Compare January 30, 2025 23:29
@copybara-service copybara-service bot merged commit 26fa593 into main Jan 30, 2025
16 checks passed
@copybara-service copybara-service bot deleted the sujinesh/remote_python_benchmark_runner branch January 30, 2025 23:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants