Remote Python Support for Benchmark Runner #1174
Conversation
"enable_checkpointing": False,
# "profiler": "xplane",
# Additional tuning params for pathways long running test.
I am wondering if we can create a programmatic way of adding pathways params without needing to duplicate the config. For example, default_basic_1 is sharable between mcjax and pathways, and when the user applies a pathways_config, it sets the pathways-specific flags. I think the main goal is to share a config between mcjax and pathways and avoid the need for duplicate updates.
Curious what you think of this idea?
Luckily, for right now there are only a few pathways-specific flags (e.g. enable_pathways_goodput), but we do often test with many more features enabled. A rough idea I have is to separate the specific model tuning params from the workload params (e.g. goodput, checkpointing, metrics, etc.). That way the models can be the same, but the workload params can be suited to the specific runtime type. Maybe worth expanding on if you think it's a good initial idea?
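The separation discussed above could be sketched roughly as follows. This is a hypothetical illustration, not code from the PR: all names (default_basic_1, pathways_workload, build_config) are made up here, with only enable_pathways_goodput and enable_checkpointing taken from the conversation. The idea is that model tuning params stay identical across runtimes, and runtime-specific workload flags are merged in only when a pathways config is applied.

```python
# Hypothetical sketch: share one model config between mcjax and pathways
# by splitting model tuning params from workload params and merging the
# pathways-specific flags programmatically instead of duplicating configs.
import copy

# Shared model tuning params (identical for mcjax and pathways).
default_basic_1 = {
    "per_device_batch_size": 4,
    "enable_checkpointing": False,
}

# Workload params, split by runtime type (goodput, metrics, etc.).
mcjax_workload = {}
pathways_workload = {
    "enable_pathways_goodput": True,
}

def build_config(model_params, pathways=False):
    """Merge shared model params with runtime-specific workload params."""
    config = copy.deepcopy(model_params)  # never mutate the shared config
    config.update(pathways_workload if pathways else mcjax_workload)
    return config
```

With this shape, a model-tuning update lands in one place (default_basic_1) and both runtimes pick it up, while pathways-only flags live in pathways_workload.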
Commits:
- Remote Python Support for Benchmark Runner
- Add Remote Python Recipe
- Clean up
- Add Delete Capabilities
- Minor edits
- Fix
- Fix problems
- Changes
- Changes
- Fixes
- Undo Weird Merge
- Save work
- Break out args helper
- Fix file imports
- Move around stuff cleanup
- Fix
- Clean up comment whitespace
- Fix issues
- Fix lint
Description
Adds a recipe for testing and eventually benchmarking the Pathways Remote Python feature.
This is an initial PR that will set the precedent for future recipes. It is designed to be shareable, with common functionality (like flag parsing) that can be reused across recipes, yet flexible and standard enough that multiple team members can simply run the file with minimal configuration changes. It avoids adding superfluous flags in favor of using configs Pythonically.
This recipe in particular is for testing functionality for remote python.
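The shared flag parsing mentioned above might look something like the sketch below. This is purely illustrative and not the PR's actual helper: the function name, flag names, and defaults are all assumptions. The point is that each recipe imports one common parser so team members can run a recipe file directly with minimal changes.

```python
# Hypothetical sketch of a shared args helper for benchmark recipes.
# Flag names and defaults are illustrative, not from the actual PR.
import argparse

def parse_common_args(argv=None):
    """Parse the flags every recipe shares; recipes keep their own configs."""
    parser = argparse.ArgumentParser(description="Benchmark recipe runner")
    parser.add_argument("--cluster", default="v6e-test",
                        help="Target test cluster")
    parser.add_argument("--zone", default="us-central1-a",
                        help="Cluster zone")
    parser.add_argument("--dry-run", action="store_true",
                        help="Print commands instead of executing them")
    return parser.parse_args(argv)
```

A recipe would then call parse_common_args() at the top of its main block and pass the result into its (Pythonic, config-driven) setup, rather than defining its own copy of these flags.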
Tests
Tested this change on a test remote python sidecar server on a v6e and a v5e shared test cluster.
Sibling XPK change: AI-Hypercomputer/xpk#326
Checklist
Before submitting this PR, please make sure (put X in square brackets):