-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add FLUX_JOB_RANKS
environment variable to prolog and epilog
#6670
Conversation
3555765
to
d742f91
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just one minor nit.
@@ -870,6 +881,8 @@ static struct perilog_proc *procdesc_run (flux_t *h, | |||
return proc; | |||
error: | |||
idset_destroy (ranks); | |||
idset_destroy (job_ranks); | |||
free (rank_str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrap free in ERRNO_SAFE_WRAP or equivalent? Looks like callers may expect errno to be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. This also made me realize job_ranks
and rank_str
are leaked in the success case! 🤦
Fixed and forced a push.
Problem: A return statement in procdesc_run() should have returned -1, but due to a typo the statement is `return 01;`. Fix the typo so the function returns -1 on failure.
Problem: The prolog and epilog scripts may need to determine which node within the job they're running on, but this requires fetching the job's R or data from the job-list module, which is not a scalable operation. Provide the ranks which the job was assigned in a `FLUX_JOB_RANKS` environment variable to the prolog and epilog. This contains enough information to derive other attributes of the current node without fetching data from a central location (e.g. by using the locally available hostlist attribute) Fixes flux-framework#6668
Problem: No tests ensure the perilog plugin adds a FLUX_JOB_RANKS environment variable to the prolog/epilog environment. Add a test to t2274-manager-perilog-per-rank.t that ensures this environment variable has the expected value in the prolog.
Problem: FLUX_JOB_RANKS, as set in the prolog and epilog environment, is not documented. Document this environment variable in flux-environment(7).
Thanks! setting MWP. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6670 +/- ##
=======================================
Coverage 83.86% 83.86%
=======================================
Files 534 534
Lines 88929 88939 +10
=======================================
+ Hits 74577 74586 +9
- Misses 14352 14353 +1
|
As noted in #6668 it would be useful to supply the job prolog and epilog with enough information for these scripts to determine on what node of the job they're running.
This PR adds a
FLUX_JOB_RANKS
environment variable for this purpose. While there's no flux utility for manipulating idsets,FLUX_JOB_RANKS
can be used along with thehostlist
attribute to get the job hostlist, from which the script can test if it is running on a specific rank. (I plan to add a-F, --find=HOST
option toflux hostlist
soon to make this a bit more useful.