Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add FLUX_JOB_RANKS environment variable to prolog and epilog #6670

Merged
merged 4 commits into from
Mar 2, 2025

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Feb 27, 2025

As noted in #6668 it would be useful to supply the job prolog and epilog with enough information for these scripts to determine on what node of the job they're running.

This PR adds a FLUX_JOB_RANKS environment variable for this purpose. While there's no flux utility for manipulating idsets, FLUX_JOB_RANKS can be used along with the hostlist attribute to get the job hostlist, from which the script can test if it is running on a specific rank. (I plan to add a -F, --find=HOST option to flux hostlist soon to make this a bit more useful.

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one minor nit.

@@ -870,6 +881,8 @@ static struct perilog_proc *procdesc_run (flux_t *h,
return proc;
error:
idset_destroy (ranks);
idset_destroy (job_ranks);
free (rank_str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap free in ERRNO_SAFE_WRAP or equivalent? Looks like callers may expect errno to be set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This also made me realize job_ranks and rank_str are leaked in the success case! 🤦

Fixed and forced a push.

grondo added 4 commits March 1, 2025 15:41
Problem: A return statement in procdesc_run() should have returned -1,
but due to a typo the statement is `return 01;`.

Fix the typo so the function returns -1 on failure.
Problem: The prolog and epilog scripts may need to determine which
node within the job they're running on, but this requires fetching
the job's R or data from the job-list module, which is not a
scalable operation.

Provide the ranks which the job was assigned in a `FLUX_JOB_RANKS`
environment variable to the prolog and epilog. This contains enough
information to derive other attributes of the current node without
fetching data from a central location (e.g. by using the locally
available hostlist attribute)

Fixes flux-framework#6668
Problem: No tests ensure the perilog plugin adds a FLUX_JOB_RANKS
environment variable to the prolog/epilog environment.

Add a test to t2274-manager-perilog-per-rank.t that ensures
this environment variable has the expected value in the prolog.
Problem: FLUX_JOB_RANKS, as set in the prolog and epilog environment,
is not documented.

Document this environment variable in flux-environment(7).
@grondo
Copy link
Contributor Author

grondo commented Mar 1, 2025

Thanks! setting MWP.

@mergify mergify bot merged commit 7c6746e into flux-framework:master Mar 2, 2025
34 of 35 checks passed
Copy link

codecov bot commented Mar 2, 2025

Codecov Report

Attention: Patch coverage is 66.66667% with 4 lines in your changes missing coverage. Please review.

Project coverage is 83.86%. Comparing base (5ee36b4) to head (b8001f0).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
src/modules/job-manager/plugins/perilog.c 66.66% 4 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6670   +/-   ##
=======================================
  Coverage   83.86%   83.86%           
=======================================
  Files         534      534           
  Lines       88929    88939   +10     
=======================================
+ Hits        74577    74586    +9     
- Misses      14352    14353    +1     
Files with missing lines Coverage Δ
src/modules/job-manager/plugins/perilog.c 82.64% <66.66%> (-0.04%) ⬇️

... and 9 files with indirect coverage changes

@grondo grondo deleted the perilog-ranks branch March 2, 2025 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants