Add seff to Toil Slurm? #4501

Open
adamnovak opened this issue Jun 15, 2023 · 5 comments

@adamnovak
Member

adamnovak commented Jun 15, 2023

The seff command can get statistics about how much CPU and memory a job actually used.

It might be useful for Toil to expose this information in its logs somehow when running on Slurm, either to diagnose OOM or to diagnose excessively large job resource allocations.
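As a rough illustration of what collecting this could look like, here is a minimal sketch that runs `seff` for one Slurm job ID and splits its output into fields. The function names `parse_seff_output` and `seff_report` are hypothetical and not part of Toil, and the sample text only assumes the usual "Key: Value" layout that `seff` prints, which may differ between Slurm versions.

```python
import subprocess


def parse_seff_output(text: str) -> dict:
    """Split seff's "Key: Value" lines into a dict, skipping any other lines."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so time values like 00:10:00 stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def seff_report(slurm_job_id: int) -> dict:
    """Run seff for one Slurm job (hypothetical helper).

    Assumes seff is on PATH and Slurm job accounting is enabled.
    """
    result = subprocess.run(
        ["seff", str(slurm_job_id)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_seff_output(result.stdout)


# Sample text in the shape seff typically prints; the values are made up.
SAMPLE = """\
Job ID: 12345
State: COMPLETED (exit code 0)
Cores per node: 4
CPU Utilized: 00:10:00
CPU Efficiency: 25.00% of 00:40:00 core-walltime
Memory Utilized: 1.50 GB
Memory Efficiency: 37.50% of 4.00 GB
"""

if __name__ == "__main__":
    for key, value in parse_seff_output(SAMPLE).items():
        print(f"{key} = {value}")
```

Keeping the parsing separate from the `subprocess` call means the parsing can be exercised without a Slurm cluster available.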

┆Issue is synchronized with this Jira Story
┆Epic: Improve debugging experience
┆Issue Number: TOIL-1349

@DailyDreaming
Member

Toil could perhaps have an alert system for when resource usage is too high. Keeping the ratio of spurious to useful alerts low may be difficult, though. We should discuss this later, but it might be low priority for now.

@mr-c
Contributor

mr-c commented Jun 29, 2023

Yes, this would be good to have, and to report as part of the RO-Crate WorkflowRun provenance.

@adamnovak
Member Author

This might also be useful for Dockstore, as part of a workflow analytics report-back feature or a bigger system.

@unito-bot

➤ Adam Novak commented:

And it would be good for Toil to warn you if your jobs are over-provisioned and wasting cluster CPU and memory, rather than you having to manually guess which Slurm job is which and ask Slurm about them all.
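A sketch of how such a warning might be derived, assuming Toil already holds the "CPU Efficiency" and "Memory Efficiency" strings that `seff` reports for a job. The names `efficiency_percent` and `overprovision_warnings` and the 50% default threshold are hypothetical choices for illustration, not existing Toil behavior.

```python
import re
from typing import List, Optional


def efficiency_percent(value: str) -> Optional[float]:
    """Extract the leading percentage from a string like '25.00% of 4.00 GB'."""
    match = re.match(r"\s*([0-9]+(?:\.[0-9]+)?)%", value)
    return float(match.group(1)) if match else None


def overprovision_warnings(
    job_name: str,
    cpu_efficiency: str,
    memory_efficiency: str,
    threshold: float = 50.0,  # hypothetical cutoff; would need tuning
) -> List[str]:
    """Return one warning per resource whose efficiency is below the threshold."""
    warnings = []
    for resource, raw in (("CPU", cpu_efficiency), ("memory", memory_efficiency)):
        percent = efficiency_percent(raw)
        # Skip values that could not be parsed instead of guessing.
        if percent is not None and percent < threshold:
            warnings.append(
                f"Job {job_name} used only {percent:.1f}% of its allocated {resource}"
            )
    return warnings


if __name__ == "__main__":
    for message in overprovision_warnings(
        "align", "25.00% of 00:40:00 core-walltime", "80.00% of 4.00 GB"
    ):
        print(message)
```

Reporting by Toil job name rather than Slurm job ID is the point here: the user should not have to map one to the other by hand.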

@unito-bot

➤ Adam Novak commented:

We should put this into toil stats or figure out if it is redundant with stuff toil stats already does.
