Closed
Description
ATM, whenever a SLURM command goes wrong, babs crashes with something like
df_all_job_status = request_all_job_status(self.type_system)
File "/home/asmacdo/devel/babs/babs/utils.py", line 1896, in request_all_job_status
return _request_all_job_status_slurm()
File "/home/asmacdo/devel/babs/babs/utils.py", line 1937, in _request_all_job_status_slurm
squeue_out_df = _parsing_squeue_out(std)
File "/home/asmacdo/devel/babs/babs/utils.py", line 1977, in _parsing_squeue_out
raise Exception("error in the `squeue` output,"
Exception: error in the `squeue` output, expected jobid and got squeue:
which leaves you guessing what is happening. In this case it was because the original user did not exist inside the SLURM podman container:
[root@slurmctl /]# squeue -u blah
squeue: error: Invalid user: blah
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[root@slurmctl /]# squeue -u blah 2>/dev/null
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[root@slurmctl /]# echo $?
0
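so the error message goes to stderr, not stdout, and stdout is the only thing we want to parse. Note that squeue still exits with 0 here, so checking the exit code alone would not catch it. I think it would make more sense to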
- parse only stdout
- if there is any stderr from squeue, take it as an indication of an error and output the entire stderr for the user to digest, raising not just a bare Exception but a more descriptive one, e.g. RuntimeError
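A minimal sketch of the two points above, assuming the command is run via `subprocess`. The names `run_capture_checked` and `request_squeue_stdout` are hypothetical and are not the existing functions in babs/utils.py. stdout and stderr are captured separately, only stdout is returned for parsing, and any stderr output (or a non-zero exit code) raises a RuntimeError carrying the full stderr text:

```python
import shlex
import subprocess


def run_capture_checked(cmd):
    """Run `cmd` (a list of str) and return its stdout.

    Hypothetical helper: raises RuntimeError if the command exits
    non-zero or writes anything to stderr.
    """
    # capture_output=True keeps stdout and stderr in separate strings
    proc = subprocess.run(cmd, capture_output=True, text=True)
    stderr = proc.stderr.strip()
    if proc.returncode != 0 or stderr:
        # Show the whole stderr so the user sees what the command said
        raise RuntimeError(
            f"`{shlex.join(cmd)}` failed (exit code {proc.returncode}); "
            f"stderr:\n{stderr}"
        )
    return proc.stdout


def request_squeue_stdout(username):
    """Hypothetical wrapper: return squeue's stdout for one user."""
    return run_capture_checked(["squeue", "-u", username])
```

With this, the `Invalid user: blah` case would surface as a RuntimeError containing squeue's own message instead of a parsing failure. Treating any stderr output as fatal follows the proposal above; if squeue can also print harmless warnings to stderr, that check might need to be relaxed.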
@asmacdo might want to prep a quick PR