This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Batch job submission failed: Invalid job array specification #31

Open
swgu98 opened this issue Dec 13, 2021 · 3 comments

Comments

swgu98 commented Dec 13, 2021

Hi, when I run "python -m cc_net", I get this error:

Submitting _hashes_shard in a job array (1600 jobs)
sbatch: error: Batch job submission failed: Invalid job array specification
subprocess.CalledProcessError: Command '['sbatch', '/data/gsw/test/cc_net/data/logs/submission_file_479eba35e148432da4432891c1191887.sh']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data/gsw/test/cc_net/cc_net/__main__.py", line 18, in <module>
main()
File "/data/gsw/test/cc_net/cc_net/__main__.py", line 14, in main
func_argparse.parse_and_call(cc_net.mine.get_main_parser())
File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/func_argparse/__init__.py", line 72, in parse_and_call
return command(**parsed_args)
File "/data/gsw/test/cc_net/cc_net/mine.py", line 632, in main
all_files = mine(conf)
File "/data/gsw/test/cc_net/cc_net/mine.py", line 335, in mine
hashes_groups = list(jsonql.grouper(hashes(conf), conf.hash_in_mem))
File "/data/gsw/test/cc_net/cc_net/mine.py", line 263, in hashes
ex(_hashes_shard, repeat(conf), *_transpose(missing_outputs))
File "/data/gsw/test/cc_net/cc_net/execution.py", line 89, in map_array_and_wait
jobs = ex.map_array(function, *args)
File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/submitit/core/core.py", line 701, in map_array
return self._internal_process_submissions(submissions)
File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/submitit/auto/auto.py", line 218, in _internal_process_submissions
return self._executor._internal_process_submissions(delayed_submissions)
File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/submitit/slurm/slurm.py", line 332, in _internal_process_submissions
first_job: core.Job[tp.Any] = array_ex._submit_command(self._submitit_command_str)
File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/submitit/core/core.py", line 864, in _submit_command
output = utils.CommandFunction(command_list, verbose=False)() # explicit errors
File "/home/gsw/anaconda3/envs/test_p/lib/python3.9/site-packages/submitit/core/utils.py", line 350, in __call__
raise FailedJobError(stderr) from subprocess_error
submitit.core.utils.FailedJobError: sbatch: error: Batch job submission failed: Invalid job array specification

gwenzek (Contributor) commented Dec 29, 2021

This seems to be an issue with your SLURM cluster.
Can you share the "submission_file.sh" created by submitit?
Does your SLURM cluster support job arrays?
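
One possible way to check this, as a rough sketch rather than a confirmed diagnosis: Slurm rejects a job array whose highest task index is not below the cluster's MaxArraySize setting (1001 by default in slurm.conf), and the log above shows a 1600-job array. The helper names below (`parse_max_array_size`, `array_fits`, `read_cluster_config`) are hypothetical and not part of cc_net or submitit; the only external command used is `scontrol show config`.

```python
import re
import subprocess
from typing import Optional


def parse_max_array_size(scontrol_output: str) -> Optional[int]:
    """Extract MaxArraySize from the text printed by `scontrol show config`."""
    match = re.search(r"^\s*MaxArraySize\s*=\s*(\d+)", scontrol_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def array_fits(num_jobs: int, max_array_size: int) -> bool:
    """True if an array with indices 0..num_jobs-1 respects MaxArraySize.

    Slurm requires the largest task index to be strictly below MaxArraySize.
    """
    return num_jobs - 1 < max_array_size


def read_cluster_config() -> str:
    """Run `scontrol show config` (requires a working Slurm installation)."""
    result = subprocess.run(
        ["scontrol", "show", "config"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a machine with Slurm configured, `parse_max_array_size(read_cluster_config())` should return the limit, and `array_fits(1600, limit)` indicates whether a 1600-job array like the one in the log could be accepted.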

swgu98 (Author) commented Jan 3, 2022

> This seems to be an issue with your SLURM cluster. Can you share the "submission_file.sh" created by submitit? Does your SLURM cluster support job arrays?

Sorry, Slurm is not installed on my computer. I think this may be the reason.

peter-ch commented

I installed and configured Slurm, but I still get this error:

Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/mnt/md0/cc_net/cc_net/__main__.py", line 18, in <module>
main()
File "/mnt/md0/cc_net/cc_net/__main__.py", line 14, in main
func_argparse.parse_and_call(cc_net.mine.get_main_parser())
File "/usr/local/lib/python3.8/dist-packages/func_argparse/__init__.py", line 72, in parse_and_call
return command(**parsed_args)
File "/mnt/md0/cc_net/cc_net/mine.py", line 631, in main
all_files = mine(conf)
File "/mnt/md0/cc_net/cc_net/mine.py", line 334, in mine
hashes_groups = list(jsonql.grouper(hashes(conf), conf.hash_in_mem))
File "/mnt/md0/cc_net/cc_net/mine.py", line 263, in hashes
ex(_hashes_shard, repeat(conf), *_transpose(missing_outputs))
File "/mnt/md0/cc_net/cc_net/execution.py", line 89, in map_array_and_wait
jobs = ex.map_array(function, *args)
File "/usr/local/lib/python3.8/dist-packages/submitit/core/core.py", line 771, in map_array
return self._internal_process_submissions(submissions)
File "/usr/local/lib/python3.8/dist-packages/submitit/auto/auto.py", line 218, in _internal_process_submissions
return self._executor._internal_process_submissions(delayed_submissions)
File "/usr/local/lib/python3.8/dist-packages/submitit/slurm/slurm.py", line 332, in _internal_process_submissions
first_job: core.Job[tp.Any] = array_ex._submit_command(self._submitit_command_str)
File "/usr/local/lib/python3.8/dist-packages/submitit/core/core.py", line 934, in _submit_command
output = utils.CommandFunction(command_list, verbose=False)() # explicit errors
File "/usr/local/lib/python3.8/dist-packages/submitit/core/utils.py", line 352, in __call__
raise FailedJobError(stderr) from subprocess_error
submitit.core.utils.FailedJobError: sbatch: error: Batch job submission failed: Invalid job array specification
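
If the cause is an array larger than the cluster allows (Slurm's MaxArraySize, 1001 by default, which is an assumption here since the thread never confirms it), one possible workaround is to submit the work in several smaller arrays. The sketch below is generic: `chunks` and `submit_in_batches` are hypothetical helpers, not functions from cc_net or submitit, and `submit_array` stands in for whatever call actually submits one array.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunks(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def submit_in_batches(
    submit_array: Callable[[Sequence[T]], Sequence[R]],
    items: Sequence[T],
    max_array_size: int,
) -> List[R]:
    """Call `submit_array` once per batch so no single array exceeds the limit."""
    results: List[R] = []
    for batch in chunks(items, max_array_size):
        # Each batch becomes its own job array with indices 0..len(batch)-1.
        results.extend(submit_array(batch))
    return results
```

With submitit, `submit_array` could plausibly be something like `lambda batch: executor.map_array(fn, batch)`, where `executor` and `fn` are placeholders for your own executor and function. Raising MaxArraySize in slurm.conf is the alternative if you administer the cluster.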
