
Implementation of SGE interface #43

Open
wants to merge 37 commits into base: develop

Conversation

QuantumChemist

Will close Matgenix/jobflow-remote#159 in jobflow-remote.

@QuantumChemist QuantumChemist marked this pull request as ready for review August 7, 2024 15:33
@QuantumChemist
Author

QuantumChemist commented Aug 7, 2024

Hi @ml-evs 😃
I have finished the first version of the implementation, but it's very hacky because I relied a lot on ChatGPT, so I would be really glad about some feedback! 😁

And as none of our clusters use SGE, I wanted to at least check that the code produces the correct submission script before I pass it on for testing.
So I wanted to ask whether there is something else I have to adjust in jobflow-remote,

because when I run the following submission test:

from jobflow_remote.utils.examples import add
from jobflow_remote import submit_flow
from jobflow import Flow

job1 = add(1, 2)
job2 = add(job1.output, 2)

flow = Flow([job1, job2])

resources = {"cwd": "",
            "model": "smp 128",
            "job_name": "Autoplex_jf_test",
            "qout_path": "$JOB_ID.log",
            "qerr_path": "$JOB_ID.err",
            "device": "cpu",
            "soft_walltime": "05:00:00"}

print(submit_flow(flow, worker="auto_worker", resources=resources, project="auto"))

I run into the following problem:

╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│     created_on = '2024-08-07 16:45'                                                                                                                        │
│          db_id = '2632'                                                                                                                                    │
│          index = 1                                                                                                                                         │
│       metadata = {}                                                                                                                                        │
│           name = 'add'                                                                                                                                     │
│        parents = []                                                                                                                                        │
│ previous_state = 'UPLOADED'                                                                                                                                │
│       priority = 0                                                                                                                                         │
│         remote = {                                                                                                                                         │
│                      'step_attempts': 3,                                                                                                                   │
│                      'retry_time_limit': '2024-08-07 17:11',                                                                                               │
│                      'error': Traceback (most recent call last):                                                                                           │
│                    File "/home/certural/miniconda3/envs/auto/lib/python3.10/site-packages/jobflow_remote/jobs/jobcontroller.py", line 3333, in             │
│                  lock_job_for_update                                                                                                                       │
│                      **kwargs,                                                                                                                             │
│                    File "/home/certural/miniconda3/envs/auto/lib/python3.10/site-packages/jobflow_remote/jobs/runner.py", line 538, in advance_state       │
│                    File "/home/certural/miniconda3/envs/auto/lib/python3.10/site-packages/jobflow_remote/jobs/runner.py", line 685, in submit              │
│                    File "/home/certural/miniconda3/envs/auto/lib/python3.10/site-packages/jobflow_remote/remote/queue.py", line 155, in submit             │
│                      options=options,                                                                                                                      │
│                    File "/home/certural/miniconda3/envs/auto/lib/python3.10/site-packages/jobflow_remote/remote/queue.py", line 186, in                    │
│                  write_submission_script                                                                                                                   │
│                      options=options,                                                                                                                      │
│                    File "/home/certural/miniconda3/envs/auto/lib/python3.10/site-packages/jobflow_remote/remote/queue.py", line 104, in                    │
│                  get_submission_script                                                                                                                     │
│                      return ""                                                                                                                             │
│                    File "/home/certural/miniconda3/envs/auto/lib/python3.10/site-packages/qtoolkit/io/base.py", line 57, in get_submission_script          │
│                      if header := self.generate_header(options):                                                                                           │
│                    File "/home/certural/miniconda3/envs/auto/lib/python3.10/site-packages/qtoolkit/io/base.py", line 88, in generate_header                │
│                      raise ValueError(msg)                                                                                                                 │
│                  ValueError: The following keys are not present in the template: cwd, device, model, soft_walltime                                              │
│                                                                                                                                                            │
│                  }                                                                                                                                         │
│        run_dir = '/home/certural/Calcs_new/67/c5/8f/67c58fcb-d847-4fa1-8d29-9262ce0efa92_1'                                                                │
│          state = 'REMOTE_ERROR'                                                                                                                            │
│     updated_on = '2024-08-07 17:11'                                                                                                                        │
│           uuid = '67c58fcb-d847-4fa1-8d29-9262ce0efa92'                                                                                                    │
│         worker = 'bam_worker'                                                                                                                              │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Thank you a lot for your help and feedback!
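For context on the ValueError: qtoolkit builds the submission header by substituting the resource keys into the selected scheduler's header template, and generate_header (qtoolkit/io/base.py in the traceback above) rejects any key that has no matching placeholder in that template. A minimal sketch of that kind of check, using a deliberately incomplete, hypothetical template rather than qtoolkit's real one:

import re

# Hypothetical SGE-style header template; the real templates live in qtoolkit/io/.
header_template = """#$ -N ${job_name}
#$ -o ${qout_path}
#$ -e ${qerr_path}
"""

# The resources dict from the example above.
options = {
    "cwd": "",
    "model": "smp 128",
    "job_name": "Autoplex_jf_test",
    "qout_path": "$JOB_ID.log",
    "qerr_path": "$JOB_ID.err",
    "device": "cpu",
    "soft_walltime": "05:00:00",
}

# Only keys that appear as ${...} placeholders in the template are accepted.
template_keys = set(re.findall(r"\$\{(\w+)\}", header_template))
unknown = sorted(set(options) - template_keys)
if unknown:
    print("The following keys are not present in the template:", ", ".join(unknown))
    # -> cwd, device, model, soft_walltime

In other words, the error means the template actually used for header generation (apparently not the new SGE one in this case) has no placeholders for cwd, device, model and soft_walltime.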

@ml-evs
Member

ml-evs commented Aug 7, 2024

Hi @QuantumChemist, nice work! I'll try to find the time to give some feedback before the end of the week. Thanks!

@ml-evs ml-evs self-requested a review August 15, 2024 22:58
Member

@ml-evs ml-evs left a comment

Fantastic work @QuantumChemist! Sorry for the delay in giving you some feedback. We'll have to wait for @gpetretto to have a proper look, I think, but this is already in great shape.

Only one request (and a few minor comments below): could you add a test for submission script generation, similar to what I did for slurm in #45? Ideally this would use the same "maximalist" qresources, though it will probably support different features.

I guess (or hope!) you are adding this because you would like to use jobflow-remote with SGE. Are you able to test this version of QToolkit with your current jobflow setup, or do we need some patches in jobflow-remote too? If you are, I can carry on the work I started to hopefully test both PBS and SGE directly inside the jobflow-remote CI.
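A minimal sketch of what such a test could look like (field names and assertions are illustrative only; the real test should mirror the slurm test and the shared "maximalist" QResources fixture from #45):

from qtoolkit.core.data_objects import QResources
from qtoolkit.io.sge import SGEIO


def test_submission_script():
    # Illustrative resources; the real test should use the shared maximalist fixture.
    resources = QResources(
        queue_name="main",
        job_name="test_job",
        processes=8,
        time_limit=3600,
    )
    script = SGEIO().get_submission_script("echo 'test'", options=resources)
    # The exact directives depend on the SGE header template in this PR; at minimum
    # the job name and the command should appear in the generated script.
    assert "test_job" in script
    assert "echo 'test'" in script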

src/qtoolkit/io/sge.py: 2 outdated review comments (resolved)
Comment on lines 513 to 525
supported += [
    "njobs",
    "memory_per_thread",
    "time_limit",
    "processes",
    "processes_per_node",
    "process_placement",
    "nodes",
    "threads_per_process",
    "email_address",
    "scheduler_kwargs",
    "gpus_per_job",
]
Member

Not a problem for this PR, but since #44 I've been wondering if we can programmatically enforce that the fields listed here are definitely QResources members, e.g. by specifying an abstract reference to QResources.njobs and so on, rather than unchecked strings.
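One way this could look, as a sketch only (it assumes QResources is a dataclass, and the helper name is made up, not part of this PR):

from dataclasses import fields

from qtoolkit.core.data_objects import QResources


def check_supported_qresources_keys(supported: list[str]) -> None:
    """Hypothetical guard: fail fast if a listed name is not an actual QResources field."""
    valid = {f.name for f in fields(QResources)}
    unknown = set(supported) - valid
    if unknown:
        raise ValueError(f"Not QResources fields: {sorted(unknown)}")


check_supported_qresources_keys(["njobs", "time_limit", "processes_per_node"])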

Author

I'm happy to implement it that way :)

Contributor

Good point! I agree that it can be for another PR, since all the objects should be adapted.

Author

@ml-evs @gpetretto so shall I rather leave this for another PR?

Author

I added this now :)

Contributor

Sorry, I am not sure: was something changed here?

Author

Sorry, that was a bit confusing. I clicked through the PRs and have added this
https://github.com/QuantumChemist/qtoolkit/blob/573641ec772508c71f1f02101902976bf0e1ab80/tests/io/test_sge.py#L240
for now.

@ml-evs ml-evs requested a review from gpetretto August 16, 2024 10:21
@QuantumChemist
Author

Fantastic work @QuantumChemist! Sorry for the delay in giving you some feedback. We'll have to wait for @gpetretto to have a proper look, I think, but this is already in great shape.

Only one request (and a few minor comments below): could you add a test for submission script generation, similar to what I did for slurm in #45? Ideally this would use the same "maximalist" qresources, though it will probably support different features.

I guess (or hope!) you are adding this because you would like to use jobflow-remote with SGE. Are you able to test this version of QToolkit with your current jobflow setup, or do we need some patches in jobflow-remote too? If you are, I can carry on the work I started to hopefully test both PBS and SGE directly inside the jobflow-remote CI.

Hey @ml-evs! 😃
Thank you so much for your feedback.
I'm all for adding a test as in #45.

So far, I couldn't test it with jobflow-remote: although I put sge as the scheduler_type, I still get the problem described above, and I haven't had time yet to investigate it further.

I also have a few days off and will try to make the changes when I'm back!

Co-authored-by: Matthew Evans <[email protected]>
Contributor

@gpetretto gpetretto left a comment

Thanks a lot for adding SGE support.
I have left some comments below on points that might need updating.
The main issue is that if SGE really does not allow querying jobs by a list of ids, we will need to update jobflow-remote, since it currently expects to be able to use that functionality. If you can confirm that this is the case, I will start thinking about how to adapt it. It would either ask for all the jobs (which would likely be inefficient), or I would need to add an option in jobflow-remote to query the jobs based on the user name.
Unfortunately I don't have access to a cluster with SGE, so I will not be able to run tests easily. @ml-evs do you have a docker with SGE?

Concerning your initial error in jobflow-remote, it is not obvious what could be going wrong. I would first check whether it is really using the SGEIO object.
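For illustration, a rough sketch of what a username-based listing could look like on the SGE side (not the PR's actual code; SGE's qstat accepts -u to filter by user and -xml for machine-readable output):

def sge_jobs_list_cmd(user: str | None = None) -> str:
    """Build a qstat command that lists jobs, optionally filtered by user."""
    cmd = "qstat -xml"  # XML output is easier to parse than the default table
    if user:
        cmd += f" -u {user}"
    return cmd


print(sge_jobs_list_cmd("certural"))  # qstat -xml -u certural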

src/qtoolkit/io/sge.py: 8 outdated review comments (resolved)
Comment on lines 513 to 525
supported += [
    "njobs",
    "memory_per_thread",
    "time_limit",
    "processes",
    "processes_per_node",
    "process_placement",
    "nodes",
    "threads_per_process",
    "email_address",
    "scheduler_kwargs",
    "gpus_per_job",
]
Contributor

Good point! I agree that it can be for another PR, since all the objects should be adapted.

src/qtoolkit/io/sge.py: outdated review comment (resolved)
@QuantumChemist
Author

Hi @gpetretto,
Thank you a lot for having a look at my PR. I think I will need quite a while to properly work through everything.
One problem I have is that I don't have access to a system with SGE (as I'm not implementing this for myself), but I can ask people to test it. Therefore I just started from what I found in the aiida and SGE manuals.

@ml-evs
Member

ml-evs commented Aug 21, 2024

Hi @gpetretto . Thank you a lot for having a look at my PR. I think I will need quite a while to properly work through everything. One problem that I have is that I don't have access to a system with SGE (as I'm not implementing this for myself) but I can ask people to test it. Therefore I just started with what I have found in aiida and SGE manuals.

It's taking a bit longer than expected to generalise the test docker container in jobflow-remote for other queues, but I'll keep you posted here too.

@gpetretto
Contributor

Thanks both.
@QuantumChemist in the meantime, can you check with someone who has access to SGE whether it is actually possible to query with qstat using a list of job ids? I have sketched an implementation in jobflow-remote that relies on the username, and it seems feasible. If it is not possible to query by job ids, I will finalize it.

@QuantumChemist
Author

Thanks both. @QuantumChemist in the meantime, can you check with someone who has access to SGE whether it is actually possible to query with qstat using a list of job ids? I have sketched an implementation in jobflow-remote that relies on the username, and it seems feasible. If it is not possible to query by job ids, I will finalize it.

Sorry @gpetretto, this message slipped my attention and I have only just seen it! I will ask someone to try this. Thank you!

@QuantumChemist
Author

Thanks both. @QuantumChemist in the meantime, can you check with someone who has access to SGE whether it is actually possible to query with qstat using a list of job ids? I have sketched an implementation in jobflow-remote that relies on the username, and it seems feasible. If it is not possible to query by job ids, I will finalize it.

Hi @gpetretto 👋🏻

So I asked someone whether it's possible to query with a list of job IDs, and it's not directly possible, only with a bash workaround (job_ids=(202354 202353); for job_id in "${job_ids[@]}"; do qstat -j $job_id; done). Querying a single job ID looks like:

[screenshot: qstat -j output for a single job ID]

@gpetretto
Contributor

Thanks @QuantumChemist.
I will then add the option based on the username to jobflow-remote. Otherwise even with this implementation, SGE would not be usable.

@QuantumChemist
Author

Thanks @QuantumChemist. I will then add the option based on the username to jobflow-remote. Otherwise even with this implementation, SGE would not be usable.

Thank you a lot! :)

Contributor

@gpetretto gpetretto left a comment

Thanks @QuantumChemist for updating the code.
I left a few more comments.

Concerning #43 (comment) I think we can indeed address this in another PR. I will mark it as resolved.

For TestSlurmIO::test_submission_script, I think the problem is that the maximalist_qresources fixture is defined with scope="session" but is then modified inside the tests (both SGE and SLURM). Maybe just remove the scope and let it use the default function scope (see the sketch below). I suppose this should fix the issue.
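A sketch of the suggested fixture change (the field values are illustrative, not the actual fixture from #45):

import pytest

from qtoolkit.core.data_objects import QResources


@pytest.fixture  # default function scope: each test gets a fresh object to mutate
def maximalist_qresources():
    return QResources(
        queue_name="test_queue",
        job_name="test_job",
        nodes=1,
        processes=4,
        time_limit=3600,
    )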

raise ValueError(f"Cannot query by user and job(s) in {self.get_system_name()}")

@abc.abstractmethod
def _get_base_command(self) -> list[str]:
Contributor

Maybe this could have a more explicit name? Something like _get_status_base_command or _get_list_base_command. Any other option that clarifies the usage would be fine, because at first it is not clear what the "base" is for.

Author

sure, I will rename it!


return v * (1024 ** power_labels[units.lower()])

_qresources_mapping = {
Contributor

In moving this here I see that the line
"account": "account"
was lost.
If the list of keys is different between PBS and SGE, move _qresources_mapping to the subclasses with their own lists.

Author

@QuantumChemist QuantumChemist Sep 25, 2024

Thank you for catching that, that was not intended. I will add the account key back to PBS.

"""Add soft_walltime if required by child classes (SGE)."""

@abc.abstractmethod
def get_system_name(self) -> str:
Contributor

Maybe this can be a class attribute instead of a method?

power_labels = self.get_power_labels()

if not units:
    units = self.get_default_unit()
Contributor

It looks like this is used in the base class, but there is no abstract method for it.
Maybe it can be a class attribute?
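A standalone sketch of what the class-attribute variant could look like (illustrative names and values, not the PR's actual classes):

class SGEMemorySettings:
    """Plain class attributes replacing the abstract getter methods."""

    SYSTEM_NAME = "SGE"
    DEFAULT_UNIT = "M"
    POWER_LABELS = {"k": 0, "m": 1, "g": 2, "t": 3}  # exponent of 1024 per unit letter

    def convert_memory(self, value: float, units: str | None = None) -> float:
        units = units or self.DEFAULT_UNIT
        return value * (1024 ** self.POWER_LABELS[units.lower()])


print(SGEMemorySettings().convert_memory(2, "G"))  # 2 * 1024**2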

@QuantumChemist
Author

QuantumChemist commented Sep 25, 2024

Hey!
So this is what it looks like when my colleague uses my qtoolkit repo and the ml-evs/generalize-integration-tests jobflow-remote branch:
[screenshot: error output from the test run]

Did we maybe take the wrong jobflow-remote branch?

@gpetretto
Contributor

Are you using the ml-evs/generalize-integration-tests branch because you are using the SGE docker to test? Otherwise, if you needed to filter the SGE jobs by username you probably wanted to use this branch Matgenix/jobflow-remote#187 (I actually just merged it, so you can take the develop one if you want)

In any case, I would say that the error is not related to any update that was done in these branches of jobflow-remote. The error seems to suggest that it is trying to use PBSIO rather than SGEIO. Although I admit it is trivial, the first thing I would suggest checking is that the scheduler_type value in the configuration file has been switched to SGE.

@QuantumChemist
Author

Are you using the ml-evs/generalize-integration-tests branch because you are using the SGE docker to test? Otherwise, if you needed to filter the SGE jobs by username you probably wanted to use this branch Matgenix/jobflow-remote#187 (I actually just merged it, so you can take the develop one if you want)

In any case, I would say that the error is not related to any update that was done in these branches of jobflow-remote. The error seems to suggest that it is trying to use PBSIO rather than SGEIO. Although I admit it is trivial, the first thing I would suggest checking is that the scheduler_type value in the configuration file has been switched to SGE.

I think I simply used the wrong jobflow-remote branch. We will try with the one you just mentioned again. Thank you a lot!

@QuantumChemist
Author

Are you using the ml-evs/generalize-integration-tests branch because you are using the SGE docker to test? Otherwise, if you needed to filter the SGE jobs by username you probably wanted to use this branch Matgenix/jobflow-remote#187 (I actually just merged it, so you can take the develop one if you want)

In any case, I would say that the error is not related to any update that was done in these branches of jobflow-remote. The error seems to suggest that it is trying to use PBSIO rather than SGEIO. Although I admit it is trivial, the first thing I would suggest checking is that the scheduler_type value in the configuration file has been switched to SGE.

It seems we also forgot to change the scheduler type.

Now, when we want to switch the scheduler type to sge, we cannot restart the daemon. Do you have an idea what we could try?

@QuantumChemist
Author

@gpetretto I have now implemented the last changes 😃
Let me know if more adjustments are needed!

@gpetretto
Contributor

Hi @QuantumChemist, thanks for the updates. However, I have seen that you have moved the _qresources_mapping and the other methods to be attributes of the instances rather than of the classes. I know it will not change much from a practical point of view, but can you make them class attributes, for uniformity with the other ones?

Now, when we want to switch the scheduler type to sge, we cannot restart the daemon. Do you have an idea what we could try?

If it is not a problem with the configuration file (i.e. jf project check runs successfully), I would suggest looking at the logs in ~/.jfremote/PROJECT_NAME/log/runner.log to see if you can find any error, and/or running the runner with jf runner run. This will start the runner in the foreground. It is not really meant for standard execution, but it is a quick way to get the stacktrace if the runner fails for whatever reason.

@QuantumChemist
Author

Hi @gpetretto! Sorry for confusing those two things (I'm still learning a lot about Python object-oriented programming).
Is it OK like this?

@gpetretto
Contributor

Thanks @QuantumChemist. It should be fine now.
Did you manage to test this successfully together with jobflow-remote? Or are you still experiencing issues?

@QuantumChemist
Author

Thanks @QuantumChemist. It should be fine now. Did you manage to test this successfully together with jobflow-remote? Or are you still experiencing issues?

Thank you, too!
My colleague's server (with SGE) is down this week and I have some days off until next week, so I haven't been able to test it yet.

@ml-evs
Member

ml-evs commented Oct 11, 2024

Matgenix/jobflow-remote#160 (comment)

@QuantumChemist
Author

Matgenix/jobflow-remote#160 (comment)

I might check more closely with my colleague whether there is a problem with output writing.

@ml-evs
Member

ml-evs commented Oct 11, 2024

Thanks, there's almost certainly still problems at my end, but it would be good to verify that it's working for you before I dive into debugging again!

@QuantumChemist
Author

Thanks, there's almost certainly still problems at my end, but it would be good to verify that it's working for you before I dive into debugging again!

I will let you know when I have checked this with my colleague. I haven't had the time yet to continue this with her.

@QuantumChemist
Author

QuantumChemist commented Oct 24, 2024

Hi @gpetretto and @ml-evs 😃 ,

I have some good news!
Today I was finally able to extensively check my implementation with my colleague via videocall, and in principle the SGE implementation works. We are able to successfully run an "add" workflow for testing and everything completes fine (i.e. all job states are "COMPLETED" in the end).
But when we want to run a DoubleRelax VASP workflow, we are stuck at the following step:
[screenshot: jobflow-remote error for the stuck VASP job]

The strange thing about this is that the VASP job finishes fine and there is also no output in queue.err. There is also no other file that would indicate a segmentation fault etc. when executing VASP. We were not able to find out why exactly the job was killed. All the jobflow-remote related JSON files look perfectly fine from what I can tell.
It's a bit hard to say where this issue is rooted: in the cluster, qtoolkit, or jobflow-remote.
Do you have any ideas? Thank you a lot!

@ml-evs
Member

ml-evs commented Oct 24, 2024

I have some good news! Today I was finally able to extensively check my implementation with my colleague via videocall and in principle the SGE implementation works. We are able to successfully run a "add" workflow for testing and everything completes fine

Thanks for this, I might be seeing the same issue in my testing PR. Let me push a bit further with your latest changes and I'll try to get back to you!

@QuantumChemist
Author

I have some good news! Today I was finally able to extensively check my implementation with my colleague via videocall and in principle the SGE implementation works. We are able to successfully run a "add" workflow for testing and everything completes fine

Thanks for this, I might be seeing the same issue in my testing PR. Let me push a bit further with your latest changes and I'll try to get back to you!

Alright 👍🏻

@gpetretto
Contributor

gpetretto commented Oct 25, 2024

Hi @QuantumChemist, thanks for testing the new implementation.
I should say I have no idea why such an error would happen only when switching to atomate2 jobs. Does SGE have any command that provides details on a terminated job? Just to know whether it is indeed possible that SGE killed the job.
Just a few words to clarify the meaning of the error: jobflow-remote writes the output to a file Store in the run folder; for atomate2 you would expect to have a remote_job_data.json and an additional_store_data.json file in the folder. The rest of the response is written to the jfremote_out.json file. However, the jfremote_out.json is written first at the beginning, with just the starting time of the execution of the job: https://github.com/Matgenix/jobflow-remote/blob/82457776ed3ffa5fabca74daa4ee7af24cbaab4e/src/jobflow_remote/jobs/run.py#L40. The response is written at the end: https://github.com/Matgenix/jobflow-remote/blob/82457776ed3ffa5fabca74daa4ee7af24cbaab4e/src/jobflow_remote/jobs/run.py#L82.
If the VASP calculation completed, but the jfremote_out.json does not contain the response, the problem must have happened between the end of the simulation and the dumping of the file.
A few things that could be checked are:

  • does the remote_job_data.json contain the TaskDoc with the output of the VASP job?
  • does the jfremote_out.json indeed contain only the start_time? Or also something else, but not the response?
  • could it be that the final jfremote_out.json was written somewhere else, for example in the home directory? (I hardly believe so, but just to check)
  • have files like queue.err, queue.out, and the JSONs been zipped by the atomate2 job?
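As a quick way to check the second point, something along these lines could be run against the job folder (a sketch only: start_time and response are the keys described above, and the path is a placeholder):

import json
from pathlib import Path

run_dir = Path("/path/to/run_dir")  # placeholder: the job's run_dir on the cluster
data = json.loads((run_dir / "jfremote_out.json").read_text())

print("keys present:", sorted(data))
print("only start_time so far:", set(data) == {"start_time"})
print("response present:", "response" in data)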

@QuantumChemist
Author

Hi @gpetretto,

Thank you for your feedback! I will check the things you pointed out together with my colleague!

@QuantumChemist
Author

Hi @gpetretto,
My colleague has investigated the problem further with your tips, and here is what she says:

Hi Chris,

Just gone through the comments:

  • SGE command to get detailed job information: qstat -explain c -j JOB_ID. I used this to track the jobs whilst running, but it showed no weird signs as far as I can tell, and once the job is completed it no longer shows any information.
  • remote_job_data.json does seem to contain the TaskDoc he mentions, with the outputs e.g. free energy, time elapsed etc within it
  • I’m pretty sure the jfremote_out.json has not been written anywhere else, not sure how I’d extensively check this but can’t see any files anywhere else
  • jfremote_out.json does contain the start_time and the response
  • The queue.err, queue.out, and json files have not been zipped. All the VASP outputs have been zipped. (In the successful “add” workflows the err, out and json files were not zipped, so I assume this is a good sign)

However, I have been doing some more experimenting this evening and have managed to get both Python and VASP test runs working. I kept getting different errors today, looking like this:

[screenshot: REMOTE_ERROR raised while trying to download the output files]

And I think the issue here is that it was taking a very long time for jfremote_out.json to be written to the output folder after a calculation was completed, so the job would be marked as a REMOTE_ERROR, and then the out file would sometimes appear half an hour later. Running jf job retry -s REMOTE_ERROR after that seemed to fix the workflow.
It all seems a bit inconsistent though, and maybe dependent on how heavily Epsilon is being used at the time and so how long it takes to write the outputs.

From what I have seen when we tested it together, it looked to me like there was some delay in writing the files. I can't say if this comes from SGE or the cluster Epsilon. Do you have an idea how to handle the delay?

@gpetretto
Contributor

Hi @QuantumChemist,

thanks a lot for the detailed reply.
From what you report, it indeed seems related to a delay between the end of the SGE job and the availability of the files on the file system of the front end of the cluster. I suppose this might be caused by a slow/overloaded NFS for example.
The output files like jfremote_out.json are written by jobflow inside the SGE job. Jobflow-remote's runner will just check if the SGE job is completed and then proceed with the download of the files. So here you are hitting two different kinds of issues:

  1. if the job is very short (like in the last error), the jfremote_out.json is not present yet, which gives you that error while trying to download the files
  2. for a more normal job, the file is created at the beginning of the job and, since the job takes some time to execute, it gets synchronized. But if, when the job ends, it takes some time to synchronize the updated version, the stale file will be downloaded, so the "download" phase completes correctly, and the job then fails at the "complete" step because the response is missing.

If that is the problem, I can propose 2 solutions:

  1. in the project configuration you can add a post_run for the worker, and put a sleep there (a sketch follows after this list). This will be executed after the jfremote_out.json has been created, so it will give the system time to synchronize it. I have no clue how long this sleep needs to be, as it should probably be based on a reasonable estimate of how long the system takes. This will waste some resources, since the sleep will be inside the SGE job. However, if it is just a few seconds and that seems acceptable, then it could be the solution.
  2. If the synchronization could instead take some time and adding a sleep for all the jobs would end up wasting a lot of computational resources, I think we can consider adding a delay_download option to the worker configuration that postpones the start of the download by a specified number of seconds. I have an idea about how to do that, but I will need to check that it really works fine.
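As a sketch of option 1, the worker section of the project YAML could gain something like this (worker name and sleep duration are placeholders):

workers:
  sge_worker:
    post_run: sleep 60   # give the shared filesystem time to sync jfremote_out.json before download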

I would ask you to try option 1 first. This should confirm that the issue is indeed the FS delay and probably give you a rough estimate of the waiting time required. If this is a real use case, we can then consider implementing the option. Would that be fine?

In any case, this definitely seems to be a jobflow-remote issue and not related to your SGE implementation. If you confirm that the problem is the delay, I will open an issue there to discuss it further. This should not be a reason to hold off merging this PR.
@ml-evs any update on your side about the tests?

Successfully merging this pull request may close these issues.

How could I use SGE for job submission?
3 participants