Changing from signac-flow to row: Row status keeps completion status after completion file is deleted? #83
Yes, this is intentional behavior. Row is fast by default. Fully scanning the entire workspace tree for completion files on every invocation would decrease performance by many orders of magnitude. This section of the documentation addresses the use-case you describe, specifically the "You delete product files in a directory" case: https://row.readthedocs.io/en/0.4.0/guide/concepts/cache.html#completed-directories
After you delete a completion file (or many completion files), you can clear the cache and determine the new set of completed actions with `row clean --completed` followed by `row scan`. To prevent race conditions, you can only take these steps when there are no currently submitted jobs. If your workflow involves regularly invalidating the completion status of actions, consider removing the product entirely; then you can resubmit the job whenever you need to rerun it.
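Putting those steps together (run these only while no jobs are submitted, to avoid the race conditions mentioned above):

```shell
# Rebuild row's completion cache after deleting completion/product files.
row clean --completed   # drop the cached record of completed actions
row scan                # rescan the workspace for completion files
row show status         # statuses now reflect what is actually on disk
```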
The fundamental assumption that row makes is that once an action is complete on a directory, it will never need to be executed again. If you want dynamic scheduling (i.e. based on the phase of the moon), use a different tool. Think of it this way: say you have a 3-step workflow A->B->C. You run A. After A completes, B is eligible and you run it. Now C is eligible. But then you decide to change the completion condition for A. With dynamic completion conditions, both A and C are now eligible to run on the same directory. That violates the notion of C depending on B, which depends on A. Row can't completely prevent you from doing this, but it tries to keep the workflow in a logically consistent state as much as possible.

The row way of doing what you ask is to move that logic into the action script and touch a file when the completion condition is true. In the signac data model, you should create new statepoints to perform additional work; row will automatically detect the addition of new directories the next time it scans the workspace.

Touching a file appears to be a kludge at first glance (prefer to use the existence of actual output files whenever possible), but the only alternative would be a full-fledged client/server database. That is possible on some HPC systems and not on others. In all cases, I think it is impractical to ask researchers to configure and manage the security of a database system solely for the purpose of managing workflow actions. What row is doing is using the filesystem itself as the database.
Jobs are aggregated by default in row; see https://row.readthedocs.io/en/0.4.0/guide/tutorial/group.html. You actually need to take additional steps (a group setting) to submit each directory as its own job. You mention using groups to summarize the results of many directories (e.g. averaging over replicates). I wrote a whole howto on this topic: https://row.readthedocs.io/en/0.4.0/guide/howto/summarize.html#summarize-directory-groups-with-an-action. One solution for the "summary" jobs is to give them no product files. These actions will then always be available to rerun when needed, with no need to manually delete files or clear the cache.
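As a sketch, a no-product summary action might be declared like this in `workflow.toml`. The action name and command here are hypothetical, and the exact keys are based on my reading of the row 0.4 docs rather than a tested configuration:

```toml
[[action]]
name = "summarize"                                    # hypothetical name
command = "python actions.py summarize {directories}" # one job, many dirs
# No `products` key: with nothing to mark it complete, this action stays
# available to rerun -- no manual file deletion or cache clearing needed.

[action.group]
submit_whole = true   # submit the whole group as a single job
```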
Hello!
I am working to change from signac-flow to row for a simple tutorial and example, signac_numpy_tutorial, for the university and the general public. I noticed that when generating the "signac_job_document.json" file to track other calculated or site-specific variables, as done with signac-flow, it works OK and row correctly accepts it as completed (like signac-flow). However, when I delete the file and rerun "row show status", the completion status persists, even though the action is no longer complete.
In signac-flow, everything was rechecked every time via the "labels" (likely another reason it was slower). However, this poses an issue when you are adding more state points, more replicates, etc. Typically, when another workspace folder (state point) was created, I would just delete the combined analysis files or folder in the main project space (where project.py or actions.py lives) covering all the combined workspace folders and replicates. This would force the analysis to rerun, since it would then be marked incomplete.
It looks like this info is held in ".row/completed.postcard" and is static, even if files are deleted. Is the idea to use "row scan" and "row clean --completed" and check? This works, but it may result in errors, since row will not catch these changes by default and may skip parts that are supposed to run or rerun. Maybe I am missing the concept, though. Is there a reason not to scan every time? Is it too costly?
I would be happy to share my current row starting setup as needed; please just ask.
Is there a workaround I am missing that marks a job as incomplete when it is no longer complete or its completion marker file is deleted? Maybe I am doing something wrong?
Instead of just checking whether a specific file exists, is there a way to run a function that returns True/False, like signac-flow allows (example below)?
I will also be looking for a way to aggregate functions and jobs across replicates, or to combine all the jobs into one, etc.
Any other thoughts or comments?
Versions used via micromamba:
row 0.4.0 h0716509_0 conda-forge
signac 2.2.0 pyhd8ed1ab_1 conda-forge
signac-dashboard 0.6.1 pyhd8ed1ab_1 conda-forge