Add fastq-sync service #871
The task token here is to synchronise the "task success" event for a set of libraries?
Ah, the task token is there to be able to resume (possibly multiple) paused step functions (as those are waiting for a particular token to be passed to them)? I can't say I like that concept, but I can understand where it's coming from in that case. Would it be possible to simply fail the "check fastqs" part and repeat it each time a new "fastq available" event comes through? Note: conceptual thinking for the future, does not have to influence initial implementations!
It would only be able to pause a single step function, as the task token is bound to the task itself. It cannot be generated prior to the task (in this case, a put event).
The benefit here is that you already have the tumor/normal (T/N) coupling ready to go. If I'm pulling a normal out of archive to run against a new tumor sample for a patient with multiple tumors, how do we know what to pair when we get the fastq available event from the normal being thawed? With the task token syncing, the pairing happens at the samplesheet initialisation stage, where we have pairing knowledge for a given sequencing run. Fastqs will still be made available via the FastqAvailable event, and can (and will) be made available independent of any task tokens. This is essentially a wrapper around that service.
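For illustration, a minimal sketch of the token mechanics being described, with hypothetical Lambda, source, and field names (this is not the actual service code): the token only exists once the `.waitForTaskToken` task starts, and it is carried inside the very event that the task puts, which is why one token can only ever resume one paused execution.

```python
import json

import boto3

events = boto3.client("events")


def check_fastq_available_sync(event, context):
    """Backs a .waitForTaskToken task in a glue step function.

    The token is injected by the state machine at task start via
    "TaskToken.$": "$$.Task.Token" -- it cannot exist beforehand,
    so it is bound to exactly one task in one execution.
    """
    events.put_events(Entries=[{
        "Source": "orcabus.workflowglue",         # illustrative source
        "DetailType": "CheckFastqAvailableSync",
        "Detail": json.dumps({
            "libraryId": event["libraryId"],
            "taskToken": event["taskToken"],
        }),
    }])
    # Returning here does not resume anything; the execution stays
    # paused until send_task_success is called with this exact token.
```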
Understood. I am not sure I fully grasp the details here, so I was just wondering whether another, less coupled path would also be feasible, but I recognise that this may only be possible with additional setup/services in the future. E.g. the pairing would happen on a predefined trigger: a new sequencing run, the arrival of new data, etc. That's independent of the availability of any data, though.

So if I see this correctly, your current path would start the related workflows, which in turn would check the FASTQ availability. Depending on that, they might end up pausing their execution, waiting until the required file(s) become available. That "waiting" is realised via task tokens sent along with the availability check, and corresponding "release" events for each execution/token whenever a fastq is restored/becomes available.

My idea is very similar, but without the direct coupling or task tokens: I'd simply "fail" (or exit) the initial workflow execution if the required files are not available. I'd record those as "pending" or "waiting for requirement", and on each new file-available event I'd run them again to see if the requirements are now met with the new data. It would, however, align with a future vision where the "READY-ness" of a workflow is evaluated outside and independently of any workflow execution. Such a "READY-ness check" would have to be quick to run and might have to consider a number of changing factors. Any workflow that was triggered before it was ready would simply "fail" (with an appropriate response).

Again, this question/idea is more for the future and not meant to replace/change any current setups. For now I am just interested in whether it would make conceptual sense at all...
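A rough sketch of that fail-and-retry alternative, purely for discussion (the table, event names, and the readiness check below are all invented, not part of any proposal): pending workflows are recorded, and each new FastqAvailable event re-runs a cheap readiness check over the pending set.

```python
import json

import boto3
from boto3.dynamodb.conditions import Attr

events = boto3.client("events")
pending = boto3.resource("dynamodb").Table("pending-workflows")  # hypothetical table


def requirements_met(workflow: dict) -> bool:
    """Hypothetical quick READY-ness check, e.g. asking the fastq
    manager whether every required library now has live fastqs."""
    raise NotImplementedError  # placeholder only


def on_fastq_available(event, context):
    """Instead of holding a paused execution open per task token,
    re-evaluate every 'pending' workflow whenever new data lands."""
    waiting = pending.scan(FilterExpression=Attr("status").eq("PENDING"))["Items"]
    for wf in waiting:
        if requirements_met(wf):
            # Requirements now satisfied: re-trigger the workflow.
            events.put_events(Entries=[{
                "Source": "fastq-sync",              # illustrative
                "DetailType": "WorkflowRetrigger",   # illustrative
                "Detail": json.dumps({"portalRunId": wf["portalRunId"]}),
            }])
            pending.update_item(
                Key={"portalRunId": wf["portalRunId"]},
                UpdateExpression="SET #s = :v",
                ExpressionAttributeNames={"#s": "status"},
                ExpressionAttributeValues={":v": "RETRIGGERED"},
            )
```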
No, the step function above is in the 'glue' bit; after the fastqs are available, the ready command would be generated.
And that would then start the workflow?
Yep, see the diagram above; the ready event isn't triggered until the fastqs become available through the sync service.
OK, so it's the same scenario, but one level up?
This is a simple service that performs the following (a sketch follows this list):

- Listens to 'CheckFastqAvailableSync' events generated by workflow glue services.
- Listens to 'FastqAvailable' events from the (future) fastq-glue service. See Add 'fastq-glue' service #870.
- Listens to events from glue services requesting fastq unarchiving and calls the unarchive manager (an asynchronous call from the glue services; see Add 'fastq-glue' service #870).
- Listens to the unarchive manager (Add synchronous s3 copy service into Orcabus #869) and checks any pending FastqAvailableSync task tokens to determine whether the data their workflow needs is now available.
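As a rough sketch of how those responsibilities could fit together in a single handler (the table name, key layout, and field names are assumptions, not the actual implementation):

```python
import json

import boto3
from boto3.dynamodb.conditions import Key

sfn = boto3.client("stepfunctions")
# Hypothetical token store, keyed on (libraryId, taskToken)
tokens = boto3.resource("dynamodb").Table("fastq-sync-tokens")


def handler(event, context):
    """Single EventBridge target, dispatching on detail-type."""
    detail = event["detail"]
    if event["detail-type"] == "CheckFastqAvailableSync":
        # A glue step function is paused on .waitForTaskToken;
        # park its token until the library's fastqs are available.
        tokens.put_item(Item={
            "libraryId": detail["libraryId"],
            "taskToken": detail["taskToken"],
        })
    elif event["detail-type"] == "FastqAvailable":
        # Data has landed or been restored: release every pending
        # token that was waiting on this library.
        items = tokens.query(
            KeyConditionExpression=Key("libraryId").eq(detail["libraryId"])
        )["Items"]
        for item in items:
            sfn.send_task_success(
                taskToken=item["taskToken"],
                output=json.dumps({"libraryId": item["libraryId"]}),
            )
            tokens.delete_item(Key={
                "libraryId": item["libraryId"],
                "taskToken": item["taskToken"],
            })
```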
This allows the glue services to do the following (a sketch follows this list as well):

- As soon as the fastq-glue service says that there are new fastq sets available on the instrument run, we can trigger a step function that starts with 'CheckFastqAvailable' events for each library required in the analysis; these will hang until the data for those libraries is available.
- Glue services can then re-query the readset, with the fastq manager now pointing to the restored file URIs, and as such emit a READY event knowing that the data is readily available.
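A hedged sketch of that last step (the fastq manager endpoint and the event fields below are invented for illustration): once every paused CheckFastqAvailable task has been released, the glue step function re-queries the readsets and emits READY.

```python
import json
import urllib.request

import boto3

events = boto3.client("events")
FASTQ_MANAGER = "https://fastq.example.org/api/v1"  # hypothetical endpoint


def emit_ready(event, context):
    """Runs after all CheckFastqAvailable tasks have been resumed:
    the readsets now resolve to the restored file URIs."""
    readsets = []
    for library_id in event["libraryIds"]:
        url = f"{FASTQ_MANAGER}/fastq?libraryId={library_id}"
        with urllib.request.urlopen(url) as response:
            readsets.extend(json.load(response))
    events.put_events(Entries=[{
        "Source": "orcabus.workflowglue",         # illustrative
        "DetailType": "WorkflowRunStateChange",   # illustrative
        "Detail": json.dumps({
            "status": "READY",
            "payload": {"readsets": readsets},
        }),
    }])
```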