Add 'fastq-glue' service #870

Open
alexiswl opened this issue Feb 21, 2025 · 3 comments · May be fixed by #942 or #899
Assignees
Labels
feature New feature pipeline Workflow/Pipeline Manager

Comments

@alexiswl
Member

alexiswl commented Feb 21, 2025

Fastq Manager shouldn't be responsible for listening to events that could be used to import fastq metadata.
However, no other service should be responsible for registering these fastqs either.

Instead, we add a simple 'fastq-glue' service that performs the following:

  1. Listens to the sequencer run manager event (whichever one indicates that a samplesheet exists) - hopefully this fires before the run finishes

    • Creates a fastq set for each library in the samplesheet, deciding whether the fastqs should be added to the existing fastq set for that library or registered as a new fastq set (to handle top-up / rerun scenarios).
    • Generates a 'new fastqs registered' event, which simply carries the instrument run id.
      • Future glue services will then be able to subscribe to this event, query the fastq manager, and decide whether they need to unarchive any complementary data, for example when a new tumor sample has arrived for a subject that was sequenced several months ago. Note that this would run asynchronously. Rather than calling the unarchive service directly, glue services call the fastq-sync service, see Add fastq-sync service #871.
  2. Listens for events from the future bclconvert / ora-compression pipeline that places ora files in the primary directory.

    • Adds 'readsets' to the fastq objects mentioned above, so that the fastq objects now carry file information.
    • Generates an event to say that fastq object readsets have been added to the database for the instrument run id.
      • This will be the trigger for future glue services to generate 'READY' events.
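
To make the two steps above concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption for illustration only: the event field names (`instrumentRunId`, `samples`, `files`), the event types (`FastqsRegistered`, `ReadSetsAdded`), and the `InMemoryFastqManager` class, which stands in for calls to the real fastq manager. The top-up / rerun rule is a placeholder too: it always appends to the existing fastq set for a library, whereas the actual rule still has to be decided.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Fastq:
    # Hypothetical shape: one fastq object per library and lane on a run.
    library_id: str
    instrument_run_id: str
    lane: int
    readset: Optional[dict] = None  # populated in step 2


@dataclass
class FastqSet:
    library_id: str
    fastqs: List[Fastq] = field(default_factory=list)


class InMemoryFastqManager:
    """Stand-in for the fastq manager; not the real API."""

    def __init__(self) -> None:
        self.fastq_sets: Dict[str, FastqSet] = {}

    def register(self, library_id: str, instrument_run_id: str, lane: int) -> Fastq:
        # Placeholder rule: reuse the existing set for this library (top-up),
        # otherwise create a new set.
        fastq_set = self.fastq_sets.setdefault(library_id, FastqSet(library_id))
        fastq = Fastq(library_id, instrument_run_id, lane)
        fastq_set.fastqs.append(fastq)
        return fastq

    def find(self, library_id: str, instrument_run_id: str, lane: int) -> Fastq:
        for fastq in self.fastq_sets[library_id].fastqs:
            if (fastq.instrument_run_id, fastq.lane) == (instrument_run_id, lane):
                return fastq
        raise KeyError((library_id, instrument_run_id, lane))


def on_samplesheet_event(event: dict, manager: InMemoryFastqManager,
                         emit: Callable[[dict], None]) -> None:
    # Step 1: register a fastq for every library in the samplesheet,
    # then announce the run id so other glue services can react.
    run_id = event["instrumentRunId"]
    for sample in event["samples"]:
        manager.register(sample["libraryId"], run_id, sample["lane"])
    emit({"type": "FastqsRegistered", "instrumentRunId": run_id})


def on_ora_files_event(event: dict, manager: InMemoryFastqManager,
                       emit: Callable[[dict], None]) -> None:
    # Step 2: attach file information to the fastqs registered in step 1.
    run_id = event["instrumentRunId"]
    for f in event["files"]:
        fastq = manager.find(f["libraryId"], run_id, f["lane"])
        fastq.readset = {"r1": f["r1"], "r2": f["r2"]}
    emit({"type": "ReadSetsAdded", "instrumentRunId": run_id})
```

Passing `emit` in as a callable keeps the sketch independent of whichever event bus the service ends up publishing to.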

This will replace the 'clag' services of the stacky stack. No more showers :(

@alexiswl alexiswl added the feature New feature label Feb 21, 2025
@alexiswl alexiswl self-assigned this Feb 21, 2025
@alexiswl
Member Author

alexiswl commented Feb 21, 2025

A few other things that this service could/should provide:

  1. Running sequali stats for new fastq samples.
  2. Calculating the total fileGzSizeInBytes for fastq samples - useful for services, such as cttsov2, that need to decompress ora samples before running their analyses.
  3. Generating an ntsm for these samples.
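
As a rough illustration of item 2, the total could be computed by summing a per-file gzip size over every read file in a sample's readsets. The readset layout (`r1` / `r2` entries) and the `gzSizeInBytes` key below are assumptions, not the fastq manager's actual schema.

```python
from typing import Iterable, Mapping, Optional


def total_gz_size_in_bytes(readsets: Iterable[Mapping]) -> Optional[int]:
    """Sum the gzip-equivalent size of every read file in the given readsets.

    Returns None if any file is missing its size, since a partial total
    would understate the space needed to decompress the sample.
    """
    total = 0
    for readset in readsets:
        for read in ("r1", "r2"):
            file_info = readset.get(read)
            if file_info is None:
                continue  # single-end data has no r2
            size = file_info.get("gzSizeInBytes")  # hypothetical key
            if size is None:
                return None
            total += size
    return total
```

A consumer that decompresses ora files could compare this figure against its available scratch space before starting an analysis.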

@alexiswl
Member Author

From Sequence Run Manager

Image

From BSSH Fastq Copy Service

Image

@alexiswl alexiswl linked a pull request Mar 12, 2025 that will close this issue
3 tasks
@alexiswl
Member Author

Related to #906

@alexiswl alexiswl linked a pull request Apr 1, 2025 that will close this issue
@victorskl victorskl added the pipeline Workflow/Pipeline Manager label Apr 2, 2025