Find a way to re-generate all sample sheets #2930

Closed
diitaz93 opened this issue Feb 9, 2024 · 4 comments

Comments

@diitaz93
Contributor

diitaz93 commented Feb 9, 2024

Description

We need to regenerate many sample sheets to solve two issues:

  1. As mentioned in Regenerate all v2 sample sheets from before 25-01-24 #2922, the v2 sample sheets generated before 2024-01-25 lack the patch to the index 2 part of the Override Cycles, so samples are effectively identified by index 1 only (the more urgent issue).
  2. To make BCLConvert the default method in production and deprecate bcl2fastq, we need to update all v1 sample sheets to v2.

Creating new sample sheets requires the information in the RunParameters.xml file of each flow cell, and that file is currently unavailable for many flow cells whose sample sheets are in Housekeeper. This means we can't run a script to update all sample sheets at once. We will have to do it on the fly, once the RunParameters file is available for each flow cell.

Suggested solutions

  1. Implement a new sample sheet validation (different from the current one) which checks whether a sample sheet already has the correct form of Override Cycles. This validation will run whenever an old flow cell is fetched from Housekeeper or a flow cell directory is fetched from PDC. If the validation fails, regenerate the sample sheet with the current code, which has the patch. This could also be needed when manually demultiplexing a flow cell, so the CLI command cg demultiplex samplesheet validate would have to include this update. UPDATE: Done in feat - new sample sheet validation and API #2958
    1. Any manual modifications made to the sample sheet will be lost.
    2. We could add an entry to the [Header] section of the sample sheet with the Index Settings for that sample sheet, so that the RunParameters file is not needed for the validation.
    3. We could also remove the empty columns AdapterRead1 and AdapterRead2 from the sample sheet now that we are re-generating them anyway.
  2. Remove all NovaSeq and NovaSeqX sample sheets from Housekeeper, so that we force the sample sheets to be re-generated. This can be done quickly while the new sample sheet validation is under development.
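
To make suggestion 1 more concrete, here is a minimal sketch of what the check could look like. It is not the validation implemented in #2958. The header key `IndexSettings`, the section titles, the per-sample `OverrideCycles` column and the four-element layout (read 1, index 1, index 2, read 2) are all assumptions for illustration, and the real criteria for a "correct" Override Cycles value may differ.

```python
import csv
import io
import re

# Hypothetical names: none of these are taken from the cg code base.
INDEX_SETTINGS_KEY = "IndexSettings"
HEADER_SECTION = "[Header]"
DATA_SECTION = "[BCLConvert_Data]"

# One OverrideCycles element is one or more <letter><count> pairs, e.g. "I8N2".
CYCLE_ELEMENT = re.compile(r"^(?:[YINU]\d+)+$")


def split_sections(content: str) -> dict[str, list[list[str]]]:
    """Group the CSV rows of a sample sheet by their [Section] title."""
    sections: dict[str, list[list[str]]] = {}
    current = None
    for row in csv.reader(io.StringIO(content)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank rows
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current = first
            sections[current] = []
        elif current is not None:
            sections[current].append([cell.strip() for cell in row])
    return sections


def has_index_settings(sections: dict[str, list[list[str]]]) -> bool:
    """True if the header carries the (assumed) index settings entry."""
    return any(row[0] == INDEX_SETTINGS_KEY for row in sections.get(HEADER_SECTION, []))


def override_cycles_are_well_formed(sections: dict[str, list[list[str]]]) -> bool:
    """True if every sample has four well-formed OverrideCycles elements."""
    rows = sections.get(DATA_SECTION, [])
    if len(rows) < 2 or "OverrideCycles" not in rows[0]:
        return False
    column = rows[0].index("OverrideCycles")
    for sample in rows[1:]:
        if len(sample) <= column:
            return False
        elements = sample[column].split(";")
        # Assumption: dual-index, paired-end runs, so read1;index1;index2;read2.
        if len(elements) != 4 or not all(CYCLE_ELEMENT.match(e) for e in elements):
            return False
    return True


def needs_regeneration(content: str) -> bool:
    """Decide whether a fetched sample sheet should be regenerated."""
    sections = split_sections(content)
    return not (has_index_settings(sections) and override_cycles_are_well_formed(sections))
```

A caller that fetches a sample sheet from Housekeeper or PDC would read the file, pass its content to `needs_regeneration`, and only trigger regeneration when it returns `True`. Because the index settings are read from the header, this check does not need the RunParameters file.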

Independent of the chosen solution, it would be good to start storing the RunParameters file in the flow cell bundle together with the sample sheet and the logs.

This can be closed when

A method for re-generating the sample sheets is in place in production

Blocked by

If there are any blocking issues/prs/things in this or other repos. Please link to them.

@diitaz93
Contributor Author

diitaz93 commented Feb 9, 2024

Started a demultiplexing on DRAGEN with jobid 6037368 with a modified sample sheet that includes the index settings in the Header section to see if it is allowed

UPDATE: The demultiplexer worked

@diitaz93
Contributor Author

UPDATE: Index settings was added to the sample sheet in #2931

@diitaz93
Contributor Author

UPDATE: First part of the issue was solved in #2958

@diitaz93
Contributor Author

diitaz93 commented Apr 3, 2024

After the translation of sample sheets implemented in #3062 and the discussion around it, the issue loses relevance. Also, the run parameters addition is already stated in #3085

@diitaz93 diitaz93 closed this as completed Apr 3, 2024