Release v54.9.0
Set BCLConvert as default in automation (#2736)(minor)
Description
Part of Clinical-Genomics/project-planning#521
TL;DR
This PR implements the code to generate correct V2 sample sheets (also known in the code as BCLConvert sample sheets) for HiSeq flow cells while keeping compatibility with NovaSeq flow cells, and sets this kind of sample sheet as the default in the automation.
Details
Currently, the bcl converter of a flow cell is determined based on the sequencer:
- Bcl2fastq for HiSeqX and HiSeq2500
- BCLConvert for NovaSeq6000 and NovaSeqX
This PR makes BCLConvert the default bcl converter of a flow cell, so it will only be Bcl2fastq if that is given as a parameter when instantiating the FlowCellDirectoryData class. This means the generated sample sheets will be V2 unless the Bcl2fastq converter is explicitly specified.
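The change in how the converter is selected can be sketched as follows. This is a minimal illustration, not the code in this PR: the constants, the sequencer names as strings, and both function names are hypothetical.

```python
from typing import Optional

# Hypothetical constants and sequencer names, for illustration only.
BCL_CONVERT = "bcl_convert"
BCL2FASTQ = "bcl2fastq"
HISEQ_SEQUENCERS = {"hiseqx", "hiseq2500"}


def converter_before(sequencer: str) -> str:
    """Old rule: the converter follows from the sequencer type."""
    return BCL2FASTQ if sequencer.lower() in HISEQ_SEQUENCERS else BCL_CONVERT


def converter_after(requested: Optional[str] = None) -> str:
    """New rule: BCLConvert unless another converter is explicitly requested."""
    return requested if requested is not None else BCL_CONVERT


print(converter_before("HiSeqX"))   # bcl2fastq
print(converter_after())            # bcl_convert
print(converter_after(BCL2FASTQ))   # bcl2fastq
```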
As stated in #2781, the rules for generating the attributes of the sample sheet have changed. 7 flow cell cases have been identified:
1. HiSeqX with single index run (also covering HiSeq2500 single run)
2. HiSeqX with dual index run
3. HiSeq2500 with dual index run
4. HiSeq2500 with custom index run (index1 of 17nt and index2 of 8nt)
5. NovaSeq6000 pre 1.5 kits
6. NovaSeq6000 post 1.5 kits
7. NovaSeqX
The changes in the rules are the following:
1. HiSeq single index run:
   - Remove `IndexCycles2` from the `[Reads]` section
   - `index2` column in `[Data]` section must be empty or not be present at all
   - `barcode_mismatch2` column in `[Data]` section must not be present at all
   - Override cycles must omit the index2 parsing, e.g. `Y151;I8;Y151`

2 & 3. HiSeq dual index run: These flow cells (as some NovaSeq) have single and dual index samples together. In case the samples are dual index, the rules stay the same as for NovaSeqX. For single-index samples:
   - entry in `barcode_mismatch2` column in `[Data]` section must be empty or `'na'`. Set to `'na'` for clarity
   - entry in `index2` column in `[Data]` section must be empty
   - Override cycles must show the skipping of the index 2 with a leading `N` in the index2 parsing, e.g. `Y151;I8;N8;Y151;`

4. HiSeq2500 with custom index run
   - The poly-N tail of index1 should be removed, rules for index1 and index2 equality should be modified
5. No changes w.r.t. master
6. No changes w.r.t. master
7. No changes w.r.t. master
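The override cycles and `[Data]` rules for HiSeq single and dual index runs can be sketched as below. This is an illustration under simplifying assumptions, not the implementation in this PR: the function names and parameters are hypothetical, each index is assumed to fill all of its index cycles, and the dual-index form `I8;I8` is assumed rather than taken from the rules above.

```python
from typing import Dict, Optional


def build_override_cycles(
    read1_cycles: int,
    read2_cycles: int,
    index1_cycles: int,
    index2_cycles: Optional[int],
    sample_has_index2: bool,
) -> str:
    """Return an override cycles string for one sample (illustrative only)."""
    parts = [f"Y{read1_cycles}", f"I{index1_cycles}"]
    if index2_cycles:
        # Dual index run: parse index2 for dual-index samples,
        # skip it with a leading N for single-index samples.
        prefix = "I" if sample_has_index2 else "N"
        parts.append(f"{prefix}{index2_cycles}")
    # Single index run: the index2 part is omitted entirely.
    parts.append(f"Y{read2_cycles}")
    return ";".join(parts)


def index2_fields(index2: str, run_is_dual_index: bool, mismatches: int = 1) -> Dict[str, str]:
    """Return the index2-related [Data] entries for one sample (illustrative only)."""
    if not run_is_dual_index:
        # Single index run: neither column is present.
        return {}
    if not index2:
        # Single-index sample in a dual index run.
        return {"index2": "", "barcode_mismatch2": "na"}
    return {"index2": index2, "barcode_mismatch2": str(mismatches)}


print(build_override_cycles(151, 151, 8, None, False))  # Y151;I8;Y151
print(build_override_cycles(151, 151, 8, 8, False))     # Y151;I8;N8;Y151
print(index2_fields("", run_is_dual_index=True))        # {'index2': '', 'barcode_mismatch2': 'na'}
```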
Added
- Test functions for `FlowCellBCLConvertSample` and `FlowCellBcl2FastqSample` classes
- Fixtures and fixture files for new flow cell cases
Changed
- Separated the module `cg/apps/demultiplex/sample_sheet/models.py` into sample models (`cg/apps/demultiplex/sample_sheet/sample_models.py`) and sample sheet models (`cg/apps/demultiplex/sample_sheet/sample_sheet_models.py`) for readability.
- Made all the sample updating logic part of the FlowCellSample class, moving all the functions in `index.py` that have sample logic to one of the flow cell sample models in `cg/apps/demultiplex/sample_sheet/sample_models.py`
  - `get_index_pair` -> `separate_indexes`
  - `update_barcode_mismatch_values_for_sample` -> `update_barcode_mismatches`
  - `pad_and_reverse_complement_sample_indexes` -> `_pad_indexes_if_necessary`
  - `update_indexes_for_samples` -> `process_indexes`
- Moved the `IndexSettings` logic from the sample sheet creator to the RunParameters class
- Moved all the override cycles and barcode mismatch updating logic from the sample sheet creator to the FlowCellSample models
- Removed the index length equality rule from RunParameters
- Removed the `bcl_converter` attribute from the FlowCellDirectoryData class, as it can be accessed through the `run_parameters` attribute
Fixed
- Tests and usages that depended on the bcl converter of the flow cell
Co-authored-by: Henrik Stranneheim [email protected]