Release v54.9.0

@clingen-sthlm clingen-sthlm released this 04 Jan 09:48
· 1217 commits to master since this release

Set BCLConvert as default in automation (#2736)(minor)

Description

Part of Clinical-Genomics/project-planning#521

TL;DR

This PR implements the code to generate correct V2 sample sheets (also known in the code as BCLConvert sample sheets) for HiSeq flow cells while keeping compatibility with NovaSeq flow cells, and sets this kind of sample sheet as the default in the automation.

Details

Currently, the bcl converter of a flow cell is determined based on the sequencer:

  • Bcl2fastq for HiSeqX and HiSeq2500
  • BCLConvert for NovaSeq6000 and NovaSeqX

This PR makes BCLConvert the default bcl converter of a flow cell, so it will only be Bcl2fastq if that is given as a parameter when instantiating the FlowCellDirectoryData class. This means the generated sample sheets will be V2 unless the Bcl2fastq converter is explicitly specified.
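
The default described above can be pictured with a minimal sketch. The constants and function names (`BCL2FASTQ`, `BCLCONVERT`, `resolve_bcl_converter`, `uses_v2_sample_sheet`) are hypothetical and are not taken from the `cg` code base; only the "BCLConvert unless Bcl2fastq is requested" behaviour comes from this description.

```python
from typing import Optional

# Hypothetical identifiers for the two converters named in this PR.
BCL2FASTQ = "bcl2fastq"
BCLCONVERT = "bcl_convert"


def resolve_bcl_converter(requested: Optional[str] = None) -> str:
    """Return the converter to use, defaulting to BCLConvert.

    The sequencer type is deliberately not an input: after this PR the
    converter no longer depends on it.
    """
    if requested is None:
        return BCLCONVERT
    if requested not in (BCL2FASTQ, BCLCONVERT):
        raise ValueError(f"Unknown bcl converter: {requested}")
    return requested


def uses_v2_sample_sheet(requested: Optional[str] = None) -> bool:
    """V2 sample sheets are the ones generated for BCLConvert."""
    return resolve_bcl_converter(requested) == BCLCONVERT
```

With no argument the sketch yields a V2 sample sheet; passing `BCL2FASTQ` explicitly is the only way to opt out.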

As stated in #2781, the rules for generating the attributes of the sample sheet have changed. Seven flow cell cases have been identified:

  • HiSeqX with single index run (also covering HiSeq2500 single run)
  • HiSeqX with dual index run
  • HiSeq2500 with dual index run
  • HiSeq2500 with custom index run (index1 of 17nt and index2 of 8nt)
  • NovaSeq6000 pre 1.5 kits
  • NovaSeq6000 post 1.5 kits
  • NovaSeqX

The changes in the rules are the following:

  1. HiSeq single index run:
  • Remove IndexCycles2 from the [Reads] section
  • index2 column in [Data] section must be empty or not be present at all
  • barcode_mismatch2 column in [Data] section must not be present at all
  • Override cycles must omit the index2 parsing, e.g. Y151;I8;Y151

  2 & 3. HiSeq dual index run: These flow cells (like some NovaSeq flow cells) mix single-index and dual-index samples. For dual-index samples, the rules stay the same as for NovaSeqX. For single-index samples:
  • The entry in the barcode_mismatch2 column in the [Data] section must be empty or 'na'; it is set to 'na' for clarity
  • The entry in the index2 column in the [Data] section must be empty
  • Override cycles must show that index 2 is skipped, with a leading N in the index2 parsing, e.g. Y151;I8;N8;Y151
  4. HiSeq2500 with custom index run:
  • The poly-N tail of index1 should be removed, and the rules for index1 and index2 equality should be modified
  5. No changes w.r.t. master
  6. No changes w.r.t. master
  7. No changes w.r.t. master
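
The rules for cases 1 to 4 can be sketched in a few lines of Python. This is a simplified illustration, not the `cg` implementation: the function names, the `barcode_mismatch1` key and the example sequences are assumptions, and the sketch ignores index padding and reverse complementing. Only the column rules, the 'na' value, the poly-N removal and the two override cycles examples come from the list above.

```python
from typing import Optional


def override_cycles(read_cycles: int, index1_cycles: int, index2_cycles: int,
                    sample_is_dual_index: bool, run_is_dual_index: bool) -> str:
    """Build an override cycles string for one sample (simplified)."""
    parts = [f"Y{read_cycles}", f"I{index1_cycles}"]
    if run_is_dual_index:
        # Single-index samples on a dual index run skip index 2 with N.
        prefix = "I" if sample_is_dual_index else "N"
        parts.append(f"{prefix}{index2_cycles}")
    # On a single index run the index2 part is omitted entirely.
    parts.append(f"Y{read_cycles}")
    return ";".join(parts)


def data_row(sample_id: str, index1: str, index2: Optional[str],
             run_is_dual_index: bool, mismatches: int) -> dict:
    """Build the index-related [Data] entries for one sample (hypothetical keys)."""
    row = {"sample_id": sample_id, "index": index1, "barcode_mismatch1": mismatches}
    if not run_is_dual_index:
        # Single index run: no index2 and no barcode_mismatch2 column.
        return row
    row["index2"] = index2 or ""
    # Single-index sample on a dual index run: 'na' for clarity.
    row["barcode_mismatch2"] = mismatches if index2 else "na"
    return row


def strip_poly_n_tail(index: str) -> str:
    """Remove a trailing run of N from an index sequence."""
    return index.rstrip("N")
```

For example, `override_cycles(151, 8, 8, False, False)` gives `Y151;I8;Y151` and `override_cycles(151, 8, 8, False, True)` gives `Y151;I8;N8;Y151`, matching the two examples in the rules.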

Added

  • Test functions for FlowCellBCLConvertSample and FlowCellBcl2FastqSample classes
  • Fixtures and fixture files for new flow cell cases

Changed

  • Separate the module cg/apps/demultiplex/sample_sheet/models.py into sample models (cg/apps/demultiplex/sample_sheet/sample_models.py) and sample sheet models (cg/apps/demultiplex/sample_sheet/sample_sheet_models.py) for readability.
  • Made all the sample updating logic part of the FlowCellSample class, moving all the functions in index.py that have sample logic to one of the flow cell sample models in cg/apps/demultiplex/sample_sheet/sample_models.py
    • get_index_pair -> separate_indexes
    • update_barcode_mismatch_values_for_sample -> update_barcode_mismatches
    • pad_and_reverse_complement_sample_indexes -> _pad_indexes_if_necessary
    • update_indexes_for_samples -> process_indexes
  • Moved the IndexSettings logic from the sample sheet creator to the RunParameters class
  • Moved all the override cycles and barcode mismatch updating logic from the sample sheet creator to the FlowCellSample models
  • Removed the index length equality rule from RunParameters
  • Removed the bcl_converter attribute from the FlowCellDirectoryData class, as it can be accessed through the run_parameters attribute

Fixed

  • Tests and usages that depended on the bcl converter of the flow cell

Co-authored-by: Henrik Stranneheim