feat - new sample sheet validation and API #2958

Merged
merged 43 commits into from
Feb 22, 2024

diitaz93
Contributor

@diitaz93 diitaz93 commented Feb 16, 2024

Description

Fix #2922

TL;DR

Add stricter validation for v2 sample sheets through the SampleSheetValidator and OverrideCyclesValidator classes, and use it in the CLI commands through a new SampleSheetAPI. Sample sheets are now validated every time they are fetched from Housekeeper or read from a file.

Tips for reviewing

  • Start by checking the files cg/apps/demultiplex/sample_sheet/sample_sheet_validator.py and cg/apps/demultiplex/sample_sheet/override_cycles_validator.py, which hold the new validator classes.
    • OverrideCyclesValidator has only one endpoint validate_sample, used only in SampleSheetValidator.
    • SampleSheetValidator has the endpoint functions validate_sample_sheet_from_content, validate_sample_sheet_from_file and get_sample_sheet_object_from_file.
  • Continue with the file cg/apps/demultiplex/sample_sheet/api.py, which holds the new SampleSheetAPI, and compare it with the old CLI commands in cg/cli/demultiplex/sample_sheet.py. There are 3 endpoints in the API corresponding to the 3 CLI commands (API endpoint -> CLI command):
    • validate_sample_sheet -> validate
    • get_or_create_sample_sheet -> create
    • get_or_create_all_sample_sheets -> create-all
  • Continue with the rest of the changes

New validation

This PR implements a new validation for v2 sample sheets through the new class SampleSheetValidator that takes into account 5 aspects:

  1. All sections are present
    1. [Header]
    2. [Reads]
    3. [BCLConvert_Settings]
    4. [BCLConvert_Data]
  2. IndexSettings is present in the [Header] of the sample sheet
  3. Run read and index cycles are present in the [Reads] section and are valid
  4. The [BCLConvert_Data] section has the correct columns (sample validation)
  5. The override cycle values are correct. This means that the cycle values for the reads and the indexes match the run cycles specified in the [Reads] section, and that the index2 cycles are in the correct format according to the IndexSettings (reverse or forward). This is implemented through a new class OverrideCyclesValidator.
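
The cycle-count part of check 5 can be sketched as follows. This is a minimal illustration, not the OverrideCyclesValidator code: the helper names cycles_per_segment and override_cycles_match_run are hypothetical, and the sketch assumes an OverrideCycles string made of ';'-separated segments of <letter><count> tokens (Y, I, N, U). It only checks that each segment adds up to the corresponding run cycle count; it does not cover the forward/reverse index2 check.

```python
import re
from typing import List

# Hypothetical helpers for illustration; not part of cg.
TOKEN_PATTERN = re.compile(r"([YINU])(\d+)")


def cycles_per_segment(override_cycles: str) -> List[int]:
    """Return the total number of cycles in each ';'-separated segment."""
    totals = []
    for segment in override_cycles.split(";"):
        tokens = TOKEN_PATTERN.findall(segment)
        rebuilt = "".join(letter + count for letter, count in tokens)
        # Reject empty segments and segments with anything besides <letter><count> tokens.
        if not tokens or rebuilt != segment:
            raise ValueError(f"Malformed override cycles segment: {segment!r}")
        totals.append(sum(int(count) for _, count in tokens))
    return totals


def override_cycles_match_run(override_cycles: str, run_cycles: List[int]) -> bool:
    """Check that every segment adds up to the matching run cycle value.

    run_cycles is assumed to be ordered as read 1, index 1, index 2, read 2.
    """
    return cycles_per_segment(override_cycles) == run_cycles
```

For example, with run cycles [151, 10, 10, 151], the string "Y151;I8N2;N2I8;Y151" passes, while "Y151;I8;I10;Y151" fails because the first index segment only accounts for 8 of the 10 cycles.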

Added

  • A new class SampleSheetValidator with the endpoint function validate_sample_sheet which will be the new function to validate sample sheets.
  • A new class OverrideCyclesValidator with the endpoint function validate_sample, which validates whether the override cycles for a single sample are correct. It is called for each sample inside the SampleSheetValidator.
  • A new SampleSheetAPI that takes the logic away from the CLI commands.
  • Tests and fixtures for new functions
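
As a rough illustration of the simplest check the new validator performs (all required sections are present), here is a minimal sketch. It is not the SampleSheetValidator code: the function name find_missing_sections is hypothetical, and the sample sheet is assumed to be already parsed into a list of rows, each row a list of cell strings.

```python
from typing import List

# Section headers required in a v2 sample sheet, as listed in this PR.
REQUIRED_SECTIONS = (
    "[Header]",
    "[Reads]",
    "[BCLConvert_Settings]",
    "[BCLConvert_Data]",
)


def find_missing_sections(content: List[List[str]]) -> List[str]:
    """Return the required section headers that do not appear in the sheet.

    Hypothetical helper: a section counts as present when its header is the
    first cell of some row.
    """
    first_cells = {row[0].strip() for row in content if row}
    return [section for section in REQUIRED_SECTIONS if section not in first_cells]


if __name__ == "__main__":
    sheet = [
        ["[Header]"],
        ["FileFormatVersion", "2"],
        ["[BCLConvert_Settings]"],
        ["[BCLConvert_Data]"],
    ]
    print(find_missing_sections(sheet))
```

Returning the list of missing sections, rather than a boolean, lets the caller name them in the error message.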

Changed

  • Took the validation away from cg/apps/demultiplex/sample_sheet/read_sample_sheet.py into the validator class.
  • The sample sheet is now validated every time it is fetched (for creation and demultiplexing).

Fixed

  • Tests for CLI command, parametrised with new fixtures

How to prepare for test

  • Ssh to relevant server (depending on type of change)
  • Use stage: us
  • Paxa the environment: paxa
  • Install on stage (example for Hasta):
    bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_cg -t cg -b feat-sample-sheet-api -a

How to test

  • Creation of sample sheets: cg demultiplex samplesheet create <flow_cell_name>
| Type of sample sheet ordered | Type of sample sheet in HK | Type of sample sheet in flow cell directory | Result |
| --- | --- | --- | --- |
| bcl2fastq | BclConvert - correct | BclConvert - correct | created new sample sheet |
| bcl2fastq | None | BclConvert - correct | created new sample sheet |
| bcl2fastq | None | None | created new sample sheet |
| bcl2fastq | bcl2fastq - wrong | bcl2fastq - correct | used flow cell sample sheet |
| bcl2fastq | bcl2fastq - wrong | bcl2fastq - wrong | created new sample sheet |
| bcl2fastq | bcl2fastq - correct | N/A | used Housekeeper sample sheet |
| BclConvert | bcl2fastq - correct | bcl2fastq - correct | created new sample sheet |
| BclConvert | bcl2fastq - correct | BclConvert - correct | used flow cell sample sheet |
| BclConvert | BclConvert - correct | N/A | used Housekeeper sample sheet |
| BclConvert | None | None | created new sample sheet |
| BclConvert | None | BclConvert - correct | used flow cell sample sheet |
| BclConvert | BclConvert - missing section | None | created new sample sheet |
| BclConvert | BclConvert - missing index setting | None | created new sample sheet |
| BclConvert | BclConvert - missing run cycles | None | created new sample sheet |
| BclConvert | BclConvert - wrong read cycles | None | created new sample sheet |
| BclConvert | BclConvert - wrong index1 cycles | None | created new sample sheet |
| BclConvert | BclConvert - wrong index2 cycles | None | created new sample sheet |
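
The outcomes in the table above are consistent with a simple precedence rule, sketched below. This is an interpretation of the test results, not the SampleSheetAPI code: the names Candidate and choose_sample_sheet are hypothetical, and the rule that a sheet is only reused when it is valid and of the ordered type is inferred from the table rows.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    """A sample sheet found in Housekeeper or in the flow cell directory."""

    sheet_type: str  # "bcl2fastq" or "BclConvert"
    is_valid: bool  # outcome of validating the sheet


def choose_sample_sheet(
    ordered_type: str,
    housekeeper: Optional[Candidate],
    flow_cell_dir: Optional[Candidate],
) -> str:
    """Return the expected result for one row of the test table."""

    def usable(candidate: Optional[Candidate]) -> bool:
        # Assumption: a sheet is reused only if it exists, passes validation
        # and was generated for the ordered demultiplexing software.
        return (
            candidate is not None
            and candidate.is_valid
            and candidate.sheet_type == ordered_type
        )

    if usable(housekeeper):
        return "used Housekeeper sample sheet"
    if usable(flow_cell_dir):
        return "used flow cell sample sheet"
    return "created new sample sheet"
```

For instance, ordering BclConvert with a correct bcl2fastq sheet in Housekeeper and a correct BclConvert sheet in the flow cell directory gives "used flow cell sample sheet", matching the eighth row.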

Create all

  • Create all sample sheets when there are valid Bcl2fastq sample sheets in Housekeeper

Validation of sample sheets

  • Validate correct and incorrect sample sheets (bcl2fastq & BCLConvert)

Demultiplexing

  • Start demultiplexing a flow cell with an incorrect sample sheet

Expected test outcome

  • Check that correct sample sheets replace the incorrect ones when creating new sample sheets
  • Check that demultiplexing does not start and raises an error/warning
  • Check that wrong sample sheets fail validation and correct sample sheets pass validation

Review

  • Tests executed by SD
  • "Merge and deploy" approved by SA
    Thanks for filling in who performed the code review and the test!

This version is a

  • MINOR - when you add functionality in a backwards compatible manner

Implementation Plan

  • Deploy this branch on
    • cg stage
    • cg production
  • Inform @karlnyr

@diitaz93 diitaz93 mentioned this pull request Feb 19, 2024
15 tasks
@diitaz93 diitaz93 marked this pull request as ready for review February 19, 2024 11:22
@diitaz93 diitaz93 requested a review from a team as a code owner February 19, 2024 11:22
@diitaz93 diitaz93 requested a review from karlnyr February 19, 2024 11:23
@diitaz93 diitaz93 self-assigned this Feb 19, 2024
@clingen-sthlm clingen-sthlm temporarily deployed to stage February 20, 2024 10:06 Inactive
@diitaz93
Contributor Author

Testing on stage

Testing the command cg demultiplex samplesheet create-all when there are Bcl2fastq flow cells in Housekeeper:

$ cg demultiplex samplesheet create-all
Running cg demultiplex.
Called undefined __fields__ on HousekeeperAPI, please wrap
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL Convert
Samplesheet passed BCLConvert validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/180509_D00450_0598_BHGYFNBCX2/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HGYFNBCX2/2022-01-18/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL2FASTQ
Samplesheet passed Bcl2Fastq validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/170517_ST-E00266_0210_BHJCFFALXX/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HJCFFALXX/2022-07-08/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL Convert
Samplesheet passed BCLConvert validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/190927_A00689_0069_BHLYWYDSXX/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HLYWYDSXX/2023-02-09/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL Convert
Samplesheet passed BCLConvert validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/181005_D00410_0735_BHM2LNBCX2/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HM2LNBCX2/2023-02-08/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL Convert
Samplesheet passed BCLConvert validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/20231108_LH00188_0028_B22F52TLT3/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/22F52TLT3/2023-11-08/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL2FASTQ
Samplesheet passed Bcl2Fastq validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/180508_ST-E00269_0269_AHL32LCCXY/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HL32LCCXY/2022-01-18/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL Convert
Samplesheet passed BCLConvert validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/230912_A00187_1009_AHK33MDRX3/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HK33MDRX3/2023-09-12/SampleSheet.csv

@diitaz93
Contributor Author

Tests on stage

Test demultiplexing a flow cell with an invalid sample sheet with cg demultiplex flow-cell <flow_cell_name>

 $ cg -l DEBUG demultiplex flow-cell --dry-run 190927_A00689_0069_BHLYWYDSXX
Running cg demultiplex.
Running cg demultiplex flow cell, using None
Instantiating sample sheet API
Instantiating housekeeper api
Initializing Store
Instantiating lims api
Called undefined __fields__ on HousekeeperAPI, please wrap
Instantiating demultiplexing api
Called undefined __fields__ on HousekeeperAPI, please wrap
Initialising Process with binary: sbatch
Use base call ['sbatch']
Set environment to stage
DemultiplexingAPI: Set dry run to True
SlurmAPI: Set dry run to True
setting flow cell id to 190927_A00689_0069_BHLYWYDSXX
setting demultiplexed runs dir to /home/proj/stage/sequencing_data/illumina/demultiplexed-runs
Instantiating FlowCellDirectoryData with path /home/proj/stage/sequencing_data/illumina/flow_cells/190927_A00689_0069_BHLYWYDSXX
Set flow cell id to BHLYWYDSXX
Check if demultiplexing is possible for HLYWYDSXX
Check if flow cell is ready for downstream processing
Check if sequencing is done
Sequence is done for flow cell HLYWYDSXX
Check if copy of data from sequence instrument is ready
All data has been transferred for flow cell HLYWYDSXX
Flow cell HLYWYDSXX is ready for downstream processing
Check if sample sheet exists
Fetch latest version from bundle HLYWYDSXX
Fetching files with tags in [HLYWYDSXX,samplesheet]
Fetching files from version 130972
Sample sheet was generated for BCL Convert
Validating BCLConvert sample sheet
Validating that the sample sheet has all the necessary sections
Looking for index settings in the sample sheet
No index settings found in sample sheet
Malformed sample sheet. Run cg demultiplex samplesheet validate /home/proj/stage/sequencing_data/illumina/flow_cells/190927_A00689_0069_BHLYWYDSXX/SampleSheet.csv
Aborted!

Quality Gate passed

Issues
0 New issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@diitaz93 diitaz93 merged commit 5a49c9c into master Feb 22, 2024
9 checks passed
@diitaz93 diitaz93 deleted the feat-sample-sheet-api branch February 22, 2024 07:47
@diitaz93
Contributor Author

Deployed to stage:

repository is clean
Logging deploy ...
Getting deployer... done.
Getting last commit message and SHA... done.
Getting version of deploy scripts... /home/js.diazboada
done.
Log deploy... done.
cg, version 59.4.0
θ71° [js.diazboada@hasta:~] [S_base] 3m16s $ up

Deployed to production:

repository is clean
Logging deploy ...
Getting deployer... done.
Getting last commit message and SHA... done.
Getting version of deploy scripts... /home/js.diazboada
done.
Log deploy... done.
cg, version 59.4.0
θ67° [js.diazboada@hasta:~] [P_base] 2m40s $ 

Successfully merging this pull request may close these issues.

Regenerate all v2 sample sheets from before 25-01-24
5 participants