feat - new sample sheet validation and API #2958
Merged
diitaz93 commented Feb 19, 2024
ChrOertlin reviewed Feb 19, 2024 (3 reviews)
Testing on stage
Testing command: $ cg demultiplex samplesheet create-all
Running cg demultiplex.
Called undefined __fields__ on HousekeeperAPI, please wrap
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL Convert
Samplesheet passed BCLConvert validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/180509_D00450_0598_BHGYFNBCX2/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HGYFNBCX2/2022-01-18/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL2FASTQ
Samplesheet passed Bcl2Fastq validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/170517_ST-E00266_0210_BHJCFFALXX/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HJCFFALXX/2022-07-08/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL Convert
Samplesheet passed BCLConvert validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/190927_A00689_0069_BHLYWYDSXX/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HLYWYDSXX/2023-02-09/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL Convert
Samplesheet passed BCLConvert validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/181005_D00410_0735_BHM2LNBCX2/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HM2LNBCX2/2023-02-08/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL Convert
Samplesheet passed BCLConvert validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/20231108_LH00188_0028_B22F52TLT3/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/22F52TLT3/2023-11-08/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL2FASTQ
Samplesheet passed Bcl2Fastq validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/180508_ST-E00269_0269_AHL32LCCXY/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HL32LCCXY/2022-01-18/SampleSheet.csv
Fetching and validating sample sheet from Housekeeper
Sample sheet was generated for BCL Convert
Samplesheet passed BCLConvert validation
Sample sheet from Housekeeper is valid. Copying it to flow cell directory
/home/proj/stage/sequencing_data/illumina/flow_cells/230912_A00187_1009_AHK33MDRX3/SampleSheet.csv already exists. Overwriting with /home/proj/stage/housekeeper-bundles/HK33MDRX3/2023-09-12/SampleSheet.csv
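The log above repeats one pattern per flow cell: fetch the sample sheet stored in Housekeeper, validate it, and copy it into the flow cell directory, overwriting any existing file. A minimal sketch of that pattern follows. The function name `copy_validated_sample_sheet` and the `validate` callable are hypothetical and are not taken from the PR; the real implementation lives in `SampleSheetAPI` and may differ.

```python
import logging
import shutil
from pathlib import Path
from typing import Callable

LOG = logging.getLogger(__name__)


def copy_validated_sample_sheet(
    bundle_sheet: Path,
    flow_cell_dir: Path,
    validate: Callable[[str], None],
) -> Path:
    """Validate the stored sample sheet, then copy it into the flow cell directory.

    `validate` is a hypothetical callable that raises ValueError on a malformed sheet.
    """
    # Validation runs first, so an invalid sheet never reaches the flow cell directory.
    validate(bundle_sheet.read_text())
    target = flow_cell_dir / "SampleSheet.csv"
    if target.exists():
        LOG.info("%s already exists. Overwriting with %s", target, bundle_sheet)
    shutil.copy(bundle_sheet, target)  # replaces an existing file at the target
    return target
```

Validating before copying means a failed validation leaves the file in the flow cell directory untouched.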
Tests on stage
Test demultiplexing a flow cell with an invalid sample sheet with $ cg -l DEBUG demultiplex flow-cell --dry-run 190927_A00689_0069_BHLYWYDSXX
Running cg demultiplex.
Running cg demultiplex flow cell, using None
Instantiating sample sheet API
Instantiating housekeeper api
Initializing Store
Instantiating lims api
Called undefined __fields__ on HousekeeperAPI, please wrap
Instantiating demultiplexing api
Called undefined __fields__ on HousekeeperAPI, please wrap
Initialising Process with binary: sbatch
Use base call ['sbatch']
Set environment to stage
DemultiplexingAPI: Set dry run to True
SlurmAPI: Set dry run to True
setting flow cell id to 190927_A00689_0069_BHLYWYDSXX
setting demultiplexed runs dir to /home/proj/stage/sequencing_data/illumina/demultiplexed-runs
Instantiating FlowCellDirectoryData with path /home/proj/stage/sequencing_data/illumina/flow_cells/190927_A00689_0069_BHLYWYDSXX
Set flow cell id to BHLYWYDSXX
Check if demultiplexing is possible for HLYWYDSXX
Check if flow cell is ready for downstream processing
Check if sequencing is done
Sequence is done for flow cell HLYWYDSXX
Check if copy of data from sequence instrument is ready
All data has been transferred for flow cell HLYWYDSXX
Flow cell HLYWYDSXX is ready for downstream processing
Check if sample sheet exists
Fetch latest version from bundle HLYWYDSXX
Fetching files with tags in [HLYWYDSXX,samplesheet]
Fetching files from version 130972
Sample sheet was generated for BCL Convert
Validating BCLConvert sample sheet
Validating that the sample sheet has all the necessary sections
Looking for index settings in the sample sheet
No index settings found in sample sheet
Malformed sample sheet. Run cg demultiplex samplesheet validate /home/proj/stage/sequencing_data/illumina/flow_cells/190927_A00689_0069_BHLYWYDSXX/SampleSheet.csv
Aborted!
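The debug log shows the order of the first two checks: the validator first confirms that all necessary sections exist, then looks for the index settings, and aborts when they are missing. The sketch below mirrors that order under simplifying assumptions. `SampleSheetError`, `split_sections` and this standalone `validate_sample_sheet` function are hypothetical names, and the assumption that `IndexSettings` appears as the first cell of a row in `[Header]` is mine, not confirmed by the PR.

```python
import csv
import io

REQUIRED_SECTIONS = ("[Header]", "[Reads]", "[BCLConvert_Settings]", "[BCLConvert_Data]")


class SampleSheetError(Exception):
    """Raised when a sample sheet fails validation (hypothetical name)."""


def split_sections(content: str) -> dict[str, list[list[str]]]:
    """Group the rows of a sample sheet under the section header that precedes them."""
    sections: dict[str, list[list[str]]] = {}
    current = None
    for row in csv.reader(io.StringIO(content)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines between sections
        if cells[0].startswith("[") and cells[0].endswith("]"):
            current = cells[0]
            sections[current] = []
        elif current is not None:
            sections[current].append(cells)
    return sections


def validate_sample_sheet(content: str) -> None:
    """Check the sections first, then the index settings, as in the log above."""
    sections = split_sections(content)
    missing = [name for name in REQUIRED_SECTIONS if name not in sections]
    if missing:
        raise SampleSheetError(f"Missing sections: {', '.join(missing)}")
    # Assumption: IndexSettings is stored as a key in the first column of [Header].
    header_keys = {row[0] for row in sections["[Header]"]}
    if "IndexSettings" not in header_keys:
        raise SampleSheetError("No index settings found in sample sheet")
```

A caller such as a CLI command could catch `SampleSheetError` and abort, which matches the "Malformed sample sheet" outcome shown in the log.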
Quality Gate passed
Deployed to stage:
repository is clean
Logging deploy ...
Getting deployer... done.
Getting last commit message and SHA... done.
Getting version of deploy scripts... /home/js.diazboada
done.
Log deploy... done.
cg, version 59.4.0
θ71° [js.diazboada@hasta:~] [S_base] 3m16s $ up
Deployed to production:
repository is clean
Logging deploy ...
Getting deployer... done.
Getting last commit message and SHA... done.
Getting version of deploy scripts... /home/js.diazboada
done.
Log deploy... done.
cg, version 59.4.0
θ67° [js.diazboada@hasta:~] [P_base] 2m40s $
Description
Fix #2922
TL;DR
Create a stricter validation for v2 sample sheets through the SampleSheetValidator and OverrideCyclesValidator classes, and implement it in the CLI commands through a new SampleSheetAPI. The sample sheets are now validated every time they are fetched from Housekeeper or read from a file.
Tips for reviewing
- cg/apps/demultiplex/sample_sheet/sample_sheet_validator.py and cg/apps/demultiplex/sample_sheet/override_cycles_validator.py, which hold the new validator classes.
  - OverrideCyclesValidator has only one endpoint, validate_sample, used only in SampleSheetValidator.
  - SampleSheetValidator has the endpoint functions validate_sample_sheet_from_content, validate_sample_sheet_from_file and get_sample_sheet_object_from_file.
- cg/apps/demultiplex/sample_sheet/api.py, which holds the new API SampleSheetAPI; compare it with the old CLI commands in cg/cli/demultiplex/sample_sheet.py. There are 3 endpoints in the API corresponding to the 3 CLI commands (API endpoint -> CLI command):
  - validate_sample_sheet -> validate
  - get_or_create_sample_sheet -> create
  - get_or_create_all_sample_sheets -> create-all
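To illustrate how the two validator classes relate, here is a minimal sketch in which SampleSheetValidator calls OverrideCyclesValidator.validate_sample once per sample. Only the two class names and `validate_sample` come from the PR. The constructor arguments, `validate_samples`, `segment_cycles` and `OverrideCyclesError` are hypothetical, and the check shown (each `;`-separated override cycles segment must add up to the cycles of the corresponding read) is a simplified assumption based on the BCL Convert override cycles format. It leaves out the index2 orientation check that the PR describes.

```python
import re

# One override cycles element: a letter (Y=read, I=index, U=UMI, N=skipped) and a cycle count.
ELEMENT = re.compile(r"([YIUN])(\d+)")


class OverrideCyclesError(Exception):
    """Raised when a sample's override cycles do not fit the run (hypothetical name)."""


def segment_cycles(segment: str) -> int:
    """Return the total number of cycles described by one segment such as 'I8N2'."""
    elements = ELEMENT.findall(segment)
    # Rebuilding the segment from the parsed elements rejects any unparsed leftovers.
    if not elements or "".join(letter + count for letter, count in elements) != segment:
        raise OverrideCyclesError(f"Cannot parse override cycles segment {segment!r}")
    return sum(int(count) for _, count in elements)


class OverrideCyclesValidator:
    def __init__(self, run_cycles: list[int]):
        # Assumption: cycles per read in sequencing order, as listed in [Reads],
        # e.g. [read 1, index 1, index 2, read 2].
        self.run_cycles = run_cycles

    def validate_sample(self, sample: dict[str, str]) -> None:
        """Check that each override cycles segment adds up to the cycles of its read."""
        segments = sample["OverrideCycles"].split(";")
        if len(segments) != len(self.run_cycles):
            raise OverrideCyclesError(
                f"Expected {len(self.run_cycles)} segments, got {len(segments)}"
            )
        for segment, expected in zip(segments, self.run_cycles):
            if segment_cycles(segment) != expected:
                raise OverrideCyclesError(
                    f"Segment {segment!r} does not add up to {expected} cycles"
                )


class SampleSheetValidator:
    def __init__(self, run_cycles: list[int]):
        self.override_cycles_validator = OverrideCyclesValidator(run_cycles)

    def validate_samples(self, samples: list[dict[str, str]]) -> None:
        # The override cycles validator is only ever called from here, once per sample.
        for sample in samples:
            self.override_cycles_validator.validate_sample(sample)
```

Keeping the per-sample check in its own class lets the sample sheet validator stay focused on sheet-level structure while delegating the override cycles rules.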
New validation
This PR implements a new validation for v2 sample sheets through the new class SampleSheetValidator, which takes into account 5 aspects:
- The sample sheet has the sections [Header], [Reads], [BCLConvert_Settings] and [BCLConvert_Data]
- IndexSettings is present in the [Header] of the sample sheet
- The cycles are present in the [Reads] section and are valid
- The [BCLConvert_Data] section has the correct columns (sample validation)
- The override cycles of each sample match the [Reads] section, and the index2 cycles are in the correct format according to the IndexSettings (reverse or forward). This is implemented through a new class, OverrideCyclesValidator.
Added
- SampleSheetValidator with the endpoint function validate_sample_sheet, which will be the new function to validate sample sheets.
- OverrideCyclesValidator with the endpoint function validate_sample, which validates whether the override cycles for a single sample are correct. It is called for each sample inside the SampleSheetValidator.
Changed
- Code from cg/apps/demultiplex/sample_sheet/read_sample_sheet.py moved into the validator class.
Fixed
How to prepare for test
us
paxa
How to test
cg demultiplex samplesheet create <flow_cell_name>
Create all
Validation of sample sheets
Demultiplexing
Expected test outcome
Review
Thanks for filling in who performed the code review and the test!
This version is a
Implementation Plan