Bring the MAS-ISO-seq workflows into our production pipelines #340
626bd6d to d013453
@SHuang-Broad here are the files that docker script generated. I'm not sure it produced helpful output, though.
Thanks, @jonn-smith.
Done with my 1st round of reviews. I have many questions.
Happy to go over in a meeting.
Minor comments, but nothing to prevent the PR from merging.
docker/lr-transcript_utils/python/process_quant_files_into_tsv.py
Some small comments, nothing critical in light of the tight deadline.
d08819c to 3b4ef83
- Added required apt-get update to docker.
- Added missing slash for finalizing files.
- Updated `PBFlowcell.wdl` with some initial offline comments.
- Fixed potential issue with transcript equivalence classes.
- Updated longbow version to 0.5.29.
- Updated subreads stats to reflect MAS-seq yield.
- Variable name and other minor updates in `update_umi_positions.py`.
- Updated `Utils.GetRawReadGroup` to escape spaces as well.
- Updated `AdjustUmiSequenceWithAdapterAlignment` to work with all models.
- Updated lr-10x docker image to v0.1.18.
- Updated Longbow WDL for new model names in the latest version.
- Adjusted ram for `Longbow.Correct` task.
- Fixed model names to reflect new longbow model name convention.
- Updated to latest version of lr-transcript_utils.
- Fixed problem with argument parsing in update_umi_positions_2.py, causing umis to not be corrected for libraries other than 5'.
- Updated model segment names in script to collect better stats. Some stats are still broken.
- Added jupyter notebook wdl.
- Several fixes to PBMASIsoSeqQuantify.wdl for bugs.
- Updated memory for StringTie2 quantify to 64gb (bandaid fix).
- Removed contig filters for sharding so that no read is dropped.
- Added post-umi correction tags to `tags to preserve` in the call to minimap2 when aligning reads in `PBFlowcell.wdl`.
- Updated `StringTie2.wdl` to version `2.2.1`.
- Commented out some unused code in `PBMASIsoSeqQuantify.wdl`.
- The MAS-seq node filter was filtering the node names by a specific PacBio instrument name, causing all reads from sequencers other than that one to not be counted as MAS-seq nodes and therefore not quantified. This was replaced with more robust logic (a regular expression).
- Updated `umi_correction.py` to have variables for all tags.
- Updated `umi_correction.py` to switch between pre-extracted and non-extracted data; updated WDL accordingly.
- Updated `PBFlowcell.wdl` to only process certain MAS-seq outputs on single-cell data.
- Updated `Longbow.wdl` to only process certain MAS-seq outputs on single-cell data.
- Updated longbow to 0.5.33 to fix issues for bulk libraries.
- Added in overrides for barcode correction to enable more flexibility.
- `PBFlowcell` now default shards to 100 for MASSEQ.
- `Longbow.Process` now default shards to 25.
- Exposed an optional input in `Longbow.Process` to allow for custom shard widths.
- Added in a script to pull down all monitoring logs for a run.
- Addressed reviewer comments.
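For readers following the MAS-seq node filter fix in the commit message above, here is a minimal sketch of what a regex-based filter of that kind might look like. It is not the code from this PR. The pattern and the function name `is_mas_seq_node` are hypothetical, and the sketch assumes that MAS-seq node names begin with a PacBio movie name of the form `m<instrument>_<yymmdd>_<hhmmss>`.

```python
import re

# Assumption: a MAS-seq node name starts with a PacBio movie name, e.g.
# "m64020e_210506_132139". The instrument ID is matched generically rather
# than being hardcoded to one sequencer.
MOVIE_NAME_PREFIX = re.compile(r"^m\d+[a-z]?_\d{6}_\d{6}")


def is_mas_seq_node(node_name: str) -> bool:
    """Return True if the node name looks like it came from a PacBio read.

    Hypothetical helper: the real filter in the pipeline may use a
    different pattern or inspect a different field.
    """
    return MOVIE_NAME_PREFIX.match(node_name) is not None


if __name__ == "__main__":
    # Illustrative names only; they are not taken from a real run.
    for name in ("m64020e_210506_132139/1/ccs",
                 "m54238_180901_011437/4/ccs",
                 "ENST00000456328.2"):
        print(name, is_mas_seq_node(name))
```

Matching on the general shape of the name rather than on one instrument ID is what keeps reads from other sequencers from being silently excluded from quantification.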
3b4ef83 to 5dabacb
This PR enables MAS-seq production runs in `PBFlowcell.wdl`, with a quantification pipeline `PBMASIsoSeqQuantify.wdl` and a sample demultiplexing pipeline `PBMASIsoSeqDemultiplex.wdl`.

I have made some substantial changes to `PBFlowcell` and other task WDLs. All WDLs validate and I have tested `PBFlowcell` and `PBMASIsoSeqQuantify`. I haven't created a test dataset to run through the demultiplexing WDL, so that has not been tested yet.

My tests have been done with a `5'` library because it's what we used for the paper. I'm going to start testing with the other libraries, and some modifications will likely be necessary. My thoughts are to get this in and then quickly update it with whatever changes are necessary for other preps.