Bring the MAS-ISO-seq workflows into our production pipelines #340

Merged
jonn-smith merged 1 commit into main from kvg_pb_masseq on Jul 28, 2022

Conversation

jonn-smith
Collaborator

This PR enables MAS-seq production runs in PBFlowcell.wdl, with a quantification pipeline PBMASIsoSeqQuantify.wdl and a sample demultiplexing pipeline PBMASIsoSeqDemultiplex.wdl.

I have made some substantial changes to PBFlowcell and other task WDLs. All WDLs validate and I have tested PBFlowcell and PBMASIsoSeqQuantify. I haven't created a test dataset to run through the demultiplexing WDL, so that has not been tested yet.

My tests have been done with a 5' library because it's what we used for the paper. I'm going to start testing with the other libraries, and some modifications will likely be necessary. My thoughts are to get this in and then quickly update it with whatever changes are necessary for other preps.

@jonn-smith jonn-smith force-pushed the kvg_pb_masseq branch 3 times, most recently from 626bd6d to d013453 Compare June 24, 2022 19:12
@jonn-smith
Collaborator Author

@SHuang-Broad here are the files that docker script generated. I'm not sure it produced helpful output, though.

dockers.in_use.sorted.tsv.txt
dockers.latest.sorted.tsv.txt

@SHuang-Broad
Collaborator

Thanks, @jonn-smith.
This is what I needed. Basically, it lists the dockers used in the various places in the pipeline (the line on which each is mentioned, and with which tag). This lets me focus on where to look for potential issues, given that many dockers are touched in this PR.
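For readers who have not seen the docker script, here is a minimal sketch of the kind of listing described above: for each `docker:` entry in a WDL file, report the line number, the image, and the tag. This is not the script used in this PR. The names `DockerRef`, `split_image`, and `find_docker_refs` are hypothetical, and the sketch assumes images are declared as a quoted string after a `docker:` key.

```python
import re
from typing import Iterator, NamedTuple, Tuple


class DockerRef(NamedTuple):
    """One docker image reference found in a WDL file (hypothetical type)."""
    line_no: int
    image: str
    tag: str


# Assumption: images appear as  docker: "registry/path/name:tag"
DOCKER_RE = re.compile(r'docker\s*:\s*"([^"]+)"')


def split_image(ref: str) -> Tuple[str, str]:
    """Split 'name:tag' into (name, tag); the tag is '' when absent."""
    name, sep, tag = ref.rpartition(":")
    # A ':' followed by a '/' belongs to a registry port, not to a tag.
    if sep and "/" not in tag:
        return name, tag
    return ref, ""


def find_docker_refs(wdl_text: str) -> Iterator[DockerRef]:
    """Yield every docker reference with its 1-based line number."""
    for line_no, line in enumerate(wdl_text.splitlines(), start=1):
        match = DOCKER_RE.search(line)
        if match:
            image, tag = split_image(match.group(1))
            yield DockerRef(line_no, image, tag)


if __name__ == "__main__":
    # Made-up WDL fragment; the image name is illustrative only.
    sample = 'task A {\n  runtime {\n    docker: "us.gcr.io/example/lr-align:0.1.28"\n  }\n}\n'
    for ref in find_docker_refs(sample):
        print(f"{ref.line_no}\t{ref.image}\t{ref.tag}")
```

Sorting the resulting rows by image name makes it easy to spot the same image pinned to different tags in different tasks.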

Collaborator

@SHuang-Broad SHuang-Broad left a comment


Done with my 1st round of reviews. I have many questions.
Happy to go over them in a meeting.

docker/lr-racon/Makefile (outdated)
docker/lr-racon/Makefile (outdated)
docker/lr-racon/Makefile
docker/lr-finalize/Dockerfile
docker/lr-align/Makefile (outdated)
docker/lr-jupyter/setupBigJupyterHost.sh (outdated)
docker/lr-ref/environment.yml (outdated)
docker/lr-ref/Dockerfile
docker/lr-splicedbam2gff/Dockerfile
docker/lr-transcript_utils/Dockerfile
Collaborator

@bshifaw bshifaw left a comment


Minor comments, but nothing to prevent the PR from merging.

docker/lr-10x/environment.yml
docker/lr-align/environment.yml
docker/lr-10x/extract_ilmn_bc_conf_scores.py (outdated)
docker/lr-utils/environment.yml
docker/lr-cartographer/bash/quick_test_cartographer.sh (outdated)
docker/lr-10x/tool.py
docker/lr-10x/update_umi_positions.py (outdated)
Collaborator

@SHuang-Broad SHuang-Broad left a comment


Some small comments, nothing critical in light of the tight deadline.

docker/lr-utils/Dockerfile
docker/lr-stringtie2/Dockerfile (outdated)
docker/lr-stringtie2/Dockerfile (outdated)
docker/lr-splicedbam2gff/Makefile (outdated)
docker/lr-splicedbam2gff/Dockerfile (outdated)
docker/lr-align/Dockerfile
docker/lr-align/Makefile (outdated)
docker/lr-metrics/Makefile
@jonn-smith jonn-smith force-pushed the kvg_pb_masseq branch 2 times, most recently from d08819c to 3b4ef83 Compare July 28, 2022 16:01
- Added required apt-get update to docker.
- Added missing slash for finalizing files.
- Updated `PBFlowcell.wdl` with some initial offline comments.
- Fixed potential issue with transcript equivalence classes.
- Updated longbow version to 0.5.29.
- Updated subreads stats to reflect MAS-seq yield.
- Variable name and other minor updates in `update_umi_positions.py`
- Updated `Utils.GetRawReadGroup` to escape spaces as well.
- Updated `AdjustUmiSequenceWithAdapterAlignment` to work with all models.
- Updated lr-10x docker image to v0.1.18
- Updated Longbow WDL for new model names in the latest version.
- Adjusted ram for `Longbow.Correct` task.
- Fixed model names to reflect new longbow model name convention.
- Updated to latest version of lr-transcript_utils.
- Fixed problem with argument parsing in update_umi_positions_2.py,
  causing umis to not be corrected for libraries other than 5'.
- Updated model segment names in script to collect better stats.  Some
  stats are still broken.
- Added jupyter notebook wdl.
- Several fixes to PBMASIsoSeqQuantify.wdl for bugs.
- Updated memory for StringTie2 quantify to 64 GB (band-aid fix).
- Removed contig filters for sharding so that no read is dropped.
- Added post-umi correction tags to `tags to preserve` in the call to
  minimap2 when aligning reads in `PBFlowcell.wdl`
- Updated `StringTie2.wdl` to version `2.2.1`
- Commented out some unused code in `PBMASIsoSeqQuantify.wdl`
- The MAS-seq node filter was filtering the node names by a specific
  PacBio instrument name, causing all reads from sequencers other than
  that one to not be counted as MAS-seq nodes and therefore not
  quantified.  This was replaced with more robust logic (a regular
  expression).
- Updated `umi_correction.py` to have variables for all tags.
- Updated `umi_correction.py` to switch between pre-extracted and non-extracted data; updated WDL accordingly.
- Updated `PBFlowcell.wdl` to only process certain MAS-seq outputs on single-cell data.
- Updated `Longbow.wdl` to only process certain MAS-seq outputs on single-cell data.
- Updated longbow to 0.5.33 to fix issues for bulk libraries.
- Added in overrides for barcode correction to enable more flexibility.
- `PBFlowcell` now defaults to 100 shards for MASSEQ.
- `Longbow.Process` now defaults to 25 shards.
- Exposed an optional input in `Longbow.Process` to allow for custom shard widths.
- Added in a script to pull down all monitoring logs for a run.
- Addressed reviewer comments.
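
The node-filter fix described in the commit message above (replacing a match on one specific PacBio instrument name with a regular expression) might look roughly like the sketch below. The actual regular expression and node-name format used in this PR are not shown in this conversation. The sketch assumes that MAS-seq node names begin with a PacBio movie name of the form `m<instrument>_<YYMMDD>_<HHMMSS>/`. The instrument ID `m64020e`, the example names, and both function names are hypothetical.

```python
import re

# Assumed layout: m<instrument>_<YYMMDD>_<HHMMSS>/<zmw>/...
# The optional trailing letter covers instrument IDs such as "m64020e".
MOVIE_PREFIX_RE = re.compile(r"^m\d+[a-z]?_\d{6}_\d{6}/")


def is_masseq_node_brittle(name: str) -> bool:
    """Hypothetical reconstruction of the old behaviour: only reads from
    one hard-coded instrument are recognised."""
    return name.startswith("m64020e_")


def is_masseq_node(name: str) -> bool:
    """Instrument-agnostic check: accept any name that starts with a
    movie-name prefix in the assumed layout."""
    return MOVIE_PREFIX_RE.match(name) is not None


if __name__ == "__main__":
    # Illustrative names only: two reads from different instruments and
    # one annotation-style transcript ID.
    names = [
        "m64020e_210506_132139/1/ccs",
        "m64218e_220101_010101/7/ccs",
        "ENST00000000001",
    ]
    for n in names:
        print(n, is_masseq_node_brittle(n), is_masseq_node(n))
```

With the hard-coded prefix, the second read would be missed and therefore never quantified. The pattern-based check accepts reads from any instrument while still rejecting names that are not read names.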
@jonn-smith jonn-smith merged commit e4afca7 into main Jul 28, 2022
@jonn-smith jonn-smith deleted the kvg_pb_masseq branch July 28, 2022 17:51
4 participants