The present WDL workflow supports only one RG (read group) per cram. When the workflow creates paired-end fastq files, the RG is not preserved, and insert size estimates are based on reads from all RGs in the cram. This is problematic because each RG may have a different insert size. If the RG is added to the paired-end fastq files, the kmc step unfortunately breaks and does not recognize the read pairs.
If a cram has multiple RGs, it should first be split into one bam file per RG (https://www.htslib.org/doc/samtools-split.html). The paired-end fastq files collated from each bam could then preserve the RG, and the giraffe alignment would be based on the insert size of each individual RG. The giraffe-mapped bam for each RG can then be merged into a single bam that lists every RG in the header and has each read properly tagged with its original RG.
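The proposed split, map, and merge flow could be sketched roughly as follows. This is only an illustration, not the workflow's actual tasks: the function name `plan_rg_aware_mapping`, the file naming scheme, and the example inputs are hypothetical. The sketch only builds command strings and does not run anything. It assumes the RG IDs are already known (for example from the cram header), that `samtools split` can decode the cram (a reference may be required), and that the giraffe index and haplotype sampling options used by the workflow are added separately. The original `@RG` fields other than the ID (LB, PL, PU, and so on) would likely still need to be restored in the merged header.

```python
import shlex


def plan_rg_aware_mapping(cram, rg_ids, gbz, sample, merged="merged.bam"):
    """Return shell command strings for a per-RG mapping plan (nothing is executed)."""
    q = shlex.quote
    # 1. Split the cram into one bam per read group; '%!' expands to the @RG ID.
    cmds = [f"samtools split -f '%!.bam' --output-fmt bam {q(cram)}"]
    mapped = []
    for rg in rg_ids:
        fq1, fq2 = f"{rg}_1.fq.gz", f"{rg}_2.fq.gz"
        out = f"{rg}.giraffe.bam"
        # 2. Collate read pairs and write paired-end fastq files for this RG only.
        cmds.append(
            f"samtools collate -u -O {q(rg + '.bam')} | "
            f"samtools fastq -1 {q(fq1)} -2 {q(fq2)} -0 /dev/null -s /dev/null -n -"
        )
        # 3. Map this RG on its own so insert size is estimated per RG, and tag
        #    the output with the original RG ID instead of a fixed value.
        #    (Index / haplotype sampling options are omitted here: assumption.)
        cmds.append(
            f"vg giraffe -Z {q(gbz)} -f {q(fq1)} -f {q(fq2)} "
            f"-R {q(rg)} -N {q(sample)} -o BAM | samtools sort -o {q(out)} -"
        )
        mapped.append(out)
    # 4. Merge the sorted per-RG bams into a single bam.
    cmds.append(f"samtools merge -o {q(merged)} " + " ".join(q(m) for m in mapped))
    return cmds


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for cmd in plan_rg_aware_mapping("sample.cram", ["rgA", "rgB"], "graph.gbz", "sample1"):
        print(cmd)
```

Keeping the RG in the file name and in the giraffe `-R` option, rather than in the fastq read names, may also avoid the kmc problem described above, although that has not been verified here.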
In the present WDL tasks, the RG passed to giraffe is always specified as "1", so the RGs in the original cram are lost.