Feature Request: giraffe mapping of CRAMs with multiple RGs #151

jjfarrell · 2025-01-28T12:46:38Z

The present WDL workflow supports only one RG per CRAM. When the workflow creates paired-end FASTQ files, the RG is not preserved, and insert size estimates are based on reads from all RGs in the CRAM. This is a problem when different RGs have different insert sizes. Adding the RG to the paired-end FASTQ files does not work either: the kmc step then breaks because it no longer recognizes the read pairs.

If a CRAM has multiple RGs, it should first be split into one BAM file per RG (https://www.htslib.org/doc/samtools-split.html). Each pair of FASTQ files collated from these BAMs then comes from a single RG, so the RG is preserved without adding it to the FASTQ records, and giraffe estimates the insert size separately for each RG in the CRAM. The giraffe-mapped BAMs for the individual RGs can then be merged into one BAM that lists every RG in the header, with each read tagged with its original RG.
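A minimal shell sketch of the proposed split / collate / merge flow, assuming `samtools` is on `PATH`. The function names (`list_rg_ids`, `split_by_rg`, `rg_bam_to_fastq`, `merge_rg_bams`) and all file names are hypothetical and not part of the existing WDL. In the `samtools split -f` format string, `%!` expands to the RG ID, so this also assumes RG IDs are safe to use in file names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the proposed per-RG flow; not part of the existing WDL.
set -euo pipefail

# Print the ID of every @RG line in a SAM header read from stdin.
# Usage: samtools view -H in.cram | list_rg_ids
list_rg_ids() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if (substr($i, 1, 3) == "ID:") { print substr($i, 4); break }
  }'
}

# Split a CRAM into one BAM per read group, named <prefix>_<RG ID>.bam.
# Reads without an RG tag are not handled in this sketch.
split_by_rg() {
  local cram=$1 ref=$2 prefix=$3
  samtools split --reference "$ref" --output-fmt BAM -f "${prefix}_%!.bam" "$cram"
}

# Collate one per-RG BAM by read name and write paired FASTQs.
# No RG tag is written into the FASTQ records, so kmc still sees ordinary pairs.
rg_bam_to_fastq() {
  local bam=$1 out_prefix=$2
  samtools collate -u -O "$bam" \
    | samtools fastq -1 "${out_prefix}_1.fq.gz" -2 "${out_prefix}_2.fq.gz" \
        -0 /dev/null -s /dev/null -
}

# Merge the per-RG giraffe BAMs into one BAM.
# samtools merge combines the input headers; the @RG lines only survive if
# each giraffe BAM carries them in its header, which is not checked here.
merge_rg_bams() {
  local out=$1; shift
  samtools merge -o "$out" "$@"
}
```

For example, `for rg in $(samtools view -H in.cram | list_rg_ids); do rg_bam_to_fastq "sample_${rg}.bam" "sample_${rg}"; done` after `split_by_rg in.cram ref.fa sample` would produce one FASTQ pair per RG, each of which could go through the existing kmc/giraffe steps separately.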

This is how the RG is presently specified when giraffe is run in the WDL tasks. Every RG gets the ID "1", so the RGs in the original CRAM are lost.

        vg giraffe \
          --progress \
          --read-group "ID:1 LB:lib1 SM:~{in_sample_name} PL:illumina PU:unit1" \
          --sample "~{in_sample_name}" \
          --output-format BAM \
          ~{in_giraffe_options} \
          --ref-paths ~{in_ref_dict} \
          -f ~{in_left_read_pair_chunk_file} -f ~{in_right_read_pair_chunk_file} \
          -x ~{in_xg_file} \
          -H ~{in_gbwt_file} \
          -g ~{in_ggbwt_file} \
          -d ~{in_dist_file} \
          -m ~{in_min_file} \
          -t ~{in_map_cores} > ~{in_sample_name}.${READ_CHUNK_ID}.bam
    >>>
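Instead of the hard-coded `ID:1 LB:lib1 ...` string, the per-RG value could be taken from the matching `@RG` line of the original CRAM header. The sketch below is hypothetical (the helper names are not in the WDL); it simply reproduces the space-separated `TAG:VALUE` form that the task above already passes to `--read-group`, and assumes no RG field value contains a space. How vg giraffe turns that string into the BAM `@RG` header is not verified here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the existing WDL.
set -euo pipefail

# Look up the @RG line for a given ID in a SAM header read from stdin.
# Reads all input (no early exit) so an upstream `samtools view -H` is not cut off.
rg_line_for_id() {
  awk -F'\t' -v id="$1" '
    $1 == "@RG" && !found {
      for (i = 2; i <= NF; i++) if ($i == "ID:" id) { print; found = 1; break }
    }'
}

# Turn one tab-separated @RG line into the space-separated "ID:... SM:... ..."
# string currently hard-coded in the giraffe task.
rg_line_to_giraffe_rg() {
  printf '%s\n' "$1" | awk -F'\t' '$1 == "@RG" {
    out = ""
    for (i = 2; i <= NF; i++) out = out (i > 2 ? " " : "") $i
    print out
  }'
}
```

In the task, this would look something like `rg_line=$(samtools view -H in.cram | rg_line_for_id "$RG_ID")` followed by `--read-group "$(rg_line_to_giraffe_rg "$rg_line")"`, where `RG_ID` is a hypothetical per-shard input naming the RG the FASTQ pair came from.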
