Stable version v1.0

@drewjbeh drewjbeh released this 18 Feb 09:28
· 173 commits to master since this release

Release v1.0 contains major upgrades to deconvolution, along with other optimisations and bug fixes.

Deconvolution

Clusters that cannot be deconvoluted

  1. 3' coverage bias at a mismatch required for deconvolution. To overcome this, the --deconv-cov-ratio parameter can be used to set a threshold for the difference in coverage between the mismatch position and the 3' end of the tRNA. Sequences that do not pass this threshold are marked as not deconvoluted.
  2. Some tRNAs may only be distinguishable from the parent cluster by positions that might also be modified sites. Such tRNA transcripts (and the parent of the cluster) are also labelled as not deconvoluted.
  3. Reads that cannot be assigned to a transcript within a cluster are assumed to originate from the parent and are, by default, left assigned to the parent sequence. However, in some cases these parent-assigned reads also contain erroneous mismatches, which indicate that they might not belong to the parent transcript either. If 10% or more of the parent-assigned reads contain such mismatches, the entire cluster is not deconvoluted, as these reads are likely to significantly affect the estimation of parent abundance and can also affect the deconvolution choice made by mim-tRNAseq for other members of the cluster.
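The two numeric checks in items 1 and 3 can be sketched as follows. This is a minimal illustration and not mim-tRNAseq's actual code: the function names, the direction of the coverage ratio (coverage at the mismatch divided by coverage at the 3' end), and the example threshold of 0.5 are all assumptions made for this sketch. Only the 10% cut-off for parent-assigned reads comes from the notes above.

```python
def passes_cov_ratio(cov_at_mismatch, cov_at_3prime, min_ratio):
    """Hypothetical check for item 1.

    Assumes the threshold is applied to the ratio of coverage at the
    distinguishing mismatch to coverage at the 3' end of the tRNA.
    """
    if cov_at_3prime == 0:
        # No 3' coverage at all: nothing to compare against.
        return False
    return cov_at_mismatch / cov_at_3prime >= min_ratio


def parent_reads_block_deconvolution(n_parent_reads, n_with_mismatches,
                                     max_fraction=0.10):
    """Hypothetical check for item 3.

    Returns True when 10% or more of the parent-assigned reads carry
    unexpected mismatches, in which case the whole cluster stays unsplit.
    """
    if n_parent_reads == 0:
        return False
    return n_with_mismatches / n_parent_reads >= max_fraction


# Illustrative numbers only; 0.5 is not claimed to be the tool's default.
print(passes_cov_ratio(cov_at_mismatch=40, cov_at_3prime=100, min_ratio=0.5))
print(parent_reads_block_deconvolution(n_parent_reads=200, n_with_mismatches=25))
```

In this sketch a sequence that fails the first check, or a cluster that triggers the second, would be labelled as not deconvoluted.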

New handling and naming of unsplit clusters

Transcripts that are not deconvoluted are renamed to show which transcripts remain clustered. Each such cluster is then treated like any other single transcript in differential expression analysis, modification profiling, and other downstream analyses. For example, if Ala-AGC-1 and Ala-AGC-2 are clustered and cannot be deconvoluted, these two transcripts will remain clustered, be renamed to Ala-AGC1/2 (the parent isodecoder number for the cluster is always listed first), and appear as a single entry for counts, modification analysis, differential expression, and coverage data. For readability in plots, unsplit clusters are labelled in the form Ala-AGC-1-multi. This is particularly useful for clusters with many unsplit sequences, whose full names would be unnecessarily long for display purposes.
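The plot labelling and parent-first ordering described above can be sketched as follows. The helper names are hypothetical and are not part of mim-tRNAseq; the sketch assumes transcript names end in "-<isodecoder number>", as in the Ala-AGC-1 example.

```python
def plot_label(parent_name):
    """Short display label for an unsplit cluster, e.g. Ala-AGC-1-multi."""
    return f"{parent_name}-multi"


def isodecoder_numbers_parent_first(parent_name, member_names):
    """List isodecoder numbers of a cluster with the parent's number first.

    Assumes each name ends in '-<number>' (an assumption of this sketch).
    """
    def number(name):
        return name.rsplit("-", 1)[1]

    others = [m for m in member_names if m != parent_name]
    return [number(parent_name)] + [number(m) for m in others]


print(plot_label("Ala-AGC-1"))
print(isodecoder_numbers_parent_first("Ala-AGC-1", ["Ala-AGC-1", "Ala-AGC-2"]))
```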

New output: annotation/*_unsplitClusterInfo.txt

This new output lists the clusters that were not split as described above, along with the number of transcripts in each, the parent of the cluster, and the reason the cluster could not be split (see the output documentation).
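A minimal sketch for loading this file in Python is shown below. It assumes the file is tab-delimited, which is not stated in these notes, and it deliberately does not assume any column names; the function name is hypothetical.

```python
import csv
import glob
import os


def read_unsplit_cluster_info(annotation_dir):
    """Read every *_unsplitClusterInfo.txt file in annotation_dir.

    Assumes tab-delimited text (an assumption of this sketch) and
    returns each non-empty line as a list of fields.
    """
    pattern = os.path.join(annotation_dir, "*_unsplitClusterInfo.txt")
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if row:
                    rows.append(row)
    return rows


if __name__ == "__main__":
    # "annotation" is the output subdirectory named in the heading above.
    for fields in read_unsplit_cluster_info("annotation"):
        print(fields)
```

Check the output documentation for the actual column layout before relying on field positions.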

Minor updates and bug fixes

  • Fixed the tie-breaking algorithm used when a read could be assigned to more than one transcript during deconvolution
  • Fixed erroneous detection of new modifications at position 34
  • Added rat rn7 reference