v1.0.0

@mirimia mirimia released this 16 Jun 22:42
· 136 commits to master since this release
5418141

Main module

  • Changes in intron phase handling. In previous versions, ExOrthist used the offset from the CDS lines (8th column) of the GTF file as a proxy for the intron phase. From this release onwards, we use the actual definition of intron phase (i.e. the nucleotide of the codon after which the intron is located). Moreover, phase 0 introns are now placed in the IPA before the amino acid residue, and phase 1 and 2 introns after the residue, to better reflect the coding meaning. IMPORTANT NOTE: these changes do not have a major impact on exon homology calling, but IPA alignments generated with previous versions are no longer valid from v1.0.0 onwards (i.e. they cannot be used with the --prevaln option). [Script modified: A1].
  • Improvements in the addition of non-annotated exons (--extraexons option): the insertion of the exon in transcripts between coding C1 and C2 exons is prioritized over the insertion between non-coding exons. [Script modified: A1].
  • Non-annotated exons (--extraexons option) can now be added only for a subset of species (previously: either all or none). [Script modified: main.nf].
  • Introduction of stricter cutoffs when deciding not to realign a pair of matching exons in process parse_IPA_prot_aln. To skip the realignment of a query exon against >= 2 target exons, there must now be a single best exon pair from another isoform with less than 30% gaps, more than 40% exon protein sequence similarity and an exon length ratio (shortest/longest) of at least 0.6. [Script modified: B1].
  • The file with the best matches (at the level of the target gene) for each overlapping group of exons is now saved in the output folder as filtered_best_scored_EX_matches_by_targetgene-NoOverlap.tab. [Script modified: main.nf].
  • Redundancy removal: if two variants of the same exon (overlap exon group) have two different exons from the same target gene as valid homologs, only one is selected. Priority is given to the exon associated with a bona fide exon variant (if provided) or, otherwise, to the representative variant of the query exon overlap group. [Scripts modified: C3 and C5].
  • Addition of time and version information in the run_info.log file. [Script modified: A0].
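
The intron phase change and the new realignment cutoffs described above can be illustrated with a minimal Python sketch. This is not the code of the A1 or B1 scripts: all function, class and field names are hypothetical, and the cutoff check covers only the three thresholds quoted above, not the selection of the single best exon pair. The first two functions use the standard definitions of intron phase and of the GTF frame column, which shows why one cannot stand in for the other.

```python
from dataclasses import dataclass


def intron_phase(coding_nt_upstream: int) -> int:
    """Intron phase: number of nucleotides of the interrupted codon
    that lie upstream of the intron (0, 1 or 2)."""
    return coding_nt_upstream % 3


def gtf_frame_of_next_cds(coding_nt_upstream: int) -> int:
    """GTF column 8 of the CDS segment following the intron: number of
    bases to skip at its start to reach the next complete codon."""
    return (3 - coding_nt_upstream % 3) % 3


@dataclass
class ExonPair:
    """Hypothetical summary of one aligned query-target exon pair."""
    gap_fraction: float   # fraction of gapped positions in the exon alignment
    similarity: float     # exon protein sequence similarity (0 to 1)
    query_len: int
    target_len: int


def passes_realignment_cutoffs(pair: ExonPair) -> bool:
    """True if the pair meets the three thresholds quoted in the notes:
    < 30% gaps, > 40% similarity, shortest/longest length ratio >= 0.6."""
    longest = max(pair.query_len, pair.target_len)
    if longest <= 0:
        return False
    length_ratio = min(pair.query_len, pair.target_len) / longest
    return pair.gap_fraction < 0.30 and pair.similarity > 0.40 and length_ratio >= 0.6


if __name__ == "__main__":
    # Coding nucleotides upstream of the intron, intron phase, GTF frame
    for n in (3, 4, 5):
        print(n, intron_phase(n), gtf_frame_of_next_cds(n))
```

For example, an intron located after 4 coding nucleotides has phase 1, while the CDS line that follows it carries a frame of 2 in the GTF, so the two values agree only for phase 0 introns.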

Exint plotter module

  • Addition of a test set for the exint_plotter.nf module.
  • Introduction of additional information in the legend of the exint plots.

Compare exon sets module

  • Changes in the call of not-orthologous exon regulation: the module now assigns the non-conserved label (instead of best_hit) to the query exon if the best hit of the target exon is in an orthogroup with a different query exon from the same query gene.
  • Changes in the statistics provided as text output: the pairwise comparisons between regulated exons in orthologous genes are now reported separately for each query species and are based on the total number of query regulated exons rather than on pairwise comparisons.
  • Introduction of a graphical output when the module is run for two exon sets (see README).

Others

  • Updated documentation.
  • Upload of pre-computed IPA protein alignments generated for all the species pairs in a human (hg38), mouse (mm10), zebrafish (danRer11) and fruitfly (dm6) genome-wide ExOrthist main.nf run. These pairwise alignments can be used for a new main module run with the --prevaln option, so that the alignment step can be skipped for all specified species pairs. The pre-computed IPA alignments can be downloaded from the --prevaln section of the README, and more will be added to the GitHub repository in the near future.
  • Introduction of the retrieve_IPA_aln.pl script, to more easily isolate and visualize the best protein alignment between a pair of (query-target) exons.
  • Addition of a maximum length ratio filter to select liftOver hits in get_liftovers.pl.
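
As a rough illustration of what a maximum length ratio filter does, the Python sketch below keeps a hit only if the longer of the two lengths is at most `max_ratio` times the shorter one. The function name, the parameter names and the definition of the ratio (longest/shortest) are assumptions made for this example; they are not taken from get_liftovers.pl, whose actual option name and default value are not stated here.

```python
def within_max_length_ratio(exon_len: int, hit_len: int, max_ratio: float) -> bool:
    """Hypothetical filter: accept a liftOver hit only if the lengths of the
    original exon and of the lifted region differ by at most max_ratio-fold."""
    if exon_len <= 0 or hit_len <= 0:
        return False  # degenerate coordinates are rejected in this sketch
    return max(exon_len, hit_len) / min(exon_len, hit_len) <= max_ratio


if __name__ == "__main__":
    # (exon length, hit length) pairs; the ratio threshold is an arbitrary example value
    candidates = [(100, 150), (100, 250), (90, 30)]
    kept = [c for c in candidates if within_max_length_ratio(c[0], c[1], max_ratio=2.0)]
    print(kept)
```

Using the longest/shortest ratio makes the filter symmetric, so hits that are much longer and hits that are much shorter than the original exon are both discarded.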