Falsitron identification

Main scripts for the paper:

Direct long-read RNA sequencing identifies a subset of questionable exitrons likely arising from reverse transcription artifacts

Isoform annotation

Isoforms are identified from long-read data following the ONT pipeline, which is based on StringTie and other tools.

cDNA and direct RNA (dRNA) reads are processed independently, using the pipeline's default parameters.

Candidates are searched with the script candidate_search.R, run as follows:

Rscript candidate_search.R <input_file>

The input file contains the following lines:

query = Library with potential artifacts (usually cDNA)

target = Library to compare against (usually dRNA)

An example of this file is contained here.
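As a rough illustration of the invocation described above, the sketch below writes a two-line input file and passes it to candidate_search.R. The file name input_example.txt and the two library paths are hypothetical placeholders, and the literal `key = value` layout is an assumption based on the description of the lines; the example file provided with the repository remains the authoritative reference for the exact format.

```shell
# Hypothetical input file: the paths and the exact "key = value" layout are assumptions.
cat > input_example.txt <<'EOF'
query = cdna_isoforms.gtf
target = drna_isoforms.gtf
EOF

# Run the candidate search only if Rscript and the script are available
# in the current directory; otherwise just report what was skipped.
if command -v Rscript >/dev/null 2>&1 && [ -f candidate_search.R ]; then
  Rscript candidate_search.R input_example.txt
else
  echo "candidate_search.R or Rscript not found; skipping run" >&2
fi
```

Keeping the query and target in a small text file makes it easy to rerun the comparison with the roles swapped or with a different pair of libraries.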

The script outputs the candidates retained under each filter, as described in the figure above.

The script repeat_search.R then processes the output file for filter F3 to search for direct repeats in the candidates.