This repository contains the code used to translate transcript-relative loci (including the output of TargetFinder, https://github.com/carringtonlab/TargetFinder) into genome-relative loci. If a transcript-relative locus spans an exon-exon junction (that is, its genomic footprint crosses one or more introns), the output will include one genome-relative region per exon it overlaps.
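The translation described above can be sketched in Python as follows. This is not the repository's own (shell and R) implementation. It is a minimal sketch that assumes 1-based inclusive coordinates as in GFF3, exons given as (genomic start, genomic end) pairs, and a strand of "+" or "-". The function name `transcript_to_genome` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical type: (genomic start, genomic end), 1-based inclusive as in GFF3.
Exon = Tuple[int, int]


def transcript_to_genome(tx_start: int, tx_end: int,
                         exons: List[Exon], strand: str) -> List[Tuple[int, int]]:
    """Map a 1-based inclusive transcript interval onto the genome.

    Returns one (start, end) genomic region per exon the interval overlaps,
    ordered 5' to 3' along the transcript.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Order exons 5' -> 3' along the transcript: ascending on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    tx_length = sum(end - start + 1 for start, end in ordered)
    if not 1 <= tx_start <= tx_end <= tx_length:
        raise ValueError("interval lies outside the transcript")

    regions = []
    offset = 0  # transcript bases contained in the exons already visited
    for g_start, g_end in ordered:
        length = g_end - g_start + 1
        # Part of the locus that falls in this exon, in transcript coordinates.
        lo = max(tx_start, offset + 1)
        hi = min(tx_end, offset + length)
        if lo <= hi:
            if strand == "+":
                # Transcript base offset+1 is this exon's genomic start.
                regions.append((g_start + lo - offset - 1, g_start + hi - offset - 1))
            else:
                # Transcript base offset+1 is this exon's genomic end.
                regions.append((g_end - (hi - offset - 1), g_end - (lo - offset - 1)))
        offset += length
    return regions


# A 20-nt locus crossing the junction of two 50-nt exons.
print(transcript_to_genome(41, 60, [(100, 149), (200, 249)], "+"))
# [(140, 149), (200, 209)]
print(transcript_to_genome(41, 60, [(100, 149), (200, 249)], "-"))
# [(200, 209), (140, 149)]
```

On the negative strand, transcript position 1 corresponds to the highest genomic coordinate of the 5'-most exon, which is why strand has to be known before any coordinate arithmetic is done.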
Script 01:
- Extracts the exon structure of the transcripts containing the loci from a genome annotation GFF3 file
- Splits the exon structures into separate GFF3 files for transcripts on the positive and negative strands of the genome
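The two extraction steps above can be sketched in Python. This is an illustration, not a reimplementation of the repository's shell script: it assumes exons are linked to their transcripts through the standard GFF3 `Parent` attribute (some annotations deviate from this), skips features whose strand is "." or "?", and omits writing the results back out as GFF3. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical type: transcript ID -> list of (seqid, start, end) exons.
ExonMap = Dict[str, List[Tuple[str, int, int]]]


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse the GFF3 column-9 'key=value;key=value' string."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def exons_by_strand(gff3_lines: Iterable[str]) -> Tuple[ExonMap, ExonMap]:
    """Group exon features by parent transcript, split into '+' and '-' strands."""
    split = {"+": defaultdict(list), "-": defaultdict(list)}
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # keep only well-formed exon rows
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        if strand not in split:
            continue  # '.' or '?' strands cannot be oriented
        # GFF3 allows several comma-separated parents for one exon.
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                split[strand][parent].append((seqid, start, end))
    return split["+"], split["-"]


# Usage (file name is a placeholder):
# with open("annotation_file.gff3") as fh:
#     positive, negative = exons_by_strand(fh)
```

Keeping the two strands apart early means later steps can apply a single orientation rule to each file instead of checking the strand of every exon.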
Requirements:
- A Unix shell such as Bash
Input files:
- A GFF3 file of transcript-relative loci you want to translate (loci_file)
- A genome annotation GFF3 file (annotation_file)
Usage:
- Edit the script to set the working directory on line 4 (path/to/working/directory), the loci file on line 9 (loci_file) and the annotation file on line 66 (annotation_file)
- Ensure the required input files are in the working directory. All output files will also be written to this directory.
Script 02:
- Generates a .txt file of the genome-relative coordinates of the loci in loci_file
Requirements:
- R with the tidyr, plyr and dplyr packages installed
Input files: All required input files are generated by script 01.
- annotation_file.exons.targets.positive.2.3.4.gff3
- annotation_file.exons.targets.negative.2.3.4.gff3
- loci_file.2.3.4.gff3
Usage:
- Edit the script to set the working directory on line 13 (path/to/working/directory) and the annotation file name on lines 20, 23 and 25 (annotation_file)
- The output file name can be changed on line 370
- The working directory should be the same as that used for script 01. The output file will also be written to this directory.
Output file:
The output of this pipeline is a .txt file ("genomic_loci.txt") with 5 or more columns:
- Column 1: user-provided locus IDs
- Column 2: chromosome
- Column 3: genome strand
- Column 4: Region 1 end
- Column 5: Region 1 start
- Columns 6 onwards (only if the locus spans multiple exons): end and start positions of each additional region
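To load these results downstream, a row can be parsed as sketched below. This is a hedged example, not part of the pipeline: it assumes tab-separated columns (pass a different `sep` if the file uses another delimiter), assumes any header line has already been skipped, and assumes shorter rows may be padded with empty or "NA" values when other loci have more regions. The names `GenomicLocus` and `parse_genomic_locus` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class GenomicLocus(NamedTuple):
    locus_id: str
    chrom: str
    strand: str
    regions: List[Tuple[int, int]]  # (start, end) per region, in file order


def parse_genomic_locus(line: str, sep: str = "\t") -> GenomicLocus:
    """Parse one data row: ID, chromosome, strand, then (end, start) pairs."""
    # Strip surrounding whitespace and double quotes from every field.
    cols = [c.strip().strip('"') for c in line.rstrip("\n").split(sep)]
    if len(cols) < 5 or (len(cols) - 3) % 2 != 0:
        raise ValueError(f"expected 3 + 2n columns (n >= 1), got {len(cols)}")
    regions = []
    for i in range(3, len(cols) - 1, 2):
        end, start = cols[i], cols[i + 1]
        if end in ("", "NA") or start in ("", "NA"):
            continue  # assumed padding for loci with fewer regions
        # The file lists each region's end before its start; store as (start, end).
        regions.append((int(start), int(end)))
    return GenomicLocus(cols[0], cols[1], cols[2], regions)


row = "locus1\tchr1\t+\t149\t140\t209\t200"
print(parse_genomic_locus(row).regions)
# [(140, 149), (200, 209)]
```

Storing each region as (start, end) regardless of the file's column order makes the records easier to compare with GFF3 or BED-style coordinates.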