lncRNA-analysis

A Snakemake-based pipeline to annotate novel lncRNAs using existing annotations.

Description

This pipeline uses several bioinformatics tools to annotate lncRNAs. It requires one or more reference genomes (optionally with a prebuilt HISAT index), sequencing reads, and one or more reference annotations.

The workflow is mainly divided into two parts:

Part 1 (Snakefile):

The first part of the pipeline works with the following steps:

uses HISAT2 to align all reads to the transcriptome
creates a new gene list with all known and novel genes
separates the novel genes
uses CPAT and CPC to annotate novel genes with coding potential
extracts non-coding genes
identifies and removes transcripts with coding isoforms and genes shorter than 200 bp
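
The final filtering step above can be sketched as a small function. This is only an illustration under stated assumptions, not the pipeline's actual script: the record keys (`gene_id`, `transcript_id`, `length`, `coding`) and the function name are hypothetical, and it assumes the CPAT/CPC results have already been reduced to one boolean coding call per transcript. It also assumes one reading of the step: a gene with any coding isoform is dropped entirely, and the 200 bp cut-off is applied per transcript.

```python
from collections import defaultdict

MIN_LENGTH = 200  # bp; the length cut-off named in the step list


def filter_lncrna_candidates(transcripts, min_length=MIN_LENGTH):
    """Keep transcripts from genes with no coding isoform and length >= min_length.

    `transcripts` is a list of dicts with hypothetical keys:
    gene_id, transcript_id, length (bp), coding (bool).
    """
    # Group isoforms by gene so a single coding isoform can veto the whole gene.
    by_gene = defaultdict(list)
    for t in transcripts:
        by_gene[t["gene_id"]].append(t)

    kept = []
    for isoforms in by_gene.values():
        if any(t["coding"] for t in isoforms):
            continue  # gene has a coding isoform: discard all of its transcripts
        kept.extend(t for t in isoforms if t["length"] >= min_length)
    return kept


# Toy input (made-up identifiers and values, for illustration only).
example = [
    {"gene_id": "G1", "transcript_id": "G1.1", "length": 850, "coding": False},
    {"gene_id": "G1", "transcript_id": "G1.2", "length": 150, "coding": False},
    {"gene_id": "G2", "transcript_id": "G2.1", "length": 1200, "coding": False},
    {"gene_id": "G2", "transcript_id": "G2.2", "length": 900, "coding": True},
]
kept_ids = [t["transcript_id"] for t in filter_lncrna_candidates(example)]
print(kept_ids)
```

In this toy input, G1.2 is removed for being shorter than 200 bp and both G2 transcripts are removed because G2 has a coding isoform.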

Part 2 (Snakefile_quantification):

The second part of the pipeline works with the following steps:

uses StringTie to merge lncRNAs from different samples
creates a transcriptome using GFFread
creates a Salmon index for the transcriptome
quantifies lncRNAs in each replicate of each tissue/sample using Salmon
creates a TPM matrix covering each sample and its replicates
filters out lncRNAs with low expression values
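
The last two steps above (building a TPM matrix and filtering lowly expressed lncRNAs) might look roughly like the following. This is a hedged sketch rather than the pipeline's own code: the function names are hypothetical, the 1.0 TPM threshold and the "at least one replicate" rule are assumptions (the README does not state the criterion), and the input is assumed to be Salmon's tab-separated `quant.sf` format with `Name` and `TPM` columns.

```python
import csv
import io


def read_salmon_tpm(handle):
    """Return {transcript_name: TPM} from a Salmon quant.sf-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


def build_tpm_matrix(tpm_by_replicate):
    """Combine per-replicate TPM dicts into {transcript: {replicate: TPM}}."""
    matrix = {}
    for replicate, tpms in tpm_by_replicate.items():
        for name, tpm in tpms.items():
            matrix.setdefault(name, {})[replicate] = tpm
    # Fill missing entries with 0.0 so every row has every replicate.
    for row in matrix.values():
        for replicate in tpm_by_replicate:
            row.setdefault(replicate, 0.0)
    return matrix


def filter_low_expression(matrix, min_tpm=1.0):
    """Keep transcripts whose TPM reaches min_tpm in at least one replicate.

    min_tpm=1.0 is an assumed placeholder, not the pipeline's threshold.
    """
    return {n: r for n, r in matrix.items() if max(r.values()) >= min_tpm}


# Toy quant.sf contents (made-up values) standing in for two replicates.
header = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
rep1 = header + "lnc1\t900\t750.0\t5.2\t40\nlnc2\t400\t250.0\t0.3\t1\n"
rep2 = header + "lnc1\t900\t750.0\t4.8\t37\nlnc2\t400\t250.0\t0.1\t1\n"

matrix = build_tpm_matrix({
    "rep1": read_salmon_tpm(io.StringIO(rep1)),
    "rep2": read_salmon_tpm(io.StringIO(rep2)),
})
expressed = filter_low_expression(matrix)
print(sorted(expressed))
```

In practice the file handles would come from each replicate's Salmon output directory instead of in-memory strings.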

Pipeline

A simple overview of the pipeline is shown below:

[Workflow diagrams: Part 1 (Snakefile) and Part 2 (Snakefile_quantification)]

Creating Conda environment

This pipeline uses conda environments, which Snakemake creates during the first run of the pipeline. However, a High-Performance Computing (HPC) system may not have an active internet connection on the nodes where the pipeline runs. In that case, create the conda environments in advance by running snakemake -s env_snakefile --use-conda from a node with an internet connection. The pipeline will then be able to use the existing environments.
