
nch-igm/PBFLIP


PB_FLIP

PacBio Fusion and Long Isoform Pipeline (PB_FLIP).


PacBio Fusion and Long Isoform Pipeline (PB_FLIP) incorporates a suite of RNA-Seq software analysis tools and scripts to identify expressed gene fusion partners and isoforms.


Required External Databases

  • FusionHubDB
  • DisGeNETDB 7.0
    • Download curated_gene_disease_associations.tsv from the above link.
  • isoannotlitegff3
    • Download Homo_sapiens_GRCh38_Ensembl_86.zip or Mus_musculus_GRCm38_Ensembl_86.zip
  • STAR Genome Index
    • Provide STAR index folder for short-reads junction support
    • Human and Mouse References can be downloaded from Gencode. The pipeline was tested with Human Genome release version 38.

The absolute paths to these four resources should be added to config/case.yml:

DISGENET:
  /data/pbflip/DisGeNET/curated_gene_disease_associations.tsv

FUSIONHUBDB:
  /data/pbflip/FusionDatabase/Fusionhub_global_summary.txt

REFERENCES:
    genome: /data/pbflip/isoseq_db/genomes/hg38.fa
    annotation: /data/pbflip/isoseq_db/genomes/gencode.v32.annotation.gtf
    isoannotlitegff3: /data/pbflip/isoseq_db/Homo_sapiens_GRCh38_Ensembl_86.gff3

GENOMEINDEX:
  star_index: /data/pbflip/star_index

TX2G:
  "/data/pbflip/isoseq_db/gencode.v32.annotation.tr2g_gtf.tsv"

To create gencode.v32.annotation.tr2g_gtf.tsv from the annotation GTF:

grep -w "exon" gencode.v32_SIRVome_isoforms_ERCCs_longSIRVs_200709a_C_170612a.gtf \
        | cut -f9 | cut -f1,2,4 -d";" \
        | sed 's/gene_id //g' | sed 's/; transcript_id / /' \
        | sed 's/; gene_name / /'| uniq > gencode.v32.annotation.tr2g_gtf.tsv
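The same transcript-to-gene table can be sketched in Python. This is a minimal illustration, not part of PB_FLIP: the function name tx2g_lines is hypothetical, attributes are looked up by key rather than by position, and exon records without a gene_name are skipped (an assumption; the shell command above makes no such check). The output keeps the quoted, space-separated layout that the shell command produces.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR = re.compile(r'(\S+) "([^"]*)"')


def tx2g_lines(gtf_lines):
    """Yield '"gene_id" "transcript_id" "gene_name"' for exon records.

    Consecutive duplicates are collapsed, like `uniq` in the shell version.
    """
    previous = None
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        # Column 3 of a GTF record is the feature type.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR.findall(fields[8]))
        try:
            row = '"{}" "{}" "{}"'.format(
                attrs["gene_id"], attrs["transcript_id"], attrs["gene_name"]
            )
        except KeyError:
            continue  # assumption: skip records missing any of the three keys
        if row != previous:
            yield row
        previous = row
```

A possible use is to open the GTF, iterate tx2g_lines over the file object, and write each yielded row as one line of the .tsv file.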

Set up conda environment for the PB_FLIP pipeline

cd ~
mkdir apps
cd apps
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh
bash Miniconda3-py37_4.10.3-Linux-x86_64.sh

Follow the on-screen instructions to complete the installation.

Then, activate the base environment.

conda activate base

To install some of the Python dependencies:

conda env update --name base --file environment.yml

At this point, you can follow the instructions in sandbox_installer.sh to install the rest of the dependencies.


  1. Clone the repository to your local machine
git clone https://github.com/nch-igm/PBFLIP
cd PBFLIP
  2. Edit config/case.yml
  3. To run the pipeline, issue the following command. This will run the pipeline on 16 CPU threads.
snakemake -f -p -j 16 -c 16 --latency-wait 20

The PB_FLIP container was evaluated using an AWS sandbox environment (16 CPU, 128GB RAM and 500GB disk space).

  1. Clone the repository to your local machine
git clone https://github.com/nch-igm/PBFLIP
cd PBFLIP
  2. Edit config/case.yml
  3. Create a folder called pbflip
mkdir pbflip
  4. Download all the required databases to the pbflip directory as described in Required External Databases

  5. Copy the config folder to the pbflip folder

pbflip/
├── Brain_Reference_SIRV_4_C99_I95
├── config
├── DisGeNET
├── FusionDatabase
├── isoseq_db
└── star_index
  6. To run the Docker container on your local machine, issue the following command. This will run the pipeline on 18 CPU threads.
docker run -d --rm -v "$(pwd)/pbflip:/data/pbflip" \
            -e 'configfile=/data/pbflip/config/case.yml' \
            -e 'threads=18' \
            -e 'result_dir=/data/pbflip' \
            public.ecr.aws/nch-igm/pb-flip:public
  7. The final results will be under $(pwd)/pbflip/working_dir
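If the container is launched from a script rather than an interactive shell, the same invocation can be built as an argument list, which avoids shell quoting and makes the host path explicitly absolute. The sketch below is an illustration only: the function name docker_argv is hypothetical, and the image name, mount point and environment variables are copied from the command above.

```python
import os


def docker_argv(data_dir, threads=18,
                image="public.ecr.aws/nch-igm/pb-flip:public"):
    """Build the `docker run` argument list for a local pbflip data folder."""
    host_dir = os.path.abspath(data_dir)  # bind mounts need an absolute host path
    return [
        "docker", "run", "-d", "--rm",
        "-v", f"{host_dir}:/data/pbflip",
        "-e", "configfile=/data/pbflip/config/case.yml",
        "-e", f"threads={threads}",
        "-e", "result_dir=/data/pbflip",
        image,
    ]
```

The list can be passed to subprocess.run(docker_argv("pbflip"), check=True) from the directory that contains the pbflip folder.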

Inputs

Before you run PB_FLIP, you need the following input files from a SMRT Link analysis. These files are located in $SMRT_ROOT/userdata/jobs_root/0000/0000000/0000000002/outputs/.

cluster_report: cluster_report.csv

hq_transcripts: hq_isoforms.fasta

flnc: flnc.bam
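Before editing the configuration, it can help to confirm that all three files are present in the SMRT Link job's outputs directory. The following is a minimal sketch under that assumption; the names SMRTLINK_INPUTS and locate_inputs are hypothetical and not part of PB_FLIP.

```python
from pathlib import Path

# Config key -> file name, as listed above.
SMRTLINK_INPUTS = {
    "cluster_report": "cluster_report.csv",
    "hq_transcripts": "hq_isoforms.fasta",
    "flnc": "flnc.bam",
}


def locate_inputs(outputs_dir):
    """Return (found, missing): absolute paths by config key, and absent file names."""
    outputs = Path(outputs_dir)
    found, missing = {}, []
    for key, filename in SMRTLINK_INPUTS.items():
        path = outputs / filename
        if path.is_file():
            found[key] = str(path.resolve())
        else:
            missing.append(filename)
    return found, missing
```

The paths in found can then be copied into the SMRTLINKFILES entries of config/case.yml.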


Configuration File

CASENAME : A name for your project. This is your current working directory name

SMRTLINKFILES

version : The current pipeline only supports data generated with SMRT Link version 10 or above

cluster_report : Path to the cluster report file generated through SMRT Link analysis

hq_transcripts : Path to the HQ transcripts generated through SMRT Link analysis

flnc : Path to the full-length non-concatemer (FLNC) BAM file generated through SMRT Link analysis

ILLUMINASHORTREADS : Full paths to short-reads, read 1 & 2, if available

REFERENCES

species : Species. Currently, Human (hs) and Mouse (mm) samples are supported

genome : Full path to genome file

annotation : Full path to annotation file

isoannotlitegff3 : Full path to IsoAnnotLite annotation file

COLLAPSEPARAM : cDNA_Cupcake/ToFU collapse_isoforms_by_sam.py parameters

FILTERBYCOUNTS : cDNA_Cupcake/ToFU filter_by_count.py parameters

PBSVCALLERPARAM : pbsv parameters

PBBAM : pbindex and bam2fastq

MAPPERS : Short and long reads mappers used in the pipeline

GENOMEINDEX : Path to Genome index for STAR aligner

TX2G : Transcripts to gene association file for your species.

PICARD : Full path to picard software

SNPEFF : Full paths to snpEFF.jar and SnpSift.jar files

ISOSEQSCRIPTS : A collection of scripts from cDNA_Cupcake/ToFU and SQANTI3

LIBPATHS : PYTHONPATH for your conda environment

DISGENET : Full path to DisGeNET file

FUSIONHUBDB : Full path to the file downloaded from FusionHUB
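A quick sanity check of config/case.yml before launching the pipeline might look like the sketch below. It assumes the file has already been loaded into a Python dict (for example with PyYAML's yaml.safe_load), that version, cluster_report, hq_transcripts and flnc are nested under SMRTLINKFILES, and that species, genome, annotation and isoannotlitegff3 are nested under REFERENCES. ILLUMINASHORTREADS is treated as optional because it is described as "if available". The function names are hypothetical and not part of PB_FLIP.

```python
import os

# Top-level keys described above (ILLUMINASHORTREADS assumed optional).
EXPECTED_KEYS = [
    "CASENAME", "SMRTLINKFILES", "REFERENCES", "COLLAPSEPARAM",
    "FILTERBYCOUNTS", "PBSVCALLERPARAM", "PBBAM", "MAPPERS", "GENOMEINDEX",
    "TX2G", "PICARD", "SNPEFF", "ISOSEQSCRIPTS", "LIBPATHS", "DISGENET",
    "FUSIONHUBDB",
]
# Assumed nesting of the sub-entries.
EXPECTED_SUBKEYS = {
    "SMRTLINKFILES": ["version", "cluster_report", "hq_transcripts", "flnc"],
    "REFERENCES": ["species", "genome", "annotation", "isoannotlitegff3"],
}
SUPPORTED_SPECIES = {"hs", "mm"}


def key_problems(config):
    """List missing keys and an unsupported species value, if any."""
    problems = [key for key in EXPECTED_KEYS if key not in config]
    for section, subkeys in EXPECTED_SUBKEYS.items():
        block = config.get(section)
        if isinstance(block, dict):
            problems += [f"{section}.{sub}" for sub in subkeys if sub not in block]
    references = config.get("REFERENCES")
    species = references.get("species") if isinstance(references, dict) else None
    if species is not None and species not in SUPPORTED_SPECIES:
        problems.append(f"REFERENCES.species={species}")
    return problems


def missing_paths(value, name=""):
    """List dotted names of absolute-path strings that do not exist on disk."""
    if isinstance(value, dict):
        return [m for k, v in value.items()
                for m in missing_paths(v, f"{name}.{k}" if name else str(k))]
    if isinstance(value, (list, tuple)):
        return [m for i, v in enumerate(value)
                for m in missing_paths(v, f"{name}[{i}]")]
    if isinstance(value, str) and os.path.isabs(value) and not os.path.exists(value):
        return [name]
    return []
```

Both functions return an empty list when nothing is wrong, so a wrapper script could print any reported names and stop before Snakemake is invoked.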


Output Folder Structure


Fusion Pipeline Output Folder Structure
[Image: fusion pipeline output folder structure]


Isoform Pipeline Output Folder Structure
[Image: isoform pipeline output folder structure]
