Home
Welcome to the EasyFuse wiki!
EasyFuse is a pipeline to efficiently detect fusion transcripts from RNA-seq data with high accuracy. EasyFuse uses five fusion gene detection tools (STAR-Fusion, InFusion, MapSplice2, Fusioncatcher, and SoapFuse) along with powerful read filtering, stringent re-quantification of candidates, and machine learning for highly specific and sensitive fusion gene prediction.
A manuscript describing the method and performance evaluations has been submitted for peer review and publication.
Follow this link for instructions on Installing EasyFuse
Using paired-end FASTQ files, run EasyFuse in the following way:
processing.py -i /path/to/directory/of/fastq/files \
-o /path/to/output/directory \
-c /path/to/config/file \
-u USERNAME \
-p SLURM-Partition
Multiple samples with paired FASTQ files can be processed in a single run. In any case, the FASTQ files must
- be in the same folder
- share the same base name within each pair
- have a unique name for each sample
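The naming requirements above can be checked before starting a run. The following is a minimal sketch, not part of EasyFuse: it assumes a hypothetical naming convention in which the two mates of a pair are called <sample>_R1.fastq.gz and <sample>_R2.fastq.gz (the suffixes actually recognised by EasyFuse may differ), and the function name group_fastq_pairs is made up for illustration.

```python
import re
from collections import defaultdict

# ASSUMPTION: mates are distinguished by an _R1/_R2 suffix before the extension.
# This pattern is illustrative only and is not taken from the EasyFuse code.
PAIR_PATTERN = re.compile(r"^(?P<base>.+)_R(?P<mate>[12])\.(fastq|fq)(\.gz)?$")


def group_fastq_pairs(filenames):
    """Group FASTQ file names by base name and report incomplete pairs."""
    mates = defaultdict(dict)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            continue  # not a FASTQ file under the assumed convention
        mates[match.group("base")][match.group("mate")] = name

    # A sample is complete only if both mate 1 and mate 2 were found.
    complete = {
        base: (found["1"], found["2"])
        for base, found in mates.items()
        if set(found) == {"1", "2"}
    }
    incomplete = sorted(base for base, found in mates.items() if set(found) != {"1", "2"})
    return complete, incomplete
```

Called with the file names of the input folder (for example from os.listdir), the function returns the complete pairs keyed by base name and a sorted list of base names for which one mate is missing.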
EasyFuse will automatically create a folder structure
EasyFuse will generate a database samples.db, saving the progress and successfully completed steps. If this database exists, EasyFuse can be restarted using the same command as above and will resume processing from the last successfully completed step.
To change the tool configuration, adjust paths to indices, or remove certain steps from the pipeline, edit the config file.
[general]
tools=QC,Readfilter,Fusioncatcher,Star,Starfusion,Infusion,Mapsplice,Soapfuse,Fetchdata,Summary
fusiontools=Fusioncatcher,Starfusion,Infusion,Mapsplice,Soapfuse
fd_tools=Fusiongrep,Contextseq,Starindex,ReadFilter2,ReadFilter2b,StaralignBest,BamindexBest,RequantifyBest
cis_near_distance=1000000
model_pred_threshold=0.75
tsl_filter=4,5,NA
requant_mode=best
context_seq_len=400
ref_genome_build=hg38
ref_trans_version=ensembl
queueing_system=slurm
QC: FASTQ-QC and trimming to ensure high quality reads using FastQC and skewer
Readfilter: Read filtering step to remove normal mapping reads
Fusioncatcher,Star,Starfusion,Infusion,Mapsplice,Soapfuse: Alignment and fusion detection tools (STAR and at least one fusion detection tool are required to run the pipeline)
Fetchdata: Module that parses all tool outputs into a common format, calculates context sequences around the breakpoints and their translated peptide sequences, and realigns reads to those sequences for quantification
Summary: Summarize data to final output, needs Fetchdata to finish successfully
fusiontools: Fusion tools employed in the pipeline run (must be the same fusion tools as listed in tools above)
Changing this line is not recommended: EasyFuse will not work as intended!
Fusiongrep: Parses output from detection tools into Detected_Fusions.csv
Contextseq: Calculates context_sequences and annotates fusion genes into Context_Seq.csv
Starindex: Generates STAR-index from context sequences
ReadFilter2: Generates alignment from
ReadFilter2b: Generates FASTQs from bam file
StaralignBest: Aligns FASTQs to context sequence star index
BamindexBest: Indexes resulting bam file from StaralignBest
RequantifyBest: Calculates mapping reads per 100 million reads
Changing these entries is not recommended: EasyFuse will not work as intended!
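The constraints on the tools and fusiontools entries described above (STAR plus at least one fusion detection tool, matching fusion tool lists, and Summary depending on Fetchdata) can be checked with a short script before submitting a run. This is a sketch under assumptions, not EasyFuse's own validation: it uses Python's standard configparser module, and the helper names split_list and check_tool_lists are hypothetical.

```python
import configparser

# Fusion detection tool names as they appear in the example config above.
KNOWN_FUSION_TOOLS = {"Fusioncatcher", "Starfusion", "Infusion", "Mapsplice", "Soapfuse"}


def split_list(value):
    """Split a comma-separated config value into a list of trimmed names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_tool_lists(config_text):
    """Return a list of human-readable problems found in the [general] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    tools = split_list(parser["general"]["tools"])
    fusiontools = split_list(parser["general"]["fusiontools"])

    problems = []
    if "Star" not in tools:
        problems.append("Star is missing from tools")
    if not fusiontools:
        problems.append("fusiontools lists no fusion detection tool")
    if "Summary" in tools and "Fetchdata" not in tools:
        problems.append("Summary requires Fetchdata")
    # The fusion tools enabled in tools should equal the fusiontools entry.
    enabled = {tool for tool in tools if tool in KNOWN_FUSION_TOOLS}
    if enabled != set(fusiontools):
        problems.append("fusiontools does not match the fusion tools listed in tools")
    return problems
```

An empty list means that none of these checks failed. To check a config file on disk, read its contents first and pass the text to check_tool_lists.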
cis_near_distance: Distance between neighboring genes to be qualified as "cis_near" when detected as fusion
model_pred_threshold:
tsl_filter: Threshold for transcripts to be filtered out by tsl_level
requant_mode:
context_seq_len: Length of context sequence from breakpoint in either direction
ref_genome_build: Version of reference genome (e.g. hg38, hg19 etc.). It is strongly recommended to use hg38.
ref_trans_version: Transcript version (only Ensembl supported)
queueing_system: Which queueing system to use (only SLURM supported)
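To illustrate how the parameters above might be read programmatically, the sketch below parses the [general] section with Python's standard configparser module and rejects the two values that are documented as unsupported. The value types (integer distances and lengths, a floating-point threshold, a comma-separated tsl_filter list) are assumptions inferred from the example config, and the function name read_general_settings is hypothetical; EasyFuse itself may parse these values differently.

```python
import configparser


def read_general_settings(config_text):
    """Parse the [general] parameters into typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    general = parser["general"]

    settings = {
        # ASSUMPTION: integer and float types are inferred from the example values.
        "cis_near_distance": general.getint("cis_near_distance"),
        "model_pred_threshold": general.getfloat("model_pred_threshold"),
        "tsl_filter": [value.strip() for value in general["tsl_filter"].split(",")],
        "requant_mode": general["requant_mode"],
        "context_seq_len": general.getint("context_seq_len"),
        "ref_genome_build": general["ref_genome_build"],
        "ref_trans_version": general["ref_trans_version"],
        "queueing_system": general["queueing_system"],
    }

    # Only Ensembl transcripts and the SLURM queueing system are supported.
    if settings["ref_trans_version"].lower() != "ensembl":
        raise ValueError("ref_trans_version must be ensembl")
    if settings["queueing_system"].lower() != "slurm":
        raise ValueError("queueing_system must be slurm")
    return settings
```

With the example values shown earlier, tsl_filter becomes the list of strings 4, 5 and NA, and cis_near_distance and context_seq_len become integers.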
[references]
ensembl_genome_fasta_hg38=
ensembl_genome_fastadir_hg38=
ensembl_genome_sizes_hg38=
ensembl_genes_fasta_hg38=
ensembl_genes_gtf_hg38=
ensembl_genes_adb_hg38=
ensembl_genes_tsl_hg38=
[indices]
ensembl_star_hg38_sjdb49=/projects/data/human/ensembl/GRCh38.86/STAR_idx/
ensembl_bowtie_hg38=/projects/data/human/ensembl/GRCh38.86/bowtie_index/hg38
ensembl_starfusion_hg38=/projects/data/human/ensembl/GRCh38.86/starfusion_index/
ensembl_fusioncatcher_hg38=/projects/data/human/ensembl/GRCh38.86/fusioncatcher_index/
[otherFiles]
ensembl_infusion_cfg_hg38=/projects/data/human/ensembl/GRCh38.86/infusion_index/infusion.cfg
ensembl_soapfuse_cfg_hg38=/code/SOAPfuse/1.27/config/config_h86.txt
easyfuse_model=/code/easyfuse/1.3.0/data/model/Fusion_modeling_IVAC_BNT_v16.model.requant_and_boundary_org.randomForest.rds
All paths to the references must be absolute paths. Examples are given for Ensembl release 86.
ensembl_genome_fasta_hg38: Fasta file containing complete genome (Homo_sapiens.GRCh38.dna.primary_assembly.fa)
ensembl_genome_fastadir_hg38: Directory with single fasta from each chromosome
ensembl_genome_sizes_hg38: Genome sizes calculated by STAR (chNameLength)
ensembl_genes_fasta_hg38: cDNA file containing all ensembl transcripts (Homo_sapiens.GRCh38.cdna.all.fa)
ensembl_genes_gtf_hg38: Ensembl gtf file (Homo_sapiens.GRCh38.86.gtf)
ensembl_genes_adb_hg38: gff in database form (Homo_sapiens.GRCh38.86.gff3)
ensembl_genes_tsl_hg38: Blacklist of known low-TSL transcripts, based on the Ensembl gtf (Homo_sapiens.GRCh38.86.gtf)
Absolute paths to complete indices, based on Ensembl release 86
ensembl_star_hg38_sjdb49: Path to STAR index
ensembl_bowtie_hg38: Path to bowtie index
ensembl_starfusion_hg38: Path to STAR-Fusion index
ensembl_fusioncatcher_hg38: Path to Fusioncatcher index
Single config files containing options/indices for certain tools/modules
ensembl_infusion_cfg_hg38: Path to InFusion config
ensembl_soapfuse_cfg_hg38: Path to SoapFuse config
easyfuse_model: Path to .rds file for EasyFuse model
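Because the reference and index paths must be absolute, it can help to scan the config for empty or relative entries before a run. The sketch below is an illustration rather than part of EasyFuse: it assumes that every value in the [references], [indices] and [otherFiles] sections is a POSIX path, and the function name find_non_absolute_paths is hypothetical. It does not check whether the paths exist.

```python
import configparser
from pathlib import PurePosixPath

# Sections of the example config whose values are file or directory paths.
PATH_SECTIONS = ("references", "indices", "otherFiles")


def find_non_absolute_paths(config_text):
    """Return 'section.key' names whose value is empty or not an absolute path."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)

    offenders = []
    for section in PATH_SECTIONS:
        if not parser.has_section(section):
            continue
        for key, value in parser.items(section):
            # An unset entry (empty string) is reported as well.
            if not value or not PurePosixPath(value).is_absolute():
                offenders.append(f"{section}.{key}")
    return offenders
```

An empty result means that every path entry in these three sections is set and absolute.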