DSL2 modules #60

BarryDigby · 2022-10-13T21:33:06Z

Description of feature

This issue categorizes the workflow at the process level and lists the corresponding modules needed to port it to DSL2. Once the modules have been created, I can give this more shape in terms of subworkflows.

N.B.: please check out new branches for individual features and push to the DSL2 branch, not dev.

Input files

Currently, circRNA takes a samplesheet.csv file and a phenotype.csv file as input. Functions to check these files already exist; all that is needed is to place them in an input_check.nf local subworkflow.
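To make the kind of check meant here concrete, below is a minimal Python sketch of a samplesheet validator. It is not the pipeline's existing check function: the column names (`sample`, `fastq_1`, `fastq_2`) and the rules (unique sample names, `fastq_1` required) are assumptions for illustration only.

```python
import csv
import io

# Hypothetical column names; the real samplesheet layout may differ.
REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2")


def check_samplesheet(handle):
    """Return the parsed rows, raising ValueError on the first problem found."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")

    rows, seen = [], set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        sample = (row["sample"] or "").strip()
        if not sample:
            raise ValueError(f"line {line_no}: empty sample name")
        if sample in seen:
            raise ValueError(f"line {line_no}: duplicate sample '{sample}'")
        if not (row["fastq_1"] or "").strip():
            raise ValueError(f"line {line_no}: fastq_1 is required")
        seen.add(sample)
        rows.append(row)
    return rows


# Made-up example content; the second sample is single-end.
example = io.StringIO(
    "sample,fastq_1,fastq_2\n"
    "ctrl_1,ctrl_1_R1.fastq.gz,ctrl_1_R2.fastq.gz\n"
    "treat_1,treat_1_R1.fastq.gz,\n"
)
rows = check_samplesheet(example)
print([row["sample"] for row in rows])
```

In the DSL2 layout the equivalent logic would sit behind the input_check.nf subworkflow; the sketch only shows the shape of the validation.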

I would like to incorporate strandedness, as other nf-core workflows do. I will check which circRNA quantification tools have a parameter denoting strandedness.

Pre-processing

The workflow takes as input fastq or bam files (which are converted to fastq using picard SamToFastq) and performs FastQC on the raw reads prior to trimming using BBDUK. The trimmed reads are then checked using FastQC again and placed in channels for downstream analyses.

FastQC
MultiQC
BBDUK
picard/SamToFastq (I don't care if we drop this functionality.)

circRNA Discovery

Several tools use the same aligner, so there will be duplicate modules here.

CIRIquant

bwa index
hisat build
ciriquant

CIRCexplorer2

STAR genomegenerate
STAR align (2 Pass mode)
circexplorer2 parse
circexplorer2 annotate

circRNA_finder

star genomegenerate
star align (2 Pass mode)
circRNA_Finder (postProcessStarAlignment.pl script)

DCC

DCC maps paired-end reads both jointly and separately using STAR in 2-pass mode. The goal is to generate Chimeric.out.junction files from the joint STAR mapping and from the individual read 1 and read 2 STAR mappings.

star genomegenerate
star align (2 Pass)
dcc
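The three STAR runs DCC needs per paired-end sample (joint, read 1 only, read 2 only) can be sketched as follows. This only builds the command lines and does not run STAR. The helper names, output prefixes, index directory and the `--chimSegmentMin` value are illustrative assumptions, not the pipeline's settings.

```python
import shlex


def star_chimeric_cmd(genome_dir, reads, prefix, threads=4):
    """Build (but do not run) a STAR command that reports chimeric junctions."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--readFilesCommand", "zcat",    # assumes gzipped FASTQ input
        "--outFileNamePrefix", prefix,
        "--twopassMode", "Basic",        # STAR 2-pass mode
        "--chimSegmentMin", "15",        # illustrative value; must be > 0 for chimeric output
        "--chimOutType", "Junctions",    # writes <prefix>Chimeric.out.junction
    ]


def dcc_star_jobs(sample, read1, read2, genome_dir="star_index"):
    """Return the joint and per-mate STAR commands for one paired-end sample."""
    return {
        "joint": star_chimeric_cmd(genome_dir, [read1, read2], f"{sample}."),
        "mate1": star_chimeric_cmd(genome_dir, [read1], f"{sample}.mate1."),
        "mate2": star_chimeric_cmd(genome_dir, [read2], f"{sample}.mate2."),
    }


jobs = dcc_star_jobs("s1", "s1_R1.fastq.gz", "s1_R2.fastq.gz")
for name, cmd in jobs.items():
    print(name, shlex.join(cmd))
```

Because all three runs share one command shape, a single star align module invoked three times with different inputs should be enough.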

find_circ

bowtie2 build
bowtie2 align
find_circ find_anchors
find_circ find_circ

Mapsplice

bowtie build
mapsplice align
circexplorer2 parse
circexplorer2 annotate

Segemehl

segemehl align

Custom scripts parse the segemehl output, so there is no need to create a module.
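To see how much duplication there is across the discovery tools, the per-tool lists above can be collapsed into a set of distinct modules. A rough Python sketch follows; the names are lower-cased from the lists in this issue and are not necessarily the final nf-core module names.

```python
from collections import defaultdict

# Module lists transcribed from this issue, lower-cased for comparison.
MODULES_BY_TOOL = {
    "CIRIquant": ["bwa index", "hisat build", "ciriquant"],
    "CIRCexplorer2": ["star genomegenerate", "star align",
                      "circexplorer2 parse", "circexplorer2 annotate"],
    "circRNA_finder": ["star genomegenerate", "star align", "circrna_finder"],
    "DCC": ["star genomegenerate", "star align", "dcc"],
    "find_circ": ["bowtie2 build", "bowtie2 align",
                  "find_circ find_anchors", "find_circ find_circ"],
    "Mapsplice": ["bowtie build", "mapsplice align",
                  "circexplorer2 parse", "circexplorer2 annotate"],
    "Segemehl": ["segemehl align"],
}

# Invert the mapping: module -> tools that need it.
users = defaultdict(list)
for tool, modules in MODULES_BY_TOOL.items():
    for module in modules:
        users[module].append(tool)

distinct_modules = sorted(users)
shared_modules = {m: tools for m, tools in users.items() if len(tools) > 1}

print(f"{len(distinct_modules)} distinct modules")
for module, tools in sorted(shared_modules.items()):
    print(f"{module}: shared by {', '.join(tools)}")
```

Each shared module only needs to be written once and can then be imported into every subworkflow that uses it.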

circRNA annotation

A custom bash script standardises the annotation outputs from the seven quantification tools.

circRNA FASTA sequence

A custom bash script generates the mature spliced sequence in FASTA format and appends the back-splice junction sequence for miRNA target prediction.
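One way to read "append the back-splice junction sequence" is to copy the first few bases of the spliced sequence onto its end, so that a linear string still covers sites spanning the junction. The Python sketch below shows that idea only. It is an assumption about the approach, not a transcription of the bash script, and the 20 nt default overhang and the record name are made up.

```python
def add_bsj_overhang(seq, overhang=20):
    """Append the first `overhang` bases so the linear string spans the back-splice junction."""
    if overhang < 0:
        raise ValueError("overhang must be non-negative")
    return seq + seq[:overhang]


def format_fasta(name, seq, width=60):
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


mature = "ACGTACGTAA"                       # toy spliced circRNA sequence
with_bsj = add_bsj_overhang(mature, overhang=4)
print(format_fasta("circ_example", with_bsj), end="")
```

The overhang should be at least as long as the longest site the target prediction tools can match, otherwise junction-spanning sites are still missed.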

circRNA count matrix

Consolidate the circRNAs called by multiple tools on a per-sample basis and generate the count matrix.
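A minimal Python sketch of this consolidation step, under stated assumptions: a circRNA is kept in a sample if at least `min_tools` tools called it there, and its count is the maximum across those tools. Both the threshold and the use of the maximum are placeholders; the actual rule in the pipeline may differ. The tool names and coordinates in the example are made up.

```python
from collections import defaultdict


def consolidate(calls, min_tools=2):
    """calls: {sample: {tool: {circ_id: count}}} -> (circ_ids, samples, matrix).

    matrix[i][j] is the count of circ_ids[i] in samples[j] (0 if not kept).
    """
    per_sample = {}
    for sample, by_tool in calls.items():
        support = defaultdict(list)
        for tool_counts in by_tool.values():
            for circ_id, count in tool_counts.items():
                support[circ_id].append(count)
        # Keep circRNAs supported by enough tools; max() is an arbitrary choice.
        per_sample[sample] = {
            circ_id: max(counts)
            for circ_id, counts in support.items()
            if len(counts) >= min_tools
        }

    samples = sorted(per_sample)
    circ_ids = sorted({c for kept in per_sample.values() for c in kept})
    matrix = [[per_sample[s].get(c, 0) for s in samples] for c in circ_ids]
    return circ_ids, samples, matrix


calls = {
    "s1": {
        "ciriquant": {"chr1:100-200:+": 5, "chr2:50-90:-": 2},
        "dcc": {"chr1:100-200:+": 7},
    },
    "s2": {
        "ciriquant": {"chr2:50-90:-": 3},
        "dcc": {"chr2:50-90:-": 4},
        "find_circ": {"chr1:100-200:+": 1},
    },
}
circ_ids, samples, matrix = consolidate(calls)
for circ_id, row in zip(circ_ids, matrix):
    print(circ_id, row)
```

Filtering per sample before building the matrix means a circRNA seen by only one tool in a given sample contributes 0 there, even if it passes the filter in another sample.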

miRNA target prediction

miRanda

miranda

TargetScan

targetscan. biocontainers #475

A custom script amalgamates the results from both tools.

Differential expression

hisat build
hisat align
stringtie

Custom R scripts handle DESeq2 and CircTest, so there is no need to create modules.

nictru · 2023-10-01T17:59:18Z

I think all these modules have been properly implemented. Can we close this issue?

BarryDigby added the enhancement label Oct 13, 2022

BarryDigby assigned JackCurragh and BarryDigby Oct 13, 2022

BarryDigby added the WIP and DSL2 labels Oct 13, 2022

BarryDigby changed the title from DSL2 to DSL2 modules Oct 13, 2022

nictru closed this as completed Jan 13, 2024

