seabed-symphony is a metagenomics pipeline for identifying Biosynthetic Gene Clusters (BGCs) in long-read sequencing data from marine sediment microbiomes. The workflow processes raw sequencing reads, performs quality control, assembles metagenomes, reconstructs metagenome-assembled genomes (MAGs), classifies bacterial genomes taxonomically, annotates their functions, and detects BGCs with antiSMASH and BiG-SCAPE.
✅ Barcode demultiplexing for multiplexed sequencing data
✅ Metagenome-Assembled Genomes (MAGs) reconstruction
✅ Genome filtering to keep only prokaryotic contigs for accurate gene prediction
✅ High-quality genome binning and taxonomic classification
✅ Identification and visualization of Biosynthetic Gene Clusters (BGCs)
✅ Compatible with HPC environments
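The prokaryotic contig filtering mentioned above can be illustrated with a minimal Python sketch. This is not the pipeline's own extractContigsFromWhokaryote step; it only shows the general idea of keeping FASTA records whose contig ID appears in a list. The function names `read_fasta` and `keep_contigs` are hypothetical, and the set of prokaryotic contig IDs is assumed to have been derived from the Whokaryote predictions beforehand.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_contigs(lines: Iterable[str], wanted_ids: Set[str]) -> List[Tuple[str, str]]:
    """Return the records whose contig ID is in wanted_ids.

    The contig ID is assumed to be the first whitespace-delimited
    word of the FASTA header.
    """
    kept = []
    for header, seq in read_fasta(lines):
        contig_id = (header.split() or [""])[0]
        if contig_id in wanted_ids:
            kept.append((header, seq))
    return kept
```

Matching on the first word of the header rather than the full header line keeps the sketch tolerant of assemblers that append length or coverage annotations after the contig name.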
The pipeline follows a stepwise approach to process raw sequencing data:
0_rawReads → Raw sequencing reads
1_rawReadsQC → Quality control of raw reads (NanoPlot & SeqKit)
2_trimmedReads → Read trimming (Filtlong)
3_adapterRemoval → Adapter removal (Porechop)
4_trimmedReadsQC → Post-trimming QC (NanoPlot & SeqKit)
5_metagenomeAssembly → Metagenome assembly (metaFlye)
6_bandage → Assembly visualization (Bandage)
7_assignTaxonomy → Taxonomic classification (Whokaryote)
8_prokaryoteContigs → Filtering prokaryotic contigs (extractContigsFromWhokaryote)
9_genomeBinning → Genome binning (MetaBAT2 & MaxBin2 & CONCOCT & GraphMB)
10_nonRedundantBins → Non-redundant genome bins (DAS Tool)
11_CheckM → Genome bin quality assessment (CheckM)
12_GTDB-Tk → Phylogenetic classification (GTDB-Tk)
13_bakta → Functional annotation (Bakta)
14_BGCs → Biosynthetic Gene Clusters detection (antiSMASH)
15_BiG-SCAPE → BGC classification and networking (BiG-SCAPE)
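As a rough illustration of the step layout, the sketch below creates one directory per step under a chosen root. It assumes the step names listed above are used as output directory names; whether the pipeline creates them itself is not stated here, and the names `STEP_DIRS` and `create_step_dirs` are hypothetical.

```python
from pathlib import Path
from typing import List, Union

# Step names copied from the pipeline overview, in execution order.
STEP_DIRS = [
    "0_rawReads",
    "1_rawReadsQC",
    "2_trimmedReads",
    "3_adapterRemoval",
    "4_trimmedReadsQC",
    "5_metagenomeAssembly",
    "6_bandage",
    "7_assignTaxonomy",
    "8_prokaryoteContigs",
    "9_genomeBinning",
    "10_nonRedundantBins",
    "11_CheckM",
    "12_GTDB-Tk",
    "13_bakta",
    "14_BGCs",
    "15_BiG-SCAPE",
]


def create_step_dirs(root: Union[str, Path]) -> List[Path]:
    """Create one directory per pipeline step under root and return the paths."""
    paths = [Path(root) / name for name in STEP_DIRS]
    for path in paths:
        # exist_ok=True makes the call safe to repeat on an existing layout.
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

Calling `create_step_dirs("my_project")` would produce the sixteen numbered folders, so each step's inputs and outputs stay separated and easy to inspect.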
A step-by-step tutorial covering installation, usage, and best practices is available in documentation/documentation.pdf.
For questions, feel free to open an issue.