GitHub - wen1112/SinProVirP-v1.0

SinProVirP: Signature-Protein based Virome Profiling tool

A signature protein-based tool developed for genus-level profiling of the human gut virome.

Citation

xxxxxxxxxxxxxxxxxxx

Installation

Step One - Clone the repository
Clone the repository to the desired location on your system using the following command:

git clone https://github.com/wen1112/SinProVirP-v1.0.git

Step Two - Create a new conda environment
Create a new conda environment via the following command, replacing /full/path/to/cloned/repo with the appropriate path to your cloned repository:

conda env create -n SinProVirP_env --file /full/path/to/cloned/repo/SinProVirP_env.yaml

Activate the environment by executing:

conda activate SinProVirP_env

If you have trouble creating the environment with the commands above, you can instead create it manually:

conda create -n SinProVirP_env python=3.11
conda activate SinProVirP_env

conda install bioconda::snakemake
conda install samtools=1.6
conda install bedtools=2.31.0
conda install diamond=2.0.14
conda install biopython

If you do not already have conda installed, please install it using the instructions provided here.
If you have problems installing Snakemake, please follow the instructions provided here.
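As a quick sanity check after installation, the sketch below reports which of the command-line tools installed above are visible on the PATH of the active environment. The function name `find_tools` is hypothetical and not part of SinProVirP. It only checks that the executables can be found, not that their versions match.

```python
import shutil

# Executables provided by the conda packages listed above.
REQUIRED_TOOLS = ("snakemake", "samtools", "bedtools", "diamond")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

Run it with the SinProVirP_env environment activated. Any tool reported as NOT FOUND probably needs to be reinstalled.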

Test Your Installation

Input file

If the input is paired-end, the pipeline concatenates the two read files ('cat R1.fq R2.fq > R1R2.fq') and uses the result as the final input, because diamond requires single-end (SE) reads.
The columns of the input file must be separated by tabs ('\t').
Paired-end reads

sample_id	fq1	fq2
s1	s1.R1.fq.gz	s1.R2.fq.gz
s2	s2.R1.fq.gz	s2.R2.fq.gz
s3	s3.R1.fq.gz	s3.R2.fq.gz
s4	s4.R1.fq.gz	s4.R2.fq.gz

Alternatively, you can run 'cat R1.fq R2.fq > R1R2.fq' yourself and supply the merged file as input:

sample_id	fq
s1	s1.R1R2.fq.gz
s2	s2.R1R2.fq.gz
s3	s3.R1R2.fq.gz
s4	s4.R1R2.fq.gz
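The concatenation step can also be done from Python. This is a minimal sketch, not code taken from the pipeline, and the function name `merge_paired_reads` is hypothetical. It copies the two files byte for byte, which is what 'cat' does. That also works for .gz files, because a gzip file may hold several compressed members back to back.

```python
import shutil
from pathlib import Path


def merge_paired_reads(r1, r2, merged):
    """Concatenate two read files byte for byte, like `cat r1 r2 > merged`.

    The inputs must both be plain text or both be gzip-compressed;
    mixing the two would produce an unreadable file.
    """
    with open(merged, "wb") as out:
        for path in (r1, r2):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(merged)
```

For example, `merge_paired_reads("s1.R1.fq.gz", "s1.R2.fq.gz", "s1.R1R2.fq.gz")` would produce the merged file listed for sample s1 above.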
Single-end reads

sample_id	fq1	fq2
s1	s1.R1.fq.gz
s2	s2.R1.fq.gz
s3	s3.R1.fq.gz
s4	s4.R1.fq.gz

Quick start

A run needs three files: 'work.sh', 'config.yaml' and 'sample_file.txt'.
Please remember to change the paths in them to match your system.

cd /full/path/to/cloned/repo/test_data
bash work.sh

work.sh:

conda activate SinProVirP_env
snakemake -s /full/path/to/cloned/repo/Snakefile.py \
--configfile /full/path/to/cloned/repo/test_data/config.yaml \
--jobs 99 --cores 8

config.yaml

pipeline_directory: /full/path/to/cloned/repo

# Sample file specifies sample names and names of files containing sample reads
# Format: Tab-delimited, two columns
# sample_name  reads_R1R2file
# See example (samp_file.txt) in the testing folder
sample_file: /full/path/to/sample/file

# In which directory should results be outputted?
outdir: /full/path/to/output/directory

sample_file.txt (columns separated by "\t"):

sample1	/full/path/to/cloned/repo/test_data/test1_R1R2.fastq
sample2	/full/path/to/cloned/repo/test_data/test2_R1R2.fastq
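Because the sample file must be tab-delimited, a common mistake is to separate the columns with spaces. The sketch below checks a sample file before a run. It assumes the two-column, header-less layout shown above; the function name `read_sample_file` is hypothetical, and this is not how the Snakefile itself reads the file.

```python
import csv


def read_sample_file(path):
    """Parse a two-column, tab-delimited sample file into {sample_name: reads_path}."""
    samples = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_no, row in enumerate(reader, start=1):
            if not row or all(not field.strip() for field in row):
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 tab-separated columns, got {len(row)}"
                )
            name, reads = (field.strip() for field in row)
            if name in samples:
                raise ValueError(f"line {line_no}: duplicate sample name {name!r}")
            samples[name] = reads
    return samples
```

A line whose columns are separated by spaces is read as a single column, so it raises a ValueError that names the offending line.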

The script's log output should look something like this:

Building DAG of jobs...
Relative file path './test2_R1R2.fastq' starts with './'. This is redundant and strongly discouraged. It can also lead to inconsistent results of the file-matching approach used by Snakemake. You can simply omit the './' for relative file paths.
Using shell: /bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       add_taxonomy
        1       all
        1       calculate_relative_abudance
        1       diamond_blastx
        1       filter_mapped_reads
        1       get_marker_coverage
        1       index_bam_file
        1       reads_per_marker_protein
        1       sam_to_bam
        1       sort_bam_file
        10

2024-05-30 22:03:25.334570
The sample ./test_data/sample1/sample1.sorted.bam.idxstats.addTaxonomy.final.abundance.txt of SinProVirP pipline is done!

Output file

./
├── sample1.bam
├── sample1.sam
├── sample1.sorted.bai
├── sample1.sorted.bam
├── sample1.sorted.bam.coverage
├── sample1.sorted.bam.filter.coverage.idxstats
├── sample1.sorted.bam.idxstats
├── sample1.sorted.bam.idxstats.abundance
└── sample1.sorted.bam.idxstats.addTaxonomy.final.abundance.txt

0 directories, 9 files
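To confirm that a run finished, you can check that all nine files exist for a sample. This sketch assumes the files for one sample sit together in a single directory, as in the listing above. The suffixes are copied from that listing, and the function name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# File-name suffixes taken from the output listing above.
EXPECTED_SUFFIXES = (
    ".bam",
    ".sam",
    ".sorted.bai",
    ".sorted.bam",
    ".sorted.bam.coverage",
    ".sorted.bam.filter.coverage.idxstats",
    ".sorted.bam.idxstats",
    ".sorted.bam.idxstats.abundance",
    ".sorted.bam.idxstats.addTaxonomy.final.abundance.txt",
)


def missing_outputs(sample_dir, sample):
    """Return the expected output file names that are absent from sample_dir."""
    sample_dir = Path(sample_dir)
    return [
        sample + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (sample_dir / (sample + suffix)).is_file()
    ]
```

An empty list means every expected file is present. Otherwise the returned names show which step's output is missing.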

*.final.abundance.txt

VCID	Relative_abundance	VC_lineage	VC_lifestyle	VC_host_lineage
VC_3803_0	0.0123774546010874	Viruses;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;;	Temperate	Bacteria;Firmicutes;Clostridia;Eubacteriales;Oscillospiraceae;Ruminococcus;
VC_2046_0	0.0362526365705308	Viruses;Uroviricota;Caudoviricetes;Caudovirales;Myoviridae;;	Temperate	Bacteria;Firmicutes;Clostridia;Eubacteriales;Oscillospiraceae;Faecalibacterium;
VC_2673_0	0.0140144228362927	Viruses;Uroviricota;Caudoviricetes;;;;	Unknown	Bacteria;Firmicutes;Clostridia;Eubacteriales;Oscillospiraceae;Oscillibacter;
VC_6076_0	0.0063759751235582	Viruses;Uroviricota;Caudoviricetes;Caudovirales;Myoviridae;;	Virulent	Bacteria;Bacteroidota;Bacteroidia;Bacteroidales;Bacteroidaceae;Bacteroides;
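The abundance table is plain tab-delimited text, so it can be loaded with Python's standard library. The sketch below uses the column names from the header shown above and returns the rows sorted from most to least abundant. The function name `read_abundance` is hypothetical.

```python
import csv


def read_abundance(path):
    """Read a *.final.abundance.txt table into a list of dicts, most abundant first."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    for row in rows:
        row["Relative_abundance"] = float(row["Relative_abundance"])
        # Lineages are ';'-separated; unassigned ranks stay as empty strings.
        row["VC_lineage"] = row["VC_lineage"].split(";")
        row["VC_host_lineage"] = row["VC_host_lineage"].split(";")
    return sorted(rows, key=lambda r: r["Relative_abundance"], reverse=True)
```

Each returned row is a dict keyed by column name, so `rows[0]["VCID"]` gives the most abundant viral cluster in the sample.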

About parameters

To adjust the parameters used by the SinProVirP pipeline, edit the 'config.yaml' file.

# Parameter settings
threads: 8 # see usage instructions; can be increased if you have more threads available
IDENTITY: 90 # identity threshold used for the diamond blastx search
COVERAGE: 80 # coverage threshold used for the diamond blastx search
MARKER_COVERAGE: 0.7 # marker coverage threshold used for SinProVirP profiling
MARKER_RATIO: 0.5 # marker ratio threshold used for SinProVirP profiling
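When editing these values it is easy to mix up the scales: the defaults suggest that IDENTITY and COVERAGE are percentages, while MARKER_COVERAGE and MARKER_RATIO are fractions. The sketch below checks a loaded configuration against those ranges. The ranges are assumptions inferred from the default values, not documented limits, and the function name `check_config` is hypothetical.

```python
# Assumed upper bounds, inferred from the default values above (not documented).
ASSUMED_UPPER_BOUNDS = {
    "IDENTITY": 100,        # assumed to be a percentage
    "COVERAGE": 100,        # assumed to be a percentage
    "MARKER_COVERAGE": 1,   # assumed to be a fraction
    "MARKER_RATIO": 1,      # assumed to be a fraction
}


def check_config(config):
    """Return a list of messages for settings that fall outside the assumed ranges."""
    problems = []
    for key, upper in ASSUMED_UPPER_BOUNDS.items():
        value = config.get(key)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 < value <= upper:
            problems.append(f"{key}={value!r} is outside (0, {upper}]")
    threads = config.get("threads")
    if isinstance(threads, bool) or not isinstance(threads, int) or threads < 1:
        problems.append(f"threads={threads!r} is not a positive integer")
    return problems
```

The argument is the configuration as a dictionary, for example the result of loading 'config.yaml' with a YAML parser. An empty list means no problems were found under these assumptions.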
