Novoplasty on metagenomic data #229

Open
pguenzi-tiberi opened this issue Apr 24, 2024 · 1 comment

@pguenzi-tiberi

Hello,

I ran NOVOPlasty on metagenomic data, using this configuration file:

============

Project:

Project name = MS_assembly_chloroplast
Type = chloro
Genome Range = 200000-500000
K-mer = 33
Max memory =
Extended log = 0
Save assembled reads = no
Seed Input = /home/guenzitp/work_dir_bettik/results/Organelle_Assembly/rbcl_nucleotides.fasta
Extend seed directly = yes
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 300
Insert size = 100
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /bettik/guenzitp/data/RQ/MiSeq/RQ_tr_1.fastq
Reverse reads = /bettik/guenzitp/data/RQ/MiSeq/RQ_tr_2.fastq
Store Hash =

Heteroplasmy:

MAF =
HP exclude list =
PCR-free =

Optional:
Insert size auto = yes
Use Quality Scores = no
Reduce ambigious N's =
Output path = /home/guenzitp/work_dir_bettik/results/Organelle_Assembly/MS_novoplasty_rbcl_nucleo_only_kmer33

=================

Here is the output file. I get many small contigs, and their total length is greater than the expected chloroplast size. Do you know why this doesn't work? I've tried various options (Extend seed directly = yes or no, K-mer = 20 or 33), but it never works.

===================


NOVOPlasty: The Organelle Assembler
Version 4.3.1
Author: Nicolas Dierckxsens, (c) 2015-2020

Input parameters from the configuration file: *** Verify if everything is correct ***

Project:

Project name = MS_assembly_chloroplast
Type = chloro
Genome range = 200000-500000
K-mer = 33
Max memory =
Extended log = 0
Save assembled reads = no
Seed Input = /home/guenzitp/work_dir_bettik/results/Sanguina_Organelle_Assembly/rbcl_nucleotides.fasta
Extend seed directly = yes
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 300
Insert size = 100
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /bettik/guenzitp/data/Sanguina/RQ/MiSeq/Sanguina_RQ_tr_1.fastq
Reverse reads = /bettik/guenzitp/data/Sanguina/RQ/MiSeq/Sanguina_RQ_tr_2.fastq
Store Hash =

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores =
Output path = /home/guenzitp/work_dir_bettik/results/Sanguina_Organelle_Assembly/MS_novoplasty_rbcl_nucleo_only_kmer33

Reading Input......OK

Building Hash Table......OK

Subsampled fraction: 99.96 %
Forward reads without pair: 28207
Reverse reads without pair: 8711

Start Assembly...

------------Assembly 1 finished: Contigs are automatically merged in Merged_contigs file------------

Contig 01 : 9267 bp
Contig 02 : 2783 bp
Contig 03 : 2592 bp
Contig 04 : 318 bp
Contig 05 : 2615 bp
Contig 06 : 889 bp
Contig 07 : 694 bp
Contig 08 : 3792 bp
Contig 09 : 3554 bp
Contig 10 : 3554 bp
Contig 100 : 6996 bp
Contig 101 : 1279 bp
Contig 102 : 1279 bp
Contig 103 : 4023 bp
Contig 104 : 316 bp
Contig 105 : 9135 bp
Contig 106 : 9351 bp
Contig 107 : 115 bp
Contig 108 : 2122 bp
Contig 109 : 2261 bp
Contig 11 : 4729 bp
Contig 110 : 301 bp
Contig 12 : 311 bp
Contig 13 : 4908 bp
Contig 14 : 2016 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 14 : 5068 bp
Contig 15 : 2030 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 15 : 5068 bp
Contig 16 : 2210 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 16 : 5068 bp
Contig 17 : 1819 bp
Contig 18 : 1671 bp
Contig 19 : 1955 bp
Contig 20 : 2164 bp
Contig 21 : 2978 bp
Contig 22 : 2978 bp
Contig 23 : 5164 bp
Contig 24 : 5164 bp
Contig 25 : 739 bp
Contig 26 : 2993 bp
Contig 27 : 718 bp
Contig 28 : 7175 bp
Contig 29 : 682 bp
Contig 30 : 6923 bp
Contig 31 : 7001 bp
Contig 32 : 10174 bp
Contig 33 : 9985 bp
Contig 34 : 9831 bp
Contig 35 : 9606 bp
Contig 36 : 9825 bp
Contig 37 : 10987 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 37 : 8759 bp
Contig 38 : 6918 bp
Contig 39 : 6918 bp
Contig 40 : 6918 bp
Contig 41 : 9074 bp
Contig 42 : 307 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 42 : 8998 bp
Contig 43 : 3971 bp
Contig 44 : 3810 bp
Contig 45 : 308 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 45 : 8998 bp
Contig 46 : 1422 bp
Contig 47 : 1590 bp
Contig 48 : 12833 bp
Contig 49 : 12704 bp
Contig 50 : 2167 bp
Contig 51 : 1204 bp
Contig 52 : 11231 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 52 : 8720 bp
Contig 53 : 1034 bp
Contig 54 : 11031 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 54 : 12428 bp
Contig 55 : 14986 bp
Contig 56 : 14986 bp
Contig 57 : 682 bp
Contig 58 : 682 bp
Contig 59 : 11231 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 59 : 12428 bp
Contig 60 : 7622 bp
Contig 61 : 390 bp
Contig 62 : 4275 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 62 : 388 bp
Contig 63 : 4452 bp
Contig 64 : 8229 bp
Contig 65 : 8450 bp
Contig 66 : 10697 bp
Contig 67 : 10904 bp
Contig 68 : 3394 bp
Contig 69 : 3590 bp
Contig 70 : 661 bp
Contig 71 : 112 bp
Contig 72 : 429 bp
Contig 73 : 5990 bp
Contig 74 : 6251 bp
Contig 75 : 18412 bp
Contig 76 : 18319 bp
Contig 77 : 8140 bp
Contig 78 : 6735 bp
Contig 79 : 5968 bp
Contig 80 : 18785 bp
Contig 81 : 3224 bp
Contig 82 : 4996 bp
Contig 83 : 608 bp
Contig 84 : 3531 bp
Contig 85 : 4926 bp
Contig 86 : 479 bp
Contig 87 : 302 bp
Contig 88 : 2937 bp
Contig 89 : 3119 bp
Contig 90 : 1953 bp
Contig 91 : 1202 bp
Contig 92 : 1790 bp
Contig 93 : 1994 bp
Contig 94 : 4339 bp
Contig 95 : 8157 bp
Contig 96 : 1328 bp
Contig 97 : 2451 bp
Contig 98 : 2660 bp
Contig 99 : 6817 bp

Total contigs : 120
Largest contig : 18785 bp
Smallest contig : 112 bp
Average insert size : 100 bp

-----------------------------------------Input data metrics-----------------------------------------

Total reads : 70768916
Aligned reads : 1282980
Assembled reads : 240094


Thank you for using NOVOPlasty!

Thank you for your time!

@ndierckx
Owner

Hi, NOVOPlasty is not meant for metagenomic datasets. Are there multiple similar chloroplast genomes in the sample? What kind of dataset do you have, and what do you expect to get?
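
To put a number on the "total length is greater than the chloroplast size" observation, the contig lines in the log can be summed with a few lines of Python. This is only a rough sketch, not part of NOVOPlasty: the function name `summarize_contigs` and the sample text are made up for illustration, and it assumes the log lines keep the `Contig NN : LENGTH bp` layout shown in the output above. Some contig numbers appear twice in that log (the "Check manually if the two contigs overlap" cases), so the sketch counts log entries rather than distinct contigs, and the total may count overlapping sequence more than once.

```python
import re

# Matches per-contig lines such as "Contig 01 : 9267 bp".
# Summary lines ("Total contigs : 120", "Largest contig : 18785 bp")
# do not start with "Contig <number>" and are therefore skipped.
CONTIG_LINE = re.compile(r"^Contig\s+(\d+)\s*:\s*(\d+)\s*bp$")


def summarize_contigs(log_text):
    """Return entry count, total, largest and smallest length (bp)
    for all per-contig lines found in a NOVOPlasty-style log."""
    lengths = []
    for line in log_text.splitlines():
        match = CONTIG_LINE.match(line.strip())
        if match:
            lengths.append(int(match.group(2)))
    if not lengths:
        return {"entries": 0, "total_bp": 0,
                "largest_bp": None, "smallest_bp": None}
    return {
        "entries": len(lengths),
        "total_bp": sum(lengths),
        "largest_bp": max(lengths),
        "smallest_bp": min(lengths),
    }


# A few lines copied from the log above, used here only as sample input.
SAMPLE_LOG = """\
Contig 01 : 9267 bp
Contig 02 : 2783 bp
Contig 14 : 2016 bp
(Check manually if the two contigs overlap to merge them together!)
Contig 14 : 5068 bp
Total contigs : 120
Largest contig : 18785 bp
"""

if __name__ == "__main__":
    print(summarize_contigs(SAMPLE_LOG))
```

Running it on the full log text (for example, read from wherever the output was saved) gives a total that can be compared against the `Genome Range` set in the configuration file.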
