Segmentation fault while merging PE reads #29

Open
hudenise opened this issue Jun 11, 2015 · 5 comments

@hudenise

Hi, I encountered an issue (exit code 139) with this error message:
Processing reads... |/tmp/.lsbtmp4010/.lsbatch/1433770267.781893: line 8: 10935 Segmentation fault (core dumped) /nfs/seqdb/production/interpro/development/metagenomics/pipeline/tools/bin/SeqPrep -f ERR884064_1.fastq -r ERR884064_2.fastq -1 ERR884064_1_paired.fastq.gz -2 ERR884064_2_paired.fastq.gz -3 ERR884064_1_unpaired.fastq.gz -4 ERR884064_2_unpaired.fastq.gz -s ERR884064_paired.fastq.gz
I checked the read files: they do not contain non-ASCII characters, and every quality score line has the same length as its sequence line. I have successfully run SeqPrep with the same parameters both before and since, so the installation is correct. Any suggestions on how to merge these files successfully? Thanks, Hubert
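The two checks mentioned above (ASCII-only content, and quality lines matching sequence lines in length) can be sketched roughly as follows. This is only a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; the function name `check_fastq_records` is hypothetical and is not part of SeqPrep.

```python
from typing import Iterable, List, Tuple


def check_fastq_records(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (record_number, problem) for each problem found.

    Assumes plain four-line FASTQ records (header, sequence, '+', quality).
    """
    stripped = [line.rstrip("\n") for line in lines]
    problems: List[Tuple[int, str]] = []

    # A line count that is not a multiple of 4 means the last record is cut off.
    leftover = len(stripped) % 4
    if leftover:
        problems.append((len(stripped) // 4 + 1, "truncated record"))

    for start in range(0, len(stripped) - leftover, 4):
        header, seq, plus, qual = stripped[start:start + 4]
        record_number = start // 4 + 1
        if not header.startswith("@"):
            problems.append((record_number, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((record_number, "separator does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((record_number, "sequence and quality lengths differ"))
        if not all(field.isascii() for field in (header, seq, plus, qual)):
            problems.append((record_number, "non-ASCII character"))
    return problems
```

It could be applied to one of the input files with something like `check_fastq_records(open("ERR884064_1.fastq"))`. Note that a file can pass these checks and still trigger the crash, as the later comments in this thread show.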

@hudenise
Author

hudenise commented Jul 2, 2015

After investigating the files, I found that a few sequences in the _1 file were quite short (<10 nt), while their counterparts in the _2 file were either significantly longer or also very short.
Example from the _1 file:
@Miseq ....
TGCAGGATATCGCGGCCGT
+
BCC-@ECFGGD7F7@FE+6
and the counterpart in the _2 file:
TATCCGTTACCTATCGTCCGCGAGAAAGCTAGTAGACACACAGCACCCAGGCGTGCAAGTCACCTTCAGATGACTACACCGAACCTGGTTAAAAGAGTCTATGGCCACCCCTACTTTAGAGTAAAAAAACCACACCTCTATTGCGCTGGGTACTAGAATAAGCTAACTACCTAGTCCGTTTCCGGCTGACTTTTTTGGGAATAACATACCACCCATCGTGATTACGTTCGCCACCGTTCTACTGCTCTCTTCACTAGGTTTGCACATTGTTTGTTCCCCTATGGCTAATTTATAGAGGACN
+
-6A,A@D<@,CFC,C,E86C+:++8CE,,C,6,;,,CAFGGCE,,C66DE,,C:@C,9,,,<6,6CE<,,,,<,,C,,<,7++8+B88,AFF@F,,,,:5?B,,AECFG:+AD,5AF9;,?,4,C,,+++488,=+,=,,,73,+6@6+3,26=,@,733=6,,==FCG,@,,45<?6,,1,+*
**3,95DD,__3__1/,0+;9+80)0A)/4/))..););6;
()/2).))./)0);)1474)4?4)6))))(.640)))1)4)).,,()),8((,.8((-)).9)-4...((,!

This is the first time I have encountered such an issue, so I don't know whether you are aware of it. Cheers, Hubert
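Pairs like the one above (a very short read whose mate is much longer, or also very short) can be located before running the merge with a scan along these lines. This is a minimal sketch, assuming uncompressed four-line FASTQ records and that both files list mates in the same order; the names `read_lengths` and `find_short_pairs` are hypothetical, and the default threshold of 10 nt is simply taken from the "<10 nt" figure in this comment.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (header, sequence_length) for each complete four-line record."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:  # end of input, or a truncated final record
            return
        yield record[0].strip(), len(record[1].strip())


def find_short_pairs(
    fwd_lines: Iterable[str],
    rev_lines: Iterable[str],
    min_len: int = 10,
) -> List[Tuple[int, str, int, int]]:
    """Return (pair_number, forward_header, len_fwd, len_rev) for every pair
    in which at least one read is shorter than min_len."""
    short = []
    pairs = zip(read_lengths(fwd_lines), read_lengths(rev_lines))
    for number, ((name, len_fwd), (_, len_rev)) in enumerate(pairs, start=1):
        if len_fwd < min_len or len_rev < min_len:
            short.append((number, name, len_fwd, len_rev))
    return short
```

For the files in this report, the call might look like `find_short_pairs(open("ERR884064_1.fastq"), open("ERR884064_2.fastq"))`. Whether removing the reported pairs is enough to avoid the segmentation fault has not been verified here.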

@jstjohn
Owner

jstjohn commented Jul 2, 2015

It looks like your bcl2fastq job is already doing some kind of trimming for
you. This is not expected input for SeqPrep. If you have a say over the
bcl2fastq parameters, maybe you could turn this off? I'm not sure which
settings or defaults would do this in your version.

@hudenise
Author

hudenise commented Jul 2, 2015

Thanks, I will forward your email to the user who generated the
sequences submitted to our pipeline, cheers Hubert


Dr Hubert DENISE

Metagenomics
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus,
Hinxton,
Cambridge, CB10 1SD,
United Kingdom
Tel : (+44)01223 494102

@chloeloiseau

Hello,
I am running SeqPrep after Trimmomatic (which trims the reads), and for some files I get this error:
/tmp/sge_spool/lhi10/job_scripts/19611347: line 20: 15553 Segmentation fault (core dumped) SeqPrep -f trimmomatic_input_1P.fastq -r trimmomatic_input_2P.fastq -1 seqprep_1_trimmed.fastq.gz -2 seqprep_2_trimmed.fastq.gz -3 seqprep_1_notmerged.fastq.gz -4 seqprep_2_notmerged.fastq.gz -A AGATCGGAAGAGCACACGTCT -B AGATCGGAAGAGCGTCGTGTA -L 20 -o 40 -s seqprep_merged.fastq.gz -2 file_seqprep.txt.gz 2>> seqprep.log

Above, you mention that the segmentation fault may be caused by the two reads of a pair having different lengths in the SeqPrep input. Does this mean that Trimmomatic should not be run before SeqPrep, and that SeqPrep expects both reads of a pair to have the same length?

Many thanks for your help on this issue
Chloé

@hudenise
Author

Dear Chloe,
Indeed, we run SeqPrep upstream of Trimmomatic, on the raw reads with
only the primers/adapters removed. We then apply Trimmomatic to the merged
file. Sincerely, Hubert

