Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No alignment when adapters are found #16

Open
mairae opened this issue Apr 2, 2014 · 3 comments
Open

No alignment when adapters are found #16

mairae opened this issue Apr 2, 2014 · 3 comments

Comments

@mairae
Copy link

mairae commented Apr 2, 2014

I'm trying to merge reads with SeqPrep.

I have read 1:

@M01271:10:000000000-A3WGH:1:1101:15333:1723 1:N:0:1
GTTAATGTTTGGCTGCGAGGTCGGGGCGGACGGACGCTTCCTCCGCGGGTACCAGCAGGACGCCTATGATGGCGCTGACTACATCGCCCTGAATGAGGACCTGAGCTCCTGGACCGCGGCGGACACCGCGGCTCAGATCACCCGGCGCAAGTGGGAGGCGGCAGGTGCGGCTGAACAAGACAGAGCCTACCTGGGGGGC
+
BBBBAFFFFFF@GGGEGGGGGGGGGGGGGEGGGGGECFGEHCGHFFGGGEBHHHGAFGHEHGGGGGHHEHHHFDDGGGGHHGHHHGGGGGGGGHHHGHFGHFHHFHHHGHGGHCGDACGGGFF;BFFFFFFFFDFFFFFFFFFFFFDFF-@FFFE?ADFFADCAF:FFCF-DC;BFBFFBBFFBF/AFEFAFFFFFCF9

and read 2:

@M01271:10:000000000-A3WGH:1:1101:15333:1723 2:N:0:1
TGGTTGGCCTTCCCCATCTCCAGGTATCTGCGCAGCCACTCCACGCACGTGCCCCCCAGGTAGGCTCTGTCTTGTTCAGCCGCACCTGCCGCCTCCCACTTGCGCCGGGTGATCTGAGCCGCGGTGTCCG
+
BABBBCCFFFFFGGGFGGGGGGHHHFHHHHHGGGGGHHHHHHHHGGGGGHHHHHGGGGGGHFHGHHHHHHHHHHHHHHGGHGGGEGHHGHCGGGGGGHHHHHGGGGGCGFG/?GGFFFHHFGGGD@FHHD

Looking at them, they do have a perfect overlap.

Read 1
GTTAATGTTTGGCTGCGAGGTCGGGGCGGACGGACGCTTCCTCCGCGGGTACCAGCAGGACGCCTATGATGGCGCTGACTACATCGCCCTGAATGAGGACCTGAGCTCCTGGACCGCGGCGGACACCGCGGCTCAGATCACCCGGCGCAAGTGGGAGGCGGCAGGTGCGGCTGAACAAGACAGAGCCTACCTGGGGGGC
-----------------------------------------------------------------------------------------------------------------------CGGACACCGCGGCTCAGATCACCCGGCGCAAGTGGGAGGCGGCAGGTGCGGCTGAACAAGACAGAGCCTACCTGGGGGGCACGTGCGTGGAGTGGCTGCGCAGATACCTGGAGATGGGGAAGGCCAACCA
Read 2 reverse complement

Then I took the first few base pairs of each read as adapter sequences:

A GTTAATGTTTGGCTGCGA
B TGGTTGGCCTTCCCCATC

And used SeqPrep to merge them:

SeqPrep -f rabbit_R1_01.fastq -r rabbit_R2_01.fastq -1 R1.fastq.gz -2 R2.fastq.gz -3 R1.disc.fastq.gz -4 R2.disc.fastq.gz -s merged.fastq.gz -E align.fastq.gz -A 'GTTAATGTTTGGCTGCGA' -B 'TGGTTGGCCTTCCCCATC'

But it seems not to be able to do the alignment. Am I missing something?

Processing reads... |
Pairs Processed:        1
Pairs Merged:   0
Pairs With Adapters:    1
Pairs Discarded:        1

(When I put something like CCCCCCCCCCCCC as adapters, the reads get merged.)

Thanks a lot,
Marie

@jstjohn
Copy link
Owner

jstjohn commented Apr 2, 2014

Hello,

What do you mean that you take the first few base-pairs of the read as the adapter sequence?

For the alignment you drew, I think that I would not expect there to be adapter sequence present. Adapter sequence shows up when you have the opposite case, something like this (excuse the bogus alignment):

read1: -------AAAAAAAAAAAA
read2: AAAAAAAAAAA---------

In this case the overhang is read2 coming before read1. This is what happens when the underlying fragment is shorter than the read length.

If you use the first few BP of read1 as the “adapter” then SeqPrep will go ahead and assume that the pair of reads starts with the adapter sequence (is an adapter dimer), in which case you want to throw it out completely.

Make sense?

Thanks,
John

On Apr 2, 2014, at 6:52 AM, mairae [email protected] wrote:

I'm trying to merge reads with SeqPrep.

I have read 1:

@M01271:10:000000000-A3WGH:1:1101:15333:1723 1:N:0:1
GTTAATGTTTGGCTGCGAGGTCGGGGCGGACGGACGCTTCCTCCGCGGGTACCAGCAGGACGCCTATGATGGCGCTGACTACATCGCCCTGAATGAGGACCTGAGCTCCTGGACCGCGGCGGACACCGCGGCTCAGATCACCCGGCGCAAGTGGGAGGCGGCAGGTGCGGCTGAACAAGACAGAGCCTACCTGGGGGGC
+
BBBBAFFFFFF@GGGEGGGGGGGGGGGGGEGGGGGECFGEHCGHFFGGGEBHHHGAFGHEHGGGGGHHEHHHFDDGGGGHHGHHHGGGGGGGGHHHGHFGHFHHFHHHGHGGHCGDACGGGFF;BFFFFFFFFDFFFFFFFFFFFFDFF-@fffe?ADFFADCAF:FFCF-DC;BFBFFBBFFBF/AFEFAFFFFFCF9
and read 2:

@M01271:10:000000000-A3WGH:1:1101:15333:1723 2:N:0:1
TGGTTGGCCTTCCCCATCTCCAGGTATCTGCGCAGCCACTCCACGCACGTGCCCCCCAGGTAGGCTCTGTCTTGTTCAGCCGCACCTGCCGCCTCCCACTTGCGCCGGGTGATCTGAGCCGCGGTGTCCG
+
BABBBCCFFFFFGGGFGGGGGGHHHFHHHHHGGGGGHHHHHHHHGGGGGHHHHHGGGGGGHFHGHHHHHHHHHHHHHHGGHGGGEGHHGHCGGGGGGHHHHHGGGGGCGFG/?GGFFFHHFGGGD@FHHD
Looking at them, they do have a perfect overlap.

Read 1
GTTAATGTTTGGCTGCGAGGTCGGGGCGGACGGACGCTTCCTCCGCGGGTACCAGCAGGACGCCTATGATGGCGCTGACTACATCGCCCTGAATGAGGACCTGAGCTCCTGGACCGCGGCGGACACCGCGGCTCAGATCACCCGGCGCAAGTGGGAGGCGGCAGGTGCGGCTGAACAAGACAGAGCCTACCTGGGGGGC
-----------------------------------------------------------------------------------------------------------------------CGGACACCGCGGCTCAGATCACCCGGCGCAAGTGGGAGGCGGCAGGTGCGGCTGAACAAGACAGAGCCTACCTGGGGGGCACGTGCGTGGAGTGGCTGCGCAGATACCTGGAGATGGGGAAGGCCAACCA
Read 2 reverse complement
Then I took the first few base pairs of each read as adapter sequences:

A GTTAATGTTTGGCTGCGA
B TGGTTGGCCTTCCCCATC
And used SeqPrep to merge them:

SeqPrep -f rabbit_R1_01.fastq -r rabbit_R2_01.fastq -1 R1.fastq.gz -2 R2.fastq.gz -3 R1.disc.fastq.gz -4 R2.disc.fastq.gz -s merged.fastq.gz -E align.fastq.gz -A 'GTTAATGTTTGGCTGCGA' -B 'TGGTTGGCCTTCCCCATC'
But it seems not to be able to do the alignment. Am I missing something?

Processing reads... |
Pairs Processed: 1
Pairs Merged: 0
Pairs With Adapters: 1
Pairs Discarded: 1
(When I put something like CCCCCCCCCCCCC as adapters, the reads get merged.)

Thanks a lot,
Marie


Reply to this email directly or view it on GitHub.

@mairae
Copy link
Author

mairae commented Apr 3, 2014

Hello John,
thanks a lot for your quick response!

It is true, that I don't necessarily want/expect to see adapter sequences at the beginning of a read in the SeqPrep alignment. The alignment I drew did not come from SeqPrep though and was just to demonstrate that I would expect that the two reads in question should be able to be merged.

My data mostly looks like this:

Read 1 AAAAACCCCCCCCCCGG (adapter 1; sequence; part of rc of adapter 2)
Read 2 CCCCCGGGGGGGGGGTTT (adapter 2; rc of sequence; part of rc of adapter 1)

(So the fragment should not be shorter than the read lengths, but it's only a few bases longer.)
Then I would assume the alignment to be like

AAAAACCCCCCCCCCGG---
---AAACCCCCCCCCCGGGGG

or

------CCCCCCCCCCGG---
---AAACCCCCCCCCC-----

with trimmed adapter sequences.

Is this not what I should be expecting?
I thought, the reads might get discarded because the score of the alignment is not good enough. That's why I selected a read pair that should have a perfect overlap.
Additionally, I used the first bases of read 1 as forward primer (-A) and the first bases of read 2 as reverse primer (-B). (Those sequences are very similar to the real primer sequences I'm using, but for testing I did not want to have any mismatches.)
So, in the toy example, the forward primer would be AAAAA and the reverse primer CCCCC.

And on a related note: I think, in an earlier version, it was possible, to put empty strings as adapter sequences (-A '' -B ''), but it does not seem to be working anymore. (Similar problem: SeqPrep reports the reads as "Pairs With Adapters" and as "Pairs Discarded".)
This would be useful for me, since I'm using different different adapters and don't necessarily need SeqPrep to strip the adapters for me. Is there any way to do this?

Best regards,
Marie

@jstjohn
Copy link
Owner

jstjohn commented Apr 7, 2014

There wouldn’t be an or situation like you drew below. The way illumina adapters work it would always be the second case if indeed adapter sequences are present. They always happen after the genomic DNA in the first read. When there is no genomic DNA present, then it happens at the beginning of the first read. If the adapter happens at the beginning of the first read then there should not be any genomic DNA present, unless you are following a non-standard paired-end illumina protocol, like long insert mate-pair or something.

On Apr 3, 2014, at 4:53 AM, mairae [email protected] wrote:

Hello John,
thanks a lot for your quick response!

It is true, that I don't necessarily want/expect to see adapter sequences at the beginning of a read in the SeqPrep alignment. The alignment I drew did not come from SeqPrep though and was just to demonstrate that I would expect that the two reads in question should be able to be merged.

My data mostly looks like this:

Read 1 AAAAACCCCCCCCCCGG (adapter 1; sequence; part of rc of adapter 2)
Read 2 CCCCCGGGGGGGGGGTTT (adapter 2; rc of sequence; part of rc of adapter 1)
(So the fragment should not be shorter than the read lengths, but it's only a few bases longer.)
Then I would assume the alignment to be like

AAAAACCCCCCCCCCGG---
---AAACCCCCCCCCCGGGGG
or

------CCCCCCCCCCGG---
---AAACCCCCCCCCC-----
with trimmed adapter sequences.

Is this not what I should be expecting?
I thought, the reads might get discarded because the score of the alignment is not good enough. That's why I selected a read pair that should have a perfect overlap.
Additionally, I used the first bases of read 1 as forward primer (-A) and the first bases of read 2 as reverse primer (-B). (Those sequences are very similar to the real primer sequences I'm using, but for testing I did not want to have any mismatches.)
So, in the toy example, the forward primer would be AAAAA and the reverse primer CCCCC.

And on a related note: I think, in an earlier version, it was possible, to put empty strings as adapter sequences (-A '' -B ''), but it does not seem to be working anymore. (Similar problem: SeqPrep reports the reads as "Pairs With Adapters" and as "Pairs Discarded".)
This would be useful for me, since I'm using different different adapters and don't necessarily need SeqPrep to strip the adapters for me. Is there any way to do this?

Best regards,
Marie


Reply to this email directly or view it on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants