Missing reads between 40 and 50bp after trimming? #34

Open
jessicaathomas opened this issue Apr 7, 2016 · 2 comments

@jessicaathomas

Hello, I was wondering if someone could help me?

I've been trying to adapter-trim and merge my dataset using SeqPrep, but when I plot the read lengths after merging, most of the reads between 40 and 50 bp are missing. I can't work out why, or whether I'm doing something wrong!

So: read length plots resemble this:
L120_2.read_lengths.pdf
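
For anyone who wants to check the length distribution without the PDF, here is a minimal Python sketch of how the merged read lengths could be tallied. It assumes an uncompressed FASTQ file with exactly four lines per record; the function names `read_length_counts` and `reads_in_window` are made up for this illustration, and the in-memory records are toy data rather than reads from this dataset.

```python
import io
from collections import Counter


def read_length_counts(handle):
    """Tally sequence lengths from a FASTQ stream with four lines per record."""
    counts = Counter()
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # the sequence is the second line of each record
            counts[len(line.strip())] += 1
    return counts


def reads_in_window(counts, low, high):
    """Number of reads whose length lies in [low, high], inclusive."""
    return sum(n for length, n in counts.items() if low <= length <= high)


# Tiny in-memory stand-in for a merged FASTQ file (toy records, not real data).
toy_fastq = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGTACGT\n+\nIIIIIIII\n"
)
toy_counts = read_length_counts(toy_fastq)
print(sorted(toy_counts.items()))
print(reads_in_window(toy_counts, 5, 10))
```

On the real data the same functions could be pointed at the merged output, for example by opening `L120_NeutCap_2.qual.merged.fastq` and calling `reads_in_window(counts, 40, 50)` on the result.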

I'm running SeqPrep as follows:

SeqPrep -f L120_1.qual.fastq -r L120_2_.qual.fastq \
    -1 L120-R1.qual.unmerged.fastq -2 L120-R2.qual.unmerged.fastq \
    -3 L120_NeutCap_2-R1.qual.discarded.fastq -4 L120_NeutCap_2-R2.qual.discarded.fastq \
    -L 30 -q 15 \
    -A AGATCGGAAGAGCACACGTC -B GGAAGAGCGTCGTGTAGGGA \
    -s L120_NeutCap_2.qual.merged.fastq -E L120_NeutCap_2.qual.readable_alignment.txt \
    -o 10

You'll notice that while the first adapter is the standard Illumina one, the second is a modified one, missing the first 5 bp. You can see both adapters in the files if you grep for the sequences (indicated below with [xx])…

Read 1, quality trimmed, for L120_2 above:

@HISEQ:268:C8TMGANXX:2:1101:1430:1965 1:N:0:NTCGTCGGNCGCAACG
CAGGCACTCCCTGGAAACTCTAAGGGGCAGTTCTACTCT[AGATCGGAAGA]
+
A@B0BGGGGGGGCFGGGGGGGGGGGEGGGGGGGGGGCGG@1E@FGD/CEF
@HISEQ:268:C8TMGANXX:2:1101:1457:1992 1:N:0:TTCGTCGGNCGCAACG
CTAGACCGCGAATACACACA[AGATCGGAAGAGCACACGTCTGAACTCCAG]
+
33<<BGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGBGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1684:1955 1:N:0:TTCGTCGGCCGCAACG
NTGATATGTCCGGAGTGCATCGTATGGCGCTTTCAATGAATTTG[AGATCG]
+
#3<<@EGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1619:1977 1:N:0:TTCGTCGGCCGCAACG
CGGTGCCATCGAGCCTGTTCTGTCTCATAGTGACCCT[AGATCGGAAGAGC]
+
33@>@GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1574:1983 1:N:0:TTCGTCGGCCGCAACG
CCATCCTAGTGGGGGGAAAT[AGATCGGAAGAGCACACGTCTGAACTCCAA]
+
<330<E1EFFCGGGGGFGECDGEGGFGBDCDDGEGGGGCD0DDCDG=EBC

Read 2, quality trimmed, for L120_2 above:

@HISEQ:268:C8TMGANXX:2:1101:1430:1965 2:N:0:NTCGTCGGNCGCAACG
AGAGTAGAACTGCCCCNNNNAGTTTCCAGGGAGTGCCTG[GGAAGAGCGTC]
+
BB@BBGGDFGGGGGGG####==EFGDFFGGGGGGGGGGGGEGGGGGGGGF
@HISEQ:268:C8TMGANXX:2:1101:1457:1992 2:N:0:TTCGTCGGNCGCAACG
TGTGTGTATTCGCGGTCTATGGAAGAGCGTCGTGTAG[GGAAAGAGTGTCG]
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1684:1955 2:N:0:TTCGTCGGCCGCAACG
CAAATTCATTGAAAGNNNNNTACGATGCACTCCGGACATATCAT[GGAAGA]
+
CCCCCGGGGGGGGGG#####@=EFGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1619:1977 2:N:0:TTCGTCGGCCGCAACG
AGGGTCACTATGAGACAGAACAGGCTCGATGGCACCT[GGAAGAGCGTCGT]
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1574:1983 2:N:0:TTCGTCGGCCGCAACG
ATTTCCCCCCACTAGGATGT[GGAAGAGCGTCGTGTAGGGAAAGAGTGTCG]
+
BCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
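
The bracketed adapter positions above can also be checked programmatically. The sketch below is a rough Python illustration that finds where the `-A` adapter begins in a raw 50 bp read, using exact matching only. It is not SeqPrep's algorithm (it tolerates no mismatches), and `adapter_start` and its `min_match` parameter are hypothetical names for this example, not SeqPrep options. The reads are the Read 1 sequences listed above with the brackets removed.

```python
def adapter_start(read, adapter, min_match=1):
    """Return the index where the adapter (or a prefix of it) begins, else None.

    Exact matching only: the read tail from index i must equal the adapter
    prefix of the same length, and that length must be at least min_match.
    """
    for i in range(len(read)):
        tail = read[i:]
        n = min(len(tail), len(adapter))
        if n >= min_match and tail[:n] == adapter[:n]:
            return i
    return None


ADAPTER_A = "AGATCGGAAGAGCACACGTC"  # the -A sequence from the command above

# Read 1 sequences from the post, brackets removed.
read_1965 = "CAGGCACTCCCTGGAAACTCTAAGGGGCAGTTCTACTCTAGATCGGAAGA"
read_1992 = "CTAGACCGCGAATACACACAAGATCGGAAGAGCACACGTCTGAACTCCAG"
read_1955 = "NTGATATGTCCGGAGTGCATCGTATGGCGCTTTCAATGAATTTGAGATCG"

for name, read in [("1965", read_1965), ("1992", read_1992), ("1955", read_1955)]:
    start = adapter_start(read, ADAPTER_A)
    adapter_bases = None if start is None else len(read) - start
    print(name, "insert length:", start, "adapter bases in read:", adapter_bases)
```

One thing this makes visible: with 50 bp reads, an insert of 40 to 50 bp leaves at most 10 bases of adapter at the end of the read. Calling `adapter_start(read_1955, ADAPTER_A, min_match=10)` returns `None` because only 6 adapter bases are present. Whether a minimum-overlap threshold like this is actually behind the dip is only a guess; it would need checking against SeqPrep's real adapter-overlap settings.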

So I think the adapter sequences are correct, but I can't explain why there's a dip in the read length frequency. Is this a quirk of SeqPrep? Can anyone offer any explanation?

Many thanks!

@jessicaathomas
Author

I should also add that the depth of this dip differs between my samples (i.e. some samples have barely any reads between 40 and 50 bp, whereas others have barely any missing). The only thing that differs between samples is the 8 bp index, found within the adapter sequence. I'm not sure how SeqPrep removes the adapter sequence, but I don't think this should affect it? Again, any thoughts welcome.
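
To put a number on how deep the dip is in each sample, one option is to compare the fraction of merged reads that fall in the 40 to 50 bp window. The Python sketch below is only an illustration: `window_fraction` is a hypothetical helper, and the sample names and length lists are placeholder values, not measurements from these libraries.

```python
def window_fraction(lengths, low=40, high=50):
    """Fraction of read lengths that fall in [low, high], inclusive."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    inside = sum(1 for n in lengths if low <= n <= high)
    return inside / len(lengths)


# Placeholder length lists standing in for two samples (not real data).
toy_samples = {
    "sample_A": [35, 38, 45, 52, 60, 61],
    "sample_B": [38, 41, 44, 47, 52, 60],
}

for name, lengths in toy_samples.items():
    print(name, round(window_fraction(lengths), 3))
```

Tabulating this fraction next to each sample's index sequence would show whether the dip tracks the index or something else, such as the insert size distribution of each library.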

@jessicaathomas
Author

Has anyone come across anything like this in the last 5 years?! Can anyone give me any suggestions as to what I can try to figure out what is going on?
