I should also add that the depth of this dip differs between my samples: some samples have barely any reads between 40 and 50bp, whereas others are missing hardly any. The only thing that differs between samples is the 8bp index found within the adapter sequence. I'm not sure how SeqPrep removes the adapter sequence, but I wouldn't expect the index to affect it. Again, any thoughts welcome.
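To put a number on how the dip differs between samples, here is a minimal sketch that reports the fraction of merged reads falling in the 40 to 50bp window. The function name `fraction_in_window` and the sample names and lengths are made up for illustration; real lengths would come from each sample's merged FASTQ.

```python
def fraction_in_window(lengths, low=40, high=50):
    """Fraction of reads whose length falls in [low, high], inclusive."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    hits = sum(1 for n in lengths if low <= n <= high)
    return hits / len(lengths)

# Made-up read lengths for two hypothetical samples (illustration only)
samples = {
    "sample_A": [35, 38, 52, 60, 61, 75],
    "sample_B": [35, 42, 45, 48, 60, 75],
}
for name, lengths in samples.items():
    print(name, round(fraction_in_window(lengths), 3))
```

Comparing this fraction across samples, before and after merging, would show whether the dip is already present in the input or only appears in the SeqPrep output.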
Has anyone come across anything like this in the last 5 years?! Can anyone give me any suggestions as to what I can try to figure out what is going on?
Hello, I was wondering if someone could help me?
I've been trying to adapter-trim and merge my dataset using SeqPrep, but when I plot the read lengths after merging, I'm missing most of the reads between 40 and 50bp. I can't work out why, or whether I'm doing something wrong!
The read length plots look like this:
L120_2.read_lengths.pdf
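For anyone wanting to reproduce a plot like this, a minimal sketch of tallying read lengths from a FASTQ file follows. It assumes plain four-line records; `length_histogram` is an illustrative name, and the in-memory records are made up. If the file on disk is gzip-compressed, `gzip.open(path, "rt")` can be used in place of `open(path)`.

```python
from collections import Counter
import io

def length_histogram(handle):
    """Count sequence lengths in a four-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts

# Tiny made-up example; for real data, pass an open file handle instead.
demo = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGTTTGA\n+\nIIIIIIII\n"
)
hist = length_histogram(demo)
for length in sorted(hist):
    print(length, hist[length])
```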
I'm running SeqPrep as follows:
SeqPrep -f L120_1.qual.fastq -r L120_2_.qual.fastq -1 L120-R1.qual.unmerged.fastq -2 L120-R2.qual.unmerged.fastq -3 L120_NeutCap_2-R1.qual.discarded.fastq -4 L120_NeutCap_2-R2.qual.discarded.fastq -L 30 -q 15 -A AGATCGGAAGAGCACACGTC -B GGAAGAGCGTCGTGTAGGGA -s L120_NeutCap_2.qual.merged.fastq -E L120_NeutCap_2.qual.readable_alignment.txt -o 10
You'll notice that the first adapter is the standard Illumina one, while the second is a modified one, missing the first 5 bp. You can see both adapters present in the files if you grep the sequences (indicated below with [xx])…
Read 1, quality trimmed, for L120_2 above:
@HISEQ:268:C8TMGANXX:2:1101:1430:1965 1:N:0:NTCGTCGGNCGCAACG
CAGGCACTCCCTGGAAACTCTAAGGGGCAGTTCTACTCT[AGATCGGAAGA]
+
A@B0BGGGGGGGCFGGGGGGGGGGGEGGGGGGGGGGCGG@1E@FGD/CEF
@HISEQ:268:C8TMGANXX:2:1101:1457:1992 1:N:0:TTCGTCGGNCGCAACG
CTAGACCGCGAATACACACA[AGATCGGAAGAGCACACGTCTGAACTCCAG]
+
33<<BGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGBGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1684:1955 1:N:0:TTCGTCGGCCGCAACG
NTGATATGTCCGGAGTGCATCGTATGGCGCTTTCAATGAATTTG[AGATCG]
+
#3<<@EGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1619:1977 1:N:0:TTCGTCGGCCGCAACG
CGGTGCCATCGAGCCTGTTCTGTCTCATAGTGACCCT[AGATCGGAAGAGC]
+
33@>@GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1574:1983 1:N:0:TTCGTCGGCCGCAACG
CCATCCTAGTGGGGGGAAAT[AGATCGGAAGAGCACACGTCTGAACTCCAA]
+
<330<E1EFFCGGGGGFGECDGEGGFGBDCDDGEGGGGCD0DDCDG=EBC
Read 2, quality trimmed, for L120_2 above.
@HISEQ:268:C8TMGANXX:2:1101:1430:1965 2:N:0:NTCGTCGGNCGCAACG
AGAGTAGAACTGCCCCNNNNAGTTTCCAGGGAGTGCCTG[GGAAGAGCGTC]
+
BB@BBGGDFGGGGGGG####==EFGDFFGGGGGGGGGGGGEGGGGGGGGF
@HISEQ:268:C8TMGANXX:2:1101:1457:1992 2:N:0:TTCGTCGGNCGCAACG
TGTGTGTATTCGCGGTCTATGGAAGAGCGTCGTGTAG[GGAAAGAGTGTCG]
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1684:1955 2:N:0:TTCGTCGGCCGCAACG
CAAATTCATTGAAAGNNNNNTACGATGCACTCCGGACATATCAT[GGAAGA]
+
CCCCCGGGGGGGGGG#####@=EFGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1619:1977 2:N:0:TTCGTCGGCCGCAACG
AGGGTCACTATGAGACAGAACAGGCTCGATGGCACCT[GGAAGAGCGTCGT]
+
CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@HISEQ:268:C8TMGANXX:2:1101:1574:1983 2:N:0:TTCGTCGGCCGCAACG
ATTTCCCCCCACTAGGATGT[GGAAGAGCGTCGTGTAGGGAAAGAGTGTCG]
+
BCCCCGGGGGDGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGFG
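The grep check above can be made a little more systematic. The sketch below finds where an adapter starts in a read, which gives the implied insert length, so the input reads can be checked for inserts in the 40 to 50bp range. It is an exact-match check only, and is not how SeqPrep itself detects adapters; `adapter_start` and `min_overlap` are illustrative names, not SeqPrep options. The two reads are the first pair listed above, with the brackets removed.

```python
def adapter_start(read, adapter, min_overlap=5):
    """Return the index where the adapter (or a prefix of it running off
    the end of the read) begins, or None if no exact match is found."""
    for i in range(len(read) - min_overlap + 1):
        tail = read[i:i + len(adapter)]
        # A full-length tail must equal the adapter; a shorter tail at the
        # end of the read must be a prefix of it.
        if adapter.startswith(tail):
            return i
    return None

ADAPTER_A = "AGATCGGAAGAGCACACGTC"  # value passed to -A in the command above
ADAPTER_B = "GGAAGAGCGTCGTGTAGGGA"  # value passed to -B in the command above

read1 = "CAGGCACTCCCTGGAAACTCTAAGGGGCAGTTCTACTCTAGATCGGAAGA"
read2 = "AGAGTAGAACTGCCCCNNNNAGTTTCCAGGGAGTGCCTGGGAAGAGCGTC"

print(adapter_start(read1, ADAPTER_A))  # implied insert length from read 1
print(adapter_start(read2, ADAPTER_B))  # implied insert length from read 2
```

Tallying these positions over the whole input would show whether pairs with 40 to 50bp inserts are present before merging, which narrows down whether the dip arises in SeqPrep or earlier.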
So I think the adapter sequences are correct, but I can't explain why there's a dip in the read length frequency. Is this a quirk of SeqPrep? Can anyone offer any explanation?
Many thanks!