
Duplicated sequences in contaminant_list.txt #53

Open
aushev opened this issue Jun 30, 2020 · 1 comment

@aushev

aushev commented Jun 30, 2020

I think it's a bit confusing that many sequences in the list are duplicated. For example, AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA is listed 5 times under different names:

  • Illumina DpnII expression PCR Primer 2
  • Illumina DpnII Gex PCR Primer 2
  • Illumina NlaIII expression PCR Primer 2
  • Illumina NlaIII Gex PCR Primer 2
  • Illumina Small RNA PCR Primer 2

As I understand it, the FastQC report will display only the first occurrence in the "Overrepresented sequences" table. Is that right?
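One way to see how widespread the duplication is would be to group the entries by sequence. This is a minimal sketch: `find_duplicate_sequences` is a hypothetical helper (not part of FastQC), and it assumes a tab-separated layout where comment lines start with `#` and each entry is a name, one or more tabs, then the sequence.

```python
import re
from collections import defaultdict


def find_duplicate_sequences(lines):
    """Hypothetical helper: map each repeated sequence to all names it is listed under.

    Assumes '#' starts a comment line and each entry is
    'name<one or more tabs>sequence'. Sequences are upper-cased before comparison.
    """
    names_by_seq = defaultdict(list)
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip() or text.startswith("#"):
            continue  # skip blank and comment lines
        parts = re.split(r"\t+", text)
        if len(parts) < 2:
            continue  # skip malformed lines rather than guessing
        name, seq = parts[0].strip(), parts[1].strip().upper()
        names_by_seq[seq].append(name)
    # keep only sequences that appear under more than one entry, names in file order
    return {seq: names for seq, names in names_by_seq.items() if len(names) > 1}
```

Passing it an open file handle for contaminant_list.txt would return a dict from each repeated sequence to the names it appears under, in the order they occur in the file.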

@s-andrews
Owner

The set of sequences we have there is a bit haphazard, I'm afraid. We can't get a definitive list from the original vendors: although they will supply them, you're required to agree to a license which would mean we couldn't distribute them with FastQC. So we've built up a collection based on user submissions.

It's absolutely possible that the exact same sequence appears in multiple kits under different names, and yes, only the first instance is reported for a given hit.

We definitely welcome any corrections or clean-ups to the lists we have, so please do submit a pull request if you have an improved version of the current contaminants file.
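As a possible starting point for such a clean-up, entries that share a sequence could be collapsed into one line that keeps every kit name, so whichever kit a user has, the reported name would include it. This is a hypothetical sketch (`merge_duplicate_entries` is not a FastQC tool) under the same tab-separated layout assumption; the ` / ` separator for joined names is an arbitrary choice rather than a FastQC convention, and sequences are upper-cased before comparison.

```python
import re


def merge_duplicate_entries(lines, sep=" / "):
    """Hypothetical clean-up: merge entries sharing a sequence, in first-seen order.

    Comment, blank and malformed lines are passed through unchanged.
    The ' / ' separator is an illustrative choice, not a FastQC requirement.
    """
    items = []  # ("raw", text) or ("seq", sequence), preserving input order
    names = {}  # sequence -> list of distinct names
    for line in lines:
        text = line.rstrip("\n")
        parts = re.split(r"\t+", text)
        if text.startswith("#") or len(parts) < 2:
            items.append(("raw", text))
            continue
        name, seq = parts[0].strip(), parts[1].strip().upper()
        if seq not in names:
            names[seq] = []
            items.append(("seq", seq))  # first occurrence fixes the output position
        if name not in names[seq]:
            names[seq].append(name)
    result = []
    for kind, value in items:
        if kind == "raw":
            result.append(value)
        else:
            result.append(sep.join(names[value]) + "\t" + value)
    return result
```

Keeping the merged line at the position of the first occurrence means the rest of the file's ordering is left as it was.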
