Quality score encoding #147

rjg2186 · 2025-02-07T15:11:28Z

I have a few thousand reads in which every base has a quality of at least 30, and most bases have quality 34. When I run them through FastQC, the report says the quality encoding is "Illumina 1.5", but these should be Illumina 1.9 (Phred+33). Is there any way to pass the quality encoding to FastQC as a parameter? Below are a few example reads:

@VH00243:66:AAGGMYWM5:1:1101:66233:4502 1:N:0:NGTCAGACGA+TGTCGCTGGT
ACCTTACGGGACTTTCCTACTTGGCAGTACATCTACGTA
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
@VH00243:66:AAGGMYWM5:1:1101:40159:4900 1:N:0:NGTCAGACGA+TGTCGCTGGT
CACTGAGGCCGCCCGGGCAAAGCCCGGGCGTCGGG
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
@VH00243:66:AAGGMYWM5:1:1101:56954:13930 1:N:0:NGTCAGACGA+TGTCGCTGGT
CAGTACGCCTTTGTCACTTTCTTACACTGTCTCCTATAG
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

Thanks


s-andrews · 2025-02-07T15:27:41Z

Something odd is going on with this data. This type of error is certainly possible in FastQC, but it would only happen if there were no bases anywhere in the file with a Phred+33 score below 31 (ASCII character < 64), which would be a fairly remarkable dataset unless it has been heavily filtered.
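A minimal sketch of the detection rule described above, assuming only what the comment states: if the lowest quality character in the file is below ASCII 64 ('@'), the data must be Phred+33; otherwise it is indistinguishable from Phred+64. This is a simplified illustration, not FastQC's source code. FastQC also names specific variants (such as "Illumina 1.5"), which this sketch does not attempt; `guess_phred_offset` is a hypothetical helper.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset from the lowest quality character seen.

    Hypothetical, simplified rule: any character below ASCII 64 ('@')
    proves Phred+33; otherwise the data is treated as Phred+64.
    """
    lowest = min(min(line) for line in quality_lines if line)
    if ord(lowest) < 33:
        raise ValueError(f"no known encoding uses characters below ASCII 33 (saw {lowest!r})")
    if ord(lowest) < 64:
        return 33  # Phred+33 (Sanger / Illumina 1.8+)
    return 64      # consistent with Phred+64 (older Illumina)


# Quality strings shaped like the reads in the question: every character is 'C' (ASCII 67).
example_quals = ["C" * 39, "C" * 35, "C" * 39]

print(guess_phred_offset(example_quals))             # 64: nothing is below '@'
print(guess_phred_offset(example_quals + ["?CCC"]))  # 33: '?' is ASCII 63
```

A single low-quality base anywhere in the file is enough to flip the guess to Phred+33, which is why an all-'C' file is misdetected.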

In your case it's even stranger, because every base call in every read appears to have exactly the same quality (ASCII=C, Phred33=34, Phred64=3). That seems very unlikely in any real dataset, so either you're looking at a highly selected subset of reads, or something has messed with your quality scores before they got here.
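The arithmetic behind those three values can be checked directly: a Phred score is the character's ASCII code minus the encoding offset. This short sketch (the `decode` helper is hypothetical, for illustration only) decodes the first example read under both offsets and confirms that only one distinct score occurs.

```python
def decode(quality, offset):
    """Convert a FASTQ quality string to Phred scores using the given ASCII offset."""
    return [ord(ch) - offset for ch in quality]


qual = "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC"  # first read in the question

phred33 = decode(qual, 33)
phred64 = decode(qual, 64)

print(set(phred33))            # {34}
print(set(phred64))            # {3}
print(len(set(phred33)) == 1)  # True: every base has the same score
```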

There isn't an option in FastQC to bypass the auto-detection. In theory this could be added, but I've never seen a real dataset where it was needed, and adding it would require a bunch of other sanity checks, because applying it incorrectly would allow really stupid Phred scores to be calculated, which could break other parts of the code.
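To illustrate why a manual override would need extra checks, here is a hedged sketch of decoding with a forced offset. `decode_forced` is a hypothetical helper, not a FastQC feature: it rejects characters outside the printable range 33 to 126 and any score that comes out negative, which is what happens when Phred+33 data is forced through a +64 offset.

```python
def decode_forced(quality, offset):
    """Decode a quality string with a user-chosen offset, rejecting impossible scores.

    Hypothetical helper: FastQC has no such option. Forcing the wrong
    offset silently yields nonsense, so each score is validated.
    """
    if offset not in (33, 64):
        raise ValueError(f"unsupported offset {offset}")
    scores = []
    for ch in quality:
        code = ord(ch)
        if not 33 <= code <= 126:
            raise ValueError(f"non-printable quality character {ch!r}")
        score = code - offset
        if score < 0:
            raise ValueError(f"{ch!r} gives Phred {score} with offset {offset}")
        scores.append(score)
    return scores


print(decode_forced("CCCC", 33))  # [34, 34, 34, 34]
try:
    decode_forced("#CCC", 64)  # '#' is ASCII 35, i.e. Phred 2 under +33
except ValueError as err:
    print(err)  # '#' gives Phred -29 with offset 64
```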
