Summary
Document.encoding returns the resolved encoding's label (ASCII-lowercased) instead of the exact-case canonical Name for ISO-8859-2, ISO-8859-5, ISO-8859-7, ISO-8859-15, KOI8-R, KOI8-U, and EUC-JP. The CJK/Unicode encodings already return the correct exact-case name (Shift_JIS, Big5, GBK, EUC-KR, UTF-8), which shows the lowercase ISO-8859/KOI8/EUC-JP entries are an oversight rather than deliberate normalization.
Spec
Encoding Standard §4.2 "Names and labels":
> An encoding has a name and one or more labels...

The encodings table's Name column reads exactly ISO-8859-2, ISO-8859-5, ISO-8859-7, ISO-8859-15, KOI8-R, KOI8-U, EUC-JP (and Shift_JIS, Big5, GBK, EUC-KR, UTF-8). The spec further notes:

> for each encoding, ASCII-lowercasing its name yields one of its labels

i.e. the lowercase form is a label, not the name.
DOM Standard §4.5: the characterSet/charset/inputEncoding getter steps return "this's ... encoding's name" — the exact-case Name column value, which is what Document.encoding surfaces.
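To make the name/label distinction concrete, here is a minimal sketch of a label-to-name lookup. `NAMES`, `ascii_lower`, and `canonical_name` are hypothetical illustrations, not turbohtml's internals, and the table covers only the names mentioned in this report (the real encodings table lists many more labels per encoding).

```python
import string

# Hypothetical subset of the Encoding Standard's Name column
# (only the names mentioned in this report).
NAMES = (
    "ISO-8859-2", "ISO-8859-5", "ISO-8859-7", "ISO-8859-15",
    "KOI8-R", "KOI8-U", "EUC-JP",
    "Shift_JIS", "Big5", "GBK", "EUC-KR", "UTF-8",
)

# Lowercase A-Z only; str.lower() would also fold non-ASCII characters
# such as U+212A KELVIN SIGN to "k".
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text: str) -> str:
    """ASCII-lowercase text, leaving non-ASCII characters untouched."""
    return text.translate(_ASCII_LOWER)


# ASCII-lowercasing each name yields one of its labels, so this builds a
# partial label -> name map without listing labels by hand.
LABEL_TO_NAME = {ascii_lower(name): name for name in NAMES}


def canonical_name(label: str) -> str:
    """Return the exact-case Name for a label (raises KeyError if unknown)."""
    # Strip leading/trailing ASCII whitespace, then match case-insensitively.
    key = ascii_lower(label.strip("\t\n\x0c\r "))
    return LABEL_TO_NAME[key]


for label in ("iso-8859-2", "KOI8-r", " euc-jp ", "shift_jis"):
    print(label.strip(), "->", canonical_name(label))
```

The point of the sketch is that the lookup key is the label, while the value handed back to callers is the exact-case name.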
Repro
```python
import turbohtml
for c in ['iso-8859-2','iso-8859-5','iso-8859-7','iso-8859-15','koi8-r','koi8-u','euc-jp','shift_jis','big5','gbk','euc-kr','utf-8']:
    # Parse a minimal document declaring the charset and print the reported encoding name.
    print(c, '->', turbohtml.parse(('<meta charset="%s">\nx' % c).encode()).encoding)
```
Output:
```
iso-8859-2 -> iso-8859-2 # expected ISO-8859-2
iso-8859-5 -> iso-8859-5 # expected ISO-8859-5
iso-8859-7 -> iso-8859-7 # expected ISO-8859-7
iso-8859-15 -> iso-8859-15 # expected ISO-8859-15
koi8-r -> koi8-r # expected KOI8-R
koi8-u -> koi8-u # expected KOI8-U
euc-jp -> euc-jp # expected EUC-JP
shift_jis -> Shift_JIS # correct (exact-case Name)
big5 -> Big5 # correct
gbk -> GBK # correct
euc-kr -> EUC-KR # correct
utf-8 -> UTF-8 # correct
```
Expected vs actual
| charset label | spec Name (expected) | turbohtml actual |
|---|---|---|
| iso-8859-2 | `ISO-8859-2` | `iso-8859-2` |
| iso-8859-5 | `ISO-8859-5` | `iso-8859-5` |
| iso-8859-7 | `ISO-8859-7` | `iso-8859-7` |
| iso-8859-15 | `ISO-8859-15` | `iso-8859-15` |
| koi8-r | `KOI8-R` | `koi8-r` |
| koi8-u | `KOI8-U` | `koi8-u` |
| euc-jp | `EUC-JP` | `euc-jp` |
The exact-case CJK/Unicode results prove the intent is to return the Name column, so the lowercase ISO-8859/KOI8/EUC-JP entries are inconsistent.
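A regression check for the expected names could be sketched as follows. `EXPECTED` and `find_casing_mismatches` are hypothetical names introduced here for illustration; the function takes any callable that maps a charset label to the reported encoding name, so it does not assume anything about turbohtml beyond the `parse(...).encoding` call shown in the repro.

```python
# Expected exact-case names, keyed by the charset label used in the repro.
EXPECTED = {
    "iso-8859-2": "ISO-8859-2",
    "iso-8859-5": "ISO-8859-5",
    "iso-8859-7": "ISO-8859-7",
    "iso-8859-15": "ISO-8859-15",
    "koi8-r": "KOI8-R",
    "koi8-u": "KOI8-U",
    "euc-jp": "EUC-JP",
    "shift_jis": "Shift_JIS",
    "big5": "Big5",
    "gbk": "GBK",
    "euc-kr": "EUC-KR",
    "utf-8": "UTF-8",
}


def find_casing_mismatches(get_encoding):
    """Return (label, expected, actual) for each label whose reported name differs."""
    mismatches = []
    for label, expected in EXPECTED.items():
        actual = get_encoding(label)
        if actual != expected:
            mismatches.append((label, expected, actual))
    return mismatches


# Stand-in for a parser that echoes the label back unchanged (illustration only).
for label, expected, actual in find_casing_mismatches(lambda label: label):
    print(f"{label}: expected {expected}, got {actual}")
```

Against turbohtml, the callable would presumably be `lambda c: turbohtml.parse(('<meta charset="%s">\nx' % c).encode()).encoding`; once the bug is fixed, the returned list should be empty.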
html5lib (klass B — shared lag)
```python
import webencodings
for c in ['iso-8859-2','koi8-r','euc-jp','shift_jis','utf-8']:
    print(c, '->', webencodings.lookup(c).name)
# Output: iso-8859-2 -> iso-8859-2 ; koi8-r -> koi8-r ; euc-jp -> euc-jp ; shift_jis -> shift_jis ; utf-8 -> utf-8
```
html5lib's webencodings ASCII-lowercases all names uniformly, so it diverges from the exact-case Name column for every encoding (a documented uniform normalization). Both impls report a label where the spec requires the name; turbohtml is internally inconsistent (exact-case for CJK/Unicode, lowercase for ISO-8859/KOI8/EUC-JP).
Severity
Low — decoding itself is correct (the codec column is unaffected); only the reported Document.encoding name string casing is wrong.