spec: resolved encoding name lowercased for ISO-8859-x/KOI8-x/EUC-JP (Document.encoding casing) #97

## Summary

`Document.encoding` returns the resolved encoding's *label* (ASCII-lowercased) instead of the exact-case canonical **Name** for `ISO-8859-2`, `ISO-8859-5`, `ISO-8859-7`, `ISO-8859-15`, `KOI8-R`, `KOI8-U`, and `EUC-JP`. The CJK/Unicode encodings already return the correct exact-case name (`Shift_JIS`, `Big5`, `GBK`, `EUC-KR`, `UTF-8`), which shows the lowercase ISO-8859/KOI8/EUC-JP entries are an oversight rather than deliberate normalization.

## Spec

[Encoding Standard §4.2 "Names and labels"](https://encoding.spec.whatwg.org/#names-and-labels):

> An encoding has a name and one or more labels...

The encodings table's **Name** column reads exactly `ISO-8859-2`, `ISO-8859-5`, `ISO-8859-7`, `ISO-8859-15`, `KOI8-R`, `KOI8-U`, `EUC-JP` (and `Shift_JIS`, `Big5`, `GBK`, `EUC-KR`, `UTF-8`). The spec further notes:

> for each encoding, ASCII-lowercasing its name yields one of its labels

i.e. the lowercase form is a *label*, not the *name*.

[DOM Standard §4.5](https://dom.spec.whatwg.org/#dom-document-characterset): the `characterSet`/`charset`/`inputEncoding` getter steps return "this's ... encoding's **name**" — the exact-case Name column value, which is what `Document.encoding` surfaces.
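
The spec guarantees that the lowercased Name is itself a label. That means the Name can be recovered from an already-resolved lowercase label with a lookup keyed by the lowercased Name. Below is a minimal sketch. It assumes a hypothetical `SPEC_NAMES` table, which holds only the encodings discussed in this issue rather than the full encodings table, and a hypothetical `encoding_name()` helper. It does not reflect turbohtml's actual internals.

```python
# Hypothetical subset of the Encoding Standard's Name column, keyed by the
# ASCII-lowercased Name (which the spec guarantees is also one of its labels).
SPEC_NAMES = {
    name.lower(): name
    for name in (
        "UTF-8", "ISO-8859-2", "ISO-8859-5", "ISO-8859-7", "ISO-8859-15",
        "KOI8-R", "KOI8-U", "Shift_JIS", "EUC-JP", "EUC-KR", "Big5", "GBK",
    )
}


def encoding_name(resolved_label: str) -> str:
    """Return the exact-case spec Name for an already-resolved label.

    Hypothetical helper: assumes label resolution has already produced the
    lowercased-Name label (e.g. "koi8-r"), as turbohtml currently reports.
    """
    return SPEC_NAMES[resolved_label.lower()]
```

A table is needed rather than a case transform. `'shift_jis'.upper()` gives `SHIFT_JIS` and `'big5'.upper()` gives `BIG5`, and neither matches the Name column.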

## Repro

```python
import turbohtml

labels = ['iso-8859-2', 'iso-8859-5', 'iso-8859-7', 'iso-8859-15', 'koi8-r', 'koi8-u',
          'euc-jp', 'shift_jis', 'big5', 'gbk', 'euc-kr', 'utf-8']
for c in labels:
    doc = turbohtml.parse(('<meta charset="%s"><p>x' % c).encode())
    print(c, '->', doc.encoding)
```

Output:

```
iso-8859-2 -> iso-8859-2      # expected ISO-8859-2
iso-8859-5 -> iso-8859-5      # expected ISO-8859-5
iso-8859-7 -> iso-8859-7      # expected ISO-8859-7
iso-8859-15 -> iso-8859-15    # expected ISO-8859-15
koi8-r -> koi8-r              # expected KOI8-R
koi8-u -> koi8-u              # expected KOI8-U
euc-jp -> euc-jp              # expected EUC-JP
shift_jis -> Shift_JIS        # correct (exact-case Name)
big5 -> Big5                  # correct
gbk -> GBK                    # correct
euc-kr -> EUC-KR              # correct
utf-8 -> UTF-8                # correct
```

## Expected vs actual

| charset label | spec Name (expected) | turbohtml actual |
|---|---|---|
| iso-8859-2 | `ISO-8859-2` | `iso-8859-2` |
| iso-8859-5 | `ISO-8859-5` | `iso-8859-5` |
| iso-8859-7 | `ISO-8859-7` | `iso-8859-7` |
| iso-8859-15 | `ISO-8859-15` | `iso-8859-15` |
| koi8-r | `KOI8-R` | `koi8-r` |
| koi8-u | `KOI8-U` | `koi8-u` |
| euc-jp | `EUC-JP` | `euc-jp` |

The exact-case CJK/Unicode results indicate that the intent is to return the Name column. Against that, the lowercase ISO-8859/KOI8/EUC-JP entries are inconsistent.
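
The pattern in the expected-vs-actual table can be checked mechanically. Every turbohtml value equals the ASCII-lowercased spec Name, which by the spec note above is one of that encoding's labels. The small sketch below works over the table's rows only. `classify()` is a hypothetical helper, not part of turbohtml.

```python
# (charset label, spec Name, turbohtml actual) rows copied from the table.
ROWS = [
    ("iso-8859-2", "ISO-8859-2", "iso-8859-2"),
    ("iso-8859-5", "ISO-8859-5", "iso-8859-5"),
    ("iso-8859-7", "ISO-8859-7", "iso-8859-7"),
    ("iso-8859-15", "ISO-8859-15", "iso-8859-15"),
    ("koi8-r", "KOI8-R", "koi8-r"),
    ("koi8-u", "KOI8-U", "koi8-u"),
    ("euc-jp", "EUC-JP", "euc-jp"),
]


def classify(reported: str, spec_name: str) -> str:
    """Say whether a reported string is the spec Name or only its lowercased label."""
    if reported == spec_name:
        return "name"
    if reported == spec_name.lower():
        return "lowercased-name label"
    return "other"


for label, name, actual in ROWS:
    print(f"{label:12} {classify(actual, name)}")
```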

## html5lib (klass B — shared lag)

```python
import webencodings
for c in ['iso-8859-2', 'koi8-r', 'euc-jp', 'shift_jis', 'utf-8']:
    print(c, '->', webencodings.lookup(c).name)
# iso-8859-2 -> iso-8859-2 ; koi8-r -> koi8-r ; euc-jp -> euc-jp ; shift_jis -> shift_jis ; utf-8 -> utf-8
```

html5lib's `webencodings` ASCII-lowercases **all** names uniformly, so it diverges from the exact-case Name column for every encoding (a documented, uniform normalization). Both implementations report a label where the spec requires the name. turbohtml is also internally inconsistent: it returns exact case for CJK/Unicode but lowercase for ISO-8859/KOI8/EUC-JP.

## Severity

Low — decoding itself is correct (the codec column is unaffected); only the reported `Document.encoding` name string casing is wrong.
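
Python's standard codec registry illustrates why the casing is cosmetic for decoding. `codecs.lookup()` matches encoding names case-insensitively, so `ISO-8859-2` and `iso-8859-2` select the same codec. This sketch uses only the standard library, not turbohtml, and makes no claim about how turbohtml picks its codec.

```python
import codecs

# (spec Name, lowercase label, sample byte): 0xB1 is "ą" (U+0105) in ISO-8859-2,
# 0xC1 is "а" (U+0430) in KOI8-R.
SAMPLES = [
    ("ISO-8859-2", "iso-8859-2", b"\xb1"),
    ("KOI8-R", "koi8-r", b"\xc1"),
]

for name, label, data in SAMPLES:
    # codecs.lookup() matches names case-insensitively, so both spellings
    # select the same codec and decode the byte to the same character.
    assert codecs.lookup(name).name == codecs.lookup(label).name
    assert data.decode(name) == data.decode(label)
    print(f"{name} / {label}: {data.decode(name)!r}")
```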
