Normalization: Identify Python's default Normalization form #9

Open
@cweb

See: http://websec.github.io/unicode-security-guide/character-transformations/#normalization

Identify which Unicode normalization form, if any, Python applies when handling text. Is this documented? If so, skip the following tests.

If it is not documented, test the major Python versions to identify:

  • normalization behavior: which normalization form, if any, do the core encoding APIs apply by default?

One way to test this might be to use a few specific code points which have known transformations in certain normalization forms. These include (from http://www.unicode.org/reports/tr15/):

  • U+212B in NFC becomes U+00C5
  • U+212B in NFD becomes U+0041 U+030A
  • The sequence U+1E9B U+0323 in NFKC becomes U+1E69
  • The sequence U+1E9B U+0323 in NFKD becomes U+0073 U+0323 U+0307
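
A minimal sketch of how such a test could look. It assumes the standard library's `unicodedata.normalize` as the reference implementation and uses a UTF-8 encode/decode round trip as a stand-in for "the core Encoding APIs", since the issue does not name specific APIs. The helper names `matching_forms`, `roundtrip`, and `survey` are hypothetical, chosen only for illustration.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Test vectors listed above (from UAX #15): (input, form, expected output).
VECTORS = [
    ("\u212b", "NFC", "\u00c5"),
    ("\u212b", "NFD", "\u0041\u030a"),
    ("\u1e9b\u0323", "NFKC", "\u1e69"),
    ("\u1e9b\u0323", "NFKD", "\u0073\u0323\u0307"),
]


def matching_forms(original, observed):
    """Return the normalization forms that map `original` to `observed`."""
    return [f for f in FORMS if unicodedata.normalize(f, original) == observed]


def roundtrip(text, codec="utf-8"):
    """Pass text through an encode/decode cycle (assumed API under test)."""
    return text.encode(codec).decode(codec)


def survey(codec="utf-8"):
    """For each distinct input, report (unchanged?, forms matching the output)."""
    results = {}
    for text in {src for src, _, _ in VECTORS}:
        observed = roundtrip(text, codec)
        results[text] = (observed == text, matching_forms(text, observed))
    return results


if __name__ == "__main__":
    # Sanity check: the reference implementation agrees with the vectors.
    for src, form, expected in VECTORS:
        assert unicodedata.normalize(form, src) == expected

    for text, (unchanged, forms) in survey().items():
        codepoints = " ".join(f"U+{ord(c):04X}" for c in text)
        print(codepoints, "unchanged" if unchanged else "changed", forms)
```

U+212B is altered by all four forms, so if it comes back unchanged, the API under test applied no normalization. An input can only separate forms that give it different outputs (NFC and NFKC both map U+212B to U+00C5), which is why the second sequence is worth keeping in the set. To cover other APIs or Python versions, swap the body of `roundtrip` and rerun.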
