Normalization: Identify Python's default Normalization form

See: http://websec.github.io/unicode-security-guide/character-transformations/#normalization

Check whether Python's documentation states which normalization form, if any, it applies when handling Unicode. If it does, skip the following tests.

If it is not documented, test each major version to identify:
- normalization behavior: which normalization form, if any, do the core encoding APIs apply by default?

One way to test this is to use a few code points that have known transformations under specific normalization forms. These include (from http://www.unicode.org/reports/tr15/):

- U+212B in _NFC_ becomes U+00C5
- U+212B in _NFD_ becomes U+0041 U+030A
- The sequence U+1E9B U+0323 in _NFKC_ becomes U+1E69
- The sequence U+1E9B U+0323 in _NFKD_ becomes U+0073 U+0323 U+0307
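
The test vectors above could be wired into a small probe along these lines. This is only a sketch: `unicodedata.normalize` is the standard-library function, while `codepoints`, `matching_forms`, and `utf8_roundtrip` are hypothetical helper names chosen for illustration, and the UTF-8 round trip stands in for whichever core encoding API is actually under test.

```python
import unicodedata

# Test vectors from UAX #15, matching the list above.
ANGSTROM = "\u212b"            # ANGSTROM SIGN
LONG_S_DOTS = "\u1e9b\u0323"   # LONG S WITH DOT ABOVE + COMBINING DOT BELOW

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def codepoints(text):
    """Format a string as 'U+XXXX U+XXXX ...' for readable reports."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def matching_forms(original, observed):
    """Hypothetical helper: list the forms whose output equals `observed`.

    Returns an empty list when `observed` equals `original` (nothing was
    normalized) or when it matches none of the four forms.
    """
    if observed == original:
        return []
    return [f for f in FORMS if unicodedata.normalize(f, original) == observed]


def utf8_roundtrip(text):
    """Hypothetical probe: encode to UTF-8 and decode back.

    Swap this out for the API whose behavior is being investigated.
    """
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    for sample in (ANGSTROM, LONG_S_DOTS):
        observed = utf8_roundtrip(sample)
        forms = matching_forms(sample, observed)
        print(codepoints(sample), "->", codepoints(observed), forms or "unchanged")
```

Note that U+212B alone cannot separate NFC from NFKC (both yield U+00C5) or NFD from NFKD, which is why the U+1E9B U+0323 sequence is needed as a second sample. If the probe reports "unchanged" for both samples, the API under test most likely applies no normalization at all.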

