Best-Fit Mappings: Test core Python string encoding APIs #2

Open
@cweb

Description

See: http://websec.github.io/unicode-security-guide/character-transformations/#best-fit

Identify Python core string encoding APIs, and test major Python versions to document:

  • best-fit mapping behavior - does the API best-fit characters by default?
  • override options - can default be overridden?
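For the override question, one starting point is the `errors` argument that `str.encode` accepts. The sketch below is illustrative only: `encode_with_handlers` is a hypothetical helper name, not an existing API. It tries the standard error handlers on a single character so the results can be compared per encoding and per Python version.

```python
# Hypothetical helper: try each standard error handler on one character.
STANDARD_HANDLERS = (
    "strict",
    "ignore",
    "replace",
    "xmlcharrefreplace",
    "backslashreplace",
)


def encode_with_handlers(ch, encoding, handlers=STANDARD_HANDLERS):
    """Return {handler_name: encoded bytes, or the exception that was raised}."""
    results = {}
    for handler in handlers:
        try:
            results[handler] = ch.encode(encoding, errors=handler)
        except UnicodeEncodeError as exc:
            # Keep the exception so a failure is visible in the report.
            results[handler] = exc
    return results


# U+2019 RIGHT SINGLE QUOTATION MARK is a common best-fit candidate for "'".
report = encode_with_handlers("\u2019", "ascii")
```

With the `ascii` codec, `strict` raises `UnicodeEncodeError` and `replace` yields `b"?"`; none of these handlers produces an apostrophe. Whether every other codec behaves the same way is what this issue aims to document.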

One way to test this might be to brute-force a large set of Unicode characters: convert each one to a target encoding and check whether the result is a 7-bit ASCII character (0x00 to 0x7F).

// Loop through all available encodings
for each available encoding {
  // Loop through the rest of the Basic Multilingual Plane, starting at
  // 0x80 to avoid using 7-bit ASCII as the source, because we want to
  // test if ASCII is the outcome!
  for each Unicode character 0x0080 to 0xFFFF {
    convert the Unicode character from UTF-8 or UTF-16 to the target encoding (e.g. shift_jis, ISO-8859-1, etc.)
    test if the target character is ASCII (0x00 to 0x7F) after the conversion
  }
}
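The loop above could be sketched in Python 3 roughly as follows. This is a minimal sketch under stated assumptions, not a finished test harness: the function names are hypothetical, and using `encodings.aliases.aliases` as the inventory of available codecs is an assumption (it may miss codecs or list ones unavailable on the current platform).

```python
import encodings.aliases

ASCII_MAX = 0x7F  # highest 7-bit ASCII value


def candidate_encodings():
    """Codec names from the stdlib alias table (assumed inventory)."""
    return sorted(set(encodings.aliases.aliases.values()))


def find_ascii_results(encoding, start=0x80, stop=0xFFFF):
    """Map code point -> encoded bytes for non-ASCII characters whose
    encoded form consists only of bytes in the ASCII range."""
    hits = {}
    for cp in range(start, stop + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not encodable characters
        try:
            encoded = chr(cp).encode(encoding)  # errors="strict" by default
        except (UnicodeError, LookupError):
            continue  # unencodable here, or not a usable text codec
        if encoded and all(b <= ASCII_MAX for b in encoded):
            hits[cp] = encoded
    return hits


def scan_all(start=0x80, stop=0xFFFF):
    """Run the scan for every codec that accepts str.encode()."""
    report = {}
    for name in candidate_encodings():
        try:
            "".encode(name)  # rejects binary transforms and missing codecs
        except LookupError:
            continue
        report[name] = find_ascii_results(name, start, stop)
    return report
```

Hits need manual review. Encodings that are 7-bit by design, such as `utf_7` or `iso2022_jp`, produce ASCII-range bytes for every character they can encode, and EBCDIC code pages use bytes below 0x80 for non-ASCII characters, so an ASCII-range byte does not by itself prove a best-fit mapping.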
