Skip to content

Version 3.0.0

Compare
Choose a tag to compare
@Ousret Ousret released this 20 Oct 09:00
· 179 commits to master since this release
0ec52ef

3.0.0 (2022-10-20)

Added

  • Extend the capability of explain=True when cp_isolation contains at most two entries (min one), will log in details of the Mess-detector results
  • Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
  • Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
  • normalizer --version now specify if the current version provides extra speedup (meaning mypyc compilation whl)

Changed

  • Build with static metadata (not pyproject.toml yet)
  • Make language detection stricter
  • Optional: Module md.py can be compiled using Mypyc to provide an extra speedup up to 4x faster than v2.1

Fixed

  • CLI with opt --normalize fail when using full path for files
  • TooManyAccentuatedPlugin induce false positive on the mess detection when too few alpha characters have been fed to it
  • Sphinx warnings when generating the documentation

Removed

  • Coherence detector no longer returns 'Simple English' instead returns 'English'
  • Coherence detector no longer returns 'Classical Chinese' instead returns 'Chinese'
  • Breaking: Method first() and best() from CharsetMatch
  • UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflicts with ASCII)
  • Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
  • Breaking: Top-level function normalize
  • Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
  • Support for the backport unicodedata2

This is the last version (3.0.x) to support Python 3.6 We plan to drop it for 3.1.x