Version 3.0.0 (#223)
* 🔖 bump version 3.0.0

* 📝 docs user::support languages update

* 🔧 use a dedicated reqs.txt for the optional build
Ousret authored Oct 20, 2022
1 parent db134f3 commit 0ec52ef
Showing 5 changed files with 37 additions and 5 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-publish.yml
@@ -220,8 +220,8 @@ jobs:
#CIBW_BUILD_FRONTEND: "build"
CIBW_ARCHS_MACOS: x86_64 arm64 universal2
CIBW_ENVIRONMENT: CHARSET_NORMALIZER_USE_MYPYC='1'
+CIBW_BEFORE_BUILD: pip install -r build-requirements.txt
#CIBW_CONFIG_SETTINGS: "--build-option=--no-isolation"
-CIBW_BEFORE_BUILD: pip install -r dev-requirements.txt
CIBW_TEST_REQUIRES: pytest codecov pytest-cov
CIBW_TEST_COMMAND: pytest {package}/tests
CIBW_SKIP: pp*
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,34 @@
All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)

### Added
- Extend the capability of explain=True: when cp_isolation contains at most two entries (minimum one), the Mess-detector results are logged in detail
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specifies whether the current version provides the extra speedup (meaning a mypyc-compiled whl)

### Changed
- Build with static metadata using 'build' frontend
- Make the language detection stricter
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1

### Fixed
- CLI with opt --normalize fails when using a full path for files
- TooManyAccentuatedPlugin induces false positives in the mess detection when too few alpha characters have been fed to it
- Sphinx warnings when generating the documentation

### Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)

### Added
6 changes: 6 additions & 0 deletions build-requirements.txt
@@ -0,0 +1,6 @@
# until we migrate to pyproject.toml,
# this represents the minimum requirements to build (for the optional speedup)
mypy==0.982; python_version >= "3.7"
mypy==0.971; python_version < "3.7"
build==0.8.0
wheel==0.37.1
2 changes: 1 addition & 1 deletion charset_normalizer/version.py
@@ -2,5 +2,5 @@
Expose version
"""

-__version__ = "3.0.0rc1"
+__version__ = "3.0.0"
VERSION = __version__.split(".")
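
For reference, a small sketch of what the `VERSION` expression in this hunk evaluates to after the bump. The `major` variable is a hypothetical addition for illustration only and is not part of `version.py`.

```python
__version__ = "3.0.0"

# str.split returns a list of strings, not integers.
VERSION = __version__.split(".")
print(VERSION)

# Hypothetical usage: convert a component before comparing numerically.
major = int(VERSION[0])

# With the previous pre-release string, the last component keeps its suffix,
# so it cannot be passed to int() directly.
previous = "3.0.0rc1".split(".")
print(previous[-1])
```

Code that compares `VERSION` components numerically has to convert them first, and the conversion fails on pre-release suffixes such as `0rc1`.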
4 changes: 1 addition & 3 deletions docs/user/support.rst
@@ -151,8 +151,6 @@ Bulgarian,
Croatian,
Hindi,
Estonian,
-Simple English,
Thai,
Greek,
-Tamil,
-Classical Chinese.
+Tamil.
