…Fchunk to CRFChunk
Co-authored-by: bact <128572+bact@users.noreply.github.com>
is_thai_char, is_thai, count_thai; rename CRFchunk → CRFChunk
@copilot Also check the uses of name mangling and private members, and see if all of them are appropriate. The goal is to have a consistent, stable public API and to avoid exposing anything internal or unnecessary (which would make it difficult to change the API in the future).
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Done in commit Name mangling removed:
Internal helpers made private (added `_` prefix):
Cross-module private imports removed:
Title changed from "is_thai_char, is_thai, count_thai; rename CRFchunk → CRFChunk" to "is_thai_char, is_thai, count_thai; rename CRFchunk → CRFChunk; enforce private API boundaries"
@wannaphong Do we like to have the
No, I think I forgot to add it.
Updated changelog for version 5.3.2, including security improvements and deprecated functions.
…ers pass
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Done in commit New
Deprecated wrappers (removed in 6.0):
Note: Deprecation consistency fixes:
All 5 static analysis tools pass: ruff ✓ mypy ✓ pyright ✓ flake8 ✓ pyrefly ✓. 85 tests pass.
Title changed from "is_thai_char, is_thai, count_thai; rename CRFchunk → CRFChunk; enforce private API boundaries" to "is_thai_char/is_thai/count_thai; create pythainlp.chunk with CRFChunkParser; enforce private API boundaries"

The public API had several `snake_case` violations, inconsistent class naming, inappropriate name mangling, internal symbols leaking into the public namespace, and chunk-parsing logic housed in the wrong module.

What do these changes do
- New `is_thai_char()`, `is_thai()`, `count_thai()` as properly named replacements for `isthaichar()`, `isthai()`, `countthai()`
- New `pythainlp.chunk` module (following the NLTK `nltk.chunk` convention) containing:
  - `CRFChunkParser`: canonical CRF-based chunk parser class (moved from `pythainlp.tag.crfchunk`)
  - `chunk_parse()`: convenience function (moved from `pythainlp.tag`)
- Internal helpers made private (added `_` prefix):
  - `extract_features` → `_extract_features` in `tag/crfchunk.py` and `tokenize/crfcut.py`
  - `ThaiNameTagger.__extract_features` → `_extract_features` in `tag/thainer.py` (removes unnecessary name mangling)
  - `THAI_CHAR_NAMES` → `_THAI_CHAR_NAMES` in `util/thai.py`, additionally wrapped in `types.MappingProxyType` to make it read-only at runtime
- Cross-module private imports removed:
  - `tokenize/nlpo3.py` no longer imports `_THAI_WORDS_FILENAME` from `corpus/common`
  - Callers use the public `corpus_path()` function instead of `_CORPUS_PATH`
- `pythainlp.tools.warn_deprecation`: message format made generic (applies to both functions and classes); parameters renamed from `deprecated_func`/`replacing_func` to `deprecated_symbol`/`replacing_symbol`
- `pythainlp.util`

What was wrong
`isthaichar`, `isthai`, and `countthai` violated PEP 8 `snake_case` and NLTK naming conventions (`is_*` for boolean checks). Chunk-parsing logic lived in `pythainlp.tag` even though it is a chunker, not a tagger; the correct home is `pythainlp.chunk`, mirroring NLTK's `nltk.chunk` module.

Additionally, several internal-only symbols lacked a `_` prefix, making them appear public. `ThaiNameTagger.__extract_features` used Python name mangling (double underscore) unnecessarily, making the method harder to discover and debug. `_THAI_CHAR_NAMES` was a plain mutable `dict`, allowing accidental mutation of a constant. Private constants (`_THAI_WORDS_FILENAME`, `_CORPUS_PATH`) were being imported directly by other modules and tests, creating hidden coupling to internal implementation details. `warn_deprecation`'s message said "function" even when deprecating classes.

How this fixes it
New Pythonic names are the canonical implementations. Old names become deprecated wrappers emitting `DeprecationWarning` (scheduled for removal in 6.0):

`pythainlp.chunk` is the new canonical location for chunk parsing:
- `CRFchunk` (lowercase `c`) is kept as a deprecated subclass of `CRFChunkParser` that emits the same pattern of warning on instantiation.
- `CRFChunk` was introduced in this same PR, so it is renamed directly to `CRFChunkParser` without a separate deprecation path.
- `pythainlp.tag.chunk_parse` is kept as a deprecated wrapper around `pythainlp.chunk.chunk_parse`.

`_THAI_CHAR_NAMES` is wrapped in `types.MappingProxyType`, a stdlib read-only proxy with negligible overhead (~2%) and no new dependencies, so accidental mutation raises `TypeError` at runtime.

Internal helpers are renamed with a `_` prefix so they are clearly not part of the public API and can be changed freely in future releases. Cross-module private imports are replaced with either local constants or public accessor functions, so the internal implementation of each module is no longer exposed to its callers.

All five static analysis tools pass with zero errors on changed files: ruff and flake8 (linters) and mypy, pyright, and pyrefly (type checkers).
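The deprecation pattern described above (a renamed function with a warning alias, plus a warning subclass for an old class name) can be sketched with the stdlib `warnings` module. This is a minimal sketch under stated assumptions: `_warn_renamed` is a hypothetical helper that only mirrors the `deprecated_symbol`/`replacing_symbol` idea of `warn_deprecation` (the real helper's full signature and message text may differ), `is_thai_char` is a simplified stand-in that checks the Thai Unicode block (U+0E00 to U+0E7F), and `CRFChunkParser` is an empty placeholder with a hypothetical `corpus` parameter, not the real parser.

```python
import warnings


def _warn_renamed(deprecated_symbol: str, replacing_symbol: str) -> None:
    """Hypothetical helper: one generic message for functions and classes."""
    warnings.warn(
        f"{deprecated_symbol} is deprecated and will be removed in 6.0; "
        f"use {replacing_symbol} instead.",
        DeprecationWarning,
        stacklevel=3,  # attribute the warning to the caller of the old name
    )


def is_thai_char(ch: str) -> bool:
    """Simplified stand-in: True if ch is one character in U+0E00..U+0E7F."""
    return len(ch) == 1 and 0x0E00 <= ord(ch) <= 0x0E7F


def isthaichar(ch: str) -> bool:
    """Deprecated alias: warns, then delegates to the canonical function."""
    _warn_renamed("isthaichar", "is_thai_char")
    return is_thai_char(ch)


class CRFChunkParser:
    """Placeholder for the canonical parser class (no real model here)."""

    def __init__(self, corpus: str = "example") -> None:
        self.corpus = corpus  # hypothetical parameter, illustration only


class CRFchunk(CRFChunkParser):
    """Deprecated subclass: warns on instantiation, otherwise unchanged."""

    def __init__(self, corpus: str = "example") -> None:
        _warn_renamed("CRFchunk", "CRFChunkParser")
        super().__init__(corpus)


# Usage: record the warnings instead of letting the filters decide.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(isthaichar("ก"))  # True
    print(isinstance(CRFchunk(), CRFChunkParser))  # True

for w in caught:
    print(w.category.__name__, str(w.message))
```

Subclassing, rather than a plain alias such as `CRFchunk = CRFChunkParser`, gives the old name a place to emit the warning on instantiation while objects created through it remain `CRFChunkParser` instances.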