seanghay/khmersegment

Khmer Segment

A Khmer word segmentation tool built for NIPTICT (now CADT) Khmer Word Segmentation CRF model.

Important

The km-5tag-seg-model model file is required for this library to work. The package does not include it, so you need to obtain the model file separately.
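Because the model file is not shipped with the package, it can help to check that it exists before constructing the segmenter. The following is a minimal sketch, not part of khmersegment: `build_model_args` is a hypothetical helper, and it assumes the `-m <path>` argument string shown in the usage example is how the model location is passed in.

```python
from pathlib import Path


def build_model_args(model_path: str) -> str:
    """Return a "-m <path>" argument string, or raise if the model file is missing.

    Hypothetical helper for illustration; it is not provided by khmersegment.
    """
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file not found: {path}. "
            "khmersegment does not bundle km-5tag-seg-model."
        )
    # Assumption: the segmenter accepts the same "-m <path>" form as the README example.
    return f"-m {path}"
```

The returned string could then be passed as `Segmenter(build_model_args("km-5tag-seg-model"))`. Paths containing spaces are not handled by this sketch.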

Usage

pip install khmersegment

from khmersegment import Segmenter

segmenter = Segmenter("-m km-5tag-seg-model")

print(segmenter("Hello មិនដឹងប្រាប់អ្នកណាទេ?", deep=False))
# => ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នកណា', 'ទេ', '?']

print(segmenter("Hello មិនដឹងប្រាប់អ្នកណាទេ?", deep=True))
# => ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នក', 'ណា', 'ទេ', '?']
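A common next step after segmentation is to put the tokens back into a single string with an invisible break between words. The sketch below is one way to do that with a zero-width space (U+200B). `join_tokens` is a hypothetical helper, not part of khmersegment, and it only assumes the segmenter returns a list of strings as in the output above.

```python
ZWSP = "\u200b"  # zero-width space, often used as a word break in Khmer text


def join_tokens(tokens, sep=ZWSP):
    """Join segmented tokens, inserting sep only between two non-whitespace tokens.

    Hypothetical helper for illustration; it is not provided by khmersegment.
    """
    out = []
    for i, tok in enumerate(tokens):
        # Skip the separator next to whitespace tokens, which already break the line.
        if i > 0 and not tok.isspace() and not tokens[i - 1].isspace():
            out.append(sep)
        out.append(tok)
    return "".join(out)


# Token list copied from the deep=False output above.
tokens = ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នកណា', 'ទេ', '?']
spaced = join_tokens(tokens)
```

Note that this sketch also inserts a separator before punctuation such as `?`. Filter those tokens first if that is not wanted.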

License

Apache-2.0
