Skip to content

A Thai word segmentation library, built on PyTorch

License

Notifications You must be signed in to change notification settings

phiradet/torchtext-th

Repository files navigation

TorchText TH

A (beta version) Thai word segmentation library built on PyTorch

Usage

>>> from typing import List
>>> from torchtext_th.tokenizer import get_tokenizer
>>> tokenizer = get_tokenizer("artifact/emb150_bilstm512_1layer.pt")
>>>
>>> input_text: str = "ลองทดสอบโปรแกรมตัดคำด้วย PyTorch ง่ายๆ จร้าาาา ถถถ"
>>> output: List[str] = tokenizer.tokenize(input_text)
>>> print("|".join(output))
ลอง|ทดสอบ|โปรแกรม|ตัด|คำ|ด้วย| |PyTorch| |ง่าย|| |จร้าาาา| |ถถถ

About

A Thai word segmentation library, built on PyTorch

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages