Releases: daulet/tokenizers
v1.23.0
v1.22.2 fix: handle decoding partial UTF-8 characters
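For context on what "partial UTF-8 characters" means here: when token IDs are decoded one chunk at a time, a multi-byte character can be split across two chunks. The sketch below is not the library's fix; it is a minimal, self-contained illustration of one common way to handle that, assuming a hypothetical `streamDecoder` helper that holds back an incomplete trailing sequence until the remaining bytes arrive.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// streamDecoder is a hypothetical helper, not part of daulet/tokenizers.
// It buffers bytes that end in an incomplete UTF-8 sequence.
type streamDecoder struct {
	pending []byte
}

// Write appends chunk and returns the longest prefix made of complete
// runes. A trailing partial rune stays buffered for the next call.
func (d *streamDecoder) Write(chunk []byte) string {
	d.pending = append(d.pending, chunk...)
	n := completePrefixLen(d.pending)
	out := string(d.pending[:n])
	d.pending = append(d.pending[:0], d.pending[n:]...)
	return out
}

// completePrefixLen returns len(b) minus any trailing incomplete rune.
func completePrefixLen(b []byte) int {
	// A rune is at most utf8.UTFMax (4) bytes, so only the tail matters.
	for i := len(b) - 1; i >= 0 && i >= len(b)-utf8.UTFMax; i-- {
		if utf8.RuneStart(b[i]) {
			if utf8.FullRune(b[i:]) {
				return len(b)
			}
			return i
		}
	}
	// No start byte in the tail: the input is invalid, pass it through.
	return len(b)
}

func main() {
	d := &streamDecoder{}
	// "é" is encoded as 0xC3 0xA9; split it across two chunks.
	fmt.Printf("%q\n", d.Write([]byte{'h', 0xC3}))           // "h"
	fmt.Printf("%q\n", d.Write([]byte{0xA9, 'l', 'l', 'o'})) // "éllo"
}
```

Holding back the incomplete tail avoids emitting a replacement character (U+FFFD) for bytes that would have become valid once the next chunk arrived.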
v1.22.1 Link time version check
Full Changelog: v1.22.0...v1.22.1
v1.22.0 Better errors on Tokenizer init
Full Changelog: v1.21.1...v1.22.0
v1.21.1 Tiktoken stability
Full Changelog: v1.21.0...v1.21.1
v1.21.0 Tiktoken support
Full Changelog: v1.20.2...v1.21.0
v1.20.2
What's Changed
- feat: better error message when tokenizers lib mismatch by @daulet in #28
- feat: FromPretrained to load tokenizer directly from HF by @berkayersoyy in #27
New Contributors
- @berkayersoyy made their first contribution in #27
Full Changelog: v0.9.0...v1.20.2
v0.9.0
What's Changed
- feat: add option to retrieve offsets from tokenizer by @riccardopinosio in #21
- Update to huggingface/tokenizers v0.20.0 by @daulet in #23
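To illustrate what token offsets are useful for, here is a minimal sketch that does not use this library's API. It assumes a toy whitespace tokenizer (`splitWithOffsets`) and a hypothetical `Offset` type holding a half-open `[Start, End)` byte range, and shows how each offset maps a token back to its span in the original text.

```go
package main

import (
	"fmt"
	"unicode"
)

// Offset is a hypothetical [Start, End) byte range into the input text.
type Offset struct{ Start, End int }

// splitWithOffsets is a toy whitespace tokenizer used only for
// illustration. It returns each token with its byte range in text.
func splitWithOffsets(text string) ([]string, []Offset) {
	var tokens []string
	var offsets []Offset
	start := -1 // byte index where the current token began, or -1
	for i, r := range text {
		if unicode.IsSpace(r) {
			if start >= 0 {
				tokens = append(tokens, text[start:i])
				offsets = append(offsets, Offset{start, i})
				start = -1
			}
			continue
		}
		if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		tokens = append(tokens, text[start:])
		offsets = append(offsets, Offset{start, len(text)})
	}
	return tokens, offsets
}

func main() {
	text := "brown fox"
	tokens, offsets := splitWithOffsets(text)
	for i, o := range offsets {
		// Slicing the input with an offset recovers the token's source span.
		fmt.Println(tokens[i], o.Start, o.End, text[o.Start:o.End])
	}
}
```

Offsets like these let a caller highlight or extract the exact source span for a token, which is the usual reason to request them alongside the IDs.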
New Contributors
- @riccardopinosio made their first contribution in #21
Full Changelog: v0.8.0...v0.9.0
v0.8.0
Breaking change:
The path to the compiled Rust library must now be specified via -ldflags. I found it most convenient to set the CGO_LDFLAGS environment variable so the flag does not have to be passed on every build. See #18 for more details.
Full Changelog: v0.7.1...v0.8.0
v0.7.1
- Update the core tokenizers library to the latest version, v0.15.2
- Expose an init-time parameter that controls whether special tokens are encoded
Full Changelog: v0.7.0...v0.7.1