Releases: daulet/tokenizers

v1.23.0

12 Sep 19:31
0539f01

What's Changed

  • Update HuggingFace tokenizers dependency to v0.22.0 by @Nav31 in #44
  • catch and return panics in encoders by @steved in #46
  • bump tiktoken-rs/fancy-regex to 0.16 by @steved in #45
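The change in #46 is not reproduced here. As a generic illustration of the idea named in its title (turning a panic into an ordinary error value instead of crashing the caller), a minimal Go sketch might look like the following. The names `safeEncode` and `fakeEncode` are hypothetical and are not part of this library's API.

```go
package main

import "fmt"

// safeEncode calls encode and converts any panic it raises into a returned error.
// Hypothetical helper for illustration only; not the library's implementation.
func safeEncode(encode func(string) []uint32, text string) (ids []uint32, err error) {
	defer func() {
		if r := recover(); r != nil {
			ids = nil
			err = fmt.Errorf("encode panicked: %v", r)
		}
	}()
	return encode(text), nil
}

// fakeEncode is a stand-in encoder that panics on empty input.
func fakeEncode(text string) []uint32 {
	if text == "" {
		panic("empty input")
	}
	ids := make([]uint32, 0, len(text))
	for _, b := range []byte(text) {
		ids = append(ids, uint32(b))
	}
	return ids
}

func main() {
	ids, err := safeEncode(fakeEncode, "hi")
	fmt.Println(ids, err)

	ids, err = safeEncode(fakeEncode, "")
	fmt.Println(ids, err)
}
```

With this shape the caller handles a failed encode through the usual `if err != nil` path rather than having the process abort.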

Full Changelog: v1.22.2...v1.23.0

v1.22.2 fix: handle decoding partial UTF-8 characters

25 Jul 05:21
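The release title refers to decoding partial UTF-8 characters, which typically arises when bytes arrive in chunks and a multi-byte character is split across two chunks. The library's actual fix is not shown here. As a rough sketch of one common approach using only the Go standard library, the incomplete tail can be held back until the remaining bytes arrive. The names `splitComplete` and `chunkDecoder` are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// splitComplete returns the prefix of buf that ends on a rune boundary and
// any trailing bytes that start a multi-byte rune but do not finish it.
func splitComplete(buf []byte) (complete, pending []byte) {
	// A UTF-8 sequence is at most utf8.UTFMax bytes, so only the tail is inspected.
	for i := 1; i <= utf8.UTFMax && i <= len(buf); i++ {
		start := len(buf) - i
		if utf8.RuneStart(buf[start]) {
			if !utf8.FullRune(buf[start:]) {
				return buf[:start], buf[start:]
			}
			break
		}
	}
	return buf, nil
}

// chunkDecoder buffers an incomplete trailing rune between writes.
type chunkDecoder struct{ pending []byte }

// Write appends chunk to any held-back bytes and returns the text that is
// complete so far; an unfinished trailing rune is kept for the next call.
func (d *chunkDecoder) Write(chunk []byte) string {
	buf := append(d.pending, chunk...)
	complete, rest := splitComplete(buf)
	d.pending = append([]byte(nil), rest...)
	return string(complete)
}

func main() {
	b := []byte("héllo €")
	d := &chunkDecoder{}
	// The chunk boundaries deliberately cut through "é" and "€".
	for _, chunk := range [][]byte{b[:2], b[2:8], b[8:]} {
		fmt.Printf("%q\n", d.Write(chunk))
	}
}
```

Run as written, the three writes emit "h", "éllo " and "€", so no replacement characters appear even though two characters were split mid-sequence.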

v1.22.1 Link time version check

18 Jul 06:26

v1.22.0 Better errors on Tokenizer init

18 Jul 05:44

v1.21.1 Tiktoken stability

17 Jul 18:25

v1.21.0 Tiktoken support

15 Jul 23:08

v1.20.2

07 Nov 21:15

What's Changed

  • feat: better error message when tokenizers lib mismatch by @daulet in #28
  • feat: FromPretrained to load tokenizer directly from HF by @berkayersoyy in #27

Full Changelog: v0.9.0...v1.20.2

v0.9.0

09 Aug 23:20

Full Changelog: v0.8.0...v0.9.0

v0.8.0

12 Jun 01:43
d503b5b

Breaking change:

The path to the compiled Rust library must now be specified via -ldflags. I found it most convenient to set the CGO_LDFLAGS environment variable, which avoids passing the flag on every build. See #18 for more details.

What's Changed

  • Update to allow for platform dependent libs in CGO by @jmoney in #18

Full Changelog: v0.7.1...v0.8.0

v0.7.1

10 Apr 23:30

  • Update the core tokenizers library to the latest version, v0.15.2
  • Expose an init-time parameter that controls whether special tokens are encoded

Full Changelog: v0.7.0...v0.7.1