Pull requests: karpathy/minbpe
add lexicographic ordering for breaking ties to make the tokenizer deterministic
#90 opened Sep 21, 2024 by dapopov-st
Optimal algorithm for _encode_chunk(): 20% faster encoding, with 0.5% better COMPRESSION
#84 opened Jun 17, 2024 by Majdoddin
Deduplication of text chunks with frequency count, training and encoding 5x speedup
#82 opened Jun 8, 2024 by Majdoddin
calling len(ids) in merge() function only once to increase performance
#76 opened May 14, 2024 by crpatil1901
Updated decode() method in GPT4Tokenizer so that it handles special t…
#63 opened Apr 7, 2024 by Vakarva
Update lecture.md based on video tutorial content from 08:15 through 28:23
#42 opened Feb 23, 2024 by astaff
Use pyproject.toml, pdm and ruff for improved reproducibility and cleaner code
#40 opened Feb 22, 2024 by nizhib