This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

Issues: VKCOM/YouTokenToMe

Issues list

Can't pip install on a new env (without Cython)
#94 opened May 19, 2022 by zachmoshe updated Jan 5, 2024
convert from sentencepiece (labels: enhancement, question)
#9 opened Aug 1, 2019 by jwijffels updated Dec 22, 2023
Visual studio c++ 14.0 error while installing.
#112 opened Oct 12, 2023 by manjunath7472 updated Oct 12, 2023
Special tokens accessibility
#111 opened Sep 11, 2023 by xdevfaheem updated Sep 11, 2023
error: Microsoft Visual C++ 14.0 or greater is required.
#89 opened Jul 23, 2021 by fourat-bs updated Jul 26, 2022
Installing error C1083
#91 opened Sep 20, 2021 by YFrite updated Jun 17, 2022
compilation issues (label: help wanted)
#50 opened Nov 20, 2019 by jwijffels updated Oct 8, 2021
How to generate vocab.json and merges.txt for YTTM tokenizer?
#66 opened Mar 8, 2020 by nikhilno1 updated Jul 26, 2021
Decode() got an unexpected keyword argument 'ignore_ids'
#88 opened Jul 8, 2021 by raphkhan updated Jul 8, 2021
Tokenizing large corpus
#80 opened Nov 1, 2020 by quetz updated Jun 29, 2021
No word tokenizer under the hood?
#87 opened May 17, 2021 by slowwavesleep updated May 17, 2021
Error during installation
#86 opened May 17, 2021 by ruruu127 updated May 17, 2021
How does YouTokenToMe's speed compare to subword-nmt?
#85 opened Apr 16, 2021 by gowtham1997 updated Apr 19, 2021
Add an option to predefine special tokens (label: enhancement)
#44 opened Nov 15, 2019 by Kyeongpil updated Mar 10, 2021
Using YouTokenToMe with pre-defined vocab and embeddings
#84 opened Feb 16, 2021 by alexbalandi updated Feb 16, 2021
Is it possible to unset random seed for BPE-dropout? (label: enhancement)
#75 opened Sep 17, 2020 by skurzhanskyi updated Sep 20, 2020
Controlling word tokenization
#73 opened Aug 20, 2020 by MexicanMan updated Aug 20, 2020
How to train with multiple corpus files?
#72 opened Aug 15, 2020 by hccho2 updated Aug 15, 2020
how to get vocab
#69 opened Apr 30, 2020 by wqfengnlpr updated Aug 12, 2020
Process killed?
#71 opened Jul 31, 2020 by miguelvictor updated Aug 1, 2020
"▁" character can be separated when using BPE-dropout
#67 opened Apr 4, 2020 by TIXFeniks updated Jul 10, 2020
Vocabulary contains underscore multiple times?
#68 opened Apr 20, 2020 by RuABraun updated Apr 20, 2020