-
Notifications
You must be signed in to change notification settings - Fork 100
32-bit platform support #6
Comments
Do you have any compilation errors?
What kind of issues do you experience when using byte pair encoding? |
Yes, I'm getting a crash on 32bit and not on 64bit on exactly the same example. I'm trying to see exactly where this is coming from. by using this C++ code
|
The crash on 32-bit happens here: https://github.com/VKCOM/YouTokenToMe/blob/master/youtokentome/cpp/bpe.cpp#L999 |
It seems like the problem relates to this one skarupke/flat_hash_map#5. Underlying hash table is not support 32-bit platforms. You can try to replace all |
Would be great if you could find a solution on this, I'm not familiar with that flat_hash_map code you have. It's a pity that 32bit is not working out as otherwise I could have pushed to cran today but now this R package will not be accepted on cran. |
@jwijffels, as @yutkin already said, you can replace all |
thanks for the suggestion @Oktai15
other suggestions? |
FYI. I think I found a solution. I've replaced cpp/third_party/flat_hash_map.h with another hash map from https://github.com/greg7mdp/parallel-hashmap which did work on 32 bit. I've put this for the time being in branch phmap at https://github.com/bnosac/tokenizers.bpe/tree/phmap using commit bnosac/tokenizers.bpe@d900ec6 It would be nice if the authors of YouTokenToMe could also make such a change on their c++ code such that it would work on 32 bit also. |
Only compiler warning still remaining after using parallel-hashmap instead of the hashmap youtokentome provides is:
any idea on how to fix this one? |
@jwijffels We decided that currently, we are not planning to support 32-bit platforms due to their rarity. This library is developed for training deep neural networks that mostly require a 64-bit environment (For example Pytorch, Tensorflow). In our tests |
Setting to low priority for now. |
I understand. Do you have your benchmarks compared to other open-source hash table implementations somewhere in a document which you can share? |
So, as you see even tensorflow doesn't support 32bit platform. We also don't plan to support it. |
I'm checking the R packages which I developed at https://github.com/bnosac/tokenizers.bpe which wraps the c++ code.
This works fine on mingw 64-bit and on Ubuntu but using mingw 32 bit on Windows, I'm getting compilation problems related to cpp/third_party/flat_hash_map.h giving further issues when using the code to do byte pair encoding.
Could you advise on how to solve these? Below the issues (full trace log at https://win-builder.r-project.org/mBBw27b161WH/00install.out).
Trace log
The text was updated successfully, but these errors were encountered: