Using BlenderBot for other languages #3633
-
I've been working with an MS student who's been doing French. It works okay just fine-tuning on French data. Korean probably wouldn't work that well, though. You'd need to train a new model from scratch on that or similar. mBART might be more up your alley for this.
-
Hi @stephenroller :) I want to ask about Turkish :) I want to fine-tune on Turkish data. Could you advise which model is more suitable for this purpose? Thanks for your time.
-
My student and I had success with French with the BlenderBot 90M model, but that was chosen somewhat for resource-constraint reasons. I would suggest giving BlenderBot400 or larger a shot, or otherwise try
-
Okay, thanks a lot for your response. I got it, and I'm gonna try Turkish by adding a Turkish tokenizer :) @stephenroller
-
@stephenroller I want to use a Hugging Face tokenizer that has usage like: Lines 391 to 402 in d191b80
-
Hi @stephenroller, can you suggest the steps needed to train a BlenderBot on other languages such as
-
Ref: #2830 (comment)
-
Hi again @stephenroller, when I tried to fine-tune the BART model on Turkish Wikipedia data, it gave an error like
-
That usually happens when you have an out-of-bounds error on one of your embedding inputs. It could be either the position embeddings (your truncate arguments are too long) or the token embeddings (something is wrong with the dict).
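To make the two cases concrete, here is a minimal sketch in plain Python of a pre-flight check on one tokenized example. The function name `find_out_of_bounds` and the parameters `vocab_size` and `n_positions` are hypothetical and not part of ParlAI; they stand for the number of rows in the token embedding table and in the position embedding table, and the numbers in the example are arbitrary.

```python
from typing import Dict, List


def find_out_of_bounds(
    token_ids: List[int], vocab_size: int, n_positions: int
) -> Dict[str, List[int]]:
    """Report indices that would fall outside either embedding table.

    vocab_size and n_positions are assumed table sizes (hypothetical names,
    not ParlAI options).
    """
    # A valid token id satisfies 0 <= id < vocab_size. Anything else means
    # the dictionary and the model's embedding table disagree.
    bad_tokens = [t for t in token_ids if t < 0 or t >= vocab_size]
    # The i-th token looks up position i, so every position at or beyond
    # n_positions means the sequence was not truncated enough.
    bad_positions = list(range(n_positions, len(token_ids)))
    return {"bad_token_ids": bad_tokens, "bad_positions": bad_positions}


if __name__ == "__main__":
    # Arbitrary illustrative values: one id too large, one token too many.
    print(find_out_of_bounds([5, 17, 120, 3], vocab_size=100, n_positions=3))
```

Running a check like this over the dataset before training, on CPU, tends to give a clearer message than the error raised from inside the embedding lookup.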
-
Okay, I'm gonna check it again.
-
Hello, I intend to train a model in Portuguese (Brazil), but I'm facing a few difficulties. I suspect this is due to my dataset². It has around 5000 lines - pairs of
¹ What I run to train the model:
The simple.dict is generated by running:
² This is a sample from my dataset,
This is what I'm asking for now:
Thanks in advance!
-
Hello! Has anyone here had any success with training (either pre-training from scratch or fine-tuning) BlenderBot for any other language? I am currently working on the same task and would be happy to discuss approaches and issues with someone :)
-
Hi @stephenroller, my team and I are trying to fine-tune BlenderBot 90M for Italian using a translated version of the BST+convai2+wow datasets (the same ones originally used for BlenderBot), to check how well the model can be adapted to Italian. Do you think we can get good results just by fine-tuning the model on the new dataset (with the Recipes command), or is it necessary to change the tokenizer and use an Italian one? We tried fine-tuning the 90M model on translated BST only, without changing the tokenizer, but we didn't get good results, and we don't know whether that was due to the tokenizer or to using only BST instead of BST+convai2+wow.
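One rough way to separate the two explanations is to check how much of the Italian text the existing dictionary can represent at all. The sketch below is plain Python with hypothetical names (`char_coverage`, `vocab_tokens`); it does not use ParlAI or read its dictionary file format, and it assumes the dictionary's tokens have already been loaded into a list of strings.

```python
from typing import Iterable


def char_coverage(vocab_tokens: Iterable[str], text: str) -> float:
    """Fraction of non-whitespace characters in text found in some vocab token."""
    # Collect every character that appears anywhere in the vocabulary.
    known_chars = set()
    for token in vocab_tokens:
        known_chars.update(token)
    # Ignore whitespace, which tokenizers usually handle separately.
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0
    covered = sum(1 for c in chars if c in known_chars)
    return covered / len(chars)


if __name__ == "__main__":
    # Toy vocabulary without accented letters (illustrative only).
    toy_vocab = ["c", "i", "t", "a", "hello"]
    print(char_coverage(toy_vocab, "città"))
```

A coverage well below 1.0 would suggest that many characters cannot be encoded by the current dictionary, which points at the tokenizer rather than at the amount of data. A high value does not prove the tokenizer is adequate, since words may still be split into many small pieces.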
-
Is there any work going on to use BlenderBot for dialogue generation in other languages like Korean and Japanese?
I am trying to apply the BlenderBot strategy to train a model on Korean. Can I retrain BlenderBot from scratch?
Are there any implementation ideas for BlenderBot in other languages?
Please help me out.