very long generation time #23
This is weird: I used to generate ~27k sentences in a few minutes. That said, I haven't tested performance since v1.4.2, and I made a large refactor since then which might have degraded it. I'll check again on v1.6.0. Would you mind answering a few questions so that I can get a better understanding of what is going on here?
To answer your questions: you can see a rule as a tree, where each piece of content is a node and the branches are the possible choices the generator can make.
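As an illustration (a minimal template of my own, assuming standard Chatette syntax, not the example from the original thread), the intent below is a choice point with two branches, one of which references an alias that is itself a choice point:

```
%[greet]
    ~[hello] there
    good morning

~[hello]
    hello
    hi
```

Read as a tree, `%[greet]` can expand to "hello there", "hi there", or "good morning".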
To generate an example, Chatette traverses such a tree. Therefore, several factors can increase generation time, such as the depth of the tree, the number of branches at each choice point, and how often rules reference other rules.
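The traversal idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Chatette's actual implementation: a node is either a literal string (a leaf) or a list of alternative branches, each branch being a sequence of nodes.

```python
import random

def generate(node):
    """Produce one example by walking the tree, picking a random
    branch at every choice point."""
    if isinstance(node, str):      # leaf: plain text
        return node
    branch = random.choice(node)   # choice point: pick one branch
    return "".join(generate(part) for part in branch)

# ~[hello] -> "hello" | "hi"
hello = [["hello"], ["hi"]]
# %[greet] -> ~[hello] + " there" | "good morning"
greet = [[hello, " there"], ["good morning"]]

print(generate(greet))  # e.g. "hi there" or "good morning"
```

Every nested choice multiplies the number of reachable leaves, which is why deep, highly branched templates get expensive.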
There are a few things I would suggest you try in order to decrease the execution time, aside from the obvious (e.g. closing all resource-intensive applications while running Chatette).
Of course, those suggestions are just workarounds: having Chatette run for an hour to produce "only" 35k examples is not acceptable. Improving Chatette's performance is the real solution here. I will work on it, but it will certainly take time. On a side note, if you are using Chatette for machine learning tasks, be aware that such a high number of examples can easily make your models overfit. I hope this helps, and thank you for your kind message :)
What I'm going to do is refactor my templates as much as I can, without hurting modularity and readability too much (Chatette's winning point). I would gladly pay some generation time in exchange for the nicer syntax, since it makes the templates easier to maintain and update.
Thanks for your replies :) I think I see what the problem is: since v1.6.0, I have been caching intermediate results during generation to improve generation time, but with big templates the cache can consume all available RAM, which makes everything very slow. I assumed this would only become a problem for much larger files, but apparently I was wrong. I'll add a way to disable the caching (or at least tune it down) in the next update. This will likely take a while as I don't have much time at the moment, so I hope you can cope with the current version until then. Thanks again for the heads-up!
I just released a patch (v1.6.1) that tackles the issue temporarily: the problem came from a caching strategy that is far too aggressive. The patch disables caching when more than 50 units (aliases, slots and intents) are declared. Of course, this is just a stopgap; I plan to address the issue in a cleaner and more robust way in future releases. I will keep this issue open until I am satisfied with the caching strategy and, more generally, with the execution time.
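The idea of "tuning caching down" rather than switching it off entirely can be sketched with a size-bounded cache. The snippet below uses Python's `functools.lru_cache` as a stand-in; Chatette's internal cache works differently, and `expand_unit` is a hypothetical placeholder, so this only illustrates the memory-bounding trade-off:

```python
from functools import lru_cache

# Bounding the cache trades some cache hits for a hard memory cap.
# maxsize=128 keeps only the 128 most recently used entries; an
# unbounded cache (maxsize=None) is what can eat all available RAM.
@lru_cache(maxsize=128)
def expand_unit(name: str) -> tuple:
    # Hypothetical stand-in for expanding one unit into its examples.
    return tuple(f"{name}-{i}" for i in range(3))

expand_unit("greet")             # computed
expand_unit("greet")             # served from the cache
print(expand_unit.cache_info())  # hits/misses and the size bound
```

A smaller `maxsize` means more recomputation but a predictable memory ceiling, which matches the trade-off described above.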
Thanks for putting time and effort into such an amazing project. Great work!
I'm currently using Chatette in a project (the generation modifiers are awesome). One problem, though: generation takes too much time!
I'm generating about ~35k sentences, and generation takes about an hour, while a clone of the same project in Chatito takes about 10 minutes.
My question is: what is the complexity of Chatette? Or, put differently, which aspect of the statistics above is the generation time directly proportional to?
(I noticed that Chatette runs on a single core, so I tried splitting the master file into 4 files and used Ray to generate from each master file on a separate worker.
With this I managed to get the generation time down to ~25 minutes, but that is still quite an overhead.)
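The split-and-parallelize workaround described above can also be sketched with the standard library's `multiprocessing` instead of Ray. Here `generate_file` is a hypothetical placeholder for running Chatette on one of the split master files (for instance via its CLI); the file names are illustrative:

```python
from multiprocessing import Pool

def generate_file(master_file: str) -> str:
    # Hypothetical placeholder: in practice this would invoke Chatette
    # on the given master file, e.g. via something like
    #   subprocess.run(["python", "-m", "chatette", master_file, "-o", out_dir])
    return f"generated:{master_file}"

if __name__ == "__main__":
    masters = ["master_1.chatette", "master_2.chatette",
               "master_3.chatette", "master_4.chatette"]
    # One worker per split file, so each one can use its own core.
    with Pool(processes=4) as pool:
        results = pool.map(generate_file, masters)
    print(results)
```

Note that this only parallelizes across independent files; splitting the templates changes which cross-references are possible between them, so the split has to respect the template structure.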
If anyone has a thought or advice, I'd be grateful!