Added option to set grammar with custom lexicon #1362
base: master
Lovely idea, would love to use it :)
I'll try to fix it in the next few days.
@Shallowmallow you should be able to compile it now. Let me know whether it worked or not...
Indeed, it compiles. Thanks @mmende!
If there were a function to activate and deactivate multiple grammars, this could allow for context- or application-specific grammars. The use case would be multiple programs, each with its own grammar. On the client side, when the foreground window changes, the appropriate grammar would be activated, while the grammars that are not relevant would be disabled but stay loaded in the back end.
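To make the suggestion above concrete, here is a minimal client-side sketch. It is only an illustration: `GrammarRegistry` and all of its members are hypothetical names and not part of Vosk or of this PR, and the grammars are assumed to be plain strings that the client passes on to the recognizer.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical client-side registry (not a Vosk API): every grammar stays
// loaded, but only the one for the foreground application is active.
class GrammarRegistry {
public:
    // Load (or replace) the grammar associated with an application.
    void load(const std::string &app, const std::string &grammar) {
        grammars_[app] = grammar;
    }

    // Call when the foreground window changes. Returns true if a grammar
    // for that application is loaded and is now the active one.
    bool activate(const std::string &app) {
        has_active_ = grammars_.count(app) != 0;
        active_ = has_active_ ? app : std::string();
        return has_active_;
    }

    // Grammar that should currently be handed to the recognizer,
    // or nullptr if no grammar is active.
    const std::string *active_grammar() const {
        if (!has_active_) return nullptr;
        auto it = grammars_.find(active_);
        return it != grammars_.end() ? &it->second : nullptr;
    }

    // Number of grammars kept loaded, active or not.
    std::size_t loaded_count() const { return grammars_.size(); }

private:
    std::map<std::string, std::string> grammars_;
    std::string active_;
    bool has_active_ = false;
};
```

A client would call `activate` with the name of the new foreground application and pass the returned grammar, if any, to the recognizer. Whether the back end could keep several compiled grammars resident at once is a separate question that this sketch does not address.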
This PR adds a new API method `vosk_recognizer_set_grm_with_lexicon`, which allows providing a custom pronunciation lexicon in addition to a grammar. The recognizer uses this lexicon to recreate the HCLr transducer at runtime, which allows recognizing words that were not in the lexicon before.

To be able to recreate the HCLr transducer, the model must be a lookahead model and include the context dependency (`tree`) file and the phone symbol table (`phones.txt`). In some rough, unscientific tests with vosk-model-small-de-0.15, the HCLr recreation took ~15 ms for 10 words, ~70 ms for 100 words, ~430 ms for 500 words, and about 1500 ms for 1000 words.

There are some hardcoded variables, such as the silence phone label (`SIL`), the silence probability, the self-loop scale and the transition scale, and grammar FSTs are not yet supported. Furthermore, the method requires that the epsilon entry (`<eps>`) is also in the given lexicon and that phones with positional information are used correctly (if the model uses such).

PS: Sorry for the whole reformatting stuff (that must have been the Clang-Format extension).
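The two lexicon requirements above (an `<eps>` entry, and phones that match the model's phone symbol table, including any positional suffixes) can be checked on the client before calling the new method. The following is a rough sketch only: `LexiconCheck` and `check_lexicon` are hypothetical helpers, and the lexicon format of one `word phone1 phone2 ...` entry per line is an assumption in the style of a Kaldi `lexicon.txt`, not something the PR confirms.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Result of checking a lexicon against the requirements stated in the PR.
struct LexiconCheck {
    bool has_eps = false;                     // an "<eps>" entry was found
    std::vector<std::string> unknown_phones;  // phones missing from the table
};

// Hypothetical helper (not part of Vosk). Assumed format, one entry per
// line: "word phone1 phone2 ...". phone_table holds the symbols from the
// model's phones.txt, e.g. positional variants if the model uses them.
LexiconCheck check_lexicon(const std::string &lexicon,
                           const std::set<std::string> &phone_table) {
    LexiconCheck result;
    std::set<std::string> seen_unknown;  // report each unknown phone once
    std::istringstream lines(lexicon);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string word;
        if (!(fields >> word)) continue;  // skip blank lines
        if (word == "<eps>") result.has_eps = true;
        std::string phone;
        while (fields >> phone) {
            if (phone_table.count(phone) == 0 &&
                seen_unknown.insert(phone).second) {
                result.unknown_phones.push_back(phone);
            }
        }
    }
    return result;
}
```

If `has_eps` is false or `unknown_phones` is non-empty, the lexicon would likely be rejected or produce an unusable transducer, so a client could report the problem before triggering the HCLr recreation.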