just thought #30
Hello, don't worry, no suggestion is ridiculous.

The most limiting factor for integrating larger models is the amount of RAM on phones. Let's start from the assumption that the maximum amount of RAM usable by an application is usually half of the phone's RAM (the rest is consumed by the operating system and other apps).

In simple terms, an AI model must be loaded entirely into RAM to be executed. To estimate its minimum memory consumption in bytes (models usually consume more), multiply the number of its parameters by 1 (normally by 4, but my models are quantized, so each parameter weighs 1 byte instead of 4). Whisper Large, for example, with its 1.5B parameters, would consume 1.5GB (so it would also be usable, but let's continue).

In the case of my app I have to keep both Whisper and the translation model (in this case NLLB) in RAM. For Whisper, the quality improvement from the small model onwards gets gradually smaller; in fact, based on the data and my tests, the quality of Whisper Small is already very good. The side that most needs improvement is the translation: here, unlike with Whisper, other translation models with more parameters have significantly higher quality than NLLB. Precisely for this reason, before the release of the app I tried Madlad, a 3B-parameter translator (4GB of RAM used, because to maintain quality I had to leave some parameters at 4 bytes). Together with Whisper Small, the total RAM consumption was about 5GB (even Whisper Small consumes more than expected), and even on my phone with 12GB of RAM, being so close to the limit (6GB for a 12GB phone), the app sometimes crashed randomly. So I would say that, at least for now, only those who have a phone with 16GB of RAM could enjoy a better experience than the current one (even if slower), and they are too few to justify the time needed to add other models.
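The arithmetic above (minimum RAM ≈ parameter count × bytes per parameter) can be sketched in a few lines. This is my own illustrative calculation, not code from the app; the figures come from the comment, and real usage is higher because of activations, the KV cache, and runtime overhead:

```python
# Rough lower-bound RAM estimate for a model's weights alone.
# Assumption: 1 GB = 1e9 bytes, so 1B parameters at 1 byte each ~= 1 GB.

def model_ram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Minimum GB of RAM needed just to hold the weights."""
    return params_billions * bytes_per_param

# Figures from the comment above:
whisper_large_int8 = model_ram_gb(1.5, 1)     # 8-bit quantized -> 1.5 GB
whisper_small_fp32 = model_ram_gb(0.244, 4)   # unquantized Small -> ~0.98 GB
madlad_all_int8 = model_ram_gb(3, 1)          # 3B at 1 byte would be 3 GB...
# ...but keeping some weights at 4 bytes pushed actual usage to ~4 GB.

print(f"Whisper Large (int8): ~{whisper_large_int8:.1f} GB")
print(f"Madlad (all int8):    ~{madlad_all_int8:.1f} GB")
```

Against the rule of thumb that an app can use about half the phone's RAM, this makes it easy to see why Whisper Small + Madlad (~5 GB in practice) sat right at the 6 GB budget of a 12 GB phone.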
That said, when OnnxRuntime supports 0.5-byte (4-bit) quantization I will probably be able to include Madlad among the options (and before that I could also add Whisper Base). I have already gone on too long 🙃, so on execution speed I'll just say that I can only use the CPU, because to use the GPU I would have to use Android APIs (NNAPI) that are supported by only a few chip models 😡 (my Snapdragon 8+ Gen 1 is not supported, for example).
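For reference, here is the same weight-size arithmetic applied to the hypothetical 4-bit case mentioned above. This is my own back-of-the-envelope calculation, not a measurement:

```python
# Weight sizes at different quantization levels (bits per parameter).
# Assumption: weights only; activations and runtime overhead come on top.

def weights_gb(params_billions: float, bits_per_param: float) -> float:
    """GB of weights for a model at a given quantization level."""
    return params_billions * bits_per_param / 8  # bits -> bytes per parameter

madlad_4bit = weights_gb(3, 4)         # 3B params at 4-bit -> 1.5 GB of weights
whisper_small_8bit = weights_gb(0.244, 8)  # ~0.24 GB

print(f"Madlad at 4-bit:        ~{madlad_4bit:.2f} GB")
print(f"Whisper Small at 8-bit: ~{whisper_small_8bit:.2f} GB")
```

At 4-bit, Madlad's weights alone would drop from ~3 GB to ~1.5 GB, which is why that quantization level would make it viable alongside Whisper on phones with less RAM.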
I didn't understand these questions, what do you mean?
Hello, I think the accuracy rate is due to Whisper. (Walkie Talkie Mode) Can it be adapted for a single language? Enjoy your work!
Thanks for explaining in such nice detail @niedev, I'm not into AI but it's fun to know!
You can already do this by setting the same language for both languages in WalkieTalkie mode; I adapted WalkieTalkie mode to become practically a transcriber in that case.
Turkish seems to have problems with translation too. In this particular case the language identification probably failed and the app translated English text into English, thinking it was Turkish. Solving this is complicated, because the method I have found to improve language recognition hurts performance quite a bit, so I can only implement this technique once I have optimized Whisper's speed even more (or maybe I will add an option to manually specify the spoken language in WalkieTalkie mode).
Great project, thank you! I hope https://github.com/ggerganov/whisper.cpp can be useful for you.
Thank you @data-man! |
Oh, I forgot about https://github.com/rhasspy/piper. :) |
Oh, I didn't know about these models, I'll take a look at them, thanks!
Hello, good work,
@yenerismail Thank you! I'll take a look at these projects. |
Hello,
As an end user (my suggestions may be ridiculous because I have no software knowledge):
"The model used is Whisper-Small-244M with KV cache."
Can Whisper-Large-V3 be used?
Can the user make a choice? (such as tiny, base, small, medium, large)
CPUs and GPUs in mobile phones are advancing rapidly.
For example, my phone is Qualcomm Snapdragon 8 Gen 2 and Adreno(TM) 740
Can corrections be made during the conversation, to prevent the wrong word from being understood and translated? (Walkie Talkie Mode)
(Walkie Talkie Mode) Can it be adapted for a single language?
Is it possible to input voice in Conversation Mode? (Without the keyboard feature)