Add aligner #756
base: master
I'm also waiting for the tests for the aligner so we can automatically verify the code.
…ged value name LM to recognizer
…duration as parameter (input/output), set realign case audio file start position as tell(), split lines for a more readable script
…SS, NFIA, NFIT attributes from str to int)
…ithm for obtaining a chunk's start/end indices, deleted the condition for non-existing start/end edges of a chunk, added adjustment values shift_start/end for tuning the left/right edges of a chunk
…stakes inside, also a test_align.py script with 5 tests that use pytest; vosk_align.py was modified so it can be tested from outside
…mber tokens in txt and wav files; forced_aligner.py: wavfile was added as an arg for multipass; multipass.py: now gets wavfile as an arg; also added a case for when the first or last token in the txt file is not found in the transcript or audio, and a case for when start_pos is less than 0, because shift_start can shift it to a negative number
…ords was added as a variable for the left/right words around NFIA or NFIT words; property names were added for non-success cases instead of numbers
… recognizer to process_text, for both forced_aligner.py and multipass.py; mistakes were added to cats.txt, dagon.txt, glorious.txt, polar.txt for testing; fixed bug in polar.wav; deleted unused wendy example; added log files cats, dagon, glorious, polar for tests; fixed mistakes in test_align.py, added asserts; vosk_align.py: added logging for msgs and the ability to call vosk_align.py from test_align.py
I'm rooting for you and your work! Please keep up the great work!
python/vosk/aligner/transcription.py
Outdated
options = {
    'sort_keys': True,
    'indent': 4,
    'separators': (',', ': '),
Also 'ensure_ascii': False,
There is also a lot of trailing whitespace in the code.
(Nice PR)
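A minimal sketch of what the suggested option changes, assuming the `options` dict is passed to `json.dumps` (the `to_json` helper and the sample data are hypothetical, not taken from transcription.py). With `ensure_ascii` set to `False`, non-ASCII words are written as-is instead of as `\uXXXX` escapes:

```python
import json

# Options from the reviewed snippet, plus the suggested 'ensure_ascii' entry.
options = {
    'sort_keys': True,
    'indent': 4,
    'separators': (',', ': '),
    'ensure_ascii': False,
}


def to_json(data):
    # Hypothetical helper: serialize a transcription-like dict with the options above.
    return json.dumps(data, **options)


# Illustrative data only; the real transcription structure may differ.
sample = {"word": "привет", "start": 0.5}
print(to_json(sample))
```

Without the extra option, the same call would emit escaped code points for the Cyrillic word, which makes the output harder to read and diff.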
done
python/vosk/aligner/diff_align.py
Outdated
for op, a, b in word_diff(hypothesis, reference):

        try:
double indented (8 instead of 4 spaces)
done
amount, length = unalign(words)
logging.info("%d unaligned words (of %d)", amount, length)

if amount != 0:
amount is unassigned if logging is None.
Also, amount != 0 is duplicated.
Also, progress_cb could be None.
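One way to address the three points in this review comment, as a sketch only. The `unalign` stand-in, the `needs_realign` name, the `"case"` key and the callback payload are all assumptions for illustration, not the PR's actual code. The counts are computed unconditionally, both optional collaborators are guarded, and `amount != 0` is checked once:

```python
import logging


def unalign(words):
    # Hypothetical stand-in for the PR's unalign(): (unaligned count, total count).
    amount = sum(1 for w in words if w.get("case") != "success")
    return amount, len(words)


def needs_realign(words, log=logging, progress_cb=None):
    # Always compute the counts, so `amount` is assigned even when log is None.
    amount, length = unalign(words)

    if log is not None:
        log.info("%d unaligned words (of %d)", amount, length)

    # The callback is optional, so guard the call.
    if progress_cb is not None:
        progress_cb({"unaligned": amount, "total": length})

    # Single check instead of a duplicated `amount != 0` condition.
    return amount != 0


# Illustrative input; the case labels are made up for this example.
words = [
    {"word": "hello", "case": "success"},
    {"word": "wrld", "case": "not-found-in-audio"},
]
print(needs_realign(words, log=None))
```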
Hello, thanks for your help, I will fix it :)
Hello, thanks for your help, I will fix it :)
Hello, did you have any success fixing it?
Hello, not yet, but I hope to start on it after I finish my current project.
I have returned to the aligner project; I need to rework the code a bit.
Wonderful 👍 So excited, can't wait to try it. @vadimdddd Do you have an email or another way I can contact you to collaborate? I would love to share some thoughts and ideas.
@CodeFusionFX, sure thing - [email protected]
… spaces; transcription.py: added parameter in options; forced_aligner.py: deleted duplicated amount condition
…ed_aligner.py; vosk_align.py: get_result(args) was extracted from main() for testing; test_align.py: passing of args for testing has been changed to get_result(args)
In testing this locally, there needs to be an empty
@ryanfb thx for the info. I will fix it.
I'm really interested in this PR, is there anything I can do to help?
@Laurian just ping me if I forget please, I'll try to merge it
I am also very interested in this. What can I do to help?
@nshmyrev ping 🙏
@nshmyrev
@nshmyrev
Is this pull request dead?
Aligner is a program for aligning the words of a text in time with the words spoken in an audio file. The Gentle project used m3.cc and k3.cc as language and acoustic models for alignment; these approaches were reworked into the aligner, which makes it possible to use different language models and speeds up the alignment process. In addition, setup.py was changed so that the aligner can be run from outside the folder that contains it.
How it works: the script takes three arguments:
a) path to the wavfile; b) path to the textfile; c) path to the language model.
Example (how to run):
python3 vosk_align.py example/glorious.wav example/glorious.txt example/model
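The three positional arguments above could be parsed along these lines. This is only a sketch with `argparse`; the parser and its argument names are hypothetical and are not taken from vosk_align.py:

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the three arguments described above.
    parser = argparse.ArgumentParser(
        description="Align the words of a text with an audio file")
    parser.add_argument("wavfile", help="path to the wavfile")
    parser.add_argument("textfile", help="path to the textfile")
    parser.add_argument("model", help="path to the language model")
    return parser


# Same arguments as in the command-line example.
args = build_parser().parse_args(
    ["example/glorious.wav", "example/glorious.txt", "example/model"])
print(args.wavfile, args.textfile, args.model)
```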