Idea for tool: resegment into well-formed sentences #87

Open
bricksdont opened this issue Feb 28, 2023 · 5 comments

Comments

@bricksdont

Hi, this is a great library!

I added one more tool in my fork that does automatic sentence segmentation: bricksdont#1

It redistributes the subtitle segments so that each subtitle contains exactly one well-formed (and complete) sentence. It's not perfect, since a machine learning model is involved.

Here is an example:

# Input

10:01:23,880 --> 10:01:27,640
Regelmässig nimmt er an Veranstaltungen
von FRAGILE Suisse teil,

23
10:01:27,720 --> 10:01:31,840
der Patientenorganisation
für Menschen mit Hirnverletzungen.

# Output

10:01:23,880 --> 10:01:31,840
Regelmässig nimmt er an Veranstaltungen
von FRAGILE Suisse teil, der Patientenorganisation
für Menschen mit Hirnverletzungen.

Would you be interested in a PR for this?
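To make the input/output pair above concrete, here is a minimal sketch of the merging step only. It is not the code from the fork: the fork uses a machine learning model, whereas this stand-in simply joins consecutive subtitles until the text ends in sentence-final punctuation, and it never splits a subtitle that holds several sentences. `Cue` and `merge_into_sentences` are hypothetical names for illustration; `Cue` only mimics the start, end, and content of a subtitle entry.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable, List

# Naive assumption: a sentence is complete when the text ends in one of these.
SENTENCE_END = (".", "!", "?")


@dataclass
class Cue:
    """Hypothetical stand-in for one subtitle entry."""
    start: timedelta
    end: timedelta
    content: str


def merge_into_sentences(cues: Iterable[Cue]) -> List[Cue]:
    """Join consecutive cues until the accumulated text ends a sentence."""
    merged: List[Cue] = []
    pending = None
    for cue in cues:
        if pending is None:
            # Copy so the caller's objects are not modified.
            pending = Cue(cue.start, cue.end, cue.content)
        else:
            # Extend the time span and append the text of the next cue.
            pending.end = cue.end
            pending.content = pending.content + " " + cue.content
        if pending.content.rstrip().endswith(SENTENCE_END):
            merged.append(pending)
            pending = None
    if pending is not None:
        # Trailing text without sentence-final punctuation is kept as-is.
        merged.append(pending)
    return merged


cues = [
    Cue(
        timedelta(hours=10, minutes=1, seconds=23, milliseconds=880),
        timedelta(hours=10, minutes=1, seconds=27, milliseconds=640),
        "Regelmässig nimmt er an Veranstaltungen\nvon FRAGILE Suisse teil,",
    ),
    Cue(
        timedelta(hours=10, minutes=1, seconds=27, milliseconds=720),
        timedelta(hours=10, minutes=1, seconds=31, milliseconds=840),
        "der Patientenorganisation\nfür Menschen mit Hirnverletzungen.",
    ),
]
result = merge_into_sentences(cues)
```

On the two cues from the example, this yields a single cue spanning 10:01:23,880 to 10:01:31,840. A punctuation rule like this breaks on abbreviations and unpunctuated text, which is presumably where a trained model helps.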

@cdown
Owner

cdown commented Feb 28, 2023

This looks wonderful and would make a great addition to the repository as part of srt_tools! My only point of note is that the machine learning part would need to be an optional dep assuming it's heavy, but that's it :)

@bricksdont
Author

What would be your preferred way of making this an optional dependency? Just letting the user run into an import error? (yes it's heavy :-))

@cdown
Owner

cdown commented Feb 28, 2023

I guess wrap the ImportError and provide some nice message, but yes. There's also the question of ongoing maintenance -- are you happy to help keep it up to date with new Python versions, for example?

I suppose this should probably go in srt_tools/contrib since it's not under the same maintenance conditions as the rest of the repository.
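As a rough illustration of "wrap the ImportError and provide some nice message", one possible shape is sketched below. The helper name `require_optional` and the install hint are assumptions made for this example; nothing here is taken from the fork or from srt_tools.

```python
import importlib


def require_optional(module_name: str, feature: str, install_hint: str):
    """Import an optional dependency, or fail with an actionable message.

    All three arguments are supplied by the calling tool; this helper is
    hypothetical and not part of any existing package.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with context so the user knows what to install and why.
        raise ImportError(
            f"{feature} needs the optional dependency '{module_name}'. "
            f"Install it with: {install_hint}"
        ) from exc
```

A tool would call this at startup, for example `model_lib = require_optional("some_ml_package", "Sentence resegmentation", "pip install some_ml_package")` with the real package name filled in, so that the rest of the repository keeps working without the heavy dependency installed.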

@cdown
Owner

cdown commented Feb 28, 2023

Oh, and can I also take a look at the current code before committing to anything :-)

@bricksdont
Author

> Oh, and can I also take a look at the current code before committing to anything :-)

Yes, of course. Any feedback is welcome, and of course you are not obliged to merge this code.

Here is a Colab that shows basic usage: https://colab.research.google.com/drive/1OHBylPv-8s__IU9_lwTW5CLwHQvfB9Rt?usp=sharing
