
Automatic detection of encoding #67

Open
alex-left opened this issue Apr 9, 2021 · 1 comment

Comments

@alex-left

Although the current code accepts the subtitle's encoding via an argument, I think it would be useful if the program could detect it automatically. I think this could be done easily using the chardet library. Doing so would also require an external dependency, so we would need to include a requirements file (or add it to setup.py).

I could try to find some free time to do it, but before opening a PR I would like to discuss some specific implementation details. For example, I was thinking of a function that reads the raw input with chardet to detect the encoding and returns it, with a fallback to UTF-8, implemented around line #155 in utils.py.
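A minimal sketch of the kind of helper described above, assuming chardet is treated as an optional import. The name `detect_encoding` and its `fallback` parameter are hypothetical and not part of srt or srt_tools; `chardet.detect()` is chardet's real entry point and returns a dict with `encoding`, `confidence` and `language` keys.

```python
import codecs

try:
    import chardet  # optional third-party dependency; may not be installed
except ImportError:
    chardet = None


def detect_encoding(raw, fallback="utf-8"):
    """Guess the encoding of raw subtitle bytes (hypothetical helper).

    Returns `fallback` when chardet is unavailable, the input is empty,
    or chardet cannot produce a codec name that Python recognises.
    """
    if chardet is None or not raw:
        return fallback
    # chardet.detect() may report None as the encoding if it cannot guess.
    encoding = chardet.detect(raw).get("encoding")
    if not encoding:
        return fallback
    try:
        # Normalise the name and make sure Python actually has this codec.
        return codecs.lookup(encoding).name
    except LookupError:
        return fallback
```

A caller would then decode with something like `raw.decode(detect_encoding(raw))`. Whether a confidence threshold should also gate the guess is left open here.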

What do you think?

@cdown
Owner

cdown commented Apr 12, 2021

Yeah, I've thought about this a few times over the past few years, but this would mean srt (well, srt_tools) going from having no dependencies at all to having one, which irks me a little for a feature that most people will never use.

If it can be implemented in a way which is -- tastefully -- an optional dependency, doesn't require reading/reopening the file twice (so probably just reinterprets a bytestream on demand), and is documented well, I'm not against it. It must not require any changes for current srt/srt_tools users, and they must not receive chardet without taking some explicit action.
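One way the constraints above might be met is sketched below, under the assumption that detection is opt-in via a hypothetical `encoding="auto"` value: the bytes are read once, chardet is imported only when "auto" is requested, and any other encoding value behaves exactly as before. The names `decode_subtitle_bytes` and `read_subtitles` are illustrative only, not existing srt/srt_tools APIs.

```python
import io


def decode_subtitle_bytes(raw, encoding="utf-8"):
    """Decode bytes that have already been read once (hypothetical helper).

    encoding="auto" opts in to detection; any other value is used as-is,
    so existing callers see no change and chardet is never imported.
    """
    if encoding != "auto":
        return raw.decode(encoding)
    try:
        import chardet  # imported only when the user explicitly asks for "auto"
    except ImportError:
        raise RuntimeError(
            "encoding='auto' requires the optional 'chardet' package"
        ) from None
    guessed = chardet.detect(raw).get("encoding") or "utf-8"
    try:
        return raw.decode(guessed)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: retry as UTF-8 (may still raise).
        return raw.decode("utf-8")


def read_subtitles(binary_stream, encoding="utf-8"):
    # Read the underlying bytes exactly once; no seeking or reopening needed.
    raw = binary_stream.read()
    return decode_subtitle_bytes(raw, encoding)


if __name__ == "__main__":
    # In srt_tools this stream might be sys.stdin.buffer or a file opened in "rb".
    print(read_subtitles(io.BytesIO("héllo".encode("utf-8"))))
```

Because the "auto" path is the only place chardet is imported, users who never pass it would not need chardet installed; how it is then offered (for example as an optional extra in setup.py) is a packaging decision this sketch does not settle.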

The code you've highlighted is roughly the right place, but is probably too early, since the file isn't open yet. This would probably require quite a decent rework of how the encoding logic works.

cdown changed the title from "[proposal] automatic detection of encoding" to "Automatic detection of encoding" on Jan 24, 2022