Automatically determine sample level based on text length for short text #28

Open
faassen opened this issue Jan 10, 2025 · 1 comment

Collaborator

faassen commented Jan 10, 2025

The following constraint exists on sampling level:

2^L < N

where L is the sampling level, and N is the size of the text. At least that's what the error says.

This means that a sampling level of 2 requires N > 4 (since 2^2 = 4), and a sampling level of 1 requires N > 2.

Strangely enough, by the same rule level 0 should require N > 1, but in #23 I reported that I could make a text of size 1 work with level 0. I don't understand why I didn't get this error.

It strikes me that it would be possible to automatically reduce the level if the text is too short to sample effectively anyway. That way we don't bother people with this error. We could mention in the documentation that this happens. It would make the library more ergonomic to use.
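A rough sketch of what that clamping could look like, assuming the constraint really is exactly 2^L < N as the error message states. The names `max_sampling_level` and `effective_sampling_level` are hypothetical and are not part of the library's API; this only illustrates the arithmetic.

```python
from typing import Optional


def max_sampling_level(text_len: int) -> Optional[int]:
    """Return the largest L with 2**L < text_len, or None if there is none."""
    if text_len < 2:
        # Even level 0 needs 2**0 = 1 < N, so N must be at least 2.
        return None
    # 2**L < N is the same as 2**L <= N - 1, and the largest such L
    # is one less than the bit length of N - 1.
    return (text_len - 1).bit_length() - 1


def effective_sampling_level(requested: int, text_len: int) -> int:
    """Clamp the requested level to what the text length allows (hypothetical helper)."""
    if requested < 0:
        raise ValueError("sampling level must be non-negative")
    limit = max_sampling_level(text_len)
    if limit is None:
        raise ValueError("text is too short for any sampling level")
    return min(requested, limit)
```

With this, asking for level 2 on a text of size 3 would quietly fall back to level 1, while a text of size 5 or more would keep level 2. Whether a text of size 1 should raise an error or be handled some other way is left open here, as it is in the discussion above.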

Collaborator Author

faassen commented Jan 10, 2025

Ah, I see in #23 I couldn't make it work with just a \0. Not clear whether I could make it work without a \0, but since the \0 is required at the end that's fine.
