Need better n-gram counting (count-min sketch?) #9

Open
larsmans opened this issue Nov 13, 2014 · 0 comments
Comments

@larsmans
Contributor

Exact n-gram counting is too expensive in terms of storage: a few tens of thousands of articles take gigabytes of storage, and we need to process millions. I think we can work around this by using two count-min sketches, one for term frequency (tf) and one for document frequency (df).
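
To make the proposal concrete, here is a minimal sketch of the two-sketch idea in plain Python. It is only an illustration, not code from this project: the class `CountMinSketch`, the helpers `ngrams` and `count_document`, the salted BLAKE2b hashing, and the default `width`/`depth` values are all assumptions chosen for the example. A count-min sketch never undercounts; it can overcount when n-grams collide in every row.

```python
import hashlib
from array import array


class CountMinSketch:
    """Fixed-size approximate counter (hypothetical helper for this example)."""

    def __init__(self, width=2**20, depth=4):
        self.width = width
        self.depth = depth
        # depth rows of width unsigned counters, all starting at zero
        self.rows = [array("I", [0]) * width for _ in range(depth)]

    def _cells(self, key):
        # One hash per row, obtained by salting BLAKE2b with the row index.
        data = key.encode("utf-8")
        for r in range(self.depth):
            digest = hashlib.blake2b(data, digest_size=8, salt=bytes([r])).digest()
            yield r, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for r, c in self._cells(key):
            self.rows[r][c] += count

    def estimate(self, key):
        # The minimum over rows is the estimate least inflated by collisions.
        return min(self.rows[r][c] for r, c in self._cells(key))


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_document(tokens, tf, df, n=2):
    """Update the tf sketch per occurrence and the df sketch once per document."""
    grams = ngrams(tokens, n)
    for gram in grams:
        tf.add(gram)
    for gram in set(grams):
        df.add(gram)


tf = CountMinSketch(width=1024)
df = CountMinSketch(width=1024)
for doc in ["a b a b", "a b c"]:
    count_document(doc.split(), tf, df)

# "a b" occurs 3 times in 2 documents; the estimates are upper bounds.
print(tf.estimate("a b"), df.estimate("a b"))
```

Memory use is fixed at `width * depth` counters per sketch regardless of how many distinct n-grams are seen, which is the point of the trade: a wider table lowers the overcounting error, and more rows lower the chance that an estimate is badly off.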
