Is there a way to create distributed inverted index? #3269

Open
chenkovsky opened this issue Dec 18, 2024 · 1 comment

Comments

@chenkovsky
Contributor

I want to create an FTS index on a large corpus, but it seems that we currently cannot create an index that consists of multiple index files. Does anyone have any suggestions? Alternatively, I would like to open a PR to support it.

@BubbleCal
Contributor

Hi @chenkovsky,
Lance does not currently support building the inverted index in a distributed way directly, but it is doable.
Check the lines

You can see that the algorithm dispatches the tokens into multiple shards by simple hashing, generally:

So for distributed indexing, exposing the index worker may be a good starting point.
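
For readers following along, here is a minimal sketch of the general idea of dispatching tokens to shards by hashing. It is not Lance's actual code: the function names (`shard_of`, `build_sharded_index`, `lookup`), the whitespace tokenizer, and the choice of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def shard_of(token: str, num_shards: int) -> int:
    """Map a token to a shard using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(token.encode("utf-8")) % num_shards


def build_sharded_index(docs: dict, num_shards: int) -> list:
    """Build one posting-list map per shard: token -> {doc_id: term frequency}."""
    shards = [defaultdict(dict) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        # Hypothetical tokenizer: lowercase and split on whitespace.
        for token in text.lower().split():
            postings = shards[shard_of(token, num_shards)][token]
            postings[doc_id] = postings.get(doc_id, 0) + 1
    return shards


def lookup(shards: list, token: str) -> dict:
    """Only the shard that owns the token needs to be consulted."""
    return dict(shards[shard_of(token, len(shards))].get(token, {}))


docs = {0: "hello world", 1: "hello lance"}
shards = build_sharded_index(docs, num_shards=4)
print(lookup(shards, "hello"))
```

Because each token always hashes to the same shard, every shard holds the complete posting list for its tokens. In a distributed setting, each shard could in principle be built by a separate worker and written to its own index file, which may be why exposing the index worker is suggested as a first step.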
