acquire train data from lmdb #148

Open
jemmyshin opened this issue Aug 25, 2022 · 3 comments

@jemmyshin
Contributor

Sometimes we need to train the PCA model after an indexer has already been created. (For example, a memory issue may appear after we have indexed thousands or even millions of documents, and we need PCA to fix it.)

We need to fetch the training data from LMDB, but this gets tricky once we move to jcloud, since the data has to be fetched from the server instead of the local machine.

One way to solve this is to add a new endpoint, /fetch, that the client can call:

data = client.post('/fetch', parameters={'batch_size': 1024})

For training, we can use partial_train():

annlite.partial_train(data)
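
To make the proposed flow concrete, here is a minimal, dependency-free sketch of the loop: pull stored vectors in fixed-size batches and hand each batch to an incremental trainer. Everything in it is an assumption for illustration only. The dict `store` stands in for the server-side LMDB table, `fetch_batches` stands in for the proposed /fetch endpoint, and `RunningMeanTrainer` is a hypothetical placeholder that only tracks a per-dimension mean, not a real PCA model.

```python
from itertools import islice
from typing import Dict, Iterator, List

Vector = List[float]


def fetch_batches(store: Dict[str, Vector], batch_size: int) -> Iterator[List[Vector]]:
    """Yield stored vectors in batches of at most `batch_size`.

    `store` is a hypothetical stand-in for the server-side LMDB table.
    """
    values = iter(store.values())
    while True:
        batch = list(islice(values, batch_size))
        if not batch:
            return
        yield batch


class RunningMeanTrainer:
    """Hypothetical incremental trainer; it only tracks a per-dimension mean."""

    def __init__(self, n_dim: int) -> None:
        self.count = 0
        self.mean = [0.0] * n_dim

    def partial_train(self, batch: List[Vector]) -> None:
        # Update the running mean one vector at a time, so no batch
        # ever has to be kept in memory after it has been consumed.
        for vec in batch:
            self.count += 1
            for i, x in enumerate(vec):
                self.mean[i] += (x - self.mean[i]) / self.count


# Ten toy 2-d vectors in place of the indexed data.
store = {f"doc{i}": [float(i), float(2 * i)] for i in range(10)}

trainer = RunningMeanTrainer(n_dim=2)
for batch in fetch_batches(store, batch_size=4):
    trainer.partial_train(batch)
```

The point of the batch loop is that the client never needs the whole dataset at once, which matters if the data lives on a remote server. Whether the real endpoint would page through LMDB this way is an open design question.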

@jemmyshin jemmyshin self-assigned this Aug 25, 2022
@JoanFM
Member

JoanFM commented Aug 25, 2022

Let's make sure we are not overcomplicating the indexer and its usage.

@JoanFM
Member

JoanFM commented Aug 25, 2022

If this LMDB approach is not going to work on jcloud, please do not proceed with this idea. If less memory is needed, for now I would just expect a proper configuration from the beginning.

@JoanFM
Member

JoanFM commented Aug 25, 2022

What will be the difference between this training data and the indexed data?
