acquire train data from lmdb #148

jemmyshin · 2022-08-25T07:11:49Z

Sometimes we need to train the PCA model after an indexer has already been created (for example, when we hit a memory issue after indexing thousands or even millions of items and need PCA to fix it).

We need to fetch the training data from LMDB, but this becomes tricky once we move to jcloud, because the data has to be fetched from the server instead of the local machine.

One way to solve this is to add a new endpoint, /fetch, that the client can call:

data = client.post('/fetch', params={'batch_size': 1024})

For training, we can use partial_train():

annlite.partial_train(data)
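To make the proposed flow concrete, here is a minimal sketch of the fetch-then-train loop, under loud assumptions: `/fetch` and `partial_train()` are only proposals in this issue, so the code does not call annlite or LMDB at all. The names `iter_batches`, `RunningMean`, and `train_from_store` are hypothetical, a plain dict stands in for the LMDB storage, and the toy trainer only tracks a running mean (it is not PCA).

```python
from typing import Dict, Iterator, List

Vector = List[float]


def iter_batches(store: Dict[str, Vector], batch_size: int) -> Iterator[List[Vector]]:
    """Yield stored vectors in batches of at most `batch_size`.

    `store` is a hypothetical stand-in for the LMDB-backed storage; on a
    remote deployment each batch would come from one `/fetch` call instead.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Vector] = []
    for key in sorted(store):  # fixed order so paging is deterministic
        batch.append(store[key])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


class RunningMean:
    """Toy incremental trainer: per-dimension mean across batches.

    Stands in for a real `partial_train`; it is NOT PCA.
    """

    def __init__(self, dim: int) -> None:
        self.count = 0
        self.total = [0.0] * dim

    def partial_train(self, batch: List[Vector]) -> None:
        # Update the running sums one vector at a time.
        for vec in batch:
            for i, value in enumerate(vec):
                self.total[i] += value
            self.count += 1

    @property
    def mean(self) -> List[float]:
        if self.count == 0:
            raise ValueError("no data seen yet")
        return [t / self.count for t in self.total]


def train_from_store(store: Dict[str, Vector], batch_size: int, trainer: RunningMean) -> int:
    """Feed every batch to the trainer; return the number of batches used."""
    n_batches = 0
    for batch in iter_batches(store, batch_size):
        trainer.partial_train(batch)
        n_batches += 1
    return n_batches
```

The point of the sketch is that the trainer only ever sees one batch at a time, so memory use is bounded by `batch_size` rather than by the size of the index, whether the batches come from local storage or from a server.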


JoanFM · 2022-08-25T09:43:36Z

Let's make sure we are not overcomplicating the indexer and its usage.

JoanFM · 2022-08-25T09:46:53Z

If this LMDB approach is not going to work on jcloud, please do not proceed with this idea. If less memory is needed, for now I would just expect proper configuration from the beginning.

JoanFM · 2022-08-25T09:47:16Z

What will be the difference between this training data and the indexed data?

jemmyshin self-assigned this Aug 25, 2022
