
Incremental saving approach #2

Closed
sshivam95 opened this issue Jun 11, 2024 · 4 comments
Comments

@sshivam95
Collaborator

sshivam95 commented Jun 11, 2024

Following the comment in issue #1, this approach uses incremental saving to pickle files. It builds a dictionary in main memory up to a threshold number of triples, e.g., 10 million (one chunk), and then dumps the whole dictionary to a pickle file.
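A minimal sketch of the chunked dumping described above. The dictionary layout (subject mapped to a list of `(predicate, object)` pairs), the function name `dump_in_chunks`, and the `chunk_<n>.pkl` file naming are assumptions made for illustration; they are not taken from the project code.

```python
import pickle
from pathlib import Path


def dump_in_chunks(triples, out_dir, chunk_size=10_000_000):
    """Build an in-memory dict and dump it to a new pickle file per chunk.

    `triples` is any iterable of (subject, predicate, object) tuples.
    Returns the list of pickle files written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    index = {}   # hypothetical layout: subject -> [(predicate, object), ...]
    count = 0    # triples held in the current chunk

    def flush():
        # Write the current chunk to its own file, then start a fresh dict
        # so the memory used by the old chunk can be released.
        nonlocal index, count
        path = out_dir / f"chunk_{len(paths)}.pkl"
        with open(path, "wb") as f:
            pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)
        paths.append(path)
        index = {}
        count = 0

    for s, p, o in triples:
        index.setdefault(s, []).append((p, o))
        count += 1
        if count >= chunk_size:
            flush()

    if index:  # dump the final, partially filled chunk
        flush()
    return paths
```

Each chunk goes to its own file, so memory use is bounded by `chunk_size`, but entries for the same subject can end up spread over several files.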

@sshivam95 sshivam95 changed the title Noctua_2_WHALE_Memory_Mapping_RDFa_10M-7844215 stopped working (float issue) Incremental saving approach Jun 11, 2024
@sshivam95
Collaborator Author

Issue: pickle offers no way to update a file in place. To update the stored data, the file must first be loaded into a variable, updated with the new data, and written back. This leads to the same RAM overshoot problem.
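The load, update, write-back cycle described above looks roughly like this. The function name `update_pickle` is hypothetical, and the sketch assumes the file holds a single pickled dictionary.

```python
import pickle


def update_pickle(path, new_entries):
    """Merge `new_entries` into the dict stored in the pickle file at `path`."""
    # The whole stored dictionary has to be materialised in RAM first ...
    with open(path, "rb") as f:
        data = pickle.load(f)

    # ... before it can be updated ...
    data.update(new_entries)

    # ... and then the complete dictionary is serialised again.
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return len(data)
```

Peak memory therefore grows with the size of everything saved so far, not with the size of the new chunk.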

@sshivam95
Collaborator Author

Alternative solution: use a (key, value) database such as shelve to store the indices. Commit

New tests ran successfully on small portions of the dataset (745 million triples). However, reading the whole dataset is very slow: the current test run on the whole dataset has been running for 3 days and has still not read 5% of the data.
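A minimal sketch of storing an index in `shelve` from the standard library. The key layout (subject string mapped to a list of `(predicate, object)` pairs) and the function name `add_triples` are assumptions for illustration, not the code from the linked commit.

```python
import shelve


def add_triples(db_path, triples):
    """Append (subject, predicate, object) triples to a shelve-backed index."""
    with shelve.open(db_path) as db:
        for s, p, o in triples:
            # shelve keys must be strings. With the default writeback=False,
            # mutating a fetched value does not persist, so the list is
            # read, extended, and assigned back explicitly.
            entries = db.get(s, [])
            entries.append((p, o))
            db[s] = entries


def lookup(db_path, subject):
    """Return the stored (predicate, object) pairs for one subject."""
    with shelve.open(db_path, flag="r") as db:
        return db.get(subject, [])
```

Only the entries being touched are loaded into memory, but every triple costs one read and one write against the on-disk database, which may be one reason a full run is slow.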

@sshivam95
Collaborator Author

Using mmappickle.mmapdict showed progress on a file with fewer triples: #4

@sshivam95
Collaborator Author

Issues with mmappickle.mmapdict
