Incremental saving approach #2
**Issue:** There is no direct functionality to update a pickle file in place. To update the data, the whole file must first be loaded into a variable and then updated with the new data, which reproduces the same RAM-overshoot problem.

**Alternative solution:** use a (key, value) database. New tests are running which worked successfully on small portions of the dataset (745 million triples). However, reading the whole dataset is very slow: a current test run on the full dataset has been going for 3 days and has still not read 5% of the data.
Usage of
Issues with
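The issue does not name the (key, value) database being tested, so the following is only a minimal sketch of the idea using Python's built-in `shelve` module as a stand-in: each triple is written to an on-disk store one record at a time, instead of building the whole dictionary in RAM.

```python
import shelve

def store_triples(triples, path):
    """Append (subject, predicate, object) triples to an on-disk
    (key, value) store without holding the full dataset in RAM."""
    with shelve.open(path) as db:
        start = len(db)                   # continue numbering if the store is reopened
        for i, triple in enumerate(triples, start):
            db[str(i)] = triple           # one on-disk record per triple

store_triples([("s1", "p1", "o1"), ("s2", "p2", "o2")], "triples_demo")
```

Any real key-value backend (the one under test here is not named) would follow the same pattern; the trade-off observed above is that per-record disk access makes full-dataset reads much slower than a single bulk load.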
From the comment on issue #1, this approach will use incremental saving with pickle files. It will build a dictionary in main memory up to a threshold number of triples, e.g., 10 million (1 chunk), then dump it all to a `pickle` file.