
Move storage part to a parallel process? #402

Open
yannikschaelte opened this issue Jan 28, 2021 · 6 comments
@yannikschaelte (Member) commented Jan 28, 2021

Meaning: run store_population in a parallel process, not in the main program. This way, we could get rid of a big part of the "between-generations" time. Downside: some servers might not like the creation of this additional parallel process. @FelipeR888 @EmadAlamoudi what do you think?
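A minimal sketch of the idea, assuming a hypothetical `store_population` stand-in for the real storage routine (the actual pyABC function writes the accepted population to the SQL database): the main process hands the population to a child process and could immediately continue with the next generation.

```python
import multiprocessing as mp


def store_population(population, done):
    # Hypothetical stand-in for the real storage routine: in pyABC
    # this would write the accepted population to the SQL database.
    done.put(len(population))


done = mp.Queue()
# Fire-and-forget: storage runs in a child process while the main
# loop could already continue with the next generation.
proc = mp.Process(target=store_population, args=([0.1, 0.2, 0.3], done))
proc.start()
# ... the main process would continue sampling here ...
proc.join()  # eventually wait for the write to finish
n_stored = done.get()
print(n_stored)
```

This assumes the fork start method (the Linux default), where the child inherits the parent's state cheaply; with spawn, the arguments are pickled and sent over, which is where the "copy the history object" cost would come in.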

@EmadAlamoudi (Collaborator)

Sounds good. However, what is the expected overhead of this additional process? Also, would we need to copy the entire history object to the parallel process? If so, we might need to check the RAM available per core.

@yannikschaelte (Member, Author)

The history object itself should be cheap to copy, as it (I think) at no point holds the SQL database in memory, only pointers to it, and queries it dynamically when needed. Therefore, I would expect next to no overhead in the main process (except for copying this object). File accessors might be a problem, tbd.

@yannikschaelte (Member, Author)

And for fast-running simulations we need to make sure that iteration 3 is not written at the same time as iteration 2, so one would somehow need to lock access there. So it is not trivial to implement. Another problem could be that the main program is canceled by the user while the writing process has not finished writing yet.

@FelipeR888 (Contributor)

It would probably also be sort of Redis-specific then, wouldn't it? But at least for us it does seem reasonable.

@EmadAlamoudi (Collaborator)

This is what comes to my mind too. However, it seems that SQLite handles that on its own, since all of its operations are atomic: https://stackoverflow.com/questions/25700759/avoiding-race-conditions-when-an-update-is-based-on-the-count-of-prior-select
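To illustrate the point about SQLite's atomicity (a small demo, not pyABC's actual schema): several processes can insert into the same database file concurrently, because SQLite's file locking serializes the transactions; writers briefly block each other rather than corrupting the file.

```python
import multiprocessing as mp
import os
import sqlite3
import tempfile


def insert(path, value):
    # Each process opens its own connection. SQLite's file locking
    # makes every transaction atomic, so concurrent writers cannot
    # corrupt the database; with a timeout they just wait for the lock.
    con = sqlite3.connect(path, timeout=30)
    with con:  # commits (or rolls back) the transaction
        con.execute("INSERT INTO results(value) VALUES (?)", (value,))
    con.close()


db_path = os.path.join(tempfile.mkdtemp(), "abc.db")
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE results(value INTEGER)")
con.commit()
con.close()

# Five processes write to the same database file at once.
procs = [mp.Process(target=insert, args=(db_path, v)) for v in range(5)]
for p in procs:
    p.start()
for p in procs:
    p.join()

con = sqlite3.connect(db_path)
n_rows = con.execute("SELECT COUNT(*) FROM results").fetchone()[0]
con.close()
print(n_rows)
```

Note that atomicity prevents corruption, but it does not by itself guarantee the write *order* across generations; that would still need the single-writer design discussed above.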

@yannikschaelte (Member, Author)

Good to know. Maybe one could just try writing a simple parallel process that is started at the beginning of run() and then waits on a queue for results to write to the database. No big algorithmic improvement, but it might make things faster for fast models.

Mid-term, moving to hdf5 (or making the SQL handling faster) will probably still be necessary.
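A sketch of that queue-based design (the `storage_worker` name and the `written` queue are illustrative, not pyABC API). A single consumer process drains the queue in FIFO order, which also resolves the earlier locking concern, and a sentinel plus join addresses the unfinished-write-on-exit problem:

```python
import multiprocessing as mp


def storage_worker(queue, written):
    # Started once at the beginning of run(); waits on the queue for
    # populations to write. A single consumer writes strictly in the
    # order items were enqueued, so generation t+1 can never be
    # written before generation t, and no extra locking is needed.
    while True:
        item = queue.get()
        if item is None:  # sentinel: run() is finished
            break
        t, population = item
        written.put(t)  # stand-in for the actual database write


queue = mp.Queue()
written = mp.Queue()  # only used here to observe the write order
worker = mp.Process(target=storage_worker, args=(queue, written))
worker.start()

for t in range(3):  # the main loop just enqueues and moves on
    queue.put((t, [f"particle-{t}"]))

queue.put(None)  # signal shutdown ...
worker.join()    # ... and wait until all pending writes are done
order = [written.get() for _ in range(3)]
print(order)
```

The `join()` at the end is what guards against the cancellation problem: run() (or a signal handler) would block there until the worker has flushed every pending write.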

4 participants