
Improved History recording efficiency - no more exponential run-time increase possibly #275

Open
sniffo opened this issue Nov 6, 2023 · 9 comments

Comments

@sniffo
Contributor

sniffo commented Nov 6, 2023

Hi,

Some people might be interested so I am posting here.

I noticed that increasing the length of a simulation using the EulerSolver caused the total run time to increase exponentially (I don't have test data since I didn't save the run times, but I noticed the trend while tracking them; I will gather some when I have time). After I did some profiling and improved the efficiency of some geometry and gradient methods, the behavior persisted, and I eventually found that the History.record method was to blame.
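For context, here is a minimal standalone sketch (not the tyssue code) of why this happens: calling pd.concat inside the loop copies the entire accumulated frame at every step, so the total work grows quadratically with the number of steps, whereas collecting the pieces and concatenating once stays linear.

```python
import pandas as pd

chunk = pd.DataFrame({"x": range(10)})

# Anti-pattern: pd.concat inside the loop re-copies the whole
# accumulated frame at every iteration (quadratic total work).
acc = chunk.copy()
for _ in range(99):
    acc = pd.concat([acc, chunk], ignore_index=True)

# Fix: collect the pieces and concatenate once at the end.
once = pd.concat([chunk] * 100, ignore_index=True)

assert acc.equals(once)  # same result, far less copying
```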

The method uses pd.concat every time to append the current datasets onto the increasingly large frames in History.datasets. This causes the method to slow down over the course of the simulation, since an ever-larger dataframe is loaded into memory at each step. I've changed History.datasets to a dictionary of dictionaries: it now contains one dictionary per element (vert, edge, face, and cell if necessary), and each element dictionary holds the datasets as key-value pairs, with the time points as keys and the datasets at those time points as values.
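A rough sketch of that layout (the names here are illustrative, not the actual tyssue API): datasets is keyed first by element, then by time point, so recording a step is a constant-time dictionary insert rather than a growing concat.

```python
import pandas as pd

# Illustrative layout: element -> {time point -> DataFrame snapshot}.
datasets = {"vert": {}, "edge": {}, "face": {}}

def record(datasets, time, element_frames):
    """Store a snapshot of each element's DataFrame under its time key
    (O(1) per step; no concat onto a growing frame)."""
    for element, df in element_frames.items():
        datasets[element][time] = df.copy()

# Toy usage: two time points for the "vert" element.
record(datasets, 0.0, {"vert": pd.DataFrame({"x": [0.0, 1.0]})})
record(datasets, 0.1, {"vert": pd.DataFrame({"x": [0.5, 1.5]})})
```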

I then introduced a History instance method, History.update_datasets(), which runs within the EulerSolver.solve() method at the final time point and concatenates all of the datasets within the dictionaries to produce History.datasets as it was implemented previously. This significantly reduces the time to record, and the run time no longer increases exponentially. I have not done extensive testing yet, but all other History methods should still work as intended.
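A hedged sketch of such a final concatenation step (the function name, the "time" column, and the layout are assumptions about the intended shape, not the exact implementation): one pd.concat per element at the end of the run replaces the per-step concats.

```python
import pandas as pd

def update_datasets(dict_of_frames):
    """Concatenate the per-time-point frames once, at the end of the
    simulation, tagging each row with its time point."""
    return {
        element: pd.concat(
            [df.assign(time=t) for t, df in frames_by_time.items()],
            ignore_index=True,
        )
        for element, frames_by_time in dict_of_frames.items()
    }

# Toy usage: two recorded time points for the "vert" element.
history = {"vert": {0.0: pd.DataFrame({"x": [0.0]}),
                    0.1: pd.DataFrame({"x": [0.5]})}}
vert = update_datasets(history)["vert"]
```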

@glyg please let me know if you would like me to add these changes to my PR and any specific tests you'd like to see. I will be updating this issue with run-time tests with different numbers of time-steps.

Thanks,
Tasnif

@sniffo sniffo changed the title Improved History recording efficiency - no more exponential increase possibly Improved History recording efficiency - no more exponential run-time increase possibly Nov 7, 2023
@sniffo
Contributor Author

sniffo commented Nov 7, 2023

Here's an initial comparison for up to 5120 time steps:
[Image: run-time comparison plot for up to 5120 time steps]

@glyg
Member

glyg commented Nov 7, 2023

Hi Tasnif,

Thanks a lot for this contribution. Please add it to the PR, it looks great! Just to make sure: is this transparent for the user (i.e., it does not change the I/O API)?
Thanks again

G

@sniffo
Contributor Author

sniffo commented Nov 7, 2023

It only changes the History class properties, the History.record method, and the EulerSolver.solve method. I've used the to_archive and from_archive methods to save and load datasets, and they work fine. Nothing else was changed.

@glyg
Member

glyg commented Nov 7, 2023

Great :) Please include that in the PR then!

@sniffo
Contributor Author

sniffo commented Nov 7, 2023

Perfect. Will add over the weekend :)

@sniffo
Contributor Author

sniffo commented Nov 7, 2023

I've encountered an issue with this implementation. For some reason, when using HistoryHdf5.from_archive to reload a dataset, the History.datasets returned is empty, yet History.retrieve still works. Not quite sure how that is possible; I will report back once the issue is sorted. It likely has to do with changes to the __init__() method in the base History class.

@sniffo
Contributor Author

sniffo commented Nov 7, 2023

The issue has been fixed; it was the __init__ method. Instead of changing History.datasets to the dictionaries mentioned above, I introduced another attribute, History.dict. New datasets are added to this dictionary in History.record, while History.datasets is left unchanged in the original __init__ method. This allows HistoryHdf5.from_archive to recreate an accurate History object with its datasets. It works well now.
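A minimal sketch of the separation described above (the class is heavily simplified; only the attribute names History.datasets and History.dict come from this thread): datasets keeps its original element-to-DataFrame form so archive loading can rebuild it, while per-step snapshots accumulate in the separate dict attribute.

```python
import pandas as pd

class History:
    """Simplified sketch, not the actual tyssue class."""

    def __init__(self, datasets=None):
        # Unchanged: datasets stays a mapping of element -> DataFrame,
        # so from_archive-style loading can populate it directly.
        self.datasets = datasets if datasets is not None else {}
        # New: per-time-point snapshots accumulate here instead.
        self.dict = {}

    def record(self, time, element_frames):
        """O(1) per step: insert snapshots into self.dict."""
        for element, df in element_frames.items():
            self.dict.setdefault(element, {})[time] = df.copy()

    def update_datasets(self):
        """One concat per element at the end of the run."""
        self.datasets = {
            element: pd.concat(
                [df.assign(time=t) for t, df in frames.items()],
                ignore_index=True,
            )
            for element, frames in self.dict.items()
        }

h = History()
h.record(0.0, {"vert": pd.DataFrame({"x": [0.0]})})
h.record(0.1, {"vert": pd.DataFrame({"x": [0.5]})})
h.update_datasets()
```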

@glyg glyg closed this as completed Jan 10, 2024
@sniffo
Contributor Author

sniffo commented Dec 2, 2024

I have a fork of the up-to-date build implementing these changes, if there is still interest. I can create a pull request if you would like to review the changes and suggest improvements.

Best,
Tasnif

@glyg
Member

glyg commented Dec 4, 2024

Hi @sniffo, yes, please do open a PR.

@glyg glyg reopened this Dec 4, 2024