Generate data:
```python
import numpy as np
import pandas as pd

N = 1_000_000
data = pd.DataFrame({
    "c": np.random.choice(["a", "b", "c"], size=N),
    "x": np.random.uniform(size=N),
    "y": np.random.normal(size=N),
})
data.to_csv("blob.csv")  # File is about 45 MB
```
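The timings quoted in this issue look like IPython `%timeit` output (7 runs, 1 loop each). To get comparable numbers from a plain script, one option is a small helper around the standard library `timeit` module. This is a minimal sketch: `time_runs` and `format_runs` are hypothetical names, and using the population standard deviation is an assumption about how `%timeit` computes its spread.

```python
import statistics
import timeit


def time_runs(fn, repeat=7, number=1):
    """Call fn `number` times per run, for `repeat` runs; return (mean, std) seconds per call."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    # Population standard deviation, assumed here to match what %timeit prints
    return statistics.mean(per_call), statistics.pstdev(per_call)


def format_runs(mean, std, repeat=7, number=1):
    """Format a result in the same shape as the timings quoted in this issue."""
    loops = "loop" if number == 1 else "loops"
    return (f"{mean:.3g} s ± {std:.3g} s per loop "
            f"(mean ± std. dev. of {repeat} runs, {number} {loops} each)")


if __name__ == "__main__":
    mean, std = time_runs(lambda: sum(range(100_000)))
    print(format_runs(mean, std))
```

Wrapping one of the variants in a function (for example a hypothetical `run_variant()`) and calling `format_runs(*time_runs(run_variant))` prints a line in the same format.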
Slow execution: 24.1 s ± 115 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```python
import pandas as pd
from executorlib import SingleNodeExecutor


def get_sum(i, df):
    return i, df["x"].sum(), df["y"].sum()


with SingleNodeExecutor(max_workers=10) as exe:
    # pd.read_csv() runs in the main process once for each of the 100 tasks,
    # and every resulting DataFrame is serialized and sent to a worker
    future_lst = [exe.submit(get_sum, df=pd.read_csv("blob.csv"), i=i) for i in range(100)]
    result_lst = [f.result() for f in future_lst]
```
Reduce the process startup time by reusing the worker processes (`block_allocation=True`): 19.5 s ± 31.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```python
import pandas as pd
from executorlib import SingleNodeExecutor


def get_sum(i, df):
    return i, df["x"].sum(), df["y"].sum()


# block_allocation=True starts the 10 worker processes once and reuses them for all tasks
with SingleNodeExecutor(max_workers=10, block_allocation=True) as exe:
    # The CSV file is still read 100 times in the main process and sent to the workers
    future_lst = [exe.submit(get_sum, df=pd.read_csv("blob.csv"), i=i) for i in range(100)]
    result_lst = [f.result() for f in future_lst]
```
Load the data only once per worker process (`init_function`): 946 ms ± 24.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```python
import pandas as pd
from executorlib import SingleNodeExecutor


def get_sum(i, df):
    return i, df["x"].sum(), df["y"].sum()


def init_funct():
    # Runs once in each worker; the returned dict provides the "df" argument of get_sum
    return {"df": pd.read_csv("blob.csv")}


with SingleNodeExecutor(max_workers=10, block_allocation=True, init_function=init_funct) as exe:
    # Only i is sent per task; df is taken from the worker's init_funct() result
    future_lst = [exe.submit(get_sum, i=i) for i in range(100)]
    result_lst = [f.result() for f in future_lst]
```
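The last variant relies on `init_function`. As far as I understand the executorlib documentation, the dictionary it returns is kept in each worker, and its entries are passed to submitted functions that have a parameter of the same name. The pure-Python model below is only a sketch of that idea: `InitOnceWorker` and `load_data` are hypothetical names, it is not executorlib code, and matching entries to parameters via `inspect.signature` is an assumption about the behavior. It shows why 10 workers parse the data 10 times instead of 100.

```python
import inspect


class InitOnceWorker:
    """Hypothetical stand-in for one long-lived worker process (not executorlib code)."""

    def __init__(self, init_function):
        self._init_function = init_function
        self._resources = None  # filled lazily when the first task arrives
        self.init_calls = 0

    def run(self, fn, **kwargs):
        if self._resources is None:
            # Expensive setup (e.g. reading a CSV file) happens once per worker
            self._resources = self._init_function()
            self.init_calls += 1
        # Assumption: cached entries are matched to fn's parameters by name;
        # explicitly passed keyword arguments take precedence in this model
        accepted = inspect.signature(fn).parameters
        extra = {k: v for k, v in self._resources.items() if k in accepted and k not in kwargs}
        return fn(**kwargs, **extra)


def load_data():
    # Placeholder for pd.read_csv("blob.csv"); plain lists keep the sketch dependency-free
    return {"df": {"x": [0.5, 0.25], "y": [1.0, -1.0]}}


def get_sum(i, df):
    return i, sum(df["x"]), sum(df["y"])


# 100 tasks distributed round-robin over 10 workers
workers = [InitOnceWorker(load_data) for _ in range(10)]
results = [workers[i % len(workers)].run(get_sum, i=i) for i in range(100)]
total_inits = sum(w.init_calls for w in workers)  # 10 data loads instead of 100
```

In the first two variants, by contrast, the equivalent of `load_data()` is called once per task in the main process, and its result has to travel to a worker every time.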
In addition, include an option to monitor the data transfer, to help fine-tune the performance #671
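Until such an option exists, a rough client-side estimate of the transferred data is possible by measuring the serialized size of each task's arguments before submitting it. This is a minimal sketch using a hypothetical `TransferMonitor` wrapper around any `concurrent.futures`-style executor and the standard `pickle` module. executorlib may use a different serializer, so the numbers are estimates only.

```python
import pickle
from concurrent.futures import Executor, Future, ThreadPoolExecutor


class TransferMonitor:
    """Hypothetical wrapper that records the pickled size of each task's arguments."""

    def __init__(self, executor: Executor):
        self._executor = executor
        self.bytes_per_task: list[int] = []

    def submit(self, fn, /, *args, **kwargs) -> Future:
        # Estimate the input data shipped to a worker; the function itself is not
        # measured, since it is usually pickled as a small reference
        self.bytes_per_task.append(len(pickle.dumps((args, kwargs))))
        return self._executor.submit(fn, *args, **kwargs)

    def total_bytes(self) -> int:
        return sum(self.bytes_per_task)


def get_sum(i, data):
    return i, sum(data)


with ThreadPoolExecutor(max_workers=2) as pool:
    monitor = TransferMonitor(pool)
    futures = [monitor.submit(get_sum, i, data=list(range(1000))) for i in range(5)]
    results = [f.result() for f in futures]

print(monitor.bytes_per_task, monitor.total_bytes())
```

Wrapping a `SingleNodeExecutor` instead of the `ThreadPoolExecutor` (assuming it follows the same `submit` interface, as the examples suggest) should presumably report roughly one DataFrame-sized entry per task for the first two variants and only a few bytes per task with `init_function`.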