While exploring scheduler improvements, it was recently discovered that removing redundant string concatenations yields a measurable performance improvement. Tom noted that there are currently no isolated benchmarks that test graph construction.
We could do something similar to what was suggested in dask/dask#6137:
In [2]: ddf_d = timeseries(start='2000-01-01', end='2002-01-01', partition_freq='1d')
In [3]: %timeit shuffle(ddf_d, "id", shuffle="tasks")
67.2 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
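For discussion, here is a rough sketch of how an isolated graph-construction benchmark might be laid out, assuming an asv-style suite (untimed `setup`, timed `time_*` methods). `build_graph` and `GraphConstructionSuite` are hypothetical names, and the toy builder only stands in for a real routine such as the shuffle above; it is not dask code. The partition count of 731 is an assumption chosen to mirror two years of daily partitions.

```python
import timeit


def build_graph(npartitions, token="abc123"):
    """Hypothetical stand-in for a graph-construction routine.

    Returns a dict mapping (name, index) keys to task tuples. The name
    string is built once, outside the loop, rather than once per key.
    """
    name = "shuffle-" + token  # concatenate once, not once per task
    return {(name, i): (sum, [i, i + 1]) for i in range(npartitions)}


class GraphConstructionSuite:
    """asv-style suite: `setup` runs untimed, `time_*` methods are timed."""

    def setup(self):
        # Assumed size: 2000-01-01 to 2002-01-01 at one partition per day.
        self.npartitions = 731

    def time_build_graph(self):
        # Only construction is measured; no task is ever executed.
        build_graph(self.npartitions)


if __name__ == "__main__":
    suite = GraphConstructionSuite()
    suite.setup()
    runs = timeit.repeat(suite.time_build_graph, number=10, repeat=7)
    print(f"best of 7: {min(runs) / 10 * 1e3:.3f} ms per loop")
```

Keeping the timed method free of any compute call is the point: a regression in key naming or task creation then shows up on its own, without being masked by execution time.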