Documentation #1

Open
TomAugspurger opened this issue Nov 2, 2016 · 8 comments

@TomAugspurger (Member) commented Nov 2, 2016

This is a sketch for some sections of documentation that should go in the README.

What to test?

Ideally, benchmarks measure how long our projects (dask, distributed) spend doing something, not the time spent in the underlying libraries they're built on. We want to limit the variance across runs to just the code we control.

For example, I suspect (self.data.a > 0).compute() is not a great benchmark. My guess (without having profiled) is that the .compute() call takes the majority of the time, and most of that would be spent in pandas / NumPy. (I need to profile all of these. I'm reading through dask now to find places where dask itself is doing a lot of work.)
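
As a rough illustration (the class, sizes, and method names below are made up, not an agreed-on benchmark), we could separate graph construction, which is dask's own work, from the computation that mostly runs in pandas:

    import numpy as np
    import pandas as pd
    import dask.dataframe as dd


    class FilterSuite:
        # Hypothetical benchmark separating dask's graph-building work from
        # the pandas/NumPy work triggered by .compute().
        def setup(self):
            pdf = pd.DataFrame({"a": np.arange(100000)})
            self.data = dd.from_pandas(pdf, npartitions=10)

        def time_build_graph(self):
            # Lazy: only constructs the dask graph; no pandas work happens.
            self.data.a > 0

        def time_compute(self):
            # Includes the pandas work; most of this time is outside dask.
            (self.data.a > 0).compute()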

Benchmarking New Code

If you're writing an optimization, say, you can benchmark it by

  • writing a benchmark that exercises your optimization and placing it in benchmarks/
  • setting the repo field in asv.conf.json to the path of your dask / distributed repository on your local file system
  • running asv continuous -f 1.1 upstream/master HEAD (optionally passing -b <regex> to filter to just your benchmark); see the sketch after this list
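
Concretely, that workflow might look like the following (the repository path and the benchmark regex are placeholders):

    # asv.conf.json (excerpt): point "repo" at your local checkout, e.g.
    #     "repo": "/path/to/your/dask",

    # compare your branch against upstream/master, running only benchmarks
    # whose names match the regex
    asv continuous -f 1.1 -b my_optimization upstream/master HEAD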

Naming Conventions

Directory Structure

This repository contains benchmarks for several dask-related projects.
Each project needs its own benchmark directory because asv is built around
one configuration file (asv.conf.json) and benchmark suite per repository.
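
For example, a layout along these lines (directory names are illustrative) gives each project its own asv.conf.json and benchmark suite:

    dask-benchmarks/
        dask/
            asv.conf.json
            benchmarks/
        distributed/
            asv.conf.json
            benchmarks/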

@pitrou (Member) commented Nov 2, 2016

When benchmarking local changes, I also find asv dev to be very useful. Not sure it needs to be mentioned in the README, though.

@pitrou (Member) commented Nov 3, 2016

I think we should also have guidelines for benchmarks:

  • have individual time_xxx methods take on the order of 100-300 ms if possible (obviously some workloads will need more), so that asv can repeat the method several times and output a stable minimum; see the sketch after this list
  • perhaps choose worker counts so as to minimize variability?
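
A sketch of what the first point could look like in practice (the sequence size is a guess and would need tuning so that one call lands in the 100-300 ms range):

    import dask.bag as db


    class MapSuite:
        # Hypothetical benchmark: the input is sized so a single call takes
        # roughly 100-300 ms, letting asv repeat it and report a stable minimum.
        def setup(self):
            self.bag = db.from_sequence(range(100000), npartitions=100)

        def time_map_sum(self):
            self.bag.map(lambda x: x + 1).sum().compute()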

@pitrou (Member) commented Nov 3, 2016

Another issue: which timer function should be used? asv's default timer may not be adequate:
https://asv.readthedocs.io/en/latest/writing_benchmarks.html#timing

Should we measure CPU time or wallclock time? IMHO we should measure wallclock time: if dask or distributed schedules tasks inefficiently and doesn't make full use of the CPU, it's a problem that should appear in the benchmark results.
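
If we do settle on wall-clock time, asv lets a benchmark override its timer (see the timing docs linked above); a minimal sketch, assuming we set it per benchmark class:

    import timeit

    from dask import delayed


    def inc(x):
        return x + 1


    class SchedulingSuite:
        # Measure wall-clock rather than CPU time, so inefficient scheduling
        # (idle CPUs) shows up in the results instead of being hidden.
        timer = timeit.default_timer

        def time_delayed_sum(self):
            delayed(sum)([delayed(inc)(i) for i in range(1000)]).compute()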

@danielballan commented

@TomAugspurger I'm interested in helping with this, partly as a way to become more familiar with the dask API. Is there anything in particular you would prefer me to target, to start?

@TomAugspurger (Member, Author) commented

@danielballan great, thanks! I'm guessing that @mrocklin, @jcrist, and Antoine have the most knowledge on which parts of dask would be best to benchmark.

My current thinking is that we'll have two kinds of benchmarks. The first kind is higher-level benchmarks that hit things like top-level methods on dask.array, dask.bag, and dask.dataframe. The second kind is benchmarks for "internal" methods in places like https://github.com/dask/dask/blob/master/dask/optimize.py.

I think the first kind will be easier to write benchmarks for as you learn the library (that's true for me anyway. ATM I have no idea how to write a good benchmark for something in dask.optimize).

@mrocklin (Member) commented Nov 3, 2016

I agree with @TomAugspurger's classification of high-level external benchmarks and internal ones.

I also agree that high-level external benchmarks are probably both the more useful and the more approachable. Actually, I'm curious if, as with all things, we can steal from Pandas a bit here. Are there benchmarks in Pandas that are appropriate to take?

There are some extreme things we can test as well, such as doing groupby-applies with small dask dataframes with 1000 partitions, or calling

delayed(sum)([delayed(inc)(i) for i in range(1000)]).compute(get=...)

These should be good to stress the administrative side.
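
A sketch of the groupby-apply case (sizes and the meta argument are illustrative):

    import pandas as pd
    import dask.dataframe as dd


    class ManyPartitionGroupby:
        # Hypothetical stress test: a tiny dataframe split into 1000 partitions,
        # so the measured time is dominated by dask's administrative overhead.
        def setup(self):
            pdf = pd.DataFrame({"key": range(1000), "value": range(1000)})
            self.ddf = dd.from_pandas(pdf, npartitions=1000)

        def time_groupby_apply(self):
            self.ddf.groupby("key").value.apply(
                lambda s: s.sum(), meta=("value", "int64")
            ).compute()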

@pitrou (Member) commented Dec 5, 2016

Another question: I see a couple of existing benchmarks parameterize on the get function (multiprocessing.get, threaded.get, etc.). Is this useful/desired? What are we trying to achieve here?
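
For reference, the pattern in question looks roughly like this in the dask API of the time (simplified; the scheduler list and workload are placeholders):

    import dask.bag as db
    import dask.multiprocessing
    import dask.threaded


    class GetSuite:
        # Parameterized over the scheduler ("get") function, as some existing
        # benchmarks in this repository do.
        params = [dask.threaded.get, dask.multiprocessing.get]
        param_names = ["get"]

        def setup(self, get):
            self.bag = db.from_sequence(range(1000), npartitions=10)

        def time_count(self, get):
            self.bag.count().compute(get=get)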

@TomAugspurger (Member, Author) commented

@pitrou for a bit, I was thinking these benchmarks could be helpful for users to see the overall performance characteristics of the various backends across different workloads. In hindsight it's probably best to keep this strictly for devs.

I'll send along a PR to remove those when I get a chance. Been swamped lately.
