
Profiling utilities and routine profiling #24

Open · jmason42 opened this issue Dec 21, 2018 · 3 comments

@jmason42 (Contributor)

It's hard to speed up code without proper profiling tools. Existing solutions that I'm aware of:

  • The built-in profile/cProfile modules. I find these to be cumbersome, particularly because they can track evaluation time for functions that aren't being called or defined in the relevant scope. Perhaps there is a better way to use these.
  • kernprof.py, a.k.a. the line profiler. Really terrific for isolating and inspecting one scope; not great for routine profiling.
  • IDE (e.g. PyCharm) profilers. In my brief stint with PyCharm I was unable to get this tool running; it is also unsuitable for routine profiling.
  • Calling time via the command line, or other tools like pytest. Only useful for isolated code, and subject to a lot of variability.
  • Tools like the timeit module. Again, only useful for isolated code, but it does a few things to temper evaluation-to-evaluation variability (see the sketch after this list).
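
For reference, a minimal sketch of programmatic timeit use; the module and names here (mymodule, evolve, make_state) are hypothetical placeholders, not existing code:

```python
import timeit

# timeit runs the statement many times per measurement and repeats the
# measurement, which damps evaluation-to-evaluation variability.
times = timeit.repeat(
    stmt="evolve(state)",
    setup="from mymodule import evolve, make_state; state = make_state()",
    repeat=5,
    number=1000,
)
print("best per-call time: {:.3g} s".format(min(times) / 1000))
```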

Custom solutions I've used in the past:

  • Peppering code with time.time() calls to collect the evaluation time associated with blocks of code. Quick and easy, but potentially ugly and non-specific.
  • Writing a decorator that wraps a function and accumulates information about how long it takes to run. Can be largely transparent. Requires functionalization of the interesting bits.

I'm inclined to go with the latter; it will require us to break off more pieces and test them independently, but I think that's healthy anyway.
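
A rough sketch of what that decorator could look like (a module-level registry is an assumption here; none of these names exist yet):

```python
import functools
import time

# Accumulated (call count, total seconds) per wrapped function.
_timings = {}

def profiled(func):
    """Accumulate call counts and total wall-clock time for func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.time() - start
            calls, total = _timings.get(func.__qualname__, (0, 0.0))
            _timings[func.__qualname__] = (calls + 1, total + elapsed)
    return wrapper

def report():
    """Print accumulated timings, slowest first."""
    for name, (calls, total) in sorted(
            _timings.items(), key=lambda item: -item[1][1]):
        print("{}: {} calls, {:.3g} s total".format(name, calls, total))
```

Decorating the interesting functions with @profiled and calling report() at the end of a run would then give a coarse routine-profiling summary.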

As far as routine profiling goes - I think this is desirable. I find it to be a useful way to check the health of code, as well as a way to make sure that anticipated performance hits do indeed have the anticipated effect (sanity check). It's unclear to me where this ought to go. It can't really be a unit test, since we have no expected run time that is going to be consistent across hardware (and run time is variable regardless).

@prismofeverything (Member)

I have had some success with cProfile. It takes some work to sift through the results, but it has a programmatic interface that makes this automatable once you have some patterns established. I'll give it a run today. Feel free to use your own approaches as well; profiling is more of an art than a science.
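
For example, a minimal sketch of the programmatic interface (cProfile plus pstats; run_simulation is a hypothetical entry point):

```python
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
run_simulation()  # hypothetical entry point for the code under test
profiler.disable()

# Sort by cumulative time and print the 20 most expensive functions.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(20)
```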

We can make a profiling dir if there is code or other artifacts associated with profiling.

@prismofeverything (Member)

Just started a cython branch with some explorations there, FYI. Getting some performance improvements, the conversion takes some work however. Compiling with cython -a arrow/arrow.pyx generates an html file that describes how much python you have to use (currently in arrow/arrow.html).
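
If it helps, a minimal sketch of a setuptools build for the extension (assuming a conventional setup.py, which may not match the branch; annotate=True produces the same HTML report as cython -a):

```python
# setup.py (sketch): build arrow/arrow.pyx as a compiled extension.
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="arrow",  # assumed package name
    ext_modules=cythonize("arrow/arrow.pyx", annotate=True),
)
```

Building in place would then be `python setup.py build_ext --inplace`.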

@jmason42 (Contributor, Author)

Excellent. It will be good to have a few implementations to compare. Frankly, I may need to rewrite some things from the ground up to get Numba working; its support for NumPy is not complete.

Incidentally, an issue you will bump into with Cython (and which was part of the reason I was so compelled by Numba) is random number generation; Cython can't compile calls to numpy.random functions. @1fish2 had a nice solution, which I think is in the old complexation code: generate a large batch of random numbers at once, and regenerate it as needed (but not as often as every step).
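
A minimal sketch of that batching idea (names and sizes are illustrative; the actual complexation code may differ):

```python
import numpy as np

class RandomPool(object):
    """Pre-generate uniform random numbers in bulk and refill only when
    the pool is exhausted, avoiding a numpy.random call at every step."""

    def __init__(self, size=1000000, seed=None):
        self._size = size
        self._random = np.random.RandomState(seed)
        self._refill()

    def _refill(self):
        self._pool = self._random.random_sample(self._size)
        self._index = 0

    def draw(self, n):
        """Return n uniform samples from the pool, refilling if needed."""
        if self._index + n > self._size:
            self._refill()
        values = self._pool[self._index:self._index + n]
        self._index += n
        return values
```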
