Conversation

@jacklovell (Member)
This PR contains rewrites of the caching objects for better efficiency. The most significant change is the removal of calls to numpy.linalg.solve, which will fix the performance issues in #233. There are a few other improvements, and the new low-level cython interpolation utilities in raysect are now used to generate the coefficient arrays for interpolating between cached sample points, instead of duplicating that code in Cherab.
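
To make the numpy.linalg.solve point concrete: for cubic interpolation, the linear system solved for each cell's coefficients has a fixed, known matrix, so its solution can be written in closed form instead of being re-solved on every evaluation. Below is a minimal sketch of that idea for a cubic Hermite segment on a unit cell; the function names here are hypothetical, and the PR itself delegates this work to raysect's cython interpolation utilities rather than using code like this.

```python
def hermite_cubic_coefficients(f0, f1, d0, d1):
    """Closed-form solution of the 4x4 system for a cubic on a unit
    cell with values f0, f1 and derivatives d0, d1 at its ends.
    Illustrative only: this replaces a per-call numpy.linalg.solve."""
    a0 = f0
    a1 = d0
    a2 = 3.0 * (f1 - f0) - 2.0 * d0 - d1
    a3 = 2.0 * (f0 - f1) + d0 + d1
    return a0, a1, a2, a3


def evaluate_cubic(coeffs, t):
    """Horner evaluation of the cubic for t in [0, 1] within the cell."""
    a0, a1, a2, a3 = coeffs
    return a0 + t * (a1 + t * (a2 + t * a3))
```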

This PR is initially marked as a draft to get feedback on the implementation of the 1D caching object; once the implementation details for that are approved, I'll apply them to the 2D and 3D objects too.

The caching objects are essentially on-demand samplers and interpolators. They have a number of small but important differences from the raysect interpolators:

  • Sampling is done on a uniform grid, rather than an arbitrary set of coordinates passed in at initialisation time. The uniformity allows for some optimisations not possible with arbitrary coordinates, in particular analytic calculation of the nearest coordinates to the requested point. This removes the need for a bisection search to find the nearest coordinates, which in turn eliminates the need to store the coordinates at all, saving some memory (see the first sketch after this list).
  • The spatial extent of the function to be cached is potentially very large compared with the desired sample resolution. Unlike the raysect interpolators, the caller will not already have all the required sampled data in memory, which motivates an object that is as memory efficient as possible. The raysect interpolators cache the coefficients used for the spline interpolation, but for the 3D interpolator this produces a 6D array; at high sample resolutions this can easily exhaust the program's available memory. As such, I propose to take the performance hit of re-calculating the coefficients on every evaluation: this reduces the storage requirement to a 3D array of sampled values instead of the 6D array of coefficients (see the second sketch after this list).
  • The upper and lower bounds of the function are not known at initialisation time. The original caching objects provide a function_boundaries argument for the user to specify these bounds and normalise the results to mitigate numerical issues. This seems error-prone to me: it's all too easy to get the bounds wrong and end up with an incorrect normalisation. Whether that is detrimental to accuracy is unclear to me (after all, the results are un-normalised before being returned), but I'm currently inclined to drop support for this for the sake of simplicity.
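
To illustrate the first point, here is a minimal sketch (all names hypothetical, not from the actual implementation) of locating the bounding sample index analytically on a uniform grid:

```python
def lower_sample_index(x, x_min, dx, n_samples):
    """Locate the lower bounding sample index for point x on a uniform
    grid starting at x_min with spacing dx, with no bisection search
    and no stored coordinate array."""
    index = int((x - x_min) / dx)
    # Clamp so that index and index + 1 are both valid sample indices.
    return max(0, min(index, n_samples - 2))

# The coordinates themselves never need to be stored, since
# x_i = x_min + i * dx can be reconstructed on demand.
```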
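
And a back-of-the-envelope illustration of the second point, assuming tricubic interpolation (4 x 4 x 4 coefficients per cell) and float64 storage:

```python
n = 500  # samples per axis
samples_gb = n ** 3 * 8 / 1e9                # 3D array of sampled values
coeffs_gb = (n - 1) ** 3 * 4 ** 3 * 8 / 1e9  # 6D array of tricubic coefficients
print(samples_gb, coeffs_gb)  # ~1 GB of samples vs ~64 GB of coefficients
```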

I'm happy to receive feedback on these design decisions.

Avoid inverting the spline matrix every time, and use some raysect
library functions for doing the interpolation between points.
Exploit uniform spacing of samples to:
* Avoid bisecting search for nearest index in sample array
* Drop the array of sampled x points and calculate them analytically
  instead. This will reduce memory usage (useful for higher dimension
  caching objects).