Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: unify Learner1D, Learner2D and LearnerND #220

Open
wants to merge 105 commits into
base: main
Choose a base branch
from

Conversation

jbweston
Copy link
Contributor

@jbweston jbweston commented Oct 6, 2019

While writing the adaptive paper we managed to write down a simple algorithm formulated in terms of abstract domains and subdomains.

Implementing such an algorithm in a learner should allow us to unify Learner1D, Learner2D and LearnerND.

@jbweston jbweston force-pushed the feature/unify-learners branch from cd579ae to 1e633cd Compare October 6, 2019 11:49
@jbweston
Copy link
Contributor Author

jbweston commented Oct 6, 2019

The following are not yet supported:

  • 2D and 3D domains
  • BalancingLearner (requires removing points from the interior of domains)

@akhmerov
Copy link
Contributor

akhmerov commented Oct 6, 2019

Also compared to the existing implementation, global rescaling is missing. (e.g. computing y-scale of the values, and normalizing the data by it)

@jbweston
Copy link
Contributor Author

jbweston commented Oct 6, 2019

Also compared to the existing implementation, global rescaling is missing. (e.g. computing y-scale of the values, and normalizing the data by it)

Should this perhaps be something that the loss function does itself? For example the isosurface loss needs the unmodified data. I could imagine extending the loss function signature to include the x and y scales.

@akhmerov
Copy link
Contributor

akhmerov commented Oct 6, 2019

I could imagine extending the loss function signature to include the x and y scales.

Indeed, but then the learner needs two extra hooks: one at each step to update global metrics, and another one to trigger loss recomputation for all subdomains once the global metrics changes sufficiently much so that old losses become obsolete.

@jbweston
Copy link
Contributor Author

jbweston commented Oct 6, 2019

Indeed, but then the learner needs two extra hooks: one at each step to update global metrics, and another one to trigger loss recomputation for all subdomains once the global metrics changes sufficiently much so that old losses become obsolete.

I added this now

@jbweston
Copy link
Contributor Author

jbweston commented Oct 8, 2019

TODO

  • don't evaluate boundary points in __init__
  • revisit the loss function signature
  • add tests

@basnijholt
Copy link
Member

All other learners implement pending_points which is a set.

Would that change anything? Now I see you set self.data[x] = None.

@jbweston
Copy link
Contributor Author

jbweston commented Oct 13, 2019

Now I believe the only 3 things left to do are:

  • decide on the loss function signature (at the moment the loss function gets all data etc.
  • implement scaling within the loss functions
  • whether to overwrite the existing LearnerND with this one (Bas says yes)

@jbweston
Copy link
Contributor Author

jbweston commented Oct 13, 2019

There's also the question of how to review this MR. It's a lot of code.

It may also be that this implementation is inferior to the current LearnerND, and we don't want to merge it at all. For example, the old LearnerND is 1180 lines, whereas this new implementation in 1350 (including the priority queue, domain definitions etc)

This case is very common when inserting 1 point and then splitting
at the same point.
This change gives a ~25% speedup to the new LearnerND.
vectors = np.subtract(coords[1:], coords[0])
if np.linalg.matrix_rank(vectors) < dim:
raise ValueError(
"Initial simplex has zero volumes "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessarily a single simplex at this point; this should read "Hull has a zero volume", right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This error message was already wrong before I touched it. Maybe I should update it

__all__ = ["Domain", "Interval", "ConvexHull"]


class Domain(metaclass=abc.ABCMeta):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to add a class docstring with a description of the important properties of this object.

# in 'sub_intervals' in a SortedList.
self.bounds = (a, b)
self.sub_intervals = dict()
self.points = SortedList([a, b])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would in principle be sufficient to store only the boundaries of the interval and always query for pairs of neighboring points. The current implementation is quite OK OTOH.

Before this fix when telling more than 1 point it was possible
that we would attempt to remove subdomains from the queue that
were only produced "temporarily" when adding points.
These are not catching an interesting class of bugs for now.
We should reintroduce these when we come up with a firmer learner
API.
This does not test very degenerate domains, but for the moment we just
want to test that everything works for domains defined over hulls with
different numbers of points. Adaptive will typically not be used on
very degenerate domains.
We now use numpy.random for generating random points, which produces
much less degenerate examples than using Hypothesis' "floats" strategy.
At this point we don't want to test super awkward cases.
@akhmerov
Copy link
Contributor

Re: loss function format

In ND we can pass the following data:

  1. The original simplex coordinates
  2. An array of the following tuples:
  • Coordinates of the point being replaced
  • Index of the simplex in which a point is replaced
  • Number of the point being replaced

In 1D we probably should adopt the naturally ordered format.

@akhmerov akhmerov mentioned this pull request Dec 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants