-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Map reduce index bug #790
Map reduce index bug #790
Conversation
… _jit_tree function to return indices as well, changed the reduce method to keep track of indices, added a line plt.show() to pounce.py
… _jit_tree function to return indices as well, changed the reduce method to keep track of indices, added a line plt.show() to pounce.py
…e method returns coreset with indices bigger than the leafsize as was present in #779. This test fails when that bug is present and passes when it is not present
… calculated index if base solver returns a Coresubset object and not just a Coreset object.
…y when _indices is not none. _jit_tree now returns jnp.arange(len(dataset))[indices] instead of indices so that it is within bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for spotting this bug, I should have caught this earlier! I think the issue is more fundamental to how Coresubset.coreset
operates, and the error can be fixed by the following change (subject to correcting the coreset materialization behaviour).
# old
_coreset = jtu.tree_map(jnp.concatenate, coreset_ensemble)
return _reduce_coreset(_coreset.coreset)
# proposed
_coreset = jtu.tree_map(jnp.concatenate, coreset_ensemble.coreset)
return _reduce_coreset(_coreset)
I'd like to have a look today to see if there is a nice change that can be made to the Coresubset.coreset
materialization behaviour, before the specific changes proposed here are considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having taken a look yesterday, I can't find a particularly satisfying solution to the materialization behaviour. We could add specific logic for handling "batched" coresets in the coresubset materialization method Coresubset.coreset
, but this would require us to handle a variety of different behaviours. E.G. what happens when we have unbatched nodes/indices, but a batched pre_coreset_data
, what happens when the nodes/indices are batched but pre_coreset_data
is not (this is the specific problem in our case), etc... If there is wider interest in properly handling batched data, then this may be worth revisiting in a separate issue, perhaps at the same time as #765.
In any case, I'm pleased you've identified the problem (which also indicates the existing MapReduce tests are insufficient) and proposed a solution in this PR, which I will be glad to review today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few more comments on top of those from @bk958178. Nothing major, but we might need to look at improving the solver tests (the scope of which may exceed this PR).
@qh681248 This is a significant bug fix. Don't forget to add it to the CHANGELOG. |
…dex] to padded_dataset[index] removed unnessary comments and changed variable names to be more descriptive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qh681248 Thanks for addressing my previous comments. I've left a few further suggestions on your latest changes.
Please update the CHANGELOG.md. Under 'Fixed', add a bulletpoint describing the bug-fix you've addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tc85324 if we haven’t heard from you by Thursday, we’ll assume you’re okay with not reviewing these changes before the v0.3 release. Any changes you request later will be addressed and included in future releases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qh681248 Please see below a few further comments to reduce the number of new lines added
Performance reviewCommit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this analytic test, @qh681248. Your description is very thorough. Please see some initial comments and suggested changes below.
I'll allow time for @tc85324 to review the analytic test as well, and share any thoughts or comments too.
@tc85324 if we haven’t heard from you by Thursday, we’ll assume you’re okay with not reviewing these changes before the v0.3 release. Any changes you request later will be addressed and included in future releases.
Unfortunately I don't have the time right now to review this properly. I did have a quick look though and the new tests appear to do what I was hoping for, thanks @qh681248!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Approving and merging
Dismissing as re-reviewed and happy with changes
PR Type
Description
In coreax/solvers/composite.py, added logic to keep track of indices in the reduce method of MapReduce class, this resolves the Fix MapReduce solver Indexing bug #779
How Has This Been Tested?
Added test_mapreduce_diverse_selection in tests/units/test_solvers.py. It only passes when MapReduce contains datapoints with indices higher than the size of the leaf_node.
Does this PR introduce a breaking change?
No
Screenshots
(Write your answer here.)
Checklist before requesting a review