Skip to content

Conversation

@qh681248
Copy link
Contributor

PR Type

  • Bugfix

Description

In coreax/solvers/composite.py, added logic to keep track of indices in the reduce method of MapReduce class, this resolves the Fix MapReduce solver Indexing bug #779

How Has This Been Tested?

Added test_mapreduce_diverse_selection in tests/units/test_solvers.py. It only passes when MapReduce contains datapoints with indices higher than the size of the leaf_node.

Does this PR introduce a breaking change?

No

Screenshots

(Write your answer here.)

Checklist before requesting a review

  • I have made sure that my PR is not a duplicate.
  • My code follows the style guidelines of this project.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have performed a self-review of my code.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.

… _jit_tree function to return indices as well, changed the reduce method to keep track of indices, added a line plt.show() to pounce.py
… _jit_tree function to return indices as well, changed the reduce method to keep track of indices, added a line plt.show() to pounce.py
…e method returns coreset with indices bigger than the leafsize as was present in #779. This test fails when that bug is present and passes when it is not present
… calculated index if base solver returns a Coresubset object and not just a Coreset object.
@CLAassistant
Copy link

CLAassistant commented Sep 25, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions
Copy link
Contributor

Performance review

Commit be10c22 - Merge 08f666a into 41b3922

No significant changes to performance.

@github-actions
Copy link
Contributor

Performance review

Commit e4dfdbb - Merge 7d41a4b into 41b3922

No significant changes to performance.

…y when _indices is not none. _jit_tree now returns jnp.arange(len(dataset))[indices] instead of indices so that it is within bounds.
@github-actions
Copy link
Contributor

Performance review

Commit dde71ce - Merge 48a1037 into 41b3922

No significant changes to performance.

@qh681248 qh681248 marked this pull request as ready for review September 26, 2024 14:03
@qh681248 qh681248 linked an issue Sep 26, 2024 that may be closed by this pull request
@github-actions
Copy link
Contributor

Performance review

Commit cff8624 - Merge 1bd8501 into 41b3922

No significant changes to performance.

@bk958178 bk958178 self-requested a review October 1, 2024 08:42
@tc85324 tc85324 self-requested a review October 1, 2024 08:50
Copy link
Contributor

@tc85324 tc85324 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this bug, I should have caught this earlier! I think the issue is more fundamental to how Coresubset.coreset operates, and the error can be fixed by the following change (subject to correcting the coreset materialization behaviour).

           # old
            _coreset = jtu.tree_map(jnp.concatenate, coreset_ensemble)
            return _reduce_coreset(_coreset.coreset)
           # proposed
            _coreset = jtu.tree_map(jnp.concatenate, coreset_ensemble.coreset)
            return _reduce_coreset(_coreset)

I'd like to have a look today to see if there is a nice change that can be made to the Coresubset.coreset materialization behaviour, before the specific changes proposed here are considered.

Copy link
Contributor

@tc85324 tc85324 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having taken a look yesterday, I can't find a particularly satisfying solution to the materialization behaviour. We could add specific logic for handling "batched" coresets in the coresubset materialization method Coresubset.coreset, but this would require us to handle a variety of different behaviours. E.G. what happens when we have unbatched nodes/indices, but a batched pre_coreset_data, what happens when the nodes/indices are batched but pre_coreset_data is not (this is the specific problem in our case), etc... If there is wider interest in properly handling batched data, then this may be worth revisiting in a separate issue, perhaps at the same time as #765.

In any case, I'm pleased you've identified the problem (which also indicates the existing MapReduce tests are insufficient) and proposed a solution in this PR, which I will be glad to review today.

Copy link
Contributor

@bk958178 bk958178 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qh681248 See requested changes in the comments below.

@tc85324 Thanks in advance for adding your own review!

@github-actions
Copy link
Contributor

github-actions bot commented Oct 3, 2024

Performance review

Commit 047d814 - Merge 082ada7 into b042b34

No significant changes to performance.

tc85324
tc85324 previously requested changes Oct 3, 2024
Copy link
Contributor

@tc85324 tc85324 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few more comments on top of those from @bk958178. Nothing major, but we might need to look at improving the solver tests (the scope of which may exceed this PR).

@github-actions
Copy link
Contributor

github-actions bot commented Oct 3, 2024

Performance review

Commit a8198a2 - Merge e5f5122 into b042b34

No significant changes to performance.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 3, 2024

Performance review

Commit 8099ad0 - Merge da827a7 into b042b34

No significant changes to performance.

@tp832944
Copy link
Contributor

tp832944 commented Oct 3, 2024

@qh681248 This is a significant bug fix. Don't forget to add it to the CHANGELOG.

…dex] to padded_dataset[index]

 removed unnessary comments and changed variable names to be more descriptive
@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2024

Performance review

Commit 2b40d81 - Merge c187818 into 093f3d1

No significant changes to performance.

Copy link
Contributor

@bk958178 bk958178 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qh681248 Thanks for addressing my previous comments. I've left a few further suggestions on your latest changes.

Please update the CHANGELOG.md. Under 'Fixed', add a bulletpoint describing the bug-fix you've addressed.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2024

Performance review

Commit 71c2173 - Merge cf537ed into ba4a3c1

No significant changes to performance.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2024

Performance review

Commit 2495aa6 - Merge c894e07 into 7aa60a7

No significant changes to performance.

@qh681248 qh681248 requested review from bk958178 and tc85324 October 9, 2024 15:48
@qh681248 qh681248 linked an issue Oct 10, 2024 that may be closed by this pull request
Copy link
Contributor

@bk958178 bk958178 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this analytic test, @qh681248. Your description is very thorough. Please see some initial comments and suggested changes below.

I'll allow time for @tc85324 to review the analytic test as well, and share any thoughts or comments too.

@github-actions
Copy link
Contributor

Performance review

Commit e1f439b - Merge a5fb1be into 3463539

No significant changes to performance.

@bk958178
Copy link
Contributor

Thanks for adding this analytic test, @qh681248. Your description is very thorough. Please see some initial comments and suggested changes below.

I'll allow time for @tc85324 to review the analytic test as well, and share any thoughts or comments too.

@tc85324 if we haven’t heard from you by Thursday, we’ll assume you’re okay with not reviewing these changes before the v0.3 release. Any changes you request later will be addressed and included in future releases.

Copy link
Contributor

@bk958178 bk958178 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qh681248 Please see below a few further comments to reduce the number of new lines added

@github-actions
Copy link
Contributor

Performance review

Commit a633bb1 - Merge 2a9a1c4 into 3463539

Statistically significant changes

  • basic_rpc:
    • OLD: compilation 0.6253 units ± 0.03252 units; execution 0.142 units ± 0.001823 units
    • NEW: compilation 0.6154 units ± 0.02592 units; execution 0.1583 units ± 0.001707 units
    • Significant increase in execution time (11.49%, p=5.977e-14)

Normalisation values for new data:
Compilation: 1 unit = 374.45 ms
Execution: 1 unit = 800.36 ms

Copy link
Contributor

@tc85324 tc85324 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this analytic test, @qh681248. Your description is very thorough. Please see some initial comments and suggested changes below.

I'll allow time for @tc85324 to review the analytic test as well, and share any thoughts or comments too.

@tc85324 if we haven’t heard from you by Thursday, we’ll assume you’re okay with not reviewing these changes before the v0.3 release. Any changes you request later will be addressed and included in future releases.

Unfortunately I don't have the time right now to review this properly. I did have a quick look though and the new tests appear to do what I was hoping for, thanks @qh681248!

Copy link
Contributor

@bk958178 bk958178 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Approving and merging

@bk958178 bk958178 dismissed tc85324’s stale review October 18, 2024 07:43

Dismissing as re-reviewed and happy with changes

@bk958178 bk958178 merged commit ca26716 into main Oct 18, 2024
23 checks passed
@bk958178 bk958178 deleted the MapReduce-index-bug branch October 18, 2024 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve mock_reduce on test for MapReduce Fix MapReduce solver Indexing bug

5 participants