Tuning for multiple columns part 2: Find candidate parameters for multiple aggregation #524

Merged
merged 8 commits into from
Sep 10, 2024


dvadym
Collaborator

@dvadym dvadym commented Aug 30, 2024

This PR adds finding candidate parameters during tuning for the case when utility analysis is computed for multiple metrics and SUM can be computed over multiple columns. This covers cases where the DP aggregations can be expressed in pseudo-SQL as

SELECT partition_key, DP_COUNT(), DP_SUM(column1), DP_SUM(column2)
GROUP BY partition_key
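
To make the shape of this query concrete, here is a minimal non-private Python sketch of what it computes per partition: one count plus one sum per value column. It is an illustration only: the function name `count_and_sums` and the row layout `(partition_key, (v1, v2, ...))` are assumptions, and a DP version would additionally bound each privacy unit's contributions and add noise, which this sketch deliberately omits.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def count_and_sums(
    rows: Iterable[Tuple[Hashable, Sequence[float]]],
    n_columns: int,
) -> Dict[Hashable, Tuple[int, List[float]]]:
    """Hypothetical non-private analogue of
    SELECT partition_key, COUNT(), SUM(column1), ..., SUM(columnN) GROUP BY partition_key.

    No contribution bounding and no noise: this only shows which aggregates exist.
    """
    counts: Dict[Hashable, int] = defaultdict(int)
    sums: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0] * n_columns)
    for partition_key, values in rows:
        if len(values) != n_columns:
            raise ValueError(f"expected {n_columns} values, got {len(values)}")
        # One COUNT per partition.
        counts[partition_key] += 1
        # One SUM per value column, accumulated independently.
        column_sums = sums[partition_key]
        for i, value in enumerate(values):
            column_sums[i] += value
    return {key: (counts[key], sums[key]) for key in counts}
```

For example, rows `("a", (1.0, 10.0))`, `("a", (2.0, 20.0))`, `("b", (5.0, 0.5))` with `n_columns=2` give `{"a": (2, [3.0, 30.0]), "b": (1, [5.0, 0.5])}`. Tuning for this case means choosing bounding parameters that serve the COUNT and every SUM column at once.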

@dvadym dvadym changed the title from "(WIP) Find candidate parameters for multiple aggregation" to "Tuning for multiple columns part 2: Find candidate parameters for multiple aggregation" Sep 4, 2024

# This is Select partitions case.
return self._get_strategy_for_select_partition(sensitivities.l0)

n_metrics = len(self._metrics)
# Having n metrics is equivalent of multiplying of contributing for
Contributor


nit: the equivalent of... or equivalent to

Collaborator Author


thx
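
The code comment discussed in this thread is cut off in the excerpt, so its full statement is not reproduced here. As a general illustration of why the number of metrics matters (an assumption about the setting, not a description of the PR's code): if a total budget ε is split evenly across n metrics, each released with the Laplace mechanism, each metric receives ε/n, so its noise scale equals that of a single metric whose sensitivity is multiplied by n. The helper names below are hypothetical.

```python
def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Laplace mechanism noise scale b = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def per_metric_scale(sensitivity: float, total_epsilon: float, n_metrics: int) -> float:
    """Noise scale per metric, assuming a hypothetical even split of total_epsilon."""
    if n_metrics <= 0:
        raise ValueError("n_metrics must be positive")
    # Each metric gets total_epsilon / n_metrics, which scales the noise up by n_metrics.
    return laplace_scale(sensitivity, total_epsilon / n_metrics)
```

Under this even-split assumption, `per_metric_scale(2.0, 1.0, 3)` matches `laplace_scale(2.0 * 3, 1.0)` up to floating-point rounding.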

num=max_candidates)).astype(int)
# In order to ensure that max_sum_per_partition > 0, let us skip 0-th
# bin if max = 0.
# TODO(dvadym): better algorithm for finding candidates.
Contributor


Do you have some ideas here already?

Collaborator Author


Yes, I do. There is ongoing work in this direction.
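
The TODO above concerns how candidates for `max_sum_per_partition` are chosen. The excerpt shows only the tail of the call (`num=max_candidates)).astype(int)`), so the sketch below is not the PR's code. It illustrates one common approach under stated assumptions: geometrically spaced candidates between 1 and the observed maximum, rounded to integers, deduplicated, and kept strictly positive (in the spirit of the `max_sum_per_partition > 0` comment). The name `sum_bound_candidates` is hypothetical.

```python
import numpy as np


def sum_bound_candidates(max_value: float, max_candidates: int) -> list:
    """Hypothetical helper: up to max_candidates integer bounds in [1, max(max_value, 1)]."""
    # Use 1 as the upper end when the observed maximum is below 1 (e.g. max = 0),
    # so every candidate stays strictly positive.
    upper = max(float(max_value), 1.0)
    # Geometric spacing gives finer resolution for small bounds.
    points = np.geomspace(1.0, upper, num=max_candidates)
    # Round to the nearest integer; np.unique sorts and drops rounding duplicates.
    candidates = np.unique(np.rint(points).astype(int))
    return [int(c) for c in candidates if c > 0]
```

For example, `sum_bound_candidates(1000, 4)` returns `[1, 10, 100, 1000]`, and `sum_bound_candidates(0, 5)` returns `[1]`. Because rounding can merge nearby points, fewer than `max_candidates` values may come back for small maxima.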

Collaborator Author

@dvadym dvadym left a comment


Thanks!

@dvadym dvadym merged commit 916bd8e into OpenMined:main Sep 10, 2024
6 of 14 checks passed