Tuning for multiple columns part 3: Utility analysis for multiple aggregation #525

Merged: 13 commits, Sep 12, 2024

@dvadym (Collaborator) commented Sep 10, 2024

This PR adds utility analysis for the case of several SUM aggregations. It covers DP aggregations that can be written in pseudo-SQL as

SELECT partition_key, DP_COUNT(), DP_SUM(column1), DP_SUM(column2)
GROUP BY partition_key

The PR contains the following changes:

  1. With multiple sums, min_sum_per_partition and max_sum_per_partition are sequences (instead of floats).
  2. A SumCombiner is created for each sum (i.e. for each coordinate of the sequences in 1).
  3. CompoundCombiner keeps track of the sparse representation as before. Each SumCombiner receives a 2D array of values for all columns and extracts the values for its own column.
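
The three changes above can be pictured with a small sketch. This is not the PipelineDP implementation: `check_bounds` and `ColumnSum` are hypothetical stand-ins, and the clipping step assumes that each per-column combiner clips the partition sum of its own column to that column's [min, max] bounds.

```python
from typing import List, Sequence, Tuple, Union

Bound = Union[float, Sequence[float]]


def check_bounds(min_sum: Bound,
                 max_sum: Bound) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: normalizes bounds to two equal-length lists."""

    def as_list(bound: Bound) -> List[float]:
        if isinstance(bound, (int, float)):
            return [float(bound)]
        return [float(x) for x in bound]

    lows, highs = as_list(min_sum), as_list(max_sum)
    if len(lows) != len(highs):
        raise ValueError("min and max bounds must have the same length.")
    return lows, highs


class ColumnSum:
    """Hypothetical per-column combiner, one instance per SUM column."""

    def __init__(self, column: int, low: float, high: float):
        self.column, self.low, self.high = column, low, high

    def partition_sum(self, rows: Sequence[Sequence[float]]) -> float:
        # Receives the values of all columns, uses only its own column.
        total = sum(row[self.column] for row in rows)
        # Assumption: the per-partition sum is clipped to [low, high].
        return min(max(total, self.low), self.high)


# Two SUM columns, so both bounds are sequences of length 2.
lows, highs = check_bounds([0.0, 0.0], [10.0, 5.0])
combiners = [
    ColumnSum(i, low, high) for i, (low, high) in enumerate(zip(lows, highs))
]
rows = [(3.0, 4.0), (4.0, 4.0)]  # one row per contribution, one value per column
sums = [c.partition_sum(rows) for c in combiners]
```

Here the first column sums to 7.0 and stays within its bounds, while the second sums to 8.0 and is clipped to 5.0.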

@dvadym changed the title from "(WIP) Tuning for multiple columns part 3: Utility analysis for multiple aggregation" to "Tuning for multiple columns part 3: Utility analysis for multiple aggregation" Sep 11, 2024
if size1 is None or size2 is None or size1 != size2:
    raise ValueError("If elements of min_sum_per_partition and "
                     "max_sum_per_partition are sequences, then"
                     " they must have the same length.")
Contributor

nit: put the whitespace at the end of the line, as in the line above

Collaborator Author

This is a multi-line string literal that contains no newlines, so it doesn't matter where the spaces go.
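
A quick way to check this (a minimal sketch, not part of the PR; the variable names are only for illustration): Python joins adjacent string literals into one string, so a separating space gives the same result at the end of one fragment or at the start of the next.

```python
# Adjacent string literals are concatenated; no newline is inserted.
space_at_end = ("max_sum_per_partition are sequences, then "
                "they must have the same length.")
space_at_start = ("max_sum_per_partition are sequences, then"
                  " they must have the same length.")

same = space_at_end == space_at_start
has_newline = "\n" in space_at_start
```

Both spellings produce the same message, so the placement is purely a style question.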

analysis/parameter_tuning.py (outdated, resolved)
analysis/per_partition_combiners.py (outdated, resolved; 2 threads)

@dvadym left a comment

Thanks for the review!

@dvadym merged commit b09c365 into OpenMined:main Sep 12, 2024
6 of 14 checks passed
2 participants