[BUG] df.groupby(['a']).agg({'b': ['mean', 'mean']}) only produces a single column (as opposed to 2) #17649

Open
MarcoGorelli opened this issue Dec 21, 2024 · 3 comments
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@MarcoGorelli
Contributor

MarcoGorelli commented Dec 21, 2024

Describe the bug
In cuDF, df.groupby(['a']).agg({'b': ['mean', 'mean']}) returns a single 'mean' column, whereas pandas returns two.

Steps/Code to reproduce bug

import cudf

df = cudf.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
df.groupby(['a']).agg({'b': ['mean', 'mean']})

outputs

      b
   mean
a
1   4.5
2   6.0

Expected behavior

what pandas does:

      b
   mean  mean
a
1   4.5   4.5
2   6.0   6.0
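
For comparison, the pandas side of this can be reproduced with a minimal sketch like the one below. It only swaps `cudf` for `pandas` in the snippet from the report; the variable name `result` is introduced here for illustration.

```python
import pandas as pd

# Same data and aggregation as in the report, but using pandas.
df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
result = df.groupby(['a']).agg({'b': ['mean', 'mean']})

# pandas keeps both requested aggregations, so the result has two
# columns that share the same ('b', 'mean') label.
print(result.columns.tolist())
print(result.shape)
```

The duplicated `('b', 'mean')` label is what cuDF would need to represent in order to match this output.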

Spotted in Narwhals

Environment overview (please complete the following information)

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
  • Method of cuDF install: [conda, Docker, or from source]
    • If method of install is [Docker], provide docker pull & docker run commands used

24.10.01, google colab

@MarcoGorelli MarcoGorelli added the bug Something isn't working label Dec 21, 2024
@galipremsagar
Contributor

To support this case we would need to add support for duplicate column names in cuDF, which I don't think will be supported in the near future. @mroeschke does pandas intend to continue supporting the duplicate column name use case?

@MarcoGorelli
Contributor Author

thanks for your response

tbf now that you've added support for NamedAgg in cuDF, we can probably rewrite how we do group_by in Narwhals for pandas-like libraries and avoid the duplicate column names completely
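
A rough sketch of what the NamedAgg route could look like, written against pandas since that is where `NamedAgg` is defined. The output names `b_mean_1` and `b_mean_2` are hypothetical, and this is not how Narwhals actually implements group_by; it is assumed, per the comment above, that cuDF accepts the same call.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})

# Give each aggregation its own output name (hypothetical names), so the
# same function can be applied twice without producing duplicate labels.
result = df.groupby('a').agg(
    b_mean_1=pd.NamedAgg(column='b', aggfunc='mean'),
    b_mean_2=pd.NamedAgg(column='b', aggfunc='mean'),
)
print(result)
```

Because every output column gets a unique name up front, the caller can rename columns afterwards if needed, and the backend never has to hold two columns with the same label.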

> @mroeschke does pandas intend to continue supporting duplicate column name use-case?

the last I remember hearing about this was that duplicate column names are a fact of life and should continue to be supported (though to be honest I think they create more problems than they solve)

@mroeschke mroeschke added the Python Affects Python cuDF API. label Dec 30, 2024
@mroeschke
Contributor

Yeah, agreed with Marco: I think pandas will continue to support duplicate column names for the foreseeable future
