feature: cols=:union argument (or something like it) for combine with AsTable #3005

Open
kleinschmidt opened this issue Feb 16, 2022 · 7 comments · May be fixed by #3258

@kleinschmidt
Contributor

I have a function myfun that operates on one or more rows from my data frame and returns Tables.jl-compliant output. The output columns will not always be the same for every input. I'd like to be able to do something like

combine(groupby(df, [:group1, :group2]), [:input1, :input2] => myfun => AsTable)

But I get an error that the keys must all be the same, and there's no cols argument to control how that is handled.

If I want to do this manually, I can do something like (thanks to @bkamins for suggesting):

reduce(vcat, [insertcols!(DataFrame(myfun(v.input1, v.input2)), pairs(NamedTuple(k))...) for (k, v) in pairs(groupby(df, [:group1, :group2]))]; cols=:union)

but that's pretty clunky, and you miss out on the nice transform syntax (you have to write v.input1 etc. manually).
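For concreteness, here is a minimal runnable sketch of that manual workaround. The toy data and the body of myfun are hypothetical stand-ins (only the column names come from the description above); the point is only that different groups return different columns and cols=:union fills the gaps with missing.

```julia
using DataFrames

# Hypothetical toy data; only the column names follow the issue text.
df = DataFrame(group1=[1, 1, 2], group2=["a", "a", "b"],
               input1=[1, 2, 3], input2=[10, 20, 30])

# Hypothetical stand-in for myfun: the returned columns depend on the input.
myfun(x, y) = length(x) > 1 ? (total=[sum(x) + sum(y)],) :
                              (single=[first(x) * first(y)],)

gdf = groupby(df, [:group1, :group2])

# One small data frame per group, with the group key values appended as columns.
parts = [insertcols!(DataFrame(myfun(v.input1, v.input2)), pairs(NamedTuple(k))...)
         for (k, v) in pairs(gdf)]

# cols=:union keeps every column seen in any part and fills gaps with missing.
result = reduce(vcat, parts; cols=:union)
```

The result has one row per group, with both a total and a single column, each missing where the group did not produce it.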

I propose adding a cols kwarg to combine to control how keys from AsTable are handled, although that may be a bit punny.

@bkamins bkamins added this to the 1.x milestone Feb 16, 2022
@bkamins
Member

bkamins commented Feb 16, 2022

I will have to think about the best design of this feature. The challenge is that the assumption that all elements have the same set of columns is important for performance.

@kleinschmidt
Contributor Author

It's also possible that another API function might better express this; unnest(df, :tblcol) is one possibility.

@bkamins
Member

bkamins commented Feb 17, 2022

This is something that I was thinking about. In unnest(data_frame, :col), if we wanted to be consistent with AsTable, we would rely on what keys returns to identify column names. But I am not sure if this is the best approach.

Let us discuss what logic for identifying fields would be best. The issue is that keys works nicely with Dict, but not with a struct. But if we go for propertynames, the problem arises in the opposite direction.
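A small sketch of that asymmetry, using only Base Julia. The Point struct and the variable names are hypothetical, introduced just for illustration:

```julia
# Hypothetical struct used only to illustrate the point above.
struct Point
    x::Int
    y::Int
end

d = Dict(:x => 1, :y => 2)
p = Point(1, 2)

dict_keys = Set(keys(d))            # the entries we would want as column names
struct_props = propertynames(p)     # the struct's fields, (:x, :y)

# propertynames on a Dict reports the Dict's internal fields, not its entries.
dict_props = propertynames(d)

# keys is not defined for a plain struct, so this call throws a MethodError.
keys_failed = try
    keys(p)
    false
catch err
    err isa MethodError
end
```

A NamedTuple answers both keys and propertynames with the same names, which is why neither choice is a problem for that case.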

@bkamins
Member

bkamins commented Feb 17, 2022

x-ref #2890

@bkamins
Member

bkamins commented Feb 20, 2022

Is this what you want?

julia> df = DataFrame(nested=[(a=1, b=2), (b=3, c=4), (a=5, c=6)])
3×1 DataFrame
 Row │ nested
     │ NamedTup…      
─────┼────────────────
   1 │ (a = 1, b = 2)
   2 │ (b = 3, c = 4)
   3 │ (a = 5, c = 6)

julia> transform(df, :nested => Tables.dictrowtable => AsTable)
3×4 DataFrame
 Row │ nested          a        b        c       
     │ NamedTup…       Int64?   Int64?   Int64?  
─────┼───────────────────────────────────────────
   1 │ (a = 1, b = 2)        1        2  missing 
   2 │ (b = 3, c = 4)  missing        3        4
   3 │ (a = 5, c = 6)        5  missing        6

If yes, then we already have it. I have, though, opened JuliaData/Tables.jl#274 to allow for better control of the resulting column order.

@bkamins
Member

bkamins commented Mar 3, 2022

@kleinschmidt When JuliaData/Tables.jl#274 is merged, can you please confirm that it gives you the functionality you need?
Also, do you think this pattern is enough, or would you want to see an unnest function that would do something roughly like the following (of course the details will be more complex, which is why adding unnest might be useful):

unnest(df, col::SingleColumnIndex) = select(df, Not(col), col => Tables.dictrowtable => AsTable)

@bkamins bkamins modified the milestones: 1.x, 1.5 Oct 14, 2022
@bkamins bkamins mentioned this issue Oct 14, 2022
@bkamins bkamins linked a pull request Dec 28, 2022 that will close this issue
@bkamins
Member

bkamins commented Feb 5, 2023

x-ref #3116 (we will need to jointly decide how to handle this)

@bkamins bkamins linked a pull request Feb 5, 2023 that will close this issue
@bkamins bkamins modified the milestones: 1.5, 1.6 Feb 5, 2023
@bkamins bkamins modified the milestones: 1.6, 1.7 Jul 10, 2023