add groupby and docs #373

Merged. 6 commits were merged on Dec 22, 2023.
10 changes: 5 additions & 5 deletions docs/src/dplyr.md
@@ -93,7 +93,7 @@ DataFramesMeta.jl macro | By-row version | Description | `dplyr` equivalent
`@subset` | `@rsubset` | filter rows | `filter`
`@orderby` | `@rorderby` | re-order or arrange rows | `arrange`
`@combine` | | summarise values | `summarize` (but `@combine` is more flexible)
`groupby` | | allows for group operations in the "split-apply-combine" concept | `group_by`
`@groupby` | | allows for group operations in the "split-apply-combine" concept | `group_by`

# DataFramesMeta.jl Verbs In Action

@@ -341,15 +341,15 @@ DataFrames.jl also provides the function `describe` which performs many of these
describe(msleep)
```

## Group Operations using `groupby` and `@combine`
## Group Operations using `@groupby` and `@combine`

The `groupby` verb is an important function in DataFrames.jl (it does not live in DataFramesMeta.jl). As we mentioned before it's related to concept of "split-apply-combine". We literally want to split the data frame by some variable (e.g. taxonomic order), apply a function to the individual data frames and then combine the output.
The `@groupby` verb is the first step in the "split-apply-combine" workflow. We literally want to split the data frame by some variable (e.g. taxonomic order), apply a function to the individual data frames and then combine the output.

Let's do that: split the `msleep` data frame by the taxonomic order, then ask for the same summary statistics as above. We expect a set of summary statistics for each taxonomic order.

```@repl 1
@chain msleep begin
    groupby(:order)
    @groupby :order
    @combine begin
        :avg_sleep = mean(:sleep_total)
        :min_sleep = minimum(:sleep_total)
@@ -363,7 +363,7 @@ Split-apply-combine can also be used with `@transform` to add new variables to a

```@repl 1
@chain msleep begin
    groupby(:order)
    @groupby :order
    @transform :sleep_genus = :sleep_total .- mean(:sleep_total)
end
```
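A minimal, self-contained sketch of the same split-apply-combine pattern, assuming a small hypothetical stand-in for `msleep` (the column names mirror the docs, the values are made up). It shows that `@combine` returns one row per group while `@transform` keeps every row:

```julia
using DataFramesMeta, Statistics

# Hypothetical stand-in for `msleep`: real column names, made-up values
df = DataFrame(order = ["Primates", "Primates", "Rodentia", "Rodentia"],
               sleep_total = [10.0, 8.0, 14.0, 12.0])

# Split: one group per taxonomic order
gd = @groupby(df, :order)

# Apply + combine: one summary row per group
by_order = @combine(gd, :avg_sleep = mean(:sleep_total))

# Apply + transform: every row is kept, with a group-relative column added
deviations = @transform(gd, :sleep_rel = :sleep_total .- mean(:sleep_total))
```

`@transform` on a grouped data frame returns its rows in the order of the parent data frame, so `deviations` lines up row by row with `df`.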
27 changes: 21 additions & 6 deletions docs/src/index.md
@@ -16,6 +16,7 @@ In addition, DataFramesMeta provides
* Row-wise versions of the above macros in the form of `@rtransform`, `@rtransform!`,
`@rselect`, `@rselect!`, `@rorderby`, `@rsubset`, and `@rsubset!`.
* `@rename` and `@rename!` for renaming columns
* `@groupby` for grouping data
* `@by`, for grouping and combining a data frame in a single step
* `@with`, for working with the columns of a data frame with high performance and
convenient syntax
@@ -64,7 +65,7 @@ data frame.

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
gd = groupby(df, :x);
gd = @groupby(df, :x);
@select(df, :x, :y)
@select(df, :x2 = 2 * :x, :y)
@select(gd, :x2 = 2 .* :y .* first(:y))
@@ -98,7 +99,7 @@ data frame.

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
gd = groupby(df, :x);
gd = @groupby(df, :x);
@transform(df, :x2 = 2 * :x, :y)
@transform(gd, :x2 = 2 .* :y .* first(:y))
@transform!(df, :x, :y)
@@ -115,7 +116,7 @@ Select row subsets. Operates on both a `DataFrame` and a `GroupedDataFrame`.
```julia
using Statistics
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
gd = groupby(df, :x);
gd = @groupby(df, :x);
outside_var = 1;
@subset(df, :x .> 1)
@subset(df, :x .> outside_var)
@@ -134,11 +135,14 @@ acts like a `GroupedDataFrame` with one group.
Like `@select` and `@transform`, transformations are called with the keyword-like
syntax `:y = f(:x)`.

To group data into a `GroupedDataFrame`, use `@groupby`, a shorthand for
the DataFrames.jl function `groupby`.

Examples:

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
gd = groupby(df, :x);
gd = @groupby(df, :x);
@combine(gd, :x2 = sum(:y))
@combine(gd, :x2 = :y .- sum(:y))
@combine(gd, $AsTable = (n1 = sum(:y), n2 = first(:y)))
@@ -161,6 +165,17 @@ gd = groupby(df, :x);
@combine(gd, $AsTable = (a = sum(:x), b = sum(:y)))
```

### `@by`

Perform the grouping and combining operations in one step with `@by`.

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
@by df :x begin
    :x2 = sum(:y)
end
```
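To make the "one step" claim concrete, here is a hedged sketch comparing `@by` with an explicit `@groupby` followed by `@combine` on the same data. The output name `:y_sum` is chosen for illustration, and group order is not relied on: both results are sorted by `:x` before comparing.

```julia
using DataFramesMeta

df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102])

# One step: group by :x and summarise
one_step = @by(df, :x, :y_sum = sum(:y))

# Two steps: group with @groupby, then summarise with @combine
two_step = @combine(@groupby(df, :x), :y_sum = sum(:y))

# Compare the two results independently of group order
same = sort(one_step, :x) == sort(two_step, :x)
```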

## `@orderby`

Sort rows in a `DataFrame` by values in one of several columns or a
@@ -355,7 +370,7 @@ julia> @subset df @byrow begin
however, like with `ByRow` in DataFrames.jl, when `@byrow` is
used, functions do not take into account the grouping, so for
example the result of `@transform(df, @byrow :y = f(:x))` and
`@transform(groupby(df, :g), @byrow :y = f(:x))` is the same.
`@transform(@groupby(df, :g), @byrow :y = f(:x))` is the same.
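A small check of the statement above, assuming a hypothetical row-wise function `f` and a toy grouping column `:g`: with `@byrow`, the grouped and ungrouped calls produce the same result.

```julia
using DataFramesMeta

f(x) = x^2  # hypothetical row-wise function

df = DataFrame(g = [1, 1, 2], x = [1, 2, 3])

ungrouped = @transform(df, @byrow :y = f(:x))
grouped = @transform(@groupby(df, :g), @byrow :y = f(:x))

ungrouped == grouped  # true: grouping has no effect on a row-wise function
```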

## Propagating missing values with `@passmissing`

@@ -912,7 +927,7 @@ functions.
| `@subset` | `filter` | `Where` |
| `@transform` | `mutate` | `Select` (?) |
| `@by` | | `GroupBy` |
| `groupby` | `group_by` | `GroupBy` |
| `@groupby` | `group_by` | `GroupBy` |
| `@combine` | `summarise`/`do` | |
| `@orderby` | `arrange` | `OrderBy` |
| `@select` | `select` | `Select` |
1 change: 1 addition & 0 deletions src/DataFramesMeta.jl
@@ -21,6 +21,7 @@ export @with,
@distinct, @rdistinct, @distinct!, @rdistinct!,
@eachrow, @eachrow!,
@byrow, @passmissing, @astable, @kwarg,
@groupby,
@based_on, @where # deprecated

const DOLLAR = raw"$"
42 changes: 42 additions & 0 deletions src/macros.jl
@@ -860,7 +860,7 @@

### Examples

```jldoctest

Check failure on line 863 in src/macros.jl (GitHub Actions / build): doctest failure in `src/macros.jl:863-954`. For the `begin ... end` form of `@subset df` with conditions `:x .> globalvar` and `:y .== 3`, the evaluated output prints `0×2 DataFrame` followed by the column header (`Row │ x y`, `│ Int64 Int64`), while the expected output is `0×2 DataFrame` alone.
julia> using DataFramesMeta, Statistics

julia> df = DataFrame(x = 1:3, y = [2, 1, 2]);
@@ -976,7 +976,7 @@
Use this function as an alternative to placing the `.` to broadcast row-wise operations.

### Examples
```jldoctest

Check failure on line 979 in src/macros.jl (GitHub Actions / build): doctest failure in `src/macros.jl:979-1009` for `@rsubset df :A > 3 || :B == "pear"`. As captured in the plain-text log, the evaluated and expected outputs (a 3×2 DataFrame with rows `2 pear`, `4 orange`, `5 pear`) look identical; the difference is not visible here.
julia> using DataFramesMeta

julia> df = DataFrame(A=1:5, B=["apple", "pear", "apple", "orange", "pear"])
@@ -1128,7 +1128,7 @@

### Examples

```jldoctest

Check failure on line 1131 in src/macros.jl (GitHub Actions / build): doctest failure in `src/macros.jl:1131-1205`. For the `begin ... end` form of `@subset! copy(df)` with conditions `:x .> globalvar` and `:y .== 3`, the evaluated output prints `0×2 DataFrame` followed by the column header, while the expected output is `0×2 DataFrame` alone.
julia> using DataFramesMeta, Statistics

julia> df = DataFrame(x = 1:3, y = [2, 1, 2]);
@@ -1312,7 +1312,7 @@

### Examples

```jldoctest

Check failure on line 1315 in src/macros.jl (GitHub Actions / build): doctest failure in `src/macros.jl:1315-1387` for the `begin ... end` form of `@orderby d` with sort keys `:x` and `abs.(:n .- mean(:n))`. Columns `x` and `n` match, but column `c` differs: evaluated `d, g, f, j, h, e, i, b, c, a`, expected `e, f, g, i, j, d, h, c, b, a`. The evaluated values are consistent with the `c` vector defined in the doctest input.

Check failure on line 1315 in src/macros.jl (GitHub Actions / build): doctest failure in `src/macros.jl:1315-1387` for `@orderby d @byrow :x^2`. Columns `x` and `n` match, but column `c` differs: evaluated `d, g, f, j, h, e, i, a, c, b`, expected `e, f, g, i, j, d, h, a, b, c`.
julia> using DataFramesMeta, Statistics

julia> d = DataFrame(x = [3, 3, 3, 2, 1, 1, 1, 2, 1, 1], n = 1:10,
@@ -1407,7 +1407,7 @@
Use this function as an alternative to placing the `.` to broadcast row-wise operations.

### Examples
```jldoctest

Check failure on line 1410 in src/macros.jl (GitHub Actions / build): doctest failure in `src/macros.jl:1410-1447` for `@rorderby df abs(:x) (:x * :y^3)`. The rows match, but the evaluated output starts with a `6×2 DataFrame` summary line that the expected output in the docstring omits.
julia> using DataFramesMeta

julia> df = DataFrame(x = [8,8,-8,7,7,-7], y = [-1, 1, -2, 2, -3, 3])
@@ -2530,7 +2530,7 @@

### Examples

```jldoctest

Check failure on line 2533 in src/macros.jl (GitHub Actions / build): doctest failure in `src/macros.jl:2533-2553` for `@distinct(df, :x .+ :y)`. Both outputs are a 1×2 DataFrame with the single row `1 10`; they differ only in the length of the horizontal rule under the column header.

Check failure on line 2533 in src/macros.jl (GitHub Actions / build): doctest failure in `src/macros.jl:2533-2553` for the `begin ... end` form of `@distinct df` with `:x .+ :y`. As above, both outputs are a 1×2 DataFrame with the single row `1 10` and differ only in the length of the horizontal rule under the column header.
julia> using DataFramesMeta;

julia> df = DataFrame(x = 1:10, y = 10:-1:1);
@@ -3008,3 +3008,45 @@
    esc(rename!_helper(x, args...))
end

function groupby_helper(df, args...)
    t = Expr(:tuple, args...)
    :($groupby($df, ($Cols($t...))))
end

"""
    @groupby(df, args...)

Group a data frame by columns. An alias for

```
groupby(df, Cols(args...))
```

but with a few convenience features.

## Details

`@groupby` does not perform any transformations or allow the
generation of new columns. New column generation must be done
before `@groupby` is called.

`@groupby` allows mixing of `Symbol`
and `String` inputs, such that `@groupby df :A "B"`
is supported.

Arguments are passed to `Cols` as ordinary Julia expressions, so
DataFramesMeta.jl rules for column selection, such as `$DOLLAR` for
escaping, do not apply.

## Examples
```julia-repl
julia> df = DataFrame(A = [1, 1], B = [3, 4], C = [6, 6]);
julia> @groupby df :A;
julia> @groupby df :A :B;
julia> @groupby df [:A, :B];
julia> @groupby df :A [:B, :C];
```
"""
macro groupby(df, args...)
    esc(groupby_helper(df, args...))
end
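A sketch of the conveniences described in the docstring, assuming DataFrames.jl's `groupby` and `Cols` as used by `groupby_helper`; the data frame and the variable `cols` are made up for illustration. It mixes a Symbol with a String and passes a plain variable holding column names:

```julia
using DataFramesMeta

df = DataFrame(A = [1, 1, 2], B = [3, 3, 4], C = [6, 7, 6])

# Mix a Symbol and a String in one call
g1 = @groupby df :A "B"

# The plain DataFrames.jl call that @groupby stands in for
g2 = groupby(df, Cols(:A, "B"))

# Arguments are ordinary Julia expressions, so a variable works too
cols = [:A, :B]
g3 = @groupby df cols

g1 == g2    # true
length(g1)  # 2 groups: (A = 1, B = 3) and (A = 2, B = 4)
```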

17 changes: 17 additions & 0 deletions test/grouping.jl
@@ -349,4 +349,21 @@ end
@test @select(g, :a, @byrow :t = :a ^ 2).t ≅ d.a .^ 2
end

@testset "@groupby" begin
    df = DataFrame(a = [1, 2], b = [3, 4], c = [5, 6])
    resa = groupby(df, [:a])
    resab = groupby(df, [:a, :b])
    resabc = groupby(df, [:a, :b, :c])
    ab = [:a, :b]

    @test @groupby(df, :a) == resa
    @test @groupby(df, :a, :b) == resab
    @test (@groupby df ab) == resab
    @test (@groupby df :a 2) == resab
    @test (@groupby df [:a, :b]) == resab
    @test (@groupby df :a "b") == resab
    @test (@groupby df All()) == resabc
    @test (@groupby df Cols(:a, 2, "c")) == resabc
end

end # module