
Commit

docs
pdeffebach committed Dec 22, 2023
1 parent b381960 commit 237462f
Showing 3 changed files with 43 additions and 11 deletions.
10 changes: 5 additions & 5 deletions docs/src/dplyr.md
@@ -93,7 +93,7 @@ DataFramesMeta.jl macro | By-row version | Description | `dplyr` equivalent
`@subset` | `@rsubset` | filter rows | `filter`
`@orderby` | `@rorderby` | re-order or arrange rows | `arrange`
`@combine` | | summarise values | `summarize` (but `@combine` is more flexible)
-`groupby` | | allows for group operations in the "split-apply-combine" concept | `group_by`
+`@groupby` | | allows for group operations in the "split-apply-combine" concept | `group_by`

# DataFramesMeta.jl Verbs In Action

@@ -341,15 +341,15 @@ DataFrames.jl also provides the function `describe` which performs many of these
describe(msleep)
```

-## Group Operations using `groupby` and `@combine`
+## Group Operations using `@groupby` and `@combine`

-The `groupby` verb is an important function in DataFrames.jl (it does not live in DataFramesMeta.jl). As we mentioned before it's related to concept of "split-apply-combine". We literally want to split the data frame by some variable (e.g. taxonomic order), apply a function to the individual data frames and then combine the output.
+The `@groupby` verb is the first step in the "split-apply-combine" workflow. We literally want to split the data frame by some variable (e.g. taxonomic order), apply a function to the individual data frames and then combine the output.

Let's do that: split the `msleep` data frame by the taxonomic order, then ask for the same summary statistics as above. We expect a set of summary statistics for each taxonomic order.

```@repl 1
@chain msleep begin
-groupby(:order)
+@groupby :order
@combine begin
:avg_sleep = mean(:sleep_total)
:min_sleep = minimum(:sleep_total)
@@ -363,7 +363,7 @@ Split-apply-combine can also be used with `@transform` to add new variables to a

```@repl 1
@chain msleep begin
-groupby(:order)
+@groupby :order
@transform :sleep_genus = :sleep_total .- mean(:sleep_total)
end
```
27 changes: 21 additions & 6 deletions docs/src/index.md
@@ -16,6 +16,7 @@ In addition, DataFramesMeta provides
* Row-wise versions of the above macros in the form of `@rtransform`, `@rtransform!`,
`@rselect`, `@rselect!`, `@rorderby`, `@rsubset`, and `@rsubset!`.
* `@rename` and `@rename!` for renaming columns
+* `@groupby` for grouping data
* `@by`, for grouping and combining a data frame in a single step
* `@with`, for working with the columns of a data frame with high performance and
convenient syntax
@@ -64,7 +65,7 @@ data frame.

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
-gd = groupby(df, :x);
+gd = @groupby(df, :x);
@select(df, :x, :y)
@select(df, :x2 = 2 * :x, :y)
@select(gd, :x2 = 2 .* :y .* first(:y))
@@ -98,7 +99,7 @@ data frame.

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
-gd = groupby(df, :x);
+gd = @groupby(df, :x);
@transform(df, :x2 = 2 * :x, :y)
@transform(gd, :x2 = 2 .* :y .* first(:y))
@transform!(df, :x, :y)
@@ -115,7 +116,7 @@ Select row subsets. Operates on both a `DataFrame` and a `GroupedDataFrame`.
```julia
using Statistics
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
-gd = groupby(df, :x);
+gd = @groupby(df, :x);
outside_var = 1;
@subset(df, :x .> 1)
@subset(df, :x .> outside_var)
@@ -134,11 +135,14 @@ acts like a `GroupedDataFrame` with one group.
Like `@select` and `@transform`, transformations are called with the keyword-like
syntax `:y = f(:x)`.

+To group data together into a `GroupedDataFrame`, use `@groupby`, a short-hand for
+the DataFrames.jl function `groupby`.
+
Examples:

```julia
df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
-gd = groupby(df, :x);
+gd = @groupby(df, :x);
@combine(gd, :x2 = sum(:y))
@combine(gd, :x2 = :y .- sum(:y))
@combine(gd, $AsTable = (n1 = sum(:y), n2 = first(:y)))
@@ -161,6 +165,17 @@ gd = groupby(df, :x);
@combine(gd, $AsTable = (a = sum(:x), b = sum(:y)))
```

+### `@by`
+
+Perform the grouping and combining operations in one step with `@by`
+
+```
+df = DataFrame(x = [1, 1, 2, 2], y = [1, 2, 101, 102]);
+@by df :x begin
+:x = sum(:y)
+end
+```
+
## `@orderby`

Sort rows in a `DataFrame` by values in one of several columns or a
@@ -355,7 +370,7 @@ julia> @subset df @byrow begin
however, like with `ByRow` in DataFrames.jl, when `@byrow` is
used, functions do not take into account the grouping, so for
example the result of `@transform(df, @byrow :y = f(:x))` and
-`@transform(groupby(df, :g), @byrow :y = f(:x))` is the same.
+`@transform(@groupby(df, :g), @byrow :y = f(:x))` is the same.

## Propagating missing values with `@passmissing`

@@ -912,7 +927,7 @@ functions.
| `@subset` | `filter` | `Where` |
| `@transform` | `mutate` | `Select` (?) |
| `@by` | | `GroupBy` |
-| `groupby` | `group_by` | `GroupBy` |
+| `@groupby` | `group_by` | `GroupBy` |
| `@combine` | `summarise`/`do` | |
| `@orderby` | `arrange` | `OrderBy` |
| `@select` | `select` | `Select` |
17 changes: 17 additions & 0 deletions test/grouping.jl
@@ -349,4 +349,21 @@ end
@test @select(g, :a, @byrow :t = :a ^ 2).t ≅ d.a .^ 2
end

+@testset "@groupby" begin
+df = DataFrame(a = [1, 2], b = [3, 4], c = [5, 6])
+resa = groupby(df, [:a])
+resab = groupby(df, [:a, :b])
+resabc = groupby(df, [:a, :b, :c])
+ab = [:a, :b]
+
+@test @groupby(df, :a) == resa
+@test @groupby(df, :a, :b) == resab
+@test (@groupby df ab) == resab
+@test (@groupby df :a 2) == resab
+@test (@groupby df [:a, :b]) == resab
+@test (@groupby df :a "b") == resab
+@test (@groupby df All()) == resabc
+@test (@groupby df Cols(:a, 2, "c")) == resabc
+end
+
end # module
