diff --git a/docs/Project.toml b/docs/Project.toml
index 6b701748f0..ebe348a76b 100755
--- a/docs/Project.toml
+++ b/docs/Project.toml
@@ -1,6 +1,8 @@
 [deps]
 CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
 CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
+Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
+DataFrameMacros = "75880514-38bc-4a95-a458-c2aea5a3a702"
 DataFramesMeta = "1313f7d8-7da2-5740-9ea0-a2ca25f37964"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 Missings = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28"
diff --git a/docs/src/index.md b/docs/src/index.md
index 8c2fea1734..63d828d462 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -115,12 +115,15 @@ integrated they are with DataFrames.jl.
       A range of convenience functions for DataFrames.jl that augment `select` and
       `transform` to provide a user experience similar to that provided by
       [dplyr](https://dplyr.tidyverse.org/) in R.
+    - [DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl):
+      Provides macro versions of the common DataFrames.jl functions, similar to DataFramesMeta.jl,
+      with convenient syntax for the manipulation of multiple columns at once.
     - [Query.jl](https://github.com/queryverse/Query.jl):
       Query.jl provides a single framework for data wrangling that works with a range of libraries,
       including DataFrames.jl, other tabular data libraries (more on those below),
       and even non-tabular data. Provides many convenience functions analogous to those in dplyr in R or
       [LINQ](https://en.wikipedia.org/wiki/Language_Integrated_Query).
-    - You can find more on both of these packages in the
+    - You can find more information on these packages in the
       [Data manipulation frameworks](@ref) section of this manual.
 - **And More!**
     - [Graphs.jl](https://github.com/JuliaGraphs/Graphs.jl): A pure-Julia,
diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md
index 0d4e9c4990..47799c5d52 100644
--- a/docs/src/man/querying_frameworks.md
+++ b/docs/src/man/querying_frameworks.md
@@ -1,7 +1,7 @@
 # Data manipulation frameworks
 
-Two popular frameworks provide convenience methods to manipulate `DataFrame`s:
-DataFramesMeta.jl and Query.jl. They implement a functionality similar to
+Three frameworks provide convenience methods to manipulate `DataFrame`s:
+DataFramesMeta.jl, DataFrameMacros.jl and Query.jl. They implement functionality similar to
 [dplyr](https://dplyr.tidyverse.org/) or
 [LINQ](https://en.wikipedia.org/wiki/Language_Integrated_Query).
 
@@ -117,6 +117,84 @@ julia> @chain df begin
 You can find more details about how this package can be used on the
 [DataFramesMeta.jl GitHub page](https://github.com/JuliaData/DataFramesMeta.jl).
 
+## DataFrameMacros.jl
+
+[DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl) is
+an alternative to DataFramesMeta.jl with an additional focus on convenient
+solutions for the transformation of multiple columns at once.
+The instructions below are for version 0.3 of DataFrameMacros.jl.
+
+First, install the DataFrameMacros.jl package:
+
+```julia
+using Pkg
+Pkg.add("DataFrameMacros")
+```
+
+In DataFrameMacros.jl, all macros except `@combine` operate row-wise by default.
+There is also a `@groupby` macro, which creates grouping columns on the fly
+using the same syntax as `@transform`, so you can group by new columns
+without writing them out twice.
+
+In the example below, you can also see some of DataFrameMacros.jl's multi-column
+features: `mean` is applied to both age columns at once by selecting
+them with the `r"age"` regex. The new column names are then derived using the
+`"{}"` shortcut, which splices the name of each selected input column into the string.
+
+```jldoctest dataframemacros
+julia> using DataFrames, DataFrameMacros, Chain, Statistics
+
+julia> df = DataFrame(name=["John", "Sally", "Roger"],
+                      age=[54.0, 34.0, 79.0],
+                      children=[0, 2, 4])
+3×3 DataFrame
+ Row │ name    age      children
+     │ String  Float64  Int64
+─────┼───────────────────────────
+   1 │ John       54.0         0
+   2 │ Sally      34.0         2
+   3 │ Roger      79.0         4
+
+julia> @chain df begin
+           @transform :age_months = :age * 12
+           @groupby :has_child = :children > 0
+           @combine "mean_{}" = mean({r"age"})
+       end
+2×3 DataFrame
+ Row │ has_child  mean_age  mean_age_months
+     │ Bool       Float64   Float64
+─────┼──────────────────────────────────────
+   1 │     false      54.0            648.0
+   2 │      true      56.5            678.0
+```
+
+The `{{ }}` syntax references a group of multiple columns as a single unit,
+for example to run aggregations over them.
+In the following example, each first-quarter value is compared to the maximum of the other three quarters:
+
+```jldoctest dataframemacros
+julia> df = DataFrame(q1 = [12.0, 0.4, 42.7],
+                      q2 = [6.4, 2.3, 40.9],
+                      q3 = [9.5, 0.2, 13.6],
+                      q4 = [6.3, 5.4, 39.3])
+3×4 DataFrame
+ Row │ q1       q2       q3       q4
+     │ Float64  Float64  Float64  Float64
+─────┼────────────────────────────────────
+   1 │    12.0      6.4      9.5      6.3
+   2 │     0.4      2.3      0.2      5.4
+   3 │    42.7     40.9     13.6     39.3
+
+julia> @transform df :q1_best = :q1 > maximum({{Not(:q1)}})
+3×5 DataFrame
+ Row │ q1       q2       q3       q4       q1_best
+     │ Float64  Float64  Float64  Float64  Bool
+─────┼─────────────────────────────────────────────
+   1 │    12.0      6.4      9.5      6.3     true
+   2 │     0.4      2.3      0.2      5.4    false
+   3 │    42.7     40.9     13.6     39.3     true
+```
+
 ## Query.jl
 
 The [Query.jl](https://github.com/queryverse/Query.jl) package provides advanced
diff --git a/docs/src/man/working_with_dataframes.md b/docs/src/man/working_with_dataframes.md
index 88b7557385..01e6953759 100755
--- a/docs/src/man/working_with_dataframes.md
+++ b/docs/src/man/working_with_dataframes.md
@@ -738,6 +738,9 @@ operations:
 
 - the [DataFramesMeta.jl](https://github.com/JuliaStats/DataFramesMeta.jl) package
   provides interfaces similar to LINQ and [dplyr](https://dplyr.tidyverse.org)
+- the [DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl)
+  package provides macros for most standard functions from DataFrames.jl,
+  with convenient syntax for the manipulation of multiple columns at once.
 
 See the [Data manipulation frameworks](@ref) section for more information.