Another attempt at an astable flag #298

Merged
merged 29 commits into from
Sep 24, 2021
Changes from 18 commits
a8701c8
initial attempt
pdeffebach Sep 14, 2021
9b997a6
finally working
pdeffebach Sep 15, 2021
d639560
start adding tests
pdeffebach Sep 15, 2021
b77e8ca
more tests
pdeffebach Sep 16, 2021
3cdf0d5
more tests
pdeffebach Sep 16, 2021
b878fbb
add docstring
pdeffebach Sep 16, 2021
2344a2e
tests pass
pdeffebach Sep 16, 2021
6557def
add ByRow in docstring
pdeffebach Sep 16, 2021
6002def
add type annotation
pdeffebach Sep 21, 2021
08a1c4b
better docs
pdeffebach Sep 21, 2021
581b2cf
more docs fixes
pdeffebach Sep 21, 2021
7cc8947
update index.md
pdeffebach Sep 21, 2021
0eca67d
Apply suggestions from code review
pdeffebach Sep 21, 2021
a4ab9a6
Merge branch 'astable_2' of https://github.com/pdeffebach/DataFramesM…
pdeffebach Sep 21, 2021
ab9bae4
clean named tuple creation
pdeffebach Sep 22, 2021
495f08a
add example with string
pdeffebach Sep 22, 2021
01cb5e7
grouping tests
pdeffebach Sep 22, 2021
01fb3b7
Update src/macros.jl
pdeffebach Sep 22, 2021
915191c
changes
pdeffebach Sep 23, 2021
a331fc2
Merge branch 'astable_2' of https://github.com/pdeffebach/DataFramesM…
pdeffebach Sep 23, 2021
2ce4d9e
fix some errors
pdeffebach Sep 23, 2021
57b4051
add macro check
pdeffebach Sep 23, 2021
da7674d
add errors for bad flag combo
pdeffebach Sep 23, 2021
285e3ac
better grouping tests
pdeffebach Sep 23, 2021
713eaf0
Update src/parsing_astable.jl
pdeffebach Sep 23, 2021
4e01c4a
add snipper to transform, select, combine, by
pdeffebach Sep 23, 2021
09c692a
add mutating tests
pdeffebach Sep 23, 2021
ae26da8
get rid of debugging printin
pdeffebach Sep 24, 2021
a7fd1a2
Apply suggestions from code review
pdeffebach Sep 24, 2021
3 changes: 2 additions & 1 deletion Project.toml
@@ -6,14 +6,15 @@ version = "0.9.1"
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"

[compat]
Chain = "0.4"
DataFrames = "1"
MacroTools = "0.5"
Reexport = "0.2, 1"
julia = "1"
Chain = "0.4"

[extras]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
32 changes: 30 additions & 2 deletions docs/src/index.md
@@ -22,6 +22,7 @@ In addition, DataFramesMeta provides
convenient syntax.
* `@byrow` for applying functions to each row of a data frame (only supported inside other macros).
* `@passmissing` for propagating missing values inside row-wise DataFramesMeta.jl transformations.
* `@astable` to create multiple columns within a single transformation.
* `@chain`, from [Chain.jl](https://github.com/jkrumbiegel/Chain.jl) for piping the above macros together, similar to [magrittr](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html)'s
`%>%` in R.

@@ -396,11 +397,38 @@ julia> @rtransform df @passmissing x = parse(Int, :x_str)
3 │ missing missing
```

## Creating multiple columns at once with `@astable`

Often new variables may depend on the same intermediate calculations. `@astable` makes it easy to create multiple
new variables in the same operation, yet have them share
information.

In a single block, all assignments of the form `:y = f(:x)`
or `$y = f(:x)` at the top level generate new columns. In the second form, `y`
must be a string or `Symbol`.

```
julia> df = DataFrame(a = [1, 2, 3], b = [400, 500, 600]);

julia> @transform df @astable begin
           ex = extrema(:b)
           :b_first = :b .- first(ex)
           :b_last = :b .- last(ex)
       end
3×4 DataFrame
 Row │ a      b      b_first  b_last
     │ Int64  Int64  Int64    Int64
─────┼───────────────────────────────
   1 │     1    400        0    -200
   2 │     2    500      100    -100
   3 │     3    600      200       0
```
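
The `$y = f(:x)` form mentioned above can be sketched as follows. This is a minimal illustration, not part of the original docs: the variable `newcol`, the column name `"b_centered"`, and the column `:b_scaled` are hypothetical names chosen for the example.

```julia
using DataFramesMeta

df = DataFrame(a = [1, 2, 3], b = [400, 500, 600])

# Hypothetical column name held in a variable; a `Symbol` should work as well.
newcol = "b_centered"

result = @transform df @astable begin
    lo = minimum(:b)        # intermediate value, not stored as a column
    $newcol = :b .- lo      # column named by the contents of `newcol`
    :b_scaled = :b ./ lo    # column named by a literal `Symbol`
end
```

Only the two assignments whose left-hand side is a column reference become columns; `lo` stays local to the block.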


## [Working with column names programmatically with `$`](@id dollar)

DataFramesMeta provides the special syntax `$` for referring to
columns in a data frame via a `Symbol`, string, or column position as either
a literal or a variable.
columns in a data frame via a `Symbol`, string, or column position as either a literal or a variable.
Member

While we are at it, given our recent discussion on Discourse, I think it is essential to mention when the `$` reference is resolved.
Also, maybe add an example where macros are used within a function? I think these cases are not trivial. This can be another PR, of course.

Collaborator Author

I will do this in another PR. In summary, you can't use other macros which use `$`. I will try to sort out whether I can carve out an exception.

Member

To be clear about why I stress it so much: with DataFrames.jl, my answer to users is that if you learn Julia Base then you will know exactly how DataFrames.jl works. With DataFramesMeta.jl, unfortunately, this is not the case, as it is a DSL, so we need to be very precise in the documentation about how things work.


```julia
df = DataFrame(A = 1:3, B = [2, 1, 2])
5 changes: 4 additions & 1 deletion src/DataFramesMeta.jl
@@ -4,6 +4,8 @@ using Reexport

using MacroTools

using OrderedCollections: OrderedCollections

@reexport using DataFrames

@reexport using Chain
@@ -16,12 +18,13 @@ export @with,
@transform, @select, @transform!, @select!,
@rtransform, @rselect, @rtransform!, @rselect!,
@eachrow, @eachrow!,
@byrow, @passmissing,
@byrow, @passmissing, @astable,
@based_on, @where # deprecated

const DOLLAR = raw"$"

include("parsing.jl")
include("parsing_astable.jl")
include("macros.jl")
include("linqmacro.jl")
include("eachrow.jl")
155 changes: 133 additions & 22 deletions src/macros.jl
@@ -284,7 +284,7 @@ end


"""
passmissing(args...)
@passmissing(args...)

Propagate missing values inside DataFramesMeta.jl macros.

@@ -350,6 +350,138 @@ macro passmissing(args...)
throw(ArgumentError("@passmissing only works inside DataFramesMeta macros."))
end

"""
@astable(args...)

Return a `NamedTuple` from a single transformation inside the DataFramesMeta.jl
macros, `@select`, `@transform`, and their mutating and row-wise equivalents.

`@astable` acts on a single block. It works through all top-level expressions
and collects all such expressions of the form `:y = ...`, i.e. assignments to a
`Symbol`, which is a syntax error outside of DataFramesMeta.jl macros. At the end of the
expression, all assignments are collected into a `NamedTuple` to be used
with the `AsTable` destination in the DataFrames.jl transformation
mini-language.

Concretely, the expressions

```
df = DataFrame(a = 1)

@rtransform df @astable begin
    :x = 1
    y = 50
    :z = :x + y + :a
end
```

become the pair

```
function f(a)
    x_t = 1
    y = 50
    z_t = x_t + y + a

    (; x = x_t, z = z_t)
end

transform(df, [:a] => ByRow(f) => AsTable)
```
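
As a rough check of this correspondence, the two forms can be run side by side. This is only a sketch: the names `by_hand` and `by_macro` are introduced here for illustration, and the code the macro actually generates uses different internal variable names than `x_t` and `z_t`.

```julia
using DataFramesMeta

df = DataFrame(a = 1)

# Hand-written counterpart of the `@astable` block; `f` is an illustrative name.
function f(a)
    x_t = 1
    y = 50
    z_t = x_t + y + a

    (; x = x_t, z = z_t)
end

by_hand = transform(df, [:a] => ByRow(f) => AsTable)

by_macro = @rtransform df @astable begin
    :x = 1
    y = 50
    :z = :x + y + :a
end

# Both calls should add the same columns `x` and `z`; `y` is never stored.
same_result = by_hand == by_macro
```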

`@astable` has two major advantages, at the cost of some added complexity.
First, `@astable` makes it easy to create multiple columns from a single
transformation, all of which share a scope. For example, `@astable` allows
for the following (where `:x` and `:x2` exist in the data frame already).

```
@transform df @astable begin
    m = mean(:x)
    :x_demeaned = :x .- m
    :x2_demeaned = :x2 .- m
end
```

Both `:x_demeaned` and `:x2_demeaned` use the variable `m`,
which does not need to be calculated twice.

Second, `@astable` is useful when performing intermediate calculations
and storing their results in new columns. For example, the following fails.

```
@rtransform df begin
    :new_col_1 = :x + :y
    :new_col_2 = :new_col_1 + :z
end
```

This is because DataFrames.jl does not guarantee sequential evaluation of
transformations. `@astable` solves this problem:

    @rtransform df @astable begin
        :new_col_1 = :x + :y
        :new_col_2 = :new_col_1 + :z
    end
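
The contrast between the two forms can be sketched as follows, assuming a hypothetical data frame with columns `:x`, `:y`, and `:z`. The `try`/`catch` only records that the first call throws; it makes no claim about the exact error type.

```julia
using DataFramesMeta

df = DataFrame(x = [1, 2], y = [10, 20], z = [100, 200])

# Without `@astable`, `:new_col_1` is looked up in `df`, where it does not exist.
failed = try
    @rtransform df begin
        :new_col_1 = :x + :y
        :new_col_2 = :new_col_1 + :z
    end
    false
catch
    true
end

# With `@astable`, both assignments run inside one function body, in order.
fixed = @rtransform df @astable begin
    :new_col_1 = :x + :y
    :new_col_2 = :new_col_1 + :z
end
```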

Column assignment in `@astable` follows the same rules as
column assignment more generally. Construct a new column
from a variable by escaping it with `$DOLLAR`; the variable may hold a
`Symbol` or an `AbstractString`. References to existing
columns may be a `Symbol`, `AbstractString`, or an
integer.

### Examples

```
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);

julia> d = @rtransform df @astable begin
           :x = 1
           y = 5
           :z = :x + y
       end
3×4 DataFrame
 Row │ a      b      x      z
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      4      1      6
   2 │     2      5      1      6
   3 │     3      6      1      6

julia> df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80]);

julia> @by df :a @astable begin
           ex = extrema(:b)
           :min_b = first(ex)
           :max_b = last(ex)
       end
2×3 DataFrame
 Row │ a      min_b  max_b
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      5      6
   2 │     2     70     80

julia> new_col = "New Column";

julia> @rtransform df @astable begin
           f_a = first(:a)
           $(DOLLAR)new_col = :a + :b + f_a
           :y = :a * :b
       end
4×4 DataFrame
 Row │ a      b      New Column  y
     │ Int64  Int64  Int64       Int64
─────┼─────────────────────────────────
   1 │     1      5           7      5
   2 │     1      6           8      6
   3 │     2     70          74    140
   4 │     2     80          84    160
```

"""
macro astable(args...)
    throw(ArgumentError("@astable only works inside DataFramesMeta macros."))
end

##############################################################################
##
## @with
@@ -1546,17 +1678,6 @@ function combine_helper(x, args...; deprecation_warning = false)

exprs, outer_flags = create_args_vector(args...)

fe = first(exprs)
if length(exprs) == 1 &&
get_column_expr(fe) === nothing &&
!(fe.head == :(=) || fe.head == :kw)

@warn "Returning a Table object from @by and @combine now requires `$(DOLLAR)AsTable` on the LHS."

lhs = Expr(:$, :AsTable)
exprs = ((:($lhs = $fe)),)
end

t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs)

quote
@@ -1666,16 +1787,6 @@ end
function by_helper(x, what, args...)
# Only allow one argument when returning a Table object
exprs, outer_flags = create_args_vector(args...)
fe = first(exprs)
if length(exprs) == 1 &&
get_column_expr(fe) === nothing &&
!(fe.head == :(=) || fe.head == :kw)

@warn "Returning a Table object from @by and @combine now requires `\$AsTable` on the LHS."

lhs = Expr(:$, :AsTable)
exprs = ((:($lhs = $fe)),)
end

t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs)

13 changes: 10 additions & 3 deletions src/parsing.jl
@@ -91,7 +91,8 @@ is_macro_head(ex::Expr, name) = ex.head == :macrocall && ex.args[1] == Symbol(na

const BYROW_SYM = Symbol("@byrow")
const PASSMISSING_SYM = Symbol("@passmissing")
const DEFAULT_FLAGS = (;BYROW_SYM => Ref(false), PASSMISSING_SYM => Ref(false))
const ASTABLE_SYM = Symbol("@astable")
const DEFAULT_FLAGS = (;BYROW_SYM => Ref(false), PASSMISSING_SYM => Ref(false), ASTABLE_SYM => Ref(false))

extract_macro_flags(ex, exprflags = deepcopy(DEFAULT_FLAGS)) = (ex, exprflags)
function extract_macro_flags(ex::Expr, exprflags = deepcopy(DEFAULT_FLAGS))
@@ -269,7 +270,13 @@ function fun_to_vec(ex::Expr;
return ex_col
end

if no_dest
if final_flags[ASTABLE_SYM][]
src, fun = get_source_fun_astable(ex; exprflags = final_flags)

return :($src => $fun => AsTable)
end

if no_dest # subset and with
src, fun = get_source_fun(ex, exprflags = final_flags)
return quote
$src => $fun
@@ -359,7 +366,7 @@ function create_args_vector(arg; wrap_byrow::Bool=false)
outer_flags[BYROW_SYM][] = true
end

if arg isa Expr && arg.head == :block
if arg isa Expr && arg.head == :block && !outer_flags[ASTABLE_SYM][]
x = MacroTools.rmlines(arg).args
else
x = Any[arg]