Another attempt at an astable flag #298

Merged
merged 29 commits into from
Sep 24, 2021
Changes from 8 commits
Commits
a8701c8
initial attempt
pdeffebach Sep 14, 2021
9b997a6
finally working
pdeffebach Sep 15, 2021
d639560
start adding tests
pdeffebach Sep 15, 2021
b77e8ca
more tests
pdeffebach Sep 16, 2021
3cdf0d5
more tests
pdeffebach Sep 16, 2021
b878fbb
add docstring
pdeffebach Sep 16, 2021
2344a2e
tests pass
pdeffebach Sep 16, 2021
6557def
add ByRow in docstring
pdeffebach Sep 16, 2021
6002def
add type annotation
pdeffebach Sep 21, 2021
08a1c4b
better docs
pdeffebach Sep 21, 2021
581b2cf
more docs fixes
pdeffebach Sep 21, 2021
7cc8947
update index.md
pdeffebach Sep 21, 2021
0eca67d
Apply suggestions from code review
pdeffebach Sep 21, 2021
a4ab9a6
Merge branch 'astable_2' of https://github.com/pdeffebach/DataFramesM…
pdeffebach Sep 21, 2021
ab9bae4
clean named tuple creation
pdeffebach Sep 22, 2021
495f08a
add example with string
pdeffebach Sep 22, 2021
01cb5e7
grouping tests
pdeffebach Sep 22, 2021
01fb3b7
Update src/macros.jl
pdeffebach Sep 22, 2021
915191c
changes
pdeffebach Sep 23, 2021
a331fc2
Merge branch 'astable_2' of https://github.com/pdeffebach/DataFramesM…
pdeffebach Sep 23, 2021
2ce4d9e
fix some errors
pdeffebach Sep 23, 2021
57b4051
add macro check
pdeffebach Sep 23, 2021
da7674d
add errors for bad flag combo
pdeffebach Sep 23, 2021
285e3ac
better grouping tests
pdeffebach Sep 23, 2021
713eaf0
Update src/parsing_astable.jl
pdeffebach Sep 23, 2021
4e01c4a
add snipper to transform, select, combine, by
pdeffebach Sep 23, 2021
09c692a
add mutating tests
pdeffebach Sep 23, 2021
ae26da8
get rid of debugging printin
pdeffebach Sep 24, 2021
a7fd1a2
Apply suggestions from code review
pdeffebach Sep 24, 2021
3 changes: 2 additions & 1 deletion Project.toml
@@ -6,14 +6,15 @@ version = "0.9.1"
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"

[compat]
Chain = "0.4"
DataFrames = "1"
MacroTools = "0.5"
Reexport = "0.2, 1"
julia = "1"
Chain = "0.4"

[extras]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
5 changes: 4 additions & 1 deletion src/DataFramesMeta.jl
@@ -4,6 +4,8 @@ using Reexport

 using MacroTools

+using OrderedCollections: OrderedCollections
+
 @reexport using DataFrames

 @reexport using Chain
@@ -16,12 +18,13 @@ export @with,
        @transform, @select, @transform!, @select!,
        @rtransform, @rselect, @rtransform!, @rselect!,
        @eachrow, @eachrow!,
-       @byrow, @passmissing,
+       @byrow, @passmissing, @astable,
        @based_on, @where # deprecated

 const DOLLAR = raw"$"

 include("parsing.jl")
+include("parsing_astable.jl")
 include("macros.jl")
 include("linqmacro.jl")
 include("eachrow.jl")
114 changes: 93 additions & 21 deletions src/macros.jl
@@ -350,6 +350,99 @@ macro passmissing(args...)
    throw(ArgumentError("@passmissing only works inside DataFramesMeta macros."))
end

"""
    @astable(args...)

Return a `NamedTuple` from a transformation inside DataFramesMeta.jl macros.

`@astable` acts on a single block. It scans the top-level expressions of the
block and collects every expression of the form `:y = x`, i.e. an assignment
to a `Symbol`, which is a syntax error outside of the macro. At the end of the
block, all such assignments are gathered into a `NamedTuple` to be used
with the `AsTable` destination in the DataFrames.jl transformation
mini-language.

Concretely, the expression

```
df = DataFrame(a = 1)

@rtransform df @astable begin
    :x = 1
    y = 50
    :z = :x + y + :a
end
```

becomes the pair

```
function f(a)
    x_t = 1
    y = 50
    z_t = x_t + y + a

    (; x = x_t, z = z_t)
end

transform(df, [:a] => ByRow(f) => AsTable)
```

`@astable` is useful when later calculations depend on intermediate
results that should also be stored in new columns. For example, the following fails.

```
@rtransform df begin
    :new_col_1 = :x + :y
    :new_col_2 = :new_col_1 + :z
end
```

This is because DataFrames.jl does not guarantee sequential evaluation of
transformations, so `:new_col_1` is not available when `:new_col_2` is
computed. `@astable` solves this problem:
Member

While this is an interesting side-effect, the main goal of AsTable is to allow returning multiple columns from a single "function". Probably worth mentioning? For example it's useful with extrema to compute the minimum and the maximum at the same time.


```
@rtransform df @astable begin
    :new_col_1 = :x + :y
    :new_col_2 = :new_col_1 + :z
end
```

### Examples

```
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);

julia> d = @rtransform df @astable begin
           :x = 1
           y = 5
           :z = :x + y
       end
3×4 DataFrame
 Row │ a      b      x      z
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      4      1      6
   2 │     2      5      1      6
   3 │     3      6      1      6

julia> df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80]);

julia> @by df :a @astable begin
           $(DOLLAR)"Mean of b" = mean(:b)
           $(DOLLAR)"Standard deviation of b" = std(:b)
       end
Member

This example can be achieved without @astable, right? Maybe do m = mean(:b); std(:b, mean=m) to illustrate the power of this function? Or, simpler, call extrema(:b) to create two columns.

Also, I wouldn't use long column names with spaces in them: better illustrate a single feature at a time.

Collaborator Author

great. changed.

2×3 DataFrame
 Row │ a      Mean of b  Standard deviation of b
     │ Int64  Float64    Float64
─────┼───────────────────────────────────────────
   1 │     1        5.5                 0.707107
   2 │     2       75.0                 7.07107
```

"""
macro astable(args...)
    throw(ArgumentError("@astable only works inside DataFramesMeta macros."))
end
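
The two review suggestions on the docstring (reuse an intermediate mean when computing the standard deviation, and split one `extrema` call into two columns) can be sketched as follows. This is an illustrative example, not text from the PR; it assumes a DataFramesMeta.jl release that already includes `@astable`, and the column names `mean_b`, `std_b`, `min_b` and `max_b` are made up for the example.

```julia
using DataFramesMeta   # re-exports DataFrames
using Statistics

df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80])

# `m` is an ordinary local variable: it is reused by `std`
# but is not stored as a column of its own.
stats = @by df :a @astable begin
    m = mean(:b)
    :mean_b = m
    :std_b = std(:b, mean = m)
end

# A single call to `extrema` feeds two output columns.
ext = @by df :a @astable begin
    ex = extrema(:b)
    :min_b = first(ex)
    :max_b = last(ex)
end
```

In both calls only the assignments to `:symbol` names become columns; `m` and `ex` stay local to the generated function.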

##############################################################################
##
## @with
@@ -1546,17 +1639,6 @@ function combine_helper(x, args...; deprecation_warning = false)

     exprs, outer_flags = create_args_vector(args...)

-    fe = first(exprs)
-    if length(exprs) == 1 &&
-        get_column_expr(fe) === nothing &&
-        !(fe.head == :(=) || fe.head == :kw)
-
-        @warn "Returning a Table object from @by and @combine now requires `$(DOLLAR)AsTable` on the LHS."
-
-        lhs = Expr(:$, :AsTable)
-        exprs = ((:($lhs = $fe)),)
-    end
-
     t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs)

     quote
@@ -1666,16 +1748,6 @@ end
 function by_helper(x, what, args...)
     # Only allow one argument when returning a Table object
     exprs, outer_flags = create_args_vector(args...)
-    fe = first(exprs)
-    if length(exprs) == 1 &&
-        get_column_expr(fe) === nothing &&
-        !(fe.head == :(=) || fe.head == :kw)
-
-        @warn "Returning a Table object from @by and @combine now requires `\$AsTable` on the LHS."
-
-        lhs = Expr(:$, :AsTable)
-        exprs = ((:($lhs = $fe)),)
-    end

     t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs)

13 changes: 10 additions & 3 deletions src/parsing.jl
@@ -91,7 +91,8 @@ is_macro_head(ex::Expr, name) = ex.head == :macrocall && ex.args[1] == Symbol(na

 const BYROW_SYM = Symbol("@byrow")
 const PASSMISSING_SYM = Symbol("@passmissing")
-const DEFAULT_FLAGS = (;BYROW_SYM => Ref(false), PASSMISSING_SYM => Ref(false))
+const ASTABLE_SYM = Symbol("@astable")
+const DEFAULT_FLAGS = (;BYROW_SYM => Ref(false), PASSMISSING_SYM => Ref(false), ASTABLE_SYM => Ref(false))

 extract_macro_flags(ex, exprflags = deepcopy(DEFAULT_FLAGS)) = (ex, exprflags)
 function extract_macro_flags(ex::Expr, exprflags = deepcopy(DEFAULT_FLAGS))
@@ -269,7 +270,13 @@ function fun_to_vec(ex::Expr;
         return ex_col
     end

-    if no_dest
+    if final_flags[ASTABLE_SYM][]
+        src, fun = get_source_fun_astable(ex; exprflags = final_flags)
+
+        return :($src => $fun => AsTable)
+    end
+
+    if no_dest # subset and with
         src, fun = get_source_fun(ex, exprflags = final_flags)
         return quote
             $src => $fun
@@ -359,7 +366,7 @@ function create_args_vector(arg; wrap_byrow::Bool=false)
         outer_flags[BYROW_SYM][] = true
     end

-    if arg isa Expr && arg.head == :block
+    if arg isa Expr && arg.head == :block && !outer_flags[ASTABLE_SYM][]
         x = MacroTools.rmlines(arg).args
     else
         x = Any[arg]
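
The `src => fun => AsTable` expression returned by the new `fun_to_vec` branch targets the `AsTable` destination of the DataFrames.jl mini-language. As a rough sketch of what such a transformation looks like when written by hand, with and without `ByRow` (the functions `group_fun` and `row_fun` are hypothetical stand-ins for the anonymous function the macro builds):

```julia
using DataFrames

df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80])

# Without ByRow: the function receives whole columns (here, per group)
# and returns one NamedTuple, whose fields become the new columns.
group_fun(b) = (; min_b = minimum(b), max_b = maximum(b))
grouped = combine(groupby(df, :a), [:b] => group_fun => AsTable)

# With ByRow: the function is called once per row and returns
# one NamedTuple per row.
row_fun(b) = (; double_b = 2b, is_large = b > 10)
by_row = transform(df, [:b] => ByRow(row_fun) => AsTable)
```

Because the block passed to `@astable` is kept whole rather than split into separate arguments, the entire block maps to a single transformation of this shape.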
95 changes: 95 additions & 0 deletions src/parsing_astable.jl
@@ -0,0 +1,95 @@
function conditionally_add_symbols!(inputs_to_function, lhs_assignments, col)
    # if it's already been assigned at top-level,
    # don't add it to the inputs
    if haskey(lhs_assignments, col)
        return lhs_assignments[col]
    else
        return addkey!(inputs_to_function, col)
    end
end

replace_syms_astable!(inputs_to_function, lhs_assignments, x) = x
replace_syms_astable!(inputs_to_function, lhs_assignments, q::QuoteNode) =
    conditionally_add_symbols!(inputs_to_function, lhs_assignments, q)

function replace_syms_astable!(inputs_to_function, lhs_assignments, e::Expr)
    if onearg(e, :^)
        return e.args[2]
    end

    col = get_column_expr(e)
    if col !== nothing
        return conditionally_add_symbols!(inputs_to_function, lhs_assignments, col)
    elseif e.head == :.
        return replace_dotted_astable!(inputs_to_function, lhs_assignments, e)
    else
        return mapexpr(x -> replace_syms_astable!(inputs_to_function, lhs_assignments, x), e)
    end
end

protect_replace_syms_astable!(inputs_to_function, lhs_assignments, e) = e
protect_replace_syms_astable!(inputs_to_function, lhs_assignments, e::Expr) =
    replace_syms_astable!(inputs_to_function, lhs_assignments, e)

function replace_dotted_astable!(inputs_to_function, lhs_assignments, e)
    x_new = replace_syms_astable!(inputs_to_function, lhs_assignments, e.args[1])
    y_new = protect_replace_syms_astable!(inputs_to_function, lhs_assignments, e.args[2])
    Expr(:., x_new, y_new)
end

is_column_assigment(ex) = false
function is_column_assigment(ex::Expr)
    ex.head == :(=) && (get_column_expr(ex.args[1]) !== nothing)
end

# Taken from MacroTools.jl
# No docstring so assumed unstable
block(ex) = isexpr(ex, :block) ? ex : :($ex;)

function get_source_fun_astable(ex; exprflags = deepcopy(DEFAULT_FLAGS))
    inputs_to_function = Dict{Any, Symbol}()
    lhs_assignments = OrderedCollections.OrderedDict{Any, Symbol}()

    # Make sure all top-level assignments are
    # in the args vector
    ex = block(MacroTools.flatten(ex))
    exprs = map(ex.args) do arg
        if is_column_assigment(arg)
            lhs = get_column_expr(arg.args[1])
            rhs = arg.args[2]
            new_ex = replace_syms_astable!(inputs_to_function, lhs_assignments, rhs)
            if haskey(inputs_to_function, lhs)
                # the column was already read as an input: reuse its variable name
                new_lhs = inputs_to_function[lhs]
                lhs_assignments[lhs] = new_lhs
            else
                new_lhs = addkey!(lhs_assignments, lhs)
            end

            Expr(:(=), new_lhs, new_ex)
        else
            replace_syms_astable!(inputs_to_function, lhs_assignments, arg)
        end
    end
    source = :(DataFramesMeta.make_source_concrete($(Expr(:vect, keys(inputs_to_function)...))))

    inputargs = Expr(:tuple, values(inputs_to_function)...)
    nt_iterator = (:(Symbol($k) => $v) for (k, v) in lhs_assignments)
    nt_expr = Expr(:tuple, Expr(:parameters, nt_iterator...))
    body = Expr(:block, Expr(:block, exprs...), nt_expr)

    fun = quote
        $inputargs -> begin
            $body
        end
    end

    # TODO: Add passmissing support by
    # checking if any input arguments are missing,
    # and if so, making a named tuple with
    # missing values
    if exprflags[BYROW_SYM][]
        fun = :(ByRow($fun))
    end

    return source, fun
end
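
To make the rewrite in `get_source_fun_astable` easier to follow, here is a deliberately simplified sketch of the same idea using only Base Julia. It is not the package's implementation: `astable_sketch` and `rewrite!` are hypothetical names, only plain `:col` references are handled (no `$` escapes, no `^(...)` protection, no dotted calls or field access), and fixed `_t`/`_in` suffixes are used for local names purely to keep the example readable, so they could clash with user variables.

```julia
# Replace `:col` references by local variable names.
# `inputs` maps source columns to function arguments;
# `assigned` maps columns created earlier in the block to local variables.
rewrite!(inputs, assigned, x) = x
function rewrite!(inputs, assigned, q::QuoteNode)
    col = q.value
    haskey(assigned, col) && return assigned[col]
    return get!(inputs, col, Symbol(col, "_in"))
end
rewrite!(inputs, assigned, e::Expr) =
    Expr(e.head, (rewrite!(inputs, assigned, a) for a in e.args)...)

function astable_sketch(block::Expr)
    inputs = Dict{Symbol, Symbol}()
    assigned = Dict{Symbol, Symbol}()
    order = Symbol[]   # new columns, in order of first assignment
    body = map(block.args) do arg
        if arg isa Expr && arg.head == :(=) && arg.args[1] isa QuoteNode
            col = arg.args[1].value
            rhs = rewrite!(inputs, assigned, arg.args[2])
            if !haskey(assigned, col)
                # reuse the argument name if the column was already an input
                assigned[col] = get(inputs, col, Symbol(col, "_t"))
                push!(order, col)
            end
            Expr(:(=), assigned[col], rhs)
        else
            rewrite!(inputs, assigned, arg)
        end
    end
    # Build `(; col1 = var1, col2 = var2, ...)` as the last expression.
    nt = Expr(:tuple, Expr(:parameters, (Expr(:kw, c, assigned[c]) for c in order)...))
    fun = Expr(:->, Expr(:tuple, values(inputs)...), Expr(:block, body..., nt))
    return collect(keys(inputs)), fun
end

ex = quote
    :x = 1
    y = 50
    :z = :x + y + :a
end

cols, fun_expr = astable_sketch(ex)
f = eval(fun_expr)
result = Base.invokelatest(f, 2)   # (x = 1, z = 53)
```

Here `cols` is `[:a]`, the only column read from the data frame, while `:x` and `:z` end up as fields of the returned `NamedTuple` and `y` remains a plain local variable.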