New user feedback from an R veteran #295

dgkf opened this issue Jan 18, 2020 · 1 comment
dgkf commented Jan 18, 2020

Following a discussion with @davidanthoff in the Julia Slack, who referred me back to Query.jl during a conversation about some changes to DataFramesMeta.jl, I offered to give some new-user feedback.

Background

I've got ~3 years of fairly intensive R development experience. R is the preferred language at my workplace, and in using it there I've developed a deep understanding of the tidyverse packages; many of my software opinions come from digging through the internals of those packages, so my feedback reflects a heavily dplyr-centric view of data handling. I have about three months of Julia experience, during which I went on a DataFramesMeta.jl refactoring bender. I'm still learning about the Julia ecosystem every day, and there are certainly large gaps in my ecosystem and technical knowledge. These are my initial impressions of Query.jl, specifically the "standalone query commands."

Query.jl Getting Started "Standalone Query Commands"

First glance syntax impressions

  • I like the native |> usage as it makes me feel like I can interweave these macros with other packages or my own lambdas quite easily.
  • The _ feels weird to me. I'm aware of Lazy.jl and its @_ macro, but in the context of a DataFrame it feels clunky to prefix column names with it. On the other hand, it's cool to have direct access to (I assume) the whole DataFrame (or Row?). edit: I realized later on that this isn't coming from Lazy.jl, but is reimplemented, I think to allow for __ usage. The duplication here gives me some mild code smell vibes.
  • I like the agnostic approach to data, but I'm skeptical that it can be both a rich syntax for operating on tabular data while also remaining agnostic.
  • Having to "collect" the query result back into a DataFrame at the tail end of a pipeline feels a bit weird. It would be nice if it defaulted to being endomorphic when printing to the console, perhaps only coercing the first n elements so as not to fail on large data.
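For context, here's a minimal sketch of the collect-at-the-end pattern I mean, assuming Query.jl's standalone macros behave as in its Getting Started guide:

```julia
using Query, DataFrames

df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.])

# The query itself is lazy; nothing is materialized yet.
q = df |> @filter(_.age > 30) |> @map({_.name, _.age})

# Materializing back into a DataFrame is an explicit final step.
result = q |> DataFrame
```

My wish above is essentially that `q` printed to the console could look like `result` without the explicit final step.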

Operators

@map
  • The { ... } feels very uncomfortable to me. It somewhat erodes my trust in idiomatic Julia syntax. It took me quite a while to hunt down this expansion in helper_namedtuples_replacement, and I think I get what's going on now. As far as I can tell this is done to avoid dispatching on a multi-argument function call. My gut feeling is that there must be a way to get around this.
  • The @groupby(...) |> @map(...) example had me confused for a bit since it uses mean(_.b) to somehow calculate a mean across multiple rows, yet I could only access elements rowwise otherwise. I'm still getting my bearings here, but I had to really reevaluate my assumptions to digest this one.
  • I haven't figured out how to do columnwise operations. For example, doing (lag(a) .+ a .+ lead(a)) ./ 3 (a crude running average). The closest I've come is by "grouping" everything and doing grouped operations, though that results in a single row of arrays.
    df |> @groupby(1) |> @map({a = _.b .* _.a})
  • The macro expansion seems to only make an exception for anonymous functions (expr.head == :->), but doesn't accept unary function objects. Accepting those would be really nice when you have a complicated function that you don't want to write out inside a data processing step, or that you want to reuse.
    @map(df, x -> x)  # returns dataframe
    @map(df, identity)  # returns array of "identity"
    
  • A bit pedantic, but the documentation for each verb says that it takes an anonymous function, when the expression is not (at least before macro expansion) a function; e.g. _^2 is not itself a function. I know it gets expanded out to an anonymous function, but it might be nice to acknowledge that these _-style lambdas are also accepted.
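To illustrate that last point, here is my understanding of the expansion (a sketch of observed behavior, not the package's documented contract):

```julia
using Query

# These two should be equivalent after macro expansion:
[1, 2, 3] |> @map(_^2) |> collect       # _-style lambda
[1, 2, 3] |> @map(x -> x^2) |> collect  # explicit anonymous function
# both should yield [1, 4, 9]
```

So `_^2` is sugar that the macro rewrites into `x -> x^2` before anything runs, which is why the docs' "anonymous function" wording is technically a step removed from what you type.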
@filter
  • @filter feels quite intuitive. This is a place where the rowwise behavior really shines and the filtering operations read really nicely.
  • Just trying to push the limits here, if I wanted to filter on rows where any Number columns are >40
    df = DataFrame(name=["John", "Sally", "Kirk"], age=[23., 42., 59.], children=[3,5,2])
    df |> @filter(any(v > 40 for v=_ if typeof(v) <: Real)) |> DataFrame
    This feels a bit clunky, but it seems like it's definitely not the intended use case. It's good to know it's at least possible.
@groupby
  • I tried to learn more about the element_selector using ?@groupby but documentation was minimal.
  • I errored when I passed it a lambda function as the element_selector, but I was able to evaluate an expression with a _. This functionality seems to introduce a lot of complexity to "groupby" while being functionally equivalent to the more readable @groupby(...) |> @map(...).
@orderby_* and @thenby_*
  • These feel verbose, but also I really like how readable they are once composed
  • My instinct from a dplyr perspective is that these must be expressible more succinctly, but the current form is certainly clear.
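For reference, the composed form I'm describing looks roughly like this (column names invented for illustration; this assumes the @orderby/@thenby_descending macros work as documented):

```julia
using Query, DataFrames

df = DataFrame(a=[2, 1, 2], b=[3, 2, 1])

# Verbose, but reads top-to-bottom like a sentence:
df |>
    @orderby(_.a) |>
    @thenby_descending(_.b) |>
    DataFrame
```

Compare dplyr's `arrange(df, a, desc(b))`, which packs the same intent into one call.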
@groupjoin, @join
  • The join operations' use of __ makes it feel like the new syntax is getting a bit heavy.
  • Creating the new columns inside of a join also feels like the function scope is a bit too big. My preference would again be to break out a map call.
  • The outer_selector and inner_selector language is a bit confusing as "outer" and "inner" are typically used to describe the overlap of the join, not the source dataset.
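To make the naming complaint concrete, here is my reading of the @join signature (data invented for illustration; if I've misread the argument order, that rather proves the point):

```julia
using Query, DataFrames

people = DataFrame(id=[1, 2], name=["John", "Sally"])
pets   = DataFrame(owner=[1, 2], pet=["Rex", "Whiskers"])

# _ refers to the "outer" (left) source and __ to the "inner"
# (right) source, which collides with SQL's inner/outer join terms.
people |>
    @join(pets, _.id, _.owner, {_.name, __.pet}) |>
    DataFrame
```

Something like left_selector/right_selector would avoid the collision with the SQL sense of inner and outer.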
@mapmany
  • I think I'd need more experience querying other data types to weigh on this one, but at first glance it seems useful for the Dict example. I'm sure with heavily nested data structures - perhaps something read out of a .json file or something like that - this would be really useful.
@take and @drop
  • Love me some good functional basics. Glad to see these bases are covered for a lazy collection.
@unique
  • Another staple, especially for filtering unique rows.
@select
  • This one feels very dplyry to me (and that's a good thing - I think it has some fantastic select syntax).
  • I like the use of the ! operator. I forgot Julia natively composes it with functions. edit: upon further investigation it looks like these are handled via macro expansion and converted into "not_*" versions of each function. I'd prefer to see it lean on Base Julia where possible.
  • The macro handling of specific functions by name ("startswith", "endswith", "occursin") always feels a bit clunky to me, and means that it's less composable with outside functions.
@rename
  • Reads very clearly. I love the Pair operator specifically for the rename function. I can never remember which thing gets renamed to what in the dplyr world (rename(a = b)) and this is just so clear.
  • I don't think there's a way around it, but the symbol notation introduces another way to refer to columns. Between _.a, __.a, and now :a.
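To make the three spellings concrete (as I understand them):

```julia
using Query, DataFrames

df = DataFrame(a=[1, 2], b=[3, 4])

df |> @rename(:a => :renamed) |> DataFrame  # :symbol style, in @rename/@select
df |> @map({x = _.a})         |> DataFrame  # _.column style, in most verbs
# __.column appears only in joins, to refer to the second source.
```

Each style is sensible in isolation; it's just three notations for "a column" that a new user has to keep straight.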
@mutate
  • Now this looks familiar! For me the jury is still out on the rowwise operations. Rowwise is good 90% of the time, but those 10% can look really nasty without some syntax to support it. Common tasks like renormalizing data to standard deviations around a mean value are very simple transformations conceptually that get really muddy without columnwise transforms.
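A sketch of the contrast I mean (assuming @mutate works as in the docs): the rowwise case is clean, while standardizing a column needs the column's mean and standard deviation, which aren't reachable from a single row, so I had to precompute them outside the query.

```julia
using Query, DataFrames, Statistics

df = DataFrame(a=[1.0, 2.0, 3.0])

# Rowwise: easy and readable.
df |> @mutate(b = _.a * 2) |> DataFrame

# Columnwise (z-score): the aggregates live outside the pipeline.
m, s = mean(df.a), std(df.a)
df |> @mutate(z = (_.a - m) / s) |> DataFrame
```

In dplyr the second case is just `mutate(df, z = (a - mean(a)) / sd(a))`, which is the 10% I miss.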

Impressions after toying around

  • The lazy data processing is quite cool. I really like that the query result only evaluates a head to print.
    @time 1:1e8 |> @map(_ * 2)   # ~10x faster
    # vs
    @time 1:1e8 |> @map(_ * 2) |> collect
    

After digging through some source code

  • The query expansion macros and helpers source code feel very overwhelming and difficult to contribute to.
  • Looking at query_expression_translation_phase_4 specifically, the individual handling of specific macros by name makes me worry that the package would be quite difficult to extend. In retrospect, I do recall looking into Query.jl when diving into Julia's macro system, and I think the complexity here was a bit daunting, leading me to look into DataFramesMeta.jl instead. I found its macros more approachable as a starting point for learning.
  • Even if long at times, I appreciate the clarity of the internal function names. Even if the code is sometimes complex, it was usually interpretable because of how fluently the function names read.

Closing thoughts

Really cool package! The versatility to process data agnostically is really ambitious, and it seems like you've brought it to a pretty polished state. @select, @mutate and @filter definitely have that "dplyr feel".

Syntactically the _ and __ feel a bit weird, but that might just take some getting used to. I think some sort of syntax for accessing columns of data would be nice, but I can't imagine how that would look, or what would be the performance costs of doing so in a lazily evaluated query engine.

Most importantly, the package feels nice to use. It feels pretty snappy. I'm always comforted knowing that data is being lazily evaluated and minimally computed to print only 10 rows out to the console. It's nice knowing I'm not going to kill my session by accidentally trying to compute something on tens of millions of records.

@dgkf dgkf changed the title New user feedback from a R veteran New user feedback from an R veteran Jan 18, 2020
@rleyvasal

I would also like to see more intuitive syntax in Query.jl, especially for referring to the names of columns.

The _. prefix could potentially be folded into the macro, since the macro already expects a column name in that position.

for example:
cars |> @groupby(_.Origin) |> @map(key(_))

would be more readable as:
cars |> @groupby(Origin) |> @map(key(_))
