Data table functions #1158

Status: Open (wants to merge 2 commits into main)

kanishkan91
Contributor

Adding two functions, with documentation:

  1. `fast_group_by`: a faster alternative to the equivalent dplyr operations, built on data.table. It groups the data, applies a function, and ungroups, so it does the work of a `group_by`, `mutate`, and `ungroup` in one call. It can be used within dplyr pipes. The speed advantage grows with the volume of the underlying data.

  2. `data_table_bind`: a faster alternative to `bind_rows` that takes advantage of data.table's data-processing capabilities. It returns a tibble after binding all input datasets.
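
Only fragments of the two functions appear in the review excerpts on this page. The R sketch below assembles them into runnable form, with the review's style suggestions applied. Treat it as an illustration, not the PR's actual code: the `as.data.table()` copy (so the caller's data is not modified by reference) and the explicit `as.numeric()` step are assumptions, and the data frame `d` is made up.

```r
library(data.table)
library(tibble)

# Sketch only: reconstructed from the review excerpts, not the PR's exact code.
# Roughly equivalent to group_by(<by columns>) %>%
#   mutate(<colname> = <func>(<colname>)) %>% ungroup()
fast_group_by <- function(df, by, colname = "value", func = "sum") {
  # Assumption: as.data.table() makes a copy, so the `:=` calls below
  # modify this copy rather than the caller's data frame
  dt <- as.data.table(df)

  # Convert the relevant column to numeric (as the excerpt's comment says)
  dt[, (colname) := as.numeric(get(colname))]

  # Apply the function named by `func` to `colname` within each group;
  # every row keeps its position and receives its group's result
  dt[, (colname) := get(func)(get(colname)), by = by]

  # Last expression is the return value, as suggested in the review
  as_tibble(dt)
}

# Row-bind any number of data frames, matching columns by name
data_table_bind <- function(...) {
  list_for_bind <- list(...)
  as_tibble(rbindlist(list_for_bind, use.names = TRUE))
}

# Usage with a small made-up data frame
d <- tibble(g = c("a", "a", "b", "b"), value = c(1, 2, 3, 4))
grouped <- fast_group_by(d, by = "g")  # value becomes 3, 3, 7, 7
stacked <- data_table_bind(d, d)       # 8 rows, returned as a tibble
```

Because `:=` with `by` assigns per group instead of summarising, the output has as many rows as the input, which is what makes it a grouped `mutate` rather than a `summarise`.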

codecov bot commented Mar 26, 2020

Codecov Report

Merging #1158 into master will decrease coverage by 0.59%.
The diff coverage is 0.00%.

```diff
@@            Coverage Diff             @@
##           master    #1158      +/-   ##
==========================================
- Coverage   95.00%   94.40%   -0.60%
==========================================
  Files          11       11
  Lines        1421     1430       +9
==========================================
  Hits         1350     1350
- Misses         71       80       +9
```

| Impacted Files | Coverage Δ |
|---|---|
| R/utils.R | 94.82% <0.00%> (-3.53%) ⬇️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c8156b3...705e40d.

bpbond (Member) left a comment

Minor style changes only. Thanks @kanishkan91!

I wonder if we should look for opportunities to use this throughout the codebase, for example in the current slowest chunks. Thoughts, @pralitp?

```r
fast_group_by<- function(df,by,colname="value",func= "sum"){

#Convert relevant column to numeric
```
Member

For consistency with the rest of the codebase, please add a space after all these `#`s.

```r
df<- df[, (colname) := (get(func)(get(colname))), by]

#Save back to tibble
df<- as_tibble(df)
```
Member

I would just make line 537 the last line of the function, returning `as_tibble(df)` directly instead of assigning it back to `df`.

```r
df <- rbindlist(list_for_bind,use.names=TRUE)

#Return as tibble
df<-as_tibble(df)
```
Member

Ditto

```r
list_for_bind =list(...)

#bind into one dataframe using rbindlist
df <- rbindlist(list_for_bind,use.names=TRUE)
```
Member

Suggested change:
```diff
-df <- rbindlist(list_for_bind,use.names=TRUE)
+df <- rbindlist(list_for_bind, use.names = TRUE)
```
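
If `data_table_bind()` is swapped in for `bind_rows()` elsewhere in the codebase, one behavioral difference is worth checking. With `use.names = TRUE`, `rbindlist()` matches columns by name, but it stops with an error when the inputs have different sets of columns unless `fill = TRUE` is also given, whereas dplyr's `bind_rows()` fills missing columns with `NA`. A small sketch, with made-up data frames `a`, `b`, and `c_missing`:

```r
library(data.table)
library(dplyr)

# Illustrative inputs (made up for this sketch)
a <- data.frame(x = 1:2, y = c("p", "q"))
b <- data.frame(y = "r", x = 3L)   # same columns, different order
c_missing <- data.frame(x = 4L)    # column `y` is absent

# use.names = TRUE matches columns by name, so column order does not matter
ab <- rbindlist(list(a, b), use.names = TRUE)

# Different column sets: rbindlist() stops with an error unless fill = TRUE...
err <- tryCatch(rbindlist(list(a, c_missing), use.names = TRUE),
                error = function(e) e)
filled <- rbindlist(list(a, c_missing), use.names = TRUE, fill = TRUE)

# ...whereas bind_rows() fills the missing column with NA by default
bound <- bind_rows(a, c_missing)
```

So, assuming existing callers rely on `bind_rows()` filling missing columns, a drop-in replacement would probably need `fill = TRUE` as well.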

```r
#' @importFrom dplyr %>%
#' @author kbn 24 Mar 2020
#' @export
fast_group_by<- function(df,by,colname="value",func= "sum"){
```
Member

Suggested change:
```diff
-fast_group_by<- function(df,by,colname="value",func= "sum"){
+fast_group_by <- function(df, by, colname = "value", func = "sum"){
```

Base automatically changed from master to main January 19, 2021 20:24

2 participants