Data table functions #1158
1. fast_group_by 2. data_table_bind
Codecov Report
@@            Coverage Diff             @@
##           master    #1158      +/-   ##
==========================================
- Coverage   95.00%   94.40%   -0.60%
==========================================
  Files          11       11
  Lines        1421     1430       +9
==========================================
  Hits         1350     1350
- Misses         71       80       +9
Minor style changes only. Thanks @kanishkan91!
I wonder if we should look for opportunities to use this throughout the codebase, for example in the current slowest chunks. Thoughts, @pralitp?
fast_group_by<- function(df,by,colname="value",func= "sum"){

#Convert relevant column to numeric
For consistency with the rest of the codebase, please add a space after all these #s.
df<- df[, (colname) := (get(func)(get(colname))), by]

#Save back to tibble
df<- as_tibble(df)
I would just make line 537 the last one of the function: as_tibble(df)
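As a rough illustration of that suggestion: an R function returns the value of its last evaluated expression, so the reassignment is not needed. The two function names below are hypothetical and not part of the PR, and the "before" shape is only an assumption about how the function currently ends.

```r
library(tibble)

# Hypothetical before/after pair; names are illustrative only.
verbose_end <- function(df) {
  df <- as_tibble(df)
  return(df)
}

concise_end <- function(df) {
  # The last evaluated expression is the function's return value
  as_tibble(df)
}

# Both variants return the same tibble
same_result <- identical(
  verbose_end(data.frame(x = 1:3)),
  concise_end(data.frame(x = 1:3))
)
```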
df <- rbindlist(list_for_bind,use.names=TRUE)

#Return as tibble
df<-as_tibble(df)
Ditto
list_for_bind =list(...)

#bind into one dataframe using rbindlist
df <- rbindlist(list_for_bind,use.names=TRUE)
- df <- rbindlist(list_for_bind,use.names=TRUE)
+ df <- rbindlist(list_for_bind, use.names = TRUE)
#' @importFrom dplyr %>%
#' @author kbn 24 Mar 2020
#' @export
fast_group_by<- function(df,by,colname="value",func= "sum"){
- fast_group_by<- function(df,by,colname="value",func= "sum"){
+ fast_group_by <- function(df, by, colname = "value", func = "sum"){
Adding 2 functions with documentation:
fast_group_by: A faster alternative to the usual dplyr approach, built on data.table. It groups the data, applies a function, and ungroups, which is essentially a group_by, mutate, and ungroup in one call. It can be used within dplyr pipes. The speed advantage grows with the volume of underlying data.
data_table_bind: A faster alternative to bind_rows that takes advantage of data.table's data processing capabilities. Returns a tibble after binding all input datasets.
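For readers who want to see the shape of the two helpers, here is a minimal sketch pieced together from the snippets quoted in this review. It is not the PR's actual implementation: the `_sketch` names are hypothetical, and the use of `.SDcols` with `lapply()` is a swapped-in idiom in place of the `get()` calls shown in the diff.

```r
library(data.table)
library(tibble)

# Sketch of a grouped mutate backed by data.table (hypothetical name).
# df: data frame or tibble; by: character vector of grouping columns;
# colname: column to overwrite; func: name of a function such as "sum".
fast_group_by_sketch <- function(df, by, colname = "value", func = "sum") {
  # as.data.table() copies, so := below does not modify the caller's data
  dt <- as.data.table(df)
  f <- match.fun(func)
  # Convert the target column to numeric so the result type is stable
  set(dt, j = colname, value = as.numeric(dt[[colname]]))
  # Overwrite colname with f() applied within each group
  dt[, (colname) := lapply(.SD, f), by = by, .SDcols = colname]
  as_tibble(dt)
}

# Sketch of a bind_rows-style helper backed by rbindlist (hypothetical name).
data_table_bind_sketch <- function(...) {
  # use.names = TRUE matches columns by name rather than by position
  as_tibble(rbindlist(list(...), use.names = TRUE))
}

df <- tibble(region = c("a", "a", "b"), value = c(1, 2, 10))
totals <- fast_group_by_sketch(df, by = "region")
both <- data_table_bind_sketch(df, df)
```

One difference worth keeping in mind: with `use.names = TRUE` alone, `rbindlist` raises an error when the inputs do not share the same columns, whereas `bind_rows` fills the gaps with `NA`. Passing `fill = TRUE` to `rbindlist` would be needed to get comparable behaviour.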