Data table functions #1158
@@ -494,3 +494,81 @@ screen_forbidden <- function(fn) {
  }
  rslt
}

#' fast_group_by
#'
#' A version of group_by that uses data.table instead of dplyr. Creates groups, runs a user-specified function,
#' ungroups and returns a processed tibble. Please use this function only for grouping numeric data.
#'
#' Because it uses data.table, this group_by offers a much higher speed,
#' especially when working with high-volume datasets. This function can also be
#' called within dplyr pipes. data.table will also inherently ensure consistency
#' between LHS and RHS. The function performs a combination of a group_by, mutate and ungroup.
#'
#' Example:
#'
#' A group_by with dplyr: grouped_data <- data %>% group_by(iso, year, glu_code) %>% mutate(value = sum(value)) %>% ungroup()
#'
#' Same group_by with data.table: grouped_data <- fast_group_by(data, by = c("iso", "year", "glu_code"), colname = "value", func = "sum")
#'
#' @param df The tibble on which the group_by is to be performed
#' @param by A vector of strings naming the columns to group by.
#' @param colname A string with the name of the column to which \code{func} is applied within each group
#' @param func A string with the name of the function to be applied. Default is "sum"
#' @return A tibble with the same rows as \code{df}, in which \code{colname} holds the result of \code{func} for each group.
#' @importFrom data.table as.data.table
#' @importFrom tibble as_tibble
#' @importFrom dplyr %>%
#' @author kbn 24 Mar 2020
#' @export
fast_group_by<- function(df,by,colname="value",func= "sum"){


  #Convert relevant column to numeric
For consistency with rest of codebase, please add a space after all these
  df[[colname]]<- as.numeric(df[[colname]])

  #Store as data.table
  df <- as.data.table(df)

  #Complete operations
  df<- df[, (colname) := (get(func)(get(colname))), by]

  #Save back to tibble
  df<- as_tibble(df)
I would just make line 537 the last one of the function:

  return(df)
}
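
For anyone checking the equivalence claimed in the roxygen example, here is a minimal sketch that runs the dplyr chain and the data.table steps side by side. It assumes the data.table, dplyr and tibble packages are installed; the toy data and the names `toy`, `dplyr_result` and `dt_result` are hypothetical and not part of this PR. The data.table steps are inlined rather than calling `fast_group_by`, so the block runs on its own.

```r
library(data.table)
library(dplyr)
library(tibble)

# Hypothetical toy data, not taken from the PR
toy <- tibble(
  iso   = c("usa", "usa", "can", "can"),
  year  = c(2000, 2000, 2000, 2001),
  value = c(1, 2, 3, 4)
)

# dplyr version: group, replace value with the group sum, ungroup
dplyr_result <- toy %>%
  group_by(iso, year) %>%
  mutate(value = sum(value)) %>%
  ungroup()

# data.table version: the same three steps the function in this diff performs
dt <- as.data.table(toy)                          # copy into a data.table
dt[, value := sum(value), by = c("iso", "year")]  # grouped update by reference
dt_result <- as_tibble(dt)                        # back to a tibble

# Both approaches keep every row and agree on the grouped values
stopifnot(nrow(dt_result) == nrow(toy))
stopifnot(isTRUE(all.equal(dplyr_result$value, dt_result$value)))
```

Note that `:=` updates the column in place and keeps all rows, which matches `mutate()` rather than `summarise()`.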

#' data_table_bind
#'
#' A binding function that uses data.table. This can be used as a replacement for rbind or bind_rows.
#'
#' This binding function takes advantage of the data processing capabilities of data.table. It can be
#' called within dplyr pipes.
#'
#' Example:
#'
#' Bind two datasets (x, y) with the same columns using the following:
#'
#' Bound_dataset <- data_table_bind(x, y)
#'
#' @param ... The tibbles to be bound.
#' @importFrom data.table as.data.table rbindlist
#' @importFrom tibble as_tibble
#' @return A tibble with combined data.
#' @author kbn 24 Mar 2020
#' @export
data_table_bind<-function(...){

  #Create a list for binding
  list_for_bind =list(...)

  #bind into one dataframe using rbindlist
  df <- rbindlist(list_for_bind,use.names=TRUE)


  #Return as tibble
  df<-as_tibble(df)
Ditto

  return(df)

}
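
As a quick illustration of what `use.names=TRUE` does in the binding step, the sketch below binds two tibbles whose columns are in a different order. It assumes the data.table and tibble packages are installed; the inputs `x` and `y` and the name `bound` are hypothetical. The `rbindlist` call is inlined so the block does not depend on `data_table_bind` being loaded.

```r
library(data.table)
library(tibble)

# Hypothetical inputs: same columns, different column order
x <- tibble(iso = c("usa", "can"), value = c(1, 2))
y <- tibble(value = c(3, 4), iso = c("mex", "bra"))

# Same steps as the function in this diff: collect into a list,
# bind by column name, then convert back to a tibble
bound <- as_tibble(rbindlist(list(x, y), use.names = TRUE))

# Columns are matched by name, so y's values land in the right columns
stopifnot(identical(names(bound), c("iso", "value")))
stopifnot(identical(bound$iso, c("usa", "can", "mex", "bra")))
```

Column order in the result follows the first input, so the order of `y`'s columns does not matter.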