Data table functions #1158

Open
wants to merge 2 commits into
base: main
78 changes: 78 additions & 0 deletions R/utils.R
@@ -494,3 +494,81 @@ screen_forbidden <- function(fn) {
}
rslt
}

#' fast_group_by
#'
#' A version of group_by that uses data.table instead of dplyr. Creates groups, runs a user specified function, ungroups and returns
#' a processed tibble. Please use this function for grouping only numeric data.
#'
#' Because it uses data.table, this function is much faster than the dplyr
#' equivalent, especially on high-volume datasets. It can also be called
#' within dplyr pipes. data.table will also inherently ensure consistency
#' between LHS and RHS. The function performs the equivalent of a group_by, mutate and ungroup.
#'
#' Example-
#'
#' A group_by with dplyr - grouped_data <- data %>% group_by(iso,year,glu_code) %>% mutate(value=sum(value)) %>% ungroup()
#'
#' Same group_by with data.table - grouped_data <- fast_group_by(data, by=c("iso","year","glu_code"),colname = "value", func = "sum" )
#'
#' @param df The tibble on which the group_by is to be performed
#' @param by A vector of strings with the criteria for the group_by.
#' @param colname A string with the name of the column to which the function is applied within each group
#' @param func A string with the function to be performed. Default is set to "sum"
#' @return A tibble with the aggregated data.
#' @importFrom data.table as.data.table
#' @importFrom tibble as_tibble
#' @importFrom dplyr %>%
#' @author kbn 24 Mar 2020
#' @export
fast_group_by<- function(df,by,colname="value",func= "sum"){
Member
Suggested change
fast_group_by<- function(df,by,colname="value",func= "sum"){
fast_group_by <- function(df, by, colname = "value", func = "sum"){



#Convert relevant column to numeric
Member
For consistency with rest of codebase, please add a space after all these #s

df[,colname]<- as.numeric(df[,colname])

#Store as data.table
df <- as.data.table(df)

#Complete operations
df<- df[, (colname) := (get(func)(get(colname))), by]

#Save back to tibble
df<- as_tibble(df)
Member
I would just make line 537 the last one of the function: as_tibble(df)


return(df)
}
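
One way the review comments above could be folded into the function is sketched below. This is a sketch under stated assumptions, not the code in this PR: the name `fast_group_by_sketch` is hypothetical, and two changes go beyond what the reviewers asked for. First, the numeric conversion is done inside data.table, because `as.numeric(df[, colname])` appears to error when `df` is a tibble with more than one row (single-bracket selection on a tibble returns a one-column tibble, not a vector). Second, the function is resolved with `match.fun()` rather than `get()`.

```r
library(data.table)
library(tibble)

# Hypothetical revision of fast_group_by; names and details are assumptions.
fast_group_by_sketch <- function(df, by, colname = "value", func = "sum") {
  # Resolve the function once, before entering the data.table scope
  fn <- match.fun(func)

  # as.data.table() copies, so the caller's object is not modified by reference
  dt <- as.data.table(df)

  # Convert the target column to numeric (works for tibbles and data.frames)
  dt[, (colname) := as.numeric(get(colname))]

  # Grouped mutate: every row is kept and receives its group's result
  dt[, (colname) := fn(get(colname)), by = by]

  # Return a tibble as the last expression, as suggested in review
  as_tibble(dt)
}

# Small made-up example: the two "a" rows share a group, "b" is alone
example_df <- tibble(
  iso   = c("a", "a", "b"),
  year  = c(2000, 2000, 2000),
  value = c(1, 2, 5)
)
grouped <- fast_group_by_sketch(example_df, by = c("iso", "year"))
# grouped$value is c(3, 3, 5); example_df is left unchanged
```

Like the dplyr `group_by` / `mutate` / `ungroup` chain in the roxygen example, this keeps one row per input row instead of collapsing each group to a single row.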

#' data_table_bind
#'
#' A binding function that uses data.table. This can be used as a replacement for rbind or bind_rows.
#'
#' This binding function takes advantage of the data processing capabilities of data.table. This can be
#' called within dplyr pipes.
#'
#' Example-
#'
#' Bind 2 datasets (x,y) with same columns using the following,
#'
#' Bound_dataset<- data_table_bind(x,y)
#'
#' @param ... The tibbles to be merged.
#' @importFrom data.table as.data.table rbindlist
#' @importFrom tibble as_tibble
#' @return A tibble with combined data.
#' @author kbn 24 Mar 2020
#' @export
data_table_bind<-function(...){

#Create a list for binding
list_for_bind =list(...)

#bind into one dataframe using rbindlist
df <- rbindlist(list_for_bind,use.names=TRUE)
Member
Suggested change
df <- rbindlist(list_for_bind,use.names=TRUE)
df <- rbindlist(list_for_bind, use.names = TRUE)


#Return as tibble
df<-as_tibble(df)
Member
Ditto


return(df)

}
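
The binding function with the review suggestions applied might look like the sketch below. The name `data_table_bind_sketch` and the example tibbles are hypothetical; the example is only meant to show what `use.names = TRUE` does when the inputs have the same columns in a different order.

```r
library(data.table)
library(tibble)

# Hypothetical revision of data_table_bind; not the code in this PR.
data_table_bind_sketch <- function(...) {
  # Collect the inputs into a list for rbindlist
  list_for_bind <- list(...)

  # use.names = TRUE matches columns by name rather than by position;
  # the result is returned as a tibble in the last expression
  as_tibble(rbindlist(list_for_bind, use.names = TRUE))
}

# Made-up inputs with the same columns in a different order
x <- tibble(iso = "usa", value = 1)
y <- tibble(value = 2, iso = "can")
bound <- data_table_bind_sketch(x, y)
# bound has columns iso, value (the order of the first input),
# with iso = c("usa", "can") and value = c(1, 2)
```

If the inputs could have different sets of columns, `rbindlist()` also accepts `fill = TRUE`, which pads missing columns with `NA`. Whether that behaviour is wanted here is a design decision for the PR, so the sketch leaves it out.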