Updating code base to reflect changes to private model repo (#107)
v1.7.0

- Data file names now mirror the script names that created the files
- Features on food inspections are now calculated separately
- Features on business inspections are now calculated separately
- The model code merges in the features, does not calculate features
- Added script to adjust the public sanitarian data to match the schema of the private sanitarian file
- More aggressive filtering functions
- Separates out the violation matrix calculation into the parsing step and classification step (which, as it turns out, will be useful for the new inspection format)
- Refactoring model result / evaluation steps to accommodate future analysis

* adding prefix number to code and data, closes #100
* syncing and updating startup script, closes #101
* split violation matrix calculation into two steps, closes #102
* updated help example to remove unused variable
* adding nokey function, needed for new violation matrix calculation
* guard against too few categories in GenerateOtherLicenseInfo, closes 103
* updating filter functions to match model
* starting work described in #104 to split feature creation
* refactoring code for model compatibility
* simplifying initialization
Showing 47 changed files with 711 additions and 670 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,15 +1,55 @@ | ||
## INSTALL THESE DEPENDENCIES | ||
install.packages("devtools", | ||
dependencies = TRUE, | ||
repos='http://cran.us.r-project.org') | ||
install.packages("Rcpp", | ||
dependencies = TRUE, | ||
repos='http://cran.us.r-project.org') | ||
|
||
## Update two packages not on CRAN using the devtools package. | ||
devtools::install_github(repo = 'geneorama/geneorama') | ||
devtools::install_github(repo = 'yihui/printr') | ||
##------------------------------------------------------------------------------ | ||
## INSTALL DEPENDENCIES IF MISSING | ||
##------------------------------------------------------------------------------ | ||
|
||
if(!"devtools" %in% rownames(installed.packages())){ | ||
install.packages("devtools", | ||
dependencies = TRUE, | ||
repos = "https://cloud.r-project.org/") | ||
} | ||
|
||
if(!"Rcpp" %in% rownames(installed.packages())){ | ||
install.packages("Rcpp", | ||
dependencies = TRUE, | ||
repos = "https://cloud.r-project.org/") | ||
} | ||
|
||
if(!"RSocrata" %in% rownames(installed.packages())){ | ||
install.packages("RSocrata", | ||
dependencies = TRUE, | ||
repos = "https://cloud.r-project.org/") | ||
} | ||
|
||
if(!"data.table" %in% rownames(installed.packages())){ | ||
install.packages("data.table", | ||
dependencies = TRUE, | ||
repos = "https://cloud.r-project.org/") | ||
} | ||
|
||
if(!"geneorama" %in% rownames(installed.packages())){ | ||
devtools::install_github('geneorama/geneorama') | ||
} | ||
|
||
if(!"printr" %in% rownames(installed.packages())){ | ||
devtools::install_github(repo = 'yihui/printr') | ||
} | ||
|
||
##------------------------------------------------------------------------------ | ||
## UPDATE DEPENDENCIES IF OUTDATED | |
##------------------------------------------------------------------------------ | ||
|
||
## Update to RSocrata 1.7.2-2 (or later) | ||
## which is only on github as of March 8, 2016 | ||
devtools::install_github(repo = 'chicago/RSocrata') | ||
if(packageVersion("RSocrata") < "1.7.2-2"){ | |
install.packages("RSocrata", | ||
repos = "https://cloud.r-project.org/") | ||
} | ||
|
||
## Needs recent version for foverlaps | ||
if(packageVersion("data.table") < "1.10.0"){ | |
install.packages("data.table", | ||
repos = "https://cloud.r-project.org/") | ||
} | ||
|
||
if(packageVersion("geneorama") < "1.5.0"){ | |
devtools::install_github('geneorama/geneorama') | ||
} |
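
The update checks in this script depend on comparing package versions. As a minimal sketch (not part of the commit), the snippet below shows why comparing version numbers as plain character strings can misorder them, and how R's `package_version` objects compare component by component instead. The helper name `needs_update` is hypothetical and exists only for illustration.

```r
## Character comparison is lexicographic, so it can misorder versions:
"1.9.6" < "1.10.0"    # FALSE, because the character "9" sorts after "1"

## package_version objects compare each numeric component in turn:
package_version("1.9.6") < package_version("1.10.0")    # TRUE

## Hypothetical helper (not in the commit): TRUE when `pkg` is missing
## or its installed version is older than `min_version`.
needs_update <- function(pkg, min_version) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        return(TRUE)
    }
    utils::packageVersion(pkg) < package_version(min_version)
}
```

A call such as `needs_update("data.table", "1.10.0")` could then guard an `install.packages()` call in the same way the `if` blocks above do.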
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
##============================================================================== | ||
## INITIALIZE | ||
##============================================================================== | ||
## Remove all objects; perform garbage collection | ||
rm(list=ls()) | ||
gc(reset=TRUE) | ||
|
||
## Load libraries & project functions | ||
geneorama::loadinstall_libraries(c("data.table", "MASS")) | ||
geneorama::sourceDir("CODE/functions/") | ||
## Import shift function | ||
shift <- geneorama::shift | ||
|
||
##============================================================================== | ||
## LOAD CACHED RDS FILES | ||
##============================================================================== | ||
foodInspect <- readRDS("DATA/13_food_inspections.Rds") | ||
|
||
## Apply row filter to remove invalid data | ||
foodInspect <- filter_foodInspect(foodInspect) | ||
|
||
## Remove violations from food inspection; violations are captured in the | |
## violation matrix data | |
foodInspect$Violations <- NULL | ||
|
||
## Import violation matrix which lists violations by categories: | ||
## Critical, serious, and minor violations | ||
violation_dat <- readRDS("DATA/21_food_inspection_violation_matrix.Rds") | ||
|
||
##============================================================================== | ||
## CALCULATE FEATURES | ||
##============================================================================== | ||
|
||
## Facility_Type_Clean: Anything that is not "restaurant" or "grocery" is "other" | ||
foodInspect[ , Facility_Type_Clean := | ||
categorize(x = Facility_Type, | ||
primary = list(Restaurant = "restaurant", | ||
Grocery_Store = "grocery"), | ||
ignore.case = TRUE)] | ||
## Join in the violation matrix | ||
foodInspect <- merge(x = foodInspect, | ||
y = violation_dat, | ||
by = "Inspection_ID") | ||
## Create pass / fail flags | ||
foodInspect[ , pass_flag := ifelse(Results=="Pass",1, 0)] | ||
foodInspect[ , fail_flag := ifelse(Results=="Fail",1, 0)] | ||
## Set key to ensure that records are treated CHRONOLOGICALLY... | ||
setkey(foodInspect, License, Inspection_Date) | ||
## Then find previous info by "shifting" the columns (grouped by License) | ||
foodInspect[ , pastFail := shift(fail_flag, -1, 0), by = License] | ||
foodInspect[ , pastCritical := shift(criticalCount, -1, 0), by = License] | ||
foodInspect[ , pastSerious := shift(seriousCount, -1, 0), by = License] | ||
foodInspect[ , pastMinor := shift(minorCount, -1, 0), by = License] | ||
|
||
## Calculate time since last inspection. | |
## If the time is NA, this means it's the first inspection; add an indicator | |
## variable to flag that it's the first inspection. | |
foodInspect[i = TRUE , | ||
j = timeSinceLast := as.numeric( | ||
Inspection_Date - shift(Inspection_Date, -1, NA)) / 365, | ||
by = License] | ||
foodInspect[ , firstRecord := 0] | ||
foodInspect[is.na(timeSinceLast), firstRecord := 1] | ||
foodInspect[is.na(timeSinceLast), timeSinceLast := 2] | ||
foodInspect[ , timeSinceLast := pmin(timeSinceLast, 2)] | ||
|
||
##============================================================================== | ||
## SAVE RDS | ||
##============================================================================== | ||
setkey(foodInspect, Inspection_ID) | ||
saveRDS(foodInspect, file.path("DATA", "23_food_insp_features.Rds")) | |
|
||
|
||
|
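
The lagged features in this script (previous failure, time since last inspection) can be tried out on a small table. The sketch below is an illustration under stated assumptions, not the commit's code: it uses `data.table::shift(type = "lag")` in place of `geneorama::shift`, on the assumption, taken from the script's comments, that `shift(x, -1, fill)` returns the previous row's value within each `License`. The toy data is invented.

```r
library(data.table)

## Toy data, invented for illustration only
insp <- data.table(
    License = c(1, 1, 1, 2, 2),
    Inspection_Date = as.Date(c("2014-01-10", "2014-07-10", "2017-01-10",
                                "2015-03-01", "2015-09-01")),
    fail_flag = c(1, 0, 1, 0, 1))

## Sort chronologically within each license
setkey(insp, License, Inspection_Date)

## Previous inspection's fail flag; 0 when there is no earlier inspection
insp[ , pastFail := shift(fail_flag, n = 1L, fill = 0, type = "lag"),
     by = License]

## Years since the previous inspection; NA for the first inspection
insp[ , timeSinceLast := as.numeric(
          Inspection_Date - shift(Inspection_Date, n = 1L, type = "lag")) / 365,
     by = License]

## Flag first inspections, then impute and cap the gap at 2 years
insp[ , firstRecord := as.integer(is.na(timeSinceLast))]
insp[is.na(timeSinceLast), timeSinceLast := 2]
insp[ , timeSinceLast := pmin(timeSinceLast, 2)]

print(insp)
```

Capping at 2 means a first inspection and a very long gap both take the value 2, so `firstRecord` is the only column that tells the two cases apart.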