Name That Function
When you hit this function, do this...
Use gather and spread respectively, BUT note that many data reshapes are unnecessary in the new data system.
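As a rough illustration of the gather/spread pair, here is a minimal sketch on a made-up table (the column names and values are hypothetical, not taken from the data system). It reshapes a wide table with one column per year into long form and back again.

```r
library(tibble)
library(tidyr)

# Hypothetical wide table: one column per year
wide <- tibble(region = c("USA", "Canada"),
               `1990` = c(1, 2),
               `2000` = c(3, 4))

# Wide -> long: the year columns become (year, value) pairs
long <- gather(wide, key = "year", value = "value", -region)

# Long -> wide: back to one column per year
wide_again <- spread(long, key = year, value = value)
```

Newer tidyr releases also offer pivot_longer and pivot_wider for the same job; whether the data system has adopted them is not something this page states.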
Use group_by and summarise.
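A minimal sketch of the group_by and summarise pattern, using a small invented data frame (names and numbers are assumptions for illustration only): values are summed within each region.

```r
library(dplyr)
library(tibble)

# Hypothetical input: several rows per region
df <- tibble(region = c("USA", "USA", "Canada"),
             crop   = c("Corn", "Wheat", "Wheat"),
             value  = c(1, 2, 5))

# One row per region, with the values summed
totals <- df %>%
  group_by(region) %>%
  summarise(value = sum(value))
```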
It depends. For matching GCAM region IDs, and any other operation where you want an error thrown if there's no match, use left_join_error_no_match.
Use left_join if it's OK for NA values to appear after the join (i.e., you know that everything might not match).
Use inner_join if you want only the rows that are common to the two data sets (i.e., rows that appear in one data set but not the other will be dropped).
Use semi_join if you want to filter a data set to the rows that have matches in another data set, but you don't actually want to add the data from the other data set. You can think of this as a generalization of the %in% operator. A row is "in" the other data frame if it has a match.
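The differences between the dplyr joins described above can be seen in a small sketch. The two tables and their values are invented for illustration; left_join_error_no_match belongs to the data system itself and is not called here.

```r
library(dplyr)
library(tibble)

# Hypothetical data: "xxx" has no entry in the lookup table
data    <- tibble(iso = c("usa", "can", "xxx"), value = c(1, 2, 3))
regions <- tibble(iso = c("usa", "can", "mex"),
                  GCAM_region_ID = c(1L, 2L, 3L))

# left_join keeps every row of data; the unmatched row gets NA
left <- left_join(data, regions, by = "iso")

# inner_join keeps only rows present in both tables
inner <- inner_join(data, regions, by = "iso")

# semi_join filters data to matching rows without adding any columns
semi <- semi_join(data, regions, by = "iso")
```

Here left has three rows (one with an NA region ID), while inner and semi each have two; semi keeps only the columns of data. This is the kind of unmatched row for which an error-on-no-match join is meant to stop instead of silently producing NA.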
Other cases?
Careful here.
The new function is called repeat_add_columns and can operate in a pipeline, e.g. x %>% repeat_add_columns(y).
This can be replaced by a call to tidyr::complete. Here's a code sample in which every combination of region and commodity is included, with missing values set to 0:
DATA_FRAME %>%
  complete(GCAM_region_ID = unique(iso_GCAM_regID$GCAM_region_ID),
           GCAM_commodity = unique(FAO_ag_items_cal_SUA$GCAM_commodity),
           fill = list(value = 0))
(Note that instead of writing tidyr::complete you can also add @importFrom tidyr complete to the function's header, and then just use complete.)
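For a version that runs on its own, here is a minimal sketch of the same complete call on a toy table. The region IDs, commodity names, and values are made up; only the column names follow the sample above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical sparse table: only 2 of the 4 region/commodity pairs exist
df <- tibble(GCAM_region_ID = c(1L, 2L),
             GCAM_commodity = c("Corn", "Wheat"),
             value          = c(10, 20))

# Expand to every region/commodity combination, filling gaps with 0
filled <- df %>%
  complete(GCAM_region_ID = 1:2,
           GCAM_commodity = c("Corn", "Wheat"),
           fill = list(value = 0))
```

The result has four rows: the two original ones plus two new combinations whose value is 0.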
This should never be necessary. Apart from the fact that the collapse argument to paste does this for you, vecpaste in the current code base is almost invariably used together with match to find corresponding rows in two data frames. Use one of the join functions above instead. Example from LA100.0_LDS_preprocessing:
# This used to be a complicated vecpaste call
L100.LDS_ag_HA_ha %>%
  semi_join(L100.LDS_ag_prod_t, by = c("iso", aglu.GLU, "GTAP_crop")) ->
  L100.LDS_ag_HA_ha
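To see why a join can stand in for the paste-and-match idiom, here is a sketch on two invented tables. The pasted-key code only approximates how vecpaste was typically used (it is not the original function), and the data are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical harvested-area and production tables
ha   <- tibble(iso       = c("usa", "usa", "can"),
               GTAP_crop = c("wheat", "maize", "wheat"),
               value     = c(1, 2, 3))
prod <- tibble(iso       = c("usa", "can"),
               GTAP_crop = c("wheat", "wheat"),
               value     = c(10, 30))

# Old-style idiom (illustrative): build a pasted key, then match on it
key_ha   <- paste(ha$iso, ha$GTAP_crop)
key_prod <- paste(prod$iso, prod$GTAP_crop)
old_way  <- ha[key_ha %in% key_prod, ]

# Join-based equivalent: keep rows of ha that have a match in prod
new_way <- semi_join(ha, prod, by = c("iso", "GTAP_crop"))
```

Both approaches keep the same two rows of ha, but the join states the matching columns directly and does not depend on a separator character inside a pasted key.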