
Name That Function

Ben Bond-Lamberty edited this page Mar 31, 2017 · 29 revisions

When you hit one of these functions in the old data-system code, do this instead:

Built-in (R and packages) functions

melt and dcast

Use gather (in place of melt) and spread (in place of dcast), BUT note that many data reshapes are unnecessary in the new data system.
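A small sketch of the mapping, on a made-up wide table of two regions and two years. The old reshape2 calls are shown only as comments, and the data are purely illustrative.

```r
library(tidyr)

# Toy wide table: one column per year (illustrative data only)
wide <- data.frame(region = c("USA", "China"),
                   `1990` = c(1, 2), `2005` = c(3, 4),
                   check.names = FALSE)

# Old: reshape2::melt(wide, id.vars = "region", variable.name = "year")
long <- gather(wide, year, value, -region)   # one row per region-year

# Old: reshape2::dcast(long, region ~ year, value.var = "value")
back <- spread(long, year, value)            # one column per year again
```

Note that gather returns the year column as character, whereas melt returns a factor, so downstream code that relied on factor levels may need adjusting.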

aggregate

Use group_by followed by summarise.
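For example, summing a value by region. The column names here are illustrative, not taken from a particular data-system chunk.

```r
library(dplyr)

df <- tibble(region = c("A", "A", "B"),
             sector = c("x", "y", "x"),
             value  = c(1, 2, 5))

# Old: aggregate(value ~ region, data = df, FUN = sum)
totals <- df %>%
  group_by(region) %>%
  summarise(value = sum(value)) %>%
  ungroup()
```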

match

It depends. For matching GCAM region IDs, and any other operation where you want an error thrown if there's no match, use left_join_error_no_match.
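The idea behind an erroring join can be sketched with plain dplyr. The helper below, left_join_strict, is hypothetical and is not the gcamdata implementation of left_join_error_no_match; it only illustrates the behavior of stopping when some rows of x have no partner in y.

```r
library(dplyr)

# Hypothetical helper (NOT gcamdata's left_join_error_no_match):
# fail loudly if any row of x has no match in y, otherwise left_join.
left_join_strict <- function(x, y, by) {
  unmatched <- anti_join(x, y, by = by)   # rows of x with no match in y
  if (nrow(unmatched) > 0) {
    stop(nrow(unmatched), " row(s) of x have no match in y")
  }
  left_join(x, y, by = by)
}

regions <- tibble(iso = c("usa", "chn"), GCAM_region_ID = c(1L, 11L))

ok <- left_join_strict(tibble(iso = c("usa", "chn"), value = c(1, 2)),
                       regions, by = "iso")

# An unknown iso code triggers the error instead of silently producing NA
bad <- tryCatch(left_join_strict(tibble(iso = "xyz", value = 3), regions, by = "iso"),
                error = function(e) "error")
```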

Use left_join if it's OK for NA values to appear after the join (i.e., you know that everything might not match).
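A minimal illustration with toy data: the unmatched iso code is kept, and its new column is filled with NA.

```r
library(dplyr)

x       <- tibble(iso = c("usa", "xyz"), value = c(1, 2))
regions <- tibble(iso = "usa", GCAM_region_ID = 1L)

joined <- left_join(x, regions, by = "iso")   # keeps both rows of x
```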

Use inner_join if you want only the rows that are common to the two data sets (i.e., rows that appear in one data set but not the other will be dropped).
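The same toy data joined with inner_join: the row with no match is dropped rather than filled with NA.

```r
library(dplyr)

x       <- tibble(iso = c("usa", "xyz"), value = c(1, 2))
regions <- tibble(iso = "usa", GCAM_region_ID = 1L)

common <- inner_join(x, regions, by = "iso")  # only "usa" survives
```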

Use semi_join if you want to filter a data set to the rows that have matches in another data set, but you don't actually want to add the data from the other data set. You can think of this as a generalization of the %in% operator. A row is "in" the other data frame if it has a match.
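A sketch comparing semi_join with %in% on toy data. One practical difference from inner_join: semi_join never duplicates rows of x, even when a key appears several times in y.

```r
library(dplyr)

x <- tibble(iso = c("usa", "chn", "xyz"), value = 1:3)
y <- tibble(iso = c("usa", "chn", "chn"), year = c(2005, 2005, 2010))

kept <- semi_join(x, y, by = "iso")    # rows of x with a match; no y columns added
same <- x[x$iso %in% y$iso, ]          # the single-key %in% equivalent

# inner_join, by contrast, repeats "chn" once per matching row of y
both <- inner_join(x, y, by = "iso")
```

The %in% version only works for a single key column; semi_join accepts several columns in by, which is where it generalizes %in%.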

Other cases?

merge

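Careful here. By default merge performs an inner join on all shared column names (all = FALSE) and sorts the result by the key columns, whereas the dplyr joins keep the row order of the first data frame. Pick the join that matches the merge arguments: the default corresponds to inner_join, all.x = TRUE to left_join, and all = TRUE to full_join.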
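A small comparison on made-up data, showing that the two approaches return the same rows but possibly in a different order.

```r
library(dplyr)

x <- data.frame(id = c(2, 1, 3), a = c("b", "a", "c"))
y <- data.frame(id = c(1, 2), b = c(10, 20))

m  <- merge(x, y, by = "id")                  # inner join, sorted by id
j  <- inner_join(x, y, by = "id")             # inner join, keeps x's order

ml <- merge(x, y, by = "id", all.x = TRUE)    # left join, sorted by id
lj <- left_join(x, y, by = "id")              # left join, keeps x's order
```

If later code depends on row order (for example, positional indexing), add an explicit arrange after the join rather than relying on either function's default ordering.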

Data system-specific functions

interpolate_and_melt

repeat_and_add_vector

The new function is called repeat_add_columns and can operate in a pipeline, e.g. x %>% repeat_add_columns(y).
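Assuming repeat_add_columns repeats x once for each row of y and binds y's columns on (a Cartesian product of the two tables), the effect can be sketched in base R with merge(by = NULL). The function repeat_add_columns_sketch is hypothetical and stands in for the gcamdata function so the example runs without gcamdata loaded.

```r
library(dplyr)  # only for the %>% pipe

# Hypothetical stand-in: merge() with by = NULL returns the Cartesian product
repeat_add_columns_sketch <- function(x, y) merge(x, y, by = NULL)

x <- data.frame(region = c("A", "B", "C"), value = c(1, 2, 3))
y <- data.frame(year = c(2005, 2010))

out <- x %>% repeat_add_columns_sketch(y)   # every region paired with every year
```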

translate_to_full_table

This can be replaced by a call to tidyr::complete. Here's a code sample, in which every combination of region and commodity will be included, with missing values assigned 0:

DATA_FRAME %>%
  tidyr::complete(GCAM_region_ID = unique(iso_GCAM_regID$GCAM_region_ID),
                  GCAM_commodity = unique(FAO_ag_items_cal_SUA$GCAM_commodity),
                  fill = list(value = 0))

(Note that instead of writing tidyr::complete you can also add @importFrom tidyr complete to the function's header, and then just use complete.)
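A self-contained version of the sample above, with small illustrative vectors (regions, commodities) standing in for unique(iso_GCAM_regID$GCAM_region_ID) and unique(FAO_ag_items_cal_SUA$GCAM_commodity):

```r
library(dplyr)
library(tidyr)

df <- tibble(GCAM_region_ID = c(1, 2),
             GCAM_commodity = c("Corn", "Wheat"),
             value          = c(5, 7))

regions     <- c(1, 2, 3)           # illustrative region IDs
commodities <- c("Corn", "Wheat")   # illustrative commodities

full <- df %>%
  complete(GCAM_region_ID = regions,
           GCAM_commodity = commodities,
           fill = list(value = 0))  # new combinations get value 0
```

Region 3 has no rows in df at all, yet it appears in the result because the expansion uses the supplied vectors rather than only the values present in the data.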

vecpaste

This should never be necessary. Apart from the fact that the collapse argument to paste does this for you, vecpaste in the current code base is almost invariably used in conjunction with match to find corresponding rows in two data frames. Use one of the join functions above instead. Example from LA100.0_LDS_preprocessing:

# This used to be a complicated vecpaste call
L100.LDS_ag_HA_ha %>%
  semi_join(L100.LDS_ag_prod_t, by = c("iso", aglu.GLU, "GTAP_crop")) ->
  L100.LDS_ag_HA_ha
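To see why the join is a drop-in replacement, here is a toy version of both patterns. The old_key helper is hypothetical and assumes vecpaste pasted key columns together row by row; the data are invented.

```r
library(dplyr)

ha   <- tibble(iso = c("usa", "usa", "chn"), GLU = c("GLU1", "GLU2", "GLU1"),
               GTAP_crop = c("wht", "wht", "ric"), value = c(1, 2, 3))
prod <- tibble(iso = c("usa", "chn"), GLU = c("GLU1", "GLU1"),
               GTAP_crop = c("wht", "ric"), value = c(10, 30))
keys <- c("iso", "GLU", "GTAP_crop")

# Hypothetical vecpaste-style key: paste the key columns row by row
old_key <- function(df, keys) do.call(paste, as.list(df[keys]))

old <- ha[old_key(ha, keys) %in% old_key(prod, keys), ]  # old pattern
new <- semi_join(ha, prod, by = keys)                    # join replacement
```

Besides being shorter, the join avoids a subtle hazard of pasted keys: with a space separator, "a b" + "c" and "a" + "b c" produce the same string, whereas semi_join compares each column separately.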