redcapFactorFlip equivalent #390

Open
Wunsei opened this issue May 1, 2022 · 2 comments


@Wunsei

Wunsei commented May 1, 2022

redcapAPI has the helpful redcapFactorFlip function. Does REDCapR have an equivalent? If no, how might I replicate the same thing?

@wibeasley
Member

Yeah, @nutterb has done some cool things in redcapAPI.

Here's the code for his function: https://github.com/nutterb/redcapAPI/blob/master/R/redcapFactorFlip.R. I haven't studied it too closely, but it looks like he gets the raw values & labels and makes them factors. Here are the parts that strike me as most relevant: https://github.com/nutterb/redcapAPI/search?q=redcapLabels

As for replicating it in REDCapR, I'm guessing it would start with REDCapR::redcap_metadata_read(). Run that example from your own machine to get a feel for the returned dataset. The select_choices_or_calculations column exposes values like "0, Female | 1, Male" for gender and "0, Unknown / Not Reported | 1, NOT Hispanic or Latino | 2, Hispanic or Latino" for ethnicity. Each value could be split at the pipes (i.e., |), and a regex could then pull out the numeric code and its label.

Here's a proof of concept that I haven't tested. Suppose the relevant parts of the metadata dataset are:

ds <- 
  tibble::tribble(
    ~field_name, ~select_choices_or_calculations,
    "record_id", NA_character_,
    "age"      , NA_character_,
    "gender"   , "0, Female | 1, Male",
    "race"     , "1, American Indian/Alaska Native | 2, Asian | 3, Native Hawaiian or Other Pacific Islander | 4, Black or African American | 5, White | 6, Unknown / Not Reported",
    "ethnicity", "0, Unknown / Not Reported | 1, NOT Hispanic or Latino | 2, Hispanic or Latino" 
  )

Something like this would extract each level. Notice rematch2, which is one of my favorite & underappreciated packages, does the hard work.

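# Capture the numeric code and its label; \\s* drops the space after the comma,
# which would otherwise end up at the start of every label.
pattern <- "^(?<level>\\d+),\\s*(?<label>.+)$"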
ds |> 
  dplyr::select(
    field     = field_name,
    choice    = select_choices_or_calculations
  ) |> 
  tidyr::drop_na(choice) |> 
  tidyr::separate_rows(
    choice, 
    sep     = " \\| "
  ) |> 
  rematch2::bind_re_match(
    choice, 
    pattern
  )

Here's the resulting dataset, which (I think) can be piped into a purrr function to apply the levels & labels to each factor variable.

       field                                       choice level                                      label
1     gender                                    0, Female     0                                     Female
2     gender                                      1, Male     1                                       Male
3       race             1, American Indian/Alaska Native     1              American Indian/Alaska Native
4       race                                     2, Asian     2                                      Asian
5       race 3, Native Hawaiian or Other Pacific Islander     3  Native Hawaiian or Other Pacific Islander
6       race                 4, Black or African American     4                  Black or African American
7       race                                     5, White     5                                      White
8       race                    6, Unknown / Not Reported     6                     Unknown / Not Reported
9  ethnicity                    0, Unknown / Not Reported     0                     Unknown / Not Reported
10 ethnicity                    1, NOT Hispanic or Latino     1                     NOT Hispanic or Latino
11 ethnicity                        2, Hispanic or Latino     2                         Hispanic or Latino
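
As a rough sketch of that last step (not tested against a real project), the parsed choices could be applied to the raw records along these lines. It uses base R to stay dependency-free; `ds_raw`, `apply_labels()`, and the sample values are hypothetical names and data made up for illustration, not part of REDCapR or redcapAPI.

```r
# Parsed choices, shaped like the dataset above (one row per field/level pair).
choices <- data.frame(
  field = c("gender", "gender", "ethnicity", "ethnicity", "ethnicity"),
  level = c("0", "1", "0", "1", "2"),
  label = c("Female", "Male", "Unknown / Not Reported",
            "NOT Hispanic or Latino", "Hispanic or Latino"),
  stringsAsFactors = FALSE
)

# Hypothetical raw records, as an export containing raw codes might look.
ds_raw <- data.frame(
  record_id = 1:3,
  gender    = c(0, 1, 1),
  ethnicity = c(2, 0, 1)
)

# Convert every coded column that appears in `choices` to a labelled factor.
apply_labels <- function(d, choices) {
  for (f in intersect(unique(choices$field), names(d))) {
    ch <- choices[choices$field == f, ]
    d[[f]] <- factor(
      as.character(d[[f]]),
      levels = trimws(ch$level),
      labels = trimws(ch$label)   # trimws() guards against stray spaces
    )
  }
  d
}

ds_labeled <- apply_labels(ds_raw, choices)

# Flip back: look up the raw code that belongs to each label.
gender_choices <- choices[choices$field == "gender", ]
gender_raw <- gender_choices$level[
  match(as.character(ds_labeled$gender), gender_choices$label)
]
```

The loop could presumably be swapped for a purrr call; the point is only that each field's rows in the choices dataset supply the `levels` and `labels` arguments of `factor()`.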

Does that help? If there's enough interest, I'll put it into the package. I'd love feedback from anyone who is interested in this potential feature. @pbchase, you usually have an opinion?

@skadauke

I also think being able to move between raw data and labels would be useful. We built the parse_labels function, which takes a string from select_choices_or_calculations and returns a tibble:

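library(magrittr) # provides the %>% pipe

# Note: a label that itself contains ", " would also be split and misalign the matrix.
parse_labels <- function(string){
  out <- string %>%
    strsplit(" \\| |, ") %>% # split either by ' | ' or ', '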
    unlist() %>%
    matrix(
      ncol = 2,
      byrow = TRUE,
      dimnames = list(
        c(),               # row names
        c("raw", "label")) # column names
    ) %>%
    dplyr::as_tibble()
  
  out
}

string <- "1, American Indian/Alaska Native | 2, Asian | 3, Native Hawaiian or Other Pacific Islander | 4, Black or African American | 5, White | 6, Unknown / Not Reported"

parse_labels(string)
#> # A tibble: 6 × 2
#>   raw   label                                    
#>   <chr> <chr>                                    
#> 1 1     American Indian/Alaska Native            
#> 2 2     Asian                                    
#> 3 3     Native Hawaiian or Other Pacific Islander
#> 4 4     Black or African American                
#> 5 5     White                                    
#> 6 6     Unknown / Not Reported 

This function should work well in a purrr context.
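
For what it's worth, here is a hedged variant that splits each choice at its first comma only, so a label containing a comma stays in one piece. `parse_choices()` and the `consent` field are hypothetical, written in base R just to illustrate the idea, and a label containing a pipe would still break it.

```r
parse_choices <- function(string) {
  # Split into individual choices at each pipe, tolerating uneven spacing.
  choices <- trimws(strsplit(string, "|", fixed = TRUE)[[1]])

  # Split each choice at its FIRST comma only, so commas inside a label survive.
  data.frame(
    raw   = trimws(sub(",.*$", "", choices)),     # everything before the first comma
    label = trimws(sub("^[^,]*,", "", choices)),  # everything after the first comma
    stringsAsFactors = FALSE
  )
}

# One choice string per field; `consent` is a made-up field whose label has a comma.
choice_strings <- c(
  gender  = "0, Female | 1, Male",
  consent = "1, Yes, in writing | 2, No"
)

# lapply() keeps the names, so the result is a named list with one data frame per field.
parsed <- lapply(choice_strings, parse_choices)
```

`purrr::map(choice_strings, parse_choices)` should give the same named list if you prefer to stay in the tidyverse.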

One improvement I would suggest to your code above: instead of dropping rows where choice is NA, filter on the categorical field types (dropdown, radio, checkbox, +/- yesno, +/- truefalse). That column also holds the formulas for calculated fields, and I'm assuming you don't want those in your result set.
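
A minimal sketch of that filter, assuming the metadata carries a field_type column as REDCap data dictionaries do. The three rows and the calc formula below are invented for illustration. yesno and truefalse are left out of the vector because, as far as I know, they come without a choice string and would need their labels supplied separately.

```r
# Hypothetical slice of a metadata dataset, including one calculated field.
metadata <- data.frame(
  field_name = c("record_id", "gender", "bmi"),
  field_type = c("text", "radio", "calc"),
  select_choices_or_calculations = c(
    NA,
    "0, Female | 1, Male",
    "round(([weight]*10000)/(([height])^(2)), 1)"
  ),
  stringsAsFactors = FALSE
)

# Field types whose choice string holds "code, label" pairs.
categorical_types <- c("dropdown", "radio", "checkbox")

# Keep only the categorical fields; the calc formula is excluded by its type,
# which dropping NA values alone would not achieve.
ds_categorical <- metadata[metadata$field_type %in% categorical_types, ]
```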
