back to regexes in `checkbox_choices()` #504

wibeasley · 2023-07-15T20:00:11Z

@BlairCooper, maybe a hybrid of the two approaches should replace #500. (These attempts are still in dev branches --nothing has been pulled into the main branch yet.)

Maybe we still use a two-stage approach, where strsplit() splits on pipes to separate the choices/rows. Then we use our own regex to separate the id from the label.

I gave up on readr::read_csv() for this purpose because I couldn't get it working when the label contained commas. I'd have to preprocess the entries to enclose the label in quotes for read_csv() to operate as intended --but we'd need a regex to do that. So we might as well use a single regex to separate them into columns and avoid read_csv(). Tell me if anyone disagrees or sees something I'm overlooking.
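A rough illustration of the comma problem, using base `utils::read.csv()` as a stand-in (an assumption on my part; `readr::read_csv()` may differ in its details). The sample entry is made up. Quoting the label fixes the parse, but the quoting step already needs the same regex that could do the split by itself:

```r
# Hypothetical entry whose label contains a comma.
entry <- "2, Anxiety, generalized"

# Parsed naively, the label is broken into two fields.
naive <- utils::read.csv(
  text = entry, header = FALSE, strip.white = TRUE, stringsAsFactors = FALSE
)

# Quoting the label first requires a regex that already finds id and label.
pattern <- "^(.+?),\\s*+(.*)$"
quoted_entry <- sub(pattern, "\\1,\"\\2\"", entry, perl = TRUE)
quoted <- utils::read.csv(
  text = quoted_entry, header = FALSE, strip.white = TRUE, stringsAsFactors = FALSE
)

ncol(naive)   # three fields: the label was split at its comma
ncol(quoted)  # two fields: id and label
```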

@BlairCooper, here's the heart of the function's current version (still on a dev branch). I tried to avoid using base::trimws() and had a working regex (ie, ^\\s*+(.+?),\\s*+(.*?)\\s*$). But I couldn't figure out how to drop empty rows, like your scenario in #502.
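For reference, a small sketch of how that trimming regex behaves on made-up entries. `sub()` returns its input unchanged when the pattern doesn't match, so a blank entry passes straight through, which is the empty-row problem. One possible workaround (an untested idea, not what the dev branch does) is to keep only the entries that `grepl()` says match:

```r
# The trimming regex quoted above; possessive quantifiers need perl = TRUE.
pattern <- "^\\s*+(.+?),\\s*+(.*?)\\s*$"

# Hypothetical entries as they might look right after splitting on pipes.
entries <- c("1, Depression ", "  2, Anxiety, generalized  ", " ")

id    <- sub(pattern, "\\1", entries, perl = TRUE)
label <- sub(pattern, "\\2", entries, perl = TRUE)

# The blank entry doesn't match, so sub() leaves it as-is in both columns.
matched <- grepl(pattern, entries, perl = TRUE)

# Possible workaround: keep only the entries that matched.
choices <- data.frame(
  id    = id[matched],
  label = label[matched],
  stringsAsFactors = FALSE
)
```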

You mentioned that this approach is 0.5sec vs 0.3sec. I think I'm ok with that because moving data across the network always dwarfs the computation time on the client machine.

  pattern <- "^(.+?),\\s*+(.*)$"

  select_choices %>%
    strsplit(split = "|", fixed = TRUE) %>%
    magrittr::extract2(1) %>%
    base::trimws() %>%
    tibble::as_tibble() %>% # default column name is `value`
    dplyr::filter(.data$value != "") %>%
    dplyr::transmute(
      id    = sub(pattern, "\\1", .data$value, perl = TRUE),
      label = sub(pattern, "\\2", .data$value, perl = TRUE),
    )
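To make the behavior concrete, here are the same steps written with base R only and run on a made-up choices string (the input is hypothetical, and the data.frame stands in for the tibble). It includes a label with a comma, an empty entry between two pipes, and a trailing newline:

```r
# Hypothetical input in the "id, label | id, label" layout.
select_choices <- "1, Depression | 2, Anxiety, generalized | | 3a, Other\n"

pattern <- "^(.+?),\\s*+(.*)$"

# Split into entries, trim whitespace, and drop the empty ones.
entries <- strsplit(select_choices, split = "|", fixed = TRUE)[[1]]
entries <- trimws(entries)
entries <- entries[entries != ""]

# The lazy first group stops at the first comma, so later commas stay in the label.
choices <- data.frame(
  id    = sub(pattern, "\\1", entries, perl = TRUE),
  label = sub(pattern, "\\2", entries, perl = TRUE),
  stringsAsFactors = FALSE
)

print(choices)
```

The result has three rows, with ids `1`, `2`, and `3a`; the second label keeps its comma.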

Are there any scenarios this approach can't handle correctly? The only outstanding case is if the label has pipes (#503). In theory the strsplit() could use a regex to distinguish pipes between choices versus pipes within labels. I haven't figured out how though, because the id is so flexible (eg, letter(s) & number(s)).

@BlairCooper or anyone else, any thoughts on performance, accuracy, or maintainability?


BlairCooper · 2023-07-15T20:43:12Z

Looks like this works for all the scenarios except pipes in the label. For that to work, it seems like you'd need to first split on the pipe, then check whether each entry looks like a value,label pair (without any trimming). If it doesn't look like a pair, it's part of the label for the preceding option, so add it back on.
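A minimal sketch of that split-then-rejoin idea, not a tested implementation. The function name `merge_pipe_fragments()` is hypothetical, and it assumes an id contains only letters, digits, and underscores, which may be narrower than what REDCap allows. Whitespace-only fragments are dropped rather than rejoined, so an empty entry doesn't leave a dangling pipe on the previous label:

```r
# Hypothetical helper: split on pipes, then glue back any fragment that
# doesn't start with something that looks like "id,".
merge_pipe_fragments <- function(select_choices) {
  fragments <- strsplit(select_choices, split = "|", fixed = TRUE)[[1]]

  # Drop whitespace-only fragments (eg, from a trailing newline).
  fragments <- fragments[!grepl("^\\s*$", fragments, perl = TRUE)]

  # Assumption: an id is letters, digits, or underscores, followed by a comma.
  looks_like_pair <- grepl("^\\s*[A-Za-z0-9_]+\\s*,", fragments, perl = TRUE)

  merged <- character(0)
  for (i in seq_along(fragments)) {
    if (looks_like_pair[i] || length(merged) == 0L) {
      merged <- c(merged, fragments[i])
    } else {
      # Not a pair, so it belongs to the preceding label; restore the pipe.
      last <- length(merged)
      merged[last] <- paste0(merged[last], "|", fragments[i])
    }
  }
  merged
}

trimws(merge_pipe_fragments("1, Yes | 2, Either A|B | 3, No"))
```

The rejoined entries can then go through the same id/label regex as before. A label fragment that itself starts with something like `4, ` would still be mistaken for a new choice.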

Of course, throw both commas and pipes into a label and there's no hope.


wibeasley self-assigned this Jul 15, 2023

wibeasley added a commit that referenced this issue Jul 15, 2023

new test: labels contain commas

86fe7a2

ref #504

wibeasley added a commit that referenced this issue Jul 15, 2023

back to regexes

16fbe86

ref #504

wibeasley mentioned this issue Jul 15, 2023

use readr::read_csv() in checkbox_choices() #500

Closed

wibeasley added a commit that referenced this issue Jul 15, 2023

incorporate another test scenario from @BlairCooper

5f75564

ref #504 @BlairCooper wrote https://github.com/OuhscBbmc/REDCapR/pull/502/files

This was referenced Jul 15, 2023

Handle trailing newline in REDCap checkbox options #502

Closed

Trailing newline in Checkbox option causes choice to omitted #501

Closed

Checkbox choices 2 #505

Merged

wibeasley closed this as completed in #505 Jul 15, 2023
