Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

back to regexes in checkbox_choices() #504

Closed
wibeasley opened this issue Jul 15, 2023 · 1 comment · Fixed by #505
Closed

back to regexes in checkbox_choices() #504

wibeasley opened this issue Jul 15, 2023 · 1 comment · Fixed by #505
Assignees

Comments

@wibeasley
Copy link
Member

@BlairCooper, maybe a hybrid of the two approaches should replace #500. (These attempts are still in dev branches --nothing has been pulled into the main branch yet.)

Maybe we still use a two-stage approach, where sptrsplit() uses pipes to separate the choices/rows. Then use our own regex to separate id from label.

I gave up on readr:read_csv() for this purpose because I couldn't get it working if the label contained commas. I'd have to preprocess the entries to enclose the label in quotes for read_csv() to operate as intended --but we'd need a regex to do that. So we might as well use a single regex to separate them into columns and avoid read_csv(). Tell me if anyone disagrees or sees something I'm overlooking.

@BlairCooper, here's the heart of the function's current version (still on a dev branch). I tried to avoid using base::trimws() and had a working regex (ie, ^\\s*+(.+?),\\s*+(.*?)\\s*$). But I couldn't figure out how to drop empty rows, like your scenario in #502.

You mentioned that this approach is 0.5sec vs 0.3sec. I think I'm ok with that because moving data across the network always dwarfs the computation time on the client machine.

  pattern <- "^(.+?),\\s*+(.*)$"

  select_choices %>%
    strsplit(split = "|", fixed = TRUE) %>%
    magrittr::extract2(1) %>%
    base::trimws() %>%
    tibble::as_tibble() %>% # default column name is `value`
    dplyr::filter(.data$value != "") %>%
    dplyr::transmute(
      id    = sub(pattern, "\\1", .data$value, perl = TRUE),
      label = sub(pattern, "\\2", .data$value, perl = TRUE),
    )

Are there any scenarios this approach can't handle correctly? The only outstanding case is if the label has pipes (#503). In theory the strsplit() could use a regex to distinguish pipes between choices versus pipes within labels. I haven't figured out how though, because the id is so flexible (eg, letter(s) & number(s)).

@BlairCooper or anyone else, any thoughts on performance, accuracy, or maintainability?

@wibeasley wibeasley self-assigned this Jul 15, 2023
wibeasley added a commit that referenced this issue Jul 15, 2023
wibeasley added a commit that referenced this issue Jul 15, 2023
@BlairCooper
Copy link
Contributor

Looks like this works for all the scenarios except pipes in the label. For that to work it seems like you'd need to first split on the pipe, then check if each entry looks like a value,label pair (without any trimming). If it doesn't look like a pair its part of the label for the preceding option so add it back on.

Of course, throw in commas and pipes into a label and there's no hope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants