consider making `guess_numerical_mark()` work with a vector rather than a file path #13

cjyetman · 2022-11-26T13:01:57Z

guess_numerical_mark() currently takes a filepath/s to a CSV and must read in the file and find the appropriate column/variable before achieving its primary function of guessing the numerical marks within a set of character strings that represent numbers. It could instead accept any character vector to process and leave the file read in and column selection to whatever calls it. The potential downside is that the calling function would then need to read in the data in a specific way (as a character vector with no manipulation) and find the column to be processed by guess_numerical_mark(), which may not be something that needs to happen otherwise.

Not 100% convinced this is the right thing to do.

related #7

The text was updated successfully, but these errors were encountered:

jdhoffa · 2022-11-26T14:43:11Z

One option could be to refactor out a portion that works with a vector, and then wrap it in a function that also does the reading:

guess_numerical_mark()
read_and_guess_numberical_mark()

or something like that. Have your cake and eat it too.

cjyetman · 2024-02-07T11:57:39Z

Reviewing this, the existing paradigm in this repo is that the functions work on files. That makes me hesitant to change the functionality of guess_numerical_mark() to accept only a vector. That being said, conceptually this paradigm may not have been ideal, and I can see value in having very specific purpose utility functions that do very specific things on very specific inputs, possibly using a very specific naming scheme, and utilizing those utility functions in other higher level functions that combine functionality of the various utility functions.

For the time being, I will let this brew a while before deciding if this would be worth the effort.

edit: For clarification, I fully grasp that this could be achieved as @jdhoffa has suggested, I'm primarily concerned about the naming of the functions and what that would imply should be done to many of the other functions in this repo in order to retain consistency.

jdhoffa · 2024-02-07T12:06:43Z

For the record, I am absolutely in favor of careful consideration of the names and purposes of the functions!

The proposed solution was half an idea, but I think it's important to consider these things (especially names lol) as they are often trapdoor decisions that have long-term and potentially unexpected consequences (see the whole r2dii suite for example)

AlexAxthelm · 2024-02-08T11:29:02Z

I'm generally a fan of having a call stack along the lines of

foo <- function(filepaths) {
  # some checks on the filepaths
  vapply(
    x = filepaths,
    FUN = file_foo
  )
}

file_foo <- function(single_filepath) {
  stopifnot(length(single_filepath) == 1) # but with more informative message
  contents <- readLines(single_filepath, n = 1L)
  val <- string_foo(contents)
  return(val)
}

string_foo <- function(string) {
  grepl(pattern = "foo", x = string)
}

so in this case, the functions might be named guess_numerical_mark, file_guess_numerical_mark, and string_guess_numerical_mark (suffixing the input type would also work, i.e. guess_numerical_mark_string())

cjyetman · 2024-02-08T11:53:30Z

Yeah, totally agree (retrospectively). My only concern is how much effort it will be to apply that universally and consistently.

jdhoffa · 2024-02-08T15:41:36Z

Just some more thoughts on this:

I feel like whatever the outcome here, we should always lean towards fewer exported functions a package to limit maintenance burden
In that case, I would imagine that the "most useful" exported function would return a data-frame?
But internally, I think separating the function in this way would be clear, as you outline here, would make sense

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consider making `guess_numerical_mark()` work with a vector rather than a file path #13

consider making `guess_numerical_mark()` work with a vector rather than a file path #13

cjyetman commented Nov 26, 2022 •

edited

Loading

jdhoffa commented Nov 26, 2022

cjyetman commented Feb 7, 2024 •

edited

Loading

jdhoffa commented Feb 7, 2024

AlexAxthelm commented Feb 8, 2024

cjyetman commented Feb 8, 2024

jdhoffa commented Feb 8, 2024

consider making guess_numerical_mark() work with a vector rather than a file path #13

consider making guess_numerical_mark() work with a vector rather than a file path #13

Comments

cjyetman commented Nov 26, 2022 • edited Loading

jdhoffa commented Nov 26, 2022

cjyetman commented Feb 7, 2024 • edited Loading

jdhoffa commented Feb 7, 2024

AlexAxthelm commented Feb 8, 2024

cjyetman commented Feb 8, 2024

jdhoffa commented Feb 8, 2024

consider making `guess_numerical_mark()` work with a vector rather than a file path #13

consider making `guess_numerical_mark()` work with a vector rather than a file path #13

cjyetman commented Nov 26, 2022 •

edited

Loading

cjyetman commented Feb 7, 2024 •

edited

Loading