Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider making guess_numerical_mark() work with a vector rather than a file path #13

Open
cjyetman opened this issue Nov 26, 2022 · 6 comments

Comments

@cjyetman
Copy link
Member

cjyetman commented Nov 26, 2022

guess_numerical_mark() currently takes a filepath/s to a CSV and must read in the file and find the appropriate column/variable before achieving its primary function of guessing the numerical marks within a set of character strings that represent numbers. It could instead accept any character vector to process and leave the file read in and column selection to whatever calls it. The potential downside is that the calling function would then need to read in the data in a specific way (as a character vector with no manipulation) and find the column to be processed by guess_numerical_mark(), which may not be something that needs to happen otherwise.

Not 100% convinced this is the right thing to do.

related #7

@jdhoffa
Copy link
Member

jdhoffa commented Nov 26, 2022

One option could be to refactor out a portion that works with a vector, and then wrap it in a function that also does the reading:

  • guess_numerical_mark()
  • read_and_guess_numberical_mark()

or something like that. Have your cake and eat it too.

@cjyetman
Copy link
Member Author

cjyetman commented Feb 7, 2024

Reviewing this, the existing paradigm in this repo is that the functions work on files. That makes me hesitant to change the functionality of guess_numerical_mark() to accept only a vector. That being said, conceptually this paradigm may not have been ideal, and I can see value in having very specific purpose utility functions that do very specific things on very specific inputs, possibly using a very specific naming scheme, and utilizing those utility functions in other higher level functions that combine functionality of the various utility functions.

For the time being, I will let this brew a while before deciding if this would be worth the effort.

edit: For clarification, I fully grasp that this could be achieved as @jdhoffa has suggested, I'm primarily concerned about the naming of the functions and what that would imply should be done to many of the other functions in this repo in order to retain consistency.

@jdhoffa
Copy link
Member

jdhoffa commented Feb 7, 2024

For the record, I am absolutely in favor of careful consideration of the names and purposes of the functions!

The proposed solution was half an idea, but I think it's important to consider these things (especially names lol) as they are often trapdoor decisions that have long-term and potentially unexpected consequences (see the whole r2dii suite for example)

@AlexAxthelm
Copy link
Collaborator

I'm generally a fan of having a call stack along the lines of

foo <- function(filepaths) {
  # some checks on the filepaths
  vapply(
    x = filepaths,
    FUN = file_foo
  )
}

file_foo <- function(single_filepath) {
  stopifnot(length(single_filepath) == 1) # but with more informative message
  contents <- readLines(single_filepath, n = 1L)
  val <- string_foo(contents)
  return(val)
}

string_foo <- function(string) {
  grepl(pattern = "foo", x = string)
}

so in this case, the functions might be named guess_numerical_mark, file_guess_numerical_mark, and string_guess_numerical_mark (suffixing the input type would also work, i.e. guess_numerical_mark_string())

@cjyetman
Copy link
Member Author

cjyetman commented Feb 8, 2024

Yeah, totally agree (retrospectively). My only concern is how much effort it will be to apply that universally and consistently.

@jdhoffa
Copy link
Member

jdhoffa commented Feb 8, 2024

Just some more thoughts on this:

  • I feel like whatever the outcome here, we should always lean towards fewer exported functions a package to limit maintenance burden
  • In that case, I would imagine that the "most useful" exported function would return a data-frame?
  • But internally, I think separating the function in this way would be clear, as you outline here, would make sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants