Output file or directory in rsmpredict #396

Open
desilinguist opened this issue Mar 9, 2020 — with Slack · 3 comments

Member

Currently, rsmpredict supports an undocumented option: if output_file does not have a .csv or .xlsx extension, it is treated as an output directory instead of a file. However, there are several inconsistencies:

  1. This option is not documented, so the docstring is inaccurate.
  2. The output file format is controlled by the file_format setting in the rsmpredict configuration file; any extension on the specified file name is ignored entirely.
  3. The directory behavior is untested as well as undocumented.
  4. The .tsv file format is not included in the check that determines whether the output is a file or a directory.
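To make item 4 concrete, here is a minimal sketch of the kind of extension check described above. It is a hypothetical reconstruction based only on this description, not the actual rsmpredict code; the names `FILE_EXTENSIONS` and `is_treated_as_file` are made up for illustration.

```python
from pathlib import Path

# Hypothetical reconstruction of the check described in this issue;
# this is NOT the actual rsmpredict implementation.
FILE_EXTENSIONS = {".csv", ".xlsx"}  # ".tsv" is absent, as noted in item 4


def is_treated_as_file(output: str) -> bool:
    """Return True if `output` would be read as a file rather than a directory."""
    return Path(output).suffix.lower() in FILE_EXTENSIONS


for candidate in ["predictions.csv", "predictions.xlsx", "predictions.tsv", "results"]:
    kind = "file" if is_treated_as_file(candidate) else "directory"
    print(f"{candidate!r} is treated as a {kind}")
```

Under this sketch, `predictions.tsv` falls through to the directory branch even though .tsv is a valid output format.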

I think a reasonable solution would be to:

  1. Get rid of the directory output option entirely.
  2. Rename the output argument to output_prefix. The file format specified in the configuration file would override any format implied by an extension on the command line, with an appropriate warning generated.
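A rough sketch of how the second point could work, assuming the supported formats are csv, tsv, and xlsx. The function name `resolve_output_path` and the warning text are hypothetical and only illustrate the proposed behavior.

```python
import warnings
from pathlib import Path

# Assumed set of supported formats, based on the extensions mentioned in this issue.
SUPPORTED_FORMATS = {"csv", "tsv", "xlsx"}


def resolve_output_path(output_prefix: str, file_format: str = "csv") -> Path:
    """Build the predictions path from a prefix and the configured file format.

    Hypothetical helper: the configured `file_format` always wins, and a
    warning is issued if the prefix carries a conflicting known extension.
    """
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file format: {file_format}")

    prefix = Path(output_prefix)
    given = prefix.suffix.lower().lstrip(".")
    if given in SUPPORTED_FORMATS:
        if given != file_format:
            warnings.warn(
                f"Ignoring extension '.{given}' in '{output_prefix}'; "
                f"using configured file_format '{file_format}'."
            )
        # Drop the known extension so it is not duplicated below.
        prefix = prefix.with_suffix("")

    return prefix.with_name(f"{prefix.name}.{file_format}")


print(resolve_output_path("out/preds", "tsv"))
print(resolve_output_path("out/preds.csv", "tsv"))  # also emits a warning
```

Both calls resolve to `out/preds.tsv`; only the second one warns, because the extension on the command line disagrees with the configuration.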
@desilinguist desilinguist added the bug label Mar 9, 2020 — with Slack
@desilinguist
Member Author

Actually, now that I think a bit more about it, I think we should strive for consistency. So, here's an alternative I'd prefer:

  • Make rsmpredict also use an output directory.
  • Make --features into a boolean flag so that the pre-processed features are always saved in the given output directory with a fixed name, just like the predictions.

I think this is much simpler than what I had suggested above.

@aloukina
Collaborator

aloukina commented Mar 10, 2020

I see the point about consistency, although I can see myself being very annoyed as a user: if I am running multiple experiments, I will end up with lots of directories, each containing a single file with the same name. I personally prefer to have one directory with many files.
How about we take consistency even further and add a new field, prediction_id, that will be used as a prefix for the predictions file and other output files? (We could also make it optional and set it to experiment_id by default.) Then, if we also add the -f option already available in other tools, I'll be able to continue doing what I want and we'll have a consistent approach across the tools.
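As a sketch of what this naming scheme might look like: the helper below and the `_predictions` / `_features` file name suffixes are hypothetical, chosen only to illustrate the idea of a prediction_id prefix that falls back to experiment_id.

```python
from pathlib import Path
from typing import Dict, Optional


def prediction_paths(
    output_dir: str,
    experiment_id: str,
    prediction_id: Optional[str] = None,
    file_format: str = "csv",
    save_features: bool = False,
) -> Dict[str, Path]:
    """Return output paths that share one directory and a per-run prefix.

    Hypothetical helper: `prediction_id` defaults to `experiment_id`, and the
    file name suffixes are illustrative, not an agreed-upon convention.
    """
    prefix = prediction_id if prediction_id is not None else experiment_id
    base = Path(output_dir)
    paths = {"predictions": base / f"{prefix}_predictions.{file_format}"}
    if save_features:
        # Mirrors the idea of --features as a boolean flag with a fixed name.
        paths["features"] = base / f"{prefix}_features.{file_format}"
    return paths


# Two runs writing into the same directory without clobbering each other.
print(prediction_paths("out", "exp1"))
print(prediction_paths("out", "exp1", prediction_id="run2", save_features=True))
```

With distinct prediction_id values, several runs can share one output directory, which addresses the "many directories with one file each" concern.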

@desilinguist
Member Author

Hmm, yes I can see how that can be quite annoying. I like your suggestion! 👍
