Improve Handling of `cpp_options` #1022

katrinabrock · 2024-08-08T11:33:45Z

Addresses several issues with the opencl tests:

Model object does not correctly identify whether model was compiled with stan_opencl. Therefore, models were not recompiled within the tests when recompilation is needed. Resolved this by supplying a different exe_file for each tests so recompilation will always happen.
Tests referenced a metadata attribute that was not defined. I've added checks for opencl related attributes that are set when opencl is functioning properly.
Version test used a data input that was never created.

Submission Checklist

Run unit tests
Declare copyright holder and agree to license (see below)

Summary

Please describe the purpose of the pull request.

Copyright and Licensing

Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Max Planck Institute of Animal Behavior

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

SteveBronder · 2024-08-08T20:19:18Z

It sounds like the actual issue here is that cmdstanr does not correctly identify whether model was compiled with stan_opencl, should we be changing that inside of the model class instead of changing this test?

jgabry · 2024-08-14T17:13:15Z

@SteveBronder did you see this draft PR that's related to this issue? #1023 Perhaps these should be combined?

SteveBronder · 2024-08-15T14:37:24Z

Yes I think it would be good to merge these two PRs together into one

katrinabrock · 2024-08-20T11:58:19Z

Sounds good, I'll polish that one up and combine them.

Any idea if the opencl tests are actually running in CI (as in is CMDSTANR_OPENCL_TESTS set? If not, why not?

Also trying to get hold of a windows machine so I can repo the windows unit test failures. If there's any way to run these on ubuntu with docker or similar, would love to hear.

katrinabrock · 2024-08-20T12:36:55Z

guessing the test failures are due to this: r-lib/actions#217

katrinabrock · 2024-08-20T16:02:37Z

Sorry for the multiple pings in a day....but I found another related deepish problem.

This test seems to be erroneously passing in the master branch: https://github.com/stan-dev/cmdstanr/blob/master/tests/testthat/test-threads.R#L162-L166 As in, both STAN_THREADS is not successfully getting set to false (due to this issue: stan-dev/cmdstan#1293), and mod$sample is not successfully checking whether or not the binary has that flag set....so the test is passing even though it shouldn't be.

MRE (from master branch):

> mod <- cmdstan_model(
+   file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan"),
+   cpp_options = list(stan_threads = FALSE),
+   force_recompile = TRUE
+ )
Compiling Stan program...
> mod$cpp_options()$STAN_THREADS
[1] TRUE

The fix is that somewhere we need to convert falsy values at the R level to completely unset the ENV variable at the bash/cpp level. I'm not sure exactly where in the code it makes the most sense to do this, but I'll take a stab at it. Alternatively, we could inform users that they need to set cpp_options = list(stan_threads = '') or cpp_options = list(stan_threads = NULL).

andrjohns · 2024-08-20T16:46:06Z

The fix is that somewhere we need to convert falsy values at the R level to completely unset the ENV variable at the bash/cpp level

No we shouldn't do anything here, since this is behaviour in Cmdstan - not cmdstanr. The options a user specifies should match those that are then set in make/local, otherwise a user would get different behaviour from the same options depending on the interface

katrinabrock · 2024-08-21T05:57:28Z

ok, I'll update the test and the docs then. That's much easier. :-)

katrinabrock · 2024-08-21T14:00:04Z

Sorry for the wall of text, but I need some input before moving forward.

~~I believe this PR working.~~ [EDIT: Nope, there are still test failures, but that doesn't change the rest of this comment.]

It accomplishes:

opencl test fixes described in original description
fixes STAN_THREADS in make/local not respected due to capitalisation conflict #765 (makes cmdrstan check compile args accurately)
makes behavior, warnings, and docs more consistent with cmdstan's interpretation of flags

Does not solve

No automatic recompile when provided cpp_options don't match binary

Looking for input on two topics:

1. What should the behavior be when a model object is from an already compiled model that was compiled with different args than provided in the `cmdstan_model` function?

Some Options:

Silently do nothing at creation, complain or fail at sampling (current behavior)
Warn at creation, keep current behavior at sampling
Automatically recompile at creation

More context: To me, auto-recompile makes the most sense from a user experience perspective. However, this is not the current behavior of cmdstan itself. Currently, cmdstan considers a model up to date even if build args have changed. Not sure if this is a bug or expected behavior that is not explicitly documented. Most relevant doc I could find.

2. How should we standardize storage and checking of cpp_options/build args/compile info? Relatedly, what should be the output of `mod$cpp_options()`?

There are two aspects at issue:

should the names be uppercase (as passed to cmdstan) or lowercase (consistent with R style)?
when the flag is not set, should the value be NULL/empty/absent (as passed to cmdstan) or FALSE/false (as retrieved from cmdstan model info)?

Some Options:

The status-quo is definitely bad. It is inconsistent on whether the names (keys) are uppercase or lowercase, leading to inaccurate check results.
The version I have in this PR currently uses lowercase (per this comment: STAN_THREADS in make/local not respected due to capitalisation conflict #765 (comment)) and FALSE (consistent with the current output of model_compile_info. Note: this changes the user-facing output of user-facing mod$cpp_options().
Standardizing on uppercase would change the user-facing output of mod$cpp_options() less, (but still some in certain cases).
We could set both uppercase and lowercase for maximum backward compatibility.
An imo cleaner, but bigger change would be to consider cpp_options and model_compile_info as more separate, but related concepts. cpp_options being the R/pre-compile version, model_compile_info being the post-compile version. We could compare whether they are consistent with one another, but not explicitly assign one to the other as done currently here:

cmdstanr/R/model.R

Line 275 in 8b72eac

cpp_options <- model_compile_info(self$exe_file())

. This proposal would mean adding a mod$compile_info() method based on the binary associated with the model, and leave in place mod$cpp_options() based on the arguments the user provides. Then we could complain, error out, recompile, or what have you at various points if they're inconsistent with one another.

katrinabrock · 2024-08-22T14:57:25Z

One more wrinkle...prior to cmdstan v2.27.0, this "model info" does not exist. In that case, if the binary is already compiled, we have know way of knowing what flags were used.

For that matter...I think we are assuming throughout that the version of cmdstan the user currently has configured is the same as the version the binaries were compiled with. Currently, from what I can tell, we are not checking this assumption at all. We could check using model info when available, but before 2.27, all bets are off.

I'll have to think a bit about a clean way to resolve this. I'd love to hear if others have ideas.

jgabry · 2024-08-22T19:56:48Z

@andrjohns I'm a bit swamped with work at the moment and haven't had a chance to look into this yet. Do you by chance have time to comment on this?

katrinabrock · 2024-09-12T07:12:21Z

@jgabry @andrjohns I have a version I'm pretty happy with now. It stores what the user requested as cpp options and what the binary has configured separately, and (unless compile=FALSE) re compiles if there is a mismatch. I ran the CI on my fork and only the windows tests are failing. If someone on the team is able to give some feedback, especially on the change in functionality, that would be great! I'd love to get this in mergeable state. In any case, I will just use this version for my own purposes.

bob-carpenter · 2024-09-26T15:17:25Z

Hi, @andrjohns, @SteveBronder, and @jgabry: @katrinabrock showed up at the Stan meeting this week. She's working on this PR. She says she can fix the Windows issue, but before doing that, needs feedback on whether her changes are targeting what you think the behavior should be.

@katrinabrock: If you could remind everyone which specific things you need answered, that'd help.

@mitzimorris is going to take a first pass at working on requirements with @katrinabrock, but then there will probably be some questions on the final R integration.

katrinabrock · 2024-09-27T10:56:11Z

@mitzimorris @SteveBronder Here's my explanation of this change as promised:

Initial Symptom

In the status quo, when a model object is creating with an existing exe_file, the build args for this binary are essentially ignored (even though they are collected with the info command). In this case, when sample (or similar) is run, it does arg checking with inaccurate information about whether STAN_OPENCL and STAN_THREADS are set. This results in both false-positive and false-negative erroring and warning. (Error when everything is fine and no error when something is wrong.) Simple example here: katrinabrock@7697378#diff-f9bf409eca909f7147a92f3deaf3fdeb15bc4590ef2e6ad46caf9bfedf863160R127 this testcase fails in the status quo.

My Changes

User-facing Changes in `initialize`and `compile` logic

If an exe_file exists when initialize is called and a recompile does not occur, cmdstanr uses the info command to get as much info as possible about the version and cpp options used to create this binary. When sample and other methods are subsequently run, it uses this info to determine if the args provided to sample (or sister method) are compatible with cpp options that were provided at compile time. (This is the main thing that was not working before.)
If an exe_file exists when initialize is called and the user provided cpp_options that do not match the output of info, a recompile is triggered. (Unless compile=FALSE, in that case, there is just a warning).
If the user provides cpp_options to initialize or compile, they are "remembered" for subsequent compilation/recompilations. For example, in my version cmdstan_model(..., cpp_options=SOMETHING) and cmdstan_model(..., cpp_options=SOMETHING, compile=FALSE); mod$compile() have the same outcome. In master they will have different outcomes: SOMETHING would be used in the first snippet, and ignored in the second.

Other User-facing Changes

For error and warning logic that involves checking cmdstan version. cmdstanr no longer assumes that the version it has currently configured is the same as the version the model was compiled with. Instead, it uses the output of info if available. If info is unavailable, it uses the version currently configured, but in that case only generates warnings, not errors.
- This change is only applied to cpp_option related functions.
mod$cpp_options() give a deprecation warning as it was a mixture between user provided cpp options and the output of info. I attempted to leave the behavior of this function unchanged. Did not thoroughly test the unchanged-ness.
mod$exe_info() gives the data from the info command as an R object.
- With update=TRUE the command is re-run before returning. The user should only need to explicitly add update=TRUE if they re-compiled outside of that model object (e.g. by running cmdstan directly).
mod$precompiled_options() gives the options that will be used if compile() method is run without explicitly passing cpp_options. If the user has provided cpp_options in past runs of initialize or compile, it will be those. names are standardized to lowercase. Values are unchanged.
If the user provides FALSE on an option we know is a flag, warn them that this will result in the flag being set. (e.g. list(opencl=FALSE) turns opencl ON).

Developer-facing changes to CmdStanModel

Previously private$precompile_cpp_options_ only held the user-specified options until a R-triggered compile takes place, then it would be set to null. Now, we keep this information around whether or these options have been applied to the binary.
Previously private$exe_file_ was only set if the exe_file exists (either passed by the user or in an R-triggered compile). Now, if the exe_file does not exist, this variable will still be set to the path that we think it will be written to.
private$exe_info_ new attribute separate from cpp_options_ and precompile_cpp_options_. Holds info gleaned from cmdstan's post-compile info command.

Test related changes

added with_mock_cli, expect_mock_compile, and expect_no_mock_compile that allow developers to test the cmdstanr recompile logic that with developer-defined outputs of cmdstan. This both enables a cleaner separation between cmdstan logic and cmdstanr logic and also allows tests of recompile logic to run much faster.
since logic around stan features now uses model-specific stan version where possible, fake_cmdstan_version can now update this model-specific version as well as the package-level version
For a bunch of tests that are not intended to test recompile logic, I added recompile=TRUE. These tests are intended to test something that happens after compilatio. Sometimes (even with my changes) a binary is leftover from a previous test and a recompile was not properly triggered. This resulted in a mismatch between the binary and the model object.

SteveBronder · 2024-09-27T21:03:16Z

Thank you! These changes totally make sense now and handle a lot of the issues we have from dealing with a make build system. Monday I will give this a harder read through and have some Qs

andrjohns · 2024-09-30T15:13:37Z

Apologies for the extended delay @katrinabrock! I'm catching back up on Stan dev this week, and have better access to an OpenCL system now. I'll put aside some time to review on Wednesday

SteveBronder · 2024-10-16T21:38:08Z

@katrinabrock hi my apologies for the delay. Looking at the tests it looks like the new ones for the compile options are failing on windows?

katrinabrock · 2024-10-17T10:34:17Z

Hi @SteveBronder,

Thanks for taking a look. That's correct. As mentioned in @bob-carpenter's comment above, right now, I'm not looking for approval of the code. I'd like to make sure that the functionality I'm trying to implement is the functionality desired by the core dev team.

Does everything I've listed in the comment above from three weeks ago seem like the right set of changes?

If so, I'll go ahead and diagnosis and resolve the windows issue. If you would like me to make a different set of logic changes, I'd like to settle on what the logic should be and implement that before worrying about the cross platform aspect.

bob-carpenter · 2024-10-21T14:38:59Z

Thanks, @katrinabrock, that sounds totally reasonable.

jgabry · 2024-11-08T18:34:40Z

@SteveBronder when you have a chance do you mind taking another look, based on @katrinabrock's most recent comment. I think you're much more familiar with all these CmdStan C++ options than I am. It sounded like @katrinabrock was looking for confirmation that the outlined approach in #1022 (comment) is reasonable. It seems reasonable to me but it would be good to get either you (Steve) or @andrjohns to confirm too.

katrinabrock · 2024-11-10T10:56:56Z

I think @mitzimorris could also look over the logic and see if there are any concerns since we would likely want to have cmdstanpy and cmdstanr in sync.

- Test new logic with mocked CLI

Co-authored-by: Steve Bronder <[email protected]>

katrinabrock · 2024-12-18T15:42:26Z

@SteveBronder @jgabry I think I've addressed everything from @SteveBronder's last review. Based on his last comment, I think that review was partial and he wanted to look at the tests a bit more.

Based on the comments thus far, seems like the overall approach is sound so it will be worthwhile to fix the windows issue. I haven't done that yet, but I plan to.

andrjohns

Thanks for all of this work @katrinabrock! Given the amount of review that's already happened and how long it's taken me to look at it, I'm not going to hold this up with any substantive requests.

Just a minor style change, can you update all of the single-line if statements to be split across multiple lines? For example:

 if (!file.exists(private$exe_file_)) return(NULL)

to:

 if (!file.exists(private$exe_file_)) {
   return(NULL)
 }

I'll work through any windows/WSL issues now

bob-carpenter · 2024-12-27T18:13:31Z

@andrjohns: Is there an automatic formatter for R we can configure to do this automatically? We got tired of having our PRs dominated by formatting, so we now use black for Python and ClangFormat for C++.

katrinabrock · 2024-12-30T09:56:03Z

Looks like the unit tests are failing now. Even on linux 🤔 .

Important part

@andrjohns I can probably solve the windows issue myself. I'd rather you spend your time looking at the functionality and implementation. I'm a bit concerned that I haven't really gotten substantive critiques yet. Steve's comments were mostly clarifications, formatting, and small issues. (Maybe because the substance is already perfect? 🤷‍♀️ 😁 😈 ) Those kinds of comments are valuable, I'd just like to make sure that some folks on the core dev team understand the changes deeply enough that they'll be patched instead of reverted if some small bug reports come in after rollout. I'm certainly willing to help, but I don't want to be the only one who understands this now 1.1K line piece of the codebase.

btw - I'm still open to re-breaking this up into a bunch of smaller PRs.

Less important part

@bob-carpenter Yes, lintr package can search for issues and styler package will fix them for you.
@andrjohns Here is the .lintr file that I used to check my diff. (I fixed manually because I only wanted to touch the lines I was already changing.)

linters: linters_with_defaults(
    line_length_linter(85),
    object_usage_linter = NULL
  )

Currently, afaik there are no default linters that even allow you to set a requirement disallowing single line if. I think if this is something you'd like enforced in your project, you should build a custom linter for it. I can do a Q&D version of it for this PR.

katrinabrock mentioned this pull request Aug 8, 2024

STAN_THREADS in make/local not respected due to capitalisation conflict #765

Open

katrinabrock changed the title ~~Fix opencl~~ Fix opencl tests Aug 8, 2024

This was referenced Aug 21, 2024

Make compile(..., cpp_options()) consistent with makefile options #1019

Open

Issue 765 #1023

Closed

katrinabrock force-pushed the fix-opencl-tests branch from 96cce2c to 0eb126e Compare August 21, 2024 12:52

katrinabrock changed the title ~~Fix opencl tests~~ Improve Handling of cpp_options Aug 21, 2024

katrinabrock marked this pull request as draft August 21, 2024 12:54

katrinabrock mentioned this pull request Aug 21, 2024

Expected? cmdstan considers binary up-to-date when make is run with different build args stan-dev/cmdstan#1294

Open

bob-carpenter marked this pull request as ready for review September 26, 2024 15:13

katrinabrock and others added 22 commits December 18, 2024 10:36

Re-write cpp arg handling code

0491e9d

- Test new logic with mocked CLI

fixup remove browser call

1f316d5

fixup: with changes, make opencl tests pass

964c7f6

WIP

d7e7da6

wip

81cbdd0

WIP

c2fdcc5

Tests Passing?

55b13f9

squash some bugs

c4f36cb

...how about now?

3e7926d

fixup

c1382c5

fixup

252a4e7

fixup

496359e

defaults list()-> NULL

9885a06

remove/rename exe_info variable

e589db6

Update R/model.R

93401eb

Co-authored-by: Steve Bronder <[email protected]>

Update R/model.R

d69675f

Co-authored-by: Steve Bronder <[email protected]>

add tests of user_header prescidence

a327449

fix typo

e2a2e5f

add verbose arg to info call

454a351

Add commend detailing mock logic

40a0e01

move exe file path resolution to separate function

9e071fe

remove files removed in master

f993531

katrinabrock force-pushed the fix-opencl-tests branch from 5fb5f9b to f993531 Compare December 18, 2024 14:22

lint

66aa1b4

katrinabrock requested a review from SteveBronder December 18, 2024 15:41

fix incomplete method renaming

93bd277

andrjohns reviewed Dec 27, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Handling of `cpp_options` #1022

Improve Handling of `cpp_options` #1022

katrinabrock commented Aug 8, 2024

SteveBronder commented Aug 8, 2024

jgabry commented Aug 14, 2024 •

edited

Loading

SteveBronder commented Aug 15, 2024

katrinabrock commented Aug 20, 2024

katrinabrock commented Aug 20, 2024

katrinabrock commented Aug 20, 2024

andrjohns commented Aug 20, 2024

katrinabrock commented Aug 21, 2024

katrinabrock commented Aug 21, 2024 •

edited

Loading

katrinabrock commented Aug 22, 2024

jgabry commented Aug 22, 2024

katrinabrock commented Sep 12, 2024

bob-carpenter commented Sep 26, 2024 •

edited

Loading

katrinabrock commented Sep 27, 2024 •

edited

Loading

SteveBronder commented Sep 27, 2024

andrjohns commented Sep 30, 2024

SteveBronder commented Oct 16, 2024 •

edited

Loading

katrinabrock commented Oct 17, 2024

bob-carpenter commented Oct 21, 2024

jgabry commented Nov 8, 2024

katrinabrock commented Nov 10, 2024

katrinabrock commented Dec 18, 2024

andrjohns left a comment

bob-carpenter commented Dec 27, 2024

katrinabrock commented Dec 30, 2024

Improve Handling of cpp_options #1022

Are you sure you want to change the base?

Improve Handling of cpp_options #1022

Conversation

katrinabrock commented Aug 8, 2024

Submission Checklist

Summary

Copyright and Licensing

SteveBronder commented Aug 8, 2024

jgabry commented Aug 14, 2024 • edited Loading

SteveBronder commented Aug 15, 2024

katrinabrock commented Aug 20, 2024

katrinabrock commented Aug 20, 2024

katrinabrock commented Aug 20, 2024

andrjohns commented Aug 20, 2024

katrinabrock commented Aug 21, 2024

katrinabrock commented Aug 21, 2024 • edited Loading

1. What should the behavior be when a model object is from an already compiled model that was compiled with different args than provided in the cmdstan_model function?

2. How should we standardize storage and checking of cpp_options/build args/compile info? Relatedly, what should be the output of mod$cpp_options()?

katrinabrock commented Aug 22, 2024

jgabry commented Aug 22, 2024

katrinabrock commented Sep 12, 2024

bob-carpenter commented Sep 26, 2024 • edited Loading

katrinabrock commented Sep 27, 2024 • edited Loading

Initial Symptom

My Changes

User-facing Changes in initializeand compile logic

Other User-facing Changes

Developer-facing changes to CmdStanModel

Test related changes

SteveBronder commented Sep 27, 2024

andrjohns commented Sep 30, 2024

SteveBronder commented Oct 16, 2024 • edited Loading

katrinabrock commented Oct 17, 2024

bob-carpenter commented Oct 21, 2024

jgabry commented Nov 8, 2024

katrinabrock commented Nov 10, 2024

katrinabrock commented Dec 18, 2024

andrjohns left a comment

Choose a reason for hiding this comment

bob-carpenter commented Dec 27, 2024

katrinabrock commented Dec 30, 2024

Important part

Less important part

Improve Handling of `cpp_options` #1022

Improve Handling of `cpp_options` #1022

jgabry commented Aug 14, 2024 •

edited

Loading

katrinabrock commented Aug 21, 2024 •

edited

Loading

1. What should the behavior be when a model object is from an already compiled model that was compiled with different args than provided in the `cmdstan_model` function?

2. How should we standardize storage and checking of cpp_options/build args/compile info? Relatedly, what should be the output of `mod$cpp_options()`?

bob-carpenter commented Sep 26, 2024 •

edited

Loading

katrinabrock commented Sep 27, 2024 •

edited

Loading

User-facing Changes in `initialize`and `compile` logic

SteveBronder commented Oct 16, 2024 •

edited

Loading