Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes for cran 226 #1087

Merged
merged 3 commits into from
Sep 7, 2023
Merged

Changes for cran 226 #1087

merged 3 commits into from
Sep 7, 2023

Conversation

bgoodri
Copy link
Contributor

@bgoodri bgoodri commented Sep 7, 2023

Summary:

These are Andrew's changes, of which the substantive ones are already in develop I think but this branched off an earlier point.

Intended Effect:

Get RStan 2.26 CRAN-ready

How to Verify:

Side Effects:

Documentation:

Reviewer Suggestions:

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@andrjohns
Copy link
Contributor

@bgoodri after the StanHeaders_2.26 branch was merged it looks like there were a couple of merges to develop that changed the stanc.js file in some way. After reverting those changes, all now passes

@bgoodri bgoodri merged commit 5ef1bc3 into develop Sep 7, 2023
@WardBrian
Copy link
Member

@andrjohns would those changes have been from the lexer-patch branch?

@andrjohns
Copy link
Contributor

@andrjohns would those changes have been from the lexer-patch branch?

No, the stanc.js in the Stan headers branch was already from that lexer-patch branch. This was just some git merging gone awry

@bgoodri
Copy link
Contributor Author

bgoodri commented Sep 8, 2023

The javascript parser is taking several minutes to fail (without a helpful error message) on some complicated rstanarm models, which I don't remember happening before. Are we sure we got the right deconflicted stanc.js? This is true even if I first canonicalize them with CmdStan 2.33, such as
jm.txt In other words, if you build rstan from the develop branch

rstan::stanc("jm.txt")

is not working.

@andrjohns
Copy link
Contributor

andrjohns commented Sep 8, 2023

@bgoodri I tested locally and with a github actions run: https://github.com/stan-dev/rstan/actions/runs/6117301408

And I can't replicate the failure.

The parser can be a bit slow with rstan 2.26, but is improved significantly with the 2.31 parser (several optimisations had been made). But if you're not sure then just revert to using V8, it's not essential enough to be delaying submission.

Additionally I saw that you moved the stanc.js back from StanHeaders to rstan. We have to keep the JS in StanHeaders otherwise the c++ generated by stanc in rstan won't always be compatible with the headers in StanHeaders during updates. This would cause the rstan 2.26 and StanHeaders 2.32 combination to break.

The original PR is here: #1047 and the relevant issue here: #1046

@bgoodri
Copy link
Contributor Author

bgoodri commented Sep 8, 2023

Does fe8e55a look right? When I first try V8, then the Stan programs in rstanarm parse.

Slow is tolerable but less so failing on Stan program that the C++ parser parsed. But the RStudio editor tries to parse after every keystroke, so slow is not ideal. @WardBrian Can a javascript parser from 2.32 be created that has a quicker option to just parse and return a status code or error message that does not go on to generate C++ or anything that would waste milliseconds?

@andrjohns
Copy link
Contributor

Does fe8e55a look right? When I first try V8, then the Stan programs in rstanarm parse.

Slow is tolerable but less so failing on Stan program that the C++ parser parsed. But the RStudio editor tries to parse after every keystroke, so slow is not ideal. @WardBrian Can a javascript parser from 2.32 be created that has a quicker option to just parse and return a status code or error message that does not go on to generate C++ or anything that would waste milliseconds?

I don't think this is important enough to be a blocker for 2.26. Why not just revert to V8, make the submission, and we can sort this out later?

@bgoodri
Copy link
Contributor Author

bgoodri commented Sep 8, 2023 via email

@andrjohns
Copy link
Contributor

It actually did get accepted to CRAN without the V8

Great! But just a reminder that the StanHeaders 2.32 submission will currently fail revdeps since you've moved the stanc.js file out of StanHeaders and into rstan. As mentioned in the linked PR above, the c++ generated by the 2.26 stanc is NOT compatible with the 2.32 headers because of a significant change to indexing in the stan repo.

You'll need to either revert to having stanc.js provided by StanHeaders (so that the generated c++ is always consistent with the headers), or additionally bundle the 2.32 stanc.js in rstan and detect which to use based on the installed StanHeaders version

@bgoodri
Copy link
Contributor Author

bgoodri commented Sep 8, 2023

Yeah, we are going to have to do something about that, as well as the V8 / QuickJSR thing. V8 is a problem for some Linux servers and CRAN doesn't want packages to rely to heavily on it. But if people can write Stan programs that QuickJSR does not parse (on some OS?), then it seems like we have to at least have a V8 option.

@andrjohns
Copy link
Contributor

andrjohns commented Sep 8, 2023

Yeah, we are going to have to do something about that

Whichever approach we use doesn't have to be permanent, just enough to get past the large version jump. Did you have a preference for either of the two options I mentioned? If you let me know which one you prefer then I can put together a PR for the 2.32 CRAN submission

@bgoodri
Copy link
Contributor Author

bgoodri commented Sep 8, 2023

I think what we have now in develop, fe8e55a , would work, right? It does a requireNamespace("V8") and if that fails, makes stanc_ctx with QuickJSR. Then it tries to validate the stanc.js from StanHeaders, and if that fails takes the stanc.js from RStan. So, if we had a StanHeaders based on 2.32, it would use that (and on CRAN, we could presume V8 is found).

@andrjohns
Copy link
Contributor

Yep as long as the stanc.js from StanHeaders is preferenced it doesn't really matter whether V8 or QuickJSR is used, I was just going off the version that's currently on CRAN.

Would you be able to submit an rstan update to CRAN to use that change, then I can run the revdeps with StanHeaders 2.32 and see if any final changes are needed?

@WardBrian
Copy link
Member

WardBrian commented Sep 8, 2023

@WardBrian Can a javascript parser from 2.32 be created that has a quicker option to just parse and return a status code or error message that does not go on to generate C++ or anything that would waste milliseconds

Theoretically, though from my previous experience the majority of the time is actually spent in the front-end of the compiler, so I doubt this would have the desired effect.

To add another data point:

Version Runtime Time to compile jm.stan
2.33 native executable 190ms
2.33 nodejs 19.9 1,403ms
2.32 QuickJSR via rstan::stanc 18,275ms
2.26 native executable 460ms
2.26 nodejs 19.9 3,240ms
2.26 QuickJSR via rstan::stanc 69,348ms

So it's definitely a big slowdown to use QuickJS, but it does seem to eventually work.

If your results are much worse @bgoodri, is it possible you're using a QuickJS which was compiled with debug flags (like this issue: r-lib/pkgbuild#154)?

@andrjohns
Copy link
Contributor

Here are the times with QuickJSR & 2.32:

> system.time(t <- rstan::stanc("~/Downloads/jm.txt"))
   user  system elapsed 
 13.627   0.148  13.793 

@WardBrian
Copy link
Member

@andrjohns if there is a easy way for me to get 2.32 I can run it on my machine so we have consistent hardware behind those numbers (that seems about right, though)

@andrjohns
Copy link
Contributor

There are pre-built binaries or source packages in the latest actions run from the experimental branch: https://github.com/stan-dev/rstan/actions/runs/6112653972

@andrjohns
Copy link
Contributor

andrjohns commented Sep 8, 2023

And as another interesting comparison point, using the C bytecode sources I get the following timings:

andrew@Andrews-MacBook-Air stanc3-bytecode % time ./stanc jm.txt 
./stanc jm.txt  0.95s user 0.01s system 99% cpu 0.965 total

@bgoodri
Copy link
Contributor Author

bgoodri commented Sep 8, 2023

In most cases, a few seconds of compile time does not matter, but I am a bit worried about RStudio's quasi realtime syntax checking getting a backlog of calls to the javascript parser that take seconds rather than milliseconds.

@bgoodri
Copy link
Contributor Author

bgoodri commented Sep 8, 2023

Also, @andrjohns if we had a StanHeaders based on 2.32, couldn't we have rstantools call the canonicalizer on the .stan files as part of the package's Makevars{.win} before it generates the C++? It seems like that would reduce the number of CRAN packages that require manual intervention to migrate their array syntax.

@WardBrian
Copy link
Member

Thanks @andrjohns - my machine was ~3.7x faster with 2.32's javascript parser.

This is actually super nice for me to see, I was only ever directly profiling the native executable, so the fact that the JS also got so much faster over those versions is really nice.

@andrjohns
Copy link
Contributor

Also, @andrjohns if we had a StanHeaders based on 2.32, couldn't we have rstantools call the canonicalizer on the .stan files as part of the package's Makevars{.win} before it generates the C++? It seems like that would reduce the number of CRAN packages that require manual intervention to migrate their array syntax.

Yep, this was discussed over here: stan-dev/rstantools#113

@WardBrian
Copy link
Member

StanHeaders based on 2.32, couldn't we have rstantools call the canonicalizer on the .stan files as part of the package's Makevars{.win} before it generates the C++

Are you able to then write the updated file back to disk? If not, this strategy wouldn't work if StanHeaders updated to a version past 2.33, where the old syntax is completely removed (i.e., even the cannonicalizer won't necessarily recognize it)

@andrjohns
Copy link
Contributor

Thanks @andrjohns - my machine was ~3.7x faster with 2.32's javascript parser.

This is actually super nice for me to see, I was only ever directly profiling the native executable, so the fact that the JS also got so much faster over those versions is really nice.

Absolutely, this transition away from V8 wouldn't really be all that feasible without them, so the work is definitely appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants