Working weights are not updated properly in bayesglm.R #110

Open

NikTuzov opened this issue Apr 16, 2019 · 25 comments

@NikTuzov

Hello:

I believe there is a bug in bayesglm.R. I have attached bayesglm_A_1.R, in which I inserted a few cat() statements. If you run it using the code from the other file, the output shows that the last two working weights, the ones that correspond to the pseudo-data, are initialized but never updated.

The problem is that in .bayesglm.fit.loop.updateState() the working weights are computed as:

w.star <- c(w, sqrt(state$dispersion)/priors$scale)

but to update them properly one needs state$prior.sd in the denominator; it equals priors$scale only at initialization. It looks like changing that one line is enough to fix it.
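
To make that concrete, here is a minimal sketch of the change I mean inside .bayesglm.fit.loop.updateState(); the comments are mine, and everything around the two lines is omitted:

    # current: the pseudo-data weights are rebuilt from the fixed prior scale
    # on every iteration, so they never move after initialization
    w.star <- c(w, sqrt(state$dispersion)/priors$scale)

    # proposed: divide by the prior sd carried in the state, which is updated
    # each iteration and equals priors$scale only at the start
    w.star <- c(w, sqrt(state$dispersion)/state$prior.sd)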

Regards,
Nik Tuzov

bayesglm_issue.zip

@gfinak
Member

gfinak commented Apr 16, 2019

Thanks for this report. @amcdavid can you confirm?

@gfinak
Member

gfinak commented Apr 16, 2019

Running this, I also found that it differs from the output of arm::bayesglm.fit in version 1.10.1, and that changing
w.star <- c(w, sqrt(state$dispersion)/priors$scale)
to
w.star <- c(w, sqrt(state$dispersion)/state$prior.sd)
leads to a non-convergence warning.
@amcdavid this raises the question of whether we should import the function from arm, perhaps with an option to run the old code for backward-compatible reproducibility.
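
For anyone who wants to reproduce a comparison against arm, a rough sketch (the simulated data below are illustrative, not the example from the attached zip; it assumes arm 1.10.1 or later is installed):

    # illustrative only: simulated logistic-regression data
    set.seed(1)
    n <- 100
    x <- rnorm(n)
    y <- rbinom(n, 1, plogis(1 - 0.5 * x))

    # reference fit from arm
    fit_arm <- arm::bayesglm(y ~ x, family = binomial())
    coef(fit_arm)

    # the corresponding fit from MAST's internal copy of bayesglm would then be
    # compared against these coefficients; how to invoke that routine directly
    # is left out here since it is not exported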

@amcdavid
Member

Thanks, Nik, for the report. I think I agree that this may be a bug, but I definitely don't understand this code well (because we didn't write it). As such, I am hesitant to make changes without a failing unit test. Do we have an example and an expected result that show exactly what is wrong here, and that verify that changing this line fixes it?

This is code from Gelman et al.'s arm package that we included directly in MAST. (We did this rather than Importing arm because changes were being made to arm by one of the authors without regression testing, and this was breaking things in MAST.) As such, I can only guess at what it's doing. On line 434 we seem to be augmenting the weighted least-squares problem with pseudo-observations that represent the prior. Then on lines 473-475 we seem to be representing the t prior as a scale mixture. As far as I can tell, state$prior.sd is never used again in the main loop, and I suppose it should be.
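
For readers following along, here is a generic sketch of the data-augmentation idea being described; this is not MAST's or arm's actual code, and the scaling conventions in the real routine may differ. The prior on each coefficient enters the weighted least-squares step as one pseudo-observation whose weight depends on the current prior sd, which is exactly the quantity the reported bug leaves frozen:

    # schematic only: ridge-by-data-augmentation for a normal prior
    augmented_wls <- function(X, z, w, prior.mean, prior.sd, dispersion) {
      p <- ncol(X)
      X.star <- rbind(X, diag(p))              # one pseudo-row per coefficient
      z.star <- c(z, prior.mean)               # pseudo-response is the prior mean
      w.star <- c(w, dispersion / prior.sd^2)  # pseudo-weight shrinks as prior.sd grows
      lm.wfit(X.star, z.star, w.star)$coefficients
    }

In the quoted MAST line the pseudo-weights appear to be on a square-root scale (sqrt(state$dispersion)/scale), but the point is the same: for a t prior the prior sd is re-estimated each iteration (the scale-mixture step mentioned above), so the pseudo-data weights should move with it.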

@gfinak
Member

gfinak commented Apr 16, 2019

The main loop calls .bayesglm.fit.loop.main.ideal, which calls another function, passing in priors$scale to update the state$prior.sd variable.
The latest arm code seems to be entirely in Fortran... never mind, I seem to have had a stroke. I don't find these utility functions there at all.

@amcdavid
Member

Where do you see Fortran in the repo? I don't see any. FWIW, here's the diff between our version of arm and the latest: cran/arm@82ebb3c

I would rather move away from arm altogether and implement the logistic model in Stan or rstanarm. arm is legacy code and doesn't have any unit or regression testing, whereas Stan and rstanarm are tested and more modern. They may be faster anyway, since the heavy lifting is in C++. If we do that, we can add an option to use the "old version".
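
For the record, a minimal sketch of what the rstanarm route might look like, assuming we keep something close to bayesglm's default Cauchy priors; the data frame and formula below are placeholders, not a worked-out MAST integration:

    library(rstanarm)

    # illustrative toy data; in MAST the response and predictors would come
    # from the single-cell model, not from this data frame
    set.seed(1)
    dat <- data.frame(x = rnorm(100))
    dat$y <- rbinom(100, 1, plogis(1 - 0.5 * dat$x))

    fit <- stan_glm(
      y ~ x,
      data            = dat,
      family          = binomial(link = "logit"),
      prior           = student_t(df = 1, location = 0, scale = 2.5),  # bayesglm's default Cauchy prior
      prior_intercept = student_t(df = 1, location = 0, scale = 10)
    )
    summary(fit)

Note that stan_glm samples the full posterior by default, rather than computing an approximate posterior mode the way bayesglm does, so the results would not match bayesglm exactly and each fit would be slower unless we use algorithm = "optimizing".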

@gfinak
Member

gfinak commented Apr 16, 2019

Sorry, you're right, not sure what the heck I saw.

@amcdavid
Member

amcdavid commented Apr 16, 2019

And actually this is the diff between our version of bayesglm and the latest version. It has been refactored to the point of being unrecognizable. Presumably for the better, but I am hesitant to mess with things given the lack of unit tests.

@gfinak
Member

gfinak commented Apr 16, 2019

Okay, I think we have to update things. I like the idea of using rstanarm.

@amcdavid
Member

👍 It's been on my to-do list for a while. I probably won't be able to get to it until the middle of next week, so feel free to take a stab at it. In any case, we wouldn't want to push anything to Bioconductor until after the Bioconductor release.

@NikTuzov
Author

Hello Greg and Andrew:

Thanks for responding.
Can you say whether that bug, or any other major bug, was present in MAST around spring 2017, when Soneson and Robinson were working on the paper in which they evaluated MAST's performance?

Nik

@gfinak
Member

gfinak commented Apr 17, 2019

It's not yet clear to me what impact this particular bug had on that paper, but it was present at the time. That said, it was a bug in arm as well when MAST was released. It's obviously a priority to address it.

@NikTuzov
Author

Please make sure a similar issue is not present in rstanarm or whatever package you are going to use.

@gfinak
Member

gfinak commented Apr 17, 2019

@NikTuzov Let me be clear: this was a bug in arm.

@suyusung

suyusung commented Apr 21, 2019

@NikTuzov We have decided to revert bayesglm to an older version (from when Andrew first wrote it from scratch) because the new one was buggy. We made several mistakes when we tried to clean the code up by breaking it into the many functions embedded in bayesglm(). In any case, bayesglm() and bayesglm.fit() in arm_1.10.1 should be a good (that is, not buggy) version to use. Having said that, I used the zip file in your post to test it, and the current bayesglm() works. I'm not sure whether I overlooked anything mentioned in your post. If you can be more specific about the bug and whether it still affects the current version, you can help us make arm better. Thanks!

@NikTuzov
Author

@gfinak
Hello Greg:

What is the present state of things? Are the weights updated properly?

Regards,
Nik

@gfinak
Member

gfinak commented Mar 18, 2020

I've pushed a fix to a feature branch that reintroduces the dependency on the latest arm package.
It's waiting for code review.

@amcdavid
Member

amcdavid commented Mar 18, 2020 via email

@gfinak
Member

gfinak commented Mar 18, 2020

If it's a bug, why?

@amcdavid
Member

amcdavid commented Mar 18, 2020 via email

@gfinak
Member

gfinak commented Nov 19, 2020

@amcdavid any thoughts on merging this particular fix now that we've had several Bioc release cycles?

@NikTuzov
Author

That would be much appreciated.

@amcdavid
Member

amcdavid commented Nov 20, 2020

I would still like to have a failing test, or at least a diff of old vs. new results, to understand the impact of this change. Beyond it being "wrong", can @NikTuzov explain why this is important to him? That said, I don't feel that strongly against it, as long as we indicate a reproducibility-breaking change somewhere prominent (maybe in a message that is printed out, à la tidyverse).
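
To make the request concrete, the sort of test I have in mind might look roughly like this (testthat syntax; mast_bayesglm_fit is a hypothetical stand-in for however we expose MAST's internal copy of the fitting routine, and the data are simulated rather than taken from the attached example):

    library(testthat)

    test_that("coefficients match arm::bayesglm.fit", {
      set.seed(1)
      n <- 100
      x <- cbind(1, rnorm(n))
      y <- rbinom(n, 1, plogis(x %*% c(1, -0.5)))

      # mast_bayesglm_fit is hypothetical: the current, unexported code path
      fit_old <- mast_bayesglm_fit(x, y, family = binomial())
      # reference implementation from arm 1.10.1 or later
      fit_arm <- arm::bayesglm.fit(x, y, family = binomial())

      expect_equal(fit_old$coefficients, fit_arm$coefficients, tolerance = 1e-6)
    })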

@NikTuzov
Author

NikTuzov commented Jan 4, 2021

Sorry I didn't reply back then.
It is important because I implemented MAST in Partek Flow:

https://www.partek.com/partek-flow/

Back then, I didn't want to fix it on my own because the result wouldn't have matched the official R package. Is the fixed version available now?

@NikTuzov
Author

Hello Greg:

I take it the fix is still on this topic branch:

https://github.com/RGLab/MAST/tree/fix/bayesglm

Are there any plans to push it into master?

Regards,
Nik

@amcdavid
Member

I would still like to see a failing unit test, and some larger-scale integration tests, so we fully understand the implications of changing the backend. Realistically, we probably need those integration tests implemented at some point no matter what, but this has been very much on the back burner for me.

I grant that the code does not reflect the intended model, but at this point it has been used for so long that I don't think we should change it without laying the appropriate groundwork. You are, of course, free to fork if you need this functionality.
