Working weights are not updated properly in bayesglm.R #110

Open

NikTuzov opened this issue Apr 16, 2019 · 25 comments

@NikTuzov

Hello:

I believe there is a bug in bayesglm.R. I have attached bayesglm_A_1.R, in which I inserted a few cat() statements. If you run it using the code from the other file, the output shows that the last two working weights, the ones that correspond to the pseudo-data, are initialized but never updated.

The problem is that in .bayesglm.fit.loop.updateState() the working weights are computed as:

w.star <- c(w, sqrt(state$dispersion)/priors$scale)

but to update them properly one needs state$prior.sd in the denominator; it equals priors$scale only at initialization. It looks like changing that one line is enough to fix it.
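
To make that concrete, here is a minimal sketch of the change I mean inside .bayesglm.fit.loop.updateState(); the comments are mine, and everything around the two lines is omitted:

    # current: the pseudo-data weights are rebuilt from the fixed prior scale
    # on every iteration, so they never move after initialization
    w.star <- c(w, sqrt(state$dispersion)/priors$scale)

    # proposed: divide by the prior sd carried in the state, which is updated
    # each iteration and equals priors$scale only at the start
    w.star <- c(w, sqrt(state$dispersion)/state$prior.sd)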

Regards,
Nik Tuzov

bayesglm_issue.zip

@gfinak
Member

gfinak commented Apr 16, 2019

Thanks for this report. @amcdavid can you confirm?

@gfinak
Member

gfinak commented Apr 16, 2019

Running this, I also found that it differs from the output of arm::bayesglm.fit in version 1.10.1, and that changing
w.star <- c(w, sqrt(state$dispersion)/priors$scale)
to
w.star <- c(w, sqrt(state$dispersion)/state$prior.sd)
leads to a non-convergence warning.
@amcdavid this raises the question of whether we should import the function from arm, perhaps with an option to run the old code for backward-compatible reproducibility.
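
For anyone who wants to reproduce a comparison against arm, a rough sketch (the simulated data below are illustrative, not the example from the attached zip; it assumes arm 1.10.1 or later is installed):

    # illustrative only: simulated logistic-regression data
    set.seed(1)
    n <- 100
    x <- rnorm(n)
    y <- rbinom(n, 1, plogis(1 - 0.5 * x))

    # reference fit from arm
    fit_arm <- arm::bayesglm(y ~ x, family = binomial())
    coef(fit_arm)

    # the corresponding fit from MAST's internal copy of bayesglm would then be
    # compared against these coefficients; how to invoke that routine directly
    # is left out here since it is not exported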

@amcdavid
Member

Thanks, Nik, for the report. I think I agree that this may be a bug, but I definitely don't understand this code well (because we didn't write it). As such, I am hesitant to make changes without a failing unit test. Do we have an example and an expected result that show exactly what is wrong here, and that verify that changing this line fixes it?

This is code from Gelman et al.'s arm package that we included directly in MAST. (We did this rather than Importing arm because changes were being made to arm by one of the authors without regression testing, and this was breaking things in MAST.) As such, I can only guess at what it's doing. On line 434 we seem to be augmenting the weighted least-squares problem with pseudo-observations that represent the prior. Then on lines 473-475 we seem to be representing the t prior as a scale mixture. As far as I can tell, state$prior.sd is never used again in the main loop, and I suppose it should be.
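
For readers following along, here is a generic sketch of the data-augmentation idea being described; this is not MAST's or arm's actual code, and the scaling conventions in the real routine may differ. The prior on each coefficient enters the weighted least-squares step as one pseudo-observation whose weight depends on the current prior sd, which is exactly the quantity the reported bug leaves frozen:

    # schematic only: ridge-by-data-augmentation for a normal prior
    augmented_wls <- function(X, z, w, prior.mean, prior.sd, dispersion) {
      p <- ncol(X)
      X.star <- rbind(X, diag(p))              # one pseudo-row per coefficient
      z.star <- c(z, prior.mean)               # pseudo-response is the prior mean
      w.star <- c(w, dispersion / prior.sd^2)  # pseudo-weight shrinks as prior.sd grows
      lm.wfit(X.star, z.star, w.star)$coefficients
    }

In the quoted MAST line the pseudo-weights appear to be on a square-root scale (sqrt(state$dispersion)/scale), but the point is the same: for a t prior the prior sd is re-estimated each iteration (the scale-mixture step mentioned above), so the pseudo-data weights should move with it.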

@gfinak
Member

gfinak commented Apr 16, 2019

The main loop calls .bayesglm.fit.loop.main.ideal, which calls another function, passing in priors$scale to update the state$prior.sd variable.
The latest arm code seems to be entirely in Fortran... never mind, I seem to have had a stroke. I don't find these utility functions there at all.

@amcdavid
Member

Where do you see Fortran in the repo? I don't see any. FWIW, here's the diff between our version of arm and the latest: cran/arm@82ebb3c

I would rather move away from arm altogether and implement the logistic model in Stan or rstanarm. arm is legacy code and doesn't have any unit or regression testing, whereas Stan and rstanarm are tested and more modern. They may be faster anyway, since the heavy lifting is in C++. If we do that, we can add an option to use the "old version".
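
For the record, a minimal sketch of what the rstanarm route might look like, assuming we keep something close to bayesglm's default Cauchy priors; the data frame and formula below are placeholders, not a worked-out MAST integration:

    library(rstanarm)

    # illustrative toy data; in MAST the response and predictors would come
    # from the single-cell model, not from this data frame
    set.seed(1)
    dat <- data.frame(x = rnorm(100))
    dat$y <- rbinom(100, 1, plogis(1 - 0.5 * dat$x))

    fit <- stan_glm(
      y ~ x,
      data            = dat,
      family          = binomial(link = "logit"),
      prior           = student_t(df = 1, location = 0, scale = 2.5),  # bayesglm's default Cauchy prior
      prior_intercept = student_t(df = 1, location = 0, scale = 10)
    )
    summary(fit)

Note that stan_glm samples the full posterior by default, rather than computing an approximate posterior mode the way bayesglm does, so the results would not match bayesglm exactly and each fit would be slower unless we use algorithm = "optimizing".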

@gfinak
Member

gfinak commented Apr 16, 2019

Sorry, you're right, not sure what the heck I saw.

@amcdavid
Member

amcdavid commented Apr 16, 2019

And actually this is the diff between our version of bayesglm and the latest version. It has been refactored to the point of being unrecognizable. Presumably for the better, but I am hesitant to mess with things given the lack of unit tests.

@gfinak
Member

gfinak commented Apr 16, 2019

Okay, I think we have to update things. I like the idea of using rstanarm.

@amcdavid
Member

👍 It's been on my to-do list for a while. I probably won't be able to get to it until the middle of next week, so feel free to take a stab at it. In any case, we wouldn't want to push anything to Bioconductor until after the Bioconductor release.

@NikTuzov
Author

Hello Greg and Andrew:

Thanks for responding.
Can you say whether that bug, or any other major bug, was present in MAST around spring 2017, when Soneson and Robinson were working on the paper in which they evaluated MAST's performance?

Nik

@gfinak
Member

gfinak commented Apr 17, 2019

It's not yet clear to me what impact this particular bug had on that paper, but it was present at the time. That said, it was a bug in arm as well when MAST was released. It's obviously a priority to address it.

@NikTuzov
Author

Please make sure a similar issue is not present in rstanarm or whatever package you are going to use.

@gfinak
Member

gfinak commented Apr 17, 2019

@NikTuzov Let me be clear: this was a bug in arm.

@suyusung

suyusung commented Apr 21, 2019

@NikTuzov We have decided to revert bayesglm to an older version (from when Andrew first wrote it from scratch) because the new one was buggy. We made several mistakes when we tried to clean the code up by breaking it into the many functions embedded in bayesglm(). In any case, bayesglm() and bayesglm.fit() in arm_1.10.1 should be a good (that is, not buggy) version to use. Having said that, I used the zip file in your post to test it, and the current bayesglm() works. I'm not sure whether I overlooked anything mentioned in your post. If you can be more specific about the bug and whether it still affects the current version, you can help us make arm better. Thanks!

@NikTuzov
Author

@gfinak
Hello Greg:

What is the present state of things? Are the weights updated properly?

Regards,
Nik

@gfinak
Member

gfinak commented Mar 18, 2020

I've pushed a fix to a feature branch that reintroduces the dependency on the latest arm package.
It's waiting for code review.

@amcdavid
Member

amcdavid commented Mar 18, 2020 via email

@gfinak
Member

gfinak commented Mar 18, 2020

If it's a bug, why?

@amcdavid
Member

amcdavid commented Mar 18, 2020 via email

@gfinak
Member

gfinak commented Nov 19, 2020

@amcdavid any thoughts on merging this particular fix now that we've had several Bioc release cycles?

@NikTuzov
Author

That would be much appreciated.

@amcdavid
Member

amcdavid commented Nov 20, 2020

I would still like to have a failing test, or at least a diff of old vs. new results, to understand the impact of this change. Beyond it being "wrong", can @NikTuzov explain why this is important to him? That said, I don't feel that strongly against it, as long as we indicate a reproducibility-breaking change somewhere prominent (maybe in a message that is printed out, à la tidyverse).
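
To make the request concrete, the sort of test I have in mind might look roughly like this (testthat syntax; mast_bayesglm_fit is a hypothetical stand-in for however we expose MAST's internal copy of the fitting routine, and the data are simulated rather than taken from the attached example):

    library(testthat)

    test_that("coefficients match arm::bayesglm.fit", {
      set.seed(1)
      n <- 100
      x <- cbind(1, rnorm(n))
      y <- rbinom(n, 1, plogis(x %*% c(1, -0.5)))

      # mast_bayesglm_fit is hypothetical: the current, unexported code path
      fit_old <- mast_bayesglm_fit(x, y, family = binomial())
      # reference implementation from arm 1.10.1 or later
      fit_arm <- arm::bayesglm.fit(x, y, family = binomial())

      expect_equal(fit_old$coefficients, fit_arm$coefficients, tolerance = 1e-6)
    })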

@NikTuzov
Author

NikTuzov commented Jan 4, 2021

Sorry I didn't reply back then.
It is important because I implemented MAST in Partek Flow:

https://www.partek.com/partek-flow/

Back then, I didn't want to fix it on my own because the result wouldn't have matched the official R package. Is the fixed version available now?

@NikTuzov
Author

Hello Greg:

I take it the fix is still on this topic branch:

https://github.com/RGLab/MAST/tree/fix/bayesglm

Are there any plans to push it into master?

Regards,
Nik

@amcdavid
Member

I would still like to see a failing unit test, and some larger-scale integration tests, so we fully understand the implications of changing the backend. Realistically, we probably need those integration tests implemented at some point no matter what, but this has been very much on the back burner for me.

I grant that the code does not reflect the intended model, but at this point it has been used for so long that I don't think we should change it without laying the appropriate groundwork. You are, of course, free to fork if you need this functionality.
