Assignment8.md
For this assignment I have tried to re-create the glm() assignment from two weeks ago using Bayesian tools. I did not include the Male vs. Female differences because I was not able to find a way to combine continuous and discrete variables that did not break everything.

# Packages Used ----
library(dplyr)       # data manipulation (select, pipes)
library(R2jags)      # interface to JAGS
library(dotwhisker)  # dot-and-whisker coefficient plots
library(emdbook)     # as.mcmc.bugs() for converting the JAGS output
library(lattice)     # xyplot() and densityplot() of the MCMC chains

I removed unnecessary columns from the data because the model complained a lot when I left them in. I have re-analyzed the data using the frequentist approach below.

# Data and Frequentist model results ----
Plot_ready <- read.csv('Cor_data.csv')

# Drop the row-index and time-period columns the model was complaining about
Plot_ready <- Plot_ready %>%
  select(-X, -TIME_PERIOD)

summary(glm(new_HIV ~ poly(M_to_C, 2), data = Plot_ready, family = gaussian(link = "identity")))
## 
## Call:
## glm(formula = new_HIV ~ poly(M_to_C, 2), family = gaussian(link = "identity"), 
##     data = Plot_ready)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1660.13   -588.84     19.32    490.21   1582.97  
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       24796.8      207.7 119.369  < 2e-16 ***
## poly(M_to_C, 2)1  35996.8      929.0  38.748  < 2e-16 ***
## poly(M_to_C, 2)2  -8007.1      929.0  -8.619  1.3e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 863055.1)
## 
##     Null deviance: 1374557363  on 19  degrees of freedom
## Residual deviance:   14671937  on 17  degrees of freedom
## AIC: 334.87
## 
## Number of Fisher Scoring iterations: 2

To create the model in the .bug file I regenerated the polynomial prediction. Since the rate of transmission from mother to child is per 100, I chose 50 as the prior mean and left the precision wide open; I was not sure what would be more appropriate. I was unclear, and still am somewhat, on whether the numbers inside dnorm() are on the scale of that parameter or on the scale of the response variable. I tried several different numbers, large and small, and the Rhat and n.eff values did not change much, so I was not sure how one evaluates the quality of one model over another. In the end I just stuck with my original assumptions, as stated below.
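The ass8.bug file itself is not reproduced here, but a minimal sketch of the kind of model described above might look like the following. The likelihood and the parameter names (int, mM_to_C, mM_to_C2) match the jags() call further down; the specific prior means and precisions, and the dgamma prior on the precision tau, are my assumptions for illustration rather than the exact values used.

model {
  for (i in 1:N) {
    # quadratic predictor on the raw M_to_C scale
    pred[i] <- int + mM_to_C * M_to_C[i] + mM_to_C2 * M_to_C[i]^2
    # normal likelihood for the response, with precision tau
    new_HIV[i] ~ dnorm(pred[i], tau)
  }
  # mother-to-child transmission is per 100, so the slope prior is centred
  # at 50 with a very small ("wide open") precision; exact values assumed
  int      ~ dnorm(0, 0.0001)
  mM_to_C  ~ dnorm(50, 0.0001)
  mM_to_C2 ~ dnorm(0, 0.0001)
  tau      ~ dgamma(0.001, 0.001)
}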

I kept the distributions normal, primarily because that was what I assumed when I created the GLM for this dataset. I did try different distributions, but several of them, particularly the uniform and the logistic for some other types of data, made the model complain a lot. I was not too clear on how to interpret the errors for non-continuous data.
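If the uniform was meant as a prior rather than as a likelihood, one common JAGS idiom is to put a flat prior on the standard deviation and derive the precision from it. This is only an illustration of that alternative, not what ended up in ass8.bug; the sigma name and the upper bound of 1000 are arbitrary.

  # uniform prior on the residual standard deviation, precision derived from it
  sigma ~ dunif(0, 1000)
  tau <- 1 / (sigma * sigma)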

I ran the model with JAGS and plotted the dotwhisker plot. Clearly the numbers do not match up well with the frequentist numbers from the start of this exercise (one likely reason is that poly(M_to_C, 2) fits orthogonal polynomial terms by default, so if the .bug model uses raw M_to_C and M_to_C^2 the coefficients are not on the same scale). Perhaps my prior assumptions were also too small? For a polynomial model like this one, I do wonder how one goes about plotting the fit against the dataset the way geom_smooth() does; a rough attempt at this is sketched at the end of this write-up.

I am tempted to go back and change the priors, but I think that defeats the purpose of them being priors. If that is the case, I wonder what one is meant to do at this point, and how I could improve on this to make it more meaningful for my dataset. In this particular example I did not include any other predictor variables; I suspect the numbers might shift around more if there were more of those as well.

# R2jags model setup ----
set.seed(1234)
N <- 20   # number of observations in Plot_ready
jags1 <- jags(model.file = 'ass8.bug',
              parameters = c("mM_to_C", "mM_to_C2", "int"),
              data = list(M_to_C = Plot_ready$M_to_C, new_HIV = Plot_ready$new_HIV, N = N),
              n.chains = 3,
              inits = NULL); jags1
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 20
##    Unobserved stochastic nodes: 4
##    Total graph size: 130
## 
## Initializing model

## Inference for Bugs model at "ass8.bug", fit using jags,
##  3 chains, each with 2000 iterations (first 1000 discarded)
##  n.sims = 3000 iterations saved
##          mu.vect sd.vect     2.5%     25%     50%     75%   97.5%  Rhat n.eff
## int        6.257 937.521 -161.092 -34.779  36.012 103.366 232.671 1.001  3000
## mM_to_C   18.697   4.062   11.970  16.344  18.605  21.164  26.010 1.001  3000
## mM_to_C2 298.190 146.689   70.284 217.968 295.874 371.561 517.797 1.001  3000
## deviance 392.846   6.515  380.390 388.941 393.092 397.170 404.233 1.001  3000
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 21.2 and DIC = 414.1
## DIC is an estimate of expected predictive error (lower deviance is better).
mm <- as.mcmc.bugs(jags1$BUGSoutput)
print(xyplot(mm,layout=c(2,3)))

print(densityplot(mm,layout=c(2,3)))

print(dwplot(jags1))
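To come back to the geom_smooth() question above: one way to get an analogous picture is to evaluate the fitted quadratic at the posterior means and overlay it on the raw data. This is only a sketch; it assumes ggplot2 is available and that the .bug model uses raw M_to_C and M_to_C^2 terms, as in the model sketch earlier. The grid size and line colour are arbitrary choices.

library(ggplot2)

# posterior means of the intercept and the two polynomial coefficients
post <- jags1$BUGSoutput$mean

# evaluate the fitted quadratic over a grid of M_to_C values
pred_df <- data.frame(M_to_C = seq(min(Plot_ready$M_to_C),
                                   max(Plot_ready$M_to_C),
                                   length.out = 100))
pred_df$new_HIV <- post$int +
  post$mM_to_C  * pred_df$M_to_C +
  post$mM_to_C2 * pred_df$M_to_C^2

# raw data as points, Bayesian fit as a line (analogous to geom_smooth())
ggplot(Plot_ready, aes(M_to_C, new_HIV)) +
  geom_point() +
  geom_line(data = pred_df, colour = "blue")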