Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shared contributions in ANOVA #120

Open
wevertonbio opened this issue Jun 7, 2023 · 4 comments
Open

Shared contributions in ANOVA #120

wevertonbio opened this issue Jun 7, 2023 · 4 comments

Comments

@wevertonbio
Copy link

wevertonbio commented Jun 7, 2023

Hi Maximilian,

I have a question regarding the shared contributions in ANOVA. If I understand correctly, after reviewing the get_shared_anova function, the shared contributions are calculated in the following manner:

F_ABS = Full - F_BS - F_AB- F_AS- F_A- F_B - F_S

A = F_A + F_AB*abs(F_A)/(abs(F_A)+abs(F_B)) + F_AS*abs(F_A)/(abs(F_S)+abs(F_A))+ F_ABS*abs(F_A)/(abs(F_A)+abs(F_B)+abs(F_S))
B = F_B + F_AB*abs(F_B)/(abs(F_A)+abs(F_B)) + F_BS*abs(F_B)/(abs(F_S)+abs(F_B)) + F_ABS*abs(F_B)/(abs(F_A)+abs(F_B)+abs(F_S))
S = F_S + F_AS*abs(F_S)/(abs(F_S)+abs(F_A)) + F_BS*abs(F_S)/(abs(F_S)+abs(F_B))+ F_ABS*abs(F_S)/(abs(F_A)+abs(F_B)+abs(F_S))

However, in this recent paper, the authors mentioned that they also followed the approach of Leibold et al. (2022), but they calculated the shared R2 differently by employing a Type III ANOVA. They explained that:

“The fractional variance explained by each model component and interaction is determined using a Type III variance partitioning approach (Peres-Neto et al. 2006), working backwards from the most complex model to identify single effects and overlap terms (Fig. 1b). Then, to more efficiently summarise these seven quantities, we again follow the approach of Leibold et al. and group the sub-partitions of the total variance explained into three partitions of interest. The sub-partitions overlapping with codistribution (Venn segments ‘f', ‘g' and ‘e' in Fig. 1b) are assigned to the spatial and environmental partitions, respectively, because the flexible latent variables are likely to efficiently capture residual variance in the combined models. The variance explained by the combination of environmental and spatial predictors (Venn segments ‘g' and ‘d') is split 50:50 between them.”

They used the following formula to compute the shared contributions:

a <- Full - F_BS
b <- Full - F_AB
c <- Full - F_AS
d <- Full - F_A - (a+b)
e <- Full - F_S - (b + c)
f <- Full - F_B - (a + c)
g <- Full - (a + b + c + d + e + f)

A <- a + f + ((g+d)/2)
B <- c
S <- b + e + ((g + d)/2)

I ran a simulation to see the difference between the approaches, and it appears that the method used in this paper in Ecography gives greater contribution to spatial factor and less contribution to environmental and biotic factors:

#### Testing ANOVA - sjSDM x Other paper####
library(sjSDM)

#Simulate community
com <- simulate_SDM(env = 10L, species = 20L, sites = 100L, 
                   link = "probit", response = "pa", correlation = T)
X = com$env_weights
Y = com$response

# add spatial residuals (create coordinates and use spatial distance matrix to draw autocorrelated residuals for each species)
XYcoords = matrix(rnorm(200), 100, 2)+2
WW = as.matrix(dist(XYcoords))
spatialResiduals = mvtnorm::rmvnorm( 5L, sigma = exp(-WW))

#Trend surface model - linear
#The idea of the trend surface model is to use the spatial coordinates within a polynom:
  
colnames(XYcoords) = c("XX", "YY")
model = sjSDM(Y, 
              env = linear(X, ~.), 
              spatial = linear(XYcoords, ~0+XX+YY+XX:YY+I(XX^2)+I(YY^2)), 
              iter = 100L)
summary(model)
 #Anova
an = anova(model)
  #See plot
plotInternalStructure(an)

  #Calculate shared contributions following paper in Ecography
a_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_BS
b_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_AB
c_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_AS
d_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_A - (a_sp + b_sp)
e_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_S - (b_sp + c_sp)
f_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_B - (a_sp + c_sp)
g_sp <- an$species$R2_Nagelkerke$Full - (a_sp + b_sp + c_sp + d_sp + e_sp + f_sp)

A_sp <- a_sp + f_sp + ((g_sp+d_sp)/2)
B_sp <- c_sp
S_sp <- b_sp + e_sp + ((g_sp + d_sp)/2)

#Save results in dataframe to compare
df_compare <- data.frame(Env_paper = A_sp, 
                    Env_sjSDM = an$species$R2_Nagelkerke_shared$A,
                    Codist_paper = B_sp,
                    Codist_sjSDM = an$species$R2_Nagelkerke_shared$B,
                    Spa_paper = S_sp,
                    Spa_sjSDM = an$species$R2_Nagelkerke_shared$S,
                    Full = an$spe$R2_Nagelkerke_shared$R2)
print(df_compare)

      Env_paper Env_sjSDM Codist_paper Codist_sjSDM Spa_paper     Spa_sjSDM      Full
1   0.190634790 0.3691559   0.05590992  0.113425606 0.2757595  0.0397227666 0.5223043
2   0.017480792 0.4948966   0.07118038  0.090479062 0.5008093  0.0040948023 0.5894705
3   0.027128123 0.2497077   0.17326470  0.298559501 0.3527546  0.0048802866 0.5531474
4   0.159386082 0.3960366   0.06710510  0.137696320 0.3216548  0.0144130710 0.5481460
5   0.237982042 0.3397704   0.07936645  0.139714579 0.1948392  0.0327026591 0.5121877
6   0.075866454 0.3271695   0.14152702  0.158512732 0.2848715  0.0165827027 0.5022650
7  -0.003220567 0.2763486   0.17390710  0.183803383 0.2962960  0.0068305070 0.4669825
8   0.139811892 0.2710453   0.05353578  0.149188795 0.2338785  0.0069920706 0.4272262
9   0.198103766 0.4048185   0.07980100  0.088562475 0.2484805  0.0330042365 0.5263852
10  0.173289407 0.4833002   0.05406371 -0.011254905 0.3128330  0.0681407875 0.5401861
11  0.185398791 0.3117312   0.13317844  0.134051711 0.1514607  0.0242549863 0.4700379
12  0.057875243 0.3154176   0.05498587  0.077921105 0.2968093  0.0163316959 0.4096704
13  0.070870719 0.3365080   0.15657247  0.065102842 0.2256606  0.0514929720 0.4531038
14  0.088952103 0.4628139   0.13291419  0.104145631 0.3567033  0.0116100571 0.5785696
15  0.021959734 0.3312204   0.08202430  0.164399928 0.3917573  0.0001210638 0.4957414
16  0.203409760 0.3730823   0.14039992  0.111934582 0.1886704  0.0474631494 0.5324800
17  0.252148740 0.4156097   0.08893330  0.073765853 0.1843577  0.0360641844 0.5254398
18  0.184886042 0.3961516   0.07554385  0.128826183 0.2601085 -0.0044393576 0.5205384
19  0.289131778 0.4818471   0.05900857  0.006999452 0.1720485  0.0313423072 0.5201888
20  0.099654769 0.3503399   0.09499948  0.113529195 0.2873677  0.0181528901 0.4820220

The sjSDM results make more sense to me, but I'd like to know if I'm understanding the analysis correctly or if I'm missing something. Is there a conceptual distinction between the two methods? If so, what is the "best" method?

Thanks!

@florianhartig
Copy link
Member

Hi Weverton,

these are good questions - as coauthor of the Leibold paper, so let me try to clarify:

  1. In Leibold et al., 2022, we did something close but not identical to how it is currently in sJSDM, and we called this a type III Anova. The difference to the current sjSDM implementation is that the shared fractions are partitioned equally (see Appendix), while in sjSDM, Max is partitioning (if I'm not mistaken) proportional to the unique fractions. However, both are modifications of the classical type III Anova.

  2. It seems that the paper you references here is doing a classical type III Anova where they discard the shared fractions. I suspect the authors just read the main text in our paper and assumed we are doing a classical type III Anova and thus referenced the Leibold paper.

  3. I would argue that in spirit, all these options are type III Anovas, but it is true that in reality, values can be quite different so it is probably confusing to call all of them type III Anovas and we maybe need to find a better name, such as "modified type III" for what is currently in sjSDM. In any case, as you say, from experimenting we think distributing the shared fractions delivers more sensible results, although I have to admit that we have no study / paper that is rigorously showing that this is a good idea (such a study may be forthcoming, but I don't know yet)

@dansmi-hub
Copy link

I just wanted to ask if the above Anova function is bugged or If I have a bad installation.

I initially thought I just had a very bad fitting model and came looking here for solutions, but I've just tried replicating the above code verbatim and my results are very different as shown below. Nagelkerke values almost looks normalised?

> an$species$R2_Nagelkerke |> str()
List of 16
 $ F_A      : num [1:20] -0.677 -0.677 -0.69 -0.488 -0.34 ...
 $ F_B      : num [1:20] -0.1066 -0.1992 -0.1112 -0.3338 -0.0485 ...
 $ F_S      : num [1:20] -0.00704 -0.043 0.00213 -0.03505 -0.02269 ...
 $ F_AB     : num [1:20] 0.0038 -0.0508 -0.1633 0.1108 -0.1711 ...
 $ F_AS     : num [1:20] -0.02453 -0.01002 -0.00804 -0.01503 -0.03843 ...
 $ F_BS     : num [1:20] -0.01564 0.00589 -0.05253 0.04669 -0.06492 ...
 $ F_ABS    : num [1:20] -0.0198 0.0183 0.0668 -0.0703 0.0989 ...
 $ A        : num [1:20] -0.746 -0.747 -0.849 -0.438 -0.468 ...
 $ B        : num [1:20] -0.142 -0.23 -0.27 -0.21 -0.178 ...
 $ S        : num [1:20] -0.0686 -0.0281 0.012 -0.072 -0.0191 ...
 $ AB       : num [1:20] -6.89 -6.14 -6.65 -7.5 -7.98 ...
 $ AS       : num [1:20] -7.67 -7.21 -7.51 -9.95 -8.21 ...
 $ BS       : num [1:20] -12.1 -10.5 -11.9 -11.2 -10.8 ...
 $ Full     : num [1:20] -0.976 -1.173 -1.158 -0.892 -0.677 ...
 $ Saturated: num [1:20] -2.74 -2.66 -2.87 -2.75 -2.65 ...
 $ Null     : num [1:20] 0 0 0 0 0 0 0 0 0 0 ...

Unsure if McFadden is affected by this also but results look more reasonable.

> an$species$R2_McFadden |> str()
List of 16
 $ F_A      : num [1:20] 0.377 0.383 0.374 0.29 0.218 ...
 $ F_B      : num [1:20] 0.0739 0.1346 0.0752 0.21 0.0353 ...
 $ F_S      : num [1:20] 0.00512 0.03119 -0.00152 0.02512 0.01669 ...
 $ F_AB     : num [1:20] -0.00278 0.03668 0.10785 -0.08561 0.11749 ...
 $ F_AS     : num [1:20] 0.01769 0.00738 0.00571 0.01088 0.02804 ...
 $ F_BS     : num [1:20] 0.01133 -0.00438 0.0365 -0.03486 0.04678 ...
 $ F_ABS    : num [1:20] 0.0143 -0.0137 -0.0493 0.0495 -0.0774 ...
 $ A        : num [1:20] 0.407 0.414 0.438 0.265 0.286 ...
 $ B        : num [1:20] 0.0968 0.1532 0.1702 0.1391 0.1221 ...
 $ S        : num [1:20] 0.04843 0.02054 -0.00863 0.05067 0.01406 ...
 $ AB       : num [1:20] 1.51 1.46 1.45 1.56 1.63 ...
 $ AS       : num [1:20] 1.58 1.56 1.53 1.75 1.65 ...
 $ BS       : num [1:20] 1.88 1.81 1.83 1.82 1.83 ...
 $ Full     : num [1:20] 0.497 0.575 0.548 0.465 0.384 ...
 $ Saturated: num [1:20] 0.963 0.962 0.965 0.963 0.963 ...
 $ Null     : num [1:20] 0 0 0 0 0 0 0 0 0 0 ...

Session info below.

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22000)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8    LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sjSDM_1.0.6

loaded via a namespace (and not attached):
 [1] tensorA_0.36.2     gtable_0.3.3       ggplot2_3.4.3      htmlwidgets_1.6.2  devtools_2.4.5     remotes_2.4.2.1   
 [7] processx_3.8.2     lattice_0.21-8     callr_3.7.3        mathjaxr_1.6-0     vctrs_0.6.3        tools_4.3.1       
[13] ps_1.7.5           generics_0.1.3     curl_5.0.2         tibble_3.2.1       fansi_1.0.4        DEoptimR_1.1-1    
[19] proto_1.0.0        pkgconfig_2.0.3    Matrix_1.6-1       checkmate_2.2.0    desc_1.4.2         lifecycle_1.0.3   
[25] farver_2.1.1       compiler_4.3.1     stringr_1.5.0      munsell_0.5.0      httpuv_1.6.11      htmltools_0.5.6   
[31] usethis_2.2.2      hexbin_1.28.3      ggtern_3.4.2       later_1.3.1        pillar_1.9.0       crayon_1.5.2      
[37] urlchecker_1.0.1   MASS_7.3-60        ellipsis_0.3.2     cachem_1.0.8       sessioninfo_1.2.2  robustbase_0.99-0 
[43] mime_0.12          tidyselect_1.2.0   digest_0.6.33      mvtnorm_1.2-2      stringi_1.7.12     dplyr_1.1.2       
[49] purrr_1.0.2        labeling_0.4.2     latex2exp_0.9.6    rprojroot_2.0.3    fastmap_1.1.1      grid_4.3.1        
[55] colorspace_2.1-0   cli_3.6.1          magrittr_2.0.3     compositions_2.0-6 pkgbuild_1.4.2     utf8_1.2.3        
[61] withr_2.5.0        prettyunits_1.1.1  scales_1.2.1       promises_1.2.1     backports_1.4.1    rappdirs_0.3.3    
[67] gridExtra_2.3      reticulate_1.31    png_0.1-8          memoise_2.0.1      shiny_1.7.5        miniUI_0.1.1.1    
[73] profvis_0.3.8      rlang_1.1.1        Rcpp_1.0.11        bayesm_3.1-5       xtable_1.8-4       glue_1.6.2        
[79] pkgload_1.3.2.1    rstudioapi_0.15.0  jsonlite_1.8.7     plyr_1.8.8         R6_2.5.1           fs_1.6.3 

@dansmi-hub
Copy link

Hi,

After some further investigation, I think this function is not working as intended in later packages. A working example below of differences between v1.0.4 and v1.0.5.

I have been able to replicate this issue on Windows, Fedora and in the rocker/verse docker

#### Testing ANOVA - sjSDM V1.0.4 vs V1.0.5 ###

# Choose version of sjSDM
# devtools::install_version("sjSDM", "1.0.4")
# devtools::install_version("sjSDM," "1.0.5")

library(sjSDM) 

#Simulate community
com <- simulate_SDM(env = 10L, species = 20L, sites = 100L, 
                    link = "probit", response = "pa", correlation = T)
X = com$env_weights
Y = com$response

# add spatial residuals (create coordinates and use spatial distance matrix to draw autocorrelated residuals for each species)
XYcoords = matrix(rnorm(200), 100, 2)+2
WW = as.matrix(dist(XYcoords))
spatialResiduals = mvtnorm::rmvnorm( 5L, sigma = exp(-WW))

#Trend surface model - linear
#The idea of the trend surface model is to use the spatial coordinates within a polynom:

colnames(XYcoords) = c("XX", "YY")
model = sjSDM(Y, 
              env = linear(X, ~.), 
              spatial = linear(XYcoords, ~0+XX+YY+XX:YY+I(XX^2)+I(YY^2)), 
              iter = 100L)
summary(model)
#Anova
an = anova(model)
#See plot
plotInternalStructure(an)

#Calculate shared contributions following paper in Ecography
a_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_BS
b_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_AB
c_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_AS
d_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_A - (a_sp + b_sp)
e_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_S - (b_sp + c_sp)
f_sp <- an$species$R2_Nagelkerke$Full - an$species$R2_Nagelkerke$F_B - (a_sp + c_sp)
g_sp <- an$species$R2_Nagelkerke$Full - (a_sp + b_sp + c_sp + d_sp + e_sp + f_sp)

A_sp <- a_sp + f_sp + ((g_sp+d_sp)/2)
B_sp <- c_sp
S_sp <- b_sp + e_sp + ((g_sp + d_sp)/2)

#Save results in dataframe to compare
df_compare <- data.frame(Env_paper = A_sp, 
                         Env_sjSDM = an$species$R2_Nagelkerke_shared$A,
                         Codist_paper = B_sp,
                         Codist_sjSDM = an$species$R2_Nagelkerke_shared$B,
                         Spa_paper = S_sp,
                         Spa_sjSDM = an$species$R2_Nagelkerke_shared$S,
                         Full = an$spe$R2_Nagelkerke_shared$R2)
print(df_compare)

Results from sjSDM 1.0.4

Env_paper Env_sjSDM Codist_paper Codist_sjSDM Spa_paper     Spa_sjSDM      Full
1  0.14105239 0.4068474   0.09095082  0.112259612 0.3127524  0.0256486080 0.5447556
2  0.07697552 0.3798884   0.11980396  0.039457867 0.2245069  0.0019400378 0.4212863
3  0.21345779 0.5193623   0.04123956  0.015349171 0.2787738 -0.0012403493 0.5334712
4  0.07349363 0.4661765   0.17042852  0.105000213 0.3529075  0.0256529076 0.5968297
5  0.08652678 0.3310315   0.10983393  0.077443195 0.2222342  0.0101201290 0.4185949
6  0.08475070 0.3421386   0.16028724  0.192562980 0.3157704  0.0261067517 0.5608083
7  0.11833327 0.4464703   0.12519321  0.090129025 0.3064248  0.0133520029 0.5499513
8  0.16060651 0.3612260   0.08126246  0.160345105 0.2923881  0.0126859511 0.5342571
9  0.11850816 0.3073561   0.11849037  0.189881297 0.2600831 -0.0001557253 0.4970816
10 0.11560227 0.3611234   0.08783732  0.038777880 0.2098071  0.0133453938 0.4132467
11 0.21207837 0.5064937   0.05080475 -0.005273727 0.2512605  0.0129236929 0.5141437
12 0.13954251 0.3952973   0.10473461  0.091958412 0.2611776  0.0181989662 0.5054547
13 0.09120340 0.3173623   0.13664271  0.166405166 0.2731868  0.0172654635 0.5010329
14 0.10684767 0.2855876   0.10367095  0.162926222 0.2611778  0.0231826300 0.4716965
15 0.18797892 0.5103433   0.08143351  0.032156521 0.2893096  0.0162221539 0.5587220
16 0.16733183 0.3236834   0.07016174  0.161962553 0.2668003  0.0186479249 0.5042939
17 0.15729128 0.5452100   0.09626717 -0.002402936 0.3010034  0.0117547788 0.5545619
18 0.11936957 0.3400891   0.10222162  0.136485967 0.2722141  0.0172302267 0.4938053
19 0.11744335 0.3699006   0.12262376  0.139454280 0.2585607 -0.0107271021 0.4986278
20 0.18898127 0.3564803   0.04191954  0.076702106 0.2504376  0.0481560504 0.4813384

Results from sjSDM v1.0.5, the list names are different in this version to get the shared R2 values F_A vs A

#Save results in dataframe to compare
df_compare <- data.frame(Env_paper = A_sp, 
                         Env_sjSDM = an$species$R2_Nagelkerke_shared$F_A,
                         Codist_paper = B_sp,
                         Codist_sjSDM = an$species$R2_Nagelkerke_shared$F_B,
                         Spa_paper = S_sp,
                         Spa_sjSDM = an$species$R2_Nagelkerke_shared$F_S,
                         Full = an$spe$R2_Nagelkerke_shared$R2)
print(df_compare)

Env_paper  Env_sjSDM Codist_paper Codist_sjSDM    Spa_paper   Spa_sjSDM Full
1   0.096781573 -0.7449012   -1.2029526   -0.7449012 -0.118629681 -0.20203036    0
2   0.125386288 -0.5731753   -0.9460816   -0.5731753 -0.199852925 -0.18914005    0
3   0.190205715 -0.7609571   -1.3686117   -0.7609571 -0.225099487 -0.23562004    0
4   0.048850270 -0.4090372   -0.7533718   -0.4090372 -0.088925470 -0.12452295    0
5   0.034354883 -0.6795454   -0.9899062   -0.6795454 -0.137314347 -0.17561082    0
6   0.046319099 -0.6349706   -1.0901582   -0.6349706  0.001451586 -0.16283059    0
7   0.103096026 -0.3527897   -0.7233909   -0.3527897 -0.067669755 -0.15403464    0
8   0.071950519 -0.5320003   -0.8990997   -0.5320003 -0.080908447 -0.15192608    0
9   0.007066122 -0.3842563   -0.5761393   -0.3842563 -0.096639679 -0.11207241    0
10  0.032780514 -0.7201736   -1.0248781   -0.7201736 -0.087534227 -0.15095403    0
11  0.014442275 -0.5837485   -0.9115347   -0.5837485 -0.012753819 -0.14179626    0
12  0.102713060 -0.3755313   -0.6923481   -0.3755313 -0.155123775 -0.13378762    0
13 -0.023534684 -0.7649713   -1.0542971   -0.7649713  0.052998343 -0.09762888    0
14  0.106464392 -0.7162880   -1.2102969   -0.7162880 -0.112254912 -0.19165381    0
15  0.096293379 -0.5212984   -0.9536957   -0.5212984 -0.088920179 -0.18327799    0
16  0.017240236 -0.4677283   -0.6723872   -0.4677283  0.007116004 -0.07281071    0
17 -0.001573435 -0.7538327   -0.9431487   -0.7538327 -0.031036739 -0.09481277    0
18  0.001325702 -0.7350749   -1.0284069   -0.7350749 -0.010599301 -0.11879030    0
19  0.062618911 -0.4639122   -0.8362807   -0.4639122  0.012288521 -0.12697241    0
20  0.028734990 -0.4793437   -0.8656363   -0.4793437 -0.020759569 -0.14938595    0

@MaximilianPi
Copy link
Member

Hi @dansmi-hub,
thanks, I will look into this.

MaximilianPi added a commit that referenced this issue Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants