Parametric bootstrap simulations for global models #125
Unanswered
MengLu-flw
asked this question in
Q&A
Replies: 1 comment
-
@KLohse could you provide some insight here? |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Hi all :^)
I would like to perform parametric bootstraps via ‘gimble simulate’ for two purposes (1) to compare two nested models DIV and IM_BA by ΔlnCL (the improvement in fit), (2) to obtain 95% confidence intervals (CIs) of the parameters inferred under the best fitting global model.
I am not sure how to simulate a dataset partitioned into windows that is analogous to the empirical data - i.e., how to define -w (the number of windows per replicate) and -n (the number of blocks per window).
For example, here is the ‘gimble info’ based on my empirical data:
I thought to simulate an analogous dataset without down-sampling, I would have:
-l 64 (block length 64, same as I defined for my empirical data)
-w (not sure about how to define this, but I thought if each window has recombination, then it makes biological sense to me to have the number of windows equal to the number of chromosomes in my empirical data)
-n (not sure about how to define this, but I thought I would have the total number of blocks to be simulated as “empirical total block numbers (X)/empirical number of sample-sets (X) =90,953,757/49=1,856,199.122”. And I thought it would be like w*n=1,856,199.)
--replicates 100, --kmax 2,2,2,2.
--samples_A, --samples_B and --mu will be the same as my empirical data.
--model, --Ne_A, --Ne_B, --Ne_AB, --T, --me will use the same values as estimated under corresponding models based on empirical data.
My question at this step:
(1) Is my interpretation of the -w and -n correct?
(2) If I need to do down-sampling, how should I do it? i.e., if I need to do 10% down-sampling for simulated data compared to the empirical data amount, shall I just decrease the number of windows (-w) for my simulations?
Despite I didn’t fully understand how to define -w and -n (apologies), I had a go with ‘gimble simulate’ with the command below:
The recombination rate that I used is an averaged rate for plants (https://doi.org/10.1098/rstb.2016.0455, Table 1), so it is a very crude estimate...
The command above took 30h:27m:00.704s to complete.
To get the 95% CIs for the DIV model, I re-fit this simulated data to a DIV model with the same parameter boundaries that I set for my empirical dataset.
Then, I queried this label to get a summary output of these 100 replicates.
gimble query -z Lina_Gstore.z -l optimize/DIV_294w.windowsum/DIV_DIVmodelCompare
I found that the range of 100 bootstraps does not cover the estimated value of DIV based on my empirical data.
For example, the ranges based on simulated data are:
Ne_A_B: 890,126.572 - 895,261.233
T: 1,025,487.446 - 1,032,241.796
Ne_A: 457,520.165 - 462,095.671
Ne_B: 300,091.436 - 302,540.218
While, the parameters estimated based on empirical data (also defined in the simulation) are:
Ne_A_B: 1,009,867
T: 763,249
Ne_A: 372,683
Ne_B: 251,619
My question: Is this ‘deviation’ caused by the recombination rate? Or have I done something wrong when simulating data?
Thank you so much! Very looking forward to your insightful reply!
Happy New Year :^)
Meng
Beta Was this translation helpful? Give feedback.
All reactions