What's the difference in using more nquantiles? #1583
Setup Information
Context
I'm using the xclim module for bias correction (xclim.sdba.adjustment) on daily data from the historical period (7305 time steps) and on future data from the CMIP6 models.
Replies: 2 comments
As this number is essentially the resolution at which the distribution is resolved, it's usually a matter of performance: the more points you have, the heavier the computation. I am not sure of this, but I would fear overfitting if you used "all" quantiles, meaning you're merely sorting the timeseries, not computing any quantile per se. But I have to dig a bit more into the literature to provide a better answer. However, as I hinted above, the use of

EDIT: I see that Cannon's own R package uses all quantiles by default... I guess that nullifies my overfitting fear and makes the
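To make the "merely sorting the timeseries" point concrete, here is a minimal NumPy sketch. It does not use xclim; the synthetic `series` and the node counts are assumptions chosen only for illustration. It shows that asking for as many quantiles as there are values, over the full [0, 1] range, just returns the sorted data, while a smaller number of quantiles gives a coarser summary.

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.normal(size=365)  # hypothetical stand-in for a daily timeseries

# As many quantile nodes as values, spanning the full [0, 1] range.
nodes_all = np.linspace(0, 1, series.size)
q_all = np.quantile(series, nodes_all)

# With NumPy's default linear interpolation, each node falls on one data
# point, so these "quantiles" are simply the sorted series.
same_as_sorted = np.allclose(q_all, np.sort(series))

# A coarser resolution describes the same distribution with fewer points.
q_coarse = np.quantile(series, np.linspace(0, 1, 20))

print(same_as_sorted, q_all.size, q_coarse.size)
```

The cost of the training step grows with the number of nodes, which is the performance trade-off mentioned above.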
Hi @Joaogmr472,

It took a while before I could come back to this issue. I discussed it with others and did some quick tests, and I have a few conclusions:

One issue with taking as many quantiles as values is indeed overfitting, but particularly on the extremes. An example I actually encountered with xclim is related to issue #1015. If we were to use the sorted timeseries as I suggested above, it would assign the quantiles 0 and 1 to the minimum and maximum of the series. In the adjust step, this has the effect of clipping the scenario to the range of the reference, at least within the reference period.

The default way xclim chooses its quantiles (after this issue was solved) is to divide the [0, 1] range in N + 1 parts (where

In a recent project I used 50, which means the min and max quantiles are 0.01 and 0.99. It seems to have performed well without overfitting the extremes (but this is hard to analyze exhaustively...). Also I was using

However, this is not heavily documented, but you can pass an array to

The last technique would still be slower than my proposition, and would be inexact when grouping is used. I do not have time to implement something to support this

I hope that answers your question better. I know it's been a long time, but hopefully this will help other people as well...
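The clipping effect described above can be sketched with plain NumPy. This is not xclim's implementation: `centered_nodes` and `adjust` are hypothetical helpers, the data are synthetic, and the adjustment is a simplified additive empirical quantile mapping with the adjustment factor held constant beyond the outermost nodes. The node spacing is only assumed to match the 0.01 and 0.99 bounds quoted for 50 quantiles.

```python
import numpy as np

def centered_nodes(n):
    """Hypothetical helper: n equally spaced nodes that avoid 0 and 1."""
    dq = 1 / (2 * n)
    return np.linspace(dq, 1 - dq, n)

def adjust(ref, sim, nodes):
    """Simplified additive quantile mapping (an assumption, not xclim's code)."""
    ref_q = np.quantile(ref, nodes)
    sim_q = np.quantile(sim, nodes)
    af = ref_q - sim_q  # one adjustment factor per node
    # np.interp holds the outermost factor constant beyond the node range.
    return sim + np.interp(sim, sim_q, af)

rng = np.random.default_rng(42)
ref = rng.normal(0.0, 1.0, size=1000)  # synthetic "reference"
sim = rng.normal(1.0, 2.0, size=1000)  # synthetic "simulation", same period

nodes_50 = centered_nodes(50)                # first node 0.01, last node 0.99
nodes_full = np.linspace(0, 1, sim.size)     # includes quantiles 0 and 1

adj_full = adjust(ref, sim, nodes_full)
adj_50 = adjust(ref, sim, nodes_50)

# With nodes 0 and 1, the adjusted series is pinned to the reference range.
print(adj_full.min(), ref.min(), adj_full.max(), ref.max())
# With 50 centered nodes, the tails are shifted rather than pinned.
print(adj_50.min(), adj_50.max())
```

With the full set of nodes, the adjusted series over the training period reproduces the sorted reference values exactly, which is the overfitting of the extremes. With the centered nodes, values beyond the 0.01 and 0.99 quantiles keep the nearest adjustment factor, so their spread is preserved.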