Bandwidth factor "experimental" for ArviZ kernel density estimate #1648

davidedalbosco · 2021-03-31T15:38:22Z

I am new to ArviZ, so I am sorry if my questions are too painfully obvious.

I wanted to reproduce the ArviZ kernel density estimate from arviz.plot_kde with the SciPy function scipy.stats.gaussian_kde.
I am aware that the KDE depends on the chosen bandwidth factor and that there are a couple of rules of thumb for choosing it (e.g., Silverman's rule and Scott's rule).

In the ArviZ documentation, I read that the default bandwidth factor is bw='experimental'. I was wondering what the meaning of this option (possibly with a formula) is.

Attached is a comparison between scipy.stats.gaussian_kde (with the default 'scott' bandwidth option) and arviz.plot_kde.
If I tune the bandwidth factor by hand for the SciPy function, I get a plot similar to the ArviZ one (the red dotted line).

Another question I wanted to ask is how ArviZ treats the boundaries/tails. I read that ArviZ does not plot the KDE outside the region where I have data. I agree that this is a good feature.

I thought that I could replicate the same behavior with the SciPy function by restricting the plot to the interval between the minimum and maximum of my data. However, as you can see in the previous picture, at the edges of the histogram the ArviZ values are quite different from the SciPy ones (even if I tune the bandwidth factor). I guess this is because ArviZ conserves the total probability under the KDE estimate.

Therefore, I wanted to know how ArviZ adjusts the KDE at the boundaries of the data.

Thank you in advance for the help.

OriolAbril · 2021-03-31T19:20:43Z

Maintainer

Tagging @tomicapretto as he is the one that actually did the work.

I know (and remember reading) he did some reports and summaries of the multiple alternatives, but I can't remember where they are. There is a quick overview at #1284 but I can't seem to find the link to the report.

tomicapretto · 2021-03-31T19:36:12Z

Maintainer

Thank you so much @OriolAbril for tagging me, I wouldn't have seen this otherwise.

Hi @davidedalbosco,

I know that "experimental" is not the best name I could have chosen, sorry about that. This is the report that Oriol refers to. And here you can find more of the notes I wrote while implementing the new KDE in ArviZ. But let me try to explain what ArviZ does with the "experimental" bandwidth.

First of all, the experimental bandwidth is computed here

arviz/arviz/stats/density_utils.py

Lines 79 to 83 in b71c83b

    
    def _bw_experimental(x, grid_counts=None, x_std=None, x_range=None):
        """Experimental bandwidth estimator."""
        bw_silverman = _bw_silverman(x, x_std=x_std)
        bw_isj = _bw_isj(x, grid_counts=grid_counts, x_range=x_range)
        return 0.5 * (bw_silverman + bw_isj)

Don't worry about the function arguments; they are passed only to avoid computing some quantities more than once.

As we can see there, the experimental bandwidth is just the average of two other bandwidths: Silverman's rule and ISJ, which stands for Improved Sheather-Jones. The reason I chose to average those two can be found in the report, but basically I ran a lot of simulations and concluded that it gave better results.
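
As a rough illustration of the averaging idea (a sketch, not ArviZ's actual code), the snippet below implements the textbook form of Silverman's rule of thumb with the standard library only. The function names are hypothetical, the choice of the population standard deviation is an assumption, and the ISJ estimate is not implemented: a made-up placeholder value stands in for it. The last helper converts a bandwidth into the scalar factor that scipy.stats.gaussian_kde accepts as bw_method, which for 1-D unweighted data multiplies the sample standard deviation (ddof=1).

```python
import statistics


def silverman_bw(data):
    """Textbook Silverman rule: 0.9 * min(std, IQR / 1.34) * n ** (-1/5)."""
    n = len(data)
    std = statistics.pstdev(data)  # assumption: population std (ddof=0)
    # "inclusive" quartiles use linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(std, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


def averaged_bw(bw_a, bw_b):
    """Mean of two bandwidth estimates, mirroring 0.5 * (bw_silverman + bw_isj)."""
    return 0.5 * (bw_a + bw_b)


def scipy_factor(bw, data):
    """Scalar bw_method for scipy.stats.gaussian_kde that yields kernel std `bw`."""
    return bw / statistics.stdev(data)  # SciPy scales the ddof=1 sample std


sample = [0.3, 0.9, 1.1, 1.4, 2.0, 2.2, 2.9, 3.5, 4.8, 6.1]
bw_isj_placeholder = 0.8  # hypothetical value; ISJ itself is not computed here
bw = averaged_bw(silverman_bw(sample), bw_isj_placeholder)
print("averaged bandwidth:", bw)
print("equivalent SciPy factor:", scipy_factor(bw, sample))
```

Passing the resulting factor as bw_method should give SciPy the same kernel width, but the curves can still differ near the edges of the data because this sketch does nothing about boundaries.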

Separately, a boundary correction is applied in the tails, because the KDE assumes the variable ranges from -infinity to +infinity, whereas we restrict it to the observed domain. That is explained in this specific notebook.
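
To see why some correction is needed, here is a minimal sketch of one common boundary-correction technique, reflection, in which the data are mirrored about the minimum and maximum so that kernel mass leaking past the edges is folded back inside. The post does not say which correction ArviZ applies, so treat this as an illustration of the general idea rather than ArviZ's implementation. All function names, the sample, and the bandwidth of 0.3 are assumptions made for the example.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def kde_plain(points, data, bw):
    """Plain Gaussian KDE evaluated at each value in `points`."""
    n = len(data)
    return [
        sum(math.exp(-0.5 * ((p - d) / bw) ** 2) for d in data) / (n * bw * SQRT_2PI)
        for p in points
    ]


def kde_reflected(points, data, bw):
    """Gaussian KDE with the data mirrored about min(data) and max(data)."""
    lo, hi = min(data), max(data)
    mirrored = list(data) + [2 * lo - d for d in data] + [2 * hi - d for d in data]
    n = len(data)  # normalize by the original sample size, not len(mirrored)
    return [
        sum(math.exp(-0.5 * ((p - d) / bw) ** 2) for d in mirrored) / (n * bw * SQRT_2PI)
        for p in points
    ]


def trapezoid(ys, xs):
    """Trapezoidal-rule integral of ys over xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1))


# Deterministic, exponential-like sample with a hard lower edge near zero.
data = [-math.log(1 - (i + 0.5) / 100) for i in range(100)]
lo, hi = min(data), max(data)
grid = [lo + (hi - lo) * i / 400 for i in range(401)]

mass_plain = trapezoid(kde_plain(grid, data, 0.3), grid)
mass_reflected = trapezoid(kde_reflected(grid, data, 0.3), grid)
print("mass inside [min, max], plain KDE:    ", mass_plain)
print("mass inside [min, max], reflected KDE:", mass_reflected)
```

Restricting a plain KDE to the observed range simply discards the probability that the kernels place outside it, which is why the truncated curve sits too low near a sharp edge; the reflected version keeps the area over the observed range close to one.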

Feel free to ask more questions if the documentation is not clear or if you are not sure about something I've said.

PS The method used to estimate the density function is still a Gaussian Kernel Density Estimator.

davidedalbosco Apr 1, 2021
Author

Thank you so much for the very detailed answer! I will read the reports (hopefully they are not too technical for my level).
