⚠️ Warning: This repository is meant for reproducing the results of the original paper and is no longer maintained. For an actively maintained implementation of FD-Trees, we refer to the PyFD package.
The goal of this repository is to discover regions of the input space with reduced feature interactions. These regions are identified as the leaves of a binary decision tree that is trained to minimize feature interactions. As a result, post-hoc explainers such as PDP and SHAP agree more closely when restricted to each region.
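The claim that PDP and SHAP agree more within regions of weak interaction can be checked numerically. The following sketch is not code from this repository: the toy model `f`, the helpers `value` and `shap_pdp_gap`, and the uniform data are all assumptions made for illustration. For a model with two features, the interventional Shapley value of x0 differs from its PDP (centered by the average prediction) by exactly half of the interaction term v({0,1}) - v({0}) - v({1}) + v({}), so the gap vanishes wherever the model is additive.

```python
import numpy as np


def f(X):
    # Hypothetical two-region piecewise-linear model: f(x) = x1 if x0 > 0 else -x1.
    # Globally x0 and x1 interact; inside each region f depends on x1 only.
    return np.where(X[:, 0] > 0, X[:, 1], -X[:, 1])


def value(X, B, S):
    # Interventional value function v(S)(x) = mean over background rows b of
    # f(x_S, b_{not S}), evaluated for every row x of X.
    n, m = len(X), len(B)
    Z = np.repeat(B[None, :, :], n, axis=0)  # shape (n, m, d): one copy of B per x
    for j in S:
        Z[:, :, j] = X[:, j][:, None]        # clamp the features in S to x
    return f(Z.reshape(n * m, -1)).reshape(n, m).mean(axis=1)


def shap_pdp_gap(X, B):
    # Mean |SHAP - centered PDP| for feature 0 of a two-feature model.
    v01, v0, v1, v_ = (value(X, B, S) for S in ([0, 1], [0], [1], []))
    shap0 = 0.5 * (v0 - v_) + 0.5 * (v01 - v1)  # exact Shapley value for d = 2
    pdp0 = v0 - v_                              # PDP of x0, centered by E_B[f]
    return np.abs(shap0 - pdp0).mean()          # = half the mean |interaction|


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
left = X[:, 0] <= 0
global_gap = shap_pdp_gap(X, X)
regional_gap = max(shap_pdp_gap(X[left], X[left]), shap_pdp_gap(X[~left], X[~left]))
print(f"global gap: {global_gap:.3f}, regional gap: {regional_gap:.3f}")
```

On this toy model the global gap is clearly positive, while it is zero inside each of the two regions x0 <= 0 and x0 > 0 (with both the explained points and the background restricted to the region), because the model restricted to either region depends on x1 only.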
To create the conda environment, run:
conda env create --file environment.yml
conda activate FDTrees
The code relies on a C++ implementation of the Interventional TreeSHAP algorithm to efficiently compute Shapley values, Shapley Taylor indices, and the H tensor from the paper. To compile the C++ code, run:
python3 setup.py build
If everything worked well, you should see a .so file in a new build directory.
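Once the extension is built, its Shapley values can be sanity-checked on very small models against a brute-force computation straight from the definition of interventional Shapley values, where features outside a coalition S are replaced by rows of a background dataset B. The function `interventional_shap` below is a hypothetical reference helper, not part of this repository or its C++ code; it needs O(d 2^d) averages over B, so it is only usable for a handful of features.

```python
import itertools
import math

import numpy as np


def interventional_shap(f, x, B):
    """Exact interventional Shapley values of model f at point x.

    f maps an (n, d) array to n predictions, x has shape (d,), and B is an
    (m, d) background dataset. Exponential in d: a reference for tiny
    models only, not a substitute for TreeSHAP.
    """
    d = len(x)

    def v(S):
        # v(S) = mean over background rows b of f(x_S, b_{not S})
        Z = B.copy()
        idx = list(S)
        if idx:
            Z[:, idx] = x[idx]
        return f(Z).mean()

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):  # coalition sizes 0 .. d-1
            w = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for S in itertools.combinations(others, k):
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi


# Usage on a linear model, where phi_i = w_i * (x_i - mean(B_i)) is known in closed form.
rng = np.random.default_rng(0)
B = rng.normal(size=(100, 3))
weights = np.array([1.0, -2.0, 0.5])


def model(Z):
    return Z @ weights


x = np.array([0.3, 1.0, -1.5])
phi = interventional_shap(model, x, B)
print(phi)
```

A linear model is a convenient test case because the exact answer is known and the values must also sum to f(x) minus the average prediction over B.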
All experiments are run from the experiments directory:
cd experiments
The scripts that start with 0_* are toy experiments meant to illustrate how FD-Trees work. These scripts can be run directly without providing arguments.
0_0_motivation.py: The first toy example in the paper (the piece-wise linear function with two regions).
0_1_illustration.py: The code to reproduce Figure 2.
0_2_interactions.py: A 2D example where we visualize interactions.
0_3_correlations.py: A simple example where we investigate correlated features.
0_4_gadget_pdp.py: A toy example to convey the intuition behind GADGET-PDP.
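The intuition behind GADGET-PDP can be sketched with a single split: choose the threshold on one feature so that, inside each child region, the centered ICE curves of another feature stay close to that region's PDP. The code below is a simplified, hypothetical illustration of that idea on an assumed two-region toy model; it is neither the content of 0_4_gadget_pdp.py nor the full recursive GADGET algorithm, and the heterogeneity loss used here (squared distance of centered ICE curves to their mean) is an assumption.

```python
import numpy as np


def f(X):
    # Hypothetical toy model with two regions: f(x) = x1 if x0 > 0 else -x1.
    return np.where(X[:, 0] > 0, X[:, 1], -X[:, 1])


def centered_ice(X, feature, grid):
    # One ICE curve per row: vary `feature` over `grid`, keep the other features fixed.
    curves = np.empty((len(X), len(grid)))
    for g, value in enumerate(grid):
        Z = X.copy()
        Z[:, feature] = value
        curves[:, g] = f(Z)
    # Center each curve so that only its shape matters, not its offset.
    return curves - curves.mean(axis=1, keepdims=True)


def heterogeneity(curves):
    # Squared distance of each centered ICE curve to their mean (the centered PDP).
    return ((curves - curves.mean(axis=0)) ** 2).sum()


def best_split(X, split_feature, effect_feature, grid):
    # Try every observed threshold on `split_feature` and keep the one whose two
    # children have the smallest total ICE heterogeneity for `effect_feature`.
    curves = centered_ice(X, effect_feature, grid)
    best_loss, best_t = np.inf, None
    for t in np.unique(X[:, split_feature])[:-1]:  # drop the max so both sides are non-empty
        left = X[:, split_feature] <= t
        loss = heterogeneity(curves[left]) + heterogeneity(curves[~left])
        if loss < best_loss:
            best_loss, best_t = loss, t
    return best_loss, best_t


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
loss, threshold = best_split(X, split_feature=0, effect_feature=1, grid=np.linspace(-1, 1, 11))
print(f"split x0 <= {threshold:.3f} (loss {loss:.2e})")
```

On this model the selected threshold is the largest observed x0 that is still <= 0, so the split separates the two linear pieces and the remaining heterogeneity is numerically zero.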
The remaining scripts are numbered 1_* (model training), 2_* (interaction detection), 3_* (computation of regional explanations), and 4_* (plotting of results).
To reproduce our results, run the following bash scripts:
Model training: ./script_train.sh
Regional explanations: ./script_explain.sh <seed>, with <seed> taking the values 0, 1, 2, 3, and 4.
Stability of the partitions: ./script_stability.sh
Finally, the results of the paper are plotted via:
4_0_plot_california.py: Plot the figures from Section D.3.4.
4_1_plot_disagreements.py: Plot Figure 3 and Table 2 showing the explanation disagreements.
4_2_partition_stability.py: Plot the figures in Section D.1 on the stability of the partitions with respect to the subsample size.
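The whole reproduction sequence can also be driven from Python, for example to print the full list of commands before launching a long run. The helpers `pipeline_commands` and `run_pipeline` are hypothetical and assume that the current directory is experiments/, that the bash scripts are executable, that the steps must run in the order listed above, and that the 4_* plotting scripts need no arguments.

```python
import subprocess
import sys

SEEDS = [0, 1, 2, 3, 4]  # the seed values listed above for script_explain.sh
PLOT_SCRIPTS = [
    "4_0_plot_california.py",
    "4_1_plot_disagreements.py",
    "4_2_partition_stability.py",
]


def pipeline_commands():
    # Assumed order: train, explain for each seed, stability, then the plots.
    commands = [["./script_train.sh"]]
    commands += [["./script_explain.sh", str(seed)] for seed in SEEDS]
    commands.append(["./script_stability.sh"])
    commands += [[sys.executable, script] for script in PLOT_SCRIPTS]
    return commands


def run_pipeline(dry_run=True):
    # With dry_run=True only print the commands. Otherwise run them relative to
    # the current directory (assumed to be experiments/) and stop at the first
    # failing step, since check=True raises CalledProcessError.
    for cmd in pipeline_commands():
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run="--run" not in sys.argv)
```

By default the script only prints the commands; passing --run executes them one after another.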