
How to extract a human-friendly explanation for the most significant features in a model (using LIME)? #156

Open
lukaszbachman opened this issue Jul 29, 2021 · 1 comment
Labels
question General question

Comments

lukaszbachman commented Jul 29, 2021

Hi,

I'm building an SVM model with a relatively large set of features (10k+). Once the model is trained well, I'd like to give my users a good explanation of what drives the model's decision-making. I found out that Tribuo supports LIME for this purpose and decided to give it a go; however, I'm finding it hard to extract meaningful information from the API. The problem I'm trying to solve can be defined as follows:

Given

  • a large set of features F
  • an SVM classifier trained to recognize classes A & B
  • a random sample (known or previously unknown to the classifier) of class A

I'd like to get information about

  • the subset of N features from F that most strongly drives the classifier to predict the sample as class A (the correct class)
  • the subset of N features from F that most strongly drives the classifier to predict the sample as class B (the incorrect class)

For example, for the textual sample:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
... where each word is a separate feature and the classifier is trained to recognize this popular dummy text, I'd like to see output like this:

  • Positive features: lorem: 1.0, ipsum: 1.0
  • Negative features: sit: -0.5, do: -0.9

Thus far I've been able to run LIME successfully using the LIMEColumnar class, but the only meaningful feature names I could extract came from the following call:

LIMEExplanation explain = lime.explain(...);
explain.getModel().getActiveFeatures(); // <- here

This call always gives me the names of the most significant features, but I don't get any weights associated with them (so I can't prioritize them), nor am I getting any feedback about the negative features.
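For reference, here's roughly what I'm working with (a sketch; if I'm reading the return type right, the map keys are output dimension names and the values are bare feature names):

import java.util.List;
import java.util.Map;

Map<String, List<String>> activeFeatures = explain.getModel().getActiveFeatures();
// Feature names only - no weights to rank by, and no sign to separate
// positive contributions from negative ones.
activeFeatures.forEach((dimension, names) -> System.out.println(dimension + " -> " + names));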

Is your question about a specific Tribuo class?
LIMEExplanation, LIMEColumnar

System details

  • Tribuo version: 4.1.0

Craigacp commented Jul 29, 2021

You want explain.getModel().getTopFeatures(int n), which returns a Map<String,List<Pair<String,Double>>> where the map key is the class label, the String in the pair is the feature name, and the Double is the importance. That should give you the per-class information you want. If you set n to -1 you'll get all the features back in a list ranked by the absolute value of their importance.
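Something like this minimal sketch (assuming "A" is one of your class labels and reusing the explain variable from your snippet; Pair here is com.oracle.labs.mlrg.olcut.util.Pair):

import java.util.List;
import java.util.Map;
import com.oracle.labs.mlrg.olcut.util.Pair;

// n = -1 returns every feature, ranked by absolute importance.
Map<String, List<Pair<String, Double>>> topFeatures = explain.getModel().getTopFeatures(-1);
for (Pair<String, Double> feature : topFeatures.get("A")) {
    String name = feature.getA();
    double weight = feature.getB();
    // Positive weights push the prediction towards "A", negative ones away from it.
    System.out.println((weight >= 0 ? "positive " : "negative ") + name + ": " + weight);
}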

While running through this to check, I noticed that LIMEExplanation.getActiveFeatures() doesn't work if the model doesn't return a per-class list of active features, which is true of CARTJointRegressionTree, so we'll get a fix in for that soon. As LIME is supposed to explain the classes individually, CARTJointRegressionTree probably won't do what you want, since it doesn't have per-class feature importances, but it's a valid sparse model so it shouldn't cause an NPE.
