
How to extract a human-friendly explanation for the most significant features in a model (using LIME)? #156

Open
lukaszbachman opened this issue Jul 29, 2021 · 1 comment
Labels
question General question

Comments

lukaszbachman commented Jul 29, 2021

Hi,

I'm building an SVM model with a relatively large set of features (10k+). Once the model is trained well, I'd like to give my users a good explanation of what drives the model's decision-making. I found out that Tribuo supports LIME for this purpose and decided to give it a go; however, I'm finding it hard to extract meaningful information from the API. The problem I'm trying to solve can be defined as follows:

Given

  • a large set of features F
  • an SVM classifier trained to recognize classes A & B
  • a random sample (known or previously unknown to the classifier) of class A

I'd like to get information about

  • the subset of N features from F that most strongly drives the classifier to predict the sample as class A (the correct class)
  • the subset of N features from F that most strongly drives the classifier to predict the sample as class B (the incorrect class)

For example, for the textual sample:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
... where each word is a separate feature and the classifier is trained to recognize this popular dummy text, I'd like to see output like this:

  • Positive features: lorem: 1.0, ipsum: 1.0
  • Negative features: sit: -0.5, do: -0.9

Thus far I've been able to run LIME successfully using the LIMEColumnar class, but the only meaningful feature names I could extract came from the following call:

LIMEExplanation explain = lime.explain(...);
explain.getModel().getActiveFeatures(); // <- here

This call always gives me the names of the most significant features, but I don't get any weights associated with them (so I can't prioritize them), nor am I getting any feedback about the negative features.
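For reference, here's roughly what I'm working with (a sketch; if I'm reading the return type right, the map keys are output dimension names and the values are bare feature names):

import java.util.List;
import java.util.Map;

Map<String, List<String>> activeFeatures = explain.getModel().getActiveFeatures();
// Feature names only - no weights to rank by, and no sign to separate
// positive contributions from negative ones.
activeFeatures.forEach((dimension, names) -> System.out.println(dimension + " -> " + names));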

Is your question about a specific Tribuo class?
LIMEExplanation, LIMEColumnar

System details

  • Tribuo version: 4.1.0

Craigacp commented Jul 29, 2021

You want explain.getModel().getTopFeatures(int n), which returns a Map<String,List<Pair<String,Double>>> where the map key is the class label, the String in the pair is the feature name, and the Double is the importance. That should give you the per-class information you want. If you set n to -1 you'll get all the features back in a list ranked by the absolute value of their importance.
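Something like this minimal sketch (assuming "A" is one of your class labels and reusing the explain variable from your snippet; Pair here is com.oracle.labs.mlrg.olcut.util.Pair):

import java.util.List;
import java.util.Map;
import com.oracle.labs.mlrg.olcut.util.Pair;

// n = -1 returns every feature, ranked by absolute importance.
Map<String, List<Pair<String, Double>>> topFeatures = explain.getModel().getTopFeatures(-1);
for (Pair<String, Double> feature : topFeatures.get("A")) {
    String name = feature.getA();
    double weight = feature.getB();
    // Positive weights push the prediction towards "A", negative ones away from it.
    System.out.println((weight >= 0 ? "positive " : "negative ") + name + ": " + weight);
}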

While running through this to check, I noticed that LIMEExplanation.getActiveFeatures() doesn't work if the model doesn't return a per-class list of active features, which is true of CARTJointRegressionTree, so we'll get a fix in for that soon. As LIME is supposed to explain the classes individually, CARTJointRegressionTree probably won't do what you want, since it doesn't have per-class feature importances, but it's a valid sparse model so it shouldn't cause an NPE.
