Integrated Gradients

The success of Deep Neural Network (DNN) models across a variety of domains has created considerable excitement in the machine learning community. Although DNNs are increasingly deployed in real-world settings, explaining their behavior remains difficult. Explainability is crucial for many reasons, and several notions of what constitutes an explanation have been proposed. One specific type of explanation is attribution: assigning a deep network's prediction to its input features, which can offer great insight into the network's behavior. Attribution methods also help users understand networks better, debug them, extract rules from them, and engage with them more effectively. However, attribution methods are difficult to assess empirically, so an axiomatic framework is used to overcome this problem. Integrated Gradients (IG), introduced in [1], is shown to satisfy two important axioms for attribution methods: Sensitivity and Implementation Invariance. IG has become a popular interpretability technique because it applies to any differentiable model (e.g. images, text, structured data), is easy to implement, has theoretical justification, and is computationally efficient relative to alternative approaches, which allows it to scale to large networks and feature spaces such as images.
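For a feature i, IG assigns the attribution IG_i(x) = (x_i − x'_i) · ∫₀¹ ∂F(x' + α(x − x'))/∂x_i dα, where F is the network's output for the target class and x' is the baseline input. In practice the integral is approximated by a Riemann or trapezoidal sum over a small number of interpolation steps. Below is a minimal TensorFlow sketch of that approximation; the function name, the assumption that the model maps a batch of images to class logits, and the default of 50 steps are illustrative choices rather than this repository's exact implementation.

```python
import tensorflow as tf

def integrated_gradients(model, x, baseline, target_class, steps=50):
    """Approximate IG attributions for a single image `x` of shape (H, W, C)
    by integrating gradients along the straight line from `baseline` to `x`."""
    x = tf.cast(x, tf.float32)
    baseline = tf.cast(baseline, tf.float32)

    # Interpolation coefficients alpha in [0, 1], broadcast over H x W x C.
    alphas = tf.reshape(tf.linspace(0.0, 1.0, steps + 1), (-1, 1, 1, 1))
    interpolated = baseline + alphas * (x - baseline)    # (steps+1, H, W, C)

    # Gradient of the target-class score w.r.t. each interpolated image.
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        logits = model(interpolated)                     # (steps+1, num_classes)
        target = logits[:, target_class]
    grads = tape.gradient(target, interpolated)          # (steps+1, H, W, C)

    # Trapezoidal approximation of the path integral, scaled by (x - baseline).
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return (x - baseline) * avg_grads                    # (H, W, C)
```

A useful sanity check when choosing the number of steps is completeness: the attributions should sum approximately to F(x) − F(x').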

This project studied the IG attribution method and identified the fundamental axioms it satisfies. Another crucial goal was to understand the design of the method and implement it on an image task using a pretrained network and dataset [2]. Because the image classification network is differentiable, the integrated gradients method could be applied directly. The results were analyzed and compared against plain input gradients, an older attribution method, to demonstrate the advantages of the axiomatic approach. An important use of attribution methods is debugging model performance. To highlight this capability, IG was used to examine a known limitation of convolutional neural networks such as Inception V1: they are not naturally rotation- or scale-invariant. IG was applied to a misclassified zoomed-in image to gain feature-level insight into why the model made the error. Finally, a case study visualized the effect of an important hyperparameter of the IG method, the baseline [3]. The project investigated black, white, uniform, blurred, Gaussian, and maximum-distance baselines to determine how sensitive the method is to this choice.
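To make the baseline comparison concrete, the sketch below constructs the candidate baselines studied here, assuming images scaled to [0, 1] with shape (H, W, C); the helper name make_baselines, the blur sigma, and the noise scale are hypothetical values chosen for illustration, not the repository's exact settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_baselines(x, seed=0):
    """Candidate baselines for an image `x` in [0, 1] with shape (H, W, C)."""
    rng = np.random.default_rng(seed)
    return {
        "black":    np.zeros_like(x),                     # all-zero image
        "white":    np.ones_like(x),                      # all-one image
        "uniform":  rng.uniform(0.0, 1.0, size=x.shape),  # uniform random noise
        # Blur the input spatially only (sigma 0 on the channel axis).
        "blurred":  gaussian_filter(x, sigma=(5, 5, 0)),
        # Input perturbed with Gaussian noise, clipped back to the valid range.
        "gaussian": np.clip(x + rng.normal(0.0, 0.2, size=x.shape), 0.0, 1.0),
        # Maximum distance: push each pixel to the far end of its valid range.
        "max_dist": np.where(x > 0.5, 0.0, 1.0),
    }
```

Each baseline can then be passed, one at a time, to an IG routine such as the sketch above, and the resulting attribution maps compared side by side to judge how strongly the explanation depends on the choice of baseline.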

References

[1] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” arXiv preprint arXiv:1703.01365, 2017.

[2] P. K. Mudrakarta, A. Taly, M. Sundararajan, and K. Dhamdhere, “Did the model understand the question?,” arXiv preprint arXiv:1805.05492, 2018.

[3] P. Sturmfels, S. Lundberg, and S.-I. Lee, “Visualizing the impact of feature attribution baselines,” Distill, vol. 5, no. 1, p. e22, 2020.
