GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
Zhenghao He, Sanchit Sinha, Guangzhi Xiong, Aidong Zhang
University of Virginia
ICCV 2025 (Accepted)
Global Concept Activation Vector (GCAV) is a novel framework designed to improve the interpretability of deep neural networks by addressing the fundamental limitations of traditional Testing with Concept Activation Vectors (TCAV).
Traditional TCAV faces several critical issues:
- Unstable layer selection: The same concept may have varying importance across layers
- Spurious activations: Certain layers assign high TCAV scores to irrelevant concepts
- High variance in TCAV scores: Inconsistencies in concept representation across layers
GCAV unifies CAVs into a single, semantically consistent representation through:
- Contrastive learning to align concept representations across layers
- Attention-based fusion to construct a globally integrated CAV
- Decoder-based projection to enable TCAV evaluation on GCAV representations (TGCAV); see the sketch after this list
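For intuition, here is a minimal PyTorch sketch of the decoder-based projection step: the global CAV is decoded back into each layer's activation space so that the standard TCAV test can be run there. The class name, network shape, dimensions, and layer names are illustrative assumptions, not the exact code in `src/IntegrateCAV.py`.

```python
import torch
import torch.nn as nn

class LayerDecoder(nn.Module):
    """Maps the global CAV back into one layer's activation space (illustrative)."""
    def __init__(self, embed_dim: int, layer_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, layer_dim),
            nn.ReLU(),
            nn.Linear(layer_dim, layer_dim),
        )

    def forward(self, global_cav: torch.Tensor) -> torch.Tensor:
        return self.net(global_cav)

# One decoder per bottleneck layer; TGCAV then scores each layer with the
# projected vector instead of an independently trained per-layer CAV.
decoders = {"mixed4a": LayerDecoder(2048, 512), "mixed5a": LayerDecoder(2048, 832)}
global_cav = torch.randn(2048)  # output of the fusion stage
projected_cavs = {name: dec(global_cav) for name, dec in decoders.items()}
```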
Key features:

- Cross-Layer Concept Consistency: Eliminates inconsistencies in TCAV scores across layers
- Improved Concept Localization: Enhances interpretability by refining concept attributions
- Adversarial Robustness: Provides more stable explanations under adversarial perturbations
- Architecture Agnostic: Works with multiple deep architectures (ResNet, GoogleNet, MobileNet)
Our framework consists of three sequential training stages (a minimal sketch of the Stage 2 objective follows this list):

- Layer-wise Autoencoder Training: Compress high-dimensional CAVs into a unified embedding space
- Cross-Layer Alignment: Use contrastive learning with InfoNCE and a consistency loss
- Global CAV Fusion: Attention-based fusion with variance and consistency optimization
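As a concrete illustration of Stage 2, here is a minimal sketch of an InfoNCE alignment loss, assuming that embeddings of the same concept taken from two different layers form positive pairs and the other concepts in the batch serve as negatives. The function name and temperature are our assumptions, not the exact implementation in `src/integrate_main.py`.

```python
import torch
import torch.nn.functional as F

def infonce_alignment(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.1):
    """z_a, z_b: (num_concepts, embed_dim) CAV embeddings of the same concepts
    taken from two layers; row i of each tensor is a positive pair."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau         # cosine similarity matrix
    targets = torch.arange(z_a.size(0))  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: 3 concepts, 2048-d embeddings from two layers.
loss = infonce_alignment(torch.randn(3, 2048), torch.randn(3, 2048))
```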
```bash
# Clone the repository
git clone https://github.com/your-username/CAVFusion.git
cd CAVFusion

# Create conda environment
conda create --name gcav python=3.9
conda activate gcav

# Install dependencies
pip install -r requirements.txt
```
Key dependencies (see `requirements.txt`):

```
torch>=2.1.0
tensorflow>=2.18.0
tcav>=0.2.2
scikit-learn>=1.6.0
matplotlib>=3.9.0
numpy>=1.26.4
h5py>=3.12.1
```
Download the required datasets and pre-trained models:
```bash
# Download Broden dataset for concept images
python src/download_data.py

# Download pre-trained models (GoogleNet, ResNet50V2, MobileNetV2)
python src/load_models.py
```
Edit `src/configs.py` to specify your experimental setup:
```python
# Target class and concepts
target = 'zebra'
concepts = ["dotted", "striped", "zigzagged"]

# Model selection
model_to_run = 'GoogleNet'  # or 'ResNet50V2', 'MobileNetV2'

# Training parameters
embed_dim = 2048
hidden_dims = [4096]
nce_loss = 1
con_loss = 3
var_loss = 3
sim_loss = 1
```
```bash
# Generate original per-layer CAVs
python src/get_cavs_main.py

# Stage 1: Train autoencoders for dimension alignment
python src/align_dim.py

# Stage 2 & 3: Train alignment and fusion models
python src/integrate_main.py

# Generate TGCAV scores and visualizations
python src/evaluate.py

# Plot comparative results
python src/plot_result.py
```
Our experiments on GoogleNet, ResNet50V2, and MobileNetV2 demonstrate significant improvements. The following table shows detailed results for different target classes and concepts:
| Target | Concept | Method | GoogleNet (Mean / Std / CV / RR) | ResNet50V2 (Mean / Std / CV / RR) | MobileNetV2 (Mean / Std / CV / RR) |
|---|---|---|---|---|---|
| Spider Web | cobwebbed | TCAV | 0.808 / 0.140 / 0.173 / 0.508 | 0.644 / 0.127 / 0.197 / 0.715 | 0.619 / 0.169 / 0.273 / 0.985 |
| | | TGCAV | 0.840 / 0.109 / 0.130 / 0.381 | 0.784 / 0.083 / 0.105 / 0.357 | 0.554 / 0.091 / 0.164 / 0.686 |
| | porous | TCAV | 0.499 / 0.123 / 0.246 / 0.862 | 0.586 / 0.098 / 0.167 / 0.717 | 0.377 / 0.059 / 0.157 / 0.584 |
| | | TGCAV | 0.498 / 0.037 / 0.074 / 0.281 | 0.640 / 0.030 / 0.047 / 0.156 | 0.296 / 0.032 / 0.108 / 0.338 |
| | blotchy | TCAV | 0.459 / 0.210 / 0.457 / 1.460 | 0.574 / 0.107 / 0.186 / 0.871 | 0.438 / 0.133 / 0.304 / 1.005 |
| | | TGCAV | 0.393 / 0.057 / 0.144 / 0.508 | 0.618 / 0.026 / 0.042 / 0.129 | 0.386 / 0.043 / 0.111 / 0.466 |
| Zebra | dotted | TCAV | 0.463 / 0.218 / 0.470 / 1.576 | 0.470 / 0.063 / 0.135 / 0.426 | 0.484 / 0.130 / 0.268 / 0.950 |
| | | TGCAV | 0.373 / 0.019 / 0.051 / 0.161 | 0.479 / 0.046 / 0.096 / 0.376 | 0.408 / 0.021 / 0.051 / 0.196 |
| | striped | TCAV | 0.783 / 0.185 / 0.236 / 0.664 | 0.593 / 0.073 / 0.123 / 0.421 | 0.601 / 0.145 / 0.241 / 0.832 |
| | | TGCAV | 0.699 / 0.042 / 0.060 / 0.200 | 0.597 / 0.050 / 0.084 / 0.318 | 0.600 / 0.017 / 0.029 / 0.083 |
| | zigzagged | TCAV | 0.647 / 0.232 / 0.359 / 0.974 | 0.578 / 0.051 / 0.088 / 0.277 | 0.618 / 0.095 / 0.153 / 0.615 |
| | | TGCAV | 0.503 / 0.027 / 0.055 / 0.159 | 0.597 / 0.049 / 0.083 / 0.268 | 0.628 / 0.031 / 0.049 / 0.175 |
Metrics Explanation:
- Mean: Average TCAV/TGCAV scores across layers
- Std: Standard deviation (lower = more stable)
- CV: Coefficient of variation (lower = more consistent)
- RR: Range ratio (lower = less fluctuation across layers)
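To make the metrics concrete, here is a small NumPy sketch of how they can be computed from one concept's layer-wise scores. We assume RR = (max - min) / mean; the paper's exact normalization may differ.

```python
import numpy as np

def stability_metrics(scores):
    """scores: TCAV/TGCAV scores of one concept across layers."""
    s = np.asarray(scores, dtype=float)
    mean = s.mean()
    std = s.std()                    # population std; the paper may use ddof=1
    cv = std / mean                  # coefficient of variation
    rr = (s.max() - s.min()) / mean  # range ratio (assumed definition)
    return {"Mean": mean, "Std": std, "CV": cv, "RR": rr}

# Example: layer-wise scores for one concept on one model.
print(stability_metrics([0.81, 0.62, 0.75, 0.58, 0.70]))
```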
Key findings:

- Reduced Variance: TGCAV significantly reduces the variance of concept activation scores across layers
- Adversarial Robustness: 40-60% more stable under adversarial attacks compared to traditional TCAV
- Better Localization: More precise concept attribution with reduced spurious activations
We evaluated the robustness of GCAV against adversarial attacks using Brown's method, targeting the "dotted" concept in the "zebra" class at the GoogleNet mixed5a layer:
| Method | Mixed5a | Mixed5a Change (%) | Mean | Mean Change (%) |
|---|---|---|---|---|
| TCAV | 0.52 | -- | 0.46 | -- |
| TCAV (Attacked) | 0.96 | +84.61% | 0.75 | +61.77% |
| TGCAV | 0.38 | -- | 0.37 | -- |
| TGCAV (Attacked) | 0.49 | +28.49% | 0.53 | +42.63% |
Key Robustness Insights:
- Traditional TCAV: Vulnerable to adversarial manipulation, with up to an 84.61% increase in the targeted layer's score
- Our TGCAV: More resilient, with only a 28.49% increase under the same attack
- Global Integration: By fusing information across layers, GCAV reduces the impact of single-layer perturbations
The following figure shows TCAV score comparisons across layers on GoogleNet before (left) and after (right) applying our method:
Key Observations:
- Original TCAV (Left): High variance across layers, spurious activations in irrelevant concepts
- Our TGCAV (Right): Consistent scores across layers, reduced spurious activations
- Cross-layer stability: TGCAV eliminates layer selection ambiguity
Our GCAV framework provides improved concept localization through Visual-TCAV concept mapping. The visualization shows three representative concepts, with the original method (top row) vs. our method (bottom row):

*(Figure: concept localization maps for Striped (Zebra), Cobwebbed (Spider Web), and Honeycombed (Honeycomb); original method on the top row, ours on the bottom.)*
Concept Localization Benefits:
- Precise Focus: Our method focuses more precisely on characteristic patterns while reducing distractions from irrelevant regions
- Reduced Background Noise: Less activation in background regions (sky, surrounding environment) compared to original TCAV
- Semantic Consistency: Better alignment with expected concept locations: stripes on the zebra body, web structures, honeycomb patterns
- Enhanced Interpretability: More concentrated activations on defining structural patterns, eliminating unnecessary activations in non-characteristic areas
```
CAVFusion/
├── src/                      # Main source code
│   ├── configs.py            # Configuration settings
│   ├── get_cavs_main.py      # Generate original CAVs
│   ├── get_cavs.py           # CAV generation utilities
│   ├── align_dim.py          # Autoencoder training
│   ├── IntegrateCAV.py       # Core GCAV framework
│   ├── integrate_main.py     # Training pipeline
│   ├── evaluate.py           # Evaluation scripts
│   ├── plot_result.py        # Visualization tools
│   ├── attack.py             # Adversarial attack experiments
│   ├── concept_map.py        # Concept mapping utilities
│   ├── reconstruct.py        # CAV reconstruction
│   ├── load_models.py        # Model loading utilities
│   ├── download_data.py      # Data download script
│   ├── download_img.py       # Image download utilities
│   ├── select_img.py         # Image selection tools
│   ├── extract_500_birds.py  # Bird dataset extraction
│   ├── createRandom.py       # Random dataset generation
│   └── concate.py            # Data concatenation utilities
├── imgs/                     # Images for README
│   ├── framework.png         # Framework architecture
│   ├── tcav_*.png            # Original TCAV results
│   └── tgcav_*.png           # Our TGCAV results
├── requirements.txt          # Dependencies
├── .gitignore                # Git ignore rules
├── LICENSE                   # MIT License
└── README.md                 # This file
```
Model selection (`src/configs.py`):

```python
model_to_run = 'GoogleNet'    # GoogleNet (Inception)
model_to_run = 'ResNet50V2'   # ResNet50V2
model_to_run = 'MobileNetV2'  # MobileNetV2
```

Fusion input options:

```python
fuse_input = "aligned_cavs"  # Use aligned CAVs (recommended)
fuse_input = "input_cavs"    # Use original CAVs
fuse_input = "none_fuse"     # No fusion baseline
```
Loss weights:

```python
# Alignment stage loss weights
nce_loss = 1  # InfoNCE loss weight
con_loss = 3  # Consistency loss weight

# Fusion stage loss weights
var_loss = 3  # Variance loss weight
sim_loss = 1  # Similarity loss weight
```
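Our reading of these weights, as a hedged sketch: each stage presumably minimizes a weighted sum of its loss terms. The actual combination lives in `src/IntegrateCAV.py` and may differ.

```python
def alignment_objective(l_nce, l_con, nce_loss=1.0, con_loss=3.0):
    """Stage 2 objective: weighted InfoNCE + consistency terms (assumed form)."""
    return nce_loss * l_nce + con_loss * l_con

def fusion_objective(l_var, l_sim, var_loss=3.0, sim_loss=1.0):
    """Stage 3 objective: weighted variance + similarity terms (assumed form)."""
    return var_loss * l_var + sim_loss * l_sim
```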
To use your own concepts:

- Prepare concept images in `data/concepts/your_concept/`
- Update `src/configs.py`:

  ```python
  target = 'your_target_class'
  concepts = ["your_concept1", "your_concept2"]
  ```

- Run the full pipeline as described in Quick Start (see the example below)
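For example, wiring in a hypothetical "spotted" concept; only the `data/concepts/<name>/` layout comes from the steps above, and the source image path is made up:

```python
import shutil
from pathlib import Path

# Assumption from the steps above: concept images live in data/concepts/<name>/.
concept_dir = Path("data/concepts/spotted")
concept_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical source folder; replace with wherever your images actually are.
for img in Path("/path/to/spotted_images").glob("*.jpg"):
    shutil.copy(img, concept_dir)
```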
Our framework supports any CNN architecture. To add a new model:

- Implement a model wrapper in `tcav/model.py`
- Define the bottleneck layers in `src/configs.py` (see the sketch below)
- Update the model loading logic in `src/load_models.py`
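A hypothetical `src/configs.py` addition for a new backbone. The `model_to_run` variable follows the snippets above, but the `bottlenecks` list and the EfficientNet layer names are illustrative assumptions about the config format.

```python
# Hypothetical new-model entry in src/configs.py (illustrative only).
model_to_run = 'EfficientNetB0'
bottlenecks = [  # layers at which per-layer CAVs would be extracted
    'block3a_expand_activation',
    'block5a_expand_activation',
    'top_activation',
]
```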
Evaluation metrics:

- Coefficient of Variation (CV): Measures relative variability
- Standard Deviation (Std): Absolute measure of dispersion
- Range Ratio (RR): Normalized range of activation scores
- Adversarial Stability: Performance under adversarial attacks
- Cross-Layer Consistency: Correlation between layer-wise activations
- Concept Localization: Precision of concept attribution
If you find this work useful, please cite our paper:
```bibtex
@inproceedings{he2025gcav,
  title={GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability},
  author={He, Zhenghao and Sinha, Sanchit and Xiong, Guangzhi and Zhang, Aidong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}
```
Contact:

- Zhenghao He: [email protected]
- Institution: University of Virginia

Acknowledgments:

- Built upon the TCAV framework by Kim et al.
- Uses the Broden dataset for concept visualization
- Supported by the University of Virginia Computer Science Department
This project is licensed under the MIT License - see the LICENSE file for details.
⭐ Star this repository if you find it helpful!