## Adaptive Classification with LLMs
### Strategic Classification Evaluation
We evaluated the strategic classification feature using the [AI-Secure/adv_glue](https://huggingface.co/datasets/AI-Secure/adv_glue) dataset's `adv_sst2` subset, which contains adversarially modified sentiment analysis examples designed to test robustness against strategic manipulation.
#### Testing Setup
- **Dataset**: 148 adversarial text samples (70% train / 30% test)
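The split above can be sketched as follows (a minimal illustration with placeholder data; the exact shuffling and split convention used in the evaluation may differ):

```python
import random

random.seed(0)
# Placeholder stand-ins for the 148 adversarial text samples.
samples = [f"adv_sample_{i}" for i in range(148)]
random.shuffle(samples)

split = int(len(samples) * 0.70)  # 70% train / 30% test
train, test = samples[:split], samples[split:]
print(len(train), len(test))  # 103 45
```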
**Strategic Training Success**: The strategic classifier demonstrates robust performance across both clean and manipulated data, maintaining 82.22% accuracy regardless of input manipulation.
**Dual Benefit**: Unlike traditional adversarial defenses that sacrifice clean performance for robustness, our strategic classifier achieves:
- **2.22% improvement** on clean data
- **22.22% improvement** on manipulated data
- **Perfect robustness** (no performance degradation under attack)
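These deltas follow from the baseline accuracies implied by the reported numbers (80.00% clean, 60.00% manipulated — a hypothetical reconstruction for illustration, not figures taken from the evaluation itself):

```python
# Accuracies implied by the reported results (hypothetical reconstruction).
baseline  = {"clean": 0.8000, "manipulated": 0.6000}
strategic = {"clean": 0.8222, "manipulated": 0.8222}

for condition in ("clean", "manipulated"):
    delta = strategic[condition] - baseline[condition]
    print(f"{condition}: {delta:+.2%}")

# "Perfect robustness" = no accuracy drop when inputs are manipulated.
degradation = strategic["clean"] - strategic["manipulated"]
print(f"degradation under attack: {degradation:.2%}")
```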
**Practical Impact**: The 30.34% F1-score improvement on manipulated data demonstrates significant real-world value for applications facing adversarial inputs.
**Use Cases**: Ideal for production systems that must perform consistently under adversarial conditions, such as content moderation, spam detection, fraud prevention, and other security-critical applications where gaming attempts are common.
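As a toy illustration of the underlying idea (not the library's actual mechanism): a naive threshold classifier can be gamed by an adversary who shifts a score within some manipulation budget, while a classifier that anticipates that budget resists. All names and numbers here are hypothetical:

```python
# Toy strategic classification: agents can shift a score feature by up to
# `budget` to game a threshold classifier (e.g., padding spam with benign
# keywords). A strategic classifier anticipates this by raising its
# decision threshold by the budget.
budget = 0.2                  # maximum manipulation an adversary can afford
naive_threshold = 0.5
strategic_threshold = naive_threshold + budget  # anticipate gaming

def naive(score):     return score >= naive_threshold
def strategic(score): return score >= strategic_threshold

true_score = 0.35                  # genuinely below the bar
gamed_score = true_score + budget  # adversary manipulates the score upward

print(naive(gamed_score))      # True  -- naive classifier is fooled
print(strategic(gamed_score))  # False -- strategic classifier resists
```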
### Hallucination Detector
The adaptive classifier can detect hallucinations in language model outputs, especially in Retrieval-Augmented Generation (RAG) scenarios. Despite incorporating external knowledge sources, LLMs often still generate content that isn't supported by the provided context. Our hallucination detector identifies when a model's output contains information that goes beyond what's present in the source material.
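As a rough sketch of the idea (a simple lexical-overlap heuristic, not the adaptive classifier's actual detection method): flag answer text whose content words are largely absent from the retrieved context. All function names and thresholds below are illustrative:

```python
import re

def content_words(text):
    # Keep alphabetic tokens, drop a few common stopwords.
    words = re.findall(r"[a-z']+", text.lower())
    stop = {"the", "a", "an", "is", "are", "was", "in", "of", "to", "and"}
    return {w for w in words if w not in stop}

def unsupported_ratio(answer, context):
    # Fraction of the answer's content words not found in the context.
    ans, ctx = content_words(answer), content_words(context)
    if not ans:
        return 0.0
    return len(ans - ctx) / len(ans)

context = "The Eiffel Tower is in Paris and was completed in 1889."
grounded = "The Eiffel Tower was completed in 1889."
hallucinated = "The Eiffel Tower was designed by Leonardo da Vinci."

print(unsupported_ratio(grounded, context) < 0.5)       # True (supported)
print(unsupported_ratio(hallucinated, context) >= 0.5)  # True (flagged)
```

A real detector would use learned representations rather than raw word overlap, but the grounding question it answers is the same: does the output stay within what the source material supports?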