SAFE-QAQ is an end-to-end framework for audio-text fraud detection that leverages reinforcement learning to enable slow-thinking decision-making. Below are instructions for setting up the environment, training the model, and running experiments.
- [2026.01] SAFE-QAQ has been accepted by ACL 2026.
This repository contains the source code for SAFE-QAQ, which consists of three main stages:
- Rule-Based Reinforcement Learning (Stage 1): Train a rule-based RL model.
- Rejection Sampling Fine-Tuning (RSFT) and Length-Constrained Reinforcement Learning (LCRL) (Stage 2): Refine the model using rejection sampling and LCRL techniques.
- Real-Time Fine-Tuning (Stage 3): Fine-tune the model for real-time inference.
The prompts for both real-time inference and training are defined in `prompt.py`.
- Paper (arXiv)
- TeleAntiFraud public dataset on Hugging Face
- TeleAntiFraud public dataset on ModelScope
- TeleAntiFraud main repository
SAFE-QAQ uses the TeleAntiFraud audio-text fraud detection dataset. Dataset downloads, benchmark resources, and evaluation utilities are available in the TeleAntiFraud repository linked above.
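If you prefer to fetch the dataset from the command line, a minimal sketch using the Hugging Face CLI is shown below. Note that `<org>/TeleAntiFraud-28k` is a placeholder repository id, not the real one — substitute the id from the Hugging Face dataset page linked above:

```shell
# Minimal download sketch (assumes the `huggingface_hub` package is
# installed, which provides the `huggingface-cli` command).
# NOTE: <org>/TeleAntiFraud-28k is a PLACEHOLDER repository id;
# replace it with the id from the Hugging Face link above.
huggingface-cli download <org>/TeleAntiFraud-28k \
  --repo-type dataset \
  --local-dir data/TeleAntiFraud
```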
To set up the environment, follow the instructions provided in the ms-swift repository.
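For a quick start, a minimal setup sketch is shown below; the environment name and Python version are assumptions, and the ms-swift documentation remains the authoritative reference:

```shell
# Environment setup sketch (conda-based workflow assumed;
# the env name "safe-qaq" and Python 3.10 are illustrative choices).
conda create -n safe-qaq python=3.10 -y
conda activate safe-qaq

# ms-swift is the ModelScope Swift training framework on PyPI.
pip install ms-swift
```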
Train the initial rule-based RL model with:

```bash
bash run_swift_grpo_stage1.sh
```
- **Rejection Sampling**: Generate samples with:

  ```bash
  bash sample.sh
  ```

  Then process the sampled data with:

  ```bash
  bash process_samples.sh
  ```

- **Fine-Tuning with RSFT**: Fine-tune the model on the processed data:

  ```bash
  bash run_swift_sft_stage2_RSFT.sh
  ```

- **Length-Constrained Reinforcement Learning (LCRL)**: Further refine the model with LCRL:

  ```bash
  bash run_swift_grpo_stage2_LCRL.sh
  ```
Run real-time fine-tuning with:

```bash
bash run_swift_grpo_stage3.sh
```

- The `prompt.py` file contains the definitions of the prompts used during training and real-time inference.
- Ensure all dependencies are installed as per the ms-swift documentation before running the scripts.
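The stage scripts above can be chained into a single driver. The script names come from this README; the driver itself (and its existence checks) is a hypothetical convenience sketch, not part of the repository:

```shell
#!/usr/bin/env bash
# Hypothetical driver that runs the SAFE-QAQ stages in order.
# Stage script names are taken from this README; this wrapper is a sketch.
set -euo pipefail

STAGES=(
  run_swift_grpo_stage1.sh       # Stage 1: rule-based RL
  sample.sh                      # Stage 2: rejection sampling
  process_samples.sh             # Stage 2: process sampled data
  run_swift_sft_stage2_RSFT.sh   # Stage 2: RSFT
  run_swift_grpo_stage2_LCRL.sh  # Stage 2: LCRL
  run_swift_grpo_stage3.sh       # Stage 3: real-time fine-tuning
)

for stage in "${STAGES[@]}"; do
  if [ -f "$stage" ]; then
    echo "Running $stage"
    bash "$stage"
  else
    echo "Skipping $stage (not found in current directory)"
  fi
done
```

Run it from the repository root so the stage scripts are on the current path.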
```bibtex
@inproceedings{ma2025teleantifraud,
  title={TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection},
  author={Ma, Zhiming and Wang, Peidong and Huang, Minhua and Wang, Jinpeng and Wu, Kai and Lv, Xiangzhao and Pang, Yachun and Yang, Yin and Tang, Wenjie and Kang, Yuchen},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  pages={5853--5862},
  year={2025}
}
```

```bibtex
@article{wang2026safe,
  title={SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning},
  author={Wang, Peidong and Ma, Zhiming and Dai, Xin and Liu, Yongkang and Feng, Shi and Yang, Xiaocui and Hu, Wenxing and Wang, Zhihao and Pan, Mingjun and Yuan, Li and others},
  journal={arXiv preprint arXiv:2601.01392},
  year={2026}
}
```