This is the reference code for the main process of SampleLLM, accepted at the WWW'25 Industry Track. The code serves only as a process reference and cannot be run directly.
- data_generation is for the first stage (generating LLM-based samples)
- sampling is for the second stage (feature attribution-based importance sampling)
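
The second stage is described here only by name, so the following is a rough illustration of generic importance sampling and not the SampleLLM procedure itself. It assumes each generated sample already has a non-negative importance score; how such scores would be derived from feature attribution is not shown. The function name `importance_sample` and the example scores are hypothetical.

```python
import random


def importance_sample(scores, k, seed=0):
    """Pick k distinct indices, favoring higher scores (hypothetical helper).

    Uses the Efraimidis-Spirakis scheme for weighted sampling without
    replacement: each item gets the key u ** (1 / score) with u uniform
    in [0, 1), and the k items with the largest keys are kept.
    """
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")

    # Items with a zero score can never be selected.
    candidates = [(i, s) for i, s in enumerate(scores) if s > 0]
    if k > len(candidates):
        raise ValueError("k exceeds the number of samples with a positive score")

    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / s), i) for i, s in candidates]
    keyed.sort(reverse=True)
    return [i for _, i in keyed[:k]]


# Assumed scores for five generated samples; two of them have zero importance.
selected = importance_sample([0.0, 1.0, 3.0, 0.0, 2.0], k=2)
```

In this sketch, `selected` contains two distinct indices drawn from the samples with positive scores, and a fixed `seed` makes the draw repeatable.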
If you find our work insightful and want to use the code or cite our paper, please add the following citation to your references:
@inproceedings{gao2025samplellm,
title={SampleLLM: Optimizing Tabular Data Synthesis in Recommendations},
author={Gao, Jingtong and Du, Zhaocheng and Li, Xiaopeng and Wang, Yichao and Li, Xiangyang and Guo, Huifeng and Tang, Ruiming and Zhao, Xiangyu},
booktitle={Companion Proceedings of the ACM on Web Conference 2025},
pages={211--220},
year={2025}
}