RoBERTa-based Obfuscated Binary Code Similarity Detection
pip install -r requirements.txt
In addition, you need the following installed:
- python3 (tested on 3.8)
- IDA Pro (tested on 8.2)
If you only want to test without preprocessing and training:
- Download the RQ test dataset from https://zenodo.org/records/17119870.
- Then move the files into the dataset directory:
mkdir dataset
mv RQ_test_dataset dataset/
mv dataset/RQ_test_dataset/* dataset/
rmdir dataset/RQ_test_dataset
- Run the evaluation:
python eval.py
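If you prefer to script the file-moving step (for example on a system without a POSIX shell), the following is a minimal Python sketch with the same end result. The helper name `flatten_into_dataset` is hypothetical and not part of this repository. Unlike the shell glob, it also moves hidden files and refuses to overwrite existing entries.

```python
import shutil
from pathlib import Path
from typing import List


def flatten_into_dataset(src: Path, dataset_dir: Path) -> List[str]:
    """Move every entry of `src` into `dataset_dir`, then remove the emptied `src`.

    Hypothetical helper: it mirrors the mkdir/mv/rmdir commands above but is
    not a script shipped with this repository.
    """
    dataset_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):
        target = dataset_dir / entry.name
        if target.exists():
            # Stop instead of silently replacing files from an earlier download.
            raise FileExistsError(f"{target} already exists; refusing to overwrite")
        shutil.move(str(entry), str(target))
        moved.append(entry.name)
    src.rmdir()  # raises if anything was left behind
    return moved
```

Called as `flatten_into_dataset(Path("RQ_test_dataset"), Path("dataset"))` from the repository root, this should leave the downloaded files directly under `dataset`, which is where the steps above place them.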
To run the preprocessing steps for training and testing yourself, download the binary datasets (ollvm.tar.xz and tigress.tar.xz) from
https://zenodo.org/records/17119870.
By default, put ollvm and tigress under the /data directory.
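The downloads are xz-compressed tarballs, so they have to be unpacked first. Below is a minimal sketch using Python's standard `tarfile` module. The helper name `extract_archive` is hypothetical, and the sketch assumes each archive unpacks to a directory named after its dataset, which has not been verified against the actual archives.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive: Path, dest: Path) -> List[str]:
    """Unpack an xz-compressed tarball into `dest` and return the member names.

    Hypothetical helper, not part of this repository. Only use it on archives
    you trust, since extraction writes whatever paths the archive contains.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:xz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Example usage (paths are assumptions; adjust them to your download location):
# for name in ("ollvm", "tigress"):
#     extract_archive(Path(f"{name}.tar.xz"), Path("/data"))
```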
python make_dataset.py --dataset_name ollvm
python make_dataset.py --dataset_name tigress
python make_tokenizer_dataset.py
python make_pretrain_dataset.py --dataset_name ollvm
python make_pretrain_dataset.py --dataset_name tigress
python make_finetune_dataset.py --dataset_name ollvm
python make_finetune_dataset.py --dataset_name tigress
python pretrain.py --dataset_name tigress
python pretrain.py --dataset_name ollvm
python finetune.py --dataset_name tigress
python finetune.py --dataset_name ollvm
python val_finetunedata.py --dataset_name tigress
python val_finetunedata.py --dataset_name ollvm
python make_rq_test_data.py
python eval.py
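The commands above are meant to be run in order. As a convenience, they can be chained in a small driver that stops at the first failing step. This is only a sketch: `build_pipeline` and `run_pipeline` are hypothetical names, not scripts shipped with this repository, and the step order is copied from the listing above.

```python
import subprocess
import sys
from typing import List


def build_pipeline(python: str = sys.executable) -> List[List[str]]:
    """Return the README commands as argument lists, in the listed order."""
    steps: List[List[str]] = []

    def per_dataset(script: str, names: List[str]) -> None:
        # One invocation per dataset, using the --dataset_name flag shown above.
        for name in names:
            steps.append([python, script, "--dataset_name", name])

    per_dataset("make_dataset.py", ["ollvm", "tigress"])
    steps.append([python, "make_tokenizer_dataset.py"])
    per_dataset("make_pretrain_dataset.py", ["ollvm", "tigress"])
    per_dataset("make_finetune_dataset.py", ["ollvm", "tigress"])
    per_dataset("pretrain.py", ["tigress", "ollvm"])
    per_dataset("finetune.py", ["tigress", "ollvm"])
    per_dataset("val_finetunedata.py", ["tigress", "ollvm"])
    steps.append([python, "make_rq_test_data.py"])
    steps.append([python, "eval.py"])
    return steps


def run_pipeline(steps: List[List[str]], dry_run: bool = False) -> None:
    """Print each command and run it; check=True aborts on the first failure."""
    for cmd in steps:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Example usage: preview the commands without executing anything.
# run_pipeline(build_pipeline(), dry_run=True)
```

Starting with `dry_run=True` is a cheap way to confirm the order before committing to the long-running pretraining and fine-tuning steps.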