Code for Paper: Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale
- A data synthesis approach relying on indirect knowledge
- 100k next action demonstrations in the form of web trajectories
- Synatra-CodeLlama-7B, a dedicated web navigation agent
This repository is divided into three parts:
- Data Synthesis: contains the pipeline to generate synthetic trajectories from tutorials and web page snapshots.
- Training: contains the training code for Synatra-CodeLlama-7B and all other models experimented with in the paper.
- Evaluation: contains the evaluation code for all benchmarks tested in the paper.
- Synatra: Download Synatra's 100k synthesized trajectories from Hugging Face.
| Model Name | LLM | Checkpoint |
|---|---|---|
| Synatra-CodeLlama-7B | CodeLlama-7B | Synatra-CodeLlama-7B |
```bash
cd ./data_generation
```
Follow the instructions there to generate trajectories.
Set up LLaMA-Factory according to its instructions.
To start training:
```bash
cd ./train
python launch_training_batch.py
```
To serve the evaluated models locally with vLLM:
```bash
cd ./evaluation/
sbatch vllm_serve.sh
```
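Once the server is up, vLLM exposes an OpenAI-compatible HTTP API. As a minimal sketch of querying it (the host, port, and served model name below are assumptions, not values taken from `vllm_serve.sh`):

```python
import json
import urllib.request

# Placeholder values -- adjust to match how vllm_serve.sh actually serves the model.
VLLM_URL = "http://localhost:8000/v1/completions"
MODEL = "Synatra-CodeLlama-7B"

def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 256, temperature: float = 0.0) -> dict:
    """Build a payload for vLLM's OpenAI-compatible /v1/completions endpoint."""
    return {"model": model, "prompt": prompt,
            "max_tokens": max_tokens, "temperature": temperature}

payload = build_completion_request(MODEL, "# observation: ...\n# next action:")
try:
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(json.load(resp)["choices"][0]["text"])
except OSError:
    # The server may not be running locally; the payload above still shows the request shape.
    pass
```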
To evaluate on WebArena and MiniWoB++, use the WebArena benchmark with MiniWoB++ integration:
```bash
cd ./evaluation/webarena_miniwob
```
Follow the setup and evaluation instructions of webarena_miniwob.
To evaluate on Mind2Web, run inference:
```bash
cd ./evaluation/mind2web/inference
python m2w_code.py \
    ../data/(domain|task|website)_test.json \
    MODEL_NAME
```
Then calculate the metrics:
```bash
python ../eval/count_m2w.py
```
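For intuition about what such a metrics script aggregates, one standard Mind2Web metric is step success rate: a step counts as correct only when both the predicted target element and the predicted operation match the gold action. The sketch below is illustrative only (it is not the repo's `count_m2w.py`, and the record format is an assumption):

```python
def step_success_rate(preds: list[dict], golds: list[dict]) -> float:
    """Fraction of steps where both the target element and the operation match.

    Each record is assumed (hypothetically) to look like
    {"element": "...", "action": "CLICK"}.
    """
    if not golds:
        return 0.0
    correct = sum(
        p["element"] == g["element"] and p["action"] == g["action"]
        for p, g in zip(preds, golds)
    )
    return correct / len(golds)

# Example: one of two steps matches on both element and operation.
preds = [{"element": "btn-1", "action": "CLICK"}, {"element": "input-2", "action": "TYPE"}]
golds = [{"element": "btn-1", "action": "CLICK"}, {"element": "input-3", "action": "TYPE"}]
print(step_success_rate(preds, golds))  # 0.5
```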