web-arena-x/synatra

Code for Paper: Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

About Synatra

  • A data synthesis approach relying on indirect knowledge
  • 100k next action demonstrations in the form of web trajectories
  • Synatra-CodeLlama-7B, a dedicated web navigation agent

Repository Structure

This repository is divided into three parts:

  • Data Synthesis: contains the pipeline to generate synthetic trajectories from tutorials and web page snapshots.

  • Training: contains the training code for Synatra-CodeLlama-7B and all other models experimented with in the paper.

  • Evaluation: contains the evaluation code for all benchmarks tested in the paper.

Dataset Download

  • Synatra: Download Synatra's 100k synthesized trajectories from Hugging Face.

Model Checkpoint

Model Name           | LLM          | Checkpoint
Synatra-CodeLlama-7B | CodeLlama-7B | Synatra-CodeLlama-7B

Data Synthesis

Generate trajectories with WikiHow Tutorials and Web Page Snapshots

cd ./data_generation

Follow the instructions to generate trajectories.

Training

Train with LLaMA-Factory

Set up LLaMA-Factory according to the instructions.

To start training:

cd ./train
python launch_training_batch.py

Run Evaluation

Serve Models With vLLM

To serve evaluated models locally with vLLM:

cd ./evaluation/
sbatch vllm_serve.sh
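
Once the server is up, a client can query it over HTTP. The sketch below is illustrative only: it assumes that vllm_serve.sh starts vLLM's OpenAI-compatible server on localhost port 8000 (vLLM's default), which this README does not confirm, so check the script for the actual host, port, and served model name. The helper names build_completion_request and complete are hypothetical.

```python
import json
import urllib.request

# Assumption: vllm_serve.sh exposes vLLM's OpenAI-compatible API at this address.
BASE_URL = "http://localhost:8000/v1"


def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,          # must match the model name the server was started with
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,      # greedy decoding for reproducible outputs
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(model: str, prompt: str) -> str:
    """Send the request and return the text of the first completion."""
    request = build_completion_request(model, prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example (requires a running server):
# print(complete("MODEL_NAME", "Hello"))
```

Separating request construction from sending makes it possible to inspect the payload without a running server.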

WebArena & MiniWoB++

To evaluate on WebArena and MiniWoB++:

Use the WebArena benchmark with MiniWoB++ integration

cd ./evaluation/webarena_miniwob

Follow the set-up and evaluation instructions of webarena_miniwob

Mind2Web Evaluation

To evaluate on Mind2Web:

Run inference

cd ./evaluation/mind2web/inference

python m2w_code.py \
../data/(domain|task|website)_test.json \
MODEL_NAME
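
The command above takes one test split at a time. As a rough sketch, the following script runs inference on all three splits in sequence. It assumes that (domain|task|website) stands for three separate files named domain_test.json, task_test.json, and website_test.json, and that it is run from ./evaluation/mind2web/inference; the helper names build_commands and run_all are hypothetical and not part of this repository.

```python
import subprocess
import sys

# Assumption: the three Mind2Web test splits implied by (domain|task|website).
SPLITS = ("domain", "task", "website")


def build_commands(model_name: str) -> list[list[str]]:
    """Return one m2w_code.py invocation per test split."""
    return [
        [sys.executable, "m2w_code.py", f"../data/{split}_test.json", model_name]
        for split in SPLITS
    ]


def run_all(model_name: str) -> None:
    """Run inference split by split, stopping at the first failure."""
    for command in build_commands(model_name):
        subprocess.run(command, check=True)


# Example (replace MODEL_NAME with the served model's name):
# run_all("MODEL_NAME")
```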

Calculate metrics

python ../eval/count_m2w.py
