Required prerequisites
Motivation
WebCanvas: Benchmarking Web Agents in Online Environments is an advanced web agent benchmark framework that shares several ideas with CRAB.
WebCanvas provides three main components:
A novel evaluation metric that reliably captures the critical intermediate actions or states necessary for task completion while disregarding noise caused by insignificant events or changed web elements.
A benchmark dataset called Mind2Web-Live, a refined version of the original static Mind2Web dataset, containing 542 tasks with 2439 intermediate evaluation states.
Lightweight and generalizable annotation tools and testing pipelines that enable the community to collect and maintain a high-quality, up-to-date dataset.
We should consider integrating the WebCanvas dataset, which is a good fit for CRAB.
Solution
No response
Additional context
No response