Implement Omniparser #894

abrichr · 2024-10-26T02:43:53Z

Feature request

We want to implement https://huggingface.co/microsoft/OmniParser in a ReplayStrategy (e.g. #888)

Motivation

OmniParser is designed to convert an unstructured screenshot image into a structured list of elements, including the locations of interactable regions and captions describing each icon's potential functionality.
It is intended for settings where users are already trained in responsible analytic approaches and critical reasoning is expected. OmniParser can extract information from a screenshot, but human judgement is still needed to assess its output.
OmniParser is intended to be used on a variety of screenshots, from both PC and phone, and across various applications.
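
To make the "structured list of elements" concrete, here is a minimal sketch of how a replay strategy might consume such a list. Everything in it is an assumption for illustration: `ParsedElement`, its fields, the normalized bounding-box convention, and `find_click_target` are hypothetical names, not OmniParser's actual output schema or an existing OpenAdapt API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ParsedElement:
    """One parsed screen element (hypothetical shape, not OmniParser's real schema)."""

    # Assumed convention: (left, top, right, bottom), normalized to [0, 1].
    bbox: Tuple[float, float, float, float]
    caption: str        # description of the element's likely function
    interactable: bool  # whether the region can be clicked or typed into


def find_click_target(
    elements: List[ParsedElement],
    query: str,
    screen_size: Tuple[int, int],
) -> Optional[Tuple[int, int]]:
    """Return the pixel center of the first interactable element whose
    caption contains `query` (case-insensitive), or None if nothing matches."""
    width, height = screen_size
    needle = query.lower()
    for element in elements:
        if element.interactable and needle in element.caption.lower():
            left, top, right, bottom = element.bbox
            # Convert the normalized box center to pixel coordinates.
            return (
                round((left + right) / 2 * width),
                round((top + bottom) / 2 * height),
            )
    return None
```

A strategy along these lines would parse each screenshot into elements, look up the element matching the recorded action, and replay the click at the returned coordinates. Substring matching on captions is only a placeholder; a real implementation would likely need a more robust matching step.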

abrichr added the enhancement label Oct 26, 2024

abrichr mentioned this issue Oct 26, 2024

Implement CoordinateReplayStrategy with Claude/UGround/Omniparser #882

Open
