Implement Omniparser #894

Open
abrichr opened this issue Oct 26, 2024 · 0 comments
Labels
enhancement New feature or request


abrichr commented Oct 26, 2024

Feature request

We want to implement https://huggingface.co/microsoft/OmniParser in a ReplayStrategy (e.g. #888)

Motivation

OmniParser is designed to convert an unstructured screenshot image into a structured list of elements, including the locations of interactable regions and captions describing each icon's potential functionality.
OmniParser is intended for settings where users are already trained in responsible analytic approaches and critical reasoning is expected. It can extract information from a screenshot, but human judgement is still needed when using its output.
OmniParser is intended to work on a variety of screenshots, from both PC and phone, and across a variety of applications.
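
To make the "structured list of elements" concrete, here is a minimal sketch of how a ReplayStrategy might hold and query parsed output. Everything in it is an assumption for illustration: `ParsedElement`, its fields, the normalized bounding-box convention, and `find_by_caption` are hypothetical names, not OmniParser's actual output schema or an existing API in this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedElement:
    """One parsed screen element (hypothetical schema, not OmniParser's real format)."""

    # Assumed convention: (left, top, right, bottom), normalized to the range 0..1.
    bbox: tuple[float, float, float, float]
    caption: str
    interactable: bool

    def center_px(self, width: int, height: int) -> tuple[int, int]:
        """Return the element's center in pixels for a screenshot of the given size."""
        left, top, right, bottom = self.bbox
        return (
            round((left + right) / 2 * width),
            round((top + bottom) / 2 * height),
        )


def find_by_caption(elements: list[ParsedElement], query: str) -> list[ParsedElement]:
    """Return interactable elements whose caption contains `query` (case-insensitive)."""
    needle = query.lower()
    return [e for e in elements if e.interactable and needle in e.caption.lower()]


# Hand-written example data standing in for real parser output.
elements = [
    ParsedElement(bbox=(0.25, 0.5, 0.75, 1.0), caption="Save button", interactable=True),
    ParsedElement(bbox=(0.0, 0.0, 1.0, 0.1), caption="Save status text", interactable=False),
]

matches = find_by_caption(elements, "save")
click_target = matches[0].center_px(width=800, height=600)
```

A replay step could then resolve a recorded action to `click_target` instead of raw recorded coordinates, with a human reviewing the match, in line with the human-judgement caveat above.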
