Trying to use it on Windows in Firefox browser, it fails to do even basic actions #56

Open
RaiaN opened this issue Dec 12, 2024 · 3 comments

Comments

@RaiaN

RaiaN commented Dec 12, 2024

Hi there,
I'm trying to use your app (with gpt-4o + ShowUI), but it fails to perform even basic actions. I have a 2560x1440 monitor, I am on Windows, and I have Firefox open.

I ask it to open a new tab for me and it fails to do so. The mouse is not moved to the correct location, and clicks do not occur.

How do I debug your system?

@h-siyuan
Collaborator

Could you try the screenshot grounding on our Hugging Face space: https://huggingface.co/spaces/showlab/ShowUI? You can post the results here and we will investigate :)

@tristayunsub

You would be better off lowering the resolution to 1028 x 728.
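One possible reason a lower resolution can help, offered as an assumption rather than a confirmed diagnosis: if the screenshot the model sees is a different size from the coordinate space the mouse uses (for example because of display scaling or a resize step), predicted points need to be scaled back before clicking. A minimal sketch of that mapping, where `screenshot_to_screen` is a hypothetical helper and not part of the project:

```python
from typing import Tuple

Point = Tuple[int, int]
Size = Tuple[int, int]


def screenshot_to_screen(point: Point, screenshot_size: Size, screen_size: Size) -> Point:
    """Scale a pixel position found on a screenshot to the screen's coordinate space.

    Hypothetical helper: it assumes the screenshot covers the whole screen
    and differs from it only by a uniform resize along each axis.
    """
    scale_x = screen_size[0] / screenshot_size[0]
    scale_y = screen_size[1] / screenshot_size[1]
    return round(point[0] * scale_x), round(point[1] * scale_y)


# A point found on a 1280x720 screenshot, mapped onto a 2560x1440 screen.
print(screenshot_to_screen((100, 50), (1280, 720), (2560, 1440)))  # (200, 100)
```

If the two sizes already match, the function returns the point unchanged, so comparing them is a cheap first check.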

@FringeNet

FringeNet commented Jan 14, 2025

Even at a lower resolution, it is still "unaware" of what is going on.
It tries to open software that's already open and displayed on the screen.
It also hallucinates having successfully completed steps.

E.g. (1366x768, Win 11, ShowUI + GPT4o, RTX 4070):
Modifications to loop.py in order to actually get it working:
[image: loop.py modifications, not reproduced here]

Planner tries to:

  1. Click File Menu
  2. Create a new project
  3. Enter the name for the project
  4. Create Minecraft entity as instructed

Actually does:

  1. Clicks File Menu
  2. Clicks New (which has a submenu of project types)
  3. Hallucinates typing the name after clicking the submenu
  4. Hallucinates interacting with the application, despite seeing the same screenshot over and over while it is stuck
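For anyone debugging the "same screenshot over and over" case, here is a rough sketch of how a loop could notice that the screen has stopped changing. It is not taken from the project: `screenshot_digest` and `first_stuck_step` are hypothetical names, and it assumes you can get each screenshot as raw bytes.

```python
import hashlib
from typing import Iterable, Optional


def screenshot_digest(image_bytes: bytes) -> str:
    """Fingerprint a screenshot so consecutive frames can be compared cheaply."""
    return hashlib.sha256(image_bytes).hexdigest()


def first_stuck_step(screenshots: Iterable[bytes], max_repeats: int = 2) -> Optional[int]:
    """Return the first step index where the same screenshot has appeared
    more than `max_repeats` times in a row, or None if that never happens."""
    previous = None
    repeats = 0
    for step, image_bytes in enumerate(screenshots):
        digest = screenshot_digest(image_bytes)
        if digest == previous:
            repeats += 1
        else:
            previous, repeats = digest, 1
        if repeats > max_repeats:
            return step
    return None


frames = [b"menu", b"submenu", b"submenu", b"submenu", b"submenu"]
print(first_stuck_step(frames))  # 3
```

An exact byte hash is fragile (a blinking cursor or a clock changes it), so a real check would probably need a tolerance-based image comparison instead.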

image
image

Tail of Console output:

```
_render_message: **VLMPlanner**:
I need to adjust the cube to form the base shape of the Shardling's semi-transparent
_render_message: **VLMPlanner** sending action to **<span style="color:rgb(106, 158, 210)">S</span><span style="color
_render_message: Screenshot for **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)
Output Text: [{'action': 'CLICK', 'value': None, 'position': [0.23, 0.09]}]
Parsed Output: [{'action': 'CLICK', 'value': None, 'position': [0.23, 0.09]}]
Action Item: {'action': 'CLICK', 'value': None, 'position': [0.23, 0.09]}
Parsed Action List: [{'action': 'mouse_move', 'text': None, 'coordinate': (314, 69)}, {'action': 'left_click', 'text': None, 'coordinate': None}]
_render_message: **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span
Converted Action: {'action': 'mouse_move', 'text': None, 'coordinate': (314, 69)}
sync_call: computer {'action': 'mouse_move', 'text': None, 'coordinate': (314, 69)}
action: mouse_move, text: None, coordinate: (314, 69)
mouse move to 314, 69
_render_message: **<span style="color:rgb(106, 158, 210)">S</span><span style="color:rgb(111, 163, 82)">h</span><span
Converted Action: {'action': 'left_click', 'text': None, 'coordinate': None}
sync_call: computer {'action': 'left_click', 'text': None, 'coordinate': None}
action: left_click, text: None, coordinate: None
```
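As a sanity check on the log above, the normalized position `[0.23, 0.09]` and the pixel coordinate `(314, 69)` are consistent with a plain multiply-and-truncate at 1366x768. The sketch below reproduces that arithmetic. `to_pixels` is a hypothetical name, and the truncation and clamping are assumptions, since the project's actual conversion code is not shown in this thread.

```python
from typing import Sequence, Tuple


def to_pixels(position: Sequence[float], screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Map a normalized [x, y] position in [0, 1] to integer pixel coordinates."""
    x_norm, y_norm = position
    width, height = screen_size
    # Truncate to whole pixels, then clamp so an out-of-range
    # prediction still lands on the screen (assumed behavior).
    x = min(max(int(x_norm * width), 0), width - 1)
    y = min(max(int(y_norm * height), 0), height - 1)
    return x, y


print(to_pixels([0.23, 0.09], (1366, 768)))  # (314, 69)
```

So the conversion step appears to match the reported resolution here. The `left_click` that follows carries `coordinate: None`, which suggests it relies on the preceding `mouse_move` having actually moved the pointer.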
