run locally #9

Open
ShasTheMass opened this issue Nov 6, 2024 · 1 comment

Comments

@ShasTheMass

Hello, great work here. I wonder whether we could make this run completely locally, e.g. with an Ollama-based model. Has anyone tried this? Are the small models that fit on a PC or Mac with, say, 16 GB of memory good enough to understand screenshots?

Hope to hear back from you @deedy

@kediaharshit9

One of the key enablers of computer control is the LM looking at the image and predicting the action with proper coordinates.
This is surprisingly accurate with the new claude-3.5-sonnet model.

I'm not very confident that the smaller VLMs can do that accurately (as of today). Hopefully someone can create a fine-tuning dataset so that smaller/quantized models can perform this step accurately (though that might then affect their reasoning capabilities).

Wishing for a truly local future soon, cheers!
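
For anyone who wants to check how well a small local model handles the "look at the screenshot, predict coordinates" step, here is a minimal sketch. It is not this project's code. It assumes an Ollama server on its default local endpoint and a vision-capable model already pulled (`llava` is only a placeholder), and the prompt wording and the helper names `build_payload`, `parse_click`, and `ask_local_model` are made up for illustration.

```python
import base64
import json
import re
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, instruction, model="llava"):
    """Build the JSON body for a non-streaming /api/generate request.

    The model name is a placeholder; any vision-capable local model would do.
    """
    prompt = (
        f"{instruction}\n"
        "Reply with only the click target as pixel coordinates in the form (x, y)."
    )
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def parse_click(text, width, height):
    """Extract the first '(x, y)' pair; return None if absent or off-screen."""
    match = re.search(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)", text)
    if match is None:
        return None
    x, y = int(match.group(1)), int(match.group(2))
    if not (0 <= x < width and 0 <= y < height):
        return None
    return x, y


def ask_local_model(image_bytes, instruction, width, height, model="llava"):
    """Send one screenshot to the local model and return a validated click."""
    body = json.dumps(build_payload(image_bytes, instruction, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.loads(response.read())["response"]
    return parse_click(reply, width, height)
```

Calling `ask_local_model(open("shot.png", "rb").read(), "Click the Save button", 1920, 1080)` on a handful of screenshots with known targets should give a rough idea of whether the predicted coordinates are usable. Rejecting off-screen or unparsable answers in `parse_click` matters here, since a small model may well reply with prose or with points outside the image.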
