
ShowUI

Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.


   📑 Paper    | 🤗 Hugging Face Models   |    🤗 Spaces Demo    |    🕹️ OpenBayes贝式计算 Demo   
🤗 Datasets   |   💬 X (Twitter)   |    🖥️ Computer Use    |    📖 GUI Paper List   

ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou
Show Lab @ National University of Singapore, Microsoft

🔥 Update

  • [2024.12.28] Release GPT-4o annotation recaptioning scripts.
  • [2024.12.27] Release training code and instructions.
  • [2024.12.23] Update showui with the UI-guided token selection implementation.
  • [2024.12.15] ShowUI received the Outstanding Paper Award at the NeurIPS 2024 Open-World Agents workshop.
  • [2024.12.9] Support int8 quantization.
  • [2024.12.5] Major update: ShowUI is integrated into OOTB for local runs!
  • [2024.12.1] Support iterative refinement to improve grounding accuracy. Try it in the HF Spaces demo.
  • [2024.11.27] Release the arXiv paper, HF Spaces demo, and ShowUI-desktop-8K.
  • [2024.11.16] showlab/ShowUI-2B is available on Hugging Face.

🖥️ Computer Use

See Computer Use OOTB for using ShowUI to control your PC.

(Demo video: computer_use_with_showui-en-s.mp4)

⭐ Quick Start

See Quick Start for model usage.
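ShowUI's grounding predictions are click points normalized to [0, 1], so an agent must scale them to the screenshot's pixel size before acting. A minimal sketch of that conversion (the helper name `to_pixels` is ours, not part of the released API):

```python
def to_pixels(norm_xy, width, height):
    """Convert a normalized [x, y] grounding output in [0, 1]
    to integer pixel coordinates on a width x height screenshot."""
    x, y = norm_xy
    return round(x * width), round(y * height)

# e.g. a predicted click at the center of a 1920x1080 screenshot
print(to_pixels([0.5, 0.5], 1920, 1080))  # (960, 540)
```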

🤗 Local Gradio

See Gradio for installation.

🚀 Training

Our training codebase supports:

  • DeepSpeed Zero1, Zero2, Zero3
  • Full-tuning (FP32, FP16, BF16), LoRA, QLoRA
  • SDPA, Flash Attention 2
  • Multiple datasets mixed training
  • Interleaved data streaming
  • Random image resizing (crop, pad)

See Train for training setup.
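Mixed-dataset training with interleaved streaming can be sketched as weighted sampling over several endless dataset iterators. This is a simplified illustration of the idea, not the training code itself; the function name and cycling behavior are our assumptions:

```python
import random
from itertools import islice

def interleave(datasets, weights, seed=0):
    """Endlessly stream samples from several datasets, picking the
    source of each sample by the given mixing weights; a dataset is
    restarted when exhausted (toy sketch of interleaved streaming)."""
    rng = random.Random(seed)
    iters = [iter(ds) for ds in datasets]
    indices = list(range(len(datasets)))
    while True:
        i = rng.choices(indices, weights=weights)[0]
        try:
            yield next(iters[i])
        except StopIteration:
            iters[i] = iter(datasets[i])  # cycle the exhausted dataset
            yield next(iters[i])

# Two toy "datasets" mixed roughly 70/30
web = [("web", k) for k in range(3)]
desktop = [("desktop", k) for k in range(2)]
batch = list(islice(interleave([web, desktop], [0.7, 0.3]), 8))
print(batch)
```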

🕹️ UI-Guided Token Selection

Try test.ipynb, which seamlessly supports Qwen2VL models.

(a) Screenshot: 1,296 patches. (b) After applying the UI graph: 167 UI components.
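The patch-to-component reduction above can be illustrated as building a graph over the patch grid and merging visually identical neighbors into one component. This toy union-find sketch (our simplification; the real implementation operates on actual screenshot patches) shows how redundant patches collapse:

```python
def count_components(grid):
    """Count connected components of identical values in a 2D grid,
    merging 4-connected neighbors that share the same value -- a toy
    version of collapsing redundant screenshot patches into UI
    components via a UI graph."""
    h, w = len(grid), len(grid[0])
    parent = list(range(h * w))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    for r in range(h):
        for c in range(w):
            if c + 1 < w and grid[r][c] == grid[r][c + 1]:
                union(r * w + c, r * w + c + 1)
            if r + 1 < h and grid[r][c] == grid[r + 1][c]:
                union(r * w + c, (r + 1) * w + c)
    return len({find(i) for i in range(h * w)})

# 3x4 "screenshot": 12 patches collapse to 3 UI components
patches = [
    ["bg", "bg", "btn", "btn"],
    ["bg", "bg", "btn", "btn"],
    ["txt", "txt", "txt", "txt"],
]
print(count_components(patches))  # 3
```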

✍️ Annotate your own data

Try recaption.ipynb, where we provide instructions for recaptioning the original annotations with GPT-4o.
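At its core, recaptioning means sending each screenshot crop plus its raw annotation to GPT-4o and asking for a richer caption. A sketch of assembling such a request payload (the prompt wording and function name are illustrative, not the ones used in recaption.ipynb; the notebook is the source of truth):

```python
import base64

def build_recaption_request(image_bytes, original_caption):
    """Build a GPT-4o chat-completion payload that asks the model to
    rewrite a raw UI annotation into a descriptive caption. The image
    is passed inline as a base64 data URL."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Recaption this UI element (original annotation: "
                         f"{original_caption!r}) with a concise, descriptive phrase."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# Build (but do not send) a request for a dummy image
payload = build_recaption_request(b"\x89PNG", "icon_23")
print(payload["model"])  # gpt-4o
```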

❤ Acknowledgement

We extend our gratitude to SeeClick for providing their code and datasets.

Special thanks to Siyuan for assistance with the Gradio demo and OOTB support.

🎓 BibTeX

If you find our work helpful, please consider citing our paper.

@misc{lin2024showui,
      title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent}, 
      author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
      year={2024},
      eprint={2411.17465},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17465}, 
}

If you like our project, please give us a star ⭐ on GitHub to stay up to date.

Star History Chart