
Feature Request: add support to LLaVA OneVision #8944

Closed
4 tasks done
alexrah opened this issue Aug 9, 2024 · 1 comment
Labels: enhancement (New feature or request), stale

Comments


alexrah commented Aug 9, 2024

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

LLaVA has a new version, called OneVision, which was released on 2024/08/06:

  • HuggingFace
  • GitHub
  • Release Notes

LLaVA OneVision uses SO400M as the vision encoder and Qwen2 as the language model, with trainable components including the projector and, in later training stages, the full model.

I'm no expert, but as I understand it, the architecture is similar to the previous versions; however, both the vision encoder and the language model are different.
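For reference, running the model through Hugging Face Transformers might look roughly like the sketch below. This is a minimal sketch only: the `LlavaOnevisionForConditionalGeneration` class name and the `llava-hf/llava-onevision-qwen2-0.5b-ov-hf` checkpoint id are assumptions based on how the existing llava-hf ports are named, and llama.cpp support would instead need its own GGUF conversion for both the vision encoder and the Qwen2 LLM.

```python
# Minimal sketch: run LLaVA OneVision via Hugging Face Transformers.
# The class name and checkpoint id below are assumptions following the
# naming convention of existing llava-hf ports, not confirmed by llama.cpp.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a single-turn image + text prompt using the model's chat template.
conversation = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "Describe this image."}]}
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
image = Image.open("example.jpg")

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```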

llama.cpp LLaVA support: https://github.com/ggerganov/llama.cpp/tree/master/examples/llava

Motivation

Compared to the currently supported LLaVA 1.6, it provides the following features:

  • Supports various input resolutions, up to 2304 * 2304 pixels.
  • A single image input is represented by at most 729 * (9+1) tokens under anyres_max_9 mode.
  • Supports multi-image and video inputs: multi-image input is represented by 729 tokens per image, and video input by 196 tokens per frame (see the token-budget sketch after this list).
  • Available in three sizes (0.5B, 7B, and 72B parameters) to fit different memory and inference-latency requirements.
  • Better support for Set-of-Mark prompting.
  • and more...
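To make the token figures above concrete, here is a back-of-the-envelope calculation using only the numbers quoted in this issue; the constants are illustrative, since the real values come from the model config:

```python
# Token budgets from the figures quoted above; treat the constants as
# illustrative only, since the real values depend on the model config.
BASE_IMAGE_TOKENS = 729     # tokens per image tile
ANYRES_MAX_TILES = 9 + 1    # up to 9 crops plus the base image (anyres_max_9)
VIDEO_FRAME_TOKENS = 196    # tokens per video frame

def single_image_tokens() -> int:
    """Worst-case token count for one high-resolution image."""
    return BASE_IMAGE_TOKENS * ANYRES_MAX_TILES

def multi_image_tokens(n_images: int) -> int:
    """Multi-image input: 729 tokens per image."""
    return BASE_IMAGE_TOKENS * n_images

def video_tokens(n_frames: int) -> int:
    """Video input: 196 tokens per frame."""
    return VIDEO_FRAME_TOKENS * n_frames

print(single_image_tokens())   # 7290
print(multi_image_tokens(4))   # 2916
print(video_tokens(32))        # 6272
```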

Possible Implementation

No response

@alexrah alexrah added the enhancement New feature or request label Aug 9, 2024
@github-actions github-actions bot added the stale label Sep 9, 2024
github-actions bot commented:

This issue was closed because it has been inactive for 14 days since being marked as stale.
