Prerequisites
Feature Description
LLaVA has a new version called OneVision, which was released on 2024/08/06:
HuggingFace
GitHub
Release Notes
LLaVA OneVision uses SigLIP SO400M as the vision encoder and Qwen-2 as the language model, with trainable components including the projector and, in the later training stages, the full model.
I'm no expert, but as I understand it, the architecture is similar to the previous versions; both the vision encoder and the language model are different, though.
llama.cpp LLaVA support: https://github.com/ggerganov/llama.cpp/tree/master/examples/llava
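For reference, the model already runs through the Hugging Face transformers port, which is probably the most useful implementation to study for a llama.cpp conversion. Below is a minimal single-image sketch; the LlavaOnevisionForConditionalGeneration class and the llava-hf checkpoint ID are taken from the Hugging Face hub and assume a recent transformers release (4.45 or later), so verify them against your installed version:

```python
# Minimal single-image sketch using the Hugging Face `transformers` port
# of LLaVA OneVision (not llama.cpp). Assumes transformers >= 4.45 and the
# llava-hf checkpoint below; verify both against your environment.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"  # smallest (0.5B) variant
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a single-image prompt with the model's chat template.
conversation = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]}
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image = Image.open("example.jpg")  # any local test image
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```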
Motivation
Compared to the currently supported LLaVA 1.6, it provides the following features:
- Supports various input resolutions, up to 2304 × 2304 pixels.
- Single-image input is represented by at most 729 × (9+1) tokens under the anyres_max_9 mode.
- Supports multi-image and video inputs: multi-image input is represented by 729 tokens per image, and video input by 196 tokens per frame.
- Available in three sizes (0.5B, 7B, and 72B parameters) to fit different memory and inference-latency requirements.
- Better support for Set-of-Mark prompting.
- And more...
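To make the token arithmetic above concrete, here is a small illustrative sketch. The constants are the ones quoted in the list; the helper function and its name are hypothetical and not part of any library:

```python
# Hypothetical helper illustrating the visual-token budgets quoted above.
# The constants come from the feature list; the function is illustrative only.
TOKENS_PER_IMAGE = 729   # one SigLIP SO400M grid (27 x 27 patches)
TOKENS_PER_FRAME = 196   # pooled representation per video frame

def visual_token_budget(num_images: int = 0, num_frames: int = 0,
                        anyres_max: int = 9) -> int:
    """Upper bound on visual tokens for one request."""
    if num_images == 1 and num_frames == 0:
        # Single image under anyres_max_9: base view plus up to 9 tiles.
        return TOKENS_PER_IMAGE * (anyres_max + 1)
    # Multi-image: a flat 729 tokens per image; video: 196 per frame.
    return num_images * TOKENS_PER_IMAGE + num_frames * TOKENS_PER_FRAME

print(visual_token_budget(num_images=1))   # 7290
print(visual_token_budget(num_images=4))   # 2916
print(visual_token_budget(num_frames=32))  # 6272
```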
Possible Implementation
No response