
support MiniCPM-V-2.6 #8967

Merged: 74 commits merged into ggerganov:master on Aug 16, 2024

Conversation

@tc-mb (Contributor) commented Aug 10, 2024

Dear llama.cpp Official,

Hi, I'm writing about our new PR for integrating our model MiniCPM-V 2.6 into llama.cpp. MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. This model is stronger than its predecessors and supports multi-image understanding and video understanding.

This version of the model supports video understanding, and I have implemented functions such as video frame extraction in my fork. However, because this introduces ffmpeg, it may cause environment and build issues on other devices, so I think the work should be split into multiple PRs.

  1. This PR submits only the model changes, and I hope it can be merged soon so that the community can use MiniCPM-V 2.6 via GGUF first.
  2. A later PR will add support for video formats, and we can spend more time discussing how llama.cpp can best integrate video understanding.

Best regards,
MiniCPM-V Official ^_^

@yorkane commented Aug 12, 2024

waiting for merge

@HaishengLiang

waiting for merge

@nanowell

waiting for merge

@ggerganov merged commit d565bb2 into ggerganov:master on Aug 16, 2024
54 checks passed
@saket424 commented Aug 17, 2024

I have opened issue #9066, where I experienced a crash after this pull request was merged. The crash was unrelated to the MiniCPM-V-2.6 model itself. I hope you can reproduce the error.

@tc-mb (Contributor, Author) commented Aug 19, 2024

> I have opened issue #9066, where I experienced a crash after this pull request was merged. The crash was unrelated to the MiniCPM-V-2.6 model itself. I hope you can reproduce the error.

Hello, I saw that the issue you mentioned is a crash in llava, but my update only touches the minicpmv parts. Although I'm not sure about the cause, I suspect it may not be a problem with this branch.
Can you test whether this branch also crashes before the merge point? Of course, if it is indeed a problem introduced by this PR, I will be very happy to help fix it.

@saket424

@tc-mb
The crash is not directly related to your MiniCPM-V-2.6 PR, other than that there is no crash before your PR and there is one after it, owing to some uninitialized variables.

Here is a PR that appears to fix the issue I reported: #9082

Sorry for the false alarm

@tc-mb (Contributor, Author) commented Aug 19, 2024

> @tc-mb The crash is not directly related to your MiniCPM-V-2.6 PR, other than that there is no crash before your PR and there is one after it, owing to some uninitialized variables.
>
> Here is a PR that appears to fix the issue I reported: #9082
>
> Sorry for the false alarm.

I'm glad your problem was solved.

@x4080 commented Aug 19, 2024

@tc-mb Can we use MiniCPM-V with a context cache, so that we upload an image once and ask multiple questions referring to the same image?

@tc-mb (Contributor, Author) commented Aug 20, 2024

> @tc-mb Can we use MiniCPM-V with a context cache, so that we upload an image once and ask multiple questions referring to the same image?

Yes, it now stores the cache.

You can run in interactive mode to ask multiple rounds of questions.

./llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -i

Or you can modify the minicpmv-cli code (which is more of an example) to implement the functionality you want.

@yizhangliu

Eagerly awaiting...

@tc-mb deleted the prepare-PR-of-minicpm-v2.6 branch on August 20, 2024 at 11:09
if args.text_only:
    fname_middle = "text-"
    has_vision_encoder = False
elif args.minicpmv_projector is not None:
    fname_middle = "mmproj-"
    has_text_encoder = False
    has_minicpmv_projector = True
    minicpmv_version = 3

A Collaborator left a review comment on the code above:

Is this line necessary? It overrides the minicpmv_version value set on the command line when converting MiniCPM-V 2.5, which results in a broken mmproj-model-f16.gguf.
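
A minimal sketch of one possible fix, assuming the converter exposes the version as args.minicpmv_version (the flag name is an assumption here): keep a value supplied on the command line and only default to 3 when none is given.

if args.text_only:
    fname_middle = "text-"
    has_vision_encoder = False
elif args.minicpmv_projector is not None:
    fname_middle = "mmproj-"
    has_text_encoder = False
    has_minicpmv_projector = True
    # Hypothetical guard: respect a version supplied on the command line and
    # only fall back to 3 (MiniCPM-V 2.6) when none was given, so converting
    # MiniCPM-V 2.5 (version 2) is not silently overridden.
    if getattr(args, "minicpmv_version", None):
        minicpmv_version = args.minicpmv_version
    else:
        minicpmv_version = 3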

@x4080 commented Aug 20, 2024

> @tc-mb Can we use MiniCPM-V with a context cache, so that we upload an image once and ask multiple questions referring to the same image?
>
> Yes, it now stores the cache.
>
> You can run in interactive mode to ask multiple rounds of questions.
>
> ./llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -i
>
> Or you can modify the minicpmv-cli code (which is more of an example) to implement the functionality you want.

Cool, that's a great feature, thanks @tc-mb.

@dewarrn1

Very cool! Are GPU operations supported at this time?

@tc-mb (Contributor, Author) commented Aug 23, 2024

> Very cool! Are GPU operations supported at this time?

I have tested on Ubuntu with an NVIDIA 4090; it works and the speed looks good. You can use it in the following way.

make LLAMA_CUDA=1

Then add an appropriate -ngl parameter, for example:

./llama-minicpmv-cli -m ../MiniCPM-V-2_6/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-V-2_6/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?" -ngl 100

@dewarrn1

Awesome, thanks!

@saket424

@tc-mb
Can you show us how to serve MiniCPM-V 2.6 using llama-server so we can send it OpenAI-compatible chat completion requests with base64-encoded images?

@tc-mb (Contributor, Author) commented Aug 28, 2024

> @tc-mb Can you show us how to serve MiniCPM-V 2.6 using llama-server so we can send it OpenAI-compatible chat completion requests with base64-encoded images?

Sorry, I didn't test the server path when I made this update; I will support this capability in the near future.
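
For reference, a minimal sketch of what such a request could look like once the server path is supported, assuming a llama-server build that exposes an OpenAI-compatible /v1/chat/completions endpoint with image_url support; the endpoint behavior, port, and model name below are assumptions, not something this PR provides.

# Hypothetical request: an OpenAI-compatible chat completion carrying a
# base64-encoded image, assuming a llama-server build that accepts
# image_url content parts (not confirmed by this PR).
import base64
import requests

with open("xx.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "MiniCPM-V-2_6",  # assumed model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in the image?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
}

# Assumes llama-server is listening on localhost:8080.
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(resp.json())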

Labels: examples, python (python script changes), Review Complexity: Medium