
[Feature Request] Add Gemini 2.0 Flash Exp, Gemini Exp-1206 Models and Gemini Stream Realtime Functionality #5931

Open
ninirobot opened this issue Dec 12, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@ninirobot

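🥰 Description of requirements

Please add support for the following Google Gemini models and new features:

Gemini 2.0 Flash Exp model: a lightweight, faster Gemini model suited to low-latency, high-quality responses. In my testing it outperforms even the Pro-level models.
Gemini Exp-1206 model: a specific Gemini model version that may include particular features or optimizations.
Stream Realtime feature: the real-time interactive chat feature Google launched recently.

🧐 Solution

Could users be allowed to select a specific Gemini model in the configuration, with an API interface provided, and to use Gemini's real-time chat feature? OpenAI's equivalent consumes credits and is not cost-effective, whereas Google offers a limited free quota that is enough for personal use. These features can already be previewed in Google AI Studio. The new model versions require a new SDK; see https://cloud.google.com/vertex-ai/generative-ai/docs/sdks/overview
https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2
The new model also seems to support image generation, and Google's Imagen 3 image model should have a usable API as well. Could that be supported too?

📝 Supplementary information

I hope the team will consider this request and add it to the development plan as soon as possible.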

@ninirobot ninirobot added the enhancement New feature or request label Dec 12, 2024

@copyliu

copyliu commented Dec 12, 2024

As a temporary workaround, you can add +gemini-2.0-flash-exp@Google to CUSTOM_MODELS to use 2.0; other models can be configured the same way.
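
To make the shape of that entry concrete, here is a minimal TypeScript sketch of how a CUSTOM_MODELS-style string could be read. It is not the project's actual parser: the `CustomModelEntry` type and `parseCustomModels` function are hypothetical names, and the reading of `+` as "add", `-` as "hide", and `@` as a provider separator is an assumption based on the entry quoted above.

```typescript
// Hypothetical sketch, not the project's real parser.
// Assumed syntax: comma-separated entries, "+" adds a model, "-" hides one,
// and an optional "@Provider" suffix names the provider.
type CustomModelEntry = {
  action: "add" | "hide";
  name: string;
  provider?: string;
};

function parseCustomModels(raw: string): CustomModelEntry[] {
  return raw
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part): CustomModelEntry => {
      const action: CustomModelEntry["action"] = part.startsWith("-") ? "hide" : "add";
      // Drop the leading "+" or "-" marker, if present.
      const body = part.replace(/^[+-]/, "");
      // Split on the first "@" to separate model name and provider.
      const at = body.indexOf("@");
      const name = at === -1 ? body : body.slice(0, at);
      const provider = at === -1 ? undefined : body.slice(at + 1);
      return { action, name, provider };
    });
}

// The entry from the comment above.
const entries = parseCustomModels("+gemini-2.0-flash-exp@Google");
console.log(entries);
```

Under these assumptions, further models would be appended to the same comma-separated value using the same `+name@Provider` form.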


@zhengxinjipai

But with that workaround I can't upload files, and the model can't recognize images or videos. Could this be fixed?


@ninirobot
Author

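But with that workaround I can't upload files, and the model can't recognize images or videos. Could this be fixed?

Most likely isVisionModel, the function that decides whether a model is a vision model, has no keyword match for gemini-2.0, so you would need to modify that code yourself. Can videos be uploaded at all? That doesn't seem to work with other models either.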


@Kosette
Contributor

Kosette commented Dec 13, 2024

const visionKeywords = [
  "vision",
  "gpt-4o",
  "claude-3",
  "gemini-1.5",
  "gemini-exp",
  "learnlm",
  "qwen-vl",
  "qwen2-vl",
];

The current way of detecting vision models is not flexible enough; supporting a new model means modifying the source code.
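
As a rough illustration of the change discussed in this thread, the sketch below adds a "gemini-2.0" keyword to the quoted list and lets callers pass extra keywords without touching the source. The substring-matching logic and the `extraKeywords` parameter are assumptions for illustration; the project's real isVisionModel may be implemented differently.

```typescript
// Keywords quoted in the comment above, plus "gemini-2.0"
// (the addition suggested in this thread).
const visionKeywords: string[] = [
  "vision",
  "gpt-4o",
  "claude-3",
  "gemini-1.5",
  "gemini-exp",
  "learnlm",
  "qwen-vl",
  "qwen2-vl",
  "gemini-2.0",
];

// Sketch only: treats a model as vision-capable when its name contains
// any known keyword. `extraKeywords` is a hypothetical extension point
// (for example, values read from configuration) and is not an existing option.
function isVisionModel(model: string, extraKeywords: string[] = []): boolean {
  const name = model.toLowerCase();
  return [...visionKeywords, ...extraKeywords].some((keyword) =>
    name.includes(keyword.toLowerCase()),
  );
}

console.log(isVisionModel("gemini-2.0-flash-exp"));
console.log(isVisionModel("my-custom-vl", ["custom-vl"]));
```

Keeping the keyword list as data and accepting additions from outside is one way to avoid a source edit for every new model family, at the cost of relying on naming conventions.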

