[Feature Request] Add Gemini 2.0 Flash Exp, Gemini Exp-1206 Models and Gemini Stream Realtime Functionality #5931

ninirobot · 2024-12-12T12:14:05Z

🥰 Description of requirements

Please add support for the following Google Gemini models and new features:

Gemini 2.0 Flash Exp model: a lightweight, faster Gemini model suited to low-latency, high-quality use. In testing it did better than even the Pro-level models.
Gemini Exp-1206 model: a specific Gemini model version that may include particular features or optimizations.
Stream Realtime feature: the real-time interactive chat feature Google launched recently.

🧐 Solution

Could users be allowed to select a specific Gemini model in the configuration, with an API interface provided, and to use Gemini's real-time chat feature? OpenAI's API consumes credits and is not cost-effective, whereas Google's is free within limits, which is enough for personal use. These features can already be previewed in Google AI Studio. The new models require a new SDK; see:
https://cloud.google.com/vertex-ai/generative-ai/docs/sdks/overview
https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2
In addition, the new model also seems to support image generation, and Google's Imagen 3 image model should have an API as well. Could that be supported too?

📝 Supplementary information

I hope the team will consider this request and add it to the development plan soon.

copyliu · 2024-12-12T16:31:10Z

As a temporary workaround, you can add +gemini-2.0-flash-exp@Google to CUSTOM_MODELS to use 2.0. Other models can be configured the same way.
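
For readers unfamiliar with the entry format in the workaround above, here is a rough TypeScript sketch of how a value such as `+gemini-2.0-flash-exp@Google` could be split into a model name and a provider. It is not the project's actual parser: the names `parseCustomModelEntry` and `parseCustomModels`, the `CustomModelEntry` shape, and the assumption that entries are comma-separated are all hypothetical and only illustrate the idea.

```typescript
// Hypothetical shape for one parsed entry (not the project's real type).
interface CustomModelEntry {
  name: string;
  provider?: string;
}

// Splits a single entry such as "+gemini-2.0-flash-exp@Google".
// Assumption: a leading "+" means "add this model" and "@" separates the provider.
function parseCustomModelEntry(entry: string): CustomModelEntry {
  const body = entry.trim().replace(/^\+/, "");
  const at = body.lastIndexOf("@");
  if (at === -1) {
    return { name: body };
  }
  return { name: body.slice(0, at), provider: body.slice(at + 1) };
}

// Assumption: multiple entries are separated by commas.
function parseCustomModels(value: string): CustomModelEntry[] {
  return value
    .split(",")
    .filter((entry) => entry.trim().length > 0)
    .map(parseCustomModelEntry);
}
```

Under these assumptions, `parseCustomModels("+gemini-2.0-flash-exp@Google,+gemini-exp-1206@Google")` would yield two entries, both with provider `Google`.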

zhengxinjipai · 2024-12-13T08:21:40Z

But it can't upload files and can't recognize images or videos. Could this be fixed?

ninirobot · 2024-12-13T13:20:51Z

> But it can't upload files and can't recognize images or videos. Could this be fixed?

The isVisionModel check, which decides whether a model is a vision model, probably has no keyword that matches gemini-2.0, so you would need to modify that code yourself. Can videos be uploaded at all? That doesn't seem to work with other models either.

Kosette · 2024-12-13T21:49:08Z

ChatGPT-Next-Web/app/utils.ts

Lines 258 to 267 in 83cea3a

    const visionKeywords = [
      "vision",
      "gpt-4o",
      "claude-3",
      "gemini-1.5",
      "gemini-exp",
      "learnlm",
      "qwen-vl",
      "qwen2-vl",
    ];

The current way of detecting vision models is not flexible enough; supporting a new model requires changing the source code.
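
As an illustration of the point above, here is a simplified sketch of a keyword-based vision check. It reuses the keyword list quoted from app/utils.ts and adds a `"gemini-2.0"` entry, which is an assumed addition and not part of the quoted source. The real `isVisionModel` in the project may contain more logic than a plain substring match, and the optional `extraKeywords` parameter is a hypothetical way to make the check configurable without editing the list.

```typescript
// Keyword list as quoted in the thread, plus one assumed addition.
const visionKeywords: string[] = [
  "vision",
  "gpt-4o",
  "claude-3",
  "gemini-1.5",
  "gemini-exp",
  "learnlm",
  "qwen-vl",
  "qwen2-vl",
  "gemini-2.0", // assumed addition, not in the quoted source
];

// Simplified sketch: a model counts as a vision model if its name
// contains any keyword. extraKeywords is a hypothetical extension point.
function isVisionModel(model: string, extraKeywords: string[] = []): boolean {
  const name = model.toLowerCase();
  return [...visionKeywords, ...extraKeywords].some((keyword) =>
    name.includes(keyword.toLowerCase()),
  );
}
```

With this sketch, `isVisionModel("gemini-2.0-flash-exp")` matches through the added keyword, and a caller could pass further keywords at runtime instead of changing the source.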

ninirobot added the enhancement (New feature or request) label Dec 12, 2024
