Does Dify have plans to support rerank and embedding models launched by vLLM? #11857
Comments
+1
Because of #11588, we no longer accept PRs related to model runtimes.
vLLM rerank task API:

Waiting for your v1.0 😭
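The API link above didn't survive extraction. For reference, a minimal call against vLLM's rerank endpoint might look like the sketch below; the /v1/rerank path is confirmed later in this thread for vLLM 0.7.2, but the exact request and response fields here are assumptions modeled on Jina/Cohere-style rerank APIs, so check the vLLM docs for the authoritative schema.

```python
# Illustrative request to a locally served vLLM rerank endpoint.
# Assumptions: server at localhost:8000, a rerank-capable model loaded,
# and Jina-style {"model", "query", "documents"} request fields.
import httpx

resp = httpx.post(
    "http://localhost:8000/v1/rerank",
    json={
        "model": "BAAI/bge-reranker-base",  # hypothetical model name
        "query": "What is Dify?",
        "documents": [
            "Dify is an LLM application development platform.",
            "vLLM is a high-throughput LLM inference engine.",
        ],
    },
    timeout=30,
)
resp.raise_for_status()
# Assumed response shape: {"results": [{"index": ..., "relevance_score": ...}, ...]}
for item in resp.json().get("results", []):
    print(item)
```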
Hi, @massif-01. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.

Issue Summary

Next Steps

Thank you for your understanding and contribution!
vLLM 0.7.2 now supports /v1/rerank. However, we hit a 404 error when trying to add a rerank model to Dify, because of a bug in rerank.py; the code at L67 could be modified along the lines sketched below.
If permitted, I could submit a PR to fix it.
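The snippet for the L67 change wasn't included in the thread. As a purely hypothetical sketch of the kind of fix described (the function and parameter names are mine, not Dify's actual rerank.py), the idea is to build the endpoint from the configured base URL without dropping its path prefix, since losing a "/v1" segment is a classic source of 404s:

```python
# Hypothetical sketch, not Dify's actual code: derive the rerank endpoint
# from the user-configured base URL without losing its path prefix.
def build_rerank_url(base_url: str) -> str:
    # "http://host:8000/v1" -> "http://host:8000/v1/rerank"
    # (a naive urljoin on a base URL without a trailing slash would
    # drop the "v1" segment and produce "http://host:8000/rerank")
    return base_url.rstrip("/") + "/rerank"


print(build_rerank_url("http://localhost:8000/v1"))  # http://localhost:8000/v1/rerank
```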
Self Checks
1. Is this request related to a challenge you're experiencing? Tell me about your story.
Does Dify have plans to support rerank and embedding models launched by vLLM?
When deploying large models and Dify locally on performance-limited devices (such as an Orin X or a Mac mini with 64 GB of RAM), using Xinference to launch the models consumes precious memory. Launching the various models together with vLLM is a better choice.
2. Additional context or comments
No response
3. Can you help us with this feature?