
[Bug] [ChatKnowledge] Document embedding failed: Resource not found #2335

Open
jianxinxie opened this issue Feb 8, 2025 · 2 comments
Labels
bug Something isn't working Waiting for reply

Comments


jianxinxie commented Feb 8, 2025

Search before asking

  • I have searched the issues and found no similar issues.

Operating system information

Linux

Python version information

3.10

DB-GPT version

main

Related scenes

  • Chat Data
  • Chat Excel
  • Chat DB
  • Chat Knowledge
  • Model Management
  • Dashboard
  • Plugins

Installation Information

Device information

Device: CPU

Models information

LLM: Azure OpenAI (proxy LLM)
Embedding model: Azure OpenAI

What happened

I created a knowledge base and uploaded a local file in vector mode. Document embedding then failed with this error:

[Screenshot]

What you expected to happen

2025-02-08 16:38:01 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=[10] doc_type=None status=None page=1 page_size=20
current session:<sqlalchemy.orm.session.Session object at 0x7df3a024cbe0>
INFO: 127.0.0.1:51284 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
2025-02-08 16:38:04 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=[10] doc_type=None status=None page=1 page_size=20
current session:<sqlalchemy.orm.session.Session object at 0x7df3a024eda0>
INFO: 127.0.0.1:51284 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
2025-02-08 16:38:07 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=[10] doc_type=None status=None page=1 page_size=20
current session:<sqlalchemy.orm.session.Session object at 0x7df3a0237fa0>
INFO: 127.0.0.1:51284 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
2025-02-08 16:38:07 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] ERROR document embedding, failed:COEKB06 职业卫生档案管理规范.docx, Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
2025-02-08 16:38:10 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=[10] doc_type=None status=None page=1 page_size=20
current session:<sqlalchemy.orm.session.Session object at 0x7df3a0312d10>
INFO: 127.0.0.1:51284 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
/space/list params:
INFO: 127.0.0.1:51284 - "POST /knowledge/space/list HTTP/1.1" 200 OK
/space/list params:
INFO: 127.0.0.1:51284 - "POST /knowledge/space/list HTTP/1.1" 200 OK
/knowledge/space/arguments params:
2025-02-08 16:38:14 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=None doc_type=None status=None page=1 page_size=18
2025-02-08 16:38:14 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=None doc_type=None status=None page=1 page_size=18
INFO: 127.0.0.1:47876 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
INFO: 127.0.0.1:47888 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
INFO: 127.0.0.1:51284 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/arguments HTTP/1.1" 200 OK
2025-02-08 16:40:06 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO Received params: 安全检查, doc_ids=[10] model_name=None pre_separator=None separators=None chunk_size=None chunk_overlap=None
current session:<sqlalchemy.orm.session.Session object at 0x7df3a034c0d0>
2025-02-08 16:40:06 ubuntu-jianxin dbgpt.serve.rag.connector[949346] INFO VectorStore:<class 'dbgpt.storage.vector_store.chroma_store.ChromaStore'>
2025-02-08 16:40:06 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] INFO begin save document chunks, doc:COEKB06 职业卫生档案管理规范.docx
2025-02-08 16:40:10 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] INFO async doc persist sync, doc:COEKB06 职业卫生档案管理规范.docx
2025-02-08 16:40:21 ubuntu-jianxin dbgpt.util.api_utils[949346] WARNING Health check failed for http://127.0.0.1:5670, error: HTTPConnectionPool(host='127.0.0.1', port=5670): Read timed out. (read timeout=10)
INFO: 127.0.0.1:59078 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/sync HTTP/1.1" 200 OK
2025-02-08 16:40:43 ubuntu-jianxin dbgpt.rag.index.base[949346] INFO Loading 18 chunks in 2 groups with 1 threads.
2025-02-08 16:40:43 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:40:43 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:41:00 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] ERROR document embedding, failed:COEKB06 职业卫生档案管理规范.docx, Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}

[Screenshot]

How to reproduce

INFO: 127.0.0.1:43016 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/sync HTTP/1.1" 200 OK
2025-02-08 16:41:57 ubuntu-jianxin dbgpt.rag.index.base[949346] INFO Loading 18 chunks in 2 groups with 1 threads.
2025-02-08 16:41:57 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:41:57 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:42:07 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] ERROR document embedding, failed:COEKB06 职业卫生档案管理规范.docx, Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
2025-02-08 16:42:16 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO Received params: 安全检查, doc_ids=[10] model_name=None pre_separator=None separators=None chunk_size=None chunk_overlap=None
current session:<sqlalchemy.orm.session.Session object at 0x7df3b1507c40>
2025-02-08 16:42:16 ubuntu-jianxin dbgpt.serve.rag.connector[949346] INFO VectorStore:<class 'dbgpt.storage.vector_store.chroma_store.ChromaStore'>
2025-02-08 16:42:16 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] INFO begin save document chunks, doc:COEKB06 职业卫生档案管理规范.docx
2025-02-08 16:42:16 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] INFO async doc persist sync, doc:COEKB06 职业卫生档案管理规范.docx
2025-02-08 16:42:32 ubuntu-jianxin dbgpt.util.api_utils[949346] WARNING Health check failed for http://127.0.0.1:5670, error: HTTPConnectionPool(host='127.0.0.1', port=5670): Read timed out. (read timeout=10)
INFO: 127.0.0.1:41878 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/sync HTTP/1.1" 200 OK
2025-02-08 16:42:46 ubuntu-jianxin dbgpt.rag.index.base[949346] INFO Loading 18 chunks in 2 groups with 1 threads.
2025-02-08 16:42:46 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:42:46 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:43:14 ubuntu-jianxin dbgpt.util.api_utils[949346] WARNING Health check failed for http://127.0.0.1:5670, error: HTTPConnectionPool(host='127.0.0.1', port=5670): Read timed out. (read timeout=10)
2025-02-08 16:43:14 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] ERROR document embedding, failed:COEKB06 职业卫生档案管理规范.docx, Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}

Additional context

I tried PDF, DOCX and TXT files, and all of them fail with this error.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@jianxinxie jianxinxie added bug Something isn't working Waiting for reply labels Feb 8, 2025
@Aries-ckt
Collaborator

Can you show your embedding settings in .env?

@jianxinxie
Author

These are my embedding settings in .env:

EMBEDDING_MODEL=proxy_azure
proxy_openai_proxy_server_url=https://openai-south.openai.azure.com
proxy_openai_proxy_api_key=xxxxxx
proxy_openai_proxy_backend=text-embedding-3-large
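The body {'error': {'code': '404', 'message': 'Resource not found'}} in the logs is Azure OpenAI's generic 404. It usually means the request URL (path, deployment name or api-version) did not match anything on the resource, not that the uploaded document is bad. One way to separate a DB-GPT configuration problem from an Azure-side one is to send a single embeddings request directly, outside DB-GPT. The sketch below builds that request with only the Python standard library, following the Azure OpenAI REST layout {endpoint}/openai/deployments/{deployment}/embeddings?api-version=... with the key sent in an api-key header. The deployment name text-embedding-3-large, the API version 2024-02-01 and the AZURE_OPENAI_API_KEY environment variable are assumptions for illustration. Substitute the deployment name and an API version listed for your resource in the Azure portal. Note that the .env above sets an endpoint, a key and a backend name, but no API version.

```python
# Minimal sketch: build (and optionally send) one Azure OpenAI embeddings
# request without DB-GPT. Deployment name, API version and the
# AZURE_OPENAI_API_KEY variable are assumptions; adjust to your resource.
import json
import os
import urllib.error
import urllib.request
from urllib.parse import urlencode


def azure_embeddings_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Return {endpoint}/openai/deployments/{deployment}/embeddings?api-version=..."""
    # Azure routes by deployment name, which need not equal the model name.
    base = endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment}/embeddings?{query}"


def build_request(endpoint: str, deployment: str, api_version: str,
                  api_key: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request; key auth goes in the "api-key" header."""
    body = json.dumps({"input": texts}).encode("utf-8")
    return urllib.request.Request(
        azure_embeddings_url(endpoint, deployment, api_version),
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    api_key = os.environ.get("AZURE_OPENAI_API_KEY")  # hypothetical variable name
    req = build_request(
        endpoint="https://openai-south.openai.azure.com",
        deployment="text-embedding-3-large",  # assumed deployment name
        api_version="2024-02-01",             # assumed API version
        api_key=api_key or "xxxxxx",
        texts=["hello"],
    )
    print(req.full_url)
    if api_key:  # only hit the network when a real key is provided
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                data = json.load(resp)
            print("embedding dimension:", len(data["data"][0]["embedding"]))
        except urllib.error.HTTPError as err:
            # A 404 here points at the endpoint, deployment name or API
            # version, independent of DB-GPT.
            print("HTTP", err.code, err.read().decode("utf-8", "replace"))
```

Run without AZURE_OPENAI_API_KEY set, the script only prints the URL it would call. With the variable set, it sends the request and prints either the embedding dimension or Azure's error body. If this direct call also returns 404, the endpoint, deployment name or API version is the likely culprit; if it succeeds, the problem is more likely in how DB-GPT turns the proxy_azure / proxy_openai_* settings into its Azure request.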
