Can you show your embedding settings in .env?
These are my embedding settings in .env:
EMBEDDING_MODEL=proxy_azure
proxy_openai_proxy_server_url=https://openai-south.openai.azure.com
proxy_openai_proxy_api_key=xxxxxx
proxy_openai_proxy_backend=text-embedding-3-large
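Azure OpenAI returns 404 "Resource not found" when the request URL does not resolve to a deployment under the resource; a wrong deployment name or an unsupported api-version are common causes. The server URL in these settings is only the resource endpoint, so the deployment name and API version have to be supplied by the configuration in some other way. The sketch below assumes (not confirmed from DB-GPT's source) that the value of proxy_openai_proxy_backend is used as the Azure deployment name; parse_env and azure_embeddings_url are hypothetical helpers, and "2024-02-01" is a placeholder API version. It builds the embeddings URL that the Azure REST API expects, so it can be compared with the deployments listed for the resource in the Azure portal.

```python
from urllib.parse import urlencode, urlsplit


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env-style string, skipping blanks and comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def azure_embeddings_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Build the Azure OpenAI embeddings REST URL for one deployment.

    Azure routes requests by deployment name, not by model name, so a
    deployment that does not exist under this resource yields a 404.
    Only the scheme and host of `endpoint` are kept; any path is dropped.
    """
    parts = urlsplit(endpoint)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"endpoint must be an https URL, got {endpoint!r}")
    base = f"{parts.scheme}://{parts.netloc}"
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment}/embeddings?{query}"


env = parse_env("""
EMBEDDING_MODEL=proxy_azure
proxy_openai_proxy_server_url=https://openai-south.openai.azure.com
proxy_openai_proxy_api_key=xxxxxx
proxy_openai_proxy_backend=text-embedding-3-large
""")

# Assumption: the backend value doubles as the Azure deployment name, and
# "2024-02-01" is a placeholder API version; substitute your own values.
url = azure_embeddings_url(
    env["proxy_openai_proxy_server_url"],
    env["proxy_openai_proxy_backend"],
    "2024-02-01",
)
print(url)
```

Sending a POST to that URL outside DB-GPT, with an api-key header and a JSON body such as {"input": "test"}, is one way to tell whether the 404 comes from the Azure configuration itself or from how DB-GPT builds the request.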
Search before asking
Operating system information
Linux
Python version information
3.10
DB-GPT version
main
Related scenes
Installation Information
Installation From Source
Docker Installation
Docker Compose Installation
Cluster Installation
AutoDL Image
Other
Device information
Device: CPU
Models information
LLM: proxy LLM (Azure)
Embedding model: Azure
What happened
After creating a knowledge base and uploading a local file in vector mode, document embedding fails with the following error:
2025-02-08 16:38:01 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=[10] doc_type=None status=None page=1 page_size=20
current session:<sqlalchemy.orm.session.Session object at 0x7df3a024cbe0>
INFO: 127.0.0.1:51284 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
2025-02-08 16:38:04 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=[10] doc_type=None status=None page=1 page_size=20
current session:<sqlalchemy.orm.session.Session object at 0x7df3a024eda0>
INFO: 127.0.0.1:51284 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
2025-02-08 16:38:07 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=[10] doc_type=None status=None page=1 page_size=20
current session:<sqlalchemy.orm.session.Session object at 0x7df3a0237fa0>
INFO: 127.0.0.1:51284 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
2025-02-08 16:38:07 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] ERROR document embedding, failed:COEKB06 职业卫生档案管理规范.docx, Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
2025-02-08 16:38:10 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=[10] doc_type=None status=None page=1 page_size=20
current session:<sqlalchemy.orm.session.Session object at 0x7df3a0312d10>
INFO: 127.0.0.1:51284 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
/space/list params:
INFO: 127.0.0.1:51284 - "POST /knowledge/space/list HTTP/1.1" 200 OK
/space/list params:
INFO: 127.0.0.1:51284 - "POST /knowledge/space/list HTTP/1.1" 200 OK
/knowledge/space/arguments params:
2025-02-08 16:38:14 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=None doc_type=None status=None page=1 page_size=18
2025-02-08 16:38:14 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO /document/list params: 安全检查, doc_name=None doc_ids=None doc_type=None status=None page=1 page_size=18
INFO: 127.0.0.1:47876 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
INFO: 127.0.0.1:47888 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/list HTTP/1.1" 200 OK
INFO: 127.0.0.1:51284 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/arguments HTTP/1.1" 200 OK
2025-02-08 16:40:06 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO Received params: 安全检查, doc_ids=[10] model_name=None pre_separator=None separators=None chunk_size=None chunk_overlap=None
current session:<sqlalchemy.orm.session.Session object at 0x7df3a034c0d0>
2025-02-08 16:40:06 ubuntu-jianxin dbgpt.serve.rag.connector[949346] INFO VectorStore:<class 'dbgpt.storage.vector_store.chroma_store.ChromaStore'>
2025-02-08 16:40:06 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] INFO begin save document chunks, doc:COEKB06 职业卫生档案管理规范.docx
2025-02-08 16:40:10 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] INFO async doc persist sync, doc:COEKB06 职业卫生档案管理规范.docx
2025-02-08 16:40:21 ubuntu-jianxin dbgpt.util.api_utils[949346] WARNING Health check failed for http://127.0.0.1:5670, error: HTTPConnectionPool(host='127.0.0.1', port=5670): Read timed out. (read timeout=10)
INFO: 127.0.0.1:59078 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/sync HTTP/1.1" 200 OK
2025-02-08 16:40:43 ubuntu-jianxin dbgpt.rag.index.base[949346] INFO Loading 18 chunks in 2 groups with 1 threads.
2025-02-08 16:40:43 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:40:43 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:41:00 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] ERROR document embedding, failed:COEKB06 职业卫生档案管理规范.docx, Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
INFO: 127.0.0.1:43016 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/sync HTTP/1.1" 200 OK
2025-02-08 16:41:57 ubuntu-jianxin dbgpt.rag.index.base[949346] INFO Loading 18 chunks in 2 groups with 1 threads.
2025-02-08 16:41:57 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:41:57 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:42:07 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] ERROR document embedding, failed:COEKB06 职业卫生档案管理规范.docx, Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
2025-02-08 16:42:16 ubuntu-jianxin dbgpt.app.knowledge.api[949346] INFO Received params: 安全检查, doc_ids=[10] model_name=None pre_separator=None separators=None chunk_size=None chunk_overlap=None
current session:<sqlalchemy.orm.session.Session object at 0x7df3b1507c40>
2025-02-08 16:42:16 ubuntu-jianxin dbgpt.serve.rag.connector[949346] INFO VectorStore:<class 'dbgpt.storage.vector_store.chroma_store.ChromaStore'>
2025-02-08 16:42:16 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] INFO begin save document chunks, doc:COEKB06 职业卫生档案管理规范.docx
2025-02-08 16:42:16 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] INFO async doc persist sync, doc:COEKB06 职业卫生档案管理规范.docx
2025-02-08 16:42:32 ubuntu-jianxin dbgpt.util.api_utils[949346] WARNING Health check failed for http://127.0.0.1:5670, error: HTTPConnectionPool(host='127.0.0.1', port=5670): Read timed out. (read timeout=10)
INFO: 127.0.0.1:41878 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/sync HTTP/1.1" 200 OK
2025-02-08 16:42:46 ubuntu-jianxin dbgpt.rag.index.base[949346] INFO Loading 18 chunks in 2 groups with 1 threads.
2025-02-08 16:42:46 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:42:46 ubuntu-jianxin dbgpt.storage.vector_store.chroma_store[949346] INFO ChromaStore load document
2025-02-08 16:43:14 ubuntu-jianxin dbgpt.util.api_utils[949346] WARNING Health check failed for http://127.0.0.1:5670, error: HTTPConnectionPool(host='127.0.0.1', port=5670): Read timed out. (read timeout=10)
2025-02-08 16:43:14 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] ERROR document embedding, failed:COEKB06 职业卫生档案管理规范.docx, Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
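The log interleaves HTTP access lines with application records, which makes the failure pattern hard to see. A minimal sketch that keeps only WARNING and ERROR records follows; find_problems and LOG_RECORD are hypothetical names, and the record layout is inferred from the lines above rather than from DB-GPT's logging configuration. Run over the full log above, it should surface the four embedding 404 errors and the three health-check timeouts against 127.0.0.1:5670.

```python
import re
from collections import Counter

# Record layout inferred from the log above (an assumption, not DB-GPT's
# documented format): "<date> <time> <host> <logger>[<pid>] <LEVEL> <message>".
LOG_RECORD = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<logger>[\w.]+)\[\d+\] (?P<level>[A-Z]+) (?P<msg>.*)"
)


def find_problems(log_text: str) -> list[tuple[str, str, str, str]]:
    """Return (timestamp, level, logger, message) for each WARNING/ERROR record.

    Lines that do not match LOG_RECORD (e.g. HTTP access lines) are skipped.
    """
    found = []
    for line in log_text.splitlines():
        m = LOG_RECORD.match(line.strip())
        if m and m["level"] in {"WARNING", "ERROR"}:
            found.append((m["ts"], m["level"], m["logger"], m["msg"]))
    return found


sample = """\
2025-02-08 16:40:06 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] INFO begin save document chunks, doc:COEKB06 职业卫生档案管理规范.docx
2025-02-08 16:40:21 ubuntu-jianxin dbgpt.util.api_utils[949346] WARNING Health check failed for http://127.0.0.1:5670, error: HTTPConnectionPool(host='127.0.0.1', port=5670): Read timed out. (read timeout=10)
INFO: 127.0.0.1:59078 - "POST /knowledge/%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5/document/sync HTTP/1.1" 200 OK
2025-02-08 16:41:00 ubuntu-jianxin dbgpt.serve.rag.service.service[949346] ERROR document embedding, failed:COEKB06 职业卫生档案管理规范.docx, Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
"""

found = find_problems(sample)
# Keep only the embedding failures that carry the Azure 404.
embedding_404 = [p for p in found if p[1] == "ERROR" and "Error code: 404" in p[3]]
print(Counter(level for _, level, _, _ in found))
```

Applied to the whole log, this separates the two symptoms: the health-check timeouts are logged only as warnings, while every sync attempt ends with the same 404 from the embedding call.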
Additional context
I tried PDF, DOCX, and TXT files; all of them have this issue.
Are you willing to submit PR?