[Question]: RAPTOR Stage Stuck After Document Parsing #4173

Open
mayanlong2020 opened this issue Dec 23, 2024 · 1 comment
Labels
question Further information is requested

Comments


mayanlong2020 commented Dec 23, 2024

Describe your problem

I started parsing a document 10 hours ago. At first everything looked normal, but because I enabled the RAPTOR policy, once document parsing finished (at normal speed) the task got stuck at the RAPTOR stage and has made no progress since. Of the 10 hours, less than 1 hour was spent parsing the document; the rest of the time, up to now, has been spent stuck at the RAPTOR stage. CPU, memory, and disk are all idle, and there are no errors in the logs. What could be the reason?

RAGFlow version: v0.15.0 full (doc engine: es)

> Started at:
> Sun, 22 Dec 2024 23:57:01 GMT
> Duration:
> 36189.00 s
> Progress:
> Start to do RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).
> Task has been received.
> Page(1~13): OCR started
> Page(1~13): OCR finished (6.26s)
> Page(1~13): Layout analysis (8.27s)
> Page(1~13): Table analysis (0.00s)
> Page(1~13): Text extraction (0.01s)
> Page(1~13): Start to generate keywords for every chunk ...
> Page(1~13): Keywords generation completed in 55.18s
> Page(1~13): Start to generate questions for every chunk ...
> Page(1~13): Question generation completed in 71.36s
> Page(1~13): Generate 85 chunks
> Page(1~13): Embedding chunks (3.12s)
> Page(1~13): Done (2.58s)
> Task has been received.
> Page(13~25): OCR started
> Page(13~25): OCR finished (6.11s)
> Page(13~25): Layout analysis (7.83s)
> Page(13~25): Table analysis (0.00s)
> Page(13~25): Text extraction (0.01s)
> Page(13~25): Start to generate keywords for every chunk ...
> Page(13~25): Keywords generation completed in 57.58s
> Page(13~25): Start to generate questions for every chunk ...
> Page(13~25): Question generation completed in 80.04s
> Page(13~25): Generate 96 chunks
> Page(13~25): Embedding chunks (2.73s)
> Page(13~25): Done (2.44s)
> Task has been received.
> Page(25~37): OCR started
> Page(25~37): OCR finished (6.12s)
> Page(25~37): Layout analysis (7.96s)
> Page(25~37): Table analysis (0.00s)
> Page(25~37): Text extraction (0.01s)
> Page(25~37): Start to generate keywords for every chunk ...
> Page(25~37): Keywords generation completed in 45.91s
> Page(25~37): Start to generate questions for every chunk ...
> Page(25~37): Question generation completed in 58.98s
> Page(25~37): Generate 74 chunks
> Page(25~37): Embedding chunks (2.10s)
> Page(25~37): Done (1.38s)
> Task has been received.
> Page(37~49): OCR started
> Page(37~49): OCR finished (6.81s)
> Page(37~49): Layout analysis (8.68s)
> Page(37~49): Table analysis (0.00s)
> Page(37~49): Text extraction (0.01s)
> Page(37~49): Start to generate keywords for every chunk ...
> Page(37~49): Keywords generation completed in 51.67s
> Page(37~49): Start to generate questions for every chunk ...
> Page(37~49): Question generation completed in 67.50s
> Page(37~49): Generate 82 chunks
> Page(37~49): Embedding chunks (2.64s)
> Page(37~49): Done (2.83s)
> Task has been received.
> Page(49~59): OCR started
> Page(49~59): OCR finished (5.52s)
> Page(49~59): Layout analysis (6.60s)
> Page(49~59): Table analysis (0.00s)
> Page(49~59): Text extraction (0.01s)
> Page(49~59): Start to generate keywords for every chunk ...
> Page(49~59): Keywords generation completed in 59.46s
> Page(49~59): Start to generate questions for every chunk ...
> Page(49~59): Question generation completed in 70.61s
> Page(49~59): Generate 94 chunks
> Page(49~59): Embedding chunks (3.12s)
> Page(49~59): Done (1.81s)
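To check the claim that parsing took well under an hour, the step durations reported in the progress log above can be totalled per page-range task. This is a minimal Python sketch, assuming every timed step is reported either as `(N.NNs)` or as `completed in N.NNs` (the two forms that appear in this log). `step_seconds_by_task` and `SAMPLE` are hypothetical names used for illustration; they are not part of RAGFlow.

```python
import re
from collections import defaultdict

# Matches the "Page(1~13):" prefix; each page range is one parsing task.
# Assumption: log lines may keep the "> " quote prefix used in this issue.
TASK_RE = re.compile(r"^>?\s*Page\((\d+~\d+)\):")
# Matches a reported duration, either "(6.26s)" or "completed in 55.18s".
# "Page(1~13)" itself does not match because of the "~".
DURATION_RE = re.compile(r"\((\d+(?:\.\d+)?)s\)|completed in (\d+(?:\.\d+)?)s")


def step_seconds_by_task(log_text: str) -> dict[str, float]:
    """Sum the per-step durations reported for each page-range task."""
    totals: dict[str, float] = defaultdict(float)
    for line in log_text.splitlines():
        task = TASK_RE.match(line.strip())
        if not task:
            continue  # e.g. "Task has been received." or the RAPTOR line
        # findall yields (paren_value, completed_value); exactly one is non-empty.
        for paren, completed in DURATION_RE.findall(line):
            totals[task.group(1)] += float(paren or completed)
    return dict(totals)


# Excerpt of the first two tasks from the progress log above.
SAMPLE = """\
> Task has been received.
> Page(1~13): OCR finished (6.26s)
> Page(1~13): Layout analysis (8.27s)
> Page(1~13): Table analysis (0.00s)
> Page(1~13): Text extraction (0.01s)
> Page(1~13): Keywords generation completed in 55.18s
> Page(1~13): Question generation completed in 71.36s
> Page(1~13): Generate 85 chunks
> Page(1~13): Embedding chunks (3.12s)
> Page(1~13): Done (2.58s)
> Task has been received.
> Page(13~25): OCR finished (6.11s)
> Page(13~25): Layout analysis (7.83s)
> Page(13~25): Table analysis (0.00s)
> Page(13~25): Text extraction (0.01s)
> Page(13~25): Keywords generation completed in 57.58s
> Page(13~25): Question generation completed in 80.04s
> Page(13~25): Generate 96 chunks
> Page(13~25): Embedding chunks (2.73s)
> Page(13~25): Done (2.44s)
"""

totals = step_seconds_by_task(SAMPLE)
print({task: round(secs, 2) for task, secs in totals.items()})
# {'1~13': 146.78, '13~25': 156.74}
```

Applied to the full log, the five tasks add up to about 713 s (roughly 12 minutes) of reported step time, against a total duration of 36189 s. This ignores any queueing time between tasks, so it is only a rough check, but it is consistent with nearly all of the elapsed time being spent after parsing finished.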

Additionally, according to the processing flow, RAPTOR should be the last step to run. However, the RAPTOR message from this final stage appears at the top of the progress log instead of at the end. Is this expected, or is it a bug?

mayanlong2020 added the question label (Further information is requested) on Dec 23, 2024
KevinHuSh added a commit that referenced this issue Dec 23, 2024
### What problem does this PR solve?

#4173

### Type of change

- [x] Performance Improvement
mayanlong2020 (Author)

thanks & cheers~

baifachuan pushed a commit to baifachuan/ragflow that referenced this issue Dec 26, 2024
### What problem does this PR solve?

infiniflow#4173

### Type of change

- [x] Performance Improvement