failed after adding gunicorn #94

Open
CharlieZJG opened this issue Nov 15, 2024 · 1 comment

@CharlieZJG

I have a custom Dockerfile.custom that builds an image on top of the base image. When I add the gunicorn configuration, the program crashes. Specifically, I see the following error logs.
First log:
2024-11-15 09:37:26 2024-11-15 14:37:26,047 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar.md5 to /tmp/tika-server.jar.md5.
2024-11-15 09:37:26 2024-11-15 14:37:26,049 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar.md5 to /tmp/tika-server.jar.md5.
2024-11-15 09:37:26 2024-11-15 14:37:26,052 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar.md5 to /tmp/tika-server.jar.md5.
2024-11-15 09:37:26 2024-11-15 14:37:26,054 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0.jar.md5 to /tmp/tika-server.jar.md5.
2024-11-15 09:37:26 2024-11-15 14:37:26,565 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:26 2024-11-15 14:37:26,565 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:26 2024-11-15 14:37:26,566 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:31 2024-11-15 14:37:31,499 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:31 2024-11-15 14:37:31,567 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:31 2024-11-15 14:37:31,567 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:31 2024-11-15 14:37:31,567 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:35 [2024-11-15 14:37:35 +0000] [1] [ERROR] Worker (pid:79) exited with code 255
2024-11-15 09:37:35 [2024-11-15 14:37:35 +0000] [1] [ERROR] Worker (pid:79) exited with code 255.
2024-11-15 09:37:35 [2024-11-15 14:37:35 +0000] [1] [ERROR] Worker (pid:81) exited with code 255
2024-11-15 09:37:35 [2024-11-15 14:37:35 +0000] [1] [ERROR] Worker (pid:81) exited with code 255.
2024-11-15 09:37:36 2024-11-15 14:37:36,505 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:36 2024-11-15 14:37:36,568 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:36 2024-11-15 14:37:36,568 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:36 2024-11-15 14:37:36,569 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:40 [2024-11-15 14:37:40 +0000] [1] [ERROR] Worker (pid:622) exited with code 255
2024-11-15 09:37:40 [2024-11-15 14:37:40 +0000] [1] [ERROR] Worker (pid:622) exited with code 255.
2024-11-15 09:37:41 2024-11-15 14:37:41,513 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2024-11-15 09:37:41 2024-11-15 14:37:41,571 [MainThread ] [ERROR] Tika startup log message not received after 3 tries.
2024-11-15 09:37:41 2024-11-15 14:37:41,572 [MainThread ] [ERROR] Tika startup log message not received after 3 tries.
2024-11-15 09:37:41 2024-11-15 14:37:41,573 [MainThread ] [ERROR] Tika startup log message not received after 3 tries.
2024-11-15 09:37:41 2024-11-15 14:37:41,573 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.
2024-11-15 09:37:41 2024-11-15 14:37:41,574 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.
2024-11-15 09:37:41 2024-11-15 14:37:41,575 [MainThread ] [ERROR] Failed to receive startup confirmation from startServer.
2024-11-15 09:37:46 [2024-11-15 14:37:46 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:10)
2024-11-15 09:37:46 [2024-11-15 14:37:46 +0000] [10] [INFO] Worker exiting (pid: 10)
2024-11-15 09:37:46 /app/nlm_ingestor/ingestion_daemon/__main__.py:upload_complete_time: 1731681436.0297816, process_start_time: 1731681436.0297854, initial_memory: 338309120, initial_cpu: 0.1
2024-11-15 09:37:46 [2024-11-15 14:37:46 +0000] [1] [ERROR] Worker (pid:10) exited with code 1
2024-11-15 09:37:46 [2024-11-15 14:37:46 +0000] [1] [ERROR] Worker (pid:10) exited with code 1.
2024-11-15 09:37:46 [2024-11-15 14:37:46 +0000] [1209] [INFO] Booting worker with pid: 1209

SECOND ERROR:
2024-11-15 09:38:35 /app/nlm_ingestor/ingestion_daemon/__main__.py:upload_complete_time: 1731681436.0395405, process_start_time: 1731681436.0395412, initial_memory: 337965056, initial_cpu: 0.1
2024-11-15 09:38:35 error uploading file, stacktrace: Traceback (most recent call last):
2024-11-15 09:38:35   File "/app/nlm_ingestor/ingestion_daemon/__main__.py", line 64, in parse_document
2024-11-15 09:38:35     return_dict, _ = ingestor_api.ingest_document(
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/app/nlm_ingestor/ingestor/ingestor_api.py", line 37, in ingest_document
2024-11-15 09:38:35     ingestor = pdf_ingestor.PDFIngestor(doc_location, parse_options)
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/app/nlm_ingestor/ingestor/pdf_ingestor.py", line 33, in __init__
2024-11-15 09:38:35     tika_html_doc = parse_pdf(doc_location, parse_options)
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/app/nlm_ingestor/ingestor/pdf_ingestor.py", line 61, in parse_pdf
2024-11-15 09:38:35     parsed_content = pdf_file_parser.parse_to_html(doc_location)
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/app/nlm_ingestor/file_parser/tika_parser.py", line 34, in parse_to_html
2024-11-15 09:38:35     return parser.from_file(filepath, xmlContent=True, requestOptions={'headers': headers, 'timeout': timeout})
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/usr/local/lib/python3.11/site-packages/tika/parser.py", line 42, in from_file
2024-11-15 09:38:35     output = parse1(service, filename, serverEndpoint, services={'meta': '/meta', 'text': '/tika', 'all': '/rmeta/xml'},
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/usr/local/lib/python3.11/site-packages/tika/tika.py", line 337, in parse1
2024-11-15 09:38:35     status, response = callServer('put', serverEndpoint, service, f,
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/usr/local/lib/python3.11/site-packages/tika/tika.py", line 532, in callServer
2024-11-15 09:38:35     serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath, config_path)
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/usr/local/lib/python3.11/site-packages/tika/tika.py", line 602, in checkTikaServer
2024-11-15 09:38:35     raise RuntimeError("Unable to start Tika server.")
2024-11-15 09:38:35 RuntimeError: Unable to start Tika server.

THIRD ERROR:
2024-11-15 09:38:35 error uploading file, stacktrace: Traceback (most recent call last):
2024-11-15 09:38:35   File "/app/nlm_ingestor/ingestion_daemon/__main__.py", line 64, in parse_document
2024-11-15 09:38:35     return_dict, _ = ingestor_api.ingest_document(
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/app/nlm_ingestor/ingestor/ingestor_api.py", line 37, in ingest_document
2024-11-15 09:38:35     ingestor = pdf_ingestor.PDFIngestor(doc_location, parse_options)
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/app/nlm_ingestor/ingestor/pdf_ingestor.py", line 35, in __init__
2024-11-15 09:38:35     blocks, _block_texts, _sents, _file_data, result, page_dim, num_pages = parse_blocks(
2024-11-15 09:38:35     ^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/app/nlm_ingestor/ingestor/pdf_ingestor.py", line 176, in parse_blocks
2024-11-15 09:38:35     parsed_doc = visual_ingestor.Doc(pages, ignore_blocks, render_format)
2024-11-15 09:38:35     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-11-15 09:38:35   File "/app/nlm_ingestor/ingestor/visual_ingestor/visual_ingestor.py", line 117, in __init__
2024-11-15 09:38:35     self.parse(pages)
2024-11-15 09:38:35   File "/app/nlm_ingestor/ingestor/visual_ingestor/visual_ingestor.py", line 155, in parse
2024-11-15 09:38:35     page_style = pages[page_idx].attrs.get("style", None) or pages[0].attrs["style"]
2024-11-15 09:38:35     ~~~~~~~~~~~~~~^^^^^^^^^
2024-11-15 09:38:35 KeyError: 'style'
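
For readers following the third traceback, here is a minimal sketch of when the expression on line 155 of visual_ingestor.py raises. It uses `SimpleNamespace` objects as stand-ins for the parsed page elements, and the style string is a made-up placeholder; `page_style_safe` is a hypothetical defensive variant, not code from nlm-ingestor. The sketch only shows the failure condition (neither the current page nor the first page carries a `style` attribute) and does not establish why the attribute is missing in this setup.

```python
from types import SimpleNamespace


def page_style(pages, page_idx):
    # Same expression as in the traceback: fall back to the first page's style.
    return pages[page_idx].attrs.get("style", None) or pages[0].attrs["style"]


def page_style_safe(pages, page_idx, default=""):
    # Hypothetical variant: never indexes "style" directly, so it cannot raise KeyError.
    return pages[page_idx].attrs.get("style") or pages[0].attrs.get("style", default)


# Stand-ins for parsed page elements; the style value is a placeholder.
styled = SimpleNamespace(attrs={"style": "width:612px;height:792px"})
bare = SimpleNamespace(attrs={})

# A page without its own style falls back to the first page's style.
fallback = page_style([styled, bare], 1)

# If the first page has no style either, the direct index raises KeyError.
try:
    page_style([bare, bare], 1)
    raised = False
except KeyError:
    raised = True

safe_result = page_style_safe([bare, bare], 1)
```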

Another PR suggests that switching the base image to v0.1.6 resolves the KeyError: 'style' error, but that did not work for me.
Here is the Dockerfile.custom that causes the crash:

FROM ghcr.io/nlmatics/nlm-ingestor:v0.1.6
RUN pip install psutil gunicorn waitress

WORKDIR /app

ENV PYTHONPATH="/app"

COPY tika-server-standard-2.6.0.jar /tmp/tika-server.jar

CMD ["gunicorn", "-w", "4", "--threads", "1", "--bind", "0.0.0.0:5001", "nlm_ingestor.ingestion_daemon.__main__:app"]

Note that without this line: CMD ["gunicorn", "-w", "4", "--threads", "1", "--bind", "0.0.0.0:5001", "nlm_ingestor.ingestion_daemon.__main__:app"]
everything works fine.
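
The first log shows the "Retrieving ... tika-server-standard-2.6.0.jar.md5" message four times within a few milliseconds, which matches the four gunicorn workers each trying to start Tika on their own, followed by a WORKER TIMEOUT. One thing that could be tried, as an unconfirmed sketch, is a gunicorn config file that keeps the same worker settings as the CMD above, raises the worker timeout above gunicorn's 30-second default, and tells the tika Python client to use an already running Tika server instead of launching one per worker. The endpoint URL and the existence of a separately started Tika server are assumptions, and the timeout value is arbitrary; `TIKA_CLIENT_ONLY` and `TIKA_SERVER_ENDPOINT` are environment variables read by the tika Python package.

```python
# gunicorn.conf.py (hypothetical file name and location)

# Same values as the flags in the CMD line.
bind = "0.0.0.0:5001"
workers = 4
threads = 1

# gunicorn kills a worker that stays silent longer than this many seconds
# (default 30). 120 is an arbitrary, more generous value.
timeout = 120

# Environment passed to the workers as KEY=VALUE strings.
# Assumption: a Tika server is already listening on localhost:9998,
# started separately from gunicorn.
raw_env = [
    "TIKA_CLIENT_ONLY=True",
    "TIKA_SERVER_ENDPOINT=http://localhost:9998",
]
```

With such a file in place, the CMD would become gunicorn -c gunicorn.conf.py nlm_ingestor.ingestion_daemon.__main__:app. Whether this resolves the crash depends on how the base image starts Tika, which this thread does not show.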

@lav-sharma-cerelabs

Hello @CharlieZJG, can you let me know what you are trying to achieve here? Also, I had to manually re-indent the errors, as they are not readable in the form you posted.
