vLLM terminated unexpectedly at the multi-gpu environment #512
Hello @gnekt, first of all, thanks for the report. About the termination of the vLLM module: it could be an OOM error. Did you check your VRAM status?
---

CPU: AMD EPYC 7402P (48) @ 2.800GHz

This is our config:

```yaml
# This config YAML file does not contain any optimization.
node_lines:
  - node_line_name: retrieve_node_line  # Arbitrary node line name
    nodes:
      - node_type: retrieval
        strategy:
          metrics: [retrieval_f1, retrieval_recall, retrieval_precision]
          top_k: 3
        modules:
          - module_type: vectordb
            embedding_model: huggingface_baai_bge_small
  - node_line_name: post_retrieve_node_line  # Arbitrary node line name
    nodes:
      - node_type: prompt_maker
        strategy:
          metrics: [bleu, meteor, rouge]
        modules:
          - module_type: fstring
            prompt: "Read the passages and answer the given question. \n Question: {query} \n Passage: {retrieved_contents} \n Answer : "
      - node_type: generator
        strategy:
          metrics: [bleu, meteor, rouge]
        modules:
          - module_type: vllm
            llm: mistralai/Mistral-7B-Instruct-v0.2
```

We checked the VRAM status, and it uses only about 1 GB for the embedding computation, so an OOM error seems unlikely.
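To rule OOM in or out more rigorously, one option is to poll `nvidia-smi` during the run and check how close each GPU is to its VRAM limit. Below is a minimal sketch; the helper names and the 90% threshold are illustrative, not part of AutoRAG or vLLM:

```python
# Sketch: parse the CSV output of
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
# and report which GPUs are close to their VRAM limit.

def parse_gpu_memory(csv_text):
    """Return a list of (gpu_index, used_mib, total_mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        idx, used, total = (field.strip() for field in line.split(","))
        gpus.append((int(idx), int(used), int(total)))
    return gpus

def nearly_full(gpus, threshold=0.9):
    """Return indices of GPUs whose used/total ratio exceeds the threshold."""
    return [idx for idx, used, total in gpus if used / total > threshold]

if __name__ == "__main__":
    # Synthetic sample output (two 24 GiB GPUs, the second nearly full).
    sample = "0, 1024, 24576\n1, 23500, 24576"
    gpus = parse_gpu_memory(sample)
    print(nearly_full(gpus))  # -> [1]
```

In practice you would feed the function the real output of `subprocess.run(["nvidia-smi", ...])` in a loop while AutoRAG runs; a sudden process death right as a GPU approaches 100% usage would point strongly at OOM.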
---

This is the last logger output when executing AutoRAG:

```
UserWarning: This pandas object has duplicate indices, and swifter may not be able to improve performance. Consider resetting the indices with `df.reset_index(drop=True)`.
  warnings.warn(
[06/20/24 08:39:54] INFO  [evaluator.py:97] >> Running node line post_retrieve_node_line...  evaluator.py:97
                    INFO  [node.py:55] >> Running node prompt_maker...                       node.py:55
                    INFO  [base.py:20] >> Running prompt maker node - fstring module...      base.py:20
                    INFO  [node.py:55] >> Running node generator...                          node.py:55
                    INFO  [base.py:34] >> Running generator node - vllm module...            base.py:34
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
```
---

Hi @gnekt, I checked the Eli5 and TriviaQA datasets, and they had no missing doc_id in the corpus dataset, so unfortunately I can't reproduce the error you mentioned. @gnekt @CristianCosci Also, about the vLLM termination: I can't reproduce that either. I suspect it is the vLLM install environment (it worked fine on our system). Maybe re-installing vLLM to the latest version could help. My vllm version is 0.4.3.
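For anyone wanting to repeat the missing-doc_id check on their own data: the idea is to collect every doc_id referenced by the QA dataset's retrieval ground truth and verify it exists in the corpus. The sketch below uses synthetic data; with real AutoRAG datasets you would load `qa.parquet` / `corpus.parquet` (e.g. with pandas) and pass the `retrieval_gt` and `doc_id` columns instead:

```python
# Sketch: find doc_ids referenced in retrieval_gt that are absent from the corpus.
# retrieval_gt follows AutoRAG's 2-D ground-truth format: per query, a list of
# groups, each group a list of doc_ids.

def missing_doc_ids(retrieval_gt, corpus_doc_ids):
    """Return the set of referenced doc_ids not present in the corpus."""
    corpus = set(corpus_doc_ids)
    missing = set()
    for gt in retrieval_gt:          # one entry per query
        for group in gt:             # one group of doc_ids
            for doc_id in group:
                if doc_id not in corpus:
                    missing.add(doc_id)
    return missing

if __name__ == "__main__":
    retrieval_gt = [[["d1", "d2"]], [["d3"]]]
    corpus_doc_ids = ["d1", "d2"]
    print(missing_doc_ids(retrieval_gt, corpus_doc_ids))  # -> {'d3'}
```

An empty result means every ground-truth document is present; a non-empty result identifies exactly which documents need to be cleaned out or added back to the corpus.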
---

It looks like there is an error in the multi-GPU environment...
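If the crash only shows up with multiple GPUs, one thing worth trying while debugging is constraining vLLM explicitly. vLLM's `LLM` constructor accepts `tensor_parallel_size` and `gpu_memory_utilization`; the fragment below assumes AutoRAG's vllm module forwards these extra keys to that constructor, so treat it as a sketch rather than a confirmed fix:

```yaml
# Sketch: pin vLLM to a single GPU and cap memory usage in the generator module.
# Assumes extra keys are passed through to vLLM's LLM(...) constructor.
- node_type: generator
  strategy:
    metrics: [bleu, meteor, rouge]
  modules:
    - module_type: vllm
      llm: mistralai/Mistral-7B-Instruct-v0.2
      tensor_parallel_size: 1        # run on one GPU to isolate the multi-GPU issue
      gpu_memory_utilization: 0.8    # leave headroom to rule out OOM
```

If the run succeeds with `tensor_parallel_size: 1` but dies with a larger value, that narrows the problem to vLLM's multi-GPU (NCCL/distributed) path rather than the model or the data.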
---

I finally got a multi-GPU environment, so I can try to reproduce the bug and resolve it.

---

Because of a CUDA-related issue, we lost our multi-GPU environment again 😭

---

No problem 😃
---

Hello,

While using the ELI5 and TriviaQA datasets from the Hugging Face library, I encountered errors about missing documents that are not present in the corpus. I had a similar issue with the HotpotQA dataset but managed to resolve it by cleaning out the mismatched documents.

However, when I switched to the HotpotQA dataset, I observed some strange behavior with the vLLM module. Specifically, the process terminates without any error messages.

Thank you in advance.