vLLM terminated unexpectedly at the multi-gpu environment #512

Open
gnekt opened this issue Jun 20, 2024 · 8 comments
Labels: bug, help wanted, High Priority


@gnekt (Contributor) commented Jun 20, 2024

Hello,

While using the ELI5 and TriviaQA datasets from the Hugging Face library, I encountered errors about documents referenced in the QA data that are missing from the corpus. I had a similar issue with the HotpotQA dataset, but managed to resolve it by cleaning out the mismatched documents.

However, when running the HotpotQA dataset, I observed some strange behavior with the vLLM module: the process terminates without any error message.

Thank you in advance.

@vkehfdl1 (Contributor)

Hello @gnekt, first of all, thanks for the report.
We'll check the ELI5 and TriviaQA datasets first. @bwook00 and @Eastsidegunn will help with this as well.

About the termination of the vLLM module: it could be an OOM error. Did you check your VRAM status?
Also, which file did you use, and on what system did you run each dataset?
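
For reference, one quick way to watch VRAM usage while the pipeline runs is to poll nvidia-smi from a side script. This is only a generic monitoring sketch, not something built into AutoRAG or vLLM:

```python
import subprocess
import time

# Poll nvidia-smi once per second and print per-GPU memory usage.
# Generic diagnostic sketch; not part of AutoRAG or vLLM.
while True:
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    print(result.stdout.strip())
    time.sleep(1)
```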

@CristianCosci

> Also, which file did you use, and on what system did you run each dataset?

CPU: AMD EPYC 7402P (48) @ 2.800GHz
GPU0: NVIDIA GeForce RTX 3090
GPU1: NVIDIA GeForce RTX 3090
Memory: 3042MiB / 128667MiB

> About the termination of the vLLM module: it could be an OOM error. Did you check your VRAM status?

This is our config.yaml:

# This config YAML file does not contain any optimization.
node_lines:
- node_line_name: retrieve_node_line  # Arbitrary node line name
  nodes:
    - node_type: retrieval
      strategy:
        metrics: [retrieval_f1, retrieval_recall, retrieval_precision]
      top_k: 3
      modules:
        - module_type: vectordb
          embedding_model: huggingface_baai_bge_small
- node_line_name: post_retrieve_node_line  # Arbitrary node line name
  nodes:
    - node_type: prompt_maker
      strategy:
        metrics: [bleu, meteor, rouge]
      modules:
        - module_type: fstring
          prompt: "Read the passages and answer the given question. \n Question: {query} \n Passage: {retrieved_contents} \n Answer : "
    - node_type: generator
      strategy:
        metrics: [bleu, meteor, rouge]
      modules:
        - module_type: vllm
          llm: mistralai/Mistral-7B-Instruct-v0.2

We checked the VRAM status and the embedding computation uses only about 1 GB, so an OOM error seems unlikely.
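
For reference, a standalone way to check whether the termination comes from vLLM itself rather than from AutoRAG would be to load the same model with the vLLM Python API directly. This is only a sketch; the tensor_parallel_size and gpu_memory_utilization values are assumptions for the 2x RTX 3090 machine, not values from the config above:

```python
from vllm import LLM, SamplingParams

# Standalone vLLM smoke test, outside AutoRAG.
# tensor_parallel_size=2 and gpu_memory_utilization=0.8 are assumed values
# for a 2x RTX 3090 setup, not taken from the config above.
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    tensor_parallel_size=2,
    gpu_memory_utilization=0.8,
)
params = SamplingParams(max_tokens=64)
outputs = llm.generate(["Read the passage and answer the given question."], params)
print(outputs[0].outputs[0].text)
```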

@CristianCosci

This is the last logger output when executing autorag:

UserWarning: This pandas object has duplicate indices, and swifter may not be able to improve performance. Consider resetting the indices with `df.reset_index(drop=True)`.
  warnings.warn(
[06/20/24 08:39:54] INFO     [evaluator.py:97] >> Running node line post_retrieve_node_line...                                    evaluator.py:97
                    INFO     [node.py:55] >> Running node prompt_maker...                                                              node.py:55
                    INFO     [base.py:20] >> Running prompt maker node - fstring module...                                             base.py:20
                    INFO     [node.py:55] >> Running node generator...                                                                 node.py:55
                    INFO     [base.py:34] >> Running generator node - vllm module...                                                   base.py:34
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
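
Side note: the TOKENIZERS_PARALLELISM setting that the warning mentions can be set before any tokenizer is used, e.g. as below. This is only a sketch to silence the fork warning; as far as we can tell it is unrelated to the termination itself:

```python
import os

# Disable Hugging Face tokenizers parallelism before any tokenizer is loaded,
# as the warning above suggests. This only silences the fork warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```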

@vkehfdl1 added the bug label on Jun 21, 2024
@vkehfdl1 changed the title from "Sample Datasets Error" to "vLLM terminated unexpectedly" on Jun 22, 2024
@vkehfdl1 (Contributor)

Hi @gnekt, I checked the ELI5 and TriviaQA datasets and found no missing doc_id in the corpus dataset. Unfortunately, I can't reproduce the error you mentioned.

@gnekt @CristianCosci Also, about the vLLM termination: I can't reproduce this error either. Could it be the vLLM install environment? (It worked fine on our system.)

Maybe reinstalling vLLM to the latest version would help. My vllm version is 0.4.3.
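
To compare versions quickly, one can print the installed version and upgrade with pip if it is older. A generic sketch, not an AutoRAG command:

```python
# Print the locally installed vLLM version to compare against 0.4.3.
import vllm

print(vllm.__version__)

# To upgrade from a shell:
#   pip install --upgrade vllm
```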

@vkehfdl1 (Contributor) commented Aug 5, 2024

It looks like there is an error in the multi-GPU environment...
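
If anyone wants to test this hypothesis in the meantime, one possible diagnostic (an assumption, not a confirmed fix) is to pin the run to a single GPU before vLLM or torch is imported, and see whether the termination still occurs:

```python
import os

# Restrict the process to GPU 0 before vLLM / torch are imported.
# Diagnostic sketch to check whether the crash is multi-GPU specific;
# not a confirmed fix.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```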

@vkehfdl1 changed the title from "vLLM terminated unexpectedly" to "vLLM terminated unexpectedly at the multi-gpu environment" on Aug 5, 2024
@vkehfdl1 added the help wanted label on Aug 15, 2024
@vkehfdl1 (Contributor)

I finally got access to a multi-GPU environment, so I can try to reproduce the bug and resolve it.

@vkehfdl1 (Contributor)

Because of a CUDA-related issue, we lost our multi-GPU environment again 😭
We need more time to resolve this.

@CristianCosci

No problem 😃
