Got stuck when evaluating MMLU #1

Open
zhangliang-04 opened this issue Dec 7, 2023 · 4 comments

Comments


zhangliang-04 commented Dec 7, 2023

Thanks for open-sourcing this! I'm trying to evaluate Llama-7b-hf on mmlu-fr. A warning of "Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors" appears, and the process seems to be stuck. Here is the call stack after a keyboard interrupt:

Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors
^CTraceback (most recent call last):
  File "/data2/zl/code/mlmm-evaluation/main.py", line 135, in <module>
    main()
  File "/data2/zl/code/mlmm-evaluation/main.py", line 108, in main
    results = evaluator.open_llm_evaluate(
  File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
    return fn(*args, **kwargs)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 79, in open_llm_evaluate
    results = evaluate(
  File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
    return fn(*args, **kwargs)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 262, in evaluate
    resps = getattr(lm, reqtype)([req.args for req in reqs])
  File "/data2/zl/code/mlmm-evaluation/lm_eval/base.py", line 181, in loglikelihood
    context_enc = self.tok_encode(context)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/models/huggingface.py", line 361, in tok_encode
    return self.tokenizer.encode(string, add_special_tokens=self.add_special_tokens)
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2569, in encode
    encoded_inputs = self.encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2977, in encode_plus
    return self._encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 576, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 504, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
KeyboardInterrupt

It seems the process is stuck in the batched tokenization. How should I deal with this?


omarnj-lab commented Jan 3, 2024

Is your problem solved? Please let me know, since I am dealing with the same issue.


usernameisintentionallyhidden commented Jan 28, 2024

You can limit the number of tokens fed to the model so that it matches the model's maximum token length. This is a problem with lm_eval_harness rather than with the additional tasks.
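
A minimal sketch of how that truncation could be done with a Hugging Face tokenizer (the checkpoint path, MAX_LENGTH value, and helper name are placeholders, not code from this repo; the natural place to hook this in would be the tok_encode call shown in the traceback above):

from transformers import AutoTokenizer

MAX_LENGTH = 4096  # assumed context window of the Llama-7b checkpoint

# Placeholder path: point this at the checkpoint you are evaluating.
tokenizer = AutoTokenizer.from_pretrained("path/to/Llama-7b-hf")

def encode_truncated(text: str) -> list[int]:
    # Encode without special tokens, mirroring the tok_encode call in the
    # traceback, then keep only the last MAX_LENGTH token ids.
    ids = tokenizer.encode(text, add_special_tokens=False)
    return ids[-MAX_LENGTH:]

ids = encode_truncated("very long few-shot MMLU prompt ...")
assert len(ids) <= MAX_LENGTH

Truncating from the left keeps the end of the prompt (the question and answer choices) and drops the earliest few-shot examples first, which is usually the least harmful place to lose context.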


dame-cell commented Mar 9, 2024

> You can limit the number of tokens fed to the model so that it matches the model's maximum token length. This is a problem with lm_eval_harness rather than with the additional tasks.

How do we do this exactly?

@dame-cell

> Thanks for open-sourcing this! I'm trying to evaluate Llama-7b-hf on mmlu-fr. [...] It seems the process is stuck in the batched tokenization. How should I deal with this?

Did you fix this?
