Got stuck when evaluating MMLU #1

Open
zhangliang-04 opened this issue Dec 7, 2023 · 4 comments

Comments


zhangliang-04 commented Dec 7, 2023

Thanks for open-sourcing this! I'm trying to evaluate Llama-7b-hf on mmlu-fr. A warning of "Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors" appears, and the process seems to be stuck. Here is the call stack after a keyboard interrupt:

Token indices sequence length is longer than the specified maximum sequence length for this model (5023 > 4096). Running this sequence through the model will result in indexing errors
^CTraceback (most recent call last):
  File "/data2/zl/code/mlmm-evaluation/main.py", line 135, in <module>
    main()
  File "/data2/zl/code/mlmm-evaluation/main.py", line 108, in main
    results = evaluator.open_llm_evaluate(
  File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
    return fn(*args, **kwargs)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 79, in open_llm_evaluate
    results = evaluate(
  File "/data2/zl/code/mlmm-evaluation/lm_eval/utils.py", line 205, in _wrapper
    return fn(*args, **kwargs)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/evaluator.py", line 262, in evaluate
    resps = getattr(lm, reqtype)([req.args for req in reqs])
  File "/data2/zl/code/mlmm-evaluation/lm_eval/base.py", line 181, in loglikelihood
    context_enc = self.tok_encode(context)
  File "/data2/zl/code/mlmm-evaluation/lm_eval/models/huggingface.py", line 361, in tok_encode
    return self.tokenizer.encode(string, add_special_tokens=self.add_special_tokens)
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2569, in encode
    encoded_inputs = self.encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2977, in encode_plus
    return self._encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 576, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "/opt/conda/envs/lm_eval/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 504, in _batch_encode_plus
    encodings = self._tokenizer.encode_batch(
KeyboardInterrupt

It seems the process is stuck in the batched tokenization. How should I deal with this?


omarnj-lab commented Jan 3, 2024

Is your problem solved? Please let me know, since I am dealing with the same issue.


usernameisintentionallyhidden commented Jan 28, 2024

You can limit the number of tokens fed to the model so that it matches the model's maximum token length. This is a problem with lm_eval_harness rather than with the additional tasks.
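
A minimal sketch of how that truncation could be done with a Hugging Face tokenizer (the checkpoint path, MAX_LENGTH value, and helper name are placeholders, not code from this repo; the natural place to hook this in would be the tok_encode call shown in the traceback above):

from transformers import AutoTokenizer

MAX_LENGTH = 4096  # assumed context window of the Llama-7b checkpoint

# Placeholder path: point this at the checkpoint you are evaluating.
tokenizer = AutoTokenizer.from_pretrained("path/to/Llama-7b-hf")

def encode_truncated(text: str) -> list[int]:
    # Encode without special tokens, mirroring the tok_encode call in the
    # traceback, then keep only the last MAX_LENGTH token ids.
    ids = tokenizer.encode(text, add_special_tokens=False)
    return ids[-MAX_LENGTH:]

ids = encode_truncated("very long few-shot MMLU prompt ...")
assert len(ids) <= MAX_LENGTH

Truncating from the left keeps the end of the prompt (the question and answer choices) and drops the earliest few-shot examples first, which is usually the least harmful place to lose context.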


dame-cell commented Mar 9, 2024

> You can limit the number of tokens fed to the model so that it matches the model's maximum token length. This is a problem with lm_eval_harness rather than with the additional tasks.

How do we do this exactly?

@dame-cell

> Thanks for open-sourcing this! I'm trying to evaluate Llama-7b-hf on mmlu-fr. [...] It seems the process is stuck in the batched tokenization. How should I deal with this?

Did you fix this?
