
Using the humanevalpack to test the ChatGLM3 model results in an abnormal score. #251

Open
burger-pb opened this issue Jul 5, 2024 · 0 comments


burger-pb commented Jul 5, 2024

Hi,
When I tried to evaluate the ChatGLM3 model on the humanevalfixdocs-python task from humanevalpack, I got an abnormal score of 0. The command I used is as follows.

accelerate launch main.py \
  --model THUDM/chatglm3-6b \
  --left_padding \
  --tasks humanevalfixdocs-python \
  --max_length_generation 2048 \
  --prompt chatglm3 \
  --trust_remote_code \
  --temperature 0.7 \
  --do_sample True \
  --n_samples 1 \
  --batch_size 64 \
  --precision bf16 \
  --allow_code_execution \
  --save_generations

The prompt I used is as follows.
elif self.prompt == "chatglm3":
    prompt = f"<|user|>{inp}<|assistant|>{prompt_base}"
else:
    raise ValueError(f"The --prompt argument {self.prompt} wasn't provided or isn't supported")
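
For reference, here is a minimal sketch to compare the hand-built string against whatever chat formatting the model itself ships. It assumes a recent transformers version and that the THUDM/chatglm3-6b tokenizer exposes a chat template (both assumptions); the official ChatGLM3 format is generally reported to put a newline after the role tags, so a missing newline in the hand-built prompt is one thing worth ruling out.

from transformers import AutoTokenizer

# Assumption: the chatglm3-6b tokenizer can be used with apply_chat_template.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

inp = "Fix the bug in the following function ..."  # placeholder instruction

# What the model's own chat formatting produces for a single user turn.
reference = tokenizer.apply_chat_template(
    [{"role": "user", "content": inp}],
    tokenize=False,
    add_generation_prompt=True,
)

# The hand-built prompt from the snippet above (without prompt_base).
manual = f"<|user|>{inp}<|assistant|>"

# If the two strings differ (e.g. missing newlines after <|user|> / <|assistant|>),
# the model may never open a proper assistant turn and can echo or stay silent.
print(repr(reference))
print(repr(manual))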

The issue is that the model did not produce any answers in the saved generations.
The generation for the first problem is shown below; as you can see, it contains only the prompt and no response from the model.

from typing import List\n\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n """ Check if in given list of numbers, are any two numbers closer to each other than\n given threshold.\n >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\n False\n >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\n True\n """
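
As a quick sanity check on the saved outputs, a short script along these lines can count how many problems got no continuation beyond the prompt. The generations.json file name and the list-of-lists layout are assumptions based on the harness's --save_generations default.

import json

# Assumption: --save_generations wrote a JSON list of lists of strings
# (one inner list of n_samples generations per problem) to generations.json.
with open("generations.json") as f:
    generations = json.load(f)

empty = 0
for i, samples in enumerate(generations):
    for gen in samples:
        # Heuristic: for humanevalfixdocs-python the prompt ends with the closing
        # docstring quotes, so a generation that still ends there (or is blank)
        # contains no model continuation after the prompt was stripped.
        if not gen.strip() or gen.strip().endswith('"""'):
            empty += 1
            print(f"problem {i}: generation appears to contain only the prompt")
            break

print(f"{empty}/{len(generations)} problems had no model continuation")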
