## Step 2: Create and Enter DataFlow Working Directory
```bash
mkdir workspace
cd workspace
```
## Step 3: Prepare Evaluation Data and Initialize Configuration Files
Initialize the configuration files:
```bash
dataflow eval init
```
💡 After initialization, the project directory structure becomes:
```bash
Project Root/
├── eval_api.py # Configuration file for API model evaluator
└── eval_local.py # Configuration file for local model evaluator
```
## Step 4: Prepare Evaluation Data
### Method 1: JSON Format
Please prepare a JSON file with a data structure similar to the example below:
```json
[
  {
    "input": "...",
    "output": "..."
  }
]
```
💡 In this example data:
- `input` is the question (can also be question + answer choices merged into one input)
- `output` is the standard answer
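If you are generating the evaluation set programmatically, a minimal sketch in plain Python is shown below. It only assumes the `input`/`output` field names described above; the file name `eval_data.json` and the sample question are placeholders rather than names required by DataFlow.

```python
import json

# Example records in the format described above:
# "input"  -> the question (optionally merged with answer choices)
# "output" -> the standard answer
records = [
    {
        "input": "Which planet is known as the Red Planet? A) Venus  B) Mars  C) Jupiter",
        "output": "B) Mars",
    },
]

# Write the evaluation set to disk (the file name here is an arbitrary example).
with open("eval_data.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```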
### Method 2: Custom Field Mapping
You can also skip data preprocessing (as long as your data already has clear question and standard-answer fields) and configure the field-name mapping through `eval_api.py` and `eval_local.py`:
```python
EVALUATOR_RUN_CONFIG = {
"input_test_answer_key": "model_generated_answer", # Field name for model-generated answers
}
```
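For intuition, the sketch below shows what a single record could look like under such a mapping. Only `model_generated_answer` comes from the configuration above; the other field names are invented for illustration and simply need to match whatever keys your own data uses.

```python
# Hypothetical record: the model's answer lives in "model_generated_answer",
# which is exactly the field named by "input_test_answer_key" above.
# "question" and "reference_answer" are example names for your own question
# and standard-answer fields.
record = {
    "question": "What is 2 + 2?",
    "reference_answer": "4",
    "model_generated_answer": "The answer is 4.",
}
```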
## Step 5: Configure Parameters
### Model Parameter Configuration
If you want to use a local model as the evaluator, please modify the parameters in the `eval_local.py` file.
If you want to use an API model as the evaluator, please modify the parameters in the `eval_api.py` file.
```python
# Target Models Configuration (same as API mode)
TARGET_MODELS = [
    # ...
    # 3. Custom configuration
    # Add more models...
    {
        "name": "qwen_7b",                 # Model name
        "path": "./Qwen2.5-7B-Instruct",   # Model path
        # Large language models can use different parameters
        "vllm_tensor_parallel_size": 4,    # Number of GPUs
        "vllm_temperature": 0.1,           # Randomness
        "vllm_top_p": 0.9,                 # Top-p sampling
        "vllm_max_tokens": 2048,           # Maximum number of tokens
    },
]
```