
Commit 285c7cd

Batch Bench (#146)
* add the desc of batch eval
* add usage of batch bench evaluation
1 parent 42239ec commit 285c7cd

4 files changed (+93, -404 lines changed)

Lines changed: 47 additions & 47 deletions

---
title: Model Capability Assessment Pipeline
createTime: 2025/08/30 14:27:02
icon: hugeicons:chart-evaluation
permalink: /en/guide/evaluation-pipeline/
---

# Model Capability Assessment Pipeline

⚠️ Only supports QA pair format evaluation

## Quick Start

```bash
cd DataFlow
pip install -e .[eval]
# ...
dataflow eval init
dataflow eval api / dataflow eval local
```

## Step 1: Install Evaluation Environment

Download the evaluation environment:

```bash
cd DataFlow
pip install -e .[eval]
cd ..
```

## Step 2: Create and Enter DataFlow Working Directory

```bash
mkdir workspace
cd workspace
```

## Step 3: Prepare Evaluation Data and Initialize Configuration Files

Initialize the configuration files:

```bash
dataflow eval init
```

💡 After initialization, the project directory structure becomes:

```bash
Project Root/
├── eval_api.py    # Configuration file for the API model evaluator
└── eval_local.py  # Configuration file for the local model evaluator
```

## Step 4: Prepare Evaluation Data

### Method 1: JSON Format

Please prepare a JSON-format file with a data structure similar to the example below:

```json
[
    {
        ...
    }
]
```

💡 In this example data:

- `input` is the question (it can also be the question and answer choices merged into one input)

- `output` is the standard answer
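
For reference, here is a minimal sketch that writes a file in this format; the questions, answers, and the `qa.json` path are purely illustrative and not part of DataFlow:

```python
import json

# Illustrative QA records: `input` holds the question (optionally with answer
# choices appended), and `output` holds the standard answer.
qa_records = [
    {
        "input": "Which planet is known as the Red Planet? A. Venus  B. Mars  C. Jupiter",
        "output": "B. Mars",
    },
    {
        "input": "What does HTTP stand for?",
        "output": "HyperText Transfer Protocol",
    },
]

# Write the records to qa.json (example path) for the evaluator to read.
with open("qa.json", "w", encoding="utf-8") as f:
    json.dump(qa_records, f, ensure_ascii=False, indent=2)
```

A file produced this way can then be referenced as the `input_file` of a benchmark entry in Step 5.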

### Method 2: Custom Field Mapping

You can also skip data preprocessing (as long as your data has clearly identifiable question and standard-answer fields) and configure the field name mapping through `eval_api.py` and `eval_local.py`:

```python
EVALUATOR_RUN_CONFIG = {
    "input_test_answer_key": "model_generated_answer",  # Field name for model-generated answers
    # ...
}
```

## Step 5: Configure Parameters

### Model Parameter Configuration

If you want to use a local model as the evaluator, please modify the parameters in the `eval_local.py` file.

If you want to use an API model as the evaluator, please modify the parameters in the `eval_api.py` file.

```python
# Target Models Configuration (same as API mode)

TARGET_MODELS = [
    # ...

    # 3. Custom configuration
    # Add more models...
    {
        "name": "qwen_7b",                    # Model name
        "path": "./Qwen2.5-7B-Instruct",      # Model path
        # Each model can use different parameters
        "vllm_tensor_parallel_size": 4,       # Number of GPUs
        "vllm_temperature": 0.1,              # Sampling randomness
        "vllm_top_p": 0.9,                    # Top-p sampling
        "vllm_max_tokens": 2048,              # Maximum number of generated tokens
        "vllm_repetition_penalty": 1.0,       # Repetition penalty
        "vllm_seed": None,                    # Random seed
        "vllm_gpu_memory_utilization": 0.9,   # Maximum GPU memory utilization
        # A custom prompt can be defined for each model
        "answer_prompt": """please answer the following question:"""
    }
]
```
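
Before launching a local run, it can be worth confirming that the configured `vllm_tensor_parallel_size` does not exceed the number of visible GPUs. A minimal optional check, assuming PyTorch is installed (this script is illustrative and not part of DataFlow):

```python
import torch

# Mirror the relevant fields of the TARGET_MODELS entries above (values are examples).
target_models = [
    {"name": "qwen_7b", "vllm_tensor_parallel_size": 4},
]

available_gpus = torch.cuda.device_count()
for model in target_models:
    requested = model["vllm_tensor_parallel_size"]
    if requested > available_gpus:
        raise RuntimeError(
            f"{model['name']} requests {requested} GPUs, "
            f"but only {available_gpus} are visible."
        )
print(f"GPU check passed: {available_gpus} GPU(s) visible.")
```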

### Bench Parameter Configuration

Benchmarks can be configured in batches:

```python
BENCH_CONFIG = [
    {
        "name": "bench_name",                            # Benchmark name
        "input_file": "path_to_your_qa/qa.json",         # Data file
        "question_key": "input",                         # Question field name
        "reference_answer_key": "output",                # Reference answer field name
        "output_dir": "path/bench_name",                 # Output directory
    },
    {
        "name": "other_bench_name",
        "input_file": "path_to_your_qa/other_qa.json",
        "question_key": "input",
        "reference_answer_key": "output",
        "output_dir": "path/other_bench_name",
    }
]
```
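
Because each benchmark entry points at its own data file and field names, a short optional pre-flight check can catch missing files or misnamed keys before a long run. The sketch below is illustrative and not part of DataFlow; it assumes the `BENCH_CONFIG` list shown above:

```python
import json
import os

# Re-use the BENCH_CONFIG structure from eval_local.py / eval_api.py (paths are placeholders).
BENCH_CONFIG = [
    {
        "name": "bench_name",
        "input_file": "path_to_your_qa/qa.json",
        "question_key": "input",
        "reference_answer_key": "output",
        "output_dir": "path/bench_name",
    },
]

for bench in BENCH_CONFIG:
    # The data file must exist and be valid JSON.
    assert os.path.isfile(bench["input_file"]), f"Missing data file: {bench['input_file']}"
    with open(bench["input_file"], "r", encoding="utf-8") as f:
        records = json.load(f)
    # Every record must carry the configured question and reference-answer fields.
    for record in records:
        assert bench["question_key"] in record, f"{bench['name']}: record missing '{bench['question_key']}'"
        assert bench["reference_answer_key"] in record, f"{bench['name']}: record missing '{bench['reference_answer_key']}'"
    os.makedirs(bench["output_dir"], exist_ok=True)  # Create the output directory if needed
    print(f"{bench['name']}: {len(records)} records validated")
```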

## Step 6: Run Evaluation

Run local evaluation:

```bash
dataflow eval local
```

Run API evaluation:

```bash
dataflow eval api
```

docs/en/notes/guide/pipelines/EvalPipeline.md

Lines changed: 0 additions & 154 deletions
This file was deleted.

0 commit comments
