Commit a8a1be5

fix typo
1 parent 161620e commit a8a1be5

1 file changed, +6 -6 lines changed

blogs/zenflow-corebinding/README.md

Lines changed: 6 additions & 6 deletions
@@ -23,7 +23,7 @@ This raised a question: Could this switch also benefit ZeRO Offload? We conduct

 **Model:** Qwen2.5-3B

-**Test environment:** 2xDGX-A100-SXM4-40GB, 2xAMD EPYC 7742 64-Core Processor1TB memory
+**Test environment:** 2xDGX-A100-SXM4-40GB, 2xAMD EPYC 7742 64-Core Processor, 1TB memory

 **Test URL:** [DeepSpeedExamples/training/DeepSpeed-ZenFlow/finetuning](https://github.com/deepspeedai/DeepSpeedExamples/tree/master/training/DeepSpeed-ZenFlow/finetuning) (All following tests are using the same URL)

@@ -69,7 +69,7 @@ From this data, DeepSpeed's core binding provides approximately a 15% performanc

 **Model:** Qwen2.5-3B

-**Test environment:** 2xDGX-A100-SXM4-40GB, 2xAMD EPYC 7742 64-Core Processor1TB memory
+**Test environment:** 2xDGX-A100-SXM4-40GB, 2xAMD EPYC 7742 64-Core Processor, 1TB memory

 **DeepSpeed commit:** 1d7b90adc48d57c2283e8825f5c668a3730ff899

@@ -140,7 +140,7 @@ Under this new core binding mechanism, we re-evaluated the performance of ZenFlo

 **Model:** Qwen2.5-3B

-**Test environment:** 2xDGX-A100-SXM4-40GB, 2xAMD EPYC 7742 64-Core Processor1TB memory
+**Test environment:** 2xDGX-A100-SXM4-40GB, 2xAMD EPYC 7742 64-Core Processor, 1TB memory

 **DeepSpeed commit:** 80033a82938f6cd8ce4988a63c914941e7a8f324

@@ -166,7 +166,7 @@ We conducted a comparative analysis of the performance across several configurat

 **Model:** Qwen2.5-3B

-**Test environment:** 2xDGX-A100-SXM4-40GB, 2xAMD EPYC 7742 64-Core Processor1TB memory
+**Test environment:** 2xDGX-A100-SXM4-40GB, 2xAMD EPYC 7742 64-Core Processor, 1TB memory

 The result clearly shows that the improved ZenFlow achieves a 2.59x speedup compared to ZeRO Offload without core binding, and a 2.24x speedup compared to ZeRO Offload with core binding.

@@ -186,9 +186,9 @@ Since we couldn't run Qwen2.5-3B with ZeRO2 using the same config on two GPUs in
 | ZeRO Offload with DeepSpeed core binding | 1365ms | 17.6% |
 | DeepSpeed core binding + new ZenFlow worker core binding | 569ms | 42.2% |

-**Model: Qwen2.5-B**
+**Model: Qwen2.5-1.5B**

-**Test environment:** 2xDGX-A100-SXM4-40GB, 2xAMD EPYC 7742 64-Core Processor1TB memory
+**Test environment:** 2xDGX-A100-SXM4-40GB, 2xAMD EPYC 7742 64-Core Processor, 1TB memory

 Based on the tests conducted on 2xA100 GPUs, the practicality metric for ZeRO Offload was 17.6%, while ZenFlow achieved a practicality metric of 42.2%. This result demonstrates that ZenFlow significantly improves the practicality of offloading.
