
Commit 6721e72

ashors1 and terrykong authored and committed
docs: add more details on CP + SFT support (#447)
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Terry Kong <[email protected]>
1 parent 5fffa58 commit 6721e72

File tree

1 file changed (+13, -1)

CHANGELOG.md

Lines changed: 13 additions & 1 deletion
@@ -18,7 +18,19 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 ## [Next Version]
 
 ### New Features and Optimizations
-- Added context parallel support for SFT. CP can be enabled by setting `model.context_parallel_size` in your config.
+- Added context parallel (CP) support for SFT. CP requires you to prepare your dataset using NeMo's [prepare_packed_ft_dataset.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/nlp_language_modeling/prepare_packed_ft_dataset.py) script prior to training. Be sure to pass the context parallel size to this script, for example:
+
+```
+python scripts/nlp_language_modeling/prepare_packed_ft_dataset.py \
+   model.data.train_ds.file_names=[/path/to/training.jsonl] \
+   model.data.train_ds.max_seq_length=2048 \
+   +tokenizer_path=/path/to/tokenizer \
+   +output_dir=/path/to/output_folder \
+   +pack_sizes=[2048,4096,8192] \
+   model.context_parallel_size=2
+```
+CP can then be enabled in your training run by setting `model.context_parallel_size` in your config. Refer to the [SFT documentation](https://github.com/NVIDIA/NeMo-Aligner/blob/main/docs/user-guide/sft.rst#step-1-format-the-data)
+for more details on running `prepare_packed_ft_dataset.py` and on running SFT with a packed dataset.
 - Sequence packing is now supported when running DPO.
 - Added support for Knowledge Distillation with SFT. See the [tutorial](docs/user-guide/knowledge-distillation.rst) for details.
 - Added support for Megatron Core’s distributed optimizer, which can be configured using `++model.optim.name=mcore_distributed_optim`.
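For context on the new entry: once `prepare_packed_ft_dataset.py` has written the packed dataset, the CP setting is just another Hydra override on the SFT launch command. The sketch below is illustrative only; the `examples/nlp/gpt/train_gpt_sft.py` entry point, the packed-file path, and the `packed_sequence` flag are assumptions drawn from the linked SFT guide rather than part of this commit, while `model.context_parallel_size` and the `mcore_distributed_optim` override are the settings named in the changelog text.

```
# Illustrative sketch (not from this commit): enable context parallelism for an
# SFT run on a pre-packed dataset. The entry point, file paths, and the
# packed_sequence flag are assumptions; model.context_parallel_size and the
# optimizer override are the settings named in the changelog entries above.
python examples/nlp/gpt/train_gpt_sft.py \
   model.restore_from_path=/path/to/base_model.nemo \
   model.data.train_ds.file_names=[/path/to/output_folder/packed_2048_seed0.npy] \
   model.data.train_ds.packed_sequence=True \
   model.data.train_ds.max_seq_length=2048 \
   model.context_parallel_size=2 \
   ++model.optim.name=mcore_distributed_optim
```

With `model.context_parallel_size=2`, each packed sequence's activations are sharded along the sequence dimension across two GPUs, which is what makes the larger pack sizes practical to train.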
