
Commit 6721e72

ashors1 and terrykong authored and committed
docs: add more details on CP + SFT support (#447)
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Terry Kong <[email protected]>
1 parent 5fffa58 commit 6721e72

File tree

1 file changed (+13, -1)

CHANGELOG.md

Lines changed: 13 additions & 1 deletion
@@ -18,7 +18,19 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 ## [Next Version]
 
 ### New Features and Optimizations
-- Added context parallel support for SFT. CP can be enabled by setting `model.context_parallel_size` in your config.
+- Added context parallel (CP) support for SFT. CP requires you to prepare your dataset using NeMo's [prepare_packed_ft_dataset.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/nlp_language_modeling/prepare_packed_ft_dataset.py) script prior to training. Be sure to pass the context parallel size to this script, for example:
+
+```
+python scripts/nlp_language_modeling/prepare_packed_ft_dataset.py \
+   model.data.train_ds.file_names=[/path/to/training.jsonl] \
+   model.data.train_ds.max_seq_length=2048 \
+   +tokenizer_path=/path/to/tokenizer \
+   +output_dir=/path/to/output_folder \
+   +pack_sizes=[2048,4096,8192] \
+   model.context_parallel_size=2
+```
+CP can then be enabled in your training run by setting `model.context_parallel_size` in your config. Refer to the [SFT documentation](https://github.com/NVIDIA/NeMo-Aligner/blob/main/docs/user-guide/sft.rst#step-1-format-the-data)
+for more details on running `prepare_packed_ft_dataset.py` and on running SFT with a packed dataset.
 - Sequence packing is now supported when running DPO.
 - Added support for Knowledge Distillation with SFT. See the [tutorial](docs/user-guide/knowledge-distillation.rst) for details.
 - Added support for Megatron Core’s distributed optimizer, which can be configured using `++model.optim.name=mcore_distributed_optim`.
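For context on the new entry: once `prepare_packed_ft_dataset.py` has written the packed dataset, the CP setting is just another Hydra override on the SFT launch command. The sketch below is illustrative only; the `examples/nlp/gpt/train_gpt_sft.py` entry point, the packed-file path, and the `packed_sequence` flag are assumptions drawn from the linked SFT guide rather than part of this commit, while `model.context_parallel_size` and the `mcore_distributed_optim` override are the settings named in the changelog text.

```
# Illustrative sketch (not from this commit): enable context parallelism for an
# SFT run on a pre-packed dataset. The entry point, file paths, and the
# packed_sequence flag are assumptions; model.context_parallel_size and the
# optimizer override are the settings named in the changelog entries above.
python examples/nlp/gpt/train_gpt_sft.py \
   model.restore_from_path=/path/to/base_model.nemo \
   model.data.train_ds.file_names=[/path/to/output_folder/packed_2048_seed0.npy] \
   model.data.train_ds.packed_sequence=True \
   model.data.train_ds.max_seq_length=2048 \
   model.context_parallel_size=2 \
   ++model.optim.name=mcore_distributed_optim
```

With `model.context_parallel_size=2`, each packed sequence's activations are sharded along the sequence dimension across two GPUs, which is what makes the larger pack sizes practical to train.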
