Sequence parallelism #2412

Open · wants to merge 30 commits into base: main

Conversation

@djsaunde (Contributor) commented on Mar 13, 2025

Description

This PR implements sequence parallelism (SP) via ring-flash-attn. Specifically, its hf_adapter.py module is used to patch transformers' flash attention with llama3_flash_attn_varlen_func, the SP implementation from the Llama 3 tech report. This technically isn't ring attention, but it is the most performant SP variant in most cases.
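For orientation, here is a minimal sketch of how such a patch could be wired up, creating one process group per SP group and handing it to the hf_adapter patch. The substitute_hf_flash_attn name and its heads_k_stride argument are assumptions about ring-flash-attn's interface, and register_sequence_parallel_attn is a hypothetical helper, not code from this PR:

```python
# Hypothetical sketch (not this PR's code). Assumes torch.distributed is already
# initialized and that ring-flash-attn exposes substitute_hf_flash_attn from its
# hf_adapter module; both the name and the heads_k_stride argument are assumptions.
import torch.distributed as dist
from ring_flash_attn import substitute_hf_flash_attn  # assumed re-export of hf_adapter.py


def register_sequence_parallel_attn(sequence_parallel_degree: int, heads_k_stride: int = 1):
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    assert world_size % sequence_parallel_degree == 0

    # Partition ranks into contiguous SP groups, e.g. degree 4 on 8 GPUs -> {0..3}, {4..7}.
    # dist.new_group must be called by every rank for every group, hence the full loop.
    my_group = None
    for start in range(0, world_size, sequence_parallel_degree):
        ranks = list(range(start, start + sequence_parallel_degree))
        group = dist.new_group(ranks)
        if rank in ranks:
            my_group = group

    # Patch HF flash attention so each rank computes attention over its local chunk
    # of the sequence and exchanges K/V with the other ranks in its SP group.
    substitute_hf_flash_attn(my_group, heads_k_stride)
```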

I think these changes are sufficient to cover both cases, since the batch API (non-sample-packing case) is a special case of the varlen API (sample-packing case), but this should be validated with tests.

Motivation and Context

SP is necessary for long-context post-training where a single sequence no longer fits in the VRAM of a single card and causes an OOM. Users with more than one GPU can run longer-context post-training by enabling this option.

Attention is distributed across GPUs according to the configured sequence_parallel_degree (e.g., with sequence_parallel_degree = 4, each sequence is split into 4 equal-length chunks). Each GPU computes attention over its own chunk, and inter-GPU communication within the SP group completes the attention computation across chunks.
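As a concrete illustration of the chunking and position ID handling (a sketch, not the PR's collator; the function name and the divisibility assumption are mine):

```python
import torch


def shard_sequence(input_ids: torch.Tensor, position_ids: torch.Tensor,
                   sp_rank: int, sp_degree: int):
    """Return this rank's equal-length chunk of a (batch, seq_len) batch."""
    seq_len = input_ids.size(1)
    assert seq_len % sp_degree == 0, "pad or trim so sp_degree divides seq_len"
    chunk = seq_len // sp_degree
    start, end = sp_rank * chunk, (sp_rank + 1) * chunk
    # Keep the global position IDs for the slice so rotary embeddings still see
    # each token's absolute position within the full sequence.
    return input_ids[:, start:end], position_ids[:, start:end]


# e.g. with sequence_parallel_degree = 4, rank 1 of the SP group gets
# tokens 1024..2047 of a length-4096 sequence.
```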

How has this been tested?

pytest coverage (not super comprehensive) and functional tests.

Types of changes

  • ring-flash-attn hf_adapter.py integration
  • Data collation changes (sequence splitting, position ID adjustment)
  • AxolotlTrainer sampler and dataloader changes
    • Refactored the multipack sampler logic into a helper method
    • DistributedSampler for the SP case (see the sketch after this list)
      • Setting rank = SP group ID lets us shard data per SP group rather than per GPU
    • In the SP case, the dataloader is not prepared for distributed training by the accelerator object
      • Distribution is already handled by the DistributedSampler
  • Bonus: added random_init flag to load model without pretrained weights
  • Bonus: a bit of cleanup
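
A rough sketch of the sampler idea referenced above, using a plain DistributedSampler as a stand-in for the actual multipack-aware sampler; it assumes torch.distributed is initialized and that the world size is divisible by the SP degree:

```python
import torch.distributed as dist
from torch.utils.data import Dataset, DistributedSampler


def build_sp_sampler(dataset: Dataset, sequence_parallel_degree: int) -> DistributedSampler:
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    num_sp_groups = world_size // sequence_parallel_degree
    sp_group_id = rank // sequence_parallel_degree  # ranks 0..3 -> group 0, 4..7 -> group 1, ...

    # Treat each SP group as one "replica": every rank in a group receives the same
    # data shard, and each rank then keeps only its chunk of every sequence.
    return DistributedSampler(dataset, num_replicas=num_sp_groups, rank=sp_group_id)
```

Because all ranks in an SP group draw the same shard, the dataloader itself does not need to be prepared for distributed training by the accelerator, matching the bullet above.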

@djsaunde self-assigned this on Mar 13, 2025
@@ -548,6 +553,14 @@ def apply_patches(self) -> None:

patch_self_attn_lora(self.cfg)

if self.cfg.sequence_parallel_degree > 1:
Collaborator review comment:
Suggested change:

-if self.cfg.sequence_parallel_degree > 1:
+if self.cfg.sequence_parallel_degree and self.cfg.sequence_parallel_degree > 1:

This should fix the NoneType comparison exception in the e2e tests.
