How to use thd format qkv with cp + packed_seq_params #1368

Wraythh · 2024-12-12T04:03:09Z

If I have a dataset with sequence lengths of [4, 8, 6, 10] and use cp_size=2 to split the data, I observe that TE divides cu_seqlen_q by cp_size. This means I need to split each subsequence of the packed sequence into two parts and concatenate them, so that each CP rank holds subsequences of lengths [2, 4, 3, 5]. Should I pass cu_seqlen_q as [0, 4, 12, 18, 20] to both CP ranks in this case, or is there an issue with this usage?

xrennvidia · 2024-12-16T20:46:01Z

Hi @Wraythh

CP splits the sequence into CP*2 chunks, and each GPU gets 2 chunks (GPU0 gets the first and last chunks, GPU1 gets the second and second-to-last chunks, and so on). This is for load balancing with causal masking.
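The chunk assignment described above can be sketched in a few lines of Python. This is only an illustration of the pairing rule, not TE code; `chunks_for_rank` and `causal_cost` are hypothetical helper names, and the cost model (chunk `i` attends to chunks `0..i`) is a simplifying assumption used to show why the pairing balances work under a causal mask.

```python
from typing import Tuple


def chunks_for_rank(cp_size: int, rank: int) -> Tuple[int, int]:
    """Return the two chunk ids (out of 2 * cp_size) held by `rank`."""
    num_chunks = 2 * cp_size
    # Rank r gets chunk r from the front and its mirror from the back.
    return rank, num_chunks - 1 - rank


def causal_cost(cp_size: int, rank: int) -> int:
    """Rough attention cost: under a causal mask, chunk i attends to i + 1 chunks."""
    return sum(chunk_id + 1 for chunk_id in chunks_for_rank(cp_size, rank))


for rank in range(2):
    print(rank, chunks_for_rank(2, rank), causal_cost(2, rank))
# 0 (0, 3) 5
# 1 (1, 2) 5
```

Under this simplified cost model every rank ends up with the same total, which is the load-balancing effect mentioned in the reply.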

The THD+CP implementation in TE splits each individual sequence of the packed sequence into CP*2 chunks, so you need to pad each individual sequence to a length that is divisible by CP*2. Here is an example of how we split the input.
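As a rough plain-Python sketch of that per-sequence split (not TE's implementation, and not a substitute for `tex.thd_get_partitioned_indices`), the hypothetical helper below lists which token positions of the packed tensor a given CP rank would keep. It assumes the thread's lengths [4, 8, 6, 10] have already been padded to [4, 8, 8, 12] for cp_size=2, so the padded offsets are [0, 4, 12, 20, 32].

```python
from typing import List


def partitioned_indices(cu_seqlens_padded: List[int], cp_size: int, rank: int) -> List[int]:
    """Token indices of the packed (THD) tensor kept by `rank` (illustrative only)."""
    num_chunks = 2 * cp_size
    indices: List[int] = []
    for start, end in zip(cu_seqlens_padded[:-1], cu_seqlens_padded[1:]):
        seq_len = end - start
        if seq_len % num_chunks != 0:
            raise ValueError("each sequence must be padded to a multiple of 2 * cp_size")
        chunk = seq_len // num_chunks
        # Each sequence is split on its own; the rank keeps chunk `rank`
        # and the mirrored chunk from the end of that sequence.
        for chunk_id in (rank, num_chunks - 1 - rank):
            first = start + chunk_id * chunk
            indices.extend(range(first, first + chunk))
    return indices


cu_seqlens_padded = [0, 4, 12, 20, 32]  # padded lengths [4, 8, 8, 12]
print(partitioned_indices(cu_seqlens_padded, cp_size=2, rank=0))
# [0, 3, 4, 5, 10, 11, 12, 13, 18, 19, 20, 21, 22, 29, 30, 31]
```

Each rank ends up with half of every sequence, which is why every padded length has to be divisible by CP*2.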

You should pass [0, 4, 12, 18, 20] to the TE API; the CP code will handle everything under the hood. Padding each individual sequence to be divisible by CP*2 may leave padding between sequences, in which case you also need cu_seqlens_padded to account for it.
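A minimal sketch of how the two offset arrays could be built, assuming `cu_seqlens` holds cumulative actual lengths and `cu_seqlens_padded` holds cumulative padded lengths. `pad_to_multiple` and `build_cu_seqlens` are hypothetical helpers, and plain lists stand in for the tensors you would pass to TE. Note that the running sum of [4, 8, 6, 10] ends at 28, so the final 20 quoted in the thread looks like a typo.

```python
from itertools import accumulate
from typing import List, Tuple


def pad_to_multiple(length: int, multiple: int) -> int:
    """Round `length` up to the next multiple of `multiple`."""
    return -(-length // multiple) * multiple


def build_cu_seqlens(seq_lens: List[int], cp_size: int) -> Tuple[List[int], List[int]]:
    """Return (cu_seqlens, cu_seqlens_padded) for a packed batch."""
    multiple = 2 * cp_size  # each sequence is split into 2 * cp_size chunks
    padded_lens = [pad_to_multiple(n, multiple) for n in seq_lens]
    cu_seqlens = [0] + list(accumulate(seq_lens))            # real token counts
    cu_seqlens_padded = [0] + list(accumulate(padded_lens))  # offsets incl. padding
    return cu_seqlens, cu_seqlens_padded


cu_seqlens, cu_seqlens_padded = build_cu_seqlens([4, 8, 6, 10], cp_size=2)
print(cu_seqlens)         # [0, 4, 12, 18, 28]
print(cu_seqlens_padded)  # [0, 4, 12, 20, 32]
```

Here the third and fourth sequences grow from 6 to 8 and from 10 to 12 tokens, so the two arrays diverge from the third boundary onward.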

The TE CP unit test is a good reference for you.

Thanks.

Wraythh · 2024-12-19T08:38:43Z

OK, thank you very much. What happens if the length of an individual sequence is not divisible by CP*2? Will it cause the loss to crash? I use the tex.thd_get_partitioned_indices API to split my sequence and pass cu_seqlen_q in a form like [0, 4, 12, 18, 20] to the TE API, but I found that the loss becomes NaN. Everything works fine when I don't pass the cu_seqlen_q parameter.

xrennvidia · 2024-12-19T09:17:38Z

You need to pad each individual sequence to be divisible by CP*2 (refer here).

After you pad each sequence to meet the divisible requirement, you need both cu_seqlens and cu_seqlens_padded (refer here).

Wraythh · 2024-12-23T03:56:51Z

Thank you very much
