How to use thd format qkv with cp + packed_seq_params #1368
Hi @Wraythh, CP splits the sequence into CP*2 chunks, and each GPU gets 2 chunks (GPU0 gets the first and last chunks, GPU1 gets the second and second-to-last chunks, and so on); this is for load balancing with causal masking. The THD+CP implementation in TE splits each individual sequence of the packed sequence into CP*2 chunks, so you need to pad each individual sequence to a length that is divisible by CP*2. Here is an example of how we split the input. The TE CP unit test is a good reference for you. Thanks.
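A minimal sketch of the chunk assignment described above (this is an illustration of the load-balancing scheme, not the TE implementation; function names are hypothetical):

```python
# Load-balanced chunk assignment for causal-mask context parallelism:
# a sequence is split into 2*cp_size chunks, and CP rank r receives
# chunk r and chunk (2*cp_size - 1 - r), so each rank sees a similar
# amount of attention work under a causal mask.

def cp_chunk_ids(cp_size: int):
    """Return the pair of chunk indices each CP rank owns."""
    return [(r, 2 * cp_size - 1 - r) for r in range(cp_size)]

def split_for_rank(tokens, cp_size: int, rank: int):
    """Split a sequence (length divisible by 2*cp_size) and gather
    this rank's two chunks."""
    assert len(tokens) % (2 * cp_size) == 0, "pad first!"
    chunk = len(tokens) // (2 * cp_size)
    a, b = cp_chunk_ids(cp_size)[rank]
    return tokens[a * chunk:(a + 1) * chunk] + tokens[b * chunk:(b + 1) * chunk]

seq = list(range(8))              # length 8 is divisible by 2*cp_size for cp_size=2
print(cp_chunk_ids(2))            # [(0, 3), (1, 2)]
print(split_for_rank(seq, 2, 0))  # [0, 1, 6, 7]  (first + last chunk)
print(split_for_rank(seq, 2, 1))  # [2, 3, 4, 5]  (second + second-to-last chunk)
```

This makes the divisibility requirement concrete: without padding to a multiple of 2*cp_size, the chunks cannot be formed evenly.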
OK, thank you very much. What will happen if an individual sequence's length is not divisible by CP*2? Will it cause the loss to crash? I use the tex.thd_get_partitioned_indices API to split my sequence and pass a cu_seqlen_q of the form [0, 4, 12, 18, 20] to the TE API, but I found the loss becomes NaN. Everything works fine when I don't pass the cu_seqlen_q parameter.
You need to pad each individual sequence to be divisible by CP*2 (refer here). After you pad each sequence to meet the divisibility requirement, you need both
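A hedged sketch of the padding step described above, using the cu_seqlen_q from the earlier comment. The helper name is illustrative, not a TE API; it pads each sequence length up to the next multiple of 2*cp_size and rebuilds the cumulative lengths:

```python
# Pad every sequence in a packed (THD) batch so its length is divisible
# by 2*cp_size, then rebuild cu_seqlens from the padded lengths.

def pad_packed_cu_seqlens(cu_seqlens, cp_size):
    """Return (padded per-sequence lengths, new cu_seqlens), where each
    padded length is a multiple of 2*cp_size."""
    mult = 2 * cp_size
    lengths = [b - a for a, b in zip(cu_seqlens, cu_seqlens[1:])]
    padded = [((l + mult - 1) // mult) * mult for l in lengths]  # round up
    new_cu = [0]
    for l in padded:
        new_cu.append(new_cu[-1] + l)
    return padded, new_cu

# The cu_seqlen_q from the question, [0, 4, 12, 18, 20], has sequence
# lengths [4, 8, 6, 2]; with cp_size=2 the last two are not divisible by 4.
print(pad_packed_cu_seqlens([0, 4, 12, 18, 20], cp_size=2))
# ([4, 8, 8, 4], [0, 4, 12, 20, 24])
```

This shows why the unpadded [0, 4, 12, 18, 20] can misbehave with CP=2: the 6- and 2-token sequences cannot be split into 4 equal chunks.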
Thank you very much |
If I have a dataset with sequence lengths of [4, 8, 6, 10] and I use CP=2 to split the data, I observe that TE performs the operation cu_seqlen_q / cp_size on cu_seqlen_q. This means I need to split each subsequence into two chunks and then concatenate them, resulting in per-rank subsequence lengths of [2, 4, 3, 5]. Should I pass cu_seqlen_q as [0, 4, 12, 18, 20] to both cp_rank instances in this case, or is there an issue with this usage?
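An illustrative sketch of the cu_seqlens scaling this comment describes (helper name is hypothetical): with THD+CP, each rank holds 1/cp_size of every sequence, so dividing the global cumulative lengths by cp_size yields the per-rank view. Note that, per the earlier answers, lengths like 6 and 10 would still need padding to a multiple of 2*cp_size before this is valid:

```python
# Divide global packed cumulative lengths by cp_size to get the per-rank
# cumulative lengths (each rank keeps 1/cp_size of every sequence).

def per_rank_cu_seqlens(cu_seqlens, cp_size):
    """All entries must divide evenly by cp_size."""
    assert all(c % cp_size == 0 for c in cu_seqlens)
    return [c // cp_size for c in cu_seqlens]

# lengths [4, 8, 6, 10] -> global cu_seqlens [0, 4, 12, 18, 28]
print(per_rank_cu_seqlens([0, 4, 12, 18, 28], cp_size=2))
# [0, 2, 6, 9, 14]  -> per-rank lengths [2, 4, 3, 5], as in the question
```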