Context parallelism using xFormers supports:
- Blockwise Attention: https://arxiv.org/abs/2305.19370
- Ring Attention: https://arxiv.org/abs/2310.01889
- Tree Attention: https://arxiv.org/abs/2408.04093
xFormers implements partial attention via `memory_efficient_attention_partial` (https://facebookresearch.github.io/xformers/_modules/xformers/ops/fmha.html#memory_efficient_attention_partial), and its parameters are straightforward.
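As a rough sketch of how this composes into blockwise/ring-style attention: each `memory_efficient_attention_partial` call returns a partial output together with its log-sum-exp (LSE), which is exactly the state a rank would pass around the ring. The merge step below assumes `fmha.merge_attentions` from the same `xformers.ops.fmha` module and its `(output, lse)` return convention; verify both against the installed xFormers version.

```python
import torch
from xformers.ops import fmha

# Arbitrary example sizes: batch, sequence length, heads, head dim.
B, M, H, K = 2, 1024, 8, 64
device, dtype = "cuda", torch.bfloat16

q = torch.randn(B, M, H, K, device=device, dtype=dtype)
k = torch.randn(B, M, H, K, device=device, dtype=dtype)
v = torch.randn(B, M, H, K, device=device, dtype=dtype)

# Attend to K/V one chunk at a time, as each step of ring attention would.
# Each call returns the partial output and its log-sum-exp (LSE).
outs, lses = [], []
for k_chunk, v_chunk in zip(k.chunk(4, dim=1), v.chunk(4, dim=1)):
    out, lse = fmha.memory_efficient_attention_partial(q, k_chunk, v_chunk)
    outs.append(out)
    lses.append(lse)

# Merge the partial results; the LSE terms renormalize the softmax in a
# numerically stable way, matching the blockwise-attention formulation.
merged, _ = fmha.merge_attentions(outs, lses)

# Sanity check against a single full-attention call.
reference = fmha.memory_efficient_attention(q, k, v)
torch.testing.assert_close(merged, reference, atol=2e-2, rtol=2e-2)
```

In a real context-parallel setup, the loop body would run on different ranks, with K/V chunks (or the query) rotating between them, and the merge happening once every chunk has been seen.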