[FEATURE] Control flow support in shard_parallel #400

merrymercy · 2022-04-23T07:41:34Z

Background

Currently, the auto-sharding pass for intra-op parallelism (or shard parallel) does not support any control flow instructions (e.g., while and if). For example, the function compute_alpa() below will cause assertion errors.

import jax
import jax.numpy as jnp
import numpy as np

import alpa

N = 1024
n_iter = 5

x = np.ones((N, N), dtype=np.float32)
w = np.ones((N, N), dtype=np.float32)


def compute_numpy():
    y = x
    for i in range(n_iter):
        y = y @ w
    return y


def func(a, b):
    init_state = (0, a, b)
    cond_func = lambda state: state[0] < n_iter
    body_func = lambda state: (state[0] + 1, state[1] @ state[2], state[2])

    final_state = jax.lax.while_loop(cond_func, body_func, init_state)
    return final_state[1]


def compute_jax_jit():
    return jax.jit(func)(x, w)


def compute_alpa():
    return alpa.parallelize(func)(x, w)


# Check correctness
expected = compute_numpy()
actual = compute_jax_jit()
np.testing.assert_allclose(expected, actual)

# Inspect the HLO IR
hlo_text = jax.jit(func).lower(x, w).compile().compiler_ir()[0].to_string()
print(hlo_text)

# Currently, alpa does not support while loops, so the following call
# raises assertion errors. We want to support it.
# actual = compute_alpa()
# np.testing.assert_allclose(expected, actual)

To support them, we need to correctly handle HloOpcode::kWhile and HloOpcode::kConditional in the auto-sharding pass.
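The if case can be reproduced in the same way as the while case above. The following is a minimal sketch, assuming a JAX version that provides jax.lax.cond with the (pred, true_fun, false_fun, *operands) signature and the as_text() method on lowered computations. The name cond_func and the 4x4 inputs are made up for illustration and are not part of alpa.

```python
import jax
import numpy as np


def cond_func(pred, a, b):
    # Both branches are traced and kept as one conditional operation,
    # instead of being resolved by Python-level branching.
    return jax.lax.cond(pred, lambda p, q: p @ q, lambda p, q: p + q, a, b)


a = np.full((4, 4), 2.0, dtype=np.float32)
b = np.ones((4, 4), dtype=np.float32)

# pred is a traced value under jit, so the branch is chosen at run time.
true_out = np.asarray(jax.jit(cond_func)(True, a, b))    # matmul branch
false_out = np.asarray(jax.jit(cond_func)(False, a, b))  # add branch

# Inspect the lowered IR to see how the branch is represented.
cond_ir_text = jax.jit(cond_func).lower(True, a, b).as_text()
print(cond_ir_text)
```

Wrapping cond_func in alpa.parallelize instead of jax.jit would presumably hit the same unsupported-instruction path as the while loop example.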

Todo

Learn the auto-sharding pass
- Read the reference materials
- Understand the test cases in tests/test_auto_sharding_basic.py and tests/test_auto_sharding_mlp.py
  - The call stack is a little complicated because it mixes Python and C++ code. The core entry points are alpa/shard_parallel/auto_sharding.py and tensorflow-alpa/compiler/xla/service/spmd/auto_sharding.cc. You can run one test case and trace it from Python to C++.
Support while loop
We assume the loop length is known at compile time. As a first step, we can simply set the length to a fixed constant (e.g., 5). This can be improved later by inferring the length from the code. Once we know the loop length, we can unroll the while loop when we build the cost graph. For example, we can multiply the costs of all nodes/edges in the while body by the loop length.
- Implement the above idea in auto_sharding.cc and fix all other errors. We should be able to run compute_alpa().
- Add unit test cases to tests/test_auto_sharding_control_flow.py
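The cost-scaling idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not the data structures used in auto_sharding.cc: CostGraph, scale_while_body, and the node names are hypothetical, and the sketch assumes that only edges with both endpoints inside the while body are executed once per iteration.

```python
from dataclasses import dataclass


@dataclass
class CostGraph:
    # Hypothetical layout: node name -> one cost per candidate sharding strategy.
    node_costs: dict
    # (src, dst) -> resharding cost matrix indexed [src_strategy][dst_strategy].
    edge_costs: dict


def scale_while_body(graph, body_nodes, trip_count):
    """Return a new graph with the while-body costs multiplied by trip_count."""
    body = set(body_nodes)
    node_costs = {}
    for name, costs in graph.node_costs.items():
        factor = trip_count if name in body else 1
        node_costs[name] = [c * factor for c in costs]
    edge_costs = {}
    for (src, dst), matrix in graph.edge_costs.items():
        # Assumption: edges crossing the loop boundary are paid only once.
        factor = trip_count if src in body and dst in body else 1
        edge_costs[(src, dst)] = [[c * factor for c in row] for row in matrix]
    return CostGraph(node_costs, edge_costs)


graph = CostGraph(
    node_costs={"x": [0.0, 0.0], "dot": [4.0, 2.0], "tuple": [1.0, 1.0]},
    edge_costs={
        ("x", "dot"): [[0.0, 3.0], [3.0, 0.0]],
        ("dot", "tuple"): [[0.0, 1.0], [1.0, 0.0]],
    },
)
scaled = scale_while_body(graph, body_nodes=["dot", "tuple"], trip_count=5)
print(scaled.node_costs["dot"])
print(scaled.edge_costs[("dot", "tuple")])
```

Scaling the costs keeps the graph the same size as the original, whereas literally duplicating the body nodes would grow it linearly with the loop length.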
Support if

Reference

XLA Operation Semantics: https://www.tensorflow.org/xla/operation_semantics
Algorithm description: section 4 of https://arxiv.org/abs/2201.12023


merrymercy · 2022-04-23T07:42:46Z

cc @HeydrichBeillschmidt

yf225 · 2022-06-14T00:58:17Z

I suspect if is a more commonly used control flow than while, and we can consider implementing if first.

@zhisbug suggested we can just map it to https://www.tensorflow.org/xla/operation_semantics#conditional for now and ignore the effect on the training plan; we can fix any imbalance issues later.

merrymercy · 2022-06-14T03:20:50Z

@HeydrichBeillschmidt Could you share a little bit about your progress?

mmorinag127 · 2022-08-30T04:06:41Z

Is there any news on this topic?
I really want to use it with alpa :)

merrymercy · 2022-08-30T04:54:17Z

@mmorinag127 WIP branch alpa-projects/tensorflow-alpa#124

merrymercy changed the title ~~While loop support in shard_parallel~~ Control flow support in shard_parallel Apr 23, 2022

merrymercy changed the title ~~Control flow support in shard_parallel~~ [FEATURE] Control flow support in shard_parallel Apr 23, 2022

merrymercy mentioned this issue Jun 6, 2022

How alpa lower control flow jaxpr into XLA HLO? #490

Closed

merrymercy self-assigned this Aug 6, 2022

merrymercy mentioned this issue Sep 23, 2022

[BUG] FFT - unhandled instruction error #713

Closed

merrymercy added the enhancement New feature label Dec 20, 2022
