Replies: 1 comment
-
Just for what it's worth, I think the following would work too; would it be the recommended approach?

import jax
import jax.numpy as jnp
from penzai import pz


@pz.pytree_dataclass
class ThreaderLayer(pz.nn.Layer):
  layer: pz.nn.Layer  # the subject layer
  side_layer: pz.nn.Layer  # the layer updating the threaded values

  def __call__(self, argument, /, **side_inputs):
    # The argument is a pair: the main value and the threaded side value.
    arg, side_arg = argument
    out = self.layer(arg, **side_inputs)
    side_out = self.side_layer(argument, **side_inputs)
    return out, side_out


@pz.pytree_dataclass
class Identity(pz.nn.Layer):
  def __call__(self, argument, /, **side_inputs):
    return argument


@pz.pytree_dataclass
class AddPair(pz.nn.Layer):
  def __call__(self, argument, /, **side_inputs):
    arg, side_arg = argument
    return arg + side_arg


@pz.variable_jit
def call_model(model, x):
  return model(x)


if __name__ == "__main__":
  model = pz.nn.Sequential([Identity() for _ in range(5)])
  # Replace every Identity with a ThreaderLayer that also runs AddPair
  # on the threaded pair.
  thread_model = (
      pz.select(model)
      .at_instances_of(Identity)
      .apply(lambda i: ThreaderLayer(i, AddPair()))
  )
  x = jnp.ones((2, 3))
  z = jnp.zeros_like(x)
  x_out, z_out = call_model(thread_model, (x, z))
  print(x_out, z_out)
-
Conceptually, **side_inputs are like a global read-only dictionary of inputs, available to every layer's __call__ function. The global availability is great, but is there any way to combine this idea with a layer that could also modify these? I know this is not possible with the current design, but it would be really useful: something like a layer whose __call__ could also hand updated side-input values to the layers after it.
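For reference, the current read-only behaviour looks roughly like this (the ScaleBySideInput layer and the "scale" key are made up for illustration; it uses the same pz.nn.Layer API as the reply above):

import jax.numpy as jnp
from penzai import pz


@pz.pytree_dataclass
class ScaleBySideInput(pz.nn.Layer):
  """Reads a shared side input; there is no way to hand back an updated value."""

  def __call__(self, argument, /, **side_inputs):
    # Every layer in the model sees the same "scale" entry.
    return argument * side_inputs["scale"]


model = pz.nn.Sequential([ScaleBySideInput(), ScaleBySideInput()])
out = model(jnp.ones((2, 3)), scale=2.0)  # both layers read scale=2.0
print(out)  # every entry is 4.0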
I guess there is some jax'y reason why this sort of design wouldn't jit-compile?
The use case for this is that I want to create a wrapper that can efficiently record and concatenate selected activations from an underlying model as it runs.
One approach could be to insert a "SaveActivation" layer just after each Transformer block which saves the output in a StateVariable, and then have a function which extracts all of these and concatenates them. But to do this requires memory to store the StateVariables, and the same amount of memory again to perform the concatenation. I was hoping to find a method that writes each activation directly into a single pre-allocated buffer during the forward pass, so the second copy is never needed.
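Roughly, the SaveActivation approach I have in mind might look like this (ToyBlock is a stand-in for a real transformer block, and I'm assuming pz.StateVariable can be mutated during the forward pass and read back out afterwards via the selector API):

import jax.numpy as jnp
from penzai import pz


@pz.pytree_dataclass
class ToyBlock(pz.nn.Layer):
  """Stand-in for a real transformer block."""

  def __call__(self, argument, /, **side_inputs):
    return argument + 1.0


@pz.pytree_dataclass
class SaveActivation(pz.nn.Layer):
  """Passes its input through unchanged, storing a copy in a StateVariable."""

  saved: pz.StateVariable

  def __call__(self, argument, /, **side_inputs):
    self.saved.value = argument  # record the activation for later extraction
    return argument


model = pz.nn.Sequential([ToyBlock() for _ in range(3)])

# Insert a recorder after every block.
recording_model = (
    pz.select(model)
    .at_instances_of(ToyBlock)
    .apply(lambda block: pz.nn.Sequential(
        [block, SaveActivation(pz.StateVariable(value=None))]))
)

out = recording_model(jnp.zeros((2, 3)))

# Pull the recorded activations back out and concatenate them; this is the
# step that costs a second copy of everything that was stored.
savers = pz.select(recording_model).at_instances_of(SaveActivation).get_sequence()
stacked = jnp.concatenate([s.saved.value for s in savers], axis=-1)
print(out.shape, stacked.shape)  # (2, 3) and (2, 9)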
Unfortunately, this seems to mean I would have to wrap every layer, converting it to one that accepts the tuple (argument, buffer, offsets) and either ignores buffer and offsets (if it is a non-recording layer) or writes its output into the buffer at the current offset (if it is a recording layer).
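A rough sketch of that buffer-threading idea, again with made-up ToyBlock layers: RecordToBuffer writes into a pre-allocated buffer with jax.lax.dynamic_update_slice_in_dim, while PassThrough is the non-recording wrapper that just forwards buffer and offset untouched.

import jax
import jax.numpy as jnp
from penzai import pz


@pz.pytree_dataclass
class ToyBlock(pz.nn.Layer):
  """Stand-in for a real transformer block."""

  def __call__(self, argument, /, **side_inputs):
    return argument + 1.0


@pz.pytree_dataclass
class PassThrough(pz.nn.Layer):
  """Non-recording wrapper: runs its layer and ignores buffer and offset."""

  layer: pz.nn.Layer

  def __call__(self, argument, /, **side_inputs):
    value, buffer, offset = argument
    return self.layer(value, **side_inputs), buffer, offset


@pz.pytree_dataclass
class RecordToBuffer(pz.nn.Layer):
  """Recording wrapper: also writes its output into the shared buffer."""

  layer: pz.nn.Layer

  def __call__(self, argument, /, **side_inputs):
    value, buffer, offset = argument
    out = self.layer(value, **side_inputs)
    # Copy the activation into the pre-allocated buffer at `offset` along
    # the last axis, then advance the offset for the next recorder.
    buffer = jax.lax.dynamic_update_slice_in_dim(buffer, out, offset, axis=-1)
    return out, buffer, offset + out.shape[-1]


model = pz.nn.Sequential([ToyBlock() for _ in range(3)])
threaded = (
    pz.select(model)
    .at_instances_of(ToyBlock)
    .apply(lambda block: RecordToBuffer(block))
)

x = jnp.zeros((2, 3))
buffer = jnp.zeros((2, 3 * 3))  # one slot per recorded activation
out, buffer, offset = threaded((x, buffer, 0))
print(out.shape, buffer.shape, offset)  # (2, 3) (2, 9) 9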