Description
The ALTI+ method is an extension of ALTI for encoder-decoder (and by extension, decoder-only) models.
Authors: @gegallego @javiferran
Implementation notes:
The current implementation extracts the inputs to the key, query and value projections and computes the intermediate steps of the Kobayashi refactoring to obtain the transformed vectors used in the final ALTI computation.
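As a rough sketch of this decomposition step (shapes and variable names below are assumptions for illustration, not the actual inseq code), the transformed vectors can be computed from the layer input, the per-head attention weights, and the per-head value/output projection parameters:

```python
import torch

def transformed_vectors(x, attn_w, w_v, b_v, w_o):
    """
    x:      (seq_len, d_model)            input to the attention layer
    attn_w: (n_heads, seq_len, seq_len)   per-head attention weights A^h_{j,i}
    w_v:    (n_heads, d_model, d_head)    value projection, split per head
    b_v:    (n_heads, d_head)             value projection bias, split per head
    w_o:    (n_heads, d_head, d_model)    output projection, split per head
    Returns T of shape (seq_len, seq_len, d_model), where T[j, i] is the
    contribution of input token i to the attention output at position j
    (the output projection bias is excluded and handled separately).
    """
    # value vectors per head: (n_heads, seq_len, d_head)
    v = torch.einsum("sd,hde->hse", x, w_v) + b_v[:, None, :]
    # project each value vector back to the model dimension: (n_heads, seq_len, d_model)
    v_proj = torch.einsum("hse,hem->hsm", v, w_o)
    # weight by attention and sum over heads: (seq_len, seq_len, d_model)
    return torch.einsum("hji,him->jim", attn_w, v_proj)
```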
The computation of the attention layer output is carried out up to the resultant (i.e. the actual output of the attention layer) in order to check that it matches the output of the original attention layer forward pass. This is done purely as a sanity check, but it is not especially heavy computationally, so it can be kept (e.g. raising an error when the outputs don't match, to signal that the model may not be supported).
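Continuing the sketch above, the sanity check could look as follows (here `attn_out` and `b_o` stand for the attention output captured from the forward pass and the output projection bias, both assumed names):

```python
# Summing the transformed vectors over input positions and adding back the
# output projection bias should reproduce the attention layer output.
reconstructed = transformed_vectors(x, attn_w, w_v, b_v, w_o).sum(dim=1) + b_o
if not torch.allclose(reconstructed, attn_out, atol=1e-5):
    raise RuntimeError(
        "Decomposed attention output does not match the forward pass output: "
        "the model architecture is probably not supported."
    )
```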
Focusing on GPT-2 as an example model, the per-head attention weights and outputs (i.e. the matmul of the attention weights and the value vectors) are returned here so that they can be extracted with a hook and used to compute the transformed vectors needed for ALTI.
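A rough sketch of how the needed quantities could be captured with a standard forward hook on HF transformers' GPT-2 (not the exact approach referenced above; the per-head attention weights themselves are available via `output_attentions=True`):

```python
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2", output_attentions=True).eval()
captured = {}

def save_attn_input(module, inputs, output):
    # inputs[0] is the hidden state entering the attention block: (batch, seq_len, d_model)
    captured["attn_input"] = inputs[0].detach()

hook = model.h[0].attn.register_forward_hook(save_attn_input)

# c_attn packs the query/key/value projections into a single Conv1D whose weight
# has shape (d_model, 3 * d_model); the last third is the value projection, from
# which the per-head value vectors can be recomputed given the captured input.
d_model = model.config.n_embd
w_v = model.h[0].attn.c_attn.weight[:, 2 * d_model:]   # (d_model, d_model)
b_v = model.h[0].attn.c_attn.bias[2 * d_model:]        # (d_model,)
```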
Pre- and post-layer-norm models are handled differently: since the transformed vectors are the final outputs of the attention block, the layer norm needs to be included regardless of its position. In the Kobayashi decomposition of the attention layer, the bias terms of both the layer norm and the output projection need to be kept separate, so we have to check whether this is possible out of the box or whether they need to be computed in an ad-hoc hook.
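For reference, a minimal sketch of applying the layer-norm part of the decomposition while keeping its bias apart, along the lines of the Kobayashi et al. treatment (function and argument names are illustrative assumptions):

```python
import torch

def apply_layer_norm_without_bias(contributions, pre_ln_input, ln):
    """
    contributions: (seq_len, seq_len, d_model)  decomposed terms for each output position j
    pre_ln_input:  (seq_len, d_model)           actual (summed) input to the layer norm
    ln:            torch.nn.LayerNorm           module whose bias is kept separate
    """
    # the normalization statistics are computed on the full pre-LN vector, not per contribution
    std = (pre_ln_input.var(-1, unbiased=False, keepdim=True) + ln.eps).sqrt()  # (seq_len, 1)
    # centering distributes over the sum of contributions
    centered = contributions - contributions.mean(-1, keepdim=True)
    return centered / std[:, None, :] * ln.weight  # ln.bias is accounted for separately
```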
If we are interested in the output vectors before the bias is added, we can extract the bias vector alongside the output of the attention module and subtract the former from the latter.
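One possible way to do this with a hook, again using GPT-2's `c_proj` as an example (names are illustrative):

```python
outputs_without_bias = []

def save_output_without_bias(module, inputs, output):
    # GPT2Attention returns a tuple whose first element is the attention output
    attn_output = output[0] if isinstance(output, tuple) else output
    outputs_without_bias.append(attn_output.detach() - module.c_proj.bias)
```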
To aggregate ALTI+ scores into overall importance scores, we will use the extended rollout implementation currently being developed in Value Zeroing attribution method #173.
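As a reminder of what the rollout-style aggregation does (a generic sketch, not the #173 implementation): per-layer token-to-token contribution matrices, with rows normalized to sum to one, are multiplied across layers to propagate importance through the whole stack.

```python
import torch

def rollout(contribution_matrices):
    """contribution_matrices: list of (seq_len, seq_len) tensors, from layer 1 to layer L."""
    joint = contribution_matrices[0]
    for c in contribution_matrices[1:]:
        joint = c @ joint  # propagate contributions layer by layer
    return joint
```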
Reference implementation: mt-upc/transformer-contributions-nmt