Following bug #60, we are moving to power-of-2 scaling factors. For most operations this is a very simple change (powers of 2 are stable under `mul`, `max`, `min`, ...).
Nevertheless, scalify rules for other operations are not directly compatible with powers of 2. Typically, `add` / `sub` / `dot_general` can produce non-power-of-2 scaling factors. The simplest way to keep a power-of-2 scale is to round it down.
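As a minimal sketch (not the library's actual implementation), rounding a scale down to a power of 2 amounts to flooring its base-2 logarithm:

```python
import numpy as np

def round_scale_down_pow2(scale):
    """Round a positive scale down to the nearest power of 2.

    Keeps only the power-of-2 part: 2**floor(log2(scale)).
    """
    return np.exp2(np.floor(np.log2(scale)))

# e.g. a scale of 3.0 coming out of an `add` rule gets
# rounded down to 2.0, a representable power of 2.
```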
In the case of `add`, rounding down would in some situations leave the scaling unchanged, potentially creeping toward overflow when `add` ops are serialized (e.g. gradient accumulation). To prevent that, we should experiment with stochastic rounding (with the typical example of gradient accumulation in mind).
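One way stochastic rounding of the scale exponent could look (a hedged sketch under the assumption that we round `log2(scale)` to an integer, rounding up with probability equal to its fractional part, so the rounded exponent is unbiased in expectation):

```python
import numpy as np

def stochastic_round_scale_pow2(scale, rng):
    """Stochastically round a positive scale to a power of 2.

    Rounds log2(scale) up with probability equal to its fractional
    part, down otherwise, so the exponent is unbiased on average and
    the scale cannot stay stuck below the true value across many
    serialized `add` ops (e.g. gradient accumulation).
    """
    log2_scale = np.log2(scale)
    floor_exp = np.floor(log2_scale)
    frac = log2_scale - floor_exp  # in [0, 1)
    exp = floor_exp + (rng.random() < frac)
    return np.exp2(exp)

# A scale of 3.0 rounds to 2.0 or 4.0 at random; an exact
# power of 2 (frac == 0) is always returned unchanged.
```

Note this makes the *exponent* unbiased, not the scale itself; whether that is the right notion of unbiasedness for gradient accumulation is exactly what needs experimenting with.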
balancap changed the title from "Stochastic rounding on autoscale scale exponent operations" to "Stochastic rounding on scalify scale exponent operations" on Jun 17, 2024.