Stochastic rounding on `scalify` scale exponent operations #63

Following the bug reported in #60, we are moving to power-of-2 scaling factors. For most operations this is a very simple change, since powers of 2 are closed under `mul`, `max`, `min`, ...

Nevertheless, the `scalify` rules for some other operations are not directly compatible with powers of 2. Typically, `add` / `sub` / `dot_general` can produce scaling factors that are not powers of 2. The simplest way to keep a power-of-2 scale is to round down to the nearest power of 2.
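As a rough illustration of the rounding-down option, here is a minimal pure-Python sketch. The helper names `add_scale` and `pow2_round_down` are hypothetical (not `scalify` API), and the quadrature rule used for the `add` output scale is an assumption made only for this example.

```python
import math


def add_scale(a_scale: float, b_scale: float) -> float:
    """Assumed `add` scale rule for this sketch: combine input scales in quadrature."""
    return math.hypot(a_scale, b_scale)


def pow2_round_down(scale: float) -> float:
    """Largest power of 2 that is <= scale (scale must be > 0)."""
    # frexp returns (m, e) with scale == m * 2**e and m in [0.5, 1),
    # so 0.5 * 2**e is the power of 2 just below (or equal to) scale.
    _, exponent = math.frexp(scale)
    return math.ldexp(0.5, exponent)


raw = add_scale(2.0, 2.0)       # sqrt(8), not a power of 2
rounded = pow2_round_down(raw)  # 2.0
```

Note that with two equal power-of-2 input scales, the rounded output scale is the same as the inputs, which is exactly the "unchanged scaling" situation for `add`.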

In the case of `add`, rounding down would in some situations leave the scale unchanged while the underlying values keep growing, potentially creeping towards overflow when `add` ops are chained serially (e.g. gradient accumulation). To prevent that, we should experiment with stochastic rounding of the scale exponent, with gradient accumulation as the typical example in mind.
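To make the gradient accumulation concern concrete, here is a small sketch comparing rounding down with stochastic rounding of the scale exponent. Everything in it is an assumption for illustration: the helper names are hypothetical, the `add` scale rule is taken to be a quadrature sum, each gradient is given a unit scale, and the stochastic rounding is done in the `log2` domain (round the exponent up with probability equal to its fractional part), which is only one possible choice.

```python
import math
import random


def pow2_round_down(scale: float) -> float:
    """Round a positive scale down to a power of 2."""
    return math.ldexp(1.0, math.floor(math.log2(scale)))


def pow2_round_stochastic(scale: float, rng: random.Random) -> float:
    """Round a positive scale to a neighbouring power of 2, stochastically.

    The exponent is rounded up with probability equal to the fractional
    part of log2(scale), so the expected exponent equals log2(scale).
    """
    log2_scale = math.log2(scale)
    floor_exp = math.floor(log2_scale)
    frac = log2_scale - floor_exp
    exponent = floor_exp + (1 if rng.random() < frac else 0)
    return math.ldexp(1.0, exponent)


def accumulate_scale(num_steps: int, round_fn) -> float:
    """Track only the accumulator scale over `num_steps` serialized adds."""
    acc_scale = 1.0
    grad_scale = 1.0  # assumed: every gradient arrives with unit scale
    for _ in range(num_steps):
        # Assumed `add` rule: quadrature sum, then power-of-2 rounding.
        acc_scale = round_fn(math.hypot(acc_scale, grad_scale))
    return acc_scale


rng = random.Random(0)
num_steps = 64
down = accumulate_scale(num_steps, pow2_round_down)
stoch = accumulate_scale(num_steps, lambda s: pow2_round_stochastic(s, rng))
unrounded = math.sqrt(num_steps + 1)  # scale without any power-of-2 rounding

print(f"round down: {down}, stochastic: {stoch}, unrounded: {unrounded:.2f}")
```

Under these assumptions, rounding down keeps the accumulator scale stuck at its initial value (`sqrt(2)` always rounds back to `1`), whereas the stochastically rounded scale can grow along with the unrounded one. Whether rounding in the `log2` domain or in the linear domain tracks the unrounded scale better is one of the things worth experimenting with.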
