Stochastic rounding on scalify scale exponent operations #63

Open
@balancap

Following the bug in #60, we are moving to power-of-2 scaling factors. For most operations this is a very simple change (powers of 2 are closed under mul, max, min, ...).

Nevertheless, the scalify rules for some other operations are not directly compatible with power-of-2 scaling. Typically, add / sub / dot_general can produce scaling factors that are not powers of 2. The simplest way to keep a power-of-2 scale is to round down.
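To make the problem concrete, here is a minimal sketch in plain Python. It assumes the output scale of add combines the input scales as `sqrt(a_scale**2 + b_scale**2)`; that rule and the helper names `add_scale` and `round_down_pow2` are illustrative assumptions, not the actual scalify implementation.

```python
import math


def add_scale(a_scale: float, b_scale: float) -> float:
    # ASSUMPTION: illustrative combination rule for add, not taken from scalify.
    return math.sqrt(a_scale**2 + b_scale**2)


def round_down_pow2(scale: float) -> float:
    # Largest power of 2 that is <= scale (requires scale > 0).
    # frexp gives scale = mantissa * 2**exponent with mantissa in [0.5, 1).
    _, exponent = math.frexp(scale)
    return math.ldexp(1.0, exponent - 1)


raw = add_scale(2.0, 2.0)       # sqrt(8): both inputs are powers of 2, the output is not
rounded = round_down_pow2(raw)  # 2.0: back to a power of 2, same as the input scale
```

Note that in this example the rounded output scale equals the input scale, which is the situation that becomes a problem when adds are chained.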

In the case of add, rounding down would in some situations leave the scale unchanged, so the data could creep towards overflow when add ops are chained (e.g. gradient accumulation). To prevent that, we should experiment with stochastic rounding, with gradient accumulation as the typical example in mind.
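A rough sketch of what this could look like, in plain Python. Everything here is an assumption for illustration: the add rule `sqrt(acc_scale**2 + grad_scale**2)`, the choice to round the exponent up with probability equal to the fractional part of `log2(scale)`, and the helper names. One possible appeal of rounding in log space is that the expected exponent equals `log2(scale)`; rounding linearly in the scale would be another option to try.

```python
import math
import random


def floor_pow2(scale: float) -> float:
    # Deterministic baseline: round the scale down to a power of 2.
    return math.ldexp(1.0, math.floor(math.log2(scale)))


def stochastic_round_pow2(scale: float, rng: random.Random) -> float:
    # ASSUMPTION: round the exponent up with probability equal to the
    # fractional part of log2(scale), so E[exponent] == log2(scale).
    log_scale = math.log2(scale)
    exponent = math.floor(log_scale)
    if rng.random() < log_scale - exponent:
        exponent += 1
    return math.ldexp(1.0, exponent)


def accumulate_scale(num_steps: int, round_fn) -> float:
    # Toy gradient accumulation: every incoming gradient has scale 1.0 and
    # the add rule (an assumption, not scalify's) is sqrt(a**2 + b**2).
    acc_scale = 1.0
    for _ in range(num_steps):
        acc_scale = round_fn(math.sqrt(acc_scale**2 + 1.0))
    return acc_scale


rng = random.Random(0)
floor_scale = accumulate_scale(64, floor_pow2)  # stays at 1.0 on every step
stoch_scale = accumulate_scale(64, lambda s: stochastic_round_pow2(s, rng))
```

With rounding down, `sqrt(2)` is rounded back to 1.0 on every step, so the scale never moves and all the growth has to be absorbed by the data. With stochastic rounding the scale is occasionally bumped to the next power of 2, so it can follow the accumulated magnitude.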
