Stochastic rounding on scalify scale exponent operations #63

Open
balancap opened this issue Jan 2, 2024 · 0 comments
Labels
experiments Experiments ops Ops coverage

balancap commented Jan 2, 2024

Following the bug in #60, we are moving to power-of-2 scaling factors. For most operations this is a very simple change, since powers of 2 are closed under mul, max, min, ...

Nevertheless, the scalify rules for some other operations are not directly compatible with powers of 2. Typically, add / sub / dot_general can produce scaling factors that are not powers of 2. The simplest way to keep a power-of-2 scale is to round down to the nearest power of 2.
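
To make the rounding-down option concrete, here is a minimal sketch in plain Python. The helper names (`round_down_pow2`, `add_scale_pow2`) are hypothetical, and the combined scale `sqrt(a_scale**2 + b_scale**2)` is only an assumed add rule for illustration, not necessarily the rule scalify uses.

```python
import math


def round_down_pow2(scale: float) -> float:
    """Return the largest power of two that is <= scale (scale > 0, finite)."""
    if not (scale > 0.0 and math.isfinite(scale)):
        raise ValueError("scale must be a positive finite float")
    # frexp returns scale = mantissa * 2**exponent with 0.5 <= mantissa < 1,
    # so 2**(exponent - 1) is the power of two at or just below scale.
    _, exponent = math.frexp(scale)
    return math.ldexp(1.0, exponent - 1)


def add_scale_pow2(a_scale: float, b_scale: float) -> float:
    """Output scale of a scaled add, rounded down to a power of two.

    ASSUMPTION: the unrounded scale is sqrt(a_scale**2 + b_scale**2).
    This is a stand-in rule chosen for the example only.
    """
    combined = math.hypot(a_scale, b_scale)
    return round_down_pow2(combined)
```

Under this assumed rule, `add_scale_pow2(4.0, 1.0)` sees an unrounded scale of about 4.12 and returns 4.0, so the output scale is the same as the larger input scale.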

In the case of add, rounding down would in some situations leave the scale unchanged, so the scaled data could creep towards overflow when add ops are chained (e.g. gradient accumulation). To prevent that, we should experiment with stochastic rounding, with gradient accumulation as the typical example in mind.
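
A rough simulation of the gradient accumulation case, as a starting point for the experiment. Everything here is an assumption made for illustration: the function names are hypothetical, the add rule is again taken to be `sqrt(acc_scale**2 + grad_scale**2)`, and the exponent is rounded up with probability equal to the fractional part of `log2(scale)` (one possible stochastic rounding scheme among several).

```python
import math
import random


def round_down_pow2(scale: float) -> float:
    """Deterministic baseline: round the exponent of scale down."""
    return math.ldexp(1.0, math.floor(math.log2(scale)))


def stochastic_round_pow2(scale: float, rng: random.Random) -> float:
    """Round scale to a neighbouring power of two, stochastically.

    ASSUMPTION: the exponent is rounded up with probability equal to the
    fractional part of log2(scale), so the expected exponent is unbiased.
    """
    log2_scale = math.log2(scale)
    lower = math.floor(log2_scale)
    frac = log2_scale - lower  # in [0, 1); 0 if scale is already a power of two
    exponent = lower + (1 if rng.random() < frac else 0)
    return math.ldexp(1.0, exponent)


def accumulate_scale(num_steps: int, grad_scale: float, round_fn) -> float:
    """Track only the accumulator scale over num_steps serialized adds.

    ASSUMPTION: each add combines scales as sqrt(acc**2 + grad**2)
    before rounding; this rule is illustrative only.
    """
    acc_scale = grad_scale
    for _ in range(num_steps):
        acc_scale = round_fn(math.hypot(acc_scale, grad_scale))
    return acc_scale


rng = random.Random(0)  # seeded for reproducibility
scale_down = accumulate_scale(1000, 1.0, round_down_pow2)
scale_stochastic = accumulate_scale(
    1000, 1.0, lambda s: stochastic_round_pow2(s, rng)
)
```

With rounding down, every step sees an unrounded scale of about 1.41 and falls back to 1.0, so `scale_down` never moves even though the accumulated values keep growing. With the stochastic variant, the scale has a chance to step up at each add, so it can follow the accumulator while remaining a power of 2.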

balancap added the experiments and ops labels on Jan 2, 2024
balancap changed the title from "Stochastic rounding on autoscale scale exponent operations" to "Stochastic rounding on scalify scale exponent operations" on Jun 17, 2024