Floating point issues
Ilya Yaroshenko edited this page Sep 28, 2016 · 12 revisions
Floating point operations can be cruel! Here are some mistakes that can introduce numerical errors.
float < 0, double > 0, real < 0
- CT and RT differences
- 32-bit vs. 64-bit have different behavior
- Direct vs. intermediate comparison
- "Same" function, different results
- std.math vs C
- Linux vs. Windows
- sin with different precision (+/- 0)
More issues are explained below.
Let's consider these two equivalent definitions to represent linear functions:
y1 = slope * (x - _y) + _a
y2 = slope * x + intercept
where intercept = slope * (- _y) + _a.
Nota bene: If y2 is written out in full, y2 = slope * x + slope * (- _y) + _a, we see that it follows from y1 by the distributive law. In other words, y2 multiplies before it adds, whereas y1 subtracts first and multiplies afterwards.
Now let's see why y1 is the better representation:
alias S = double;
S slope = 2.87415e+15;
S _a = -0.139631;
S _y = -1.5;
S intercept = slope * (- _y) + _a; // 4.31123e+15
S x = -1.5;
S y1 = _a + slope * (x - _y); // -0.139631
S y2 = slope * x + intercept; // 0
Btw, with real there's no difference ;-)
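The zero in y2 can be traced to the intercept: at a magnitude of about 4.3e+15, neighbouring double values are 0.5 apart, so adding _a (about -0.14) rounds straight back to the product. The following is a minimal sketch of that check, not part of the original page. It assumes that double arithmetic is rounded to 64 bits at every step (as with SSE on x86_64) and that the compiler does not fold the expressions at a higher precision. `gapAbove` is a hypothetical helper name introduced here for illustration.

```d
import std.math : nextUp;
import std.stdio : writefln;

/// Hypothetical helper: distance from v to the next larger representable double.
double gapAbove(double v)
{
    return nextUp(v) - v;
}

void main()
{
    double slope = 2.87415e+15;
    double _a = -0.139631;
    double _y = -1.5;

    // Mathematically this product is the integer 4311225000000000,
    // which a double can hold exactly.
    double product = slope * (-_y);

    // Neighbouring doubles at this magnitude are 0.5 apart and |_a| is
    // less than half of that, so `product + _a` rounds back to `product`
    // (assuming plain 64-bit double arithmetic, see the note above).
    double intercept = product + _a;

    writefln("gap above product: %s", gapAbove(product));
    writefln("_a absorbed:       %s", intercept == product);
}
```

Once _a has been absorbed into intercept, slope * x + intercept cancels exactly and the information carried by _a is gone. y1 avoids this because x - _y is computed on small numbers before the large slope is involved.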
real x = -1.0; // -0x1p+0
enum y = -1.0; // -0x8p-3
-> Always use %a (exact hexadecimal printing) to verify.
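As a small illustration of why %a is worth the trouble, the sketch below prints two adjacent double values with the default %s and with %a. The values and the helper name `hexOf` are assumptions chosen for this example, not taken from the page. The exact text produced by %a may vary between types and library versions, so only the fact that the two renderings differ is relied on.

```d
import std.format : format;
import std.math : nextUp;
import std.stdio : writeln;

/// Hypothetical helper: hexadecimal rendering of a double via %a.
string hexOf(double v)
{
    return format("%a", v);
}

void main()
{
    double one = 1.0;
    double next = nextUp(one); // the closest double above 1.0

    // %s rounds to a few significant digits, so the two values look alike.
    writeln(format("%s", one), " vs ", format("%s", next));

    // %a writes out the mantissa bits, so the difference stays visible.
    writeln(hexOf(one), " vs ", hexOf(next));
}
```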