Fix #12 - remove printable to improve footprint. (#13)
- Fix #12, breaking change. Thanks to Andyjbm for the measurements.
- remove Printable interface as it makes the effective footprint larger!
- remove getDecimals() and setDecimals().
- patch examples and unit test for the above.
- add example **float16_sizeof_array.ino**.
- add **isPosInf()** and **isNegInf()**
- add link to **float16ext** class with a larger range than float16.
- update readme.md.
- update unit-tests.
RobTillaart committed Apr 18, 2024
1 parent f8319b1 commit 78bf064
Showing 17 changed files with 328 additions and 98 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,19 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).


## [0.3.0] - 2024-04-17
- Fix #12, breaking change. Thanks to Andyjbm for the measurements.
- remove Printable interface as it makes the effective footprint larger!
- remove getDecimals() and setDecimals().
- patch examples and unit test for the above.
- add example **float16_sizeof_array.ino**.
- add **isPosInf()** and **isNegInf()**
- add link to **float16ext** class with a larger range than float16.
- update readme.md.
- update unit-tests.

----

## [0.2.0] - 2024-03-05
- **warning: breaking changes!**
- Fix #10, mantissa overflow
151 changes: 103 additions & 48 deletions README.md
@@ -16,15 +16,68 @@ Arduino library to implement float16 data type.
## Description

This **experimental** library defines the float16 (2 byte) data type, including conversion
functions to and from float32 type. It is definitely **work in progress**.

The library implements the **Printable** interface so one can directly print the
float16 values in any stream e.g. Serial.
functions to and from float32 type.

The primary usage of the float16 data type is to efficiently store and transport
a floating point number. As it uses only 2 bytes where float and double typically use
4 and 8 bytes, gains can be made at the price of range and precision.

Note that float16 only has ~3 significant digits.

To print a float16, one needs to convert it with toFloat(), toDouble() or toString(decimals).
The latter allows concatenation and further conversion to a char array.

In versions before 0.3.0 the Printable interface was implemented, but it has been removed
as it caused excessive memory usage when declaring arrays of float16.


#### ARM alternative half-precision

- https://en.wikipedia.org/wiki/Half-precision_floating-point_format#ARM_alternative_half-precision

_ARM processors support (via a floating point control register bit)
an "alternative half-precision" format, which does away with the
special case for an exponent value of 31 (11111₂). It is almost
identical to the IEEE format, but there is no encoding for infinity or NaNs;
instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008._

Implemented in https://github.com/RobTillaart/float16ext class.


#### Difference between float16 and float16ext

The float16ext library has an extended range: its largest magnitude is ±131008,
where float16 stops at ±65504.

The float16ext does not support INF, -INF and NAN. These values are mapped to
the largest positive, the largest negative and the largest positive number, respectively.

The -0 and 0 values will both exist.


Although they share a lot of code, float16 and float16ext should not be mixed.
In the future these libraries might merge / derive one from the other.


#### Breaking change 0.3.0

Version 0.3.0 has a breaking change. The **Printable** interface is removed as
it causes larger than expected arrays of float16 (See #16). On ESP8266 every
float16 object was 8 bytes and on AVR it was 5 bytes instead of the expected 2 bytes.

To support printing the class added two new conversion functions:
```cpp
f16.toFloat();
f16.toString(decimals);

Serial.println(f16.toFloat(), 4);
Serial.println(f16.toString(4));
```
This keeps printing relatively easy.

The footprint of the library is now smaller and one can now create compact arrays
of float16 elements using only 2 bytes per element.
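
The extra bytes per object reported above are what one would expect from an interface with a virtual function: each object then carries a hidden vtable pointer next to its data. The following is a minimal sketch of that effect in portable C++, not the library's code. The types `PrintableLike`, `WithInterface` and `PlainHalf` and the helper `arrayBytes()` are hypothetical stand-ins, and the exact sizes depend on the pointer size and alignment of the platform.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an interface such as Arduino's Printable:
// one virtual function is enough to add a vtable pointer to every object.
struct PrintableLike {
  virtual std::size_t printTo(char* buffer, std::size_t size) const = 0;
  virtual ~PrintableLike() = default;
};

// Assumed pre-0.3.0 style layout: vtable pointer + 16 bit value + decimals.
struct WithInterface : PrintableLike {
  std::size_t printTo(char*, std::size_t) const override { return 0; }
  std::uint16_t bits = 0;
  std::uint8_t decimals = 4;
};

// Assumed 0.3.0 style layout: only the 16 bit value.
struct PlainHalf {
  std::uint16_t bits = 0;
};

// Number of bytes an array of `count` elements of type T occupies.
template <typename T>
constexpr std::size_t arrayBytes(std::size_t count) {
  return count * sizeof(T);
}

static_assert(sizeof(PlainHalf) == 2, "plain value should be 2 bytes");
static_assert(sizeof(WithInterface) > sizeof(PlainHalf),
              "the virtual interface makes every object larger");
```

With these stand-ins, `arrayBytes<PlainHalf>(100)` is 200 bytes, while `arrayBytes<WithInterface>(100)` is several times larger on any platform.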


#### Breaking change 0.2.0

@@ -34,26 +87,28 @@ For some specific values the mantissa overflowed when the float 16 was
assigned a value to. This overflow was not detected / corrected.

During the analysis of this bug it became clear that the sub-normal numbers
were also implemented correctly. This is fixed too in 0.2.0.
were also not implemented correctly. This is fixed too in 0.2.0.

There is still an issue 0 versus -0
There is still an issue with 0 versus -0 (sign gets lost in conversion).

**This makes all pre-0.2.0 versions obsolete.**


## Specifications


| attribute | value | notes |
|:----------|:-------------|:--------|
| size | 2 bytes | layout s eeeee mmmmmmmmmm (1,5,10)
| sign | 1 bit |
| exponent | 5 bit |
| mantissa | 10 bit | ~ 3 digits
| minimum | 5.96046 E−8 | smallest positive number.
| | 1.0009765625 | 1 + 2^−10 = smallest number larger than 1.
| maximum | 65504 |
| | |
| Attribute | Value | Notes |
|:------------|:----------------|:--------|
| size | 2 bytes | layout s eeeee mmmmmmmmmm (1, 5, 10)
| sign | 1 bit |
| exponent | 5 bit |
| mantissa | 10 bit | 3 - 4 digits
| minimum | ±5.96046 E−8 | smallest number.
| | ±1.0009765625 | 1 + 2^−10 = smallest number larger than 1.
| maximum | ±65504 |
| | |

± = ALT 0177
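
To make the (1, 5, 10) layout from the table concrete, here is a minimal sketch that decodes a raw IEEE 754 binary16 bit pattern into a float. It is written in portable C++ and is not the library's **f16tof32()**. The function name `halfBitsToFloat()` is an assumption made for this illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode an IEEE 754 binary16 pattern: s eeeee mmmmmmmmmm (1, 5, 10).
// Hypothetical helper, not part of the float16 library.
float halfBitsToFloat(std::uint16_t h) {
  const int sign     = (h >> 15) & 0x01;   // 1 bit
  const int exponent = (h >> 10) & 0x1F;   // 5 bits, bias 15
  const int mantissa = h & 0x03FF;         // 10 bits

  float value;
  if (exponent == 0) {
    // zero and subnormal numbers: mantissa * 2^-24
    value = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 31) {
    // all exponent bits set: infinity (mantissa 0) or NaN
    value = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                            : std::numeric_limits<float>::quiet_NaN();
  } else {
    // normal numbers: (1024 + mantissa) * 2^(exponent - 25)
    //               = (1 + mantissa / 1024) * 2^(exponent - 15)
    value = std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
  }
  return sign ? -value : value;
}
```

The values in the table follow from this: `0x0001` decodes to 2^−24 (about 5.96046 E−8), `0x3C01` to 1.0009765625 and `0x7BFF` to 65504.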


#### Example values
@@ -87,6 +142,10 @@ Source: https://en.wikipedia.org/wiki/Half-precision_floating-point_format
#### Related

- https://wokwi.com/projects/376313228108456961 (demo of its usage)
- https://github.com/RobTillaart/float16
- https://github.com/RobTillaart/float16ext
- https://github.com/RobTillaart/fraction
- https://en.wikipedia.org/wiki/Half-precision_floating-point_format


## Interface
@@ -97,28 +156,35 @@ Source: https://en.wikipedia.org/wiki/Half-precision_floating-point_format

#### Constructors

- **float16(void)** defaults to zero.
- **float16(void)** defaults value to zero.
- **float16(double f)** constructor.
- **float16(const float16 &f)** copy constructor.


#### Conversion

- **double toDouble(void)** convert to double (or float).
- **double toDouble(void)** convert value to double or float (if the same e.g. UNO).
- **float toFloat(void)** convert value to float.
- **String toString(unsigned int decimals = 2)** convert value to a String with decimals.
Please note that the accuracy is only 3-4 digits for the whole number so use decimals
with care.


#### Export and store

To serialize the internal format e.g. to disk, two helper functions are available.

- **uint16_t getBinary()** get the 2 byte binary representation.
- **void setBinary(uint16_t u)** set the 2 bytes binary representation.
- **size_t printTo(Print& p) const** Printable interface.
- **void setDecimals(uint8_t d)** idem, used for printTo.
- **uint8_t getDecimals()** idem.

Note the setDecimals takes one byte per object which is not efficient for arrays of float16.
See array example for efficient storage using set/getBinary() functions.
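
As an illustration of serializing the 2 byte representation, the sketch below packs an array of 16 bit patterns (as **getBinary()** would return them) into a byte buffer and reads them back. It is plain C++ without the library. The function names `packBinary16()` and `unpackBinary16()` and the little-endian byte order are assumptions chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Write `count` 16 bit patterns to `out` (2 * count bytes), low byte first.
// Hypothetical helper; the byte order is a choice made for this sketch.
void packBinary16(const std::uint16_t* values, std::size_t count,
                  std::uint8_t* out) {
  for (std::size_t i = 0; i < count; ++i) {
    out[2 * i]     = static_cast<std::uint8_t>(values[i] & 0xFF);
    out[2 * i + 1] = static_cast<std::uint8_t>(values[i] >> 8);
  }
}

// Read `count` 16 bit patterns back from `in` (2 * count bytes).
void unpackBinary16(const std::uint8_t* in, std::size_t count,
                    std::uint16_t* values) {
  for (std::size_t i = 0; i < count; ++i) {
    values[i] = static_cast<std::uint16_t>(in[2 * i] | (in[2 * i + 1] << 8));
  }
}
```

On the Arduino side each unpacked pattern could then be handed to **setBinary()**. Fixing the byte order explicitly keeps the stored data readable on a platform with a different endianness.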


#### Compare

Standard compare functions. Since 0.1.5 these are quite optimized,
so it is fast to compare e.g. 2 measurements.
The library implements the standard compare functions.
These are optimized, so it is fast to compare 2 float16 values.

Note: comparison with a float or double always includes a conversion.
You can improve performance by converting e.g. a threshold only once before comparison.

- **bool operator == (const float16& f)**
- **bool operator != (const float16& f)**
@@ -143,20 +209,16 @@ Not planned to optimize these.
- **float16& operator \*= (const float16& f)**
- **float16& operator /= (const float16& f)**

negation operator.
Negation operator.
- **float16 operator - ()** fast negation.

Math helpers.
- **int sign()** returns 1 == positive, 0 == zero, -1 == negative.
- **bool isZero()** returns true if zero. slightly faster than **sign()**.
- **bool isInf()** returns true if value is (-)infinite.


#### Experimental 0.1.8

- **bool isNaN()** returns true if value is not a number.


## Notes
- **bool isNaN()** returns true if value is not a number.
- **bool isInf()** returns true if value is ± infinite.
- **bool isPosInf()** returns true if value is + infinite.
- **bool isNegInf()** returns true if value is - infinite.
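
The special values these helpers test for have fixed bit patterns in IEEE 754 binary16, so they can be recognised on the raw 16 bits. The sketch below shows one way to do that in plain C++. The `...Bits()` function names are hypothetical and this is not necessarily how the library implements the methods listed above.

```cpp
#include <cstdint>

// Bit masks of the binary16 layout s eeeee mmmmmmmmmm.
constexpr std::uint16_t kSignMask     = 0x8000;
constexpr std::uint16_t kExponentMask = 0x7C00;
constexpr std::uint16_t kMantissaMask = 0x03FF;

// Zero: exponent and mantissa are 0, sign may be set (0x0000 and 0x8000).
bool isZeroBits(std::uint16_t h) {
  return (h & ~kSignMask) == 0;
}

// Infinity: exponent all ones, mantissa zero, either sign.
bool isInfBits(std::uint16_t h) {
  return (h & ~kSignMask) == kExponentMask;
}

bool isPosInfBits(std::uint16_t h) { return h == 0x7C00; }
bool isNegInfBits(std::uint16_t h) { return h == 0xFC00; }

// NaN: exponent all ones, mantissa non-zero.
bool isNaNBits(std::uint16_t h) {
  return (h & kExponentMask) == kExponentMask && (h & kMantissaMask) != 0;
}
```

Note that `isZeroBits()` treats 0x0000 and 0x8000 alike, which is one possible answer to the 0 versus -0 question.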


## Future
@@ -167,26 +229,19 @@ negation operator.

#### Should

- unit tests of the above.
- how to handle 0 == -0 (0x0000 == 0x8000)
- investigate ARM alternative half-precision
_ARM processors support (via a floating point control register bit)
an "alternative half-precision" format, which does away with the
special case for an exponent value of 31 (11111₂). It is almost
identical to the IEEE format, but there is no encoding for infinity or NaNs;
instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008._


#### Could

- copy constructor?
- update documentation.
- unit tests.
- error handling.
- divide by zero errors.
- look for optimizations.
- rewrite **f16tof32()** with bit magic.
- add storage example - with SD card, FRAM or EEPROM
- add communication example - serial or Ethernet?
- add examples
- persistent storage e.g. SD card, FRAM or EEPROM.
- communication e.g. Serial or Ethernet (XML, JSON)?
- sorting an array of float16?

#### Wont

44 changes: 44 additions & 0 deletions examples/float16_sizeof_array/float16_sizeof_array.ino
@@ -0,0 +1,44 @@
//
// FILE: float16_sizeof_array.ino
// AUTHOR: Rob Tillaart
// PURPOSE: test float16 size
// URL: https://github.com/RobTillaart/float16
// See #12

#include "Arduino.h"
#include "float16.h"


float16 test16[100];
float test32[100];

void setup()
{
  Serial.begin(115200);

  Serial.println("FLOAT16");
  Serial.println(sizeof(test16) / sizeof(test16[0]));
  Serial.println(sizeof(test16));
  Serial.println(sizeof(test16[0]));
  Serial.println();

  Serial.println("FLOAT32");
  Serial.println(sizeof(test32) / sizeof(test32[0]));
  Serial.println(sizeof(test32));
  Serial.println(sizeof(test32[0]));
  Serial.println();

  // set some values to make sure the compiler doesn't optimise out the arrays.
  test16[5] = 32;
  test32[4] = 32;

  // Serial.println(test16[5].toDouble(), 3);
  // Serial.println(test16[5].toFloat(), 3);
  // Serial.println(test16[5].toString());
  // Serial.println(test16[5].toString(1));
  // Serial.println(test16[5].toString(3));
}

void loop()
{
}
2 changes: 0 additions & 2 deletions examples/float16_test_all/float16_test_all.ino
@@ -29,8 +29,6 @@ void setup()
Serial.println(FLOAT16_LIB_VERSION);
Serial.println("\nStart ");

f16.setDecimals(6);

test_1();
test_2();
test_3();
4 changes: 1 addition & 3 deletions examples/float16_test_all_2/float16_test_all_2.ino
@@ -24,8 +24,6 @@ void setup()
Serial.print("FLOAT16_LIB_VERSION: ");
Serial.println(FLOAT16_LIB_VERSION);

f16.setDecimals(6);

test_all();

Serial.println("\ndone");
@@ -96,7 +94,7 @@ void test_0()
f16 = x;
Serial.print(x);
Serial.print("\t");
Serial.print(f16);
Serial.print(f16.toString(2));
Serial.print("\t");
Serial.print(f16.toDouble(), 2);
Serial.print("\t");
2 changes: 1 addition & 1 deletion examples/float16_test_array/float16_test_array.ino
@@ -5,7 +5,7 @@
// URL: https://github.com/RobTillaart/float16


// show different storage needs
// show storage needs (fixed in 0.3.0)

#include "float16.h"

32 changes: 32 additions & 0 deletions examples/float16_test_array/output_0.3.0.txt
@@ -0,0 +1,32 @@

float16_test_array.ino
FLOAT16_LIB_VERSION: 0.3.0

0 5.07
1 -0.51
2 -2.27
3 3.58
4 6.30
5 -0.28
6 2.44
7 5.78
8 6.23
9 4.09
0.30

0 5.07
1 -0.51
2 -2.27
3 3.58
4 6.30
5 -0.28
6 2.44
7 5.78
8 6.23
9 4.09
0.30

SIZE: 20
SIZE: 20

done
@@ -1,7 +1,7 @@
//
// FILE: float16_test_performance.ino
// AUTHOR: Rob Tillaart
// PURPOSE: test float16
// PURPOSE: test float16 performance
// URL: https://github.com/RobTillaart/float16


@@ -162,7 +162,7 @@ void setup()
delay(10);
Serial.println();

Serial.println(f16);
Serial.println(f16.toString(4));

Serial.println("MATH III - negation");
start = micros();
@@ -173,7 +173,7 @@ void setup()
delay(10);
Serial.println();

Serial.println(f18);
Serial.println(f18.toString(4));

Serial.println("\ndone");
}