Adding smallest possible float to a float

C++, C++11, Floating Point

C++ Problem Overview


I want to add the smallest possible value of a float to a float. So, for example, I tried doing this to get 1.0 + the smallest possible float:

float result = 1.0f + std::numeric_limits<float>::min();

But after doing that, I get the following results:

(result > 1.0f) == false
(result == 1.0f) == true

I'm using Visual Studio 2015. Why does this happen? What can I do to get around it?

C++ Solutions


Solution 1 - C++

If you want the next representable value after 1, there is a function for that called std::nextafter, from the <cmath> header.

float result = std::nextafter(1.0f, 2.0f);

It returns the next representable value starting from the first argument in the direction of the second argument. So if you wanted to find the next value below 1, you could do this:

float result = std::nextafter(1.0f, 0.0f);

Adding the smallest positive representable value to 1 doesn't work because the difference between 1 and the next representable value is greater than the difference between 0 and the next representable value.
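
As a minimal sketch of the difference (the helper names one_plus_min, one_next_up and one_next_down are made up for illustration, and IEEE 754 single precision with the default rounding mode is assumed), the question's expression and the two nextafter calls can be put side by side:

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: what the question computed, 1.0f plus the smallest
// normalized float. The volatile store forces the sum to be rounded to float.
float one_plus_min() {
    volatile float sum = 1.0f + std::numeric_limits<float>::min();
    return sum;
}

// Hypothetical helper: the first float strictly greater than 1.0f.
float one_next_up() {
    return std::nextafter(1.0f, 2.0f);
}

// Hypothetical helper: the first float strictly less than 1.0f.
float one_next_down() {
    return std::nextafter(1.0f, 0.0f);
}
```

Under those assumptions one_plus_min() compares equal to 1.0f, while one_next_up() is greater than 1.0f and one_next_down() is less than it.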

Solution 2 - C++

The "problem" you're observing is because of the very nature of floating point arithmetic.

In FP the precision depends on the scale; around the value 1.0 the precision is not enough to differentiate between 1.0 and 1.0+min_representable, where min_representable is the smallest possible value greater than zero. This holds even if we only consider the smallest normalized number, std::numeric_limits<float>::min(); the smallest denormal is several orders of magnitude smaller still.

For example with double-precision 64-bit IEEE754 floating point numbers, around the scale of x=10000000000000000 (10^16) it's impossible to distinguish between x and x+1.
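
A quick way to check this is sketched below. The helper name plus_one_is_distinct is hypothetical, and the code assumes double is an IEEE 754 64-bit type with the default round-to-nearest mode:

```cpp
// Hypothetical helper: true if x + 1 is stored as a different double than x.
// The volatile store forces the sum to be rounded to double precision.
bool plus_one_is_distinct(double x) {
    volatile double sum = x + 1.0;
    return sum != x;
}
```

With those assumptions, plus_one_is_distinct(1e15) is true, but plus_one_is_distinct(1e16) is false, because neighbouring doubles near 10^16 are 2 apart.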


The fact that the resolution changes with scale is the very reason for the name "floating point", because the radix point "floats". A fixed point representation instead will have a fixed resolution (for example with 16 binary digits below units you have a precision of 1/65536 ~ 0.000015).
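
For contrast, here is a rough sketch of the fixed point case. It assumes a made-up format with a 32-bit raw integer and 16 fractional bits; the helper names fixed_to_double and fixed_step are hypothetical:

```cpp
#include <cstdint>

// Hypothetical fixed point format: the raw integer counts units of 1/65536.
double fixed_to_double(std::int32_t raw) {
    return raw / 65536.0;
}

// Hypothetical helper: distance from one raw value to the next one up.
// Callers must keep raw below the int32 maximum so raw + 1 does not overflow.
double fixed_step(std::int32_t raw) {
    return fixed_to_double(raw + 1) - fixed_to_double(raw);
}
```

Here fixed_step returns 1/65536 whether raw is near zero or in the millions, which is exactly what a floating point format does not do.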

For example in the IEEE754 32-bit floating point format there is one bit for the sign, 8 bits for the exponent and 23 bits for the mantissa.
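
To look at those fields directly, something like the following could be used. It is only a sketch: FloatFields and split_float are names invented here, and it assumes float is the IEEE 754 32-bit format with 1 sign bit, 8 exponent bits and 23 stored mantissa bits:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical struct for illustration: the three raw bit fields of a float.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t fraction;  // 23 bits, leading 1 is implicit for normalized values
};

// Hypothetical helper: split a float into its raw fields.
FloatFields split_float(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bit pattern
    return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

For 1.0f this gives sign 0, exponent 127 and fraction 0; the next float above 1.0f differs only in the lowest fraction bit.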


The difference between 1.0f and the next representable float above it is available as a pre-defined constant: FLT_EPSILON (from <cfloat>), or std::numeric_limits<float>::epsilon(). See also machine epsilon on Wikipedia, which discusses how epsilon relates to rounding errors.

I.e. adding epsilon to 1.0 does what you were expecting here: the result is the next value above 1.0. (Under the default round-to-nearest mode, values slightly larger than epsilon/2 also round up to that same result.)

The more general version of this (for numbers other than 1.0) is called 1 unit in the last place (of the mantissa). See Wikipedia's ULP article.
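
One common way to measure that unit at an arbitrary value is to take the distance to the next representable number above it. A minimal sketch, assuming IEEE 754 single precision (ulp_at is a hypothetical name, and it measures upward from the magnitude of x):

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: size of one ULP at x, measured upward from |x|.
float ulp_at(float x) {
    const float a = std::fabs(x);
    return std::nextafter(a, std::numeric_limits<float>::infinity()) - a;
}
```

With this definition ulp_at(1.0f) equals epsilon, ulp_at(2.0f) is twice that, and ulp_at(16777216.0f) is already 2.0f, so at that scale even adding 1 has no effect.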

Solution 3 - C++

min is the smallest non-zero value that a (normalized-form) float can assume, i.e. something around 2^-126 (-126 is the minimum allowed exponent for a float); now, if you add it to 1 you'll still get 1, since a float has just 23 bits of mantissa, so such a small change cannot be represented in such a "big" number (you would need a 126 bit mantissa to see a change when adding 2^-126 to 1).

The minimum possible change to 1, instead, is epsilon (the so-called machine epsilon), which is in fact 2^-23 - as it affects the last bit of the mantissa.
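
These powers of two can be checked directly. The sketch below assumes IEEE 754 single precision and the default rounding mode; pow2 and changes_one are hypothetical helper names:

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: 2 raised to the power e, as a float.
float pow2(int e) {
    return std::ldexp(1.0f, e);
}

// Hypothetical helper: true if adding delta to 1.0f yields a different float.
// The volatile store forces the sum to be rounded to float.
bool changes_one(float delta) {
    volatile float sum = 1.0f + delta;
    return sum != 1.0f;
}
```

Under those assumptions pow2(-23) equals std::numeric_limits<float>::epsilon() and pow2(-126) equals std::numeric_limits<float>::min(); changes_one(pow2(-23)) is true, while changes_one(pow2(-126)) is false.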

Solution 4 - C++

To increment or decrement a floating point value by the smallest possible amount, use std::nextafter towards +/- std::numeric_limits<T>::infinity().

If you just use std::nextafter(x, std::numeric_limits<T>::max()), the result will be wrong in case x is infinity.

#include <iostream>
#include <limits>
#include <cmath>

template<typename T>
T next_above(const T& v){
    return std::nextafter(v,std::numeric_limits<T>::infinity()) ;
}
template<typename T>
T next_below(const T& v){
    return std::nextafter(v,-std::numeric_limits<T>::infinity()) ;
}

int main(){
  std::cout << "eps   : "<<std::numeric_limits<double>::epsilon()<< std::endl; // gives eps

  std::cout << "after : "<<next_above(1.0) - 1.0<< std::endl; // gives eps (the definition of eps)
  std::cout << "below : "<<next_below(1.0) - 1.0<< std::endl; // gives -eps/2

  // Note: this is what next_above does:
  std::cout << std::nextafter(std::numeric_limits<double>::infinity(),
     std::numeric_limits<double>::infinity()) << std::endl; // gives inf

  // while this is probably not what you need:
  std::cout << std::nextafter(std::numeric_limits<double>::infinity(),
     std::numeric_limits<double>::max()) << std::endl; // gives 1.79769e+308

}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | gsemac | View Question on Stackoverflow
Solution 1 - C++ | Benjamin Lindley | View Answer on Stackoverflow
Solution 2 - C++ | 6502 | View Answer on Stackoverflow
Solution 3 - C++ | Matteo Italia | View Answer on Stackoverflow
Solution 4 - C++ | Johan Lundberg | View Answer on Stackoverflow