Is floating-point == ever OK?

Tags: C++, Comparison, Floating Point

C++ Problem Overview


Just today I came across third-party software we're using and in their sample code there was something along these lines:

// Defined in somewhere.h
static const double BAR = 3.14;

// Code elsewhere.cpp
void foo(double d)
{
    if (d == BAR)
        ...
}

I'm aware of the problems with floating-point numbers and their representation, but it made me wonder whether there are cases where float == float would be fine. I'm not asking when it could work, but when it makes sense and works.

Also, what about a call like foo(BAR)? Will this always compare equal as they both use the same static const BAR?

C++ Solutions


Solution 1 - C++

Yes, you are guaranteed that whole numbers, including 0.0, compare exactly with ==.

Of course you have to be a little careful with how you got the whole number in the first place: assignment is safe, but the result of any calculation is suspect.

P.S. There is a set of real numbers that do have an exact representation as a float (think of 1/2, 1/4, 1/8, etc.), but you probably don't know in advance that you have one of these.

Just to clarify: IEEE 754 guarantees that float representations of integers (whole numbers) within range are exact.

float a=1.0;
float b=1.0;
a==b  // true

But you have to be careful how you get the whole numbers:

float a=1.0/3.0;
a*3.0 == 1.0  // not true !!

Solution 2 - C++

There are two ways to answer this question:

  1. Are there cases where float == float gives the correct result?
  2. Are there cases where float == float is acceptable coding?

The answer to (1) is: Yes, sometimes. But it's going to be fragile, which leads to the answer to (2): No. Don't do that. You're begging for bizarre bugs in the future.

As for a call of the form foo(BAR): in that particular case the comparison will return true, but when you are writing foo you don't know (and shouldn't depend on) how it is called. Calling foo(BAR) directly will be fine, but as soon as the caller does any arithmetic on the value first, say foo(BAR / 3.0 * 3.0), the argument may no longer be bit-identical to BAR. You shouldn't be relying on the caller not performing any arithmetic!
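
To make that concrete, here is a minimal sketch (with BAR changed to 0.3 so that a broken caller is trivial to exhibit; 0.1 + 0.2 famously does not round to the double closest to 0.3):

#include <cstdio>

static const double BAR = 0.3;

void foo(double d)
{
    std::printf("%s\n", d == BAR ? "equal" : "not equal");
}

int main()
{
    foo(BAR);        // "equal": the very same constant is passed through untouched
    foo(0.1 + 0.2);  // "not equal": 0.1 + 0.2 evaluates to 0.30000000000000004
    return 0;
}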

Long story short: even though a == b will work in some cases, you really shouldn't rely on it. Even if you can guarantee the calling semantics today, you may not be able to guarantee them next week, so save yourself some pain and don't use ==.

To my mind, float == float is never* OK because it's pretty much unmaintainable.

*For small values of never.

Solution 3 - C++

The other answers explain quite well why using == for floating point numbers is dangerous. I just found one example that illustrates these dangers quite well, I believe.

On the x86 platform, you can get weird floating point results for some calculations, which are not due to rounding problems inherent to the calculations you perform. This simple C program will sometimes print "error":

#include <stdio.h>

void test(double x, double y)
{
  const double y2 = x + 1.0;
  if (y != y2)
    printf("error\n");
}

int main(void)
{
  const double x = .012;
  const double y = x + 1.0;

  test(x, y);
}

The program essentially just calculates

x = 0.012 + 1.0;
y = 0.012 + 1.0;

(only spread across two functions and with intermediate variables), and yet the two values can compare unequal!

The reason is that on the x86 platform, programs usually use the x87 FPU for floating point calculations. The x87 internally calculates with a higher precision than regular double, so double values need to be rounded when they are stored in memory. That means that a roundtrip x87 -> RAM -> x87 loses precision, and thus calculation results differ depending on whether intermediate results passed via RAM or whether they all stayed in FPU registers. This is of course a compiler decision, so the bug only manifests for certain compilers and optimization settings :-(.

For details see the GCC bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

Rather scary...

Additional note:

Bugs of this kind will generally be quite tricky to debug, because the different values become the same once they hit RAM.

So if, for example, you extend the above program to actually print out the bit patterns of y and y2 right after comparing them, you will get the exact same value. To print it, the value has to be spilled to RAM so it can be passed to a print function like printf, and that store is exactly what makes the difference disappear...
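
For completeness, one way to dump such bit patterns is sketched below; note that merely passing the value to a function like this already forces the store to memory, and with it the rounding, described above:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Print the raw IEEE 754 bit pattern of a double. By the time the value
   arrives here it has been stored to memory, so any extra x87 precision
   is already gone. */
void print_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    printf("%016llx\n", (unsigned long long)bits);
}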

Solution 4 - C++

Perfect for integral values even in floating point formats

But the short answer is: "No, don't use ==."

Ironically, the floating point format works "perfectly", i.e., with exact precision, when operating on integral values within the range of the format. This means that if you stick with double values, you get exact integers with 53 bits of precision, i.e. everything up to about ±9,007,199,254,740,992, or 9 quadrillion.

In fact, this is how JavaScript works internally, and it's why JavaScript can do things like + and - on really big numbers, but can only << and >> on 32-bit ones.

Strictly speaking, you can exactly compare sums and products of numbers with precise representations. Those would be all the integers, plus fractions composed of 1/2^n terms. So, a loop incrementing by n + 0.25, n + 0.50, or n + 0.75 would be fine, but not any of the other 96 decimal fractions with 2 digits.
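
A quick sketch of these claims (the loop counts and step values are mine; only the last comparison is the fragile one, and compilers that keep extra intermediate precision can occasionally even make it come out true):

#include <cstdio>

int main()
{
    // Whole-number steps stay exact well past the 32-bit range.
    double n = 0.0;
    for (int i = 0; i < 1000000; ++i) n += 1.0;
    std::printf("%d\n", n == 1000000.0);   // 1, guaranteed

    // Steps of 0.25 (= 1/2^2) are exact too...
    double q = 0.0;
    for (int i = 0; i < 8; ++i) q += 0.25;
    std::printf("%d\n", q == 2.0);         // 1, guaranteed

    // ...but steps of 0.1 are not.
    double t = 0.0;
    for (int i = 0; i < 10; ++i) t += 0.1;
    std::printf("%d\n", t == 1.0);         // almost certainly 0
    return 0;
}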

So the answer is: while exact equality can in theory make sense in narrow cases, it is best avoided.

Solution 5 - C++

I'll try to provide a more-or-less real example of legitimate, meaningful and useful testing for float equality.

#include <stdio.h>
#include <math.h>

/* let's try to numerically solve a simple equation F(x)=0 */
double F(double x) {
    return 2*cos(x) - pow(1.2, x);
}

/* I'll use a well-known, simple&slow but extremely smart method to do this */
double bisection(double range_start, double range_end) {
    double a = range_start;
    double d = range_end - range_start;
    int counter = 0;
    while(a != a+d) // <-- WHOA!!
    {
        d /= 2.0;
        if(F(a)*F(a+d) > 0) /* test for same sign */
            a = a+d;
    
        ++counter;
    }
    printf("%d iterations done\n", counter);
    return a;
}

int main() {
    /* we must be sure that the root can be found in [0.0, 2.0] */
    printf("F(0.0)=%.17f, F(2.0)=%.17f\n", F(0.0), F(2.0));

    double x = bisection(0.0, 2.0);

    printf("the root is near %.17f, F(%.17f)=%.17f\n", x, x, F(x));
}

I'd rather not explain the bisection method itself, but emphasize the stopping condition. It has exactly the form under discussion: (a == a+d), where both sides are floats: a is our current approximation of the equation's root, and d is our current precision. Given the algorithm's precondition that there is a root between range_start and range_end, we guarantee on every iteration that the root stays between a and a+d, while d is halved on every step, shrinking the bounds.

And then, after a number of iterations, d becomes so small that during the addition with a it gets rounded away! That is, a+d turns out to be closer to a than to any other float, so the FPU rounds it to the closest representable value: a itself. This is easy to illustrate on a hypothetical calculating machine with a 4-digit decimal mantissa and some large exponent range. What result should the machine give for 2.131e+02 + 7.000e-3? The exact answer is 213.107, but our machine cannot represent that number; it has to round. And 213.107 is much closer to 213.1 than to 213.2, so the rounded result becomes 2.131e+02: the little summand vanished entirely. Exactly the same is guaranteed to happen at some iteration of our algorithm, and at that point we cannot continue any further. We have found the root to the maximum possible precision.
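
The same vanishing act is easy to reproduce directly in double precision; the starting values below are arbitrary, and the halving mirrors what happens to d in the bisection:

#include <cstdio>

int main()
{
    double a = 213.1;
    double d = 0.007;
    while (a + d != a)   // keep halving until d is too small to change a
        d /= 2.0;
    std::printf("d = %g is still nonzero, yet a + d == a\n", d);
    return 0;
}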

The edifying conclusion is, apparently, that floats are tricky. They look so much like real numbers that every programmer is tempted to think of them as real numbers. But they are not. They have their own behavior, slightly reminiscent of the reals, but not quite the same. You need to be very careful with them, especially when comparing for equality.


Update

Revisiting the answer after a while, I have also noticed an interesting fact: in the algorithm above one cannot actually use "some small number" in the stopping condition. For any choice of that number, there will be inputs for which it is too large, causing loss of precision, and there will be inputs for which it is too small, causing excess iterations or even an infinite loop. A detailed discussion follows.

You might already know that calculus has no notion of a "small number": for any real number, you can easily find infinitely many even smaller ones. The problem is that one of those "even smaller" ones might be exactly what we are looking for; it might be a root of our equation. Even worse, different equations may have distinct roots (e.g. 2.51e-8 and 1.38e-8) which would both be approximated by the same number if our stopping condition looked like d < 1e-6. Whichever "small number" you choose, many roots that would have been found correctly, to maximum precision, with the a == a+d stopping condition will be spoiled by the "epsilon" being too large.

It's true, however, that the exponent of a floating point number has a limited range, so you can actually find the smallest nonzero positive FP number (about 1.4e-45, the smallest denormal in IEEE 754 single precision). But it's useless as an epsilon! A loop that runs until d < 1e-45 will never terminate as long as d stays a positive nonzero single-precision value, because no such value is smaller than the smallest denormal.

Setting aside those pathological edge cases, any choice of the "small number" in a d < eps stopping condition will be too small for many equations. For equations whose root has a high enough exponent, the result of subtracting two mantissas that differ only in the least significant digit will easily exceed our "epsilon". For example, with 6-digit decimal mantissas, 7.00023e+8 - 7.00022e+8 = 0.00001e+8 = 1.00000e+3 = 1000, meaning that the smallest possible difference between two such numbers with exponent +8 is... 1000! That will never fit into, say, 1e-4. For numbers with a (relatively) high exponent we simply do not have enough precision to ever see a difference of 1e-4.

My implementation above takes this last problem into account too: you can see that d is halved at each step, instead of being recomputed as the difference of a and b (which may be huge in exponent). So if we changed the stopping condition to d < eps, the algorithm would not get stuck in an infinite loop for huge roots (as it very well could with (b-a) < eps), but it would still perform needless iterations while shrinking d below the precision of a.

This kind of reasoning may seem overly theoretical and needlessly deep, but its purpose is, once again, to illustrate the trickiness of floats. Be very careful about their finite precision when writing arithmetic operations on them.

Solution 6 - C++

The only case where I ever use == (or !=) for floats is in the following:

if (x != x)
{
    // Here x is guaranteed to be Not a Number
}

and I must admit I am guilty of using Not A Number as a magic floating point constant (using numeric_limits<double>::quiet_NaN() in C++).
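
A minimal sketch of that sentinel idiom (the variable names are mine; note that it relies on strict IEEE comparison semantics, so options like -ffast-math will break the x != x test):

#include <cstdio>
#include <limits>

int main()
{
    // "Not set yet" sentinel: x != x is true only when x is NaN.
    double cached = std::numeric_limits<double>::quiet_NaN();

    if (cached != cached)
        std::printf("no cached value yet\n");

    cached = 42.0;
    if (!(cached != cached))
        std::printf("cached value: %g\n", cached);
    return 0;
}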

There is no point in comparing floating point numbers for strict equality. Floating point numbers have been designed with predictable relative accuracy limits. You are responsible for knowing what precision to expect from them and your algorithms.

Solution 7 - C++

It's probably OK if you're never going to calculate the value before you compare it: if you are testing whether a floating point number is exactly pi, or -1, or 1, and you know those are the only values being passed in...

Solution 8 - C++

I have also used it a few times when rewriting algorithms as multithreaded versions: I used a test that compared the results of the single-threaded and multithreaded versions to make sure both give exactly the same result.

Solution 9 - C++

Let's say you have a function that scales an array of floats by a constant factor:

void scale(float factor, float *vector, int extent) {
   int i;
   for (i = 0; i < extent; ++i) {
      vector[i] *= factor;
   }
}

I'll assume that your floating point implementation can represent 1.0 and 0.0 exactly, and that 0.0 is represented by all 0 bits.

If factor is exactly 1.0 then this function is a no-op, and you can return without doing any work. If factor is exactly 0.0 then this can be implemented with a call to memset, which will likely be faster than performing the floating point multiplications individually.
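
A sketch of what those fast paths might look like; scale_fast is a hypothetical name, the memset path assumes (as stated above) that 0.0 is all zero bits, and it deliberately ignores the NaN, infinity and -0.0 inputs that an elementwise multiply by zero would treat differently:

#include <string.h>

void scale_fast(float factor, float *vector, int extent) {
   int i;
   if (factor == 1.0f)
      return;                                     /* multiplying by 1.0 changes nothing */
   if (factor == 0.0f) {
      memset(vector, 0, extent * sizeof *vector); /* every element becomes +0.0f */
      return;
   }
   for (i = 0; i < extent; ++i) {
      vector[i] *= factor;
   }
}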

The [reference implementation of BLAS functions at netlib][1] uses such techniques extensively.

[1]: http://www.netlib.org/blas/ "netlib's top-level BLAS directory"

Solution 10 - C++

In my opinion, comparing for equality (or some equivalence) is a requirement in most situations: standard C++ containers or algorithms with an implied equality comparison functor, like std::unordered_set for example, require that this comparator be an equivalence relation (see C++ named requirements: UnorderedAssociativeContainer).

Unfortunately, comparing with an epsilon as in abs(a - b) < epsilon does not yield an equivalence relation, since it loses transitivity. Using such a comparator with unordered_set is most probably undefined behavior: two 'almost equal' floating point numbers could yield different hashes, which can put the unordered_set in an invalid state. Personally, I would use == for floating points most of the time, unless some kind of FPU computation is involved on either operand. With containers and container algorithms, where only reads and writes are involved, == (or any true equivalence relation) is the safest.
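
A minimal sketch of the lost transitivity; the epsilon and the values are arbitrary, chosen only to show the effect:

#include <cstdio>
#include <cmath>

bool almost_equal(double a, double b) {
    return std::fabs(a - b) < 0.001;   // absolute-epsilon "equality"
}

int main()
{
    double a = 0.0000, b = 0.0008, c = 0.0016;
    std::printf("%d %d %d\n",
                almost_equal(a, b),    // 1
                almost_equal(b, c),    // 1
                almost_equal(a, c));   // 0: not transitive, so not an equivalence relation
    return 0;
}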

abs(a - b) < epsilon is more or less a convergence criterion, similar to a limit. I find this relation useful if I need to verify that a mathematical identity holds between two computations (for example PV = nRT, or distance = time * speed).

In short: use == if and only if no floating point computation occurs, and never use abs(a - b) < e as an equality predicate.

Solution 11 - C++

Yes. 1/x will be valid unless x == 0. You don't need an imprecise test here: 1/0.00000001 is perfectly fine. I can't think of any other case; you can't even check tan(x) for x == PI/2, because PI/2 has no exact floating point representation.
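
A small sketch of that guard; safe_reciprocal and its zero-case fallback are my own choices, not anything prescribed by the answer:

#include <cstdio>

double safe_reciprocal(double x)
{
    if (x == 0.0)        // exact test is exactly what we want here
        return 0.0;      // arbitrary policy for the singular case
    return 1.0 / x;      // any nonzero x can be inverted (it may overflow to inf,
                         // but it never divides by zero)
}

int main()
{
    std::printf("%g\n", safe_reciprocal(0.00000001)); // 1e+08
    std::printf("%g\n", safe_reciprocal(0.0));        // 0
    return 0;
}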

Solution 12 - C++

The other posts show where it is appropriate. I think using bit-exact comparison to avoid needless recalculation is also okay.

Example:

#include <cmath>   // for NAN

float very_expensive_calculation (float argument);  // defined elsewhere

float someFunction (float argument)
{
  // Cache of the last argument/result pair; NaN compares unequal to
  // everything, so the very first call always triggers the computation.
  static float lastargument = NAN;
  static float cachedValue = 0.0f;

  // I really want bit-exact comparison here!
  if (argument != lastargument)
  {
    lastargument = argument;
    cachedValue = very_expensive_calculation (argument);
  }

  return cachedValue;
}

Solution 13 - C++

I would say that comparing floats for equality would be OK if a false-negative answer is acceptable.

Assume, for example, that you have a program that prints floating point values to the screen, and that if a value happens to be exactly equal to M_PI you would like it to print "pi" instead. If the value deviates a tiny bit from the exact double representation of M_PI, it will print the double value instead, which is equally valid, just a little less readable for the user.
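
A sketch of such a printer, assuming M_PI is available as in the answer; a near miss simply falls through to the numeric branch, which is the acceptable false negative:

#include <cmath>
#include <cstdio>

void print_value(double v)
{
    if (v == M_PI)                 // bit-exact match only
        std::printf("pi\n");
    else
        std::printf("%.17g\n", v);
}

int main()
{
    print_value(M_PI);             // pi
    print_value(M_PI + 1e-15);     // 3.1415926535897942 (or similar)
    return 0;
}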

Solution 14 - C++

I have a drawing program that fundamentally uses floating point for its coordinate system, since the user is allowed to work at any granularity/zoom. The thing they are drawing contains lines that can be bent at points created by them. When they drag one point on top of another, the two are merged.

In order to do "proper" floating point comparison I'd have to come up with some range within which to consider two points the same. Since the user can zoom in indefinitely and work within that range, and since I couldn't get anyone to commit to any particular range, we just use == to see whether the points are the same. Occasionally there'll be an issue where points that are supposed to be exactly the same are off by 0.000000000001 or something (especially around 0,0), but usually it works just fine. It's supposed to be hard to merge points without snapping turned on anyway... or at least that's how the original version worked.

It throws off the testing group occasionally, but that's their problem :p

So anyway, there's an example of a possibly reasonable time to use ==. The thing to note is that the decision is less about technical accuracy than about client wishes (or the lack thereof) and convenience. It's not something that needs to be all that accurate anyway. So what if two points don't merge when you expect them to? It's not the end of the world, and it won't affect 'calculations'.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | murrekatt | View Question on Stackoverflow |
| Solution 1 - C++ | Martin Beckett | View Answer on Stackoverflow |
| Solution 2 - C++ | Cameron Skinner | View Answer on Stackoverflow |
| Solution 3 - C++ | sleske | View Answer on Stackoverflow |
| Solution 4 - C++ | DigitalRoss | View Answer on Stackoverflow |
| Solution 5 - C++ | ulidtko | View Answer on Stackoverflow |
| Solution 6 - C++ | Alexandre C. | View Answer on Stackoverflow |
| Solution 7 - C++ | Abdullah Jibaly | View Answer on Stackoverflow |
| Solution 8 - C++ | Chris Hasiński | View Answer on Stackoverflow |
| Solution 9 - C++ | Philip Starhill | View Answer on Stackoverflow |
| Solution 10 - C++ | Julien Villemure-Fréchette | View Answer on Stackoverflow |
| Solution 11 - C++ | MSalters | View Answer on Stackoverflow |
| Solution 12 - C++ | Nils Pipenbrinck | View Answer on Stackoverflow |
| Solution 13 - C++ | Joel | View Answer on Stackoverflow |
| Solution 14 - C++ | Edward Strange | View Answer on Stackoverflow |