double or float, which is faster?

C++, Floating Point, Double

C++ Problem Overview


I am reading "Accelerated C++". I found a sentence which states that "sometimes double is faster in execution than float in C++". After reading that sentence I got confused about how float and double work. Please explain this point to me.

C++ Solutions


Solution 1 - C++

Depends on what the native hardware does.

  • If the hardware is (or is like) x86 with legacy x87 math, float and double are both extended (for free) to an internal 80-bit format, so both have the same performance (except for cache footprint / memory bandwidth).

  • If the hardware implements both natively, like most modern ISAs (including x86-64 where SSE2 is the default for scalar FP math), then usually most FPU operations are the same speed for both. Double division and sqrt can be slower than float, as well as of course being significantly slower than multiply or add. (Float being smaller can mean fewer cache misses. And with SIMD, twice as many elements per vector for loops that vectorize).

  • If the hardware implements only double, then float will be slower if conversion to/from the native double format isn't free as part of float-load and float-store instructions.

  • If the hardware implements float only, then emulating double with it will cost even more time. In this case, float will be faster.

  • If the hardware implements neither, both have to be implemented in software. In this case, both will be slow, but double will be slightly slower (more load and store operations at the least).

The quote you mention is probably referring to the x86 platform, where the first case applied. But this doesn't hold true in general.

Also beware that x * 3.3 + y for float x,y will trigger promotion to double for both variables. This is not the hardware's fault, and you should avoid it by writing 3.3f to let your compiler make efficient asm that actually keeps numbers as floats if that's what you want.

Solution 2 - C++

You can find a complete answer in this article:

What Every Computer Scientist Should Know About Floating-Point Arithmetic

This is a quote from a previous Stack Overflow thread, about how float and double variables affect memory bandwidth:

> If a double requires more storage than a float, then it will take longer to read the data. That's the naive answer. On a modern IA32, it all depends on where the data is coming from. If it's in L1 cache, the load is negligible provided the data comes from a single cache line. If it spans more than one cache line there's a small overhead. If it's from L2, it takes a while longer, if it's in RAM then it's longer still and finally, if it's on disk it's a huge time.
>
> So the choice of float or double is less important than the way the data is used. If you want to do a small calculation on lots of sequential data, a small data type is preferable. Doing a lot of computation on a small data set would allow you to use bigger data types without any significant effect. If you're accessing the data very randomly, then the choice of data size is unimportant - data is loaded in pages / cache lines. So even if you only want a byte from RAM, you could get 32 bytes transferred (this is very dependent on the architecture of the system).
>
> On top of all of this, the CPU/FPU could be super-scalar (aka pipelined). So, even though a load may take several cycles, the CPU/FPU could be busy doing something else (a multiply for instance) that hides the load time to a degree.

Solution 3 - C++

The short answer is: it depends.

A CPU with x87 will crunch floats and doubles equally fast. Vectorized code will run faster with floats, because SSE can crunch 4 floats or 2 doubles in one pass.

Another thing to consider is memory speed. Depending on your algorithm, your CPU could be idling a lot while waiting for the data. Memory-intensive code will benefit from using floats, but ALU-limited code won't (unless it is vectorized).

Solution 4 - C++

I can think of three basic cases when doubles are faster than floats:

  1. Your hardware supports double operations but not float operations, so floats will be emulated by software and therefore be slower.

  2. You really need the precision of doubles. Now, if you use floats anyway you will have to use two floats to reach similar precision to double. The emulation of a true double with floats will be slower than using doubles in the first place.

  3. You do not necessarily need doubles but your numeric algorithm converges faster due to the enhanced precision of doubles. Also, doubles might offer enough precision to let you use a faster but numerically less stable algorithm.

For completeness' sake I also give some reasons for the opposite case of floats being faster. You can see for yourself which reasons dominate in your case:

  1. Floats are faster than doubles when you don't need double's precision and you are memory-bandwidth bound and your hardware doesn't carry a penalty on floats.

  2. They conserve memory-bandwidth because they occupy half the space per number.

  3. There are also platforms that can process more floats than doubles in parallel.

Solution 5 - C++

On Intel, the coprocessor (nowadays integrated) will handle both equally fast, but as some others have noted, doubles require more memory bandwidth, which can cause bottlenecks. If you're using scalar SSE instructions (the default for most compilers on 64-bit), the same applies. So generally, unless you're working on a large set of data, it doesn't matter much.

However, parallel SSE instructions will allow four floats to be handled in one instruction, but only two doubles, so here float can be significantly faster.

Solution 6 - C++

In an experiment that adds 3.3 repeatedly, 2000000000 times, the results are:

Summation time in s: 2.82 summed value: 6.71089e+07 // float
Summation time in s: 2.78585 summed value: 6.6e+09 // double
Summation time in s: 2.76812 summed value: 6.6e+09 // long double

So double is marginally faster in this test (the timings differ by roughly 1%), and it is the default floating-point type in C and C++. It is more portable and is the default across the C and C++ library functions. Also, double has significantly higher precision than float.

Even Stroustrup recommends double over float:

"The exact meaning of single-, double-, and extended-precision is implementation-defined. Choosing the right precision for a problem where the choice matters requires significant understanding of floating-point computation. If you don't have that understanding, get advice, take the time to learn, or use double and hope for the best."

Perhaps the only case where you should use float instead of double is on 64-bit hardware with a modern GCC, because float is smaller: double is 8 bytes and float is 4 bytes.

Solution 7 - C++

float is usually faster. double offers greater precision. However, performance may vary in some cases if special processor extensions such as 3DNow! or SSE are used.

Solution 8 - C++

There is only one reason 32-bit floats can be slower than 64-bit doubles (or 80-bit 80x87), and that is alignment. Other than that, floats take less memory, which generally means faster access and better cache performance. It also takes fewer cycles to process 32-bit instructions. And even when the (co)processor has no 32-bit instructions, it can perform them on 64-bit registers at the same speed. It is probably possible to create a test case where doubles will be faster than floats, and vice versa, but my measurements of real statistics algorithms didn't show a noticeable difference.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | coming out of void | View Question on Stack Overflow
Solution 1 - C++ | foo | View Answer on Stack Overflow
Solution 2 - C++ | Diego Dias | View Answer on Stack Overflow
Solution 3 - C++ | watson1180 | View Answer on Stack Overflow
Solution 4 - C++ | Peter G. | View Answer on Stack Overflow
Solution 5 - C++ | Frederik Slijkerman | View Answer on Stack Overflow
Solution 6 - C++ | Akash Agrawal | View Answer on Stack Overflow
Solution 7 - C++ | P47RICK | View Answer on Stack Overflow
Solution 8 - C++ | Gene Bushuyev | View Answer on Stack Overflow