Algorithm for finding the smallest power of two that's greater than or equal to a given value

Tags: C++, Algorithm, Assembly

C++ Problem Overview


I need to find the smallest power of two that's greater than or equal to a given value. So far, I have this:

int value = 3221; // 3221 is just an example, could be any number
int result = 1;

while (result < value) result <<= 1;

It works fine, but feels kind of naive. Is there a better algorithm for that problem?

EDIT: There were some nice assembly suggestions, so I'm adding those tags to the question.

C++ Solutions


Solution 1 - C++

Here's my favorite. Other than the initial check for whether it's invalid (<0, which you could skip if you knew you'd only have >=0 numbers passed in), it has no loops or conditionals, and thus will outperform most other methods. This is similar to erickson's answer, but I think that my decrementing x at the beginning and adding 1 at the end is a little less awkward than his answer (and also avoids the conditional at the end).

/// Round up to next higher power of 2 (return x if it's already a power
/// of 2).
inline int
pow2roundup (int x)
{
    if (x < 0)
        return 0;
    --x;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x+1;
}
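
For comparison, here is a minimal sketch of the same shift-or idea on an unsigned 32-bit value, which sidesteps right shifts of negative numbers. The name pow2roundup_u32 and the precondition are assumptions for illustration, not part of the answer above.

```cpp
#include <cassert>
#include <cstdint>

// Round up to the next power of two (returns x if it already is one).
// Hypothetical unsigned variant of pow2roundup.
inline std::uint32_t pow2roundup_u32(std::uint32_t x)
{
    assert(x >= 1 && x <= 0x80000000u);  // result must fit in 32 bits
    --x;              // so that exact powers of two map to themselves
    x |= x >> 1;      // smear the highest set bit into every lower bit
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;     // all ones below the top bit, plus one
}
```

With this variant, pow2roundup_u32(3221) gives 4096 and pow2roundup_u32(4096) stays 4096.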

Solution 2 - C++

ceil(log2(value))

That gives the exponent; the answer is 2 raised to that power. ilog2() can be calculated in 3 asm instructions, e.g. http://www.asterisk.org/doxygen/1.4/log2comp_8h-source.html

Solution 3 - C++

On Intel hardware the BSR instruction is close to what you want: it finds the most significant set bit. If you need to be more precise, you can then check whether the remaining bits are all zero. I tend to assume that other CPUs will have something like BSR, since this is the question you need answered to normalize a number. If your number is more than 32 bits, scan from the most significant DWORD to find the first DWORD with any bits set.

Edsger Dijkstra would likely remark that the above "algorithms" assume that your computer uses binary digits, while from his kind of lofty "algorithmic" perspective you should think about Turing machines or something. Obviously I am of the more pragmatic style.
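
As a rough sketch of the scan described in this answer, the code below handles a 64-bit value as two 32-bit DWORDs. The helper msb_index32 is a portable stand-in for what BSR computes in a single instruction; both function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Index of the most significant set bit of a non-zero 32-bit word.
// Portable loop standing in for the BSR instruction.
inline int msb_index32(std::uint32_t w)
{
    assert(w != 0);
    int index = 0;
    while (w >>= 1) ++index;
    return index;
}

// Smallest power of two >= v. Returns 0 if the result does not fit in 64 bits.
inline std::uint64_t next_pow2_u64(std::uint64_t v)
{
    if (v == 0) return 1;
    const std::uint32_t hi = static_cast<std::uint32_t>(v >> 32);
    const std::uint32_t lo = static_cast<std::uint32_t>(v);
    // Find the first DWORD with any bits set, most significant first.
    const int msb = hi ? 32 + msb_index32(hi) : msb_index32(lo);
    const std::uint64_t floor_pow2 = std::uint64_t(1) << msb;
    // If the remaining bits are all zero, v is already a power of two.
    if (v == floor_pow2) return v;
    return msb == 63 ? 0 : floor_pow2 << 1;
}
```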

Solution 4 - C++

In the spirit of Quake III's 0x5f3759df and the Bit Twiddling Hacks' IEEE version, this solution reaches into a double to extract the exponent as a means to calculate floor(lg2(n)). It's a bit faster than the accepted solution and much faster than the Bit Twiddling IEEE version since it avoids floating point math. As coded, it assumes a double is a real*8 IEEE float on a little endian machine.

int nextPow2(int n) 
{ 
    if ( n <= 1 ) return n;
    double d = n-1; 
    return 1 << ((((int*)&d)[1]>>20)-1022); 
} 
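
The pointer cast above depends on byte order and breaks aliasing rules. A hedged alternative sketch copies the bits out with std::memcpy and masks the exponent field directly. It assumes a 64-bit IEEE 754 double and a 32-bit int; the function name and the precondition are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same exponent trick, without the pointer cast.
inline int next_pow2_from_exponent(int n)
{
    if (n <= 1) return n;                 // same early-out as the answer
    assert(n <= (1 << 30));               // larger inputs would overflow int
    const double d = n - 1;               // exact for any 32-bit int
    std::uint64_t bits;
    static_assert(sizeof bits == sizeof d, "expects a 64-bit double");
    std::memcpy(&bits, &d, sizeof bits);  // copy the raw representation
    // Bits 52..62 hold the biased exponent (bias 1023), so this is
    // floor(log2(n - 1)) + 1.
    const int shift = static_cast<int>((bits >> 52) & 0x7FF) - 1022;
    return 1 << shift;
}
```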

Edit: Add optimized x86 assembly version with the help of a co-worker. A 4% speed gain but still about 50% slower than a bsr version (6 sec vs 4 on my laptop for n=1..2^31-2).

int nextPow2(int n) 
{ 
    if ( n <= 1 ) return n;
    double d;
    n--;
    __asm {
      fild    n 
      mov     eax,4
      fstp    d 
      mov     ecx, dword ptr d[eax]
      sar     ecx,14h 
      rol     eax,cl 
  }
} 

Solution 5 - C++

Here's a template version of the bit shifting technique.

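#include <climits>   // CHAR_BIT
#include <cstddef>   // size_t

template<typename T> T next_power2(T value)
{
    --value;
    // Shift by 1, 2, 4, ... until the whole width of T is covered.
    for(size_t i = 1; i < sizeof(T) * CHAR_BIT; i*=2)
        value |= value >> i;
    return value+1;
}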

Since the loop only uses constants, the compiler flattens it (I checked). The function is also future proof.

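Here's one that uses __builtin_clzll (GCC/Clang). The long long variant keeps it correct for types up to 64 bits, and the value <= 1 check is needed because the builtin's result is undefined for 0. (Also future proof)

template<typename T> T next_power2(T value)
{
    if (value <= 1) return 1;
    return T(1) << ((sizeof(unsigned long long) * CHAR_BIT) - __builtin_clzll(value-1));
}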

Solution 6 - C++

Your implementation is not naive, it's actually the logical one, except that it's wrong - it returns a negative for numbers greater than 1/2 the maximum integer size.

Assuming you can restrict numbers to the range 0 through 2^30 (for 32-bit ints), it'll work just fine, and a lot faster than any mathematical functions involving logarithms.

Unsigned ints would work better but you'd end up with an infinite loop (for numbers greater than 2^31) since you can never reach 2^32 with the << operator.
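
One way to keep the simple loop while avoiding the endless loop on large inputs is to check before each shift. This is only a sketch; the function names and the choice of returning 0 as the "does not fit" signal are assumptions.

```cpp
#include <cassert>
#include <climits>

// The loop from the question, on unsigned values and with an overflow guard.
// Returns 0 when no power of two >= value fits in an unsigned int.
inline unsigned next_pow2_loop(unsigned value)
{
    unsigned result = 1;
    while (result < value) {
        if (result > UINT_MAX / 2) return 0;  // next shift would wrap to 0
        result <<= 1;
    }
    return result;
}

// Spot checks; hypothetical helper just for illustration.
inline void next_pow2_loop_demo()
{
    assert(next_pow2_loop(3221) == 4096);
    assert(next_pow2_loop(UINT_MAX) == 0);    // terminates instead of spinning
}
```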

Solution 7 - C++

pow(2, ceil(log2(value)));

log2(value) = log(value) / log(2);
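
A small sketch of the logarithm approach using std::log2 and std::ceil from <cmath>. Because floating point results can be off by a tiny amount, the sketch adds an integer correction step that is not part of the formula above; the function name and the input range are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Smallest power of two >= value, for 1 <= value <= 2^31.
inline std::uint32_t next_pow2_log(std::uint32_t value)
{
    assert(value >= 1 && value <= 0x80000000u);
    int exponent = static_cast<int>(std::ceil(std::log2(static_cast<double>(value))));
    exponent = std::min(std::max(exponent, 0), 31);   // keep the shift in range
    std::uint32_t result = std::uint32_t(1) << exponent;
    // Correct a result that is off by one step due to rounding in log2.
    if (result < value) result <<= 1;
    else if (result / 2 >= value) result /= 2;
    return result;
}
```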

Solution 8 - C++

An exploration of the possible solutions to a closely related problem (that is, rounding down instead of up), many of which are significantly faster than the naive approach, is available on the Bit Twiddling Hacks page, an excellent resource for the kinds of optimization you are looking for. The fastest solution there uses a lookup table with 256 entries, which reduces the total operation count to around 7, from an average of 62 (by a similar operation-counting methodology) for the naive approach. Adapting those solutions to your problem is a matter of a single comparison and increment.
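
To make the table idea concrete, here is a sketch of a 256-entry log table followed by the comparison-and-increment step for rounding up. It is written from the description above rather than copied from the Bit Twiddling Hacks page, and all names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Build the 256-entry table: entry i holds floor(log2(i)); entry 0 is unused.
inline std::array<std::uint8_t, 256> make_log_table()
{
    std::array<std::uint8_t, 256> table{};
    for (int i = 2; i < 256; ++i)
        table[i] = static_cast<std::uint8_t>(1 + table[i / 2]);
    return table;
}

// floor(log2(v)) for v != 0: pick the highest non-zero byte, then one lookup.
inline int floor_log2(std::uint32_t v)
{
    static const std::array<std::uint8_t, 256> table = make_log_table();
    assert(v != 0);
    if (std::uint32_t hi = v >> 16)
        return (hi >> 8) ? 24 + table[hi >> 8] : 16 + table[hi];
    return (v >> 8) ? 8 + table[v >> 8] : table[v];
}

// Round up: one comparison and one increment on top of the rounded-down value.
// Returns 0 if the result does not fit in 32 bits.
inline std::uint32_t next_pow2_table(std::uint32_t v)
{
    if (v <= 1) return 1;
    const int e = floor_log2(v);
    const std::uint32_t floor_pow2 = std::uint32_t(1) << e;
    if (floor_pow2 == v) return v;
    return e == 31 ? 0 : floor_pow2 << 1;
}
```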

Solution 9 - C++

You don't really say what you mean by "better algorithm" but as the one you present is perfectly clear (if somewhat flawed), I'll assume you are after a more efficient algorithm.

Larry Gritz has given what is probably the most efficient C/C++ algorithm without the overhead of a lookup table, and it would suffice in most cases (see http://www.hackersdelight.org for similar algorithms).

As mentioned elsewhere, most CPUs these days have machine instructions to count the number of leading zeroes (or, equivalently, return the most significant set bit). However, their use is non-portable and, in most cases, not worth the effort.

However most compilers have "intrinsic" functions that allow the use of machine instructions but in a more portable way.

Microsoft C++ has _BitScanReverse() and gcc provides __builtin_clz() to do the bulk of the work efficiently.
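
A sketch of how the two intrinsics might be wrapped behind one function. The wrapper names are hypothetical, and the GCC/Clang branch assumes a 32-bit unsigned int.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Index of the most significant set bit of a non-zero value.
inline int msb_index(std::uint32_t v)
{
    assert(v != 0);   // neither intrinsic gives a usable result for 0
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanReverse(&index, v);
    return static_cast<int>(index);
#else
    return 31 - __builtin_clz(v);   // GCC and Clang
#endif
}

// Smallest power of two >= v. Returns 0 if the result does not fit in 32 bits.
inline std::uint32_t next_pow2_intrinsic(std::uint32_t v)
{
    if (v <= 1) return 1;
    const int e = msb_index(v - 1) + 1;   // number of bits needed for v - 1
    return e == 32 ? 0 : std::uint32_t(1) << e;
}
```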

Solution 10 - C++

How about a recursive template version to generate a compile-time constant:

template<uint32_t A, uint8_t B = 16>
struct Pow2RoundDown { enum{ value = Pow2RoundDown<(A | (A >> B)), B/2>::value }; };
template<uint32_t A>
struct Pow2RoundDown<A, 1> { enum{ value = (A | (A >> 1)) - ((A | (A >> 1)) >> 1) }; };

template<uint32_t A, uint8_t B = 16>
struct Pow2RoundUp { enum{ value = Pow2RoundUp<((B == 16 ? (A-1) : A) | ((B == 16 ? (A-1) : A) >> B)), B/2>::value }; };
template<uint32_t A >
struct Pow2RoundUp<A, 1> { enum{ value = ((A | (A >> 1)) + 1) }; };

Can be used like so:

Pow2RoundDown<3221>::value, Pow2RoundUp<3221>::value
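
With a C++14 or later compiler, a constexpr function can produce the same compile-time constant with less template machinery. This is an alternative sketch, not a rewrite of the templates above, and the name pow2_round_up is made up.

```cpp
#include <cstdint>

// Compile-time capable round-up using the shift-or trick (needs C++14).
constexpr std::uint32_t pow2_round_up(std::uint32_t a)
{
    --a;
    for (unsigned shift = 1; shift < 32; shift *= 2)
        a |= a >> shift;
    return a + 1;
}

// Evaluated by the compiler, so a wrong result would fail the build.
static_assert(pow2_round_up(3221) == 4096, "rounds up to the next power");
static_assert(pow2_round_up(4096) == 4096, "powers of two are unchanged");
```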

Solution 11 - C++

The code below repeatedly strips the lowest bit off until the number is a power of two, then doubles the result unless the number is a power of two to begin with. It has the advantage of running in a time proportional to the number of bits set. Unfortunately, it has the disadvantage of requiring more instructions in almost all cases than either the code in the question or the assembly suggestions. I include it only for completeness.

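int nextPow(int x) {
  int y = x;
  // x ^ (~x+1) clears the lowest set bit and everything below it,
  // so the AND strips the lowest set bit from x on each pass.
  while (x &= (x^(~x+1)))
    y = x << 1;
  return y;
}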

Solution 12 - C++

I know this is downvote-bait, but if the number is small enough (like 8 or 16-bits) a direct lookup might be fastest.

// fill in the table
unsigned short tab[65536];
unsigned short bit = tab[i];

It might be possible to extend it to 32 bits by first doing the high word and then the low.

//
unsigned long bitHigh = ((unsigned long)tab[(unsigned short)(i >> 16)]) << 16;
unsigned long bitLow = 0;
if (bitHigh == 0){
    bitLow = tab[(unsigned short)(i & 0xffff)];
}
unsigned long answer = bitHigh | bitLow;

It's probably no better than the shift-or methods, but maybe could be extended to larger word sizes.

(Actually, this gives the highest 1-bit. You'd have to shift it left by 1 to get the next higher power of 2.)
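
The "fill in the table" step is left out above, so here is one possible sketch of it, together with the high-word/low-word lookup and the final shift. The helper names and the precondition are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical table fill: tab[i] is the highest 1-bit of i (tab[0] == 0).
inline std::vector<std::uint16_t> make_high_bit_table()
{
    std::vector<std::uint16_t> tab(65536, 0);
    for (std::uint32_t bit = 1; bit < 65536; bit <<= 1)
        for (std::uint32_t i = bit; i < 2 * bit; ++i)
            tab[i] = static_cast<std::uint16_t>(bit);
    return tab;
}

// Highest 1-bit of a 32-bit value: high word first, then the low word.
inline std::uint32_t highest_bit32(std::uint32_t i)
{
    static const std::vector<std::uint16_t> tab = make_high_bit_table();
    const std::uint32_t bitHigh = std::uint32_t(tab[i >> 16]) << 16;
    const std::uint32_t bitLow = bitHigh ? 0u : std::uint32_t(tab[i & 0xffff]);
    return bitHigh | bitLow;
}

// Next power of two >= i: highest bit of (i - 1), shifted left by one.
inline std::uint32_t next_pow2_lookup(std::uint32_t i)
{
    assert(i <= 0x80000000u);   // result must fit in 32 bits
    if (i <= 1) return 1;
    return highest_bit32(i - 1) << 1;
}
```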

Solution 13 - C++

My version of the same:

int pwr2Test(size_t x) {
    return (x & (x - 1))? 0 : 1; 
}

size_t pwr2Floor(size_t x) {
    // A lookup table for rounding 4 bit numbers down to
    // the nearest power of 2.
    static const unsigned char pwr2lut[] = {
        0x00, 0x01, 0x02, 0x02,     //  0,  1,  2,  3
        0x04, 0x04, 0x04, 0x04,     //  4,  5,  6,  7
        0x08, 0x08, 0x08, 0x08,     //  8,  9, 10, 11
        0x08, 0x08, 0x08, 0x08      // 12, 13, 14, 15
    };

    size_t pwr2 = 0;                // The return value
    unsigned int i = 0;             // The nybble iterator

    for( i = 0; x != 0; ++i ) {     // Iterate through nybbles,
        pwr2 = pwr2lut[x & 0x0f];   // rounding down to powers of 2.
        x >>= 4;                    // (i - 1) will contain the
    }                               // highest non-zero nybble index.

    i = i? (i - 1) : i;
    pwr2 <<= (i * 4);
    return pwr2; 
}

size_t pwr2Size(size_t x) {
    if( pwr2Test(x) ) { return x; }
    return pwr2Floor(x) * 2; 
}

Solution 14 - C++

I love the shift.

I'll settle for

    int bufferPow = 1;
    while ( bufferPow<bufferSize && bufferPow>0) bufferPow <<= 1;

That way the loop always terminates, and the part after && only turns false if bufferPow overflows. And I do not think two lines are worth a function call. Also, you can make it a long or a short, depending on your judgment, and it is very readable. (If bufferPow becomes negative, hopefully your main code will exit fast.)

Usually you compute a power of two only once at the start of an algorithm, so optimizing would be silly anyway. However, I would be interested if anyone bored enough would care for a speed contest... using the above examples and 255 256 257 .. 4195 4196 4197

Solution 15 - C++

An arbitrary log function can be converted to a log base 2 by dividing by the log of 2:

$ /usr/local/pypy-1.9/bin/pypy
Python 2.7.2 (341e1e3821ff, Jun 07 2012, 15:38:48)
[PyPy 1.9.0 with GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``<arigato> yes but there is not
much sense if I explain all about today's greatest idea if tomorrow it's
completely outdated''
>>>> import math
>>>> print math.log(65535)/math.log(2)
15.9999779861
>>>> print math.log(65536)/math.log(2)
16.0
>>>>

It of course won't be 100% precise, since there is floating point arithmetic involved.

Solution 16 - C++

This works and is really fast (on my 2.66 GHz Intel Core 2 Duo 64-bit processor).

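#include <cstdio>
#include <iostream>
int main(void) {
    int testinput;
    int counter = 0;                 // must start at zero; it counts the shifts
    std::cin >> testinput;
    while (testinput > 1) {          // shift down until only the top bit is left
        testinput = testinput >> 1;
        counter++;
    }
    // Note: an input that is already a power of two gives the next one (4 -> 8).
    int finalnum = testinput << (counter+1);
    printf("Is %i\n",finalnum);
    return 0;
}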

I tested it on 3, 6, and 65496, and correct answers (4, 8, and 65536) were given.

Sorry if this seems a bit arcane, I was under the influence of a couple of hours of Doom just before writing. :)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author
Question             Boyan
Solution 1 - C++     Larry Gritz
Solution 2 - C++     jfs
Solution 3 - C++     pngaz
Solution 4 - C++     Tony Lee
Solution 5 - C++     Zacrath
Solution 6 - C++     paxdiablo
Solution 7 - C++     Sorana
Solution 8 - C++     Sparr
Solution 9 - C++     Dipstick
Solution 10 - C++    duncan.forster
Solution 11 - C++    DocMax
Solution 12 - C++    Mike Dunlavey
Solution 13 - C++    natersoz
Solution 14 - C++    Kos Petoussis
Solution 15 - C++    user1277476
Solution 16 - C++    Anonymous Guest