Is there an elegant and fast way to test for the 1-bits in an integer to be in a contiguous region?

C++, C, Bit Manipulation

C++ Problem Overview


I need to test whether the positions (from 0 to 31 for a 32-bit integer) with bit value 1 form a contiguous region. For example:

00111111000000000000000000000000      is contiguous
00111111000000000000000011000000      is not contiguous

I want this test, i.e. some function has_contiguous_one_bits(int), to be portable.

One obvious way is to loop over positions to find the first set bit, then the first non-set bit and check for any more set bits.

I wonder whether there exists a faster way? If there are fast methods to find the highest and lowest set bits (but from this question it appears there aren't any portable ones), then a possible implementation is

bool has_contiguous_one_bits(int val)
{
    auto h = highest_set_bit(val);
    auto l = lowest_set_bit(val);
    return val == (((1 << (h-l+1))-1)<<l);
}

Just for fun, here are the first 100 integers with contiguous bits:

0 1 2 3 4 6 7 8 12 14 15 16 24 28 30 31 32 48 56 60 62 63 64 96 112 120 124 126 127 128 192 224 240 248 252 254 255 256 384 448 480 496 504 508 510 511 512 768 896 960 992 1008 1016 1020 1022 1023 1024 1536 1792 1920 1984 2016 2032 2040 2044 2046 2047 2048 3072 3584 3840 3968 4032 4064 4080 4088 4092 4094 4095 4096 6144 7168 7680 7936 8064 8128 8160 8176 8184 8188 8190 8191 8192 12288 14336 15360 15872 16128 16256 16320

They are (of course) of the form (1<<m)*((1<<n)-1) with non-negative m and n.
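As a quick check of that formula, the list above can be regenerated by enumerating every run length n and shift m. This is only a sketch: the helper name first_contiguous_values is made up for illustration, and it assumes 32-bit values with zero counted as contiguous, as in the list.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): returns the `count`
// smallest 32-bit values whose 1-bits form one contiguous run.
std::vector<std::uint32_t> first_contiguous_values(std::size_t count)
{
    std::vector<std::uint32_t> values{0};        // zero is counted, as in the list above
    for (unsigned n = 1; n <= 32; ++n) {         // n = length of the run of 1s
        // (1 << n) - 1, written so that n == 32 never shifts by the full width
        std::uint32_t const run = (n == 32) ? ~std::uint32_t{0}
                                            : (std::uint32_t{1} << n) - 1;
        for (unsigned m = 0; m + n <= 32; ++m)   // m = number of trailing zeros
            values.push_back(run << m);
    }
    std::sort(values.begin(), values.end());
    if (values.size() > count) values.resize(count);
    return values;
}
```

Calling first_contiguous_values(100) should give the sequence shown above, starting at 0 and ending at 16320.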

C++ Solutions


Solution 1 - C++

Solution:
static _Bool IsCompact(unsigned x)
{
    return (x & x + (x & -x)) == 0;
}

Briefly:

x & -x gives the lowest bit set in x (or zero if x is zero).

x + (x & -x) converts the lowest string of consecutive 1s to a single 1 higher up (or wraps to zero).

x & x + (x & -x) clears those 1 bits.

(x & x + (x & -x)) == 0 tests whether any other 1 bits remain.

Longer:

-x equals ~x+1 (for the int in the question, we assume two’s complement, but unsigned is preferable). After the bits are flipped in ~x, adding 1 carries so that it flips back the low 1 bits in ~x and the first 0 bit but then stops. Thus, the low bits of -x up to and including its first 1 are the same as the low bits of x, but all higher bits are flipped. (Example: ~10011100 gives 01100011, and adding 1 gives 01100100, so the low 100 are the same, but the high 10011 are flipped to 01100.) Then x & -x gives us the only bit that is 1 in both, which is that lowest 1 bit (00000100). (If x is zero, x & -x is zero.)

Adding this to x causes a carry through all the consecutive 1s, changing them to 0s. It leaves a 1 at the next higher 0 bit (or carries through the high end, leaving a wrapped total of zero). (Example: 10011100 + 00000100 gives 10100000.)

When this is ANDed with x, there are 0s in the places where the 1s were changed to 0s (and also where the carry changed a 0 to a 1). So the result is not zero only if there is another 1 bit higher up.
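To follow the steps on the example value 10011100, the intermediate results can be exposed one by one. The struct and function names below are hypothetical and exist only for this illustration; the arithmetic is the same as in IsCompact.

```cpp
// Hypothetical trace type, only for illustration.
struct CompactTrace {
    unsigned lowest;   // x & -x       : the lowest set bit
    unsigned carried;  // x + lowest   : lowest run of 1s replaced by a single 1 above it
    unsigned rest;     // x & carried  : whatever 1 bits remain above that run
};

CompactTrace trace_is_compact(unsigned x)
{
    unsigned const lowest  = x & (0u - x);   // 0u - x is -x in unsigned arithmetic
    unsigned const carried = x + lowest;
    unsigned const rest    = x & carried;
    return {lowest, carried, rest};
}
```

For x = 0x9C (10011100) this gives lowest = 0x04, carried = 0xA0 and rest = 0x80, so the value is not compact. For x = 0x1C (00011100) the rest is zero.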

Solution 2 - C++

There is actually no need to use any intrinsics.

First fill in all the 0s below the lowest 1. Then test whether the new value is a Mersenne number (of the form 2^n - 1). In this algorithm, zero is mapped to true.

bool has_compact_bits( unsigned const x )
{
	// fill up the low order zeroes
	unsigned const y = x | ( x - 1 );
	// test if the 1's is one solid block
	return not ( y & ( y + 1 ) );
}
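One way to gain confidence in such bit tricks is to compare them against a plain loop over a range of inputs. The sketch below does that for this version and for the carry trick from Solution 1, over all values that fit in 16 bits. All function names are made up for the comparison.

```cpp
// Straightforward reference: strip trailing zeros, strip the run of ones,
// and see whether anything is left. Zero counts as contiguous here.
bool reference_contiguous(unsigned x)
{
    if (x == 0) return true;
    while ((x & 1u) == 0) x >>= 1;   // drop trailing zeros
    while ((x & 1u) != 0) x >>= 1;   // drop the run of ones
    return x == 0;
}

// The carry trick from Solution 1, written with explicit parentheses.
bool is_compact_carry(unsigned x)
{
    return (x & (x + (x & (0u - x)))) == 0;
}

// The fill-then-test trick shown above.
bool has_compact_bits_fill(unsigned x)
{
    unsigned const y = x | (x - 1u);
    return (y & (y + 1u)) == 0;
}

// Number of inputs in [0, 0xFFFF] on which either trick disagrees with the loop.
unsigned count_disagreements()
{
    unsigned disagreements = 0;
    for (unsigned x = 0; x <= 0xFFFFu; ++x) {
        bool const expected = reference_contiguous(x);
        if (is_compact_carry(x) != expected) ++disagreements;
        if (has_compact_bits_fill(x) != expected) ++disagreements;
    }
    return disagreements;
}
```

This only samples the low 16 bits, so it is a sanity check rather than a proof; a full 32-bit sweep works the same way but takes longer.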

Of course, if you want to use intrinsics, here is the popcount method:

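#include <climits>  // CHAR_BIT
#include <cstddef>  // size_t

bool has_compact_bits( unsigned const x )
{
	if ( x == 0 ) return true; // __builtin_ctz and __builtin_clz are undefined for zero
	size_t const num_bits = CHAR_BIT * sizeof(unsigned);
	size_t const sum = __builtin_ctz(x) + __builtin_popcount(x) + __builtin_clz(x);
	return sum == num_bits;
}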

Solution 3 - C++

Actually you don't need to count leading zeros. As suggested by pmg in the comments, the numbers you are looking for are those of sequence OEIS A023758, i.e. numbers of the form 2^i - 2^j with i >= j. So you may just count the trailing zeros, set those bits in the original value (which adds 2^j - 1, with j as in that formula), and then check whether the result is of the form 2^i - 1. With GCC/clang intrinsics,

bool has_compact_bits(int val) {
    if (val == 0) return true; // __builtin_ctz undefined if argument is zero
    unsigned v = static_cast<unsigned>(val); // unsigned avoids overflow in the shift and the +1
    int tz = __builtin_ctz(v); // number of trailing zeros
    v |= (1u << tz) - 1; // set the trailing zeros
    v &= (v + 1); // v set to zero if of the form (2^i - 1)
    return v == 0;
}

This version is slightly faster than yours, the one proposed by KamilCuk, and the one by Yuri Feldman with popcount only.

If you are using C++20, you may get a portable function by replacing __builtin_ctz with std::countr_zero:

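#include <bit>

bool has_compact_bits(int val) {
    if (val == 0) return true; // std::countr_zero(0) is the full width, which must not be used as a shift count
    unsigned v = static_cast<unsigned>(val); // ugly cast
    int tz = std::countr_zero(v); // number of trailing zeros
    v |= (1u << tz) - 1; // set the trailing zeros
    v &= (v + 1); // v set to zero if of the form (2^i - 1)
    return v == 0;
}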

The cast is ugly, but it is warning you that it is better to work with unsigned types when manipulating bits. A pre-C++20 alternative is boost::multiprecision::lsb.

Edit:

The benchmark on the strikethrough link was limited by the fact that no popcount instruction had been emitted for Yuri Feldman's version. Compiling them on my PC with -march=westmere, I measured the following times for 1 billion iterations with identical sequences from std::mt19937:

  • your version: 5.7 s
  • KamilCuk's second version: 4.7 s
  • my version: 4.7 s
  • Eric Postpischil's first version: 4.3 s
  • Yuri Feldman's version (using explicitly __builtin_popcount): 4.1 s

So, at least on my architecture, the fastest seems to be the one with popcount.

Edit 2:

I've updated my benchmark with Eric Postpischil's new version. As requested in the comments, the code of my test can be found here. I've added a no-op loop to estimate the time needed by the PRNG. I've also added the two versions by KevinZ. The code was compiled with clang using -O3 -msse4 -mbmi to get the popcnt and blsi instructions (thanks to Peter Cordes).

Results: At least on my architecture, Eric Postpischil's version is exactly as fast as Yuri Feldman's, and at least twice as fast as any other version proposed so far.

Solution 4 - C++

Not sure about fast, but can do a one-liner by verifying that val^(val>>1) has at most 2 bits on.

This only works with unsigned types: shifting in a 0 at the top (logical shift) is necessary, not an arithmetic right shift that shifts in a copy of the sign bit.

#include <bitset>
bool has_compact_bits(unsigned val)
{
    return std::bitset<8*sizeof(val)>((val ^ (val>>1))).count() <= 2;
}

To reject 0 (i.e. only accept inputs that have exactly 1 contiguous bit-group), logical-AND with val being non-zero. Other answers on this question accept 0 as compact.

bool has_compact_bits(unsigned val)
{
    return std::bitset<8*sizeof(val)>((val ^ (val>>1))).count() <= 2 and val;
}

C++ portably exposes popcount via std::bitset::count(), or in C++20 via std::popcount. C still doesn't have a portable way that reliably compiles to a popcnt or similar instruction on targets where one is available.
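To see why at most two bits survive in val ^ (val >> 1), it helps to look at the XOR result directly: it has a 1 at every position where a bit differs from its upper neighbour, so each run of 1s contributes one bit at its bottom edge and one at its top. The helper names below are hypothetical.

```cpp
#include <bitset>
#include <climits>
#include <cstddef>

// Marks every position whose bit differs from the bit above it
// (a 0 is shifted in above the top bit, hence the unsigned type).
unsigned edge_mask(unsigned val)
{
    return val ^ (val >> 1);
}

// Number of such edges; a single run of 1s produces at most two.
std::size_t edge_count(unsigned val)
{
    return std::bitset<CHAR_BIT * sizeof(unsigned)>(edge_mask(val)).count();
}
```

For 00111000 the mask is 00100100 (two edges, compact), while 00110011 gives 00101010 (three edges, not compact). A run that starts at bit 0, such as 1, yields a single edge.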

Solution 5 - C++

CPUs have dedicated instructions for that, and they are very fast. On PC they are BSR/BSF (introduced with the 80386 in 1985); on ARM there is CLZ (trailing zeros are typically counted with RBIT followed by CLZ).

Use one to find the index of least significant set bit, shift integer right by that amount. Use another one to find an index of the most significant set bit, compare your integer with (1u<<(bsr+1))-1.
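A minimal sketch of that recipe, assuming GCC or clang (for __builtin_ctz and __builtin_clz) and treating zero as compact, could look like the following. The function name is made up. The mask is special-cased when the highest set bit is the top bit, because shifting by the full width would be undefined.

```cpp
#include <limits>

// Assumes GCC or clang builtins; the name is hypothetical.
bool has_compact_bits_bitscan(unsigned x)
{
    if (x == 0) return true;   // the builtins are undefined for zero
    unsigned const top = std::numeric_limits<unsigned>::digits - 1;   // index of the top bit

    // Index of the least significant set bit; shift it down to position 0.
    unsigned const bsf = static_cast<unsigned>(__builtin_ctz(x));
    x >>= bsf;

    // Index of the most significant set bit of the shifted value.
    unsigned const bsr = top - static_cast<unsigned>(__builtin_clz(x));

    // (1u << (bsr + 1)) - 1, without shifting by the full width.
    unsigned const mask = (bsr == top) ? ~0u : (1u << (bsr + 1)) - 1u;
    return x == mask;
}
```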

Unfortunately, 35 years wasn't enough to update the C++ language to match the hardware. To use these instructions from C++ you'll need intrinsics, which aren't portable and return results in slightly different formats. Use the preprocessor (#ifdef etc.) to detect the compiler and then use the appropriate intrinsics. In MSVC they are _BitScanForward, _BitScanForward64, _BitScanReverse, _BitScanReverse64. In GCC and clang they are __builtin_clz and __builtin_ctz.
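A compiler-detection wrapper along those lines might be sketched as follows. The function names lowest_set_bit and highest_set_bit are borrowed from the helpers the question assumes. The plain-loop fallback for other compilers is an addition here, and both functions require a non-zero argument.

```cpp
#include <limits>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Index of the least significant set bit; x must be non-zero.
inline unsigned lowest_set_bit(unsigned x)
{
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, x);            // writes the bit index into `index`
    return static_cast<unsigned>(index);
#elif defined(__GNUC__) || defined(__clang__)
    return static_cast<unsigned>(__builtin_ctz(x));
#else
    unsigned index = 0;                    // portable fallback: plain loop
    while ((x & 1u) == 0) { x >>= 1; ++index; }
    return index;
#endif
}

// Index of the most significant set bit; x must be non-zero.
inline unsigned highest_set_bit(unsigned x)
{
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanReverse(&index, x);            // already an index, not a zero count
    return static_cast<unsigned>(index);
#elif defined(__GNUC__) || defined(__clang__)
    // __builtin_clz counts leading zeros, so convert the count to an index.
    return static_cast<unsigned>(std::numeric_limits<unsigned>::digits - 1
                                 - __builtin_clz(x));
#else
    unsigned index = 0;                    // portable fallback: plain loop
    while (x >>= 1) ++index;
    return index;
#endif
}
```

The conversion in the GCC/clang branch is the "slightly different formats" issue: the MSVC intrinsics report a bit index, while __builtin_clz reports a count of leading zeros.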

Solution 6 - C++

Comparison with zeros instead of ones will save some operations:

bool has_compact_bits2(int val) {
    if (val == 0) return true;
    int h = __builtin_clz(val);
    // Clear bits to the left
    val = (unsigned)val << h;
    int l = __builtin_ctz(val);
    // Invert
    // >>l - Clear bits to the right
    return (~(unsigned)val)>>l == 0;
}

The following results in one instruction fewer than the above with gcc10 -O3 on x86_64 and relies on sign extension:

bool has_compact_bits3(int val) {
    if (val == 0) return true;
    int h = __builtin_clz(val);
    val <<= h;
    int l = __builtin_ctz(val);
    return ~(val>>l) == 0;
}

Tested on godbolt.

Solution 7 - C++

You can rephrase the requirement:

  • let N be the number of bits that differ from the previous one (found by iterating through the bits)
  • if N=2 and the first or last bit is 0 then the answer is yes
  • if N=1 then the answer is yes (because all the 1s are on one side)
  • if N=0 and any bit is 0 then you have no 1s; it is up to you whether you consider the answer to be yes or no
  • anything else: the answer is no

Going through all bits could look like this:

#include <stdint.h>

unsigned int count_bit_changes (uint32_t value) {
  unsigned int bit;
  unsigned int changes = 0;
  uint32_t last_bit = value & 1;
  for (bit = 1; bit < 32; bit++) {
    value = value >> 1;
    // parentheses are required: != binds tighter than &
    if ((value & 1) != last_bit) {
      changes++;
      last_bit = value & 1;
    }
  }
  return changes;
}

But this can surely be optimized (e.g. by aborting the for loop when value reached 0 which means no more significant bits with value 1 are present).
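Putting the rules and the counting loop together, the whole test could be sketched like this. The decision function and its accept_zero parameter are hypothetical additions, and the early-exit optimisation is left out to keep the sketch close to the rules as stated.

```cpp
#include <cstdint>

// Counts how many times adjacent bits differ when walking from bit 0 to bit 31.
unsigned count_bit_changes(std::uint32_t value)
{
    unsigned changes = 0;
    std::uint32_t last_bit = value & 1u;
    for (unsigned bit = 1; bit < 32; ++bit) {
        value >>= 1;
        std::uint32_t const current = value & 1u;
        if (current != last_bit) {
            ++changes;
            last_bit = current;
        }
    }
    return changes;
}

// Hypothetical decision function applying the rules listed above.
// accept_zero chooses the answer for an input with no 1 bits at all.
bool has_compact_bits_by_changes(std::uint32_t value, bool accept_zero = true)
{
    unsigned const n = count_bit_changes(value);
    bool const first_is_zero = (value & 1u) == 0;
    bool const last_is_zero  = (value >> 31) == 0;

    if (n == 0) return value != 0 || accept_zero;      // all ones, or all zeros
    if (n == 1) return true;                           // the 1s sit at one end
    if (n == 2) return first_is_zero || last_is_zero;  // 0...01...10...0
    return false;                                      // more than one run of 1s
}
```

With two changes the first and last bits are always equal, so the N=2 rule effectively separates 0...1...0 (accepted) from 1...0...1 (rejected).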

Solution 8 - C++

You can do this sequence of calculations (assuming val as an input):

uint32_t x = val;
x |= x >>  1;
x |= x >>  2;
x |= x >>  4;
x |= x >>  8;
x |= x >> 16;

to obtain a number with all zeros below the most significant 1 filled with ones.

You can also calculate y = val & -val to strip all except the least significant 1 bit in val (for example, 7 & -7 == 1 and 12 & -12 == 4).
Warning: this will fail for val == INT_MIN, so you'll have to handle this case separately, but this is immediate.

Then right-shift y by one position, to get a bit below the actual LSB of val, and do the same routine as for x:

uint32_t y = (val & -val) >> 1;
y |= y >>  1;
y |= y >>  2;
y |= y >>  4;
y |= y >>  8;
y |= y >> 16;

Then x - y or x & ~y or x ^ y produces the 'compact' bit mask spanning the whole length of val. Just compare it to val to see if val is 'compact'.
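Assembled into one function, the steps above might look like the sketch below. The names smear_right and has_compact_bits_smear are made up. Everything is done on uint32_t, so the negation is ordinary modular arithmetic.

```cpp
#include <cstdint>

// Sets every bit below the highest set bit of v.
inline std::uint32_t smear_right(std::uint32_t v)
{
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return v;
}

// Hypothetical wrapper combining the two masks described above.
inline bool has_compact_bits_smear(std::uint32_t val)
{
    std::uint32_t const x = smear_right(val);          // ones from the MSB of val downwards
    std::uint32_t const lowest = val & (0u - val);     // val & -val in unsigned arithmetic
    std::uint32_t const y = smear_right(lowest >> 1);  // ones strictly below the LSB of val
    return (x ^ y) == val;                             // mask spanning MSB..LSB must equal val
}
```

For val = 12 this gives x = 15 and y = 3, so x ^ y = 12 and the value is compact; for val = 5 the mask is 7, which differs from 5.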

Solution 9 - C++

We can use the GCC builtin functions to check whether the count of set bits:

> int __builtin_popcount (unsigned int x)
> Returns the number of 1-bits in x.

is equal to (a - b), where:

a: one past the index of the highest set bit (32 - CLZ; 32 because an unsigned int has 32 bits here):

> int __builtin_clz (unsigned int x)
> Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

b: index of the lowest set bit (CTZ):

> int __builtin_ctz (unsigned int x)
> Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.

For example, if n = 0b0001100110, popcount gives 4 but the index difference (a - b) is 6.

bool has_contiguous_one_bits(unsigned n) {
	return (32 - __builtin_clz(n) - __builtin_ctz(n)) == __builtin_popcount(n);
}

which can also be written as:

bool has_contiguous_one_bits(unsigned n) {
	return (__builtin_popcount(n) + __builtin_clz(n) + __builtin_ctz(n)) == 32;
}
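The example n = 0b0001100110 can be checked by computing the two quantities separately. This sketch assumes GCC or clang and a 32-bit unsigned int; the struct and function names are hypothetical.

```cpp
// Hypothetical helper type holding the two quantities being compared.
struct SpanInfo {
    int ones;   // number of 1 bits (popcount)
    int span;   // a - b: width of the region from the lowest to the highest set bit
};

// n must be non-zero, because clz and ctz are undefined for zero.
SpanInfo span_info(unsigned n)
{
    int const a = 32 - __builtin_clz(n);   // one past the index of the highest set bit
    int const b = __builtin_ctz(n);        // index of the lowest set bit
    return {__builtin_popcount(n), a - b};
}
```

For n = 0b0001100110 this yields ones = 4 and span = 6, so the bits are not contiguous; for n = 0b0001111110 both are 6.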

I don't think it is more elegant or efficient than the current most upvoted answer:

return (x & x + (x & -x)) == 0;

with following assembly:

mov     eax, edi
neg     eax
and     eax, edi
add     eax, edi
test    eax, edi
sete    al

but it is probably easier to understand.

Solution 10 - C++

Okay, here is a version that loops over bits

template<typename Integer>
inline constexpr bool has_compact_bits(Integer val) noexcept
{
    Integer test = 1;
    while(!(test & val) && test) test<<=1; // skip unset bits to find first set bit
    while( (test & val) && test) test<<=1; // skip set bits to find next unset bit
    while(!(test & val) && test) test<<=1; // skip unset bits to find an offending set bit
    return !test;
}

The first two loops find the first compact region. The final loop checks whether there is any other set bit beyond that region.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Walter | View Question on Stack Overflow
Solution 1 - C++ | Eric Postpischil | View Answer on Stack Overflow
Solution 2 - C++ | KevinZ | View Answer on Stack Overflow
Solution 3 - C++ | Giovanni Cerretani | View Answer on Stack Overflow
Solution 4 - C++ | Yuri Feldman | View Answer on Stack Overflow
Solution 5 - C++ | Soonts | View Answer on Stack Overflow
Solution 6 - C++ | KamilCuk | View Answer on Stack Overflow
Solution 7 - C++ | Brecht Sanders | View Answer on Stack Overflow
Solution 8 - C++ | CiaPan | View Answer on Stack Overflow
Solution 9 - C++ | Antonin GAVREL | View Answer on Stack Overflow
Solution 10 - C++ | Walter | View Answer on Stack Overflow