Fast way of counting non-zero bits in positive integer

PythonBinaryCounting

Python Problem Overview


I need a fast way to count the number of bits in an integer in python. My current solution is

bin(n).count("1")

but I am wondering if there is any faster way of doing this?

PS: (i am representing a big 2D binary array as a single list of numbers and doing bitwise operations, and that brings the time down from hours to minutes. and now I would like to get rid of those extra minutes.

Edit:

  1. it has to be in python 2.7 or 2.6

and optimizing for small numbers does not matter that much since that would not be a clear bottleneck, but I do have numbers with 10 000 + bits at some places

for example this is a 2000 bit case:

12448057941136394342297748548545082997815840357634948550739612798732309975923280685245876950055614362283769710705811182976142803324242407017104841062064840113262840137625582646683068904149296501029754654149991842951570880471230098259905004533869130509989042199261339990315125973721454059973605358766253998615919997174542922163484086066438120268185904663422979603026066685824578356173882166747093246377302371176167843247359636030248569148734824287739046916641832890744168385253915508446422276378715722482359321205673933317512861336054835392844676749610712462818600179225635467147870208L

Python Solutions


Solution 1 - Python

For arbitrary-length integers, bin(n).count("1") is the fastest I could find in pure Python.

I tried adapting Óscar's and Adam's solutions to process the integer in 64-bit and 32-bit chunks, respectively. Both were at least ten times slower than bin(n).count("1") (the 32-bit version took about half again as much time).

On the other hand, gmpy popcount() took about 1/20th of the time of bin(n).count("1"). So if you can install gmpy, use that.

To answer a question in the comments, for bytes I'd use a lookup table. You can generate it at runtime:

counts = bytes(bin(x).count("1") for x in range(256))  # py2: use bytearray

Or just define it literally:

counts = (b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08')

Then it's counts[x] to get the number of 1 bits in x where 0 ≤ x ≤ 255.

Solution 2 - Python

Python 3.10 introduces int.bit_count():

>>> n = 19
>>> bin(n)
'0b10011'
>>> n.bit_count()
3
>>> (-n).bit_count()
3

This is functionally equivalent to bin(n).count("1") but should be ~6 times faster. See also Issue29882.

Solution 3 - Python

You can adapt the following algorithm:

def CountBits(n):
  n = (n & 0x5555555555555555) + ((n & 0xAAAAAAAAAAAAAAAA) >> 1)
  n = (n & 0x3333333333333333) + ((n & 0xCCCCCCCCCCCCCCCC) >> 2)
  n = (n & 0x0F0F0F0F0F0F0F0F) + ((n & 0xF0F0F0F0F0F0F0F0) >> 4)
  n = (n & 0x00FF00FF00FF00FF) + ((n & 0xFF00FF00FF00FF00) >> 8)
  n = (n & 0x0000FFFF0000FFFF) + ((n & 0xFFFF0000FFFF0000) >> 16)
  n = (n & 0x00000000FFFFFFFF) + ((n & 0xFFFFFFFF00000000) >> 32) # This last & isn't strictly necessary.
  return n

This works for 64-bit positive numbers, but it's easily extendable and the number of operations growth with the logarithm of the argument (i.e. linearly with the bit-size of the argument).

In order to understand how this works imagine that you divide the entire 64-bit string into 64 1-bit buckets. Each bucket's value is equal to the number of bits set in the bucket (0 if no bits are set and 1 if one bit is set). The first transformation results in an analogous state, but with 32 buckets each 2-bit long. This is achieved by appropriately shifting the buckets and adding their values (one addition takes care of all buckets since no carry can occur across buckets - n-bit number is always long enough to encode number n). Further transformations lead to states with exponentially decreasing number of buckets of exponentially growing size until we arrive at one 64-bit long bucket. This gives the number of bits set in the original argument.

Solution 4 - Python

Here's a Python implementation of the population count algorithm, as explained in this post:

def numberOfSetBits(i):
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24

It will work for 0 <= i < 0x100000000.

Solution 5 - Python

According to this post, this seems to be one the fastest implementation of the Hamming weight (if you don't mind using about 64KB of memory).

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

On Python 2.x you should replace range with xrange.

Edit

If you need better performance (and your numbers are big integers), have a look at the GMP library. It contains hand-written assembly implementations for many different architectures.

gmpy is A C-coded Python extension module that wraps the GMP library.

>>> import gmpy
>>> gmpy.popcount(2**1024-1)
1024

Solution 6 - Python

I really like this method. Its simple and pretty fast but also not limited in the bit length since python has infinite integers.

It's actually more cunning than it looks, because it avoids wasting time scanning the zeros. For example it will take the same time to count the set bits in 1000000000000000000000010100000001 as in 1111.

def get_bit_count(value):
   n = 0
   while value:
	  n += 1
	  value &= value-1
   return n

Solution 7 - Python

You can use the algorithm to get the binary string [1] of an integer but instead of concatenating the string, counting the number of ones:

def count_ones(a):
    s = 0
    t = {'0':0, '1':1, '2':1, '3':2, '4':1, '5':2, '6':2, '7':3}
    for c in oct(a)[1:]:
        s += t[c]
    return s

[1] https://wiki.python.org/moin/BitManipulation

Solution 8 - Python

You said Numpy was too slow. Were you using it to store individual bits? Why not extend the idea of using ints as bit arrays but use Numpy to store those?

Store n bits as an array of ceil(n/32.) 32-bit ints. You can then work with the numpy array the same (well, similar enough) way you use ints, including using them to index another array.

The algorithm is basically to compute, in parallel, the number of bits set in each cell, and them sum up the bitcount of each cell.

setup = """
import numpy as np
#Using Paolo Moretti's answer http://stackoverflow.com/a/9829855/2963903
POPCOUNT_TABLE16 = np.zeros(2**16, dtype=int) #has to be an array

for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

def count1s(v):
    return popcount32_table16(v).sum()

v1 = np.arange(1000)*1234567                       #numpy array
v2 = sum(int(x)<<(32*i) for i, x in enumerate(v1)) #single int
"""
from timeit import timeit

timeit("count1s(v1)", setup=setup)        #49.55184188873349
timeit("bin(v2).count('1')", setup=setup) #225.1857464598633

Though I'm surprised no one suggested you write a C module.

Solution 9 - Python

It's possible to combine a lookup table with int.to_bytes (Python 3 only):

popcount8bit = bytes([popcount(x) for x in range(1<<8)])  # use any method to initialize this lookup table
popcount = lambda x: sum(map(popcount8bit.__getitem__,
                             x.to_bytes((x.bit_length()+7)//8, "little")))

This solution unfortunately is about 20% slower than bin(x).count('1') on Python 3, but twice faster on PyPy3.


This is a benchmark script, compares several different solutions presented here, for different number of bits:

from __future__ import print_function  #for Python 2

import sys
from timeit import timeit
import random

def popcount(x): return bin(x).count("1")

version3=sys.version.startswith("3")

for numBit in (2, 4, 8, 16, 31, 32, 63, 64, 1000, 10000):
	maximum=int((1<<numBit)-1)  #int cast just in case it overflows to long in Python 2

	functions=[
			(popcount, "bin count"),
			(lambda x: "{:b}".format(x).count("1"), "format string count"),
			]

	try:
		import gmpy
		functions.append((gmpy.popcount, "gmpy"))
	except ImportError:
		pass

	if sys.version.startswith("3"):
		exec('''functions.append((lambda x: f"{x:b}".count("1"), "f-string count"))''')

	if numBit<=16:
		table1=[popcount(x) for x in range(maximum+1)]
		functions.append((lambda x: table1[x], "lookup list"))
		functions.append((table1.__getitem__, "lookup list without lambda"))

		table2="".join(map(chr, table1))
		functions.append((lambda x: ord(table2[x]), "lookup str"))

		if version3:
			table3=bytes(table1)
			functions.append((lambda x: table3[x], "lookup bytes"))

			if numBit==8:
				functions.append((
						b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
						b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
						b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
						b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
						b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
						b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08'
						.__getitem__, "lookup bytes hard coded 8 bit"))
				table_hardcoded=(
						b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
						b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
						b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
						b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
						b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
						b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
						b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
						b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08')
				functions.append((
						table_hardcoded.__getitem__, "lookup bytes hard coded 8 bit local variable"))
			functions.append((table3.__getitem__, "lookup bytes without lambda"))

	if version3:
		popcount8bit=bytes([popcount(x) for x in range(1<<8)]) #bytes because benchmark says that it's fastest
		functions.append((
			lambda x: sum(popcount8bit[x] for x in x.to_bytes((x.bit_length()+7)//8, "big")),
			"to_bytes"
			))
		functions.append((
			lambda x: sum(map(popcount8bit.__getitem__, x.to_bytes((x.bit_length()+7)//8, "big"))),
			"to_bytes without list comprehension"
			))
		functions.append((
			lambda x: sum(map(popcount8bit.__getitem__, x.to_bytes((x.bit_length()+7)//8, "little"))),
			"to_bytes little endian, without list comprehension"
			))

	#for x in (2, 4, 8, 16, 32, 64):
	#	table1=[popcount(x) for x in range(1<<8)]


	print("====== numBit=", numBit)
	data=[]
	numRepeat=10**7//(numBit+100)
	for popcountFunction, description in functions:
		random.seed(10) #make randint returns the same value
		data.append((
			timeit(lambda: popcountFunction(random.randint(0, maximum)), number=numRepeat),
			description
			))

	time1, name1=data[0]
	assert name1=="bin count"
	data.sort()
	maxLength=0
	for time, description in data:
		maxLength=max(maxLength, len(description))
	for time, description in data:
		print("{:{}} -> {:2f} = {} * {:2f}".format(description, maxLength+2, time, name1, time/time1))

It works with both Python 2 and 3; however, if a solution is unavailable for Python 2, it's not measured.

Some solutions are not listed here.

Result:

  • Python 2: "lookup list without lambda" is the fastest (25% faster than "bin count", 6% faster than "lookup list" (with lambda)) for <= 16 bits, larger than that "bin count" is the fastest. (I didn't install gmpy for Python 2)
  • Python 3: Roughly the same result.
    • "Lookup bytes without lambda" is comparable (+/-2% compared to "lookup list without lambda").
    • gmpy is faster than "bin count` in all cases, but slower than "lookup list without lambda" for about 5% with numBit <= 16.
    • "to_bytes" is comparable.
    • Using f-string is about 10% slower than "bin count".
  • PyPy3: The lambda no longer incurs much cost, and the to_bytes version becomes much faster (twice faster than "bin count"); however, I could not get gmpy to install.

Solution 10 - Python

@Robotbugs' answer, but wrapped in numba's njit decorator was faster than the gmpy in my case.

@njit(int64(uint64))
def get_bit_count(bitboard):
    n = 0
    bitboard = int64(bitboard)
    while bitboard:
        n += 1
        bitboard &= bitboard - 1
    return n

I had to set uint64 as argument type to avoid OverlowError.

Solution 11 - Python

class popcount_lk:
    """ Creates an instance for calculating the population count of
        bitstring, based on a lookup table of 8 bits. """

    def __init__(self):
        """ Creates a large lookup table of the Hamming weight of every 8 bit integer. """
        self.lookup_table = bytes.maketrans(bytes(range(1<<8)),bytes((bin(i).count('1') for i in range(1<<8))))
        self.byteorder = sys.byteorder
    
    def __call__(self,x):
        """ Breaks x, which is a python integer type, into chuncks of 8 bits.
        Calls the lookup table to get the population count of each chunck and returns
        the aggregated population count. """

        return sum(x.to_bytes((x.bit_length()>>3)+1,self.byteorder).translate(self.lookup_table))

popcount = popcount_lk
print(popcount(56437865483765))

This should be 3 times faster than bin(56437865483765).count('1') in CPython and PyPy3.

Solution 12 - Python

#Python prg to count set bits
#Function to count set bits
def bin(n):
    count=0
    while(n>=1):
        if(n%2==0):
            n=n//2
        else:
            count+=1
            n=n//2
    print("Count of set bits:",count)
#Fetch the input from user
num=int(input("Enter number: "))
#Output
bin(num)

Solution 13 - Python

It turns out your starting representation is a list of lists of ints which are either 1 or 0. Simply count them in that representation.


The number of bits in an integer is constant in python.

However, if you want to count the number of set bits, the fastest way is to create a list conforming to the following pseudocode: [numberofsetbits(n) for n in range(MAXINT)]

This will provide you a constant time lookup after you have generated the list. See @PaoloMoretti's answer for a good implementation of this. Of course, you don't have to keep this all in memory - you could use some sort of persistent key-value store, or even MySql. (Another option would be to implement your own simple disk-based storage).

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionzidarsk8View Question on Stackoverflow
Solution 1 - PythonkindallView Answer on Stackoverflow
Solution 2 - PythonChris_RandsView Answer on Stackoverflow
Solution 3 - PythonAdam ZalcmanView Answer on Stackoverflow
Solution 4 - PythonÓscar LópezView Answer on Stackoverflow
Solution 5 - PythonPaolo MorettiView Answer on Stackoverflow
Solution 6 - PythonRobotbugsView Answer on Stackoverflow
Solution 7 - PythonManuelView Answer on Stackoverflow
Solution 8 - PythonleewzView Answer on Stackoverflow
Solution 9 - Pythonuser202729View Answer on Stackoverflow
Solution 10 - PythonArekKulczyckiView Answer on Stackoverflow
Solution 11 - PythonAriadView Answer on Stackoverflow
Solution 12 - PythonPraveen NaralaView Answer on Stackoverflow
Solution 13 - PythonMarcinView Answer on Stackoverflow