How to convert a string of bytes into an int?

PythonArraysString

Python Problem Overview


How can I convert a string of bytes into an int in python?

Say like this: 'y\xcc\xa6\xbb'

I came up with a clever/stupid way of doing it:

sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))

I know there has to be something builtin or in the standard library that does this more simply...

This is different from converting a string of hex digits for which you can use int(xxx, 16), but instead I want to convert a string of actual byte values.

UPDATE:

I kind of like James' answer a little better because it doesn't require importing another module, but Greg's method is faster:

>>> from timeit import Timer
>>> Timer('struct.unpack("<L", "y\xcc\xa6\xbb")[0]', 'import struct').timeit()
0.36242198944091797
>>> Timer("int('y\xcc\xa6\xbb'.encode('hex'), 16)").timeit()
1.1432669162750244

My hacky method:

>>> Timer("sum(ord(c) << (i * 8) for i, c in enumerate('y\xcc\xa6\xbb'[::-1]))").timeit()
2.8819329738616943

FURTHER UPDATE:

Someone asked in comments what's the problem with importing another module. Well, importing a module isn't necessarily cheap, take a look:

>>> Timer("""import struct\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""").timeit()
0.98822188377380371

Including the cost of importing the module negates almost all of the advantage that this method has. I believe that this will only include the expense of importing it once for the entire benchmark run; look what happens when I force it to reload every time:

>>> Timer("""reload(struct)\nstruct.unpack(">L", "y\xcc\xa6\xbb")[0]""", 'import struct').timeit()
68.474128007888794

Needless to say, if you're doing a lot of executions of this method per one import than this becomes proportionally less of an issue. It's also probably i/o cost rather than cpu so it may depend on the capacity and load characteristics of the particular machine.

Python Solutions


Solution 1 - Python

In Python 3.2 and later, use

>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='big')
2043455163

or

>>> int.from_bytes(b'y\xcc\xa6\xbb', byteorder='little')
3148270713

according to the endianness of your byte-string.

This also works for bytestring-integers of arbitrary length, and for two's-complement signed integers by specifying signed=True. See the docs for from_bytes.

Solution 2 - Python

You can also use the struct module to do this:

>>> struct.unpack("<L", "y\xcc\xa6\xbb")[0]
3148270713L

Solution 3 - Python

As Greg said, you can use struct if you are dealing with binary values, but if you just have a "hex number" but in byte format you might want to just convert it like:

s = 'y\xcc\xa6\xbb'
num = int(s.encode('hex'), 16)

...this is the same as:

num = struct.unpack(">L", s)[0]

...except it'll work for any number of bytes.

Solution 4 - Python

I use the following function to convert data between int, hex and bytes.

def bytes2int(str):
 return int(str.encode('hex'), 16)

def bytes2hex(str):
 return '0x'+str.encode('hex')

def int2bytes(i):
 h = int2hex(i)
 return hex2bytes(h)

def int2hex(i):
 return hex(i)

def hex2int(h):
 if len(h) > 1 and h[0:2] == '0x':
  h = h[2:]

 if len(h) % 2:
  h = "0" + h

 return int(h, 16)

def hex2bytes(h):
 if len(h) > 1 and h[0:2] == '0x':
  h = h[2:]

 if len(h) % 2:
  h = "0" + h

 return h.decode('hex')

Source: http://opentechnotes.blogspot.com.au/2014/04/convert-values-to-from-integer-hex.html

Solution 5 - Python

import array
integerValue = array.array("I", 'y\xcc\xa6\xbb')[0]

Warning: the above is strongly platform-specific. Both the "I" specifier and the endianness of the string->int conversion are dependent on your particular Python implementation. But if you want to convert many integers/strings at once, then the array module does it quickly.

Solution 6 - Python

In Python 2.x, you could use the format specifiers <B for unsigned bytes, and <b for signed bytes with struct.unpack/struct.pack.

E.g:

Let x = '\xff\x10\x11'

data_ints = struct.unpack('<' + 'B'*len(x), x) # [255, 16, 17]

And:

data_bytes = struct.pack('<' + 'B'*len(data_ints), *data_ints) # '\xff\x10\x11'

That * is required!

See https://docs.python.org/2/library/struct.html#format-characters">https://docs.python.org/2/library/struct.html#format-characters</a> for a list of the format specifiers.

Solution 7 - Python

>>> reduce(lambda s, x: s*256 + x, bytearray("y\xcc\xa6\xbb"))
2043455163

Test 1: inverse:

>>> hex(2043455163)
'0x79cca6bb'

Test 2: Number of bytes > 8:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAA"))
338822822454978555838225329091068225L

Test 3: Increment by one:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAAB"))
338822822454978555838225329091068226L

Test 4: Append one byte, say 'A':

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))
86738642548474510294585684247313465921L

Test 5: Divide by 256:

>>> reduce(lambda s, x: s*256 + x, bytearray("AAAAAAAAAAAAAABA"))/256
338822822454978555838225329091068226L

Result equals the result of Test 4, as expected.

Solution 8 - Python

I was struggling to find a solution for arbitrary length byte sequences that would work under Python 2.x. Finally I wrote this one, it's a bit hacky because it performs a string conversion, but it works.

Function for Python 2.x, arbitrary length

def signedbytes(data):
	"""Convert a bytearray into an integer, considering the first bit as
	sign. The data must be big-endian."""
	negative = data[0] & 0x80 > 0

	if negative:
	    inverted = bytearray(~d % 256 for d in data)
	    return -signedbytes(inverted) - 1

	encoded = str(data).encode('hex')
	return int(encoded, 16)

This function has two requirements:

  • The input data needs to be a bytearray. You may call the function like this:

      s = 'y\xcc\xa6\xbb'
      n = signedbytes(s)
    
  • The data needs to be big-endian. In case you have a little-endian value, you should reverse it first:

      n = signedbytes(s[::-1])
    

Of course, this should be used only if arbitrary length is needed. Otherwise, stick with more standard ways (e.g. struct).

Solution 9 - Python

int.from_bytes is the best solution if you are at version >=3.2. The "struct.unpack" solution requires a string so it will not apply to arrays of bytes. Here is another solution:

def bytes2int( tb, order='big'):
    if order == 'big': seq=[0,1,2,3]
    elif order == 'little': seq=[3,2,1,0]
    i = 0
    for j in seq: i = (i<<8)+tb[j]
    return i

hex( bytes2int( [0x87, 0x65, 0x43, 0x21])) returns '0x87654321'.

It handles big and little endianness and is easily modifiable for 8 bytes

Solution 10 - Python

As mentioned above using unpack function of struct is a good way. If you want to implement your own function there is an another solution:

def bytes_to_int(bytes):
    result = 0
    for b in bytes:
        result = result * 256 + int(b)
return result

Solution 11 - Python

In python 3 you can easily convert a byte string into a list of integers (0..255) by

>>> list(b'y\xcc\xa6\xbb')
[121, 204, 166, 187]

Solution 12 - Python

A decently speedy method utilizing array.array I've been using for some time:

predefined variables:

offset = 0
size = 4
big = True # endian
arr = array('B')
arr.fromstring("\x00\x00\xff\x00") # 5 bytes (encoding issues) [0, 0, 195, 191, 0]

to int: (read)

val = 0
for v in arr[offset:offset+size][::pow(-1,not big)]: val = (val<<8)|v

from int: (write)

val = 16384
arr[offset:offset+size] = \
    array('B',((val>>(i<<3))&255 for i in range(size)))[::pow(-1,not big)]

It's possible these could be faster though.

EDIT:
For some numbers, here's a performance test (Anaconda 2.3.0) showing stable averages on read in comparison to reduce():

========================= byte array to int.py =========================
5000 iterations; threshold of min + 5000ns:
______________________________________code___|_______min______|_______max______|_______avg______|_efficiency
⣿⠀⠀⠀⠀⡇⢀⡀⠀⠀⠀⠀⠀⠀⡇⠀⠀⠀⡀⠀⢰⠀⠀⠀⢰⠀⠀⠀⢸⠀⠀⢀⡇⠀⢀⠀⠀⠀⠀⢠⠀⠀⠀⠀⢰⠀⠀⠀⢸⡀⠀⠀⠀⢸⠀⡇⠀⠀⢠⠀⢰⠀⢸⠀
⣿⣦⣴⣰⣦⣿⣾⣧⣤⣷⣦⣤⣶⣾⣿⣦⣼⣶⣷⣶⣸⣴⣤⣀⣾⣾⣄⣤⣾⡆⣾⣿⣿⣶⣾⣾⣶⣿⣤⣾⣤⣤⣴⣼⣾⣼⣴⣤⣼⣷⣆⣴⣴⣿⣾⣷⣧⣶⣼⣴⣿⣶⣿⣶
    val = 0 \nfor v in arr: val = (val<<8)|v |     5373.848ns |   850009.965ns |     ~8649.64ns |  62.128%
⡇⠀⠀⢀⠀⠀⠀⡇⠀⡇⠀⠀⣠⠀⣿⠀⠀⠀⠀⡀⠀⠀⡆⠀⡆⢰⠀⠀⡆⠀⡄⠀⠀⠀⢠⢀⣼⠀⠀⡇⣠⣸⣤⡇⠀⡆⢸⠀⠀⠀⠀⢠⠀⢠⣿⠀⠀⢠⠀⠀⢸⢠⠀⡀
⣧⣶⣶⣾⣶⣷⣴⣿⣾⡇⣤⣶⣿⣸⣿⣶⣶⣶⣶⣧⣷⣼⣷⣷⣷⣿⣦⣴⣧⣄⣷⣠⣷⣶⣾⣸⣿⣶⣶⣷⣿⣿⣿⣷⣧⣷⣼⣦⣶⣾⣿⣾⣼⣿⣿⣶⣶⣼⣦⣼⣾⣿⣶⣷
                  val = reduce( shift, arr ) |     6489.921ns |  5094212.014ns |   ~12040.269ns |  53.902%

This is a raw performance test, so the endian pow-flip is left out.
The shift function shown applies the same shift-oring operation as the for loop, and arr is just array.array('B',[0,0,255,0]) as it has the fastest iterative performance next to dict.

I should probably also note efficiency is measured by accuracy to the average time.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionʞɔıuView Question on Stackoverflow
Solution 1 - PythonMechanical snailView Answer on Stackoverflow
Solution 2 - PythonGreg HewgillView Answer on Stackoverflow
Solution 3 - PythonJames AntillView Answer on Stackoverflow
Solution 4 - PythonJrmView Answer on Stackoverflow
Solution 5 - PythonRafał DowgirdView Answer on Stackoverflow
Solution 6 - PythonTetraluxView Answer on Stackoverflow
Solution 7 - Pythonuser3076105View Answer on Stackoverflow
Solution 8 - PythonAndrea LazzarottoView Answer on Stackoverflow
Solution 9 - Pythonuser3435121View Answer on Stackoverflow
Solution 10 - PythonabdullahselekView Answer on Stackoverflow
Solution 11 - PythonfhgdView Answer on Stackoverflow
Solution 12 - PythonTcllView Answer on Stackoverflow