How do I convert a Python 3 byte-string variable into a regular string?

Python 3.x, String, Type Conversion

Python 3.x Problem Overview


I have read in an XML email attachment with

bytes_string=part.get_payload(decode=False)

The payload comes in as a byte string, as my variable name suggests.

I am trying to use the recommended Python 3 approach to turn this string into a usable string that I can manipulate.

The example shows:

> str(b'abc','utf-8')

How can I apply the b (bytes) prefix to my variable bytes_string and use the recommended approach?

The way I tried doesn't work:

str(bbytes_string, 'utf-8')

Python 3.x Solutions


Solution 1 - Python 3.x

You had it nearly right in the last line. You want

str(bytes_string, 'utf-8')

because the type of bytes_string is bytes, the same as the type of b'abc'.
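As a minimal sketch of this, the snippet below uses a made-up XML payload in place of the real attachment. The sample bytes are an assumption for illustration, not the asker's data:

```python
# Hypothetical stand-in for the email payload; any bytes object behaves the same way.
bytes_string = b'<?xml version="1.0"?><note>caf\xc3\xa9</note>'

# str() with an explicit encoding decodes the bytes into text.
text = str(bytes_string, 'utf-8')

print(type(bytes_string).__name__)  # bytes
print(type(text).__name__)          # str
print(text)
```

The b prefix only exists in literals written in source code. A variable that already holds a bytes object needs no prefix.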

Solution 2 - Python 3.x

Call decode() on a bytes instance to get the text that it encodes. The encoding defaults to 'utf-8'.

text = bytes_string.decode()
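A short sketch, assuming a small UTF-8 sample chosen for illustration, showing that decode() with no arguments, decode('utf-8'), and str(..., 'utf-8') produce the same text:

```python
# Sample UTF-8 bytes, chosen for illustration only.
bytes_string = b'caf\xc3\xa9'

# decode() uses UTF-8 when no encoding is given (Python 3).
text_default = bytes_string.decode()
text_explicit = bytes_string.decode('utf-8')
text_via_str = str(bytes_string, 'utf-8')

print(text_default)
print(text_default == text_explicit == text_via_str)
```

Avoid naming the result str, since that would shadow the built-in type for the rest of the scope.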

Solution 3 - Python 3.x

> How to filter (skip) non-UTF8 characters from array?

To address this comment in @uname01's post and the OP, ignore the errors:

Code

>>> b'\x80abc'.decode("utf-8", errors="ignore")
'abc'

Details

From the docs, here are more examples using the same errors parameter:

>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "strict")  
Traceback (most recent call last):
    ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
  invalid start byte

> The errors argument specifies the response when the input string can’t be converted according to the encoding’s rules. Legal values for this argument are 'strict' (raise a UnicodeDecodeError exception), 'replace' (use U+FFFD, REPLACEMENT CHARACTER), or 'ignore' (just leave the character out of the Unicode result).
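One way to combine these handlers is to decode strictly first and only fall back to a lossy mode when that fails. The helper below is a hypothetical sketch (the name decode_lenient is not from the thread or the standard library):

```python
def decode_lenient(data: bytes, encoding: str = 'utf-8') -> str:
    """Hypothetical helper: try strict decoding, fall back to U+FFFD replacement."""
    try:
        # Strict is the default error mode; valid input passes through unchanged.
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Invalid bytes become U+FFFD instead of raising.
        return data.decode(encoding, errors='replace')


print(decode_lenient(b'abc'))
print(repr(decode_lenient(b'\x80abc')))
```

Using 'replace' rather than 'ignore' in the fallback keeps a visible marker where data was lost, which may or may not be what you want.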

Solution 4 - Python 3.x

UPDATED:

> To not have any b and quotes at the start and end
>
> How to convert bytes as seen to strings, even in weird situations.

Because your data may contain bytes that are not valid in the 'utf-8' encoding, you can call str without any additional parameters and strip the leading b' and the trailing quote:

some_bad_bytes = b'\x02-\xdfI#)'
text = str(some_bad_bytes)[2:-1]

print(text)
Output: \x02-\xdfI#)

If you pass 'utf-8' as the encoding for these specific bytes, you get a UnicodeDecodeError.

The result is a normal Python 3 str, but note that it holds the escaped representation of the bytes (the literal characters \x02 and \xdf), not decoded text.
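To make the difference concrete, here is a small sketch comparing the str(...)[2:-1] trick with the 'backslashreplace' error handler from Solution 3, using the same sample bytes. The variable names are illustrative:

```python
some_bad_bytes = b'\x02-\xdfI#)'

# Slicing the repr: every non-printable or non-ASCII byte becomes a literal escape.
repr_text = str(some_bad_bytes)[2:-1]

# Decoding with backslashreplace: valid bytes decode normally (the control
# character \x02 stays a single character); only the invalid byte is escaped.
escaped = some_bad_bytes.decode('utf-8', errors='backslashreplace')

print(len(repr_text), repr(repr_text))
print(len(escaped), repr(escaped))
```

The two results differ: the first is the printable form of the bytes literal, while the second is decoded text with only the undecodable byte escaped.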

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | DjangoTango | View Question on Stackoverflow
Solution 1 - Python 3.x | Toby Speight | View Answer on Stackoverflow
Solution 2 - Python 3.x | uname01 | View Answer on Stackoverflow
Solution 3 - Python 3.x | pylang | View Answer on Stackoverflow
Solution 4 - Python 3.x | Seyfi | View Answer on Stackoverflow