How to make a Unicode string with Python 3

Tags: Python, Unicode, Python 3.x

Python Problem Overview


I used this:

u = unicode(text, 'utf-8')

But I get an error with Python 3 (or maybe I just forgot to include something):

NameError: global name 'unicode' is not defined

Thank you.

Python Solutions


Solution 1 - Python

String literals are Unicode by default in Python 3.

Assuming that text is a bytes object, just use text.decode('utf-8')

Python 2's unicode is equivalent to Python 3's str, so you can also write:

str(text, 'utf-8')

if you prefer.
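
A minimal sketch of both spellings, assuming text holds UTF-8 encoded bytes (the sample value and the names decoded and converted are made up for illustration):

```python
# Hypothetical sample data: UTF-8 encoded bytes, e.g. read from a file or socket.
text = b'caf\xc3\xa9'

decoded = text.decode('utf-8')   # bytes method
converted = str(text, 'utf-8')   # str constructor with an explicit encoding

print(decoded == converted)      # True
print(type(decoded).__name__)    # str
```

Both calls produce the same str, so the choice between them is a matter of taste.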

Solution 2 - Python

What's new in Python 3.0 says:

> All text is Unicode; however encoded Unicode is represented as binary data

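If you want to ensure you are handling valid utf-8, decode with the "strict" error handler, which raises UnicodeDecodeError on invalid input. Here's an example from this page on unicode in 3.0 (it raises, because 0x80 is not a valid UTF-8 start byte):

b'\x80abc'.decode("utf-8", "strict")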
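
To see what the error handler argument changes, here is a small sketch comparing "strict" with the standard "replace" and "ignore" handlers on the same bytes. The variable names are illustrative only.

```python
data = b'\x80abc'  # 0x80 is not a valid UTF-8 start byte

try:
    data.decode('utf-8', 'strict')   # 'strict' is the default handler
except UnicodeDecodeError as exc:
    print('strict failed:', exc.reason)

replaced = data.decode('utf-8', 'replace')  # invalid byte becomes U+FFFD
ignored = data.decode('utf-8', 'ignore')    # invalid byte is dropped

print(ascii(replaced))  # '\ufffdabc'
print(ignored)          # abc
```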

Solution 3 - Python

As a workaround, I've been using this:

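# Python 2/3 compatibility: Python 3 has no unicode(), so alias it to str.
try:
    UNICODE_EXISTS = bool(type(unicode))
except NameError:
    # str accepts the same (bytes, encoding) arguments as Python 2's unicode()
    unicode = str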

Solution 4 - Python

This is how I solved my problem of converting escape sequences like \uFE0F, \u000A, etc., and also emojis that are encoded as UTF-16 surrogate pairs.

example = 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\\uD83D\\uDE0D\\uD83D\\uDE0D\\u2764\\uFE0F Present Moment Caf\\u00E8 in St.Augustine\\u2764\\uFE0F\\u2764\\uFE0F '
import codecs
new_str = codecs.unicode_escape_decode(example)[0]
print(new_str)
>>> 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\ud83d\ude0d\ud83d\ude0d❤️ Present Moment Cafè in St.Augustine❤️❤️ '
new_new_str = new_str.encode('utf-16', errors='surrogatepass').decode('utf-16')
print(new_new_str)
>>> 'raw vegan chocolate cocoa pie w chocolate & vanilla cream😍😍❤️ Present Moment Cafè in St.Augustine❤️❤️ '
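
The two steps above can be wrapped in one helper. This is only a sketch: decode_escaped is a hypothetical name, and it assumes the input is pure ASCII, because the unicode_escape codec interprets non-escape bytes as Latin-1 and would garble other characters.

```python
def decode_escaped(raw):
    """Turn literal \\uXXXX sequences into characters, joining surrogate pairs."""
    # Assumption: raw is ASCII-only; encode('ascii') raises otherwise.
    unescaped = raw.encode('ascii').decode('unicode_escape')
    # Round-trip through UTF-16 so \uD83D\uDE0D becomes a single code point.
    return unescaped.encode('utf-16', 'surrogatepass').decode('utf-16')

sample = 'pie \\uD83D\\uDE0D caf\\u00E8'
fixed = decode_escaped(sample)
print(fixed == 'pie \U0001F60D caf\u00e8')  # True
```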

Solution 5 - Python

In Python 3.x a string literal is already Unicode. To get its UTF-8 encoded bytes, call encode:

text = "hi , I'm text"
text.encode('utf-8')
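
Note the direction of the conversion: encode goes from str to bytes, not the other way round. A short sketch (the name encoded is introduced here for illustration):

```python
text = "hi , I'm text"
encoded = text.encode('utf-8')

print(type(text).__name__)              # str
print(type(encoded).__name__)           # bytes
print(encoded.decode('utf-8') == text)  # True
```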

Solution 6 - Python

In a Python 2 program that I used for many years there was this line:

ocd[i].namn=unicode(a[:b], 'utf-8')

This did not work in Python 3.

However, the program turned out to work with:

ocd[i].namn=a[:b]

I don't remember why I put unicode there in the first place, but I think it was because the name can contain Swedish letters åäöÅÄÖ. But even they work without "unicode", presumably because a is already a str in Python 3, and str is Unicode.
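
A small sketch of why slicing works without unicode, assuming the value is already a str. The sample name is made up, and the bytes variant is shown only for contrast.

```python
namn = 'Åsa Öberg'              # hypothetical sample name, already a str
print(namn[:3])                 # Åsa

raw = namn.encode('utf-8')      # the same name as UTF-8 bytes
print(len(namn), len(raw))      # 9 11
print(raw[:4].decode('utf-8'))  # Åsa
```

Slicing a str counts characters, while slicing bytes counts bytes, so the byte slice needs one extra position to cover the two-byte Å.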

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | cnd | View Question on Stackoverflow
Solution 1 - Python | John La Rooy | View Answer on Stackoverflow
Solution 2 - Python | Tremmors | View Answer on Stackoverflow
Solution 3 - Python | magicrebirth | View Answer on Stackoverflow
Solution 4 - Python | Ilyas | View Answer on Stackoverflow
Solution 5 - Python | mosi_kha | View Answer on Stackoverflow
Solution 6 - Python | Per Persson | View Answer on Stackoverflow