UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 35: invalid start byte

Python, Pandas, Csv

Python Problem Overview


I am new to Python and am trying to read a CSV file using the script below.

import pandas as pd
Past = pd.read_csv("C:/Users/Admin/Desktop/Python/Past.csv", encoding='utf-8')

But I get the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 35: invalid start byte". Please help me understand the issue. I passed encoding in the script because I thought it would resolve the error.

Python Solutions


Solution 1 - Python

This happens because the file is not UTF-8 encoded, so encoding='utf-8' is the wrong choice.

Since you are working on a Windows machine, the file is most likely in cp1252 (Windows-1252), where byte 0x96 is an en dash. Replacing

Past=pd.read_csv("C:/Users/.../Past.csv",encoding='utf-8')

with

Past=pd.read_csv("C:/Users/.../Past.csv",encoding='cp1252')

should solve the problem.
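
To see why cp1252 is a plausible fix, it can help to look at the offending byte in isolation. The following is a minimal sketch; the sample bytes are made up for illustration and only assume that the file was written by a tool that stores an en dash as the single byte 0x96.

```python
# Hypothetical sample: "2010", byte 0x96, "2015", as a cp1252 writer would store an en dash.
raw = b"2010\x962015"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print(exc)  # same kind of message as in the question

# In cp1252, byte 0x96 maps to U+2013 (EN DASH).
text = raw.decode("cp1252")
print(ascii(text))
```

If the decoded text shows sensible punctuation where the error occurred, cp1252 was probably the right guess.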

Solution 2 - Python

This approach strips out (ignores) the undecodable characters and returns the string without them. Only use it if you want to drop those characters rather than convert them.

with open(path, encoding="utf8", errors='ignore') as f:
    text = f.read()

With errors='ignore' you simply lose some characters. In my case I didn't care about them: they seemed to be stray characters caused by bad formatting and programming in the clients connecting to my socket server. If that is your situation too, it is an easy, direct solution.
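
What gets lost can be checked on a small byte string before applying this to a whole file. A minimal sketch with made-up sample bytes, comparing errors='ignore' with errors='replace' (which leaves a visible marker instead of dropping the byte):

```python
# Hypothetical bytes: a price range where the dash was stored as byte 0x96.
raw = b"price 10\x9620 USD"

ignored = raw.decode("utf-8", errors="ignore")    # bad byte is silently dropped
replaced = raw.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD

print(ascii(ignored))
print(ascii(replaced))
```

Note that dropping the byte here merges "10" and "20" into "1020", so 'ignore' is only safe when the lost characters really carry no meaning.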

Solution 3 - Python

Try using:

pd.read_csv("Your filename", encoding="ISO-8859-1")

The data I had parsed from a website was stored in this encoding rather than in UTF-8, which is the usual default.
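
One caveat worth checking: ISO-8859-1 assigns a character to every byte value, so decoding with it never raises an error, even when it is not the encoding the file was written in. A small sketch illustrating this for byte 0x96 from the question:

```python
# ISO-8859-1 (alias latin1) decodes all 256 byte values without error.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("ISO-8859-1")

as_latin1 = b"\x96".decode("ISO-8859-1")  # U+0096, an invisible control character
as_cp1252 = b"\x96".decode("cp1252")      # U+2013, EN DASH

print(ascii(as_latin1), ascii(as_cp1252))
```

So the error disappears with ISO-8859-1, but if the file is really cp1252 the dash ends up as a control character rather than readable punctuation.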

Solution 4 - Python

The following works very well for me:

encoding = 'latin1'

Solution 5 - Python

Using the code below works for me:

with open(keeniz_dir + '/world_cities.csv', 'r', encoding='latin1') as infile:
    data = infile.read()

Solution 6 - Python

It's an old question, but it shows up when searching for solutions to this error, so I thought I'd answer for anyone who still stumbles on this thread. You can check the file's encoding first and then pass the correct value to the encoding argument. A simple way to find the encoding on Windows is to open the file in Notepad++ and look at the encoding it reports. The matching value for the encoding argument can then be found in the Python documentation. See this question and its answers on Stack Overflow for more ways to determine a file's encoding.
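
If no editor is at hand, one rough alternative is to try a few candidate encodings in order and keep the first that decodes cleanly. The sketch below is a heuristic, not real detection: guess_encoding and its candidate list are hypothetical names chosen for illustration, and a clean decode does not prove the encoding is correct.

```python
def guess_encoding(data, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first candidate that decodes `data` without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None

# Made-up sample bytes that are invalid as UTF-8 but valid as cp1252.
sample = b"caf\xe9 \x96 bar"
print(guess_encoding(sample))
```

For a real file, the bytes can be read with open(path, "rb") and passed to the function. Putting latin1 last matters because it accepts any byte sequence.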

Solution 7 - Python

Don't pass the encoding option unless you are sure about the file's encoding. In pandas 1.2, the default encoding=None passes errors="replace" to the open() call, so characters that fail to decode are substituted with a replacement character; you can then work out the correct encoding or just use the resulting DataFrame. If an encoding is provided, pandas passes errors="strict" to open(), and you get a UnicodeDecodeError (a subclass of ValueError) when the encoding is wrong. Since pandas 1.3, error handling is controlled by the separate encoding_errors argument and no longer depends on encoding.
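
The difference between the two error modes can be seen with plain open(), without pandas. This is a minimal sketch using a throwaway temporary file; the file contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical CSV containing one byte (0x96) that is not valid UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"name,range\nA,10\x9620\n")
    path = tmp.name

try:
    # errors="strict" is open()'s default: decoding stops at the first bad byte.
    try:
        with open(path, encoding="utf-8", errors="strict") as f:
            f.read()
        caught = None
    except ValueError as exc:  # UnicodeDecodeError is a subclass of ValueError
        caught = type(exc).__name__

    # errors="replace" substitutes U+FFFD for each undecodable byte.
    with open(path, encoding="utf-8", errors="replace") as f:
        replaced = f.read()
finally:
    os.remove(path)

print(caught)
print(ascii(replaced))
```

Searching the result for the replacement character U+FFFD shows where the original bytes did not match the assumed encoding.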

Solution 8 - Python

df = pd.read_csv("/content/data.csv", encoding='latin1')

Just add encoding='latin1' to the call and it will work.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user3734568 | View Question on Stackoverflow
Solution 1 - Python | Liam | View Answer on Stackoverflow
Solution 2 - Python | Nitish Kumar Pal | View Answer on Stackoverflow
Solution 3 - Python | ask_me | View Answer on Stackoverflow
Solution 4 - Python | Jason Goal | View Answer on Stackoverflow
Solution 5 - Python | Juba Fourali | View Answer on Stackoverflow
Solution 6 - Python | Kumar Saurabh | View Answer on Stackoverflow
Solution 7 - Python | Jacek Błocki | View Answer on Stackoverflow
Solution 8 - Python | Developer-Felix | View Answer on Stackoverflow