UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Python, Python 3.x, UTF-8

Python Problem Overview


https://github.com/affinelayer/pix2pix-tensorflow/tree/master/tools

An error occurred when running "process.py" from the repository linked above.

python tools/process.py --input_dir data --operation resize --output_dir data2/resize
data/0.jpg -> data2/resize/0.png

Traceback (most recent call last):
  File "tools/process.py", line 235, in <module>
    main()
  File "tools/process.py", line 167, in main
    src = load(src_path)
  File "tools/process.py", line 113, in load
    contents = open(path).read()
  File "/home/user/anaconda3/envs/tensorflow_2/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What is the cause of the error? My Python version is 3.5.2.

Python Solutions


Solution 1 - Python

Python tries to convert a byte array (a bytes object, which it assumes to be a UTF-8-encoded string) to a Unicode string (str). This process is a decoding according to UTF-8 rules. While doing so, it encounters a byte sequence that is not allowed in UTF-8-encoded strings (namely the 0xff at position 0).
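A minimal sketch that reproduces the error, assuming (as the data/0.jpg input in the question suggests) that the file being read is a JPEG. JPEG files begin with the bytes 0xFF 0xD8, and 0xFF can never start a valid UTF-8 sequence, so decoding fails at position 0. The sample bytes and the helper describe_decode_error are illustrative, not taken from process.py.

```python
# Illustrative first bytes of a JPEG file (SOI marker 0xFF 0xD8, then 0xFF 0xE0)
jpeg_header = b"\xff\xd8\xff\xe0"


def describe_decode_error(data: bytes, encoding: str = "utf-8") -> str:
    """Return the decode error message, or an empty string if decoding succeeds."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


print(describe_decode_error(jpeg_header))
# 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
```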

Since you did not provide any code we could look at, we can only guess at the rest.

From the stack trace we can assume that the triggering action was reading from a file (contents = open(path).read()). I propose rewriting it like this:

with open(path, 'rb') as f:
  contents = f.read()

The b in the mode specifier of open() means the file is treated as binary, so contents will remain a bytes object. No decoding is attempted this way.
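A small sketch of the difference between the two modes. It writes a throwaway temporary file containing JPEG-like bytes as a stand-in for the real input; the file name sample.jpg is hypothetical.

```python
import os
import tempfile

# Hypothetical sample file standing in for an image such as data/0.jpg
path = os.path.join(tempfile.mkdtemp(), "sample.jpg")
with open(path, "wb") as f:
    f.write(b"\xff\xd8\xff\xe0")

# Binary mode: no decoding happens, contents is a bytes object
with open(path, "rb") as f:
    contents = f.read()
print(type(contents), contents[:2])  # <class 'bytes'> b'\xff\xd8'

# Text mode with UTF-8: Python decodes while reading and fails on the first byte
try:
    with open(path, encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError as exc:
    print("text mode failed:", exc.reason)  # text mode failed: invalid start byte
```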

Solution 2 - Python

Use this solution: it strips out (ignores) the characters that cannot be decoded and returns the string without them. Only use it if you need to strip them, not convert them.

with open(path, encoding="utf8", errors='ignore') as f:
    contents = f.read()

With errors='ignore' you will simply lose some characters. But if you don't care about them (in my case they were extra characters caused by bad formatting and programming in the clients connecting to my socket server), it's an easy, direct solution.
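A quick illustration on an in-memory byte string (the sample bytes are made up). It also shows errors="replace", a standard alternative handler not mentioned in the answer, which keeps a visible U+FFFD marker where each bad byte was instead of dropping it silently.

```python
data = b"\xffhello\xfe"  # hypothetical input with two bytes that are invalid in UTF-8

# errors="ignore" silently drops the undecodable bytes
ignored = data.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD (the replacement character) for each bad byte
replaced = data.decode("utf-8", errors="replace")

print(ascii(ignored), ascii(replaced))  # 'hello' '\ufffdhello\ufffd'
```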

Solution 3 - Python

Use the ISO-8859-1 encoding to solve the issue.
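Why this makes the error disappear: ISO-8859-1 (Latin-1) maps every one of the 256 byte values to exactly one code point, so decoding can never raise UnicodeDecodeError. The caveat is that it hides the error rather than fixing it; if the file is really UTF-16 or binary data such as an image, the resulting text will be meaningless. A minimal check:

```python
# Every possible byte value, 0x00 through 0xFF
all_bytes = bytes(range(256))

# Latin-1 decoding always succeeds and yields one character per byte
text = all_bytes.decode("iso-8859-1")
print(len(text))  # 256

# The mapping is lossless, so encoding back restores the original bytes
print(text.encode("iso-8859-1") == all_bytes)  # True

# The JPEG header bytes "decode" fine, but into characters that mean nothing
print(ascii(b"\xff\xd8".decode("iso-8859-1")))  # '\xff\xd8'
```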

Solution 4 - Python

I had a similar issue and ended up using UTF-16 to decode. My code is below:

with open(path_to_file, 'rb') as f:
    contents = f.read()
# Decode first: contents is bytes, and in UTF-16 each newline is two bytes
contents = contents.decode("utf-16").rstrip("\r\n")
contents = contents.split("\r\n")

This reads the file contents as raw bytes, decodes them as UTF-16, and then splits the text into lines.
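A self-contained sketch of that approach. The file name utf16_sample.txt is hypothetical, and the sample explicitly writes a little-endian BOM (FF FE) so it behaves the same on every platform; note that this BOM is exactly what makes a UTF-8 decoder fail with "byte 0xff in position 0".

```python
import codecs
import os
import tempfile

# Hypothetical UTF-16-LE file with Windows line endings, starting with the BOM FF FE
path = os.path.join(tempfile.mkdtemp(), "utf16_sample.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + "first\r\nsecond\r\n".encode("utf-16-le"))

with open(path, "rb") as f:
    raw = f.read()

# The "utf-16" codec reads the BOM to pick the byte order and drops it from the text
lines = raw.decode("utf-16").rstrip("\r\n").split("\r\n")
print(lines)  # ['first', 'second']
```

Alternatively, open(path, encoding="utf-16") decodes in text mode directly; with the default newline handling, "\r\n" is translated to "\n" while reading.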

Solution 5 - Python

I came across this thread when I hit the same error. After doing some research, I can confirm that this error happens when you try to decode a UTF-16 file as UTF-8.

With UTF-16, the first character (2 bytes in UTF-16) is a Byte Order Mark (BOM), which is used as a decoding hint and doesn't appear as a character in the decoded string. This means the first byte will be either FE or FF, and the second byte the other.
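A minimal sketch of checking for a BOM before choosing a codec. The helper encoding_from_bom is hypothetical; it only recognizes files that actually start with a BOM and returns None for everything else (plain UTF-8, cp1252, images, and so on). UTF-32 is checked before UTF-16 because the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).

```python
import codecs

# Longer BOMs first so UTF-32 is not mistaken for UTF-16
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(data: bytes):
    """Return a codec name implied by a leading BOM, or None if there is no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


sample = "hi".encode("utf-16")  # Python's utf-16 codec always writes a BOM first
print(encoding_from_bom(sample))        # utf-16
print(encoding_from_bom(b"plain text"))  # None
```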


Solution 6 - Python

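This is caused by reading the file with a different encoding than the one it was saved in. In Python 3, open() in text mode decodes with a platform-dependent default encoding (locale.getpreferredencoding(False)) unless you pass encoding=, so the same code may work on one platform and fail on another.

If 'utf-8' does not work, you can specify a different encoding explicitly, for example cp1252: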

import csv

with open(path, newline='', encoding='cp1252') as csvfile:
    reader = csv.reader(csvfile)

It should work once you change the encoding here. If cp1252 does not work for you either, the "Standard Encodings" table in the Python codecs documentation lists the other available encodings.
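A self-contained sketch of the cp1252 case. The CSV content and the file name sample.csv are made up: in cp1252 the character "é" is the single byte 0xE9, which is not valid UTF-8 in this position, so reading with encoding="utf-8" would fail while cp1252 succeeds.

```python
import csv
import os
import tempfile

# Hypothetical CSV saved with Windows-1252 (cp1252)
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "wb") as f:
    f.write("name,city\r\nRené,Zürich\r\n".encode("cp1252"))

# newline="" is what the csv module expects; the encoding matches how the file was written
with open(path, newline="", encoding="cp1252") as csvfile:
    rows = list(csv.reader(csvfile))

print(rows == [["name", "city"], ["René", "Zürich"]])  # True
```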

Solution 7 - Python

It simply means that the wrong encoding was chosen to read the file.

On macOS, use file -I file.txt to see which encoding file detects. On Linux, use file -i file.txt.

Solution 8 - Python

I had a similar issue with PNG files, and I tried the solutions above without success. This one worked for me in Python 3.8:

with open(path, "rb") as f:
    data = f.read()
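Why text mode fails here: every PNG file starts with the fixed 8-byte signature 89 50 4E 47 0D 0A 1A 0A, and 0x89 is a UTF-8 continuation byte, so a UTF-8 decoder reports "can't decode byte 0x89 in position 0: invalid start byte". A minimal sketch that reads the header in binary mode; the helper is_png and the sample file are hypothetical.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header of every PNG file


def is_png(path: str) -> bool:
    """Read the first bytes in binary mode and compare them with the PNG signature."""
    with open(path, "rb") as f:
        return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


# Hypothetical file that starts with the PNG signature
path = os.path.join(tempfile.mkdtemp(), "image.png")
with open(path, "wb") as f:
    f.write(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR")

print(is_png(path))  # True
```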

Solution 9 - Python

Use only

base64.b64decode(a) 

instead of

base64.b64decode(a).decode('utf-8')
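The reason: base64.b64decode() returns bytes, and those bytes are only text if the original payload was text. For binary payloads such as images, calling .decode('utf-8') on the result raises this error. A minimal sketch, where a holds the base64 form of the first three bytes of a JPEG (a made-up example value):

```python
import base64

# Base64 text of the bytes FF D8 FF (the start of a JPEG), e.g. from JSON or a data URI
a = "/9j/"

raw = base64.b64decode(a)
print(type(raw).__name__, raw)  # bytes b'\xff\xd8\xff'

# Decoding the binary payload as UTF-8 is what triggers the error
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid start byte
```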

Solution 10 - Python

If you are on a Mac, check whether the folder contains a hidden .DS_Store file. After removing that file, my program worked.
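.DS_Store is a binary metadata file that the macOS Finder creates, so a script that opens every file in an input folder (as process.py does with --input_dir) may try to decode it. Instead of deleting it each time, a sketch like the following skips hidden files when listing the folder; the helper visible_files and the sample folder contents are hypothetical.

```python
import os
import tempfile


def visible_files(directory: str):
    """Return sorted names of regular files, skipping hidden ones such as .DS_Store."""
    return sorted(
        name
        for name in os.listdir(directory)
        if not name.startswith(".") and os.path.isfile(os.path.join(directory, name))
    )


# Hypothetical input folder containing an image and Finder metadata
directory = tempfile.mkdtemp()
for name in ("0.jpg", ".DS_Store"):
    with open(os.path.join(directory, name), "wb") as f:
        f.write(b"\x00")

print(visible_files(directory))  # ['0.jpg']
```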

Solution 11 - Python

If you get a similar error while reading data frames with pandas, pass an explicit encoding.

For example:

import pandas as pd
df = pd.read_csv("File path", encoding='cp1252')

Solution 12 - Python

I got this UnicodeDecodeError while trying to read a '.csv' file using pandas.read_csv(). In my case, I could not overcome the issue by trying other encodings. But instead of using

pd.read_csv(filename, delimiter=';')

I used:

pd.read_csv(open(filename, 'r'), delimiter=';')

which seems to work fine for me.

Note: in open(), use 'r' instead of 'rb'. 'rb' returns a bytes stream, which read_csv() then decodes as UTF-8, hitting the same error. 'r' returns str, decoded with open()'s platform-dependent default encoding (locale.getpreferredencoding(False)) rather than UTF-8, which is likely why this works when read_csv()'s own UTF-8 default does not.

Solution 13 - Python

If you are receiving data from a serial port, make sure you are using the right baud rate (and the other settings): decoding with UTF-8 under the wrong configuration will generate the same error:

> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

To check your serial port configuration on Linux, use: stty -F /dev/ttyUSBX -a

Solution 14 - Python

Check the path of the file being read. My code kept giving me errors until I changed the path to the present working directory. The error was:

newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Solution 15 - Python

I had a similar issue and searched the whole internet for this problem.

If you have this problem, copy your HTML code into a new HTML file that uses the normal <meta charset="UTF-8"> and it should work.

Create the new HTML file in the same location, with a different name.

Solution 16 - Python

You have to read this file with the latin1 encoding, because it contains some special characters.

The problem here is the encoding type: when Python can't decode the data being read, it raises an error.

You can use latin1 or other encoding values.

Try a few and test to find the right one for your dataset.
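One way to "try and test" is a loop over candidate encodings. The helper decode_with_fallback below is a hypothetical sketch, and the candidate list is an assumption: latin1 goes last because it never fails, and cp1252 comes before it because cp1252 leaves a few bytes (such as 0x81) undefined. A successful decode only means no error was raised, not that the text is correct, so inspect the result.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "cp1252", "latin1")):
    """Try each encoding in turn; return (text, encoding) for the first that succeeds."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


print(decode_with_fallback(b"plain ascii"))    # ('plain ascii', 'utf-8')
print(ascii(decode_with_fallback(b"\xffabc")))  # ('\xffabc', 'cp1252')
print(ascii(decode_with_fallback(b"\x81")))     # ('\x81', 'latin1')
```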

Solution 17 - Python

I had a similar problem.

Solved it by:

import io

with io.open(filename, 'r', encoding='utf-8') as fn:
  lines = fn.readlines()

However, I had another problem. Some HTML files (in my case) were not UTF-8, so I received a similar error. When I excluded those HTML files, everything worked smoothly.

So, besides fixing the code, also check the files you are reading from; some of them may really be in a different encoding.
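To find which files are the culprits before excluding them, a sketch like this reads each file in binary mode and tests whether it is valid UTF-8. The helper find_non_utf8_files, the .html suffix filter, and the sample files are hypothetical.

```python
import os
import tempfile


def find_non_utf8_files(directory: str, suffix: str = ".html"):
    """Return names of files with the given suffix whose bytes are not valid UTF-8."""
    bad = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(suffix):
            continue
        with open(os.path.join(directory, name), "rb") as f:
            try:
                f.read().decode("utf-8")
            except UnicodeDecodeError:
                bad.append(name)
    return bad


# Hypothetical folder: one UTF-8 file and one saved as cp1252
directory = tempfile.mkdtemp()
samples = {
    "ok.html": "<p>café</p>".encode("utf-8"),
    "bad.html": "<p>café</p>".encode("cp1252"),
}
for name, data in samples.items():
    with open(os.path.join(directory, name), "wb") as f:
        f.write(data)

print(find_non_utf8_files(directory))  # ['bad.html']
```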

Solution 18 - Python

I had the same issue when processing a file generated on Linux. It turned out to be related to files containing question marks.

Solution 19 - Python

If possible, open the file in a text editor and change its encoding to UTF-8. Otherwise, do it programmatically at the OS level.
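At the OS level, iconv can do this (for example iconv -f UTF-16 -t UTF-8 in.txt > out.txt). The same conversion can be sketched in Python, assuming you already know the file's current encoding; the helper convert_to_utf8 and the sample file notes.txt are hypothetical.

```python
import os
import tempfile


def convert_to_utf8(path: str, source_encoding: str) -> None:
    """Rewrite a text file in place as UTF-8, assuming its current encoding is known."""
    # newline="" keeps the original line endings unchanged on read and write
    with open(path, encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Hypothetical UTF-16 file; Python writes a BOM at the start
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w", encoding="utf-16") as f:
    f.write("line one\nline two\n")

convert_to_utf8(path, "utf-16")
with open(path, encoding="utf-8") as f:
    print(f.read().splitlines())  # ['line one', 'line two']
```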

Solution 20 - Python

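I had a similar problem when I tried to run an example in tensorflow/models/objective_detection and got the same message. Try switching from Python 3 to Python 2. (Python 2's open() returns undecoded byte strings, which is presumably why the error goes away there.)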

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | pie | View Question on Stackoverflow
Solution 1 - Python | Alfe | View Answer on Stackoverflow
Solution 2 - Python | Nitish Kumar Pal | View Answer on Stackoverflow
Solution 3 - Python | Ramineni Ravi Teja | View Answer on Stackoverflow
Solution 4 - Python | tattmoney76 | View Answer on Stackoverflow
Solution 5 - Python | Peter Ogden | View Answer on Stackoverflow
Solution 6 - Python | Jie Yin | View Answer on Stackoverflow
Solution 7 - Python | Minh Triet | View Answer on Stackoverflow
Solution 8 - Python | Nwawel A Iroume | View Answer on Stackoverflow
Solution 9 - Python | pradeep karunathilaka | View Answer on Stackoverflow
Solution 10 - Python | Juan Navarrete | View Answer on Stackoverflow
Solution 11 - Python | 13Tracso | View Answer on Stackoverflow
Solution 12 - Python | Onur Kirman | View Answer on Stackoverflow
Solution 13 - Python | Saif Faidi | View Answer on Stackoverflow
Solution 14 - Python | Rex131xO | View Answer on Stackoverflow
Solution 15 - Python | MoShamroukh | View Answer on Stackoverflow
Solution 16 - Python | Ali Hassan | View Answer on Stackoverflow
Solution 17 - Python | Kostas Tsiligkiris | View Answer on Stackoverflow
Solution 18 - Python | wfolkerts | View Answer on Stackoverflow
Solution 19 - Python | Manoj Joshi | View Answer on Stackoverflow
Solution 20 - Python | user8665083 | View Answer on Stackoverflow