Pandas cannot open an Excel (.xlsx) file

Python, Excel, Pandas

Python Problem Overview


Please see my code below:

import pandas
df = pandas.read_excel('cat.xlsx')

After running that, it gives me the following error:

Traceback (most recent call last):
  File "d:\OneDrive\桌面\practice.py", line 4, in <module>
    df = pandas.read_excel('cat.xlsx')
  File "D:\python\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "D:\python\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "D:\python\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "D:\python\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
    super().__init__(filepath_or_buffer)
  File "D:\python\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "D:\python\lib\site-packages\pandas\io\excel\_xlrd.py", line 37, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "D:\python\lib\site-packages\xlrd\__init__.py", line 170, in open_workbook
    raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported

I tried uninstalling and reinstalling pandas with pip, but the error persists. I have xlrd 2.0.1 and pandas 1.1.5 installed.

Python Solutions


Solution 1 - Python

As noted in the release email, linked to from the release tweet, noted in the large orange warning on the front page of the documentation, and (less orange, but still present) in the readme on the repo and the release on PyPI:

xlrd has explicitly removed support for anything other than xls files.

This is due to potential security vulnerabilities relating to the use of xlrd version 1.2 or earlier for reading .xlsx files.

In your case, the solution is to:

  • make sure you are on a recent version of pandas, at least 1.0.1, and preferably the latest release.
  • install openpyxl: https://openpyxl.readthedocs.io/en/stable/
  • change your pandas code to be:
    pandas.read_excel('cat.xlsx', engine='openpyxl')
    

Edit: pandas >= 1.2 addresses this issue: provided openpyxl is installed, read_excel() uses it by default for .xlsx files (see the pandas 1.2 release notes).
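
For code that has to handle both legacy .xls files and newer .xlsx files, one option is to choose the engine from the file extension instead of hard-coding it. The following is a minimal sketch and not part of the original answer: the names ENGINE_BY_SUFFIX, pick_engine and read_any_excel are made up for illustration, and it assumes xlrd is installed for .xls and openpyxl for .xlsx/.xlsm.

```python
from pathlib import Path

# Hypothetical mapping for illustration: file extension -> pandas engine name.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",       # xlrd 2.x still reads the legacy binary format
    ".xlsx": "openpyxl",  # xlrd 2.x refuses these, so use openpyxl
    ".xlsm": "openpyxl",
}


def pick_engine(path):
    """Return the engine name to pass to pandas for the given file path."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported Excel extension: {suffix!r}") from None


def read_any_excel(path, **kwargs):
    """Read an Excel file, selecting the engine from its extension."""
    import pandas  # imported lazily so pick_engine() works without pandas

    return pandas.read_excel(path, engine=pick_engine(path), **kwargs)
```

With this in place, read_any_excel('cat.xlsx') would go through openpyxl, while a .xls file would still be read by xlrd.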

Solution 2 - Python

The latest version of xlrd (2.0.1) only supports .xls files.

If you are prepared to risk potential security vulnerabilities, and risk incorrect parsing of certain files, this error can be solved by installing an older version of xlrd.

Use the command below in a shell or cmd prompt:

pip install xlrd==1.2.0
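
To confirm which xlrd release is actually installed in the active environment (before or after the downgrade), a check along these lines may help. This is only a sketch: the helper names major_version and xlrd_reads_xlsx are hypothetical, and it assumes Python 3.8 or later for importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a dotted version string such as '2.0.1'."""
    return int(version_string.split(".")[0])


def xlrd_reads_xlsx():
    """Return True if the installed xlrd predates 2.0, i.e. still reads .xlsx."""
    try:
        installed = version("xlrd")
    except PackageNotFoundError:
        # xlrd is not installed in this environment at all.
        return False
    return major_version(installed) < 2


if __name__ == "__main__":
    print("xlrd can read .xlsx:", xlrd_reads_xlsx())
```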

Solution 3 - Python

Probably the best way is to make openpyxl your default reader for read_excel(), in case you have old code that broke because of this update.

You can do this by changing the default value of the engine parameter in _base.py (under pandas/io/excel) inside the environment's pandas folder. You can locate the pandas installation as follows:

import pandas as pd
print(pd.__file__)

Open the file and find

def read_excel(...)

There you will find the default value for the engine parameter. Change it to 'openpyxl'.

Original tip/answer here: https://stackoverflow.com/a/69577391/7151338
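
Since an edit to the installed pandas files is lost whenever pandas is reinstalled or upgraded, a similar effect can be had inside your own project by wrapping the reader. The sketch below is an alternative to the approach described in this answer, not the answerer's method: with_default_engine and make_read_excel are hypothetical helper names, and it relies only on functools.partial from the standard library.

```python
import functools


def with_default_engine(reader, engine="openpyxl"):
    """Wrap a reader so `engine` defaults to the given value.

    Callers can still override it by passing engine=... explicitly.
    """
    return functools.partial(reader, engine=engine)


def make_read_excel():
    """Build a read_excel variant that defaults to openpyxl."""
    import pandas  # imported lazily so the wrapper itself needs no pandas

    return with_default_engine(pandas.read_excel)
```

Old code can then call read_excel = make_read_excel() once and keep using read_excel('cat.xlsx') unchanged, without touching site-packages.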

Solution 4 - Python

I had the same problem using the ExcelFile constructor (for a file containing multiple worksheets) instead of the read_excel method. In that case the solution is:

import pandas

xlsx = pandas.ExcelFile('cat.xlsx', engine='openpyxl')
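
For the multi-worksheet case, the opened ExcelFile can then be used to load each sheet in turn. A minimal sketch, assuming openpyxl is installed; read_all_sheets is a hypothetical helper name, while sheet_names and parse() are existing ExcelFile members.

```python
def read_all_sheets(path, engine="openpyxl"):
    """Return a dict mapping each worksheet name to its DataFrame."""
    import pandas  # imported lazily; requires pandas and the chosen engine

    # ExcelFile works as a context manager, so the file is closed afterwards.
    with pandas.ExcelFile(path, engine=engine) as xlsx:
        return {name: xlsx.parse(name) for name in xlsx.sheet_names}
```

For example, sheets = read_all_sheets('cat.xlsx') would give one DataFrame per worksheet, keyed by sheet name.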

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author   Original Content on Stack Overflow
Question              LNQ               View Question on Stack Overflow
Solution 1 - Python   Chris Withers     View Answer on Stack Overflow
Solution 2 - Python   LNQ               View Answer on Stack Overflow
Solution 3 - Python   LuisSilva         View Answer on Stack Overflow
Solution 4 - Python   avandeursen       View Answer on Stack Overflow