Read XLSB File in Pandas Python

PythonPandas

Python Problem Overview


There are many questions on this, but there has been no simple answer on how to read an xlsb file into pandas. Is there an easy way to do this?

Python Solutions


Solution 1 - Python

With the 1.0.0 release of pandas - January 29, 2020, support for binary Excel files was added.

import pandas as pd
df = pd.read_excel('path_to_file.xlsb', engine='pyxlsb')

Notes:

  • You will need to upgrade pandas - pip install pandas --upgrade
  • You will need to install pyxlsb - pip install pyxlsb

Solution 2 - Python

Hi actually there is a way. Just use pyxlsb library.

import pandas as pd
from pyxlsb import open_workbook as open_xlsb

df = []

with open_xlsb('some.xlsb') as wb:
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            df.append([item.v for item in row])

df = pd.DataFrame(df[1:], columns=df[0])

UPDATE: as of pandas version 1.0 read_excel() now can read binary Excel (.xlsb) files by passing engine='pyxlsb'

Source: https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html

Solution 3 - Python

Pyxlsb indeed is an option to read xlsb file, however, is rather limited.

I suggest using the xlwings package which makes it possible to read and write xlsb files without losing sheet formating, formulas, etc. in the xlsb file. There is extensive documentation available.

import pandas as pd
import xlwings as xw

app = xw.App()
book = xw.Book('file.xlsb')
sheet = book.sheets('sheet_name')
df = sheet.range('A1').options(pd.DataFrame, expand='table').value
book.close()
app.kill()

'A1' in this case is the starting position of the excel table. To write to xlsb file, simply write:

sheet.range('A1').value = df

Solution 4 - Python

If you want to read a big binary file or any excel file with some ranges you can directly put at this code

range = (your_index_number)
first_dataframe = []
second_dataframe = []
with open_xlsb('Test.xlsb') as wb:
    with wb.get_sheet('Sheet1') as sheet:
        i=0
        for row in sheet.rows():
            if(i!=range):
                first_dataframe.append([item.v for item in row])
                i=i+1
            else:
                second_dataframe.append([item.v for item in row])
        

first_dataframe = pd.DataFrame(first_dataframe[1:], columns=first[0])
second_dataframe = pd.DataFrame(second_dataframe[:], columns=first.columns)

Solution 5 - Python

To be able to read xlsb files, it is necessary to have openpyxl installed.

As per https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html#pandas.read_excel

engine: str, default None

If io is not a buffer or path, this must be set to identify io. Supported engines: “xlrd”, “openpyxl”, “odf”, “pyxlsb”. Engine compatibility :

“xlrd” supports old-style Excel files (.xls).

“openpyxl” supports newer Excel file formats.

“odf” supports OpenDocument file formats (.odf, .ods, .odt).

“pyxlsb” supports Binary Excel files.

Changed in version 1.2.0: The engine xlrd now only supports old-style .xls files. When engine=None, the following logic will be used to determine the engine:

If path_or_buffer is an OpenDocument format (.odf, .ods, .odt), then odf will be used.

Otherwise if path_or_buffer is an xls format, xlrd will be used.

Otherwise if openpyxl is installed, then openpyxl will be used.

Otherwise if xlrd >= 2.0 is installed, a ValueError will be raised.

Otherwise xlrd will be used and a FutureWarning will be raised. This case will raise a ValueError in a future version of pandas.

xlsb reading without index_col:

import pandas as pd

dfcluster = pd.read_excel('c:/xml/baseline/distribucion.xlsb', sheet_name='Cluster', index_col=0, engine='pyxlsb')

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionGayatriView Question on Stackoverflow
Solution 1 - PythonGlen ThompsonView Answer on Stackoverflow
Solution 2 - PythonFinrod FelagundView Answer on Stackoverflow
Solution 3 - PythongmarView Answer on Stackoverflow
Solution 4 - PythonRishabh KaushikView Answer on Stackoverflow
Solution 5 - PythonGERMAN RODRIGUEZView Answer on Stackoverflow