how to read certain columns from Excel using Pandas - Python
PythonNumpyPandasDataframePython Problem Overview
I am reading from an Excel sheet and I want to read certain columns: column 0 because it is the row-index, and columns 22:37. Now here is what I do:
import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = 37)
df= pd.concat([df[df.columns[0]], df[df.columns[22:]]], axis=1)
But I would hope there is better way to do that! I know if I do parse_cols=[0, 22,..,37]
I can do it, but for large datasets this doesn't make sense.
I also did this:
s = pd.Series(0)
s[1]=22
for i in range(2,14):
s[i]=s[i-1]+1
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = s)
But it reads the first 15 columns which is the length of s
.
Python Solutions
Solution 1 - Python
You can use column indices (letters) like this:
import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols="A,C:AA")
print(df)
> usecols : int, str, list-like, or callable default None > > - If None, then parse all columns. > - If str, then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides. > - If list of int, then indicates list of column numbers to be parsed. > - If list of string, then indicates list of column names to be parsed. > > New in version 0.24.0. > > - If callable, then evaluate each column name against it and parse the column if the callable returns True. > > Returns a subset of the columns according to behavior above. > > New in version 0.24.0.
Solution 2 - Python
parse_cols
is deprecated, use usecols
instead
that is:
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols = "A,C:AA")
Solution 3 - Python
"usecols" should help, use range of columns (as per excel worksheet, A,B...etc.) below are the examples
1. Selected Columns
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A,C,F")
2. Range of Columns and selected column
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:F,H")
3. Multiple Ranges
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:F,H,J:N")
4. Range of columns
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:N")