Pythonic type hints with pandas?

Python, Pandas, Type Hinting

Python Problem Overview


Let's take a simple function that takes a str and returns a dataframe:

import pandas as pd
def csv_to_df(path):
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')

What is the recommended pythonic way of adding type hints to this function?

If I ask Python for the type of a DataFrame, it returns pandas.core.frame.DataFrame. The following doesn't work, though: Python reports that the name pandas is not defined.

def csv_to_df(path: str) -> pandas.core.frame.DataFrame:
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')
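
The error comes from how the import binds names: import pandas as pd creates only the name pd, so an annotation spelled pandas.core.frame.DataFrame refers to a name that does not exist in the module. A minimal sketch of that behavior, assuming pandas is installed (the variable names message and same_class are illustrative only):

```python
import pandas as pd

# Only the alias `pd` is bound by the import above; `pandas` is not a name here.
try:
    pandas.core.frame.DataFrame
except NameError as exc:
    message = str(exc)

# The fully qualified class is still reachable through the alias,
# and it is the very same object as the shorter pd.DataFrame.
same_class = pd.core.frame.DataFrame is pd.DataFrame
```

Either spelling through the alias, or an additional plain import pandas, would make the annotation resolvable.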

Python Solutions


Solution 1 - Python

Why not just use pd.DataFrame?

import pandas as pd
def csv_to_df(path: str) -> pd.DataFrame:
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')

The result is the same, because pd.DataFrame and pandas.core.frame.DataFrame are the same class:

> help(csv_to_df)
Help on function csv_to_df in module __main__:
csv_to_df(path:str) -> pandas.core.frame.DataFrame
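
As a quick check that the annotation resolves to the real class, the sketch below writes a small tab-separated file and reads it back with the annotated function. The file contents and the temporary path are made up for illustration; only csv_to_df comes from the answer above.

```python
import os
import tempfile
from typing import get_type_hints

import pandas as pd


def csv_to_df(path: str) -> pd.DataFrame:
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')


# Hypothetical sample file: one line to skip, a header, two data rows, one comment line.
sample = "exported table\nid\tname\n1\ta\n# a comment line\n2\tb\n"

with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as handle:
    handle.write(sample)
    tmp_path = handle.name

try:
    df = csv_to_df(tmp_path)
finally:
    os.remove(tmp_path)

# The hint is stored as the class object itself, so tools can introspect it.
return_hint = get_type_hints(csv_to_df)["return"]
```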

Solution 2 - Python

I'm currently doing the following:

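from typing import TypeVar
import pandas as pd
# The string is only a display name for the hint; it is not resolved to the pandas class.
PandasDataFrame = TypeVar('pandas.core.frame.DataFrame')
def csv_to_df(path: str) -> PandasDataFrame:
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')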

Which gives:

> help(csv_to_df)
Help on function csv_to_df in module __main__:

csv_to_df(path:str) -> ~pandas.core.frame.DataFrame

I don't know how pythonic that is, but I find it understandable enough as a type hint.
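
One caveat with this approach: the string passed to TypeVar is only a label, so the resulting object is not the DataFrame class, and static checkers such as mypy treat it as a generic type variable rather than as pandas.DataFrame. A plain alias keeps the readable name while still pointing at the real class. In the sketch below, the alias name PandasDataFrame is reused from the answer, and Labelled is a hypothetical name for illustration.

```python
from typing import TypeVar

import pandas as pd

# The argument is an arbitrary display name; nothing is imported or looked up.
Labelled = TypeVar('pandas.core.frame.DataFrame')
label = Labelled.__name__
is_a_class = isinstance(Labelled, type)  # a TypeVar is not a class

# Alternative: an ordinary alias refers to the actual DataFrame class.
PandasDataFrame = pd.DataFrame


def csv_to_df(path: str) -> PandasDataFrame:
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')
```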

Solution 3 - Python

There is now a package that can help with this: dataenforce (https://github.com/CedricFR/dataenforce).

You can install it with pip install dataenforce and then write type hints that name the expected columns:

from dataenforce import Dataset
def preprocess(dataset: Dataset["id", "name", "location"]) -> Dataset["location", "count"]:
    pass
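
For readers who want a column check without an extra dependency, the idea can be approximated by hand. The sketch below is not dataenforce's implementation or API; require_columns is a hypothetical helper that keeps the plain pd.DataFrame hint and verifies the expected column names at runtime.

```python
from typing import Iterable

import pandas as pd


def require_columns(df: pd.DataFrame, expected: Iterable[str]) -> pd.DataFrame:
    """Return df unchanged, or raise if any expected column is missing."""
    missing = [name for name in expected if name not in df.columns]
    if missing:
        raise TypeError(f"missing columns: {missing}")
    return df


def preprocess(dataset: pd.DataFrame) -> pd.DataFrame:
    # Check the input columns, then count rows per location.
    require_columns(dataset, ["id", "name", "location"])
    counts = dataset.groupby("location").size().reset_index(name="count")
    return require_columns(counts, ["location", "count"])
```

The trade-off is that the column names live in the function body rather than in the signature, so they are invisible to help() and to static checkers.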

Solution 4 - Python

Another answer explains how to use the data-science-types package, which provides type stubs for pandas so that mypy can check DataFrame annotations.

pip install data-science-types

Demo

# program.py

import pandas as pd

df: pd.DataFrame = pd.DataFrame({'col1': [1,2,3], 'col2': [4,5,6]}) # OK
df1: pd.DataFrame = pd.Series([1,2,3]) # error: Incompatible types in assignment

Run mypy on the file as usual:

$ mypy program.py
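
Note that the error is reported by mypy only; at runtime Python does not enforce variable annotations. A small sketch of that difference, assuming pandas is installed (runtime_type is an illustrative name):

```python
import pandas as pd

# mypy flags this assignment when pandas type stubs are available,
# but the interpreter executes it without complaint.
df1: pd.DataFrame = pd.Series([1, 2, 3])

runtime_type = type(df1).__name__  # the object is still a Series
```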

Solution 5 - Python

This strays from the original question, but it builds on @dangom's TypeVar answer and on @Georgy's comment that type hints cannot specify the datatypes of DataFrame columns. A simple workaround is to put the datatype into the name of the hint:

from typing import TypeVar
import pandas as pd
# The "(str)" part is documentation only; no column datatype is checked.
DataFrameStr = TypeVar("pandas.core.frame.DataFrame(str)")
def csv_to_df(path: str) -> DataFrameStr:
    return pd.read_csv(path, skiprows=1, sep='\t', comment='#')
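
If the goal is a distinct, self-documenting name that type checkers still recognize as a DataFrame, typing.NewType is one possible alternative to a TypeVar label. The sketch below records intent only and does not verify column datatypes; to_str_frame is a hypothetical helper, and the sample data is made up.

```python
from typing import NewType

import pandas as pd

# A distinct name that type checkers treat as a subtype of pd.DataFrame.
DataFrameStr = NewType("DataFrameStr", pd.DataFrame)


def to_str_frame(df: pd.DataFrame) -> DataFrameStr:
    # Convert every value to str, then mark the result with the new name.
    return DataFrameStr(df.astype(str))


frame = to_str_frame(pd.DataFrame({"col1": [1, 2], "col2": [3, 4]}))
```

At runtime DataFrameStr(...) simply returns its argument, so frame is an ordinary DataFrame.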

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | dangom | View Question on Stackoverflow
Solution 1 - Python | Georgy | View Answer on Stackoverflow
Solution 2 - Python | dangom | View Answer on Stackoverflow
Solution 3 - Python | luksfarris | View Answer on Stackoverflow
Solution 4 - Python | kevin_theinfinityfund | View Answer on Stackoverflow
Solution 5 - Python | Keith | View Answer on Stackoverflow