Convert Pandas dataframe to PyTorch tensor?
Python Problem Overview
I want to train a simple neural network with PyTorch on a pandas dataframe df. One of the columns is named "Target", and it is the target variable of the network. How can I use this dataframe as input to the PyTorch network?
I tried this, but it doesn't work:
import pandas as pd
import torch.utils.data as data_utils
target = pd.DataFrame(df['Target'])
train = data_utils.TensorDataset(df, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)
Python Solutions
Solution 1 - Python
I'm answering the question in the title, since the text doesn't specify much else, so this just converts the DataFrame into a PyTorch tensor.
Without information about your data, I'm using float values as example targets here.
import pandas as pd
import torch
import random
# creating dummy targets (float values)
targets_data = [random.random() for i in range(10)]
# creating DataFrame from targets_data
targets_df = pd.DataFrame(data=targets_data)
targets_df.columns = ['targets']
# creating tensor from targets_df
torch_tensor = torch.tensor(targets_df['targets'].values)
# printing out result
print(torch_tensor)
Output:
tensor([ 0.5827, 0.5881, 0.1543, 0.6815, 0.9400, 0.8683, 0.4289,
0.5940, 0.6438, 0.7514], dtype=torch.float64)
Tested with PyTorch 0.4.0.
I hope this helps, if you have any further questions - just ask. :)
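One detail in the output above is dtype=torch.float64: torch.tensor keeps the double precision that pandas uses for floats, while PyTorch layers such as nn.Linear create float32 parameters by default. Below is a minimal sketch of the cast, assuming a made-up column of floats (the data and the names as_double and as_float are illustrative only, not part of the original answer):

```python
import pandas as pd
import torch

# Hypothetical data standing in for the real targets
targets_df = pd.DataFrame({"targets": [0.1, 0.2, 0.3]})

# pandas stores Python floats as float64, and torch.tensor keeps that dtype
as_double = torch.tensor(targets_df["targets"].values)

# Cast to float32 to match the default dtype of PyTorch layer parameters
as_float = as_double.float()

print(as_double.dtype)  # torch.float64
print(as_float.dtype)   # torch.float32
```

Casting once, right after the conversion, is usually simpler than casting inside the training loop.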
Solution 2 - Python
Maybe try this to see if it fixes your problem (based on your sample code; train starts out as your DataFrame)?
import numpy as np
import torch
import torch.utils.data as data_utils
batch_size = 10
train_target = torch.tensor(train['Target'].values.astype(np.float32))
train = torch.tensor(train.drop('Target', axis=1).values.astype(np.float32))
train_tensor = data_utils.TensorDataset(train, train_target)
train_loader = data_utils.DataLoader(dataset=train_tensor, batch_size=batch_size, shuffle=True)
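To see what a loader built this way actually yields, here is a small sketch that iterates over it once and records the batch shapes. The DataFrame is invented for illustration (25 rows, two feature columns named a and b), so only the pattern carries over to real data:

```python
import numpy as np
import pandas as pd
import torch
import torch.utils.data as data_utils

# Hypothetical DataFrame: two feature columns plus the "Target" column
df = pd.DataFrame({
    "a": np.arange(25, dtype=np.float64),
    "b": np.arange(25, dtype=np.float64) * 2,
    "Target": np.ones(25),
})

targets = torch.tensor(df["Target"].values.astype(np.float32))
features = torch.tensor(df.drop("Target", axis=1).values.astype(np.float32))

dataset = data_utils.TensorDataset(features, targets)
loader = data_utils.DataLoader(dataset, batch_size=10, shuffle=True)

# Each iteration yields one (features, targets) pair of batched tensors
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
print(batch_shapes)
```

With 25 rows and batch_size=10 the last batch holds only 5 rows; DataLoader's drop_last=True option discards such a partial batch if that matters for your model.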
Solution 3 - Python
You can use the functions below to convert a numeric DataFrame or pandas Series to a PyTorch tensor:
import pandas as pd
import torch
# determine the supported device
def get_device():
    if torch.cuda.is_available():
        device = torch.device('cuda:0')
    else:
        device = torch.device('cpu') # don't have GPU
    return device
# convert a df to tensor to be used in pytorch
def df_to_tensor(df):
    device = get_device()
    return torch.from_numpy(df.values).float().to(device)
df_tensor = df_to_tensor(df)
series_tensor = df_to_tensor(series)
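One caveat with this approach: if the DataFrame contains non-numeric columns (strings, for example), df.values becomes an object array and torch.from_numpy raises a TypeError. A possible workaround, sketched here on an invented frame, is to keep only the numeric columns first. The column names are hypothetical, and whether dropping columns is acceptable depends on your data:

```python
import pandas as pd
import torch

# Hypothetical frame with one non-numeric column
df = pd.DataFrame({
    "height": [1.5, 1.7, 1.8],
    "count": [3, 4, 5],
    "name": ["a", "b", "c"],
})

# Keep only numeric columns; select_dtypes is a pandas DataFrame method
numeric_df = df.select_dtypes(include="number")

# Mixed int/float columns are upcast to float64 by to_numpy(); cast to float32
tensor = torch.from_numpy(numeric_df.to_numpy()).float()

print(list(numeric_df.columns))  # ['height', 'count']
print(tensor.shape)              # torch.Size([3, 2])
```

Categorical or text columns that carry signal would need to be encoded as numbers instead of being dropped.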
Solution 4 - Python
You can start from the df.values attribute (a NumPy array), but TensorDataset expects tensors, so convert the arrays with torch.tensor before passing them to the constructor:
import torch
import torch.utils.data as data_utils
# Creating np arrays
target = df['Target'].values
features = df.drop('Target', axis=1).values
# Converting to tensors and passing to DataLoader
train = data_utils.TensorDataset(torch.tensor(features), torch.tensor(target))
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)
Note: Your features (df) also contain the target variable (df['Target']), i.e. your network is 'cheating', since it can see the targets in the input. You need to remove this column from the set of features.
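The leakage point in the note can be checked mechanically. A small sketch, assuming a hypothetical frame with two feature columns (x1 and x2 are made-up names):

```python
import pandas as pd
import torch

# Hypothetical data: two features and the "Target" column
df = pd.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0],
    "x2": [1.0, 0.0, 1.0, 0.0],
    "Target": [0.0, 1.0, 0.0, 1.0],
})

# drop() returns a new frame, so df itself still contains "Target"
feature_df = df.drop("Target", axis=1)
assert "Target" not in feature_df.columns

features = torch.tensor(feature_df.values, dtype=torch.float32)
target = torch.tensor(df["Target"].values, dtype=torch.float32)

print(features.shape)  # torch.Size([4, 2])
print(target.shape)    # torch.Size([4])
```

The feature tensor should have one column fewer than the DataFrame; if the widths match, the target is still in the input.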
Solution 5 - Python
Simply convert the pandas DataFrame -> NumPy array -> PyTorch tensor. An example is shown below:
import pandas as pd
import numpy as np
import torch
import torch.utils.data as data_utils
df = pd.read_csv('train.csv')
target = pd.DataFrame(df['target'])
del df['target']
train = data_utils.TensorDataset(torch.Tensor(np.array(df)), torch.Tensor(np.array(target)))
train_loader = data_utils.DataLoader(train, batch_size = 10, shuffle = True)
Hopefully this helps you create your own datasets with PyTorch (it was compatible with the latest version of PyTorch at the time of writing).
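Two details of this pattern are easy to miss. torch.Tensor(...) always produces the default floating dtype (float32 unless you changed it), whereas torch.tensor(...) infers the dtype from the data. And wrapping the target column in pd.DataFrame gives the target tensor a trailing dimension of size 1. The sketch below illustrates both on a made-up frame with integer columns:

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical frame with integer columns
df = pd.DataFrame({"f": [1, 2, 3], "target": [0, 1, 0]})

# Target wrapped in a DataFrame: 2-D array, hence a (3, 1) tensor
as_frame = torch.Tensor(np.array(pd.DataFrame(df["target"])))

# Target taken as a Series: 1-D array, hence a (3,) tensor
as_series = torch.Tensor(np.array(df["target"]))

# torch.tensor (lowercase) keeps the integer dtype instead of converting
inferred = torch.tensor(np.array(df["target"]))

print(as_frame.shape, as_frame.dtype)
print(as_series.shape, as_series.dtype)
print(inferred.dtype)
```

Which target shape you want depends on the model: it should generally match the shape of the network's output so that the loss does not broadcast unexpectedly.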
Solution 6 - Python
# This works for me
import torch
import torch.utils.data as data_utils
target = torch.tensor(df['Target'].values)
features = torch.tensor(df.drop('Target', axis=1).values)
train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10, shuffle=True)
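For completeness, here is a rough sketch of how such a loader could feed a network. Everything beyond the TensorDataset/DataLoader pattern is an assumption made for illustration: the toy data, the single nn.Linear layer, the MSE loss, and the SGD settings are placeholders, not a recommendation. The tensors are cast to float32 because that is the default dtype of the layer's parameters.

```python
import pandas as pd
import torch
import torch.nn as nn
import torch.utils.data as data_utils

torch.manual_seed(0)

# Hypothetical regression data
df = pd.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "Target": [1.0, 2.0, 5.0, 6.0, 9.0, 10.0],
})

features = torch.tensor(df.drop("Target", axis=1).values, dtype=torch.float32)
# unsqueeze(1) makes the target (N, 1) so it matches the model output shape
target = torch.tensor(df["Target"].values, dtype=torch.float32).unsqueeze(1)

train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=2, shuffle=True)

model = nn.Linear(features.shape[1], 1)  # placeholder model
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for epoch in range(5):
    epoch_loss = 0.0
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    # Mean loss over the batches of this epoch
    losses.append(epoch_loss / len(train_loader))

print(losses)
```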
Solution 7 - Python
To convert a DataFrame to a PyTorch tensor (this works for any DataFrame whose columns are all numeric):
Steps:
- Convert the DataFrame to a NumPy array using df.to_numpy(), or df.to_numpy().astype(np.float32) to cast the array's elements to float32.
- Convert the NumPy array to a tensor using torch.from_numpy(), passing it the array rather than the DataFrame itself.
Example:
tensor_ = torch.from_numpy(df.to_numpy().astype(np.float32))
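One property of torch.from_numpy worth knowing is that the resulting tensor shares memory with the NumPy array, while torch.tensor makes a copy. The sketch below shows the difference on a made-up two-column frame; the explicit copy=True is an assumption chosen so that the array is writable and independent of the DataFrame:

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical numeric frame
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# An explicit copy gives a writable array that is independent of df
array = df.to_numpy(dtype=np.float32, copy=True)

shared = torch.from_numpy(array)  # shares memory with `array`
copied = torch.tensor(array)      # independent copy of the data

# Modify the NumPy array in place
array[0, 0] = 100.0

print(shared[0, 0].item())  # 100.0
print(copied[0, 0].item())  # 1.0
```

If the array is going to be modified later, torch.tensor (or calling .clone() on the shared tensor) avoids surprises.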