PyTorch: How to use DataLoaders for custom Datasets


Python Problem Overview


How can I use torch.utils.data.Dataset and torch.utils.data.DataLoader with my own data (not just the torchvision.datasets)?

Is there a way to use the built-in DataLoader that the torchvision datasets use with any dataset?

Python Solutions


Solution 1 - Python

Yes, that is possible. Just create the objects yourself, e.g.:

import torch.utils.data as data_utils

train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

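where features and targets are tensors. TensorDataset only requires that both have the same size along the first dimension (the number of samples). In the simplest case, features is 2-D, i.e. a matrix where each row represents one training sample, and targets is 1-D or 2-D, depending on whether you are trying to predict a scalar or a vector.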
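As a minimal sketch of the snippet above with concrete data (the 200 × 10 feature matrix and the random scalar targets are hypothetical), iterating over the loader yields one (features, targets) pair of tensors per batch:

```python
import torch
import torch.utils.data as data_utils

# Hypothetical data: 200 samples with 10 features each, one scalar target per sample
features = torch.randn(200, 10)
targets = torch.randn(200)

train = data_utils.TensorDataset(features, targets)
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

# Each iteration yields a batch of features and the matching batch of targets
batch_shapes = []
for batch_features, batch_targets in train_loader:
    batch_shapes.append((tuple(batch_features.shape), tuple(batch_targets.shape)))

print(batch_shapes)  # four batches, each ((50, 10), (50,))
```

Since 200 is divisible by 50 there is no smaller final batch; otherwise the last batch would hold the remainder unless drop_last=True is passed to DataLoader.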

Hope that helps!


EDIT: response to @sarthak's question

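Basically yes. If you create an object of type TensorDataset, the constructor only checks whether the first dimensions of the feature tensor and the target tensor have the same length. In older PyTorch versions, where these arguments were called data_tensor and target_tensor, the check was the following (current versions accept any number of tensors and apply the same check to each of them):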

assert data_tensor.size(0) == target_tensor.size(0)
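A small sketch of what this means in practice, assuming hypothetical image-like tensors of shape 100 × 8 × 8 × 3: the trailing dimensions of the feature tensor are unrestricted, and only a mismatch in the first dimension is rejected. Recent PyTorch versions implement the check as an assert, so the sketch catches AssertionError (and ValueError, as a hedge against other versions).

```python
import torch
from torch.utils.data import TensorDataset

# Hypothetical tensors: 100 image-like samples (8 x 8 x 3) and 100 labels
images = torch.zeros(100, 8, 8, 3)
labels = torch.zeros(100, dtype=torch.long)

# Only the first dimension has to agree; the remaining dimensions can be anything
dataset = TensorDataset(images, labels)
sample_image, sample_label = dataset[0]
print(len(dataset), sample_image.shape)  # 100 torch.Size([8, 8, 3])

# A mismatch in the first dimension is rejected by the constructor
try:
    TensorDataset(images, labels[:99])
    mismatch_rejected = False
except (AssertionError, ValueError):
    # recent PyTorch versions raise AssertionError("Size mismatch between tensors")
    mismatch_rejected = True

print(mismatch_rejected)  # True
```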

However, if you want to feed these data into a neural network afterwards, you need to be careful. Convolution layers work on multidimensional data like yours (note that nn.Conv2d expects channels first, i.e. N × 3 × n × n rather than N × n × n × 3). Fully connected layers such as nn.Linear, on the other hand, only operate on the last dimension, so to use all values of a sample as inputs you first have to flatten each sample into a vector, i.e. put the data in matrix form. If you run into an issue like this, an easy solution is to convert your 4-D dataset (given as a tensor, e.g. a FloatTensor) into a matrix with the view method. For your 5000 × n × n × 3 dataset, this would look like this:

# flatten every sample into one row; use .reshape() instead if the tensor is not contiguous
dataset_2d = dataset_4d.view(5000, -1)

(The value -1 tells PyTorch to figure out the length of the second dimension automatically.)
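To illustrate both routes, here is a sketch assuming a hypothetical side length n = 16 and made-up layer sizes (10 output features, 8 convolution channels): the flattened matrix feeds an nn.Linear layer, while permute reorders the same data to channels first for nn.Conv2d.

```python
import torch
import torch.nn as nn

n = 16  # hypothetical image side length
dataset_4d = torch.randn(5000, n, n, 3)  # 5000 samples, each n x n x 3 (channels last)

# Flatten each sample into one row: 16 * 16 * 3 = 768 columns
dataset_2d = dataset_4d.view(5000, -1)

# A fully connected layer can now consume a batch of rows
linear = nn.Linear(n * n * 3, 10)
linear_out = linear(dataset_2d[:50])

# For convolutions, reorder to channels first (N, C, H, W) instead of flattening
dataset_nchw = dataset_4d.permute(0, 3, 1, 2)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
conv_out = conv(dataset_nchw[:50])

print(dataset_2d.shape)  # torch.Size([5000, 768])
print(linear_out.shape)  # torch.Size([50, 10])
print(conv_out.shape)    # torch.Size([50, 8, 14, 14])
```

Without padding, a 3 × 3 kernel shrinks each spatial side by 2, hence 14 × 14.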

Solution 2 - Python

You can easily do this by extending the data.Dataset class. According to the API, all you have to do is implement two methods: __getitem__ and __len__.

You can then wrap the dataset with a DataLoader as shown in the API and in @pho7's answer.

I think torchvision's ImageFolder class is a good reference implementation; see its source code.
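A minimal sketch of such a subclass, using a hypothetical SquaresDataset whose sample i is the pair (i, i²). __len__ tells the DataLoader how many indices exist, and __getitem__ returns one sample, which the DataLoader then collates into batches:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical example dataset: sample i is (i, i**2)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # Number of samples; the DataLoader's samplers draw indices from range(len(self))
        return self.size

    def __getitem__(self, index):
        # Return one (feature, target) pair; the DataLoader stacks them into batches
        x = torch.tensor([float(index)])
        y = torch.tensor(float(index ** 2))
        return x, y


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
first_targets = next(iter(loader))[1].tolist()
print(shapes)         # [((4, 1), (4,)), ((4, 1), (4,)), ((2, 1), (2,))]
print(first_targets)  # [0.0, 1.0, 4.0, 9.0]
```

In a real dataset, __getitem__ would typically load one file or row from disk, which keeps memory usage low for large datasets.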

Solution 3 - Python

Yes, you can do it. Hope this helps future readers.

import torch
from torch.utils.data import TensorDataset, DataLoader

inputs = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]
targets = [6, 7]
batch_size = 2

# convert the Python lists to tensors (integer targets become int64, as most loss functions expect)
inputs = torch.tensor(inputs)
targets = torch.tensor(targets)

dataset = TensorDataset(inputs, targets)
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
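To see what the loader yields, a sketch that iterates over it for a few epochs. With shuffle=True the row order can change between epochs, but each input row stays paired with its target; in this toy data every target happens to equal the last input value + 1, which the check below relies on.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

inputs = torch.tensor([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]])
targets = torch.tensor([6, 7])
data_loader = DataLoader(TensorDataset(inputs, targets), batch_size=2, shuffle=True)

pairs_intact = []
for epoch in range(3):
    for batch_inputs, batch_targets in data_loader:
        # Shuffling reorders whole (input, target) pairs, never the targets alone
        pairs_intact.append(bool((batch_inputs[:, -1] + 1 == batch_targets).all()))
        print(epoch, batch_inputs.tolist(), batch_targets.tolist())
```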

Solution 4 - Python

In addition to user3693922's answer, which links the "quick" example in the PyTorch documentation for creating custom dataloaders for custom datasets, and the accepted answer, which creates a custom dataloader in the "simplest" case, there is a much more detailed official PyTorch tutorial dedicated to creating custom dataloaders with the associated preprocessing: "Writing Custom Datasets, DataLoaders and Transforms".
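The pattern that tutorial builds on can be sketched as a Dataset that takes an optional transform callable and applies it to each sample in __getitem__. The class name TransformedDataset and the scale_to_unit preprocessing step are hypothetical, chosen only for illustration:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class TransformedDataset(Dataset):
    """Hypothetical dataset that applies an optional transform to each sample."""

    def __init__(self, data, targets, transform=None):
        self.data = data
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        sample = self.data[index]
        # Preprocessing happens lazily, one sample at a time
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, self.targets[index]


def scale_to_unit(x):
    # Hypothetical preprocessing step: scale raw 0-255 values to [0, 1]
    return x / 255.0


raw = torch.tensor([[0.0, 255.0], [51.0, 102.0], [255.0, 255.0]])
labels = torch.tensor([0, 1, 1])
dataset = TransformedDataset(raw, labels, transform=scale_to_unit)
loader = DataLoader(dataset, batch_size=3)

xb, yb = next(iter(loader))
print(xb)  # rows scaled to [0, 1]: [[0, 1], [0.2, 0.4], [1, 1]]
```

Keeping the transform as a constructor argument lets the same dataset class be reused with different preprocessing, e.g. augmentation for training and none for evaluation.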

Solution 5 - Python

Yes. PyTorch's DataLoader is designed to take a Dataset object as input, but with the default settings all it requires is an object with __getitem__ and __len__ methods, so a plain list or another sequence-like container will suffice.

E.g. a list of tuples with your features (x values) as the first element and targets (y values) as the second element can be passed directly to DataLoader like so:

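import torch

x = [6, 3, 8, 2, 5, 9, 7]
y = [1, 0, 1, 0, 0, 1, 1]

# pair each feature with its target: [(6, 1), (3, 0), ...]
data = list(zip(x, y))
dataloader = torch.utils.data.DataLoader(data)

for features, targets in dataloader:
    # default batch_size=1, so each of these is a tensor with one element
    pass  # training step goes here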
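With a larger batch size, the DataLoader's default collate function turns each batch of (x, y) tuples into one tensor of x values and one tensor of y values. A sketch with batch_size=4 (an arbitrary choice) and shuffling disabled so the output is deterministic:

```python
from torch.utils.data import DataLoader

x = [6, 3, 8, 2, 5, 9, 7]
y = [1, 0, 1, 0, 0, 1, 1]
data = list(zip(x, y))

# The plain list acts as the dataset: it has __getitem__ and __len__
loader = DataLoader(data, batch_size=4, shuffle=False)
batches = [(features.tolist(), targets.tolist()) for features, targets in loader]
print(batches)  # [([6, 3, 8, 2], [1, 0, 1, 0]), ([5, 9, 7], [0, 1, 1])]
```

Seven samples do not divide evenly into batches of four, so the last batch holds the remaining three.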

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Sarthak | View Question on Stackoverflow
Solution 1 - Python | paho | View Answer on Stackoverflow
Solution 2 - Python | user3693922 | View Answer on Stackoverflow
Solution 3 - Python | Khubaib Raza | View Answer on Stackoverflow
Solution 4 - Python | Blupon | View Answer on Stackoverflow
Solution 5 - Python | iacob | View Answer on Stackoverflow