C++ Tuple vs Struct

C++ Problem Overview

Is there is any difference between using a std::tuple and a data-only struct?

typedef std::tuple<int, double, bool> foo_t;

struct bar_t {
    int id;
    double value;
    bool dirty;
}

From what I have found online, I found that there are two major differences: the struct is more readable, while the tuple has many generic functions that can be used. Should there be any significant performance difference? Also, is the data layout compatible with each other (interchangeably casted)?

C++ Solutions

Solution 1 - C++

We have a similar discussion about tuple and struct and I write some simple benchmarks with the help from one of my colleague to identify the differences in term of performance between tuple and struct. We first start with a default struct and a tuple.

struct StructData {
    int X;
    int Y;
    double Cost;
    std::string Label;
   
    bool operator==(const StructData &rhs) {
        return std::tie(X,Y,Cost, Label) == std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
    }

    bool operator<(const StructData &rhs) {
        return X < rhs.X || (X == rhs.X && (Y < rhs.Y || (Y == rhs.Y && (Cost < rhs.Cost || (Cost == rhs.Cost && Label < rhs.Label)))));
    }
};

using TupleData = std::tuple<int, int, double, std::string>;

We then use Celero to compare the performance of our simple struct and tuple. Below is the benchmark code and performance results collected using gcc-4.9.2 and clang-4.0.0:

std::vector<StructData> test_struct_data(const size_t N) {
    std::vector<StructData> data(N);
    std::transform(data.begin(), data.end(), data.begin(), [N](auto item) {
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<> dis(0, N);
        item.X = dis(gen);
        item.Y = dis(gen);
        item.Cost = item.X * item.Y;
        item.Label = std::to_string(item.Cost);
        return item;
    });
    return data;
}

std::vector<TupleData> test_tuple_data(const std::vector<StructData> &input) {
    std::vector<TupleData> data(input.size());
    std::transform(input.cbegin(), input.cend(), data.begin(),
                   [](auto item) { return std::tie(item.X, item.Y, item.Cost, item.Label); });
    return data;
}

constexpr int NumberOfSamples = 10;
constexpr int NumberOfIterations = 5;
constexpr size_t N = 1000000;
auto const sdata = test_struct_data(N);
auto const tdata = test_tuple_data(sdata);

CELERO_MAIN

BASELINE(Sort, struct, NumberOfSamples, NumberOfIterations) {
    std::vector<StructData> data(sdata.begin(), sdata.end());
    std::sort(data.begin(), data.end());
    // print(data);

}

BENCHMARK(Sort, tuple, NumberOfSamples, NumberOfIterations) {
    std::vector<TupleData> data(tdata.begin(), tdata.end());
    std::sort(data.begin(), data.end());
    // print(data);
}

Performance results collected with clang-4.0.0

Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort            | struct          | Null            |              10 |               5 |         1.00000 |    196663.40000 |            5.08 | 
Sort            | tuple           | Null            |              10 |               5 |         0.92471 |    181857.20000 |            5.50 | 
Complete.

And performance results collected using gcc-4.9.2

Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort            | struct          | Null            |              10 |               5 |         1.00000 |    219096.00000 |            4.56 | 
Sort            | tuple           | Null            |              10 |               5 |         0.91463 |    200391.80000 |            4.99 | 
Complete.

From the above results we can clearly see that

Tuple is faster than a default struct
Binary produce by clang has higher performance that that of gcc. clang-vs-gcc is not the purpose of this discussion so I won't dive into the detail.

We all know that writing a == or < or > operator for every single struct definition will be a painful and buggy task. Let replace our custom comparator using std::tie and rerun our benchmark.

bool operator<(const StructData &rhs) {
    return std::tie(X,Y,Cost, Label) < std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
}

Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort            | struct          | Null            |              10 |               5 |         1.00000 |    200508.20000 |            4.99 | 
Sort            | tuple           | Null            |              10 |               5 |         0.90033 |    180523.80000 |            5.54 | 
Complete.

Now we can see that using std::tie makes our code more elegant and it is harder to make mistake, however, we will loose about 1% performance. I will stay with the std::tie solution for now since I also receive a warning about comparing floating point numbers with the customized comparator.

Until now we have not has any solution to make our struct code run faster yet. Let take a look at the swap function and rewrite it to see if we can gain any performance:

struct StructData {
    int X;
    int Y;
    double Cost;
    std::string Label;
   
    bool operator==(const StructData &rhs) {
        return std::tie(X,Y,Cost, Label) == std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
    }

    void swap(StructData & other)
    {
        std::swap(X, other.X);
        std::swap(Y, other.Y);
        std::swap(Cost, other.Cost);
        std::swap(Label, other.Label);
    }  

    bool operator<(const StructData &rhs) {
        return std::tie(X,Y,Cost, Label) < std::tie(rhs.X, rhs.Y, rhs.Cost, rhs.Label);
    }
};

Performance results collected using clang-4.0.0

Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort            | struct          | Null            |              10 |               5 |         1.00000 |    176308.80000 |            5.67 | 
Sort            | tuple           | Null            |              10 |               5 |         1.02699 |    181067.60000 |            5.52 | 
Complete.

And the performance results collected using gcc-4.9.2

Celero
Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
Sort            | struct          | Null            |              10 |               5 |         1.00000 |    198844.80000 |            5.03 | 
Sort            | tuple           | Null            |              10 |               5 |         1.00601 |    200039.80000 |            5.00 | 
Complete.

Now our struct is slightly faster than that of a tuple now (around 3% with clang and less than 1% with gcc), however, we do need to write our customized swap function for all of our structs.

Solution 2 - C++

If you're using several different tuples in your code you can get away with condensing the number of functors you are using. I say this because I've often used the following forms of functors:

template<int N>
struct tuple_less{
    template<typename Tuple>
    bool operator()(const Tuple& aLeft, const Tuple& aRight) const{
        typedef typename boost::tuples::element<N, Tuple>::type value_type;
        BOOST_CONCEPT_REQUIRES((boost::LessThanComparable<value_type>));

        return boost::tuples::get<N>(aLeft) < boost::tuples::get<N>(aRight);
    }
};

This might seem like overkill but for each place within the struct I'd have to make a whole new functor object using a struct but for a tuple, I just change N. Better than that, I can do this for every single tuple as opposed to creating a whole new functor for each struct and for each member variable. If I have N structs with M member variables that NxM functors I would need to create (worse case scenario) that can be condensed down to one little bit of code.

Naturally, if you're going to go with the Tuple way, you're also going to need to create Enums for working with them:

typedef boost::tuples::tuple<double,double,double> JackPot;
enum JackPotIndex{
    MAX_POT,
    CURRENT_POT,
    MIN_POT
};

and boom, you're code is completely readable:

double guessWhatThisIs = boost::tuples::get<CURRENT_POT>(someJackPotTuple);

because it describes itself when you want to get the items contained within it.

Solution 3 - C++

Tuple has built in default(for == and != it compares every element, for <.<=... compares first, if same compares second...) comparators: http://en.cppreference.com/w/cpp/utility/tuple/operator_cmp

edit: as noted in the comment C++20 spaceship operator gives you a way to specify this functionality with one(ugly, but still just one) line of code.

Solution 4 - C++

Well, here's a benchmark that doesn't construct a bunch of tuples inside the struct operator==(). Turns out there's a pretty significant performance impact from using tuple, as one would expect given that there's no performance impact at all from using PODs. (The address resolver finds the value in the instruction pipeline before the logic unit ever even sees it.)

Common results from running this on my machine with VS2015CE using the default 'Release' settings:

Structs took 0.0814905 seconds.
Tuples took 0.282463 seconds.

Please monkey with it until you're satisfied.

#include <iostream>
#include <string>
#include <tuple>
#include <vector>
#include <random>
#include <chrono>
#include <algorithm>

class Timer {
public:
  Timer() { reset(); }
  void reset() { start = now(); }

  double getElapsedSeconds() {
    std::chrono::duration<double> seconds = now() - start;
    return seconds.count();
  }

private:
  static std::chrono::time_point<std::chrono::high_resolution_clock> now() {
    return std::chrono::high_resolution_clock::now();
  }

  std::chrono::time_point<std::chrono::high_resolution_clock> start;

};

struct ST {
  int X;
  int Y;
  double Cost;
  std::string Label;

  bool operator==(const ST &rhs) {
    return
      (X == rhs.X) &&
      (Y == rhs.Y) &&
      (Cost == rhs.Cost) &&
      (Label == rhs.Label);
  }

  bool operator<(const ST &rhs) {
    if(X > rhs.X) { return false; }
    if(Y > rhs.Y) { return false; }
    if(Cost > rhs.Cost) { return false; }
    if(Label >= rhs.Label) { return false; }
    return true;
  }
};

using TP = std::tuple<int, int, double, std::string>;

std::pair<std::vector<ST>, std::vector<TP>> generate() {
  std::mt19937 mt(std::random_device{}());
  std::uniform_int_distribution<int> dist;

  constexpr size_t SZ = 1000000;

  std::pair<std::vector<ST>, std::vector<TP>> p;
  auto& s = p.first;
  auto& d = p.second;
  s.reserve(SZ);
  d.reserve(SZ);

  for(size_t i = 0; i < SZ; i++) {
    s.emplace_back();
    auto& sb = s.back();
    sb.X = dist(mt);
    sb.Y = dist(mt);
    sb.Cost = sb.X * sb.Y;
    sb.Label = std::to_string(sb.Cost);

    d.emplace_back(std::tie(sb.X, sb.Y, sb.Cost, sb.Label));
  }

  return p;
}

int main() {
  Timer timer;

  auto p = generate();
  auto& structs = p.first;
  auto& tuples = p.second;

  timer.reset();
  std::sort(structs.begin(), structs.end());
  double stSecs = timer.getElapsedSeconds();

  timer.reset();
  std::sort(tuples.begin(), tuples.end());
  double tpSecs = timer.getElapsedSeconds();

  std::cout << "Structs took " << stSecs << " seconds.\nTuples took " << tpSecs << " seconds.\n";

  std::cin.get();
}

Solution 5 - C++

> Also, is the data layout compatible with each other (interchangeably casted)?

Oddly I can't see a direct response to this part of the question.

The answer is: no. Or at least not reliably, as the layout of the tuple is unspecified.

Firstly, your struct is a Standard Layout Type. The ordering, padding and alignment of the members are well-defined by a combination of the standard and your platform ABI.

If a tuple was a standard layout type, and we knew the fields were laid out in the order the types are specified, we might have some confidence it would match the struct.

The tuple is normally implemented using inheritance, in one of two ways: the old Loki/Modern C++ Design recursive style, or the newer variadic style. Neither is a Standard Layout type, because both violate the following conditions:

(prior to C++14)

> - has no base classes with non-static data members, or

> - has no non-static data members in the most derived class and at most one base class with non-static data members
(for C++14 and later)

> - Has all non-static data members and bit-fields declared in the same class (either all in the derived or all in some base)

since each leaf base class contains a single tuple element (NB. a single-element tuple probably is a standard layout type, albeit not a very useful one). So, we know the standard does not guarantee the tuple has the same padding or alignment as the struct.

Additionally, it's worth noting that the older recursive-style tuple will generally lay out the data members in reverse order.

Anecdotally, it has sometimes worked in practice for some compilers and combinations of field types in the past (in one case, using recursive tuples, after reversing the field order). It definitely doesn't work reliably (across compilers, versions etc.) now, and was never guaranteed in the first place.

Solution 6 - C++

Well, a POD struct can often be (ab)used in low-level contiguous chunk reading and serializing. A tuple might be more optimized in certain situations and support more functions, as you said.

Use whatever is more appropriate for the situation, there ain't no general preference. I think (but I haven't benchmarked it) that performance differences won't be significant. The data layout is most likely not compatible and implementation specific.

Solution 7 - C++

As far as the "generic function" go, Boost.Fusion deserves some love... and especially BOOST_FUSION_ADAPT_STRUCT.

Ripping from the page: ABRACADBRA

namespace demo
{
    struct employee
    {
        std::string name;
        int age;
    };
}

// demo::employee is now a Fusion sequence
BOOST_FUSION_ADAPT_STRUCT(
    demo::employee
    (std::string, name)
    (int, age))

This means that all Fusion algorithms are now applicable to the struct demo::employee.

EDIT: Regarding the performance difference or layout compatibility, tuple's layout is implementation defined so not compatible (and thus you should not cast between either representation) and in general I would expect no difference performance-wise (at least in Release) thanks to the inlining of get<N>.

Solution 8 - C++

Don't worry about speed or layout, that's nano-optimisation, and depends on the compiler, and there's never enough difference to influence your decision.

You use a struct for things that meaningfully belong together to form a whole.

You use a tuple for things that are together coincidentally. You can use a tuple spontaneously in your code.

Solution 9 - C++

Judging by other answers, performance considerations are minimal at best.

So it really should come down to practicality, readability and maintainability. And struct is generally better because it creates types which are easier to read and to understand.

Sometimes, a std::tuple (or even std::pair) might be necessary to deal with code in a highly generic way. For example, some operations related to variadic parameter packs would be impossible without something like std::tuple. std::tie is a great example of when std::tuple can improve code (prior to C++20).

But anywhere you can use a struct, you probably should use a struct. It will bestow semantic meaning on the elements of your type. That's invaluable in understanding and using the type. In turn, this can help avoid silly mistakes:

// hard to get wrong; easy to understand
cat.arms = 0;
cat.legs = 4;

// easy to get wrong; hard to understand
std::get<0>(cat) = 0;
std::get<1>(cat) = 4;

Solution 10 - C++

My experience is that over time functionality starts to creep up on types (like POD structs) which used to be pure data holders. Things like certain modifications which shouldn't require inside knowledge of the data, maintaining invariants etc.

That is a good thing; it's the foundation of object orientation. It is the reason why C with classes was invented. Using pure data collections like tuples are not open to such logical extension; structs are. That's why I would almost always opt for structs.

Related is that like all "open data objects", tuples violate the information hiding paradigm. You cannot change that later without throwing out the tuple wholesale. With a struct, you can move gradually towards access functions.

Another issue is type safety and self-documenting code. If your function receives an object of type inbound_telegram or location_3D it's clear; if it receives a unsigned char * or tuple<double, double, double> it is not: the telegram could be outbound, and the tuple could be a translation instead of a location, or perhaps the minimum temperature readings from the long weekend. Yes, you can typedef to make intentions clear but that does not actually prevent you from passing temperatures.

These issues tend to become important in projects which exceed a certain size; the disadvantages of tuples and advantages of elaborate classes become not visible and indeed are an overhead in small projects. Starting with proper classes even for inconspicuous little data aggregates pays late dividends.

Of course one viable strategy would be to use a pure data holder as the underlying data provider for a class wrapper which provides operations on that data.

Solution 11 - C++

There shouldn't be a performance difference (even an insignificant one). At least in the normal case, they will result in the same memory layout. Nonetheless, casting between them probably isn't required to work (though I'd guess there's a pretty fair chance it normally will).

Solution 12 - C++

I know it is an old theme, however I am now about to make a decision about part of my project: should I go the tuple-way or struct-way. After reading this thread I have some ideas.

About the wheaties and the performance test: please note that you can usually use memcpy, memset and similar tricks for structs. This would make the performance MUCH better than for tuples.
I see some advantages in tuples:

You can use tuples to return a collection of variables from function or method and decrease a number of types you use.
Based on the fact that tuple has predefined <,==,> operators you can also use tuple as a key in map or hash_map which is much more cost effective that struct where you need to implement these operators.

I have searched the web and eventually reached this page: https://arne-mertz.de/2017/03/smelly-pair-tuple/

Generally I agree with a final conclusion from above.

Solution 13 - C++

There is no burden of compatible C memory layout, etc., which is more conducive to optimization.

Content Type	Original Author	Original Content on Stackoverflow
Question	Alex Koay	View Question on Stackoverflow
Solution 1 - C++	hungptit	View Answer on Stackoverflow
Solution 2 - C++	wheaties	View Answer on Stackoverflow
Solution 3 - C++	NoSenseEtAl	View Answer on Stackoverflow
Solution 4 - C++	Khatharr	View Answer on Stackoverflow
Solution 5 - C++	Useless	View Answer on Stackoverflow
Solution 6 - C++	orlp	View Answer on Stackoverflow
Solution 7 - C++	Matthieu M.	View Answer on Stackoverflow
Solution 8 - C++	gnasher729	View Answer on Stackoverflow
Solution 9 - C++	John McFarlane	View Answer on Stackoverflow
Solution 10 - C++	Peter - Reinstate Monica	View Answer on Stackoverflow
Solution 11 - C++	Jerry Coffin	View Answer on Stackoverflow
Solution 12 - C++	Tom K	View Answer on Stackoverflow
Solution 13 - C++	sunxing	View Answer on Stackoverflow