In C, how would I choose whether to return a struct or a pointer to a struct?

C Problem Overview

I've been working my C muscle lately, and looking through the many libraries I've used has certainly given me a good idea of what is good practice. One thing that I have NOT seen is a function that returns a struct:

something_t make_something() { ... }

From what I've absorbed this is the "right" way of doing this:

something_t *make_something() { ... }
void destroy_something(something_t *object) { ... }

The architecture in code snippet 2 is FAR more popular than snippet 1. So now I ask, why would I ever return a struct directly, as in snippet 1? What differences should I take into account when I'm choosing between the two options?

Furthermore, how does this option compare?

void make_something(something_t *object)

C Solutions

Solution 1 - C

When something_t is small (read: copying it is about as cheap as copying a pointer) and you want it to be stack-allocated by default:

something_t make_something(void);

something_t stack_thing = make_something();

something_t *heap_thing = malloc(sizeof *heap_thing);
*heap_thing = make_something();

When something_t is large or you want it to be heap-allocated:

something_t *make_something(void);

something_t *heap_thing = make_something();

Regardless of the size of something_t, and if you don’t care where it’s allocated:

void make_something(something_t *);

something_t stack_thing;
make_something(&stack_thing);

something_t *heap_thing = malloc(sizeof *heap_thing);
make_something(heap_thing);

Solution 2 - C

This is almost always about ABI stability, that is, binary compatibility between versions of the library. In the cases where it is not, it is sometimes about having dynamically sized structs. Rarely is it about extremely large structs or performance.

It is exceedingly rare that allocating a struct on the heap and returning it is nearly as fast as returning it by-value. The struct would have to be huge.

Really, speed is not the reason to choose technique 2 (return-by-pointer) over return-by-value.

Technique 2 exists for ABI stability. If you have a struct and your next version of the library adds another 20 fields to it, consumers of your previous version of the library are binary compatible if they are handed pre-constructed pointers. The extra data beyond the end of the struct they know about is something they don't have to know about.

If you return it by value, the caller is allocating the memory for it on its stack, and they must agree with you on how big it is. If your library has been updated since they last rebuilt, you are going to trash the stack.

Technique 2 also permits you to hide extra data both before and after the pointer you return (appending fields to the end of the struct in a later version is one variant of this). You could end the structure with a variable-sized array, or place some extra data just before the address the pointer refers to, or both.
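A minimal sketch of hiding data on both sides of the returned pointer, assuming C11 (for max_align_t). The header type, its refcount field, and something_refcount are hypothetical names invented for illustration; a real library would store whatever bookkeeping it needs there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Public view: the only layout callers know about. */
typedef struct {
    size_t length;   /* number of characters stored in data[] */
    char data[];     /* C99 flexible array member, sized at allocation time */
} something_t;

/* Private header stored immediately BEFORE the public struct.
   The max_align_t member pads the header so that the something_t
   placed right after it stays correctly aligned. */
typedef union {
    struct { unsigned refcount; } info;   /* hypothetical bookkeeping */
    max_align_t align_;
} something_header_t;

/* Recover the hidden header from the public pointer. */
static something_header_t *header_of(const something_t *object)
{
    return (something_header_t *)object - 1;
}

something_t *make_something(const char *text)
{
    assert(text != NULL);
    size_t length = strlen(text);

    /* One allocation: hidden header + public struct + variable-sized tail. */
    something_header_t *header =
        malloc(sizeof *header + sizeof(something_t) + length + 1);
    if (header == NULL)
        return NULL;
    header->info.refcount = 1;

    something_t *object = (something_t *)(header + 1);
    object->length = length;
    memcpy(object->data, text, length + 1);
    return object;   /* the caller never sees the header */
}

unsigned something_refcount(const something_t *object)
{
    return header_of(object)->info.refcount;
}

void destroy_something(something_t *object)
{
    if (object != NULL)
        free(header_of(object));   /* free the start of the real allocation */
}
```

Because the caller only ever holds a pointer, neither the header nor the length of data[] is part of the ABI the caller was compiled against.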

If you want stack-allocated structs in a stable ABI, almost all functions that talk to the struct need to be passed version information.

something_t make_something(unsigned library_version) { ... }

where library_version is used by the library to determine which version of something_t it is expected to return, which in turn changes how much of the stack it manipulates. This isn't possible using standard C, but

void make_something(something_t* here) { ... }

is. In this case, something_t might have a version field as its first element (or a size field), and you would require that it be populated prior to calling make_something.

Other library code taking a something_t would then query the version field to determine what version of something_t they are working with.
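One way the size-field variant might look, as a sketch only. The field names (struct_size, width, height, depth), the two layouts, and the int return code are assumptions made up for illustration; the answer itself only specifies that a version or size field comes first and is filled in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout that clients built against a hypothetical version 1 know about. */
typedef struct {
    size_t struct_size;   /* caller sets this to sizeof its own struct */
    int width;
    int height;
} something_v1_t;

/* Current layout: version 2 appended one field at the end. */
typedef struct {
    size_t struct_size;
    int width;
    int height;
    int depth;            /* new in version 2 */
} something_t;

/* Fills in only the fields the caller's struct is large enough to hold.
   Returns 0 on success, -1 if the size matches no known version. */
int make_something(something_t *here)
{
    assert(here != NULL);
    if (here->struct_size < sizeof(something_v1_t))
        return -1;

    here->width = 640;
    here->height = 480;

    /* Touch the newer field only when the caller told us it exists. */
    if (here->struct_size >= sizeof(something_t))
        here->depth = 32;
    return 0;
}
```

A caller writes its own sizeof into struct_size before the call, so an old binary keeps reporting the old size and the library never writes past the end of the caller's struct.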

Solution 3 - C

As a rule of thumb, you should never pass struct objects by value. In practice, it will be fine to do so as long as they are smaller than or equal to the maximum size that your CPU can handle in a single instruction. But stylistically, one typically avoids it even then. If you never pass structs by value, you can later add members to the struct without affecting performance.

I think that void make_something(something_t *object) is the most common way to use structures in C. You leave the allocation to the caller. It is efficient but not pretty.

However, object-oriented C programs use something_t *make_something() since they are built around the concept of an opaque type, which forces you to use pointers. Whether the returned pointer points at dynamic memory or something else depends on the implementation. OO with opaque types is often one of the most elegant and best ways to design more complex C programs, but sadly, few C programmers know or care about it.
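A minimal sketch of the opaque-type pattern, shown in one file for brevity. The split into something.h and something.c, the value member, and the something_value accessor are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what would go in something.h (public) ---- */

/* Incomplete type: callers can hold a pointer to it but cannot see
   its members, take its size, or create one by value. */
typedef struct something something_t;

something_t *make_something(int value);
int something_value(const something_t *object);
void destroy_something(something_t *object);

/* ---- what would go in something.c (private) ---- */

struct something {
    int value;   /* hypothetical member; free to change between versions */
};

something_t *make_something(int value)
{
    something_t *object = malloc(sizeof *object);
    if (object != NULL)
        object->value = value;
    return object;   /* NULL on allocation failure */
}

int something_value(const something_t *object)
{
    assert(object != NULL);
    return object->value;
}

void destroy_something(something_t *object)
{
    free(object);    /* free(NULL) is a no-op */
}
```

Here the implementation happens to use malloc, but since the caller only sees the pointer, it could just as well hand out slots from a static pool.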

Solution 4 - C

Some pros of the first approach:

Less code to write.
More idiomatic for the use case of returning multiple values.
Works on systems that don't have dynamic allocation.
Probably faster for small or smallish objects.
No memory leak due to forgetting to free.

Some cons:

If the object is large (say, a megabyte), it may cause a stack overflow, or may be slow if compilers don't optimize it well.
May surprise people who learned C in the 1970s when this was not possible, and haven't kept up to date.
Does not work with objects that contain a pointer to a part of themselves, because the copy made on return still points into the original.
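The last con can be illustrated with a small sketch. The buffer and cursor members and the helper something_points_into_itself are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <string.h>

/* A struct holding a pointer into its own storage. */
typedef struct {
    char buffer[16];
    char *cursor;    /* meant to point somewhere inside buffer */
} something_t;

/* Initializing in place keeps the internal pointer valid, because the
   object is never copied after cursor has been set. */
void make_something(something_t *object)
{
    assert(object != NULL);
    strcpy(object->buffer, "hello");
    object->cursor = object->buffer;
}

/* Nonzero when cursor points at this object's own buffer. */
int something_points_into_itself(const something_t *object)
{
    return object->cursor == object->buffer;
}

/* By contrast, a by-value version such as
 *
 *     something_t make_something_by_value(void)
 *     {
 *         something_t local;
 *         make_something(&local);
 *         return local;    // struct is copied, cursor is not adjusted
 *     }
 *
 * may hand the caller a cursor that still refers to local.buffer,
 * whose lifetime has already ended.
 */
```

The same thing happens with any plain struct assignment: after something_t copy = original; the copy's cursor still points at original.buffer, not at copy.buffer.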

Solution 5 - C

I'm somewhat surprised.

The difference is that example 1 creates a structure on the stack, while example 2 creates it on the heap. In C, or C++ code which is effectively C, it's idiomatic and convenient to create most objects on the heap. In C++ it is not; mostly they go on the stack. The reason is that if you create an object on the stack, the destructor is called automatically, whereas if you create it on the heap, it must be called explicitly. So it's a lot easier to ensure there are no memory leaks and to handle exceptions if everything goes on the stack. In C, the destructor must be called explicitly anyway, and there's no concept of a special destructor function (you have destructors, of course, but they are just normal functions with names like destroy_myobject()).

Now the exception in C++ is for low-level container objects, e.g. vectors, trees, hash maps and so on. These do retain heap members, and they have destructors. Now most memory-heavy objects consist of a few immediate data members giving sizes, ids, tags and so on, and then the rest of the information in STL structures, maybe a vector of pixel data or a map of English word / value pairs. So most of the data is in fact on the heap, even in C++.

And modern C++ is designed so that this pattern

#include <string>
#include <vector>

struct big   // struct rather than class, so the members are accessible
{
    std::vector<double> observations; // thousands of observations
    int station_x;                    // a bit of data associated with them
    int station_y;
    std::string station_name;
};

big retrieveobservations(int a, int b, int c)
{
    big answer;
    //  lots of code to fill in the structure here

    return answer;
}

void high_level()
{
    big myobservations = retrieveobservations(1, 2, 3);
}

will compile to pretty efficient code. The large observations member won't generate unnecessary makework copies.

Solution 6 - C

Unlike some other languages (like Python), C does not have the concept of a tuple. For example, the following is legal in Python:

def foo():
    return 1,2

x,y = foo()
print(x, y)

The function foo returns two values as a tuple, which are assigned to x and y.

Since C doesn't have the concept of a tuple, it's inconvenient to return multiple values from a function. One way around this is to define a structure to hold the values, and then return the structure, like this:

#include <stdio.h>

typedef struct { int x, y; } stPoint;

stPoint foo( void )
{
    stPoint point = { 1, 2 };
    return point;
}

int main( void )
{
    stPoint point = foo();
    printf( "%d %d\n", point.x, point.y );
}

This is but one example where you might see a function return a structure.

Content Type	Original Author	Original Content on Stackoverflow
Question	Dellowar	View Question on Stackoverflow
Solution 1 - C	Jon Purdy	View Answer on Stackoverflow
Solution 2 - C	Yakk - Adam Nevraumont	View Answer on Stackoverflow
Solution 3 - C	Lundin	View Answer on Stackoverflow
Solution 4 - C	M.M	View Answer on Stackoverflow
Solution 5 - C	Malcolm McLean	View Answer on Stackoverflow
Solution 6 - C	user3386109	View Answer on Stackoverflow