C API design: Who should allocate?

C Problem Overview

What is the proper/preferred way to allocate memory in a C API?

I can see, at first, two options:

1. Let the caller do all the (outer) memory handling:

myStruct *s = malloc(sizeof(*s));
myStruct_init(s);

myStruct_foo(s);

myStruct_destroy(s); free(s);

The _init and _destroy functions are necessary since some more memory may be allocated inside, and it must be handled somewhere.

This has the disadvantage of being longer, but the malloc can be eliminated in some cases (e.g., the functions can be passed a stack-allocated struct):

int bar(void) {
    myStruct s;
    myStruct_init(&s);

    myStruct_foo(&s);

    myStruct_destroy(&s);
    return 0;
}

Also, it's necessary for the caller to know the size of the struct.

2. Hide mallocs in _init and frees in _destroy.

Advantages: shorter code, since the functions are going to be called anyway. Completely opaque structures.

Disadvantages: Can't be passed a struct allocated in a different way.

myStruct *s = myStruct_init();

myStruct_foo(s);

myStruct_destroy(s);

I'm currently leaning toward the first case; then again, I don't know much about C API design.

C Solutions

Solution 1 - C

Another disadvantage of #2 is that the caller doesn't have control over how things are allocated. This can be worked around by providing an API for the client to register his own allocation/deallocation functions (like SDL does), but even that may not be sufficiently fine-grained.
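As a rough sketch of the registration workaround: the library keeps a pair of function pointers and routes every internal allocation through them. The names mylib_set_allocator, mylib_malloc_fn, mylib_free_fn and the counting_* helpers are hypothetical and are not taken from SDL or any other real library.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types; the defaults are the C library functions. */
typedef void *(*mylib_malloc_fn)(size_t size);
typedef void (*mylib_free_fn)(void *ptr);

static mylib_malloc_fn g_malloc = malloc;
static mylib_free_fn g_free = free;

/* Meant to be called before any object is created, so that each
   allocation and its release go through the same pair of hooks. */
void mylib_set_allocator(mylib_malloc_fn m, mylib_free_fn f)
{
    g_malloc = m ? m : malloc;
    g_free = f ? f : free;
}

typedef struct myStruct { int value; } myStruct;

myStruct *myStruct_create(void)
{
    myStruct *s = g_malloc(sizeof *s);
    if (s)
        s->value = 0;
    return s;
}

void myStruct_destroy(myStruct *s)
{
    g_free(s);
}

/* Example client hooks: count live blocks on top of malloc/free. */
static int live_blocks = 0;

static void *counting_malloc(size_t size)
{
    void *p = malloc(size);
    if (p)
        live_blocks++;
    return p;
}

static void counting_free(void *p)
{
    if (p)
        live_blocks--;
    free(p);
}
```

The hooks are global here, which is the coarse granularity the answer warns about: every object in the library shares one allocator.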

The disadvantage of #1 is that it doesn't work well when output buffers are not fixed-size (e.g. strings). At best, you will then need to provide another function to obtain the length of the buffer first so that the caller can allocate it. At worst, it is simply impossible to do so efficiently (i.e. computing the length on a separate path is overly expensive compared with computing and copying in one go).
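The "ask for the length first" pattern can be sketched with snprintf, which reports the required length when given a NULL buffer of size 0. The helpers point_format and point_to_string are made-up names for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical API function: writes "(x, y)" into a caller-owned buffer.
   Like snprintf, it returns the number of characters the full text
   needs, not counting the terminating null byte. */
int point_format(char *buf, size_t size, int x, int y)
{
    return snprintf(buf, size, "(%d, %d)", x, y);
}

/* Caller side: measure, allocate, then fill. */
char *point_to_string(int x, int y)
{
    int len = point_format(NULL, 0, x, y);   /* first pass: length only */
    if (len < 0)
        return NULL;

    char *buf = malloc((size_t)len + 1);
    if (buf)
        point_format(buf, (size_t)len + 1, x, y);   /* second pass: fill */
    return buf;   /* caller releases with free() */
}
```

The formatting work runs twice, which is exactly the cost described above when computing the length is as expensive as producing the output.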

The advantage of #2 is that it allows you to expose your datatype strictly as an opaque pointer (i.e. declare the struct but don't define it, and use pointers consistently). Then you can change the definition of the struct as you see fit in future versions of your library, while clients remain compatible at the binary level. With #1, you have to do it by requiring the client to specify the version inside the struct in some way (e.g. all those cbSize fields in the Win32 API), and then manually write code that can handle both older and newer versions of the struct to remain binary-compatible as your library evolves.

In general, if your structs are transparent data which will not change with future minor revisions of the library, I'd go with #1. If it is a more or less complicated data object and you want full encapsulation to fool-proof it for future development, go with #2.
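To make the size-field idea concrete, here is a minimal sketch, assuming a made-up options struct that gained a height field in a second version. The types myOptionsV1 and myOptions and the function myOptions_area are hypothetical; only the cbSize naming is borrowed from Win32.

```c
#include <string.h>

/* Version 1 of the struct, as an old client compiled it. */
typedef struct {
    unsigned int cbSize;   /* caller sets this to sizeof its own struct */
    int width;
} myOptionsV1;

/* Version 2 appends a field; existing fields keep their offsets. */
typedef struct {
    unsigned int cbSize;
    int width;
    int height;
} myOptions;

/* Library side: copy only as many bytes as the caller's struct holds,
   leaving defaults in place for fields an old client does not know. */
int myOptions_area(const void *opts)
{
    myOptions o = { sizeof(myOptions), 0, 1 };   /* height defaults to 1 */
    unsigned int size;

    memcpy(&size, opts, sizeof size);   /* cbSize is always the first field */
    if (size < sizeof(myOptionsV1))
        return -1;                      /* too small to be any known version */
    if (size > sizeof o)
        size = sizeof o;                /* newer client: ignore the unknown tail */
    memcpy(&o, opts, size);
    return o.width * o.height;
}
```

This only works if new fields are appended and the struct size actually changes between versions, so padding has to be watched: with a wider first field, both versions could end up with the same sizeof.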

Solution 2 - C

Method number 2 every time.

Why? Because with method number 1 you have to leak implementation details to the caller. The caller has to know at least how big the struct is. You can't change the internal implementation of the object without recompiling any code that uses it.
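A minimal sketch of what hiding the size looks like, with the header and the implementation shown in one block for brevity. The accessor names myStruct_set and myStruct_get and the field added_later are assumptions made up for the example.

```c
/* --- mystruct.h: everything the caller ever sees --- */
typedef struct myStruct myStruct;   /* declared but not defined here */

myStruct *myStruct_create(void);
void myStruct_set(myStruct *s, int value);
int myStruct_get(const myStruct *s);
void myStruct_destroy(myStruct *s);

/* --- mystruct.c: private to the library --- */
#include <stdlib.h>

struct myStruct {
    int value;
    int added_later;   /* can be added without recompiling callers */
};

myStruct *myStruct_create(void)
{
    /* calloc zero-fills, so every field starts at 0 */
    return calloc(1, sizeof(myStruct));
}

void myStruct_set(myStruct *s, int value)
{
    s->value = value;
}

int myStruct_get(const myStruct *s)
{
    return s->value;
}

void myStruct_destroy(myStruct *s)
{
    free(s);
}
```

Because callers only ever hold a myStruct pointer, sizeof(myStruct) does not compile on their side, and a stack-allocated instance is not possible either.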

Solution 3 - C

Why not provide both, to get the best of both worlds?

Use _init and _terminate functions to use method #1 (or whatever naming you see fit).

Use additional _create and _destroy functions for the dynamic allocation. Since _init and _terminate already exist, it effectively boils down to:

myStruct *myStruct_create ()
{
    myStruct *s = malloc(sizeof(*s));
    if (s) 
    {
        myStruct_init(s);
    }
    return (s);
}

void myStruct_destroy (myStruct *s)
{
    myStruct_terminate(s);
    free(s);
}

If you want it to be opaque, then make _init and _terminate static and do not expose them in the API, only provide _create and _destroy. If you need other allocations, e.g. with a given callback, provide another set of functions for this, e.g. _createcalled, _destroycalled.

The important thing is to keep track of the allocations, but you have to do this anyway. You must always use the counterpart of the used allocator for deallocation.

Solution 4 - C

My favourite example of a well-designed C API is GTK+, which uses method #2 that you describe.

That said, another advantage of your method #1 is not just that you could allocate the object on the stack, but also that you could reuse the same instance multiple times. If that's not going to be a common use case, then the simplicity of #2 is probably an advantage.

Of course, that's just my opinion :)

Solution 5 - C

Both are functionally equivalent. But, in my opinion, method #2 is easier to use. A few reasons for preferring #2 over #1 are:

It is more intuitive. Why should I have to call free on the object after I have (apparently) destroyed it using myStruct_destroy?
It hides the details of myStruct from the user, who does not have to worry about its size, etc.
In method #2, myStruct_init does not have to worry about the initial state of the object.
You don't have to worry about memory leaks from the user forgetting to call free.

If your API implementation is being shipped as a separate shared library however, method #2 is a must. To isolate your module from any mismatch in implementations of malloc/new and free/delete across compiler versions you should keep memory allocation and de-allocation to yourself. Note, this is more true of C++ than of C.

Solution 6 - C

The problem I have with the first method is not so much that it is longer for the caller; it's that the API is now handcuffed when it needs to expand the amount of memory it is using, precisely because it doesn't know how the memory it received was allocated. The caller doesn't always know ahead of time how much memory it will need (imagine if you were trying to implement a vector).

Another option you didn't mention, which is going to be overkill most of the time, is to pass in a function pointer that the API uses as an allocator. This doesn't allow you to use the stack, but does allow you to do something like replace the use of malloc with a memory pool, while still keeping the API in control of when it wants to allocate.

As for which method is proper API design, it's done both ways in the C standard library: strdup() and stdio use the second method, while sprintf and strcat use the first. Personally, I prefer the second method (or the third) unless 1) I know I will never need to realloc and 2) I expect the lifetime of my objects to be short, so that using the stack is very convenient.

Edit: There is actually one other option, and it is a bad one with a prominent precedent. You could do it the way strtok() does it, with statics. Not good, just mentioned for completeness' sake.
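A sketch of the allocator-as-parameter idea applied to the vector case, assuming a hypothetical growable int vector. The types vec_allocator and int_vec and the int_vec_* functions are made up here; the hooks mirror the signatures of realloc and free so that the standard functions can be plugged in directly.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hooks supplied by the caller. */
typedef struct {
    void *(*resize)(void *ptr, size_t size);   /* realloc-like */
    void (*release)(void *ptr);                /* free-like */
} vec_allocator;

typedef struct {
    int *data;
    size_t len, cap;
    vec_allocator alloc;   /* remembered so growth and release match */
} int_vec;

void int_vec_init(int_vec *v, vec_allocator alloc)
{
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
    v->alloc = alloc;
}

/* The API decides when to grow; the caller decides where memory comes from. */
int int_vec_push(int_vec *v, int x)
{
    if (v->len == v->cap) {
        size_t cap = v->cap ? v->cap * 2 : 4;
        int *p = v->alloc.resize(v->data, cap * sizeof *p);
        if (!p)
            return -1;   /* the old buffer is still valid */
        v->data = p;
        v->cap = cap;
    }
    v->data[v->len++] = x;
    return 0;
}

void int_vec_destroy(int_vec *v)
{
    v->alloc.release(v->data);
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
}
```

Passing `{ realloc, free }` gives the default behaviour; a memory pool would supply its own pair of functions with the same signatures.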

Solution 7 - C

Both ways are OK. I tend to do it the first way, as a lot of the C I write is for embedded systems, where all the memory is either small variables on the stack or statically allocated. This way there is no running out of memory at run time: either you have enough at the beginning or you're screwed from the start. Good to know when you have 2K of RAM :-) So all my libraries are like #1, where the memory is assumed to be allocated already.

But this is an edge case of C development.

Having said that, I'd probably still go with #1, perhaps using init and finalize/dispose (rather than destroy) for names.
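For illustration, style #1 on a small target might look like the following, assuming a hypothetical byte ring buffer whose storage is fixed at link time. The names ring, ring_init, ring_put, ring_get and uart_rx are invented for this sketch.

```c
/* All storage lives inside the struct, so there is no heap use at all. */
typedef struct {
    unsigned char buf[16];
    unsigned head;    /* index of the oldest byte */
    unsigned count;   /* number of bytes stored */
} ring;

/* Statically allocated instance: its memory cost is known at build time. */
static ring uart_rx;

void ring_init(ring *r)
{
    r->head = 0;
    r->count = 0;
}

/* Returns 0 on success, -1 if the buffer is full. */
int ring_put(ring *r, unsigned char byte)
{
    if (r->count == sizeof r->buf)
        return -1;
    r->buf[(r->head + r->count) % sizeof r->buf] = byte;
    r->count++;
    return 0;
}

/* Returns 0 on success, -1 if the buffer is empty. */
int ring_get(ring *r, unsigned char *out)
{
    if (r->count == 0)
        return -1;
    *out = r->buf[r->head];
    r->head = (r->head + 1) % sizeof r->buf;
    r->count--;
    return 0;
}
```

The same functions work unchanged on a stack instance, since they never care where the struct lives.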

Solution 8 - C

Some food for thought:

Case #1 mimics the memory allocation scheme of C++, with more or less the same benefits:

easy allocation of temporaries on the stack (or in static arrays or the like, to write your own struct allocator replacing malloc).
easy freeing of memory if anything goes wrong in init.

Case #2 hides more information about the structure used and can also be used for opaque structures, typically when the structure as seen by the user is not exactly the same as the one used internally by the lib (say there could be some more fields hidden at the end of the structure).

A mixed API between case #1 and case #2 is also common: a parameter is used to pass in a pointer to an existing structure; if it is null, the structure is allocated (and the pointer is always returned). With such an API, the free is usually the responsibility of the caller, even if init performed the allocation.
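The mixed variant could be sketched like this, assuming a trivial myStruct with a single field (the field and the exact behaviour on failure are assumptions for the example):

```c
#include <stdlib.h>

typedef struct { int value; } myStruct;

/* If s is NULL, storage is allocated with malloc; otherwise the
   caller's storage is used. Returns the initialized object, or
   NULL if an allocation was needed and failed. */
myStruct *myStruct_init(myStruct *s)
{
    if (!s) {
        s = malloc(sizeof *s);
        if (!s)
            return NULL;
    }
    s->value = 0;
    return s;
}
```

A caller who passed NULL owns the result and has to free() it; a caller who passed `&local` must not. That asymmetry is the price of the convenience.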

In most cases I would probably go for case #1.

Solution 9 - C

Both are acceptable; there are tradeoffs between them, as you've noted.

There are large real-world examples of both: as Dean Harding says, GTK+ uses the second method; OpenSSL is an example that uses the first.

Solution 10 - C

I would go for (1) with one simple extension, that is to have your _init function always return the pointer to the object. Your pointer initialization then may just read:

myStruct *s = myStruct_init(malloc(sizeof(myStruct)));

As you can see the right hand side then only has a reference to the type and not to the variable anymore. A simple macro then gives you (2) at least partially

#define NEW(T) (T ## _init(malloc(sizeof(T))))

and your pointer initialization reads

myStruct *s = NEW(myStruct);
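One detail worth spelling out: since malloc can return NULL, an _init written for this macro has to tolerate a null argument. A minimal sketch, assuming a myStruct with one field (the field is made up for the example):

```c
#include <stdlib.h>

typedef struct { int value; } myStruct;

/* Returns its argument so that calls can be chained. A NULL argument
   is passed straight through, so a failed malloc shows up as a NULL
   result instead of a null dereference. */
myStruct *myStruct_init(myStruct *s)
{
    if (s)
        s->value = 0;
    return s;
}

#define NEW(T) (T ## _init(malloc(sizeof(T))))
```

Objects made with NEW are released with the matching _destroy (if any) followed by free(), as in method (1).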

Solution 11 - C

Your method #2 says:

myStruct *s = myStruct_init();

myStruct_foo(s);

myStruct_destroy(s);

Now, if myStruct_init() needs to return an error code for whatever reason, you can go this way instead:

myStruct *s;
int ret = myStruct_init(&s);  // int myStruct_init(myStruct **s);

myStruct_foo(s);

myStruct_destroy(s);
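Filled in, the out-parameter variant might look like the sketch below. The error constants MYSTRUCT_OK, MYSTRUCT_ENOMEM and MYSTRUCT_EINVAL and the struct's single field are assumptions made up for the example.

```c
#include <stdlib.h>

typedef struct { int value; } myStruct;

/* Hypothetical error codes for this sketch. */
enum {
    MYSTRUCT_OK = 0,
    MYSTRUCT_ENOMEM = -1,
    MYSTRUCT_EINVAL = -2
};

/* The object comes back through *out; the return value only carries status. */
int myStruct_init(myStruct **out)
{
    if (!out)
        return MYSTRUCT_EINVAL;
    *out = NULL;   /* well-defined even if allocation fails */

    myStruct *s = malloc(sizeof *s);
    if (!s)
        return MYSTRUCT_ENOMEM;

    s->value = 0;
    *out = s;
    return MYSTRUCT_OK;
}

void myStruct_destroy(myStruct *s)
{
    free(s);
}
```

The caller checks the return value before touching the pointer, e.g. `if (myStruct_init(&s) != MYSTRUCT_OK) { /* handle error */ }`.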

Content Type	Original Author	Original Content on Stackoverflow
Question	Tordek	View Question on Stackoverflow
Solution 1 - C	Pavel Minaev	View Answer on Stackoverflow
Solution 2 - C	JeremyP	View Answer on Stackoverflow
Solution 3 - C	Secure	View Answer on Stackoverflow
Solution 4 - C	Dean Harding	View Answer on Stackoverflow
Solution 5 - C	341008	View Answer on Stackoverflow
Solution 6 - C	frankc	View Answer on Stackoverflow
Solution 7 - C	Keith Nicholas	View Answer on Stackoverflow
Solution 8 - C	kriss	View Answer on Stackoverflow
Solution 9 - C	caf	View Answer on Stackoverflow
Solution 10 - C	Jens Gustedt	View Answer on Stackoverflow
Solution 11 - C	Jeegar Patel	View Answer on Stackoverflow