What's the need of array with zero elements?

C, Structure, Flexible-Array-Member

C Problem Overview


In the Linux kernel code I found the following, which I cannot understand.

 struct bts_action {
         u16 type;
         u16 size;
         u8 data[0];
 } __attribute__ ((packed));

The code is here: http://lxr.free-electrons.com/source/include/linux/ti_wilink_st.h

What's the need and purpose of an array of data with zero elements?

C Solutions


Solution 1 - C

This is a way to have variable sizes of data, without having to call malloc (kmalloc in this case) twice. You would use it like this:

struct bts_action *var = kmalloc(sizeof(*var) + extra, GFP_KERNEL);
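To make the "one allocation instead of two" point concrete, here is a minimal userspace sketch that uses malloc in place of kmalloc. The struct names msg_ptr and msg_fam and the helper functions are hypothetical, invented only for this comparison; they are not taken from the kernel.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Pointer version: header and payload live in two separate allocations. */
struct msg_ptr {
    uint16_t type;
    uint16_t size;
    uint8_t *data;
};

/* Flexible-array version: the payload follows the header in one block. */
struct msg_fam {
    uint16_t type;
    uint16_t size;
    uint8_t data[];
};

struct msg_ptr *msg_ptr_new(uint16_t type, const void *src, uint16_t size)
{
    struct msg_ptr *m = malloc(sizeof(*m));        /* allocation 1 */
    if (m == NULL)
        return NULL;
    m->data = malloc(size ? size : 1);             /* allocation 2 */
    if (m->data == NULL) {
        free(m);
        return NULL;
    }
    m->type = type;
    m->size = size;
    if (size != 0)
        memcpy(m->data, src, size);
    return m;
}

void msg_ptr_free(struct msg_ptr *m)
{
    if (m != NULL) {
        free(m->data);                             /* two frees to match */
        free(m);
    }
}

struct msg_fam *msg_fam_new(uint16_t type, const void *src, uint16_t size)
{
    struct msg_fam *m = malloc(sizeof(*m) + size); /* single allocation */
    if (m == NULL)
        return NULL;
    m->type = type;
    m->size = size;
    if (size != 0)
        memcpy(m->data, src, size);
    return m;                                      /* released with one free(m) */
}
```

The flexible-array version also keeps header and payload contiguous in memory, so the whole object can be copied or sent as one block.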

The zero-length array form used to be nonstandard and was considered a hack (as Aniket said), but the technique was standardized in C99 as the flexible array member. The standard form is now:

struct bts_action {
     u16 type;
     u16 size;
     u8 data[];
} __attribute__ ((packed)); /* Note: the __attribute__ is irrelevant here */

Note that you don't give any size for the data field. Note also that this special member can only come at the end of the struct.


In C99, this matter is explained in 6.7.2.1, paragraph 16:

> As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

Or in other words, if you have:

struct something
{
    /* other variables */
    char data[];
};

struct something *var = malloc(sizeof(*var) + extra);

You can access var->data with indices in [0, extra). Note that sizeof(struct something) accounts only for the other members (plus any trailing padding), i.e. it treats data as having a size of 0.
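As a minimal sketch of the index range described above, the following stores a string in the flexible array member. The len member and the function something_from_string are assumptions added for illustration; the original struct only says "other variables".

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct something {
    size_t len;     /* hypothetical member: number of usable bytes in data */
    char data[];    /* flexible array member, must be last */
};

/* Allocate a struct something whose data holds a copy of text,
 * including the terminating NUL. Returns NULL if malloc fails. */
struct something *something_from_string(const char *text)
{
    size_t extra = strlen(text) + 1;
    struct something *var = malloc(sizeof(*var) + extra);
    if (var == NULL)
        return NULL;
    var->len = extra;
    memcpy(var->data, text, extra);   /* valid indices are [0, extra) */
    return var;
}
```

A caller would write struct something *v = something_from_string("hello"); and later release the whole object with a single free(v).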


It may also be interesting to note that the standard itself gives an example of allocating such a construct with malloc (6.7.2.1, paragraph 17):

struct s { int n; double d[]; };

int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));

Another interesting note by the standard in the same location is:

> assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:

> struct { int n; double d[m]; } *p;

> (there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).

Solution 2 - C

This is actually a hack, a GCC extension to C90 in fact.

It's also called the struct hack.

So I could write:

struct bts_action *bts = malloc(sizeof(struct bts_action) + sizeof(char)*100);

For most purposes, this is equivalent to declaring:

struct bts_action{
    u16 type;
    u16 size;
    u8 data[100];
};

And I can create any number of such struct objects.

Solution 3 - C

The idea is to allow for a variable-sized array at the end of the struct. Presumably, bts_action is some data packet with a fixed-size header (the type and size fields) and a variable-size data member. Because data is declared as a 0-length array, it can be indexed just like any other array. You'd then allocate a bts_action struct with, say, a 1024-byte data size like so:

size_t size = 1024;
struct bts_action* action = (struct bts_action*)malloc(sizeof(struct bts_action) + size);

See also: http://c2.com/cgi/wiki?StructHack
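The "fixed-size header plus variable-size data" layout is typically what lets code step through several such packets stored back to back. The sketch below assumes that layout; struct record and count_records are hypothetical names, not kernel code. The size field is copied out with memcpy so the walk does not depend on the buffer's alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct record {
    uint16_t type;
    uint16_t size;      /* number of payload bytes that follow the header */
    uint8_t data[];
};

/* Walk a buffer of back-to-back records and return how many complete
 * records it holds. Stops at the first truncated header or payload. */
size_t count_records(const uint8_t *buf, size_t len)
{
    const size_t header = offsetof(struct record, data);
    size_t off = 0;
    size_t count = 0;

    while (len - off >= header) {
        uint16_t size;
        memcpy(&size, buf + off + offsetof(struct record, size), sizeof(size));
        if (len - off - header < size)
            break;                  /* payload runs past the end of buf */
        off += header + size;       /* jump to the next record's header */
        count++;
    }
    return count;
}
```

Each step advances by the header size plus the size value stored in that header, which is why the size field has to sit in the fixed part of the struct.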

Solution 4 - C

The code is not valid standard C. The Linux kernel is, for obvious reasons, not in the slightest concerned with portability, so it uses plenty of non-standard code.

What they are doing is using a non-standard GCC extension, an array of size 0. A standard-compliant program would have written u8 data[]; and it would have meant the very same thing. The authors of the Linux kernel apparently love to make things needlessly complicated and non-standard, if an option to do so reveals itself.

In older C standards, ending a struct with an empty array was known as "the struct hack". Others have already explained its purpose in other answers. The struct hack, in the C90 standard, was undefined behavior and could cause crashes, mainly since a C compiler is free to add any number of padding bytes at the end of the struct. Such padding bytes may collide with the data you tried to "hack" in at the end of the struct.

GCC early on made a non-standard extension to change this from undefined to well-defined behavior. The C99 standard then adopted this concept, and any modern C program can therefore use this feature without risk. It is known as a flexible array member in C99/C11.
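With the C99 form, the padding question can be checked directly: sizeof may count trailing padding, while offsetof reports where data actually starts. The sketch below illustrates this under the assumption of a struct with a short member before the array; struct padded and padded_alloc_size are hypothetical names, and the exact sizes depend on the platform.

```c
#include <stddef.h>

struct padded {
    int n;
    char tag;
    char data[];   /* may begin inside what sizeof counts as trailing padding */
};

/* Smallest allocation that holds the header plus count payload bytes,
 * but never less than the struct itself. */
size_t padded_alloc_size(size_t count)
{
    size_t need = offsetof(struct padded, data) + count;
    return need < sizeof(struct padded) ? sizeof(struct padded) : need;
}
```

Allocating sizeof(struct padded) + count, as the other answers do, is also correct; it may simply reserve a few bytes more than strictly needed.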

Solution 5 - C

Another use of a zero-length array is as a named label inside a struct, to help check struct offsets at compile time.

Suppose you have some large struct definitions (spanning multiple cache lines) and you want to make sure they are aligned to a cache line boundary both at the beginning and in the middle, where they cross the boundary.

struct example_large_s
{
    u32 first; // align to CL
    u32 data;
    ....
    u64 *second;  // align to second CL after the first one
    ....
};

In code you can declare them using GCC extensions like:

__attribute__((aligned(CACHE_LINE_BYTES)))

But you still want to make sure this is enforced at run time.

assert (offsetof (struct example_large_s, first) == 0);
assert (offsetof (struct example_large_s, second) == CACHE_LINE_BYTES);

This works for a single struct, but it is hard to cover many structs when each has differently named members to align. You would most likely end up with code like the following, where you have to look up the member names for each struct:

assert (offsetof (one_struct,     <name_of_first_member>) == 0);
assert (offsetof (one_struct,     <name_of_second_member>) == CACHE_LINE_BYTES);
assert (offsetof (another_struct, <name_of_first_member>) == 0);
assert (offsetof (another_struct, <name_of_second_member>) == CACHE_LINE_BYTES);

Instead, you can declare a zero-length array in the struct that acts as a named label with a consistent name but does not consume any space.

#define CACHE_LINE_ALIGN_MARK(mark) u8 mark[0] __attribute__((aligned(CACHE_LINE_BYTES)))
struct example_large_s
{
    CACHE_LINE_ALIGN_MARK (cacheline0);
    u32 first; // align to CL
    u32 data;
    ....
    CACHE_LINE_ALIGN_MARK (cacheline1);
    u64 *second;  // align to second CL after the first one
    ....
};

Then the runtime assertion code would be much easier to maintain:

assert (offsetof (one_struct,     cacheline0) == 0);
assert (offsetof (one_struct,     cacheline1) == CACHE_LINE_BYTES);
assert (offsetof (another_struct, cacheline0) == 0);
assert (offsetof (another_struct, cacheline1) == CACHE_LINE_BYTES);
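The same markers can also be checked at compile time. The following is a sketch only, assuming GCC or Clang (zero-length arrays and __attribute__ are extensions) and C11's _Static_assert; the 64-byte CACHE_LINE_BYTES value, struct demo_large_s and its rest member are hypothetical stand-ins for the elided fields above.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64   /* assumed cache line size for this sketch */

#define CACHE_LINE_ALIGN_MARK(mark) \
    uint8_t mark[0] __attribute__((aligned(CACHE_LINE_BYTES)))

struct demo_large_s {
    CACHE_LINE_ALIGN_MARK(cacheline0);
    uint32_t first;
    uint32_t rest[15];        /* fills the remainder of the first line */
    CACHE_LINE_ALIGN_MARK(cacheline1);
    uint64_t *second;
};

/* If the members before cacheline1 outgrow one cache line, the marker
 * moves to the next boundary and the second assertion stops compiling. */
_Static_assert(offsetof(struct demo_large_s, cacheline0) == 0,
               "cacheline0 must start the struct");
_Static_assert(offsetof(struct demo_large_s, cacheline1) == CACHE_LINE_BYTES,
               "cacheline1 must start the second cache line");
```

Because the marker names are the same in every struct, the two assertions could be wrapped in one macro that takes only the struct type.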

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Jeegar Patel | View Question on Stackoverflow
Solution 1 - C | Shahbaz | View Answer on Stackoverflow
Solution 2 - C | Aniket Inge | View Answer on Stackoverflow
Solution 3 - C | sheu | View Answer on Stackoverflow
Solution 4 - C | Lundin | View Answer on Stackoverflow
Solution 5 - C | Wei Shen | View Answer on Stackoverflow