In C, are arrays pointers or used as pointers?


C Problem Overview


My understanding was that arrays were simply constant pointers to a sequence of values, and when you declared an array in C, you were declaring a pointer and allocating space for the sequence it points to.

But this confuses me. The following code:

char y[20];
char *z = y;

/* %zu is the conversion for size_t; %p expects a void * argument */
printf("y size is %zu\n", sizeof(y));
printf("y is %p\n", (void *) y);
printf("z size is %zu\n", sizeof(z));
printf("z is %p\n", (void *) z);

when compiled with Apple GCC gives the following result:

y size is 20
y is 0x7fff5fbff930
z size is 8
z is 0x7fff5fbff930

(My machine is 64-bit; pointers are 8 bytes long.)

If 'y' is a constant pointer, why does it have a size of 20, like the sequence of values it points to? Is the variable name 'y' replaced by a memory address at compile time whenever appropriate? Are arrays, then, some sort of syntactic sugar in C that is just translated to pointer operations when compiled?

C Solutions


Solution 1 - C

Here's the exact language from the C standard (n1256):

6.3.2.1 Lvalues, arrays, and function designators
...
3 Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

The important thing to remember here is that there is a difference between an object (in C terms, meaning something that takes up memory) and the expression used to refer to that object.

When you declare an array such as

int a[10];

the object designated by the expression a is an array (i.e., a contiguous block of memory large enough to hold 10 int values), and the type of the expression a is "10-element array of int", or int [10]. If the expression a appears in a context other than as the operand of the sizeof or & operators, then its type is implicitly converted to int *, and its value is the address of the first element.

In the case of the sizeof operator, if the operand is an expression of type T [N], then the result is the number of bytes in the array object, not in a pointer to that object: N * sizeof (T).

In the case of the & operator, the value is the address of the array, which is the same as the address of its first element, but the type of the expression is different: given the declaration T a[N];, the type of the expression &a is T (*)[N], or pointer to N-element array of T. So &a, a, and &a[0] all yield the same address, but the difference in types matters. For example, given the code

int a[10];
int *p = a;
int (*ap)[10] = &a;

printf("p = %p, ap = %p\n", (void *) p, (void *) ap);
p++;
ap++;
printf("p = %p, ap = %p\n", (void *) p, (void *) ap);

you'll see output on the order of

p = 0xbff11e58, ap = 0xbff11e58
p = 0xbff11e5c, ap = 0xbff11e80

In other words, advancing p adds sizeof (int) (4 here) to the original value, whereas advancing ap adds 10 * sizeof (int) (40).

More standard language:

6.5.2.1 Array subscripting

Constraints

1 One of the expressions shall have type ‘‘pointer to object type’’, the other expression shall have integer type, and the result has type ‘‘type’’.

Semantics

2 A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

Thus, when you subscript an array expression, what happens under the hood is that the offset from the address of the first element in the array is computed and the result is dereferenced. The expression

a[i] = 10;

is equivalent to

*((a)+(i)) = 10;

which is equivalent to

*((i)+(a)) = 10;

which is equivalent to

i[a] = 10;

Yes, array subscripting in C is commutative; for the love of God, never do this in production code.

Since array subscripting is defined in terms of pointer operations, you can apply the subscript operator to expressions of pointer type as well as array type:

int *p = malloc(sizeof *p * 10);
int i;
for (i = 0; i < 10; i++)
  p[i] = some_initial_value(); 

Here's a handy table to remember some of these concepts:

Declaration: T a[N];

Expression    Type        Converts to    Value
----------    ----        -----------    -----
       a      T [N]       T *            Address of the first element in a;
                                           identical to writing &a[0]
      &a      T (*)[N]                   Address of the array; value is the same
                                           as above, but the type is different
sizeof a      size_t                     Number of bytes contained in the array
                                           object (N * sizeof T)
      *a      T                          Value at a[0]
    a[i]      T                          Value at a[i]
   &a[i]      T *                        Address of a[i]

Declaration: T a[N][M];

Expression     Type           Converts to    Value
----------     ----           -----------    -----
          a    T [N][M]       T (*)[M]       Address of the first subarray (&a[0])
         &a    T (*)[N][M]                   Address of the array (same value as
                                               above, but different type)
   sizeof a    size_t                        Number of bytes contained in the array
                                               object (N * M * sizeof T)
         *a    T [M]          T *            Value of a[0], which is the address of
                                               the first element of the first
                                               subarray (same as &a[0][0])
       a[i]    T [M]          T *            Value of a[i], which is the address of
                                               the first element of the i'th subarray
      &a[i]    T (*)[M]                      Address of the i'th subarray; same value
                                               as above, but different type
sizeof a[i]    size_t                        Number of bytes contained in the i'th
                                               subarray object (M * sizeof T)
      *a[i]    T                             Value of the first element of the i'th
                                               subarray (a[i][0])
    a[i][j]    T                             Value at a[i][j]
   &a[i][j]    T *                           Address of a[i][j]

Declaration: T a[N][M][O];

Expression    Type              Converts to
----------    ----              -----------
         a    T [N][M][O]       T (*)[M][O]
        &a    T (*)[N][M][O]
        *a    T [M][O]          T (*)[O]
      a[i]    T [M][O]          T (*)[O]
     &a[i]    T (*)[M][O]
     *a[i]    T [O]             T *
   a[i][j]    T [O]             T *
  &a[i][j]    T (*)[O]
  *a[i][j]    T
a[i][j][k]    T

From here, the pattern for higher-dimensional arrays should be clear.

So, in summary: arrays are not pointers. In most contexts, array expressions are converted to pointer types.

Solution 2 - C

Arrays are not pointers, though in most expressions an array name evaluates to a pointer to the first element of the array. So it is very, very easy to use an array name as a pointer. You will often see the term 'decay' used to describe this, as in "the array decayed to a pointer".

One exception is as the operand to the sizeof operator, where the result is the size of the array (in bytes, not elements).

A couple of additional issues related to this:

An array parameter to a function is a fiction: the compiler really passes a plain pointer (this doesn't apply to reference-to-array parameters in C++). As a result, you cannot determine the actual size of an array passed to a function; you must pass that information some other way, for example with an explicit additional parameter or with a sentinel element, as C strings do.

Also, a common idiom to get the number of elements in an array is to use a macro like:

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

This macro accepts either an array name, where it works, or a pointer, where it silently gives a nonsense result with no warning from the compiler. Safer versions of the macro exist (particularly for C++) that generate a warning or error when used with a pointer instead of an array.


Note: C99 VLAs (variable length arrays) might not follow all of these rules (in particular, they can be passed as parameters with the array size known by the called function). I have little experience with VLAs, and as far as I know they're not widely used. However, I do want to point out that the above discussion might apply differently to VLAs.

Solution 3 - C

sizeof is evaluated at compile time (except for C99 variable-length arrays), and the compiler knows whether the operand is an array or a pointer. For an array it gives the number of bytes occupied by the array. Your array is a char[20], and sizeof(char) is 1, so sizeof happens to give you the number of elements. To get the number of elements in the general case, a common idiom is (here for int):

int y[20];
printf("number of elements in y is %zu\n", sizeof(y) / sizeof(int));

For pointers sizeof gives the number of bytes occupied by the raw pointer type.

Solution 4 - C

Compare

char hello[] = "hello there";
int i;

and

char *hello = "hello there";
int i;

In the first case (ignoring alignment), 12 bytes are reserved for hello itself and initialised with the characters of "hello there" plus the terminating null. In the second, the string "hello there" is stored elsewhere (possibly in static storage) and hello is a pointer initialised to point to it.

In both cases, however, hello[2] and *(hello + 2) yield 'l'.

Solution 5 - C

In addition to what the others said, perhaps this article helps: http://en.wikipedia.org/wiki/C_%28programming_language%29#Array-pointer_interchangeability

Solution 6 - C

> If 'y' is a constant pointer, why does it have a size of 20, like the sequence of values it points to?

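Because sizeof(y) is the size of the array object itself (20 bytes), whereas z is a pointer holding the array's address, so sizeof(z) is always 8 on your machine. To get the contents that a pointer points to, use the dereference operator (*); the & operator does the opposite and yields an address.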

EDIT: A good distinction between the two: http://www.cs.cf.ac.uk/Dave/C/node10.html

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author    Original Content on Stackoverflow
Question          salvador p         View Question on Stackoverflow
Solution 1 - C    John Bode          View Answer on Stackoverflow
Solution 2 - C    Michael Burr       View Answer on Stackoverflow
Solution 3 - C    Péter Török        View Answer on Stackoverflow
Solution 4 - C    doron              View Answer on Stackoverflow
Solution 5 - C    Mark Loeser        View Answer on Stackoverflow
Solution 6 - C    Andrew Sledge      View Answer on Stackoverflow