What is the difference between char a[] = ?string?; and char *p = ?string?;?

C++Pointers

C++ Problem Overview


As the heading says, what is the difference between

char a[] = ?string?; and 
char *p = ?string?;  

I was asked this question in an interview. I don't even understand the statement.

char a[] = ?string?

What is the ? operator here? Is it part of the string, or does it have some specific meaning?

C++ Solutions


Solution 1 - C++

The ? seems to be a typo; it is not semantically valid. So this answer assumes the ? is a typo and explains what the interviewer probably meant to ask.


Both are distinctly different, for a start:

  1. The first creates an array.
  2. The second creates a pointer.

Read on for more detailed explanation:

The Array version:

char a[] = "string";  

Creates an array that is large enough to hold the string literal "string", including its null terminator. The array a is initialized with the characters of the string literal "string". The array can be modified at a later time. Also, the array's size is known at compile time, so the sizeof operator can be used to determine its size.
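A minimal sketch of the two array properties described above: the size is visible to sizeof, and the contents are writable. The function names array_size and modified_first_char are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper: size in bytes of a local array initialized from "string".
std::size_t array_size() {
    char a[] = "string";   // 6 characters plus the terminating '\0'
    return sizeof(a);      // evaluated at compile time
}

// Hypothetical helper: modifies the array's own copy of the characters.
char modified_first_char() {
    char a[] = "string";   // a holds its own writable copy of the literal
    a[0] = 'S';            // fine: this writes to the array, not to the literal
    return a[0];
}
```

Calling array_size() should give 7 (six letters plus the terminator), and modified_first_char() should give 'S'.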


The pointer version:

char *p  = "string"; 

Creates a pointer that points to the string literal "string". This can be cheaper than the array version because no copy of the characters is made, but the string pointed to must not be changed, because string literals may be placed in read-only memory; where they are stored is up to the implementation. Modifying a string literal results in undefined behavior.

In fact, C++03 deprecates[Ref 1] the use of a string literal without the const keyword, so the declaration should be:

const char *p = "string";

Also, you need to use the strlen() function, not sizeof, to find the length of the string, since the sizeof operator will just give you the size of the pointer variable.
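As a rough illustration of the strlen versus sizeof point, the sketch below measures the same literal both ways. The function names pointed_length and sizeof_gives_pointer_size are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: length of the pointed-to string, excluding the '\0'.
std::size_t pointed_length() {
    const char *p = "string";
    return std::strlen(p);   // counts characters up to the terminator
}

// Hypothetical helper: sizeof on the pointer reports the pointer's own size.
bool sizeof_gives_pointer_size() {
    const char *p = "string";
    // Commonly 4 or 8 bytes depending on the platform, never the string length
    // except by coincidence.
    return sizeof(p) == sizeof(const char *);
}
```

Here pointed_length() should return 6, while sizeof(p) depends only on the platform's pointer width.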


Which version is better and which one shall I use?

It depends on the usage.

  • If you do not need to make any changes to the string, use the pointer version.
  • If you intend to change the data, use the array version.

Note: the following applies to C, not C++.

Using a string literal without the const keyword is perfectly valid in C. However, modifying a string literal is still undefined behavior in C[Ref 2].

This brings up an interesting question,
What is the difference between char* and const char* when used with string literals in C?


For Standardese Fans:
[Ref 1]C++03 Standard: §4.2/2

>A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]

C++11 removes this conversion entirely, so initializing a char * from a string literal is ill-formed in C++11.

[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":

>In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
>
>It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

Solution 2 - C++

The first one is an array; the other is a pointer.

> The array declaration char a[6]; requests that space for six characters be set aside, to be known by the name a. That is, there is a location named a at which six characters can sit. The pointer declaration char *p; on the other hand, requests a place which holds a pointer. The pointer is to be known by the name p, and can point to any char (or contiguous array of chars) anywhere.
>
> The statements
>
>     char a[] = "string";
>     char *p = "string";
>
> would result in data structures which could be represented like this:
>
>        +---+---+---+---+---+---+----+
>     a: | s | t | r | i | n | g | \0 |
>        +---+---+---+---+---+---+----+
>        +-----+     +---+---+---+---+---+---+---+
>     p: |  *======> | s | t | r | i | n | g |\0 |
>        +-----+     +---+---+---+---+---+---+---+
>
> It is important to realize that a reference like x[3] generates different code depending on whether x is an array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location a, move three elements past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location p, fetch the pointer value there, add three element sizes to the pointer, and finally fetch the character pointed to. In the example above, both a[3] and p[3] happen to be the character i, but the compiler gets there differently.

Source: comp.lang.c FAQ list · Question 6.2
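The observable result of the two indexing paths can be checked with a small sketch like the one below. It only confirms that both expressions read the same character; the difference in generated code described in the FAQ is not visible at this level. The function name same_character_at_index_3 is hypothetical.

```cpp
// Hypothetical helper: reads index 3 through the array and through the pointer.
bool same_character_at_index_3() {
    char a[] = "string";        // the characters live inside a itself
    const char *p = "string";   // p is a separate object holding an address
    // a[3] indexes from the array's own storage; p[3] first loads the
    // address stored in p and then indexes from there.
    return a[3] == 'i' && p[3] == 'i';
}
```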

Solution 3 - C++

char a[] = "string";

This allocates the array, holding a copy of the string, on the stack (assuming a is a local variable).

char *p = "string";

This creates a pointer on the stack that points to the literal in the data segment of the process.

? is whoever wrote it not knowing what they were doing.

Solution 4 - C++

Stack, heap, data segment (and BSS), and text segment are the four segments of process memory. All local variables are placed on the stack. Memory dynamically allocated with malloc and calloc is on the heap. All global and static variables are in the data segment. The text segment holds the program's machine code and some constants.

Of these four segments, the text segment is read-only; the other three are readable and writable.

char a[] = "string"; - This statement allocates 7 bytes on the stack (because a is a local variable) and stores all 6 characters (s, t, r, i, n, g) plus the null character (\0) at the end.

char *p = "string"; - This statement allocates 4 bytes (on a 32-bit machine) on the stack (because this is also a local variable) to hold a pointer to the constant string "string". The 7 bytes of the constant string (6 characters plus \0) will typically be in the text segment. This is a constant value. The pointer variable p just points to that string.

Now a[0] (the index can be 0 to 5 for the characters) accesses the first character of the string, which is on the stack. So we can also write at this position: a[0] = 'x'. This operation is allowed because we have read and write access to the stack.

But p[0] = 'x' is undefined behavior and will typically crash, because we have only read access to the text segment. A segmentation fault typically happens on any write to the text segment.

But you can change the value of the variable p itself, because it is a local variable on the stack, as shown below:

char *p = "string";
printf("%s", p);
p = "start";
printf("%s", p);

This is allowed. Here we are changing the address stored in the pointer variable p to the address of the string "start" (again, "start" is also read-only data in the text segment). If you want to modify the characters that p points to, use dynamically allocated memory.

char *p = NULL;
p = malloc(sizeof(char)*7);
strcpy(p, "string");

Now the operation p[0] = 'x' is allowed, because we are writing to the heap.
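The same idea can be sketched in C++ without malloc. The version below is an assumption about how one might write it, using std::unique_ptr<char[]> so the buffer is released automatically; the function name heap_copy_edited is hypothetical.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper: copies the literal into heap storage and edits the copy.
std::string heap_copy_edited() {
    const char *literal = "string";
    // std::strlen excludes the '\0', so one extra byte is allocated for it.
    std::unique_ptr<char[]> p(new char[std::strlen(literal) + 1]);
    std::strcpy(p.get(), literal);   // the heap buffer now holds "string"
    p[0] = 'x';                      // allowed: the heap buffer is writable
    return std::string(p.get());     // copied out before the buffer is freed
}
```

With this sketch, heap_copy_edited() should return "xtring", and the literal itself is never written to.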

Solution 5 - C++

char *p = "string"; creates a pointer to read-only memory where string literal "string" is stored. Trying to modify string that p points to leads to undefined behaviour.

char a[] = "string"; creates an array and initializes its content by using string literal "string".

Solution 6 - C++

They do differ as to where the memory is stored. Ideally the second one should use const char *.

The first one

char buf[] = "hello";

creates an automatic buffer big enough to hold the characters and copies them in (including the null terminator).

The second one

const char * buf = "hello";

should use const and simply creates a pointer to memory that is usually in static storage, which must not be modified.

The converse (of the fact you can modify the first safely and not the second) is that it is safe to return the second pointer from a function, but not the first. This is because the second one will remain a valid memory pointer outside the scope of the function, the first will not.

const char * sayHello()
{
     const char * buf = "hello";
     return buf; // valid
}

const char * sayHelloBroken()
{
     char buf[] = "hello";
     return buf; // invalid: buf is destroyed when the function returns
}
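If a function needs to build or modify the text locally and still hand it back, one possible approach (an assumption here, not something the answer itself prescribes) is to return a std::string by value. The names say_hello_literal and say_hello_copy are hypothetical.

```cpp
#include <string>

// Hypothetical helper: the literal has static storage duration, so the
// returned pointer stays valid after the function returns.
const char *say_hello_literal() {
    return "hello";
}

// Hypothetical helper: the local array is modified, then copied into the
// returned std::string, so no pointer to the dead array escapes.
std::string say_hello_copy() {
    char buf[] = "hello";
    buf[0] = 'H';               // fine: buf is a writable local array
    return std::string(buf);    // copies the characters out of buf
}
```

The trade-off is an extra copy of the characters in exchange for a result that owns its own storage.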

Solution 7 - C++

a declares an array of char values: an array of chars that is null-terminated.

p declares a pointer, which refers to an immutable, terminated, C string, whose exact storage location is implementation-defined. Note that this should be const-qualified (e.g. const char *p = "string";).

If you print it out using std::cout << "a: " << sizeof(a) << "\np: " << sizeof(p) << std::endl;, you will see that their sizes differ (note: values may vary by system):

a: 7
p: 8

> Here what is ? operator? Is it a part of a string or it has some specific meaning?
>
> char a[] = ?string?

I assume they were once double quotes "string", which potentially were converted to "smart quotes", then could not be represented as such along the way, and were converted to ?.

Solution 8 - C++

C and C++ have very similar Pointer to Array relationships...

I can't speak for the exact memory locations of the two statements you are asking about, but I found these articles interesting and useful for understanding some of the differences between a char pointer declaration and a char array declaration.

For clarity:

C Pointer and Array relationship

C++ Pointer to an Array

I think it's important to remember that an array, in C and C++, is not itself a pointer, but in most expressions it is implicitly converted to a pointer to its first element. Consequently you can perform pointer arithmetic on the array.

char *p = "string"; <--- This is a pointer that points to the first address of a character string.

the following is also possible:

char *p;
char a[] = "string";

p = a; 

At this point p now references the first memory address of a (the address of the first element)

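and so *p == 's'

*(p+1) == 't' and so on. (Note that *(p++) still yields 's', because the increment happens after the read; it is *(++p) that yields 't'.)

and the same thing would work for a: *(a+1) would also equal 't'. However, a++ does not compile, because an array name is not a modifiable lvalue.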
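A short sketch of how these pointer expressions evaluate, written as a single check so each step can be followed in order. The function name pointer_walk is hypothetical.

```cpp
// Hypothetical helper: returns true if every expression evaluates as commented.
bool pointer_walk() {
    char a[] = "string";
    char *p = a;                     // a converts to a pointer to a[0]
    bool ok = (*p == 's');
    ok = ok && (*(p + 1) == 't');    // p itself is unchanged
    ok = ok && (*(a + 1) == 't');    // arithmetic on the converted array name
    ok = ok && (*(p++) == 's');      // reads the old position, then advances p
    ok = ok && (*p == 't');          // p now points at the second character
    // a++ would not compile: an array name is not a modifiable lvalue.
    return ok;
}
```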

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Sachin Mhetre | View Question on Stackoverflow
Solution 1 - C++ | Alok Save | View Answer on Stackoverflow
Solution 2 - C++ | user1208519 | View Answer on Stackoverflow
Solution 3 - C++ | Ignacio Vazquez-Abrams | View Answer on Stackoverflow
Solution 4 - C++ | rashok | View Answer on Stackoverflow
Solution 5 - C++ | LihO | View Answer on Stackoverflow
Solution 6 - C++ | CashCow | View Answer on Stackoverflow
Solution 7 - C++ | justin | View Answer on Stackoverflow
Solution 8 - C++ | meltdownmonk | View Answer on Stackoverflow