How does a C++ reference look, memory-wise?


C++ Problem Overview


Given:

int i = 42;
int j = 43;
int k = 44;

By looking at the variables' addresses we can see that each one takes up 4 bytes (on most platforms).

However, considering:

int i = 42;
int& j = i;
int k = 44;

We will see that variable i indeed takes 4 bytes, but j takes none, and k again takes 4 bytes on the stack.

What is happening here? It looks like j simply does not exist at runtime. And what about a reference I receive as a function argument? That must take some space on the stack...

And while we're at it - why can't I define an array of references?

int&[] arr = new int&[SIZE]; // compiler error! array of references is illegal

C++ Solutions


Solution 1 - C++

Everywhere the reference j is encountered, it is replaced with the address of i. So basically the address the reference refers to is resolved at compile time, and there is no need to dereference it like a pointer at run time.

Just to clarify what I mean by the address of i :

void function(int& x)
{
	x = 10;
}

int main()
{
	int i = 5;
	int& j = i;

	function(j);
}

In the above code, j should not take space on the stack of main, but the reference x of function will take a place on its stack. That means that when function is called with j as an argument, the address of i is what gets pushed on the stack of function. The compiler can, and should, avoid reserving space on the stack of main for j.
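As a rough sketch of the point above, the reference parameter can be compared with a pointer-based twin. The names set_by_reference, set_by_pointer and both_modify_caller are made up for this illustration, and the claim that the two compile to similar code is an assumption about typical compilers, not something the standard promises.

```cpp
#include <cassert>

// Reference parameter, mirroring `function` from the answer.
void set_by_reference(int& x) { x = 10; }

// Hypothetical pointer-based twin: roughly what a compiler typically
// generates for the reference version when the call is not inlined.
void set_by_pointer(int* x) { *x = 10; }

// Returns true when both calls modified the caller's variables.
bool both_modify_caller() {
    int i = 5;
    int& j = i;           // j is another name for i
    assert(&j == &i);     // no separate object is visible to the program

    set_by_reference(j);  // in practice, the address of i is passed

    int k = 5;
    set_by_pointer(&k);   // the address of k is passed explicitly

    return i == 10 && k == 10;
}
```

Calling both_modify_caller() from main returns true: both calls end up writing into the caller's variable.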

For the array part, the standard says:

> C++ Standard 8.3.2/4:
>
> There shall be no references to references, no arrays of references, and no pointers to references.

See also: Why are arrays of references illegal?

Solution 2 - C++

> How does a C++ reference look, memory-wise?

It doesn't. The C++ standard only says how it should behave, not how it should be implemented.

In the general case, compilers usually implement references as pointers. But they generally have more information about what a reference may point to, and use that for optimization.

Remember that the only requirement for a reference is that it behaves as an alias for the referenced object. So if the compiler encounters this code:

int i = 42;
int& j = i;
int k = 44;

what it sees is not "create a pointer to the variable i" (although that is how the compiler may choose to implement it in some cases), but rather "make a note in the symbol table that j is now an alias for i."

The compiler doesn't have to create a new variable for j, it simply has to remember that whenever j is referenced from now on, it should really swap it out and use i instead.
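The alias behaviour can be checked from inside the language. The following is only a small sketch (the function name write_through_alias is invented for it): a write through j is a write to i, and sizeof applied to the reference reports the size of the referenced type, not the size of any hidden pointer.

```cpp
#include <cassert>

// Returns the final value of i after writing through its alias j.
int write_through_alias() {
    int i = 42;
    int& j = i;

    // sizeof applied to a reference yields the size of the referenced type.
    static_assert(sizeof(j) == sizeof(int), "sizeof(j) is sizeof(int)");

    j = 7;           // assigns to i; j itself is not a separate variable to change
    assert(i == 7);  // the write through j is visible through i
    return i;
}
```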

As for creating an array of references, you can't do it because it'd be useless and meaningless.

When you create an array, all elements are default-constructed. What does it mean to default-construct a reference? What does it point to? The entire point of references is that they are initialized to reference another object, after which they cannot be reseated.

So if it could be done, you would end up with an array of references to nothing. And you'd be unable to change them to reference something because they'd been initialized already.
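Both halves of that argument can be checked with a short sketch. It assumes a C++11 compiler for the type trait; the function name value_after_assignment is made up for the example.

```cpp
#include <cassert>
#include <type_traits>

// A reference type cannot be default-constructed: there is nothing to bind to.
static_assert(!std::is_default_constructible<int&>::value,
              "int& cannot be default-constructed");

// Returns the value of a after "assigning" b to a reference bound to a.
int value_after_assignment() {
    int a = 1;
    int b = 2;
    int& r = a;        // bound to a for its whole lifetime

    r = b;             // copies the value of b into a; r still names a

    assert(&r == &a);  // not reseated
    assert(&r != &b);
    return a;
}
```

The assignment r = b changes the value of a to 2; it does not make r refer to b.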

Solution 3 - C++

Sorry for using assembly to explain this, but I think this is the best way to understand references.

#include <iostream>

using namespace std;

int main()
{
    int i = 10;
    int *ptrToI = &i;
    int &refToI = i;

    cout << "i = " << i << "\n";
    cout << "&i = " << &i << "\n";

    cout << "ptrToI = " << ptrToI << "\n";
    cout << "*ptrToI = " << *ptrToI << "\n";
    cout << "&ptrToI = " << &ptrToI << "\n";

    cout << "refToI = " << refToI << "\n";
    //cout << "*refToI = " << *refToI << "\n";
    cout << "&refToI = " << &refToI << "\n";

    return 0;
}

The output of this code looks like this:

i = 10
&i = 0xbf9e52f8
ptrToI = 0xbf9e52f8
*ptrToI = 10
&ptrToI = 0xbf9e52f4
refToI = 10
&refToI = 0xbf9e52f8

Let's look at the disassembly (I used GDB for this; 8, 9, and 10 here are line numbers in the code).

8           int i = 10;
0x08048698 <main()+18>: movl   $0xa,-0x10(%ebp)

Here $0xa is the decimal value 10 that we are assigning to i. -0x10(%ebp) means the contents of the ebp register minus 16 (decimal), which is the address of i on the stack.

9           int *ptrToI = &i;
0x0804869f <main()+25>: lea    -0x10(%ebp),%eax
0x080486a2 <main()+28>: mov    %eax,-0x14(%ebp)

This assigns the address of i to ptrToI. ptrToI is again on the stack, located at address -0x14(%ebp), that is, ebp minus 20 (decimal).

10          int &refToI = i;
0x080486a5 <main()+31>: lea    -0x10(%ebp),%eax
0x080486a8 <main()+34>: mov    %eax,-0xc(%ebp)

Now here is the catch! Compare the disassembly of lines 9 and 10 and you will observe that -0x14(%ebp) is replaced by -0xc(%ebp) in line 10. -0xc(%ebp) is the address of refToI. It is allocated on the stack, but you will never be able to get this address from your code, because you are not required to know it.

So, a reference does occupy memory. In this case, it is stack memory, since we have declared it as a local variable.

How much memory does it occupy? As much as a pointer occupies.

Now let's see how we access the reference and pointers. For simplicity I have shown only part of the assembly snippet

16          cout << "*ptrToI = " << *ptrToI << "\n";
0x08048746 <main()+192>:        mov    -0x14(%ebp),%eax
0x08048749 <main()+195>:        mov    (%eax),%ebx
19          cout << "refToI = " << refToI << "\n";
0x080487b0 <main()+298>:        mov    -0xc(%ebp),%eax
0x080487b3 <main()+301>:        mov    (%eax),%ebx

Now compare the two lines above and you will see a striking similarity. -0xc(%ebp) is the actual address of refToI, which is never accessible to you.

In simple terms, if you think of a reference as a normal pointer, then accessing a reference is like fetching the value at the address pointed to by the reference. This means the two lines of code below give you the same result:

cout << "Value of i = " << *ptrToI << "\n";
cout << "Value of i = " << refToI << "\n";

Now compare these:

15          cout << "ptrToI = " << ptrToI << "\n";
0x08048713 <main()+141>:        mov    -0x14(%ebp),%ebx
21          cout << "&refToI = " << &refToI << "\n";
0x080487fb <main()+373>:        mov    -0xc(%ebp),%eax

I guess you are able to spot what is happening here. If you ask for &refToI:

  1. The contents of -0xc(%ebp) address location are returned.
  2. -0xc(%ebp) is where refToI resides, and its contents are nothing but address of i.
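Seen from the C++ side, this is why the pointer and the reference behave differently under the address-of operator. A minimal sketch, reusing the variable names from the program above (the function name reference_shares_address is invented here):

```cpp
#include <cassert>

// Returns true if &refToI yields the address of i.
bool reference_shares_address() {
    int i = 10;
    int* ptrToI = &i;
    int& refToI = i;

    // The pointer is an object of its own, so its address differs from i's.
    assert(static_cast<void*>(&ptrToI) != static_cast<void*>(&i));

    // Taking the address of the reference yields the address of i,
    // never the address of whatever storage backs the reference.
    return &refToI == &i && &refToI == ptrToI;
}
```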

One last thing. Why is this line commented?

// cout << "*refToI = " << *refToI << "\n";

Because *refToI is not permitted, and it will give you a compile-time error: refToI is used like an int, and the unary * operator cannot be applied to an int.

Solution 4 - C++

In practice, a reference is equivalent to a pointer, except that the extra constraints on how references may be used can allow a compiler to "optimize it away" in more cases (depending, of course, on how smart the compiler is, its optimization settings, and so on).

Solution 5 - C++

You can't define an array of references because there is no syntax to initialize them. C++ does not allow uninitialized references. As for your first question, the compiler is under no obligation to allocate space for unnecessary variables. There is no way to have j point to another variable, so it's effectively just an alias for i in the function's scope, and that's how the compiler treats it.

Solution 6 - C++

Something that is only mentioned in passing elsewhere - how to get the compiler to devote some storage space to a reference:

class HasRef
{
    int &r;

public:
    HasRef(int &n)
        : r(n) { }
};

This denies the compiler the opportunity to simply treat it as a compile-time alias (an alternative name for the same storage).
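A sketch of what that looks like in practice, extending the class with a hypothetical get() accessor so the stored reference can be observed. The size check is an assumption about mainstream ABIs, where a reference member is stored like a pointer; the standard does not fix it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

class HasRef
{
    int &r;

public:
    HasRef(int &n)
        : r(n) { }

    // Hypothetical accessor, added only for this illustration.
    int get() const { return r; }
};

// A reference member cannot be reseated, so copy assignment is deleted.
static_assert(!std::is_copy_assignable<HasRef>::value,
              "HasRef is not copy-assignable");

// Size of an object holding one reference member (implementation-defined).
std::size_t has_ref_size() { return sizeof(HasRef); }

// Returns the value read through the stored reference after n changes.
int read_after_change() {
    int n = 1;
    HasRef h(n);
    n = 5;                 // h.r still names n
    assert(h.get() == 5);
    return h.get();
}
```

On typical x86 and x86-64 toolchains has_ref_size() equals sizeof(int*), which matches the idea that the member is stored as an address.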

Solution 7 - C++

References don't actually exist physically until they need to have a physical manifestation (i.e., as a member of an aggregate).

Having an array of references is illegal probably due to the above. But nothing prevents you from creating an array of structs/classes that have reference members.

I'm sure someone will point out the standard clause that mentions all this.
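For illustration, an array of structs with reference members might look like the sketch below. IntRef and bump_through_array are names invented for this example, and it assumes C++11 for the range-based for loop.

```cpp
#include <cassert>

// Hypothetical wrapper: a struct whose only member is a reference.
struct IntRef {
    int& ref;
};

// Returns the sum of a, b and c after modifying them through the array.
int bump_through_array() {
    int a = 1, b = 2, c = 3;

    // Every element must be initialized on the spot: IntRef cannot be
    // default-constructed because its reference member needs a target.
    IntRef refs[] = { {a}, {b}, {c} };

    for (IntRef& r : refs) {
        r.ref += 10;  // writes to a, b and c
    }

    assert(a == 11 && b == 12 && c == 13);
    return a + b + c;
}
```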

Solution 8 - C++

It's not fixed - the compiler has great freedom in how to implement a reference on a case-by-case basis. So in your second example it treats j as an alias for i, and nothing else is needed. When passing a ref parameter it could also use a stack offset, again with no overhead. But in other situations it could use a pointer.

Solution 9 - C++

Most of what a reference is, and why and how the compiler can optimize its storage away, has already been covered in other answers. However, some comments incorrectly state that for reference variables (in contrast to reference arguments in functions) the reference is always just an alias and never needs extra memory. This is true if the reference always refers to the same variable. However, if the reference can refer to different memory locations and the compiler cannot determine in advance which one, it will need to allocate memory for it, as in the following example:

#include <ctime>
#include <iostream>
int i = 2;
int j = 3;
int& k = std::time(0)%2==1 ? i : j;

int main(){
    std::cout << k << std::endl;
}

If you try this on godbolt (https://godbolt.org/z/38x1Eq83o), you will see that, for example, gcc on x86-64 reserves 8 bytes for k in order to store a pointer to either i or j, depending on the return value of std::time.
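The same idea can be sketched in a deterministic form, with a bool standing in for the result of std::time. The helper names pick and write_through_pick are made up for this example; which object k refers to is only known at run time, so the compiler generally has to carry an address around.

```cpp
#include <cassert>

int i = 2;
int j = 3;

// Hypothetical helper: the object the result refers to depends on a
// run-time value, so it cannot be resolved to a fixed alias.
int& pick(bool first) { return first ? i : j; }

// Writes 100 through the chosen reference and returns i + j.
int write_through_pick(bool first) {
    i = 2;
    j = 3;
    int& k = pick(first);
    k = 100;  // modifies whichever of i or j was chosen
    assert(&k == (first ? &i : &j));
    return i + j;
}
```

write_through_pick(true) changes i and returns 103, while write_through_pick(false) changes j and returns 102.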

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Yuval Adam | View Question on Stackoverflow
Solution 1 - C++ | Khaled Alshaya | View Answer on Stackoverflow
Solution 2 - C++ | jalf | View Answer on Stackoverflow
Solution 3 - C++ | Prasad Rane | View Answer on Stackoverflow
Solution 4 - C++ | Alex Martelli | View Answer on Stackoverflow
Solution 5 - C++ | Peter Ruderman | View Answer on Stackoverflow
Solution 6 - C++ | Daniel Earwicker | View Answer on Stackoverflow
Solution 7 - C++ | MSN | View Answer on Stackoverflow
Solution 8 - C++ | Henk Holterman | View Answer on Stackoverflow
Solution 9 - C++ | Elmar Zander | View Answer on Stackoverflow