Why doesn't the compiler allow std::string inside a union?

C++

C++ Problem Overview


I want to use a string inside a union. If I write the following:

union U
{
   int i;
   float f;
   string s;
};

The compiler gives an error saying that U::s has a copy constructor.

I have read other posts about alternative ways of solving this issue, but I want to know why the compiler doesn't allow this in the first place.

EDIT: @KennyTM: In any union, if one member is initialized, the others will have garbage values, and if none is initialized, all will have garbage values. I think a tagged union just makes it more convenient to access the valid value in the union.

Your question: how do you or the compiler write a copy constructor for the union above without extra information? sizeof(string) gives 4 bytes. Based on this, the compiler can compare the sizes of the other members and allocate the largest one (4 bytes in our example). The internal string length doesn't matter because the characters are stored in a separate location, so the string can be of any length. All the union has to know is to invoke the string class's copy constructor with a string parameter. Whatever way the compiler determines that the copy constructor has to be invoked in the normal case, a similar method could be followed when the string is inside a union.

So I am thinking the compiler could allocate 4 bytes. Then, if any string is assigned to s, the string class will take care of allocating and copying that string using its own allocator. So there is no chance of memory corruption either.

Did string not exist when unions were developed in the compiler? The answer is still not clear to me. I am new to this site, so please excuse me if I have done anything wrong.

C++ Solutions


Solution 1 - C++

Because having a class with a non-trivial (copy/)constructor in a union doesn't make sense. Suppose we have

union U {
  string x;
  vector<int> y;
};

U u;  // <--

If U were a struct, u.x and u.y would be initialized to an empty string and an empty vector respectively. But the members of a union share the same address. So if u.x is initialized, u.y will contain invalid data, and vice versa. If neither of them is initialized, then neither can be used. In any case, having such data in a union cannot be handled easily, so C++98 chooses to forbid it (§9.5/1):

> An object of a class with a non-trivial constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects.

In C++0x this rule has been relaxed (§9.5/2):

> At most one non-static data member of a union may have a brace-or-equal-initializer. [Note: if any non-static data member of a union has a non-trivial default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move assignment operator (12.8), or destructor (12.4), the corresponding member function of the union must be user-provided or it will be implicitly deleted (8.4.3) for the union. — end note ]

but it is still not possible for the compiler to create correct constructors and destructors for the union. For example, how do you or the compiler write a copy constructor for the union above without extra information? To know which member of the union is active, you need a tagged union, and you need to handle construction and destruction manually, e.g.

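struct TU {
   enum { TU_INT, TU_FLOAT, TU_STRING };  // tag values (assumed; not defined in the original)
   int type;
   union U {
     int i;
     float f;
     std::string s;
     U() {}   // constructs no member; TU constructs the active one
     ~U() {}  // destroys no member; TU destroys the active one
   } u;

   TU(const TU& tu) : type(tu.type) {
     switch (tu.type) {
       case TU_STRING: new (&u.s) std::string(tu.u.s); break;  // placement new (needs <new>)
       case TU_INT:    u.i = tu.u.i;      break;
       case TU_FLOAT:  u.f = tu.u.f;      break;
     }
   }
   ~TU() {
     using std::string;  // lets the destructor be named as ~string()
     if (type == TU_STRING)
       u.s.~string();
   }
   ...
};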

But, as @DeadMG has mentioned, this is already implemented as boost::variant or boost::any.
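Since C++17 the standard library also ships std::variant, which plays the same role as boost::variant. The following is only a sketch under that assumption (a C++17 compiler); the names Value, describe and variant_demo are made up for illustration and do not come from the answer above. It shows that the variant tracks the active member itself, so copying and reassigning need no manual placement new or destructor calls.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical alias: the same three alternatives as the TU struct above.
using Value = std::variant<int, float, std::string>;

// Returns a short name for whichever alternative is currently active.
std::string describe(const Value& v) {
    if (std::holds_alternative<int>(v))   return "int";
    if (std::holds_alternative<float>(v)) return "float";
    return "string";
}

// Copies and reassigns a variant without any manual lifetime management.
std::string variant_demo() {
    Value a = std::string("hello");
    Value b = a;   // the copy constructor copies the active alternative
    assert(std::holds_alternative<std::string>(b));
    a = 42;        // the string held by a is destroyed automatically
    return describe(a) + "/" + describe(b) + "/" + std::get<std::string>(b);
}
```

Calling variant_demo() should show that a now holds an int while b still owns its own copy of the string.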

Solution 2 - C++

Think about it. How does the compiler know what type is in the union?

It doesn't. The fundamental operation of a union is essentially a bitwise cast. Operations on values contained within unions are only safe when each type can essentially be filled with garbage. std::string can't, because that would result in memory corruption. Use boost::variant or boost::any.

Solution 3 - C++

In C++98/03, members of a union can't have constructors, destructors, virtual member functions, or base classes.

So basically, you can only use built-in data types or PODs.

Note that this is changing in C++0x with unrestricted unions:

union {
    int z;
    double w;
    string s;  // Illegal in C++98, legal in C++0x.
};
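To see what "legal in C++0x" means in practice, here is a small sketch, assuming a C++11 (or later) compiler. The union names Plain and Managed are hypothetical. Declaring the union is allowed, but the compiler deletes the special member functions it cannot write by itself, so the union only becomes usable once you provide them.

```cpp
#include <string>
#include <type_traits>

// Hypothetical names, for illustration only.
union Plain {
    int z;
    double w;
    std::string s;  // the declaration is accepted since C++11
};

union Managed {
    int z;
    double w;
    std::string s;
    Managed() : z(0) {}  // user-provided: starts with the int active
    ~Managed() {}        // user-provided: the owner must destroy s if it is active
};

// The compiler cannot know which member is active, so these are deleted.
static_assert(!std::is_default_constructible<Plain>::value, "default constructor is deleted");
static_assert(!std::is_copy_constructible<Plain>::value, "copy constructor is deleted");
static_assert(!std::is_destructible<Plain>::value, "destructor is deleted");

// With user-provided functions the union can be created and destroyed again.
static_assert(std::is_default_constructible<Managed>::value, "constructible again");
static_assert(std::is_destructible<Managed>::value, "destructible again");
```

Note that Managed still has no usable copy constructor; that one would also have to be written by hand, with some way of knowing which member is active.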

Solution 4 - C++

From the C++ spec §9.5.1:

> An object of a class with a non-trivial constructor, a non-trivial copy constructor, a non-trivial destructor, or a non-trivial copy assignment operator cannot be a member of a union.

The reason for this rule is that the compiler can never know which of the constructors or destructors to call, since it never really knows which of the possible objects is inside the union.

Solution 5 - C++

Garbage is introduced if you

  1. assign a string
  2. then assign an int or float
  3. then assign a string again

string manages memory that lives somewhere else, most likely through a pointer. That pointer is overwritten with garbage when the int is assigned. Assigning a new string should destroy the old string, which is no longer possible.

The second step should destroy the string, but it does not know whether there has been a string.

They have obviously found a solution for this problem in the meantime.
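With C++11 unrestricted unions, the three steps above can be carried out safely if the programmer does the bookkeeping that the compiler cannot do. The following is a minimal sketch under that assumption; the function name three_steps is hypothetical, and U is only modelled on the union from the question.

```cpp
#include <cassert>
#include <new>
#include <string>

using std::string;  // lets the destructor be named as ~string()

// Modelled on the question's union, with user-provided special members.
union U {
    int i;
    float f;
    string s;
    U() : i(0) {}  // start with the int active
    ~U() {}        // cannot know the active member; the caller cleans up
};

// Walks through the three steps; the caller tracks the active member by hand.
string three_steps() {
    U u;
    new (&u.s) string("first");   // 1. construct a string in place
    u.s.~string();                //    end its lifetime before reusing the storage
    u.i = 7;                      // 2. now the int can be stored safely
    assert(u.i == 7);
    new (&u.s) string("second");  // 3. construct a fresh string instead of assigning
    string result = u.s;
    u.s.~string();                // destroy the active string before u goes away
    return result;
}
```

The explicit destructor call between steps 1 and 2 is exactly the piece of information the compiler lacks: only the programmer knows that a string is currently stored there.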

Solution 6 - C++

You can now do it.
Of course if you initialize any other member of the union first, or simply don't initialize the string at all, then there's a problem.
Since the string class overloads the assignment operator, you can't then initialize the string with an assignment operation:

this->union_string = std::string("whatever");

This will fail because you're still using the assignment operator, which expects the string on the left-hand side to have been constructed already.

To properly initialize a union string after you've put something else in the union or not initialized it in the first place, you have to call the constructor directly on that memory:

new(&this->union_string) std::string("whatever");

This way you're simply not using the assignment function at all.

Another concern is that your compiler should make you write a destructor, and if for some reason it doesn't, you should write one anyway. Since it's a union, by the end of your class's lifetime the compiler can't know whether the union's memory is used by the string or by something else, so your destructor should call the string's destructor if that's the case.
If you don't do it, you'll have a memory leak, since the destructor for the string is never called and it never gets to release the memory it's using.
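Putting these points together, a class that owns such a union might look roughly like the sketch below. Only the member name union_string comes from the snippets above; the class name Holder, the flag holds_string and the other members are assumptions made for illustration. It requires C++11 or later.

```cpp
#include <cassert>
#include <new>
#include <string>

using std::string;  // lets the destructor be named as ~string()

// Hypothetical class; only union_string is taken from the answer above.
class Holder {
public:
    Holder() : number(0), holds_string(false) {}

    // Calls the string's destructor only if the string is the active member.
    ~Holder() { clear(); }

    Holder(const Holder&) = delete;             // copying would need the same care
    Holder& operator=(const Holder&) = delete;

    void set_number(int n) {
        clear();     // release the string first, if there is one
        number = n;
    }

    void set_string(const string& text) {
        if (holds_string) {
            union_string = text;               // already constructed: assignment is fine
        } else {
            new (&union_string) string(text);  // raw storage: construct in place
            holds_string = true;
        }
    }

    bool has_string() const { return holds_string; }

    const string& str() const {
        assert(holds_string);
        return union_string;
    }

private:
    void clear() {
        if (holds_string) {
            union_string.~string();
            holds_string = false;
        }
    }

    union {
        int number;
        string union_string;
    };
    bool holds_string;  // remembers which member is active
};
```

The flag is what makes this work: it is the extra information that lets set_string choose between assignment and placement new, and lets the destructor decide whether there is a string to destroy.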

Solution 7 - C++

In newer C++ standards (I tested it in C++17), you can use a complex type as a member of a union.

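    struct ustring
    {
        union
        {
            string s;
            wstring ws;
        };

        bool bAscii = true;

        // Without a user-provided constructor, ustring's default
        // constructor is deleted. Construct s to match bAscii == true.
        ustring() : s() {}

        ~ustring()
        {
            if (bAscii)
            {
                s.~string();
            }
            else
            {
                ws.~wstring();
            }
        }
    };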

However, you should be very careful. Think about what happens if you construct s but destroy ws.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | bjskishore123 | View Question on Stackoverflow
Solution 1 - C++ | kennytm | View Answer on Stackoverflow
Solution 2 - C++ | Puppy | View Answer on Stackoverflow
Solution 3 - C++ | KeatsPeeks | View Answer on Stackoverflow
Solution 4 - C++ | FireAphis | View Answer on Stackoverflow
Solution 5 - C++ | Robert Risack | View Answer on Stackoverflow
Solution 6 - C++ | TrisT | View Answer on Stackoverflow
Solution 7 - C++ | Zhang | View Answer on Stackoverflow