Is this a known pitfall of C++11 for loops?
C++For LoopC++11Language LawyerForeachC++ Problem Overview
Let's imagine we have a struct for holding 3 doubles with some member functions:
struct Vector {
double x, y, z;
// ...
Vector &negate() {
x = -x; y = -y; z = -z;
return *this;
}
Vector &normalize() {
double s = 1./sqrt(x*x+y*y+z*z);
x *= s; y *= s; z *= s;
return *this;
}
// ...
};
This is a little contrived for simplicity, but I'm sure you agree that similar code is out there. The methods allow you to conveniently chain, for example:
Vector v = ...;
v.normalize().negate();
Or even:
Vector v = Vector{1., 2., 3.}.normalize().negate();
Now if we provided begin() and end() functions, we could use our Vector in a new-style for loop, say to loop over the 3 coordinates x, y, and z (you can no doubt construct more "useful" examples by replacing Vector with e.g. String):
Vector v = ...;
for (double x : v) { ... }
We can even do:
Vector v = ...;
for (double x : v.normalize().negate()) { ... }
and also:
for (double x : Vector{1., 2., 3.}) { ... }
However, the following (it seems to me) is broken:
for (double x : Vector{1., 2., 3.}.normalize()) { ... }
While it seems like a logical combination of the previous two usages, I think this last usage creates a dangling reference while the previous two are completely fine.
- Is this correct and Widely appreciated?
- Which part of the above is the "bad" part, that should be avoided?
- Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?
C++ Solutions
Solution 1 - C++
> Is this correct and Widely appreciated?
Yes, your understanding of things is correct.
> Which part of the above is the "bad" part, that should be avoided?
The bad part is taking an l-value reference to a temporary returned from a function, and binding it to an r-value reference. It is just as bad as this:
auto &&t = Vector{1., 2., 3.}.normalize();
The temporary Vector{1., 2., 3.}
's lifetime cannot be extended because the compiler has no idea that the return value from normalize
references it.
> Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?
That would be highly inconsistent with how C++ works.
Would it prevent certain gotchas made by people using chained expressions on temporaries or various lazy-evaluation methods for expressions? Yes. But it would also be require special-case compiler code, as well as be confusing as to why it doesn't work with other expression constructs.
A much more reasonable solution would be some way to inform the compiler that the return value of a function is always a reference to this
, and therefore if the return value is bound to a temporary-extending construct, then it would extend the correct temporary. That's a language-level solution though.
Presently (if the compiler supports it), you can make it so that normalize
cannot be called on a temporary:
struct Vector {
double x, y, z;
// ...
Vector &normalize() & {
double s = 1./sqrt(x*x+y*y+z*z);
x *= s; y *= s; z *= s;
return *this;
}
Vector &normalize() && = delete;
};
This will cause Vector{1., 2., 3.}.normalize()
to give a compile error, while v.normalize()
will work fine. Obviously you won't be able to do correct things like this:
Vector t = Vector{1., 2., 3.}.normalize();
But you also won't be able to do incorrect things.
Alternatively, as suggested in the comments, you can make the rvalue reference version return a value rather than a reference:
struct Vector {
double x, y, z;
// ...
Vector &normalize() & {
double s = 1./sqrt(x*x+y*y+z*z);
x *= s; y *= s; z *= s;
return *this;
}
Vector normalize() && {
Vector ret = *this;
ret.normalize();
return ret;
}
};
If Vector
was a type with actual resources to move, you could use Vector ret = std::move(*this);
instead. Named return value optimization makes this reasonably optimal in terms of performance.
Solution 2 - C++
> for (double x : Vector{1., 2., 3.}.normalize()) { ... }
That is not a limitation of the language, but a problem with your code. The expression Vector{1., 2., 3.}
creates a temporary, but the normalize
function returns an lvalue-reference. Because the expression is an lvalue, the compiler assumes that the object will be alive, but because it is a reference to a temporary, the object dies after the full expression is evaluated, so you are left with a dangling reference.
Now, if you change your design to return a new object by value rather than a reference to the current object, then there would be no issue and the code would work as expected.
Solution 3 - C++
IMHO, the second example is already flawed. That the modifying operators return *this
is convenient in the way you mentioned: it allows chaining of modifiers. It can be used for simply handing on the result of the modification, but doing this is error-prone because it can easily be overlooked. If I see something like
Vector v{1., 2., 3.};
auto foo = somefunction1(v, 17);
auto bar = somefunction2(true, v, 2, foo);
auto baz = somefunction3(bar.quun(v), 93.2, v.qwarv(foo));
I wouldn't automatically suspect that the functions modify v
as a side-effect. Of course, they could, but it would be confusing. So if I was to write something like this, I would make sure that v
stays constant. For your example, I would add free functions
auto normalized(Vector v) -> Vector {return v.normalize();}
auto negated(Vector v) -> Vector {return v.negate();}
and then write the loops
for( double x : negated(normalized(v)) ) { ... }
and
for( double x : normalized(Vector{1., 2., 3}) ) { ... }
That is IMO better readable, and it's safer. Of course, it requires an extra copy, however for heap-allocated data this could likely be done in a cheap C++11 move operation.