C++11 initializer list fails - but only on lists of length 2

C++, C++11, Initializer List

C++ Problem Overview


I tracked down an obscure logging bug to the fact that initializer lists of length 2 appear to be a special case! How is this possible?

The code was compiled with Apple LLVM version 5.1 (clang-503.0.40), using CXXFLAGS=-std=c++11 -stdlib=libc++.

#include <stdio.h>

#include <string>
#include <vector>

using namespace std;

typedef vector<string> Strings;

void print(string const& s) {
    printf("%s", s.c_str());  // pass the text as an argument, not as the format string
    printf("\n");
}

void print(Strings const& ss, string const& name) {
    print("Test " + name);
    print("Number of strings: " + to_string(ss.size()));
    for (auto& s: ss) {
        auto t = "length = " + to_string(s.size()) + ": " + s;
        print(t);
    }
    print("\n");
}

void test() {
    Strings a{{"hello"}};                  print(a, "a");
    Strings b{{"hello", "there"}};         print(b, "b");
    Strings c{{"hello", "there", "kids"}}; print(c, "c");

    Strings A{"hello"};                    print(A, "A");
    Strings B{"hello", "there"};           print(B, "B");
    Strings C{"hello", "there", "kids"};   print(C, "C");
}

int main() {
    test();
}

Output:

Test a
Number of strings: 1
length = 5: hello

Test b
Number of strings: 1
length = 8: hello

Test c
Number of strings: 3
length = 5: hello
length = 5: there
length = 4: kids

Test A
Number of strings: 1
length = 5: hello

Test B
Number of strings: 2
length = 5: hello
length = 5: there

Test C
Number of strings: 3
length = 5: hello
length = 5: there
length = 4: kids

I should also add that the length of the bogus string in test b seems to be indeterminate - it's always greater than the first initializer string but has varied from one more than the length of the first string to the total of the lengths of the two strings in the initializer.

C++ Solutions


Solution 1 - C++

Introduction

Imagine the following declaration, and usage:

struct A {
  A (std::initializer_list<std::string>);
};

A {{"a"          }}; // (A), initialization of 1 string
A {{"a", "b"     }}; // (B), initialization of 1 string << !!
A {{"a", "b", "c"}}; // (C), initialization of 3 strings

In (A) and (C), each C-style string initializes one (1) std::string but, as you have stated in your question, (B) differs.

The compiler sees that a std::string can be constructed from a begin and an end iterator, and when it parses statement (B) it prefers that constructor over treating "a" and "b" as individual initializers for two elements.

A { std::string { "a", "b" } }; // the compiler's interpretation of (B)


> Note: The type of "a" and "b" is char const[2], which implicitly decays to char const*, a pointer type that can act as an iterator denoting either begin or end when creating a std::string. But we must be careful: this causes undefined behavior, since there is no (guaranteed) relation between the two pointers when that constructor is invoked.
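To make the iterator-pair constructor concrete, here is a minimal sketch of the well-defined case, where both pointers refer to the same array. The helper name first_chars is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a string from the first n characters of a
// buffer using the iterator-pair constructor std::string(first, last).
// The caller must ensure that buffer holds at least n characters.
std::string first_chars(char const* buffer, std::size_t n) {
    char const* first = buffer;      // begin iterator
    char const* last  = buffer + n;  // end iterator, inside the same array
    return std::string(first, last);
}
```

Calling first_chars("hello there", 5) produces "hello". Statement (B) selects this same constructor, but with pointers into two unrelated arrays, which is why its result cannot be relied upon.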


Explanation

When you invoke a constructor taking a std::initializer_list using double braces {{ a, b, ... }}, there are two possible interpretations:

  1. The outer braces refer to the constructor itself, and the inner braces denote the elements that make up the std::initializer_list, or:

  2. The outer braces refer to the std::initializer_list, whereas the inner braces denote the initialization of a single element inside it.

Interpretation 2) is preferred whenever it is possible, and since std::string has a constructor taking two iterators, that constructor is the one being called when you write std::vector<std::string> {{ "hello", "there" }}.

Further example:

std::vector<std::string> {{"this", "is"}, {"stackoverflow"}}.size (); // yields 2
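The choice between the two interpretations can be observed without relying on undefined behavior. The sketch below swaps std::string for a hypothetical type, Element, that only records how many arguments constructed it and never dereferences or compares the pointers. Element, Elements, and the functions one, two and three are assumptions made for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for std::string: it has a one-pointer and a
// two-pointer constructor, and only remembers which one was used.
struct Element {
    int args;
    Element(char const*) : args(1) {}
    Element(char const*, char const*) : args(2) {}
};

typedef std::vector<Element> Elements;

// Inner braces initialize ONE element via Element(char const*).
Elements one()   { return Elements{{"a"}}; }

// Inner braces initialize ONE element via Element(char const*, char const*).
Elements two()   { return Elements{{"a", "b"}}; }

// No three-argument constructor exists, so the inner braces become the
// initializer_list itself and three elements are created.
Elements three() { return Elements{{"a", "b", "c"}}; }
```

Here two() returns a single element whose args member is 2, mirroring case (B), while three() returns three elements, mirroring case (C).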

Solution

Don't use double braces for such initialization.
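As a short sketch of that advice, using the Strings typedef from the question (the function names are hypothetical): with single braces each literal initializes its own element, and spelling out std::string for each element removes any doubt about which constructor is used.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> Strings;

// Single braces: each literal initializes its own std::string.
Strings single_braces() {
    return Strings{"hello", "there"};
}

// Explicit element types: each element is unmistakably one std::string.
Strings explicit_elements() {
    return Strings{std::string("hello"), std::string("there")};
}
```

Both functions return a vector of two strings, "hello" and "there".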

Solution 2 - C++

First of all, this is undefined behaviour unless I'm missing something obvious. Now let me explain. The vector is being constructed from an initializer list of strings. However this list only contains one string. This string is formed by the inner {"Hello", "there"}. How? With the iterator constructor. Essentially, for (auto it = "Hello"; it != "there"; ++it) is forming a string containing Hello\0.

While UB is reason enough for the odd result, it would seem the second literal is being placed right after the first in memory. As a bonus, try "Hello", "Hello" and you'll probably get a string of length 0. If anything in here is unclear, I recommend reading Filip's excellent answer.
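The loop described above can be imitated in a well-defined way by placing both words in one array, so that the two pointers really are related. This is only a sketch of what the compiler's layout might have looked like. Nothing guarantees that separate literals are adjacent in memory, and copy_until and adjacent_demo are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Copies every character in [first, last), including embedded '\0'
// characters. Both pointers must refer to the same array.
std::string copy_until(char const* first, char const* last) {
    std::string result;
    for (char const* it = first; it != last; ++it) {
        result.push_back(*it);
    }
    return result;
}

// Assumption for illustration: "Hello" and "there" stored back to back
// in one buffer, laid out as H e l l o \0 t h e r e \0.
std::string adjacent_demo() {
    static char const buffer[] = "Hello\0there";
    return copy_until(buffer, buffer + 6);  // buffer + 6 is where "there" starts
}
```

Under that layout the result is Hello followed by a null character, so its length is 6 rather than 5. Passing the same pointer as both first and last yields an empty string, which matches the "Hello", "Hello" remark when the compiler merges identical literals.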

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Tom Swirly | View Question on Stackoverflow
Solution 1 - C++ | Filip Roséen - refp | View Answer on Stackoverflow
Solution 2 - C++ | chris | View Answer on Stackoverflow