Initialize all the elements of an array to the same number

C++ Problem Overview

Some time ago my old teacher posted this code saying that it is another way to initialize an array to the same number (other than zero of course).

Three in this case.

He said that this way is slightly better than the for loop. Why do I need the left shift operator? Why do I need another array of long? I don't understand anything what's happening here.

int main() {

    short int A[100];

    long int v = 3;
    v = (v << 16) + 3;
    v = (v << 16) + 3;
    v = (v << 16) + 3;
    long *B = (long*)A;

    for(int i=0; i<25; i++)
        B[i] = v;

    cout << endl;
    print(A,100);
}

C++ Solutions

Solution 1 - C++

There are many ways to fill an array with the same value, and if you are concerned about performance then you need to measure.

C++ has a dedicated function for filling an array with a value, and I would use this (after #include <algorithm> and #include <iterator>):

std::fill(std::begin(A), std::end(A), 3);

You shouldn't underestimate what optimizing compilers can do with something like this.

If you are interested in seeing what the compiler does, then Matt Godbolt's Compiler Explorer is a very good tool if you're prepared to learn a little bit of assembler. As you can see from here, compilers can optimize the fill call to twelve (and a bit) 128-bit stores with any loops unrolled. Because compilers have knowledge of the target environment they can do this without encoding any target-specific assumptions in the source code.

Solution 2 - C++

He assumes that long is four times longer than short (that is not guaranteed; he should use int16_t and int64_t).

He takes that longer memory space (64 bits) and fills it with four short (16 bits) values. He is setting up the values by shifting bits by 16 spaces.

Then he wants to treat an array of shorts as an array of longs, so he can set up 100 16-bit values by doing only 25 loop iteration instead of 100.

That's the way your teacher thinks, but as others said this cast is undefined behavior.

Solution 3 - C++

What an absolute load of hogwash.

For starters, v will be computed at compile time.
The behaviour of dereferencing B following long *B = (long*)A; is undefined as the types are not related. B[i] is a dereference of B.
There's no justification whatsoever for the assumption that a long is four times larger than a short.

Use a for loop in the simple way and trust the compiler to optimise. Pretty please, with sugar on top.

Solution 4 - C++

The question has the C++ tag (no C tag), so this should be done in C++ style:

// C++ 03
std::vector<int> tab(100, 3);

// C++ 11
auto tab = std::vector<int>(100, 3);
auto tab2 = std::array<int, 100>{};
tab2.fill(3);

Also the teacher is trying outsmart the compiler which can do mind-blowing things. There is no point to do such tricks since the compiler can do it for you if configured properly:

As you can see, the -O2 result code is (almost) the same for each version. In case of -O1, tricks give some improvement.

So the bottom line, you have to make a choice:

Write hard-to-read code and do not use compiler optimizations
Write readable code and use -O2

Use the Godbolt site to experiment with other compilers and configurations. See also the latest cppCon talk.

Solution 5 - C++

As explained by other answers, the code violates type aliasing rules and makes assumptions that are not guaranteed by the standard.

If you really wanted to do this optimization by hand, this would be a correct way that has well-defined behaviour:

long v;
for(int i=0; i < sizeof v / sizeof *A; i++) {
    v = (v << sizeof *A * CHAR_BIT) + 3;
}

for(int i=0; i < sizeof A / sizeof v; i++) {
    std:memcpy(A + i * sizeof v, &v, sizeof v);
}

The unsafe assumptions about the sizes of the objects were fixed by the use of sizeof, and the aliasing violation was fixed by using std::memcpy, which has well-defined behaviour regardless of the underlying type.

That said, it's probably best to keep your code simple and let the compiler do its magic instead.

> Why I need the left shift operator?

The point is to fill a bigger integer with multiple copies of the smaller integer. If you write a two-byte value s to a big integer l, then shift the bits left for two bytes (my fixed version should be clearer about where those magic numbers came from) then you'll have an integer with two copies of the bytes that constitute the value s. This is repeated until all pairs of bytes in l are set to those same values. To do the shift, you need the shift operator.

When those values are copied over an array that contains an array of the two-byte integers, a single copy will set the value of multiple objects to the value of the bytes of the larger object. Since each pair of bytes has the same value, so will the smaller integers of the array.

> Why I need another array of long?

There are no arrays of long. Only an array of short.

Solution 6 - C++

The code your teacher has shown you is an ill-formed program, no diagnostic required, because it violates a requirement that pointers actually point to the thing they claim to be pointed to (otherwise known as "strict aliasing").

As a concrete example, a compiler can analyze your program, notice that A was not directly written to and that no short was written to, and prove that A was never changed once created.

All of that messing around with B can be proven, under the C++ standard, as not being able to modify A in a well formed program.

A for(;;) loop or even a ranged-for is likely to be optimized down to static initialization of A. Your teacher's code, under an optimizing compiler, will optimize to undefined behavior.

If you really need a way to create an array initialized with one value, you could use this:

template<std::size_t...Is>
auto index_over(std::index_sequence<Is...>) {
  return [](auto&&f)->decltype(auto) {
    return f( std::integral_constant<std::size_t, Is>{}... );
  };
}
template<std::size_t N>
auto index_upto(std::integral_constant<std::size_t, N> ={})
{
  return index_over( std::make_index_sequence<N>{} );
}
template<class T, std::size_t N, T value>
std::array<T, N> make_filled_array() {
  return index_upto<N>()( [](auto...Is)->std::array<T,N>{
    return {{ (void(Is),value)... }};
  });
}

and now:

int main() {

  auto A = make_filled_array<short, 100, 3>();

  std::cout << "\n";
  print(A.data(),100);
}

creates the filled array at compile time, no loops involved.

Using godbolt you can see that the array's value was computed at compile time, and the value 3 was extracted when I access the 50th element.

This is, however, overkill (and [tag:C++14]).

Solution 7 - C++

I think he is trying to reduce the number of loop iterations by copying multiple array elements at the same time. As other users already mentioned here, this logic would lead to undefined behavior.

If it is all about reducing iterations then with loop-unrolling we can reduce the number of iterations. But it won't be significantly faster for such smaller arrays.

int main() {

    short int A[100];

    for(int i=0; i<100; i+=4)
    {
        A[i] = 3;
        A[i + 1] = 3;
        A[i + 2] = 3;
        A[i + 3] = 3;
    }
    print(A, 100);
}

Content Type	Original Author	Original Content on Stackoverflow
Question	Cimmerian_Perspective	View Question on Stackoverflow
Solution 1 - C++	CB Bailey	View Answer on Stackoverflow
Solution 2 - C++	Paweł Dymowski	View Answer on Stackoverflow
Solution 3 - C++	Bathsheba	View Answer on Stackoverflow
Solution 4 - C++	Marek R	View Answer on Stackoverflow
Solution 5 - C++	eerorika	View Answer on Stackoverflow
Solution 6 - C++	Yakk - Adam Nevraumont	View Answer on Stackoverflow
Solution 7 - C++	nyemul	View Answer on Stackoverflow