Why are Standard iterator ranges [begin, end) instead of [begin, end]?

C++StlIteratorStandards

C++ Problem Overview


Why does the Standard define end() as one past the end, instead of at the actual end?

C++ Solutions


Solution 1 - C++

The best argument easily is the one made by Dijkstra himself:

  • You want the size of the range to be a simple difference end − begin;

  • including the lower bound is more "natural" when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a "one-before-the-beginning" sentinel value.

You still need to justify why you start counting at zero rather than one, but that wasn't part of your question.

The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-ones and extremely unpleasant and noisy code. For example, consider a partition [n0, n1)[n1, n2)[n2,n3). Another example is the standard iteration loop for (it = begin; it != end; ++it), which runs end - begin times. The corresponding code would be much less readable if both ends were inclusive – and imagine how you'd handle empty ranges.

Finally, we can also make a nice argument why counting should start at zero: With the half-open convention for ranges that we just established, if you are given a range of N elements (say to enumerate the members of an array), then 0 is the natural "beginning" so that you can write the range as [0, N), without any awkward offsets or corrections.

In a nutshell: the fact that we don't see the number 1 everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.

Solution 2 - C++

Actually, a lot of iterator related stuff suddenly makes much more sense if you consider the iterators not pointing at the elements of the sequence but in between, with dereferencing accessing the next element right to it. Then the "one past end" iterator suddenly makes immediate sense:

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^               ^
   |               |
 begin            end

Obviously begin points to the beginning of the sequence, and end points to the end of the same sequence. Dereferencing begin accesses the element A, and dereferencing end makes no sense because there's no element right to it. Also, adding an iterator i in the middle gives

   +---+---+---+---+
   | A | B | C | D |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
 begin     i      end

and you immediately see that the range of elements from begin to i contains the elements A and B while the range of elements from i to end contains the elements C and D. Dereferencing i gives the element right of it, that is the first element of the second sequence.

Even the "off-by-one" for reverse iterators suddenly becomes obvious that way: Reversing that sequence gives:

   +---+---+---+---+
   | D | C | B | A |
   +---+---+---+---+
   ^       ^       ^
   |       |       |
rbegin     ri     rend
 (end)    (i)   (begin)

I've written the corresponding non-reverse (base) iterators in parentheses below. You see, the reverse iterator belonging to i (which I've named ri) still points in between elements B and C. However due to reversing the sequence, now element B is on the right to it.

Solution 3 - C++

Why does the Standard define end() as one past the end, instead of at the actual end?

Because:

  1. It avoids special handling for empty ranges. For empty ranges, begin() is equal to end() &
  2. It makes the end criterion simple for loops that iterate over the elements: The loops simply continue as long as end() is not reached.

Solution 4 - C++

Because then

size() == end() - begin()   // For iterators for whom subtraction is valid

and you won't have to do awkward things like

// Never mind that this is INVALID for input iterators...
bool empty() { return begin() == end() + 1; }

and you won't accidentally write erroneous code like

bool empty() { return begin() == end() - 1; }    // a typo from the first version
                                                 // of this post
                                                 // (see, it really is confusing)

bool empty() { return end() - begin() == -1; }   // Signed/unsigned mismatch
// Plus the fact that subtracting is also invalid for many iterators

Also: What would find() return if end() pointed to a valid element?
Do you really want another member called invalid() which returns an invalid iterator?!
Two iterators is already painful enough...

Oh, and see this related post.


Also:

If the end was before the last element, how would you insert() at the true end?!

Solution 5 - C++

The iterator idiom of half-closed ranges [begin(), end()) is originally based on pointer arithmetic for plain arrays. In that mode of operation, you would have functions that were passed an array and a size.

void func(int* array, size_t size)

Converting to half-closed ranges [begin, end) is very simple when you have that information:

int* begin;
int* end = array + size;

for (int* it = begin; it < end; ++it) { ... }

To work with fully-closed ranges, it's harder:

int* begin;
int* end = array + size - 1;

for (int* it = begin; it <= end; ++it) { ... }

Since pointers to arrays are iterators in C++ (and the syntax was designed to allow this), it's much easier to call std::find(array, array + size, some_value) than it is to call std::find(array, array + size - 1, some_value).


Plus, if you work with half-closed ranges, you can use the != operator to check for the end condition, becuase (if your operators are defined correctly) < implies !=.

for (int* it = begin; it != end; ++ it) { ... }

However there's no easy way to do this with fully-closed ranges. You're stuck with <=.

The only kind of iterator that supports < and > operations in C++ are random-access iterators. If you had to write a <= operator for every iterator class in C++, you'd have to make all of your iterators fully comparable, and you'd fewer choices for creating less capable iterators (such as the bidirectional iterators on std::list, or the input iterators that operate on iostreams) if C++ used fully-closed ranges.

Solution 6 - C++

With the end() pointing one past the end, it is easy to iterate a collection with a for loop:

for (iterator it = collection.begin(); it != collection.end(); it++)
{
    DoStuff(*it);
}

With end() pointing to the last element, a loop would be more complex:

iterator it = collection.begin();
while (!collection.empty())
{
    DoStuff(*it);
  
    if (it == collection.end())
        break;
  
    it++;
}

Solution 7 - C++

  1. If a container is empty, begin() == end().
  2. C++ Programmers tend to use != instead of < (less than) in loop conditions, therefore having end() pointing to a position one off-the-end is convenient.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionPuppyView Question on Stackoverflow
Solution 1 - C++Kerrek SBView Answer on Stackoverflow
Solution 2 - C++celtschkView Answer on Stackoverflow
Solution 3 - C++Alok SaveView Answer on Stackoverflow
Solution 4 - C++user541686View Answer on Stackoverflow
Solution 5 - C++Ken BloomView Answer on Stackoverflow
Solution 6 - C++Anders AbelView Answer on Stackoverflow
Solution 7 - C++Andreas DMView Answer on Stackoverflow