Why are slice and range upper-bound exclusive?

Python, Language Design, Slice

Python Problem Overview


Disclaimer: I am not asking if the upper-bound stop argument of slice() and range() is exclusive or how to use these functions.

Calls to the range and slice functions, as well as the slice notation [start:stop], all refer to sets of integers.

range([start], stop[, step])
slice([start], stop[, step])

In all these, the stop integer is excluded.

I am wondering why the language is designed this way.

Is it to make stop equal to the number of elements in the represented integer set when start equals 0 or is omitted?

Is it to have:

for i in range(start, stop):

look like the following C code?

for (i = start ; i < stop; i++) {
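
For concreteness, both properties are easy to verify with a quick throwaway check:

# stop equals the number of elements when start is 0 or omitted.
assert len(range(5)) == 5
assert list(range(5)) == [0, 1, 2, 3, 4]

# range(start, stop) visits exactly the values i with start <= i < stop,
# just like the C loop for (i = start; i < stop; i++).
assert list(range(2, 5)) == [2, 3, 4]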

Python Solutions


Solution 1 - Python

The documentation implies this has a few useful properties:

word[:2]    # The first two characters
word[2:]    # Everything except the first two characters

> Here’s a useful invariant of slice operations: s[:i] + s[i:] equals s.

> For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of word[1:3] is 2.

I think we can assume that the range functions act the same for consistency.
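Both documented properties can be checked with a small throwaway example:

word = "Python"

# s[:i] + s[i:] equals s for any i.
assert word[:2] + word[2:] == word

# For in-bounds, non-negative indices, the slice length is the
# difference of the indices.
assert len(word[1:3]) == 3 - 1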

Solution 2 - Python

Here's the opinion of some Google+ user:

> [...] I was swayed by the elegance of half-open intervals. Especially the invariant that when two slices are adjacent, the first slice's end index is the second slice's start index is just too beautiful to ignore. For example, suppose you split a string into three parts at indices i and j -- the parts would be a[:i], a[i:j], and a[j:].

Google+ has since shut down, so the link no longer works. Spoiler alert: that was Guido van Rossum.
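
The three-way split he describes, written out as a small sketch (the string and indices here are just illustrative):

a = "hello world"
i, j = 3, 7

# Adjacent slices share an index: no overlap, no gap.
assert a[:i] + a[i:j] + a[j:] == a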

Solution 3 - Python

Elegance vs. Obviousness

To be honest, I find Python's way of slicing quite counter-intuitive: it buys the so-called elegance at the cost of more brain-processing. That is why this StackOverflow question has more than 2,000 upvotes; I think a lot of people simply don't understand it initially.

Just as an example, the following code has already caused headaches for a lot of Python newbies.

x = [1,2,3,4]
print(x[0:1])
# Output is [1]

Not only is it hard to process, it is also hard to explain properly. For example, the explanation of the code above would be "take from the zeroth element until the element before the first element".

Now look at Ruby, which uses inclusive upper bounds.

x = [1,2,3,4]
p x[0..1]
# Output is [1, 2]

To be frank, I really think the Ruby way of slicing is easier on the brain.

Of course, when you are splitting a list into 2 parts based on an index, the exclusive upper bound approach would result in better-looking code.

# Python
x = [1,2,3,4]
pivot = 2
print(x[:pivot]) # [1,2]
print(x[pivot:]) # [3,4]

Now let's look at the inclusive upper-bound approach.

# Ruby
x = [1,2,3,4]
pivot = 2
p x[0..(pivot-1)] # [1, 2]
p x[pivot..-1] # [3, 4]

Obviously, the code is less elegant, but there's not much brain-processing to be done here.

Conclusion

In the end, it's really a matter of elegance vs. obviousness, and the designers of Python preferred elegance over obviousness. Why? Because the Zen of Python states that "Beautiful is better than ugly."

Solution 4 - Python

A bit late to this question; nonetheless, this attempts to answer the "why" part of it:

Part of the reason is because we use zero-based indexing/offsets when addressing memory.

The easiest example is an array. Think of an "array of 6 items" as a location to store 6 data items. If this array's start location is at memory address 100, then data, let's say the 6 characters 'apple\0', are stored like this:

memory/
array      contains
location   data
 100   ->   'a'
 101   ->   'p'
 102   ->   'p'
 103   ->   'l'
 104   ->   'e'
 105   ->   '\0'

So for 6 items, the addresses go from 100 to 105. Addresses are generated using base + offset, so the first item is at base memory location 100 + offset 0 (i.e., 100 + 0), the second at 100 + 1, the third at 100 + 2, ..., until 100 + 5, which is the last location.

This is the primary reason we use zero-based indexing, and it leads to language constructs such as for loops in C:

for (int i = 0; i < LIMIT; i++)

or in Python:

for i in range(LIMIT):

When you program in a language like C where you deal with pointers more directly, or assembly even more so, this base+offset scheme becomes much more obvious.

Because of the above, many language constructs automatically use this range of offsets from 0 to length - 1.

You might find this article on Zero-based numbering on Wikipedia interesting, and also this question from Software Engineering SE.

Example:

In C, for instance, if you have an array ar and you subscript it as ar[3], that is really equivalent to taking the (base) address of the array ar and adding 3 to it, i.e., *(ar + 3). This can lead to code like the following for printing the contents of an array, showing the simple base + offset approach:

for(i = 0; i < 5; i++)
   printf("%c\n", *(ar + i));

which is really equivalent to

for(i = 0; i < 5; i++)
   printf("%c\n", ar[i]);

Solution 5 - Python

Here is another reason why an exclusive upper bound is a saner approach:

Suppose you wished to write a function that applies some transform to a subsequence of items in a list. If intervals used an inclusive upper bound, you might naively try writing it as:

def apply_range_bad(lst, transform, start, end):
    """Applies a transform on the elements of a list in the range [start, end]"""
    left = lst[0 : start-1]
    middle = lst[start : end]
    right = lst[end+1 :]
    return left + [transform(i) for i in middle] + right

At first glance, this seems straightforward and correct, but unfortunately it is subtly wrong.

What would happen if:

  • start == 0
  • end == 0
  • end < 0

In general, there might be even more boundary cases that you should consider. Who wants to waste time thinking about all of those? (These problems arise because with inclusive lower and upper bounds there is no inherent way to express an empty interval.)
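
To make the first boundary case concrete, here is what the naive left slice from apply_range_bad evaluates to in actual Python when start == 0:

lst = [1, 2, 3, 4]
start = 0

# Intended: the part before an inclusive range starting at 0 should be empty.
# But start - 1 == -1, and a negative index counts from the end of the list,
# so lst[0:start - 1] silently drops just the last element instead.
left = lst[0:start - 1]
print(left)  # [1, 2, 3], not [] as intended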

Instead, by using a model where upper bounds are exclusive, dividing a list into separate slices is simpler, more elegant, and thus less error-prone:

def apply_range_good(lst, transform, start, end):
    """Applies a transform on the elements of a list in the range [start, end)"""
    left = lst[0:start]
    middle = lst[start:end]
    right = lst[end:]
    return left + [transform(i) for i in middle] + right

(Note that apply_range_good does not transform lst[end]; it, too, treats end as an exclusive upper bound. Trying to make it use an inclusive upper bound would still have some of the problems I mentioned earlier. The moral is that inclusive upper bounds are usually troublesome.)
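
As a quick usage check (reusing the apply_range_good definition above), note how an empty interval falls out naturally when start == end:

# Transform the elements at indices 1 and 2 only.
print(apply_range_good([1, 2, 3, 4], lambda n: n * 10, 1, 3))  # [1, 20, 30, 4]

# start == end expresses an empty interval: nothing is transformed.
print(apply_range_good([1, 2, 3, 4], lambda n: n * 10, 2, 2))  # [1, 2, 3, 4]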

(Mostly adapted from an old post of mine about inclusive upper-bounds in another scripting language.)

Solution 6 - Python

This upper-bound exclusion greatly improves code comprehension. I hope it comes to other languages.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author   Original Content on Stackoverflow
Question              wap26             View Question on Stackoverflow
Solution 1 - Python   Toomai            View Answer on Stackoverflow
Solution 2 - Python   Nigel Tufnel      View Answer on Stackoverflow
Solution 3 - Python   Wong Jia Hau      View Answer on Stackoverflow
Solution 4 - Python   Levon             View Answer on Stackoverflow
Solution 5 - Python   jamesdlin         View Answer on Stackoverflow
Solution 6 - Python   telepinu          View Answer on Stackoverflow