How is Python's List Implemented?

PythonArraysListLinked ListPython Internals

Python Problem Overview


Is it a linked list, an array? I searched around and only found people guessing. My C knowledge isn't good enough to look at the source code.

Python Solutions


Solution 1 - Python

The C code is pretty simple, actually. Expanding one macro and pruning some irrelevant comments, the basic structure is in listobject.h, which defines a list as:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_HEAD contains a reference count and a type identifier. So, it's a vector/array that overallocates. The code for resizing such an array when it's full is in listobject.c. It doesn't actually double the array, but grows by allocating

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
new_allocated += newsize;

to the capacity each time, where newsize is the requested size (not necessarily allocated + 1 because you can extend by an arbitrary number of elements instead of append'ing them one by one).

See also the Python FAQ.

Solution 2 - Python

It's a dynamic array. Practical proof: Indexing takes (of course with extremely small differences (0.0013 µsecs!)) the same time regardless of index:

...>python -m timeit --setup="x = [None]*1000" "x[500]"
10000000 loops, best of 3: 0.0579 usec per loop

...>python -m timeit --setup="x = [None]*1000" "x[0]"
10000000 loops, best of 3: 0.0566 usec per loop

I would be astounded if IronPython or Jython used linked lists - they would ruin the performance of many many widely-used libraries built on the assumption that lists are dynamic arrays.

Solution 3 - Python

I would suggest Laurent Luce's article "Python list implementation". Was really useful for me because the author explains how the list is implemented in CPython and uses excellent diagrams for this purpose.

> List object C structure

>A list object in CPython is represented by the following C structure. ob_item is a list of pointers to the list elements. allocated is the number of slots allocated in memory.

> C typedef struct { PyObject_VAR_HEAD PyObject **ob_item; Py_ssize_t allocated; } PyListObject; >

>It is important to notice the difference between allocated slots and the size of the list. The size of a list is the same as len(l). The number of allocated slots is what has been allocated in memory. Often, you will see that allocated can be greater than size. This is to avoid needing calling realloc each time a new elements is appended to the list.

...

>Append

>We append an integer to the list: l.append(1). What happens?
>enter image description here

>We continue by adding one more element: l.append(2). list_resize is called with n+1 = 2 but because the allocated size is 4, there is no need to allocate more memory. Same thing happens when we add 2 more integers: l.append(3), l.append(4). The following diagram shows what we have so far.

>enter image description here

...

>Insert

>Let’s insert a new integer (5) at position 1: l.insert(1,5) and look at what happens internally. enter image description here

...

>Pop

>When you pop the last element: l.pop(), listpop() is called. list_resize is called inside listpop() and if the new size is less than half of the allocated size then the list is shrunk.enter image description here

>You can observe that slot 4 still points to the integer but the important thing is the size of the list which is now 4. Let’s pop one more element. In list_resize(), size – 1 = 4 – 1 = 3 is less than half of the allocated slots so the list is shrunk to 6 slots and the new size of the list is now 3.

>You can observe that slot 3 and 4 still point to some integers but the important thing is the size of the list which is now 3.enter image description here

...

>Remove Python list object has a method to remove a specific element: l.remove(5).enter image description here

Solution 4 - Python

This is implementation dependent, but IIRC:

  • CPython uses an array of pointers
  • Jython uses an ArrayList
  • IronPython apparently also uses an array. You can browse the source code to find out.

Thus they all have O(1) random access.

Solution 5 - Python

In CPython, lists are arrays of pointers. Other implementations of Python may choose to store them in different ways.

Solution 6 - Python

According to the [documentation][1],

> Python’s lists are really variable-length arrays, not Lisp-style linked lists.

[1]: http://docs.python.org/faq/design.html#how-are-lists-implemented "How are lists implemented"

Solution 7 - Python

As others have stated above, the lists (when appreciably large) are implemented by allocating a fixed amount of space, and, if that space should fill, allocating a larger amount of space and copying over the elements.

To understand why the method is O(1) amortized, without loss of generality, assume we have inserted a = 2^n elements, and we now have to double our table to 2^(n+1) size. That means we're currently doing 2^(n+1) operations. Last copy, we did 2^n operations. Before that we did 2^(n-1)... all the way down to 8,4,2,1. Now, if we add these up, we get 1 + 2 + 4 + 8 + ... + 2^(n+1) = 2^(n+2) - 1 < 4*2^n = O(2^n) = O(a) total insertions (i.e. O(1) amortized time). Also, it should be noted that if the table allows deletions the table shrinking has to be done at a different factor (e.g 3x)

Solution 8 - Python

A list in Python is something like an array, where you can store multiple values. List is mutable that means you can change it. The more important thing you should know, when we create a list, Python automatically creates a reference_id for that list variable. If you change it by assigning others variable the main list will be change. Let's try with a example:

list_one = [1,2,3,4]

my_list = list_one

#my_list: [1,2,3,4]

my_list.append("new")

#my_list: [1,2,3,4,'new']
#list_one: [1,2,3,4,'new']

We append my_list but our main list has changed. That mean's list didn't assign as a copy list assign as its reference.

Solution 9 - Python

In CPython list is implemented as dynamic array, and therefore when we append at that time not only one macro is added but some more space is allocated so that everytime new space should not be added.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionGregView Question on Stackoverflow
Solution 1 - PythonFred FooView Answer on Stackoverflow
Solution 2 - Pythonuser395760View Answer on Stackoverflow
Solution 3 - PythonLesyaView Answer on Stackoverflow
Solution 4 - PythonNullUserExceptionView Answer on Stackoverflow
Solution 5 - PythonAmberView Answer on Stackoverflow
Solution 6 - Pythonravi77oView Answer on Stackoverflow
Solution 7 - PythonRussellStewartView Answer on Stackoverflow
Solution 8 - PythonhasibView Answer on Stackoverflow
Solution 9 - Pythongaurav View Answer on Stackoverflow