Under what circumstances are linked lists useful?

Tags: Language Agnostic, Data Structures, Linked List

Language Agnostic Problem Overview


Most times I see people try to use linked lists, it seems to me like a poor (or very poor) choice. Perhaps it would be useful to explore the circumstances under which a linked list is or is not a good choice of data structure.

Ideally, answers would expound on the criteria to use in selecting a data structure, and which data structures are likely to work best under specified circumstances.

Edit: I must say, I'm quite impressed by not only the number, but the quality of answers. I can only accept one, but there are two or three more I'd have to say would have been worth accepting if something a bit better hadn't been there. Only a couple (especially the one I ended up accepting) pointed to situations where a linked list provided a real advantage. I do think Steve Jessop deserves some sort of honorable mention for coming up with not just one, but three different answers, all of which I found quite impressive. Of course, even though it was posted only as a comment, not an answer, I think Neil's blog entry is well worth reading as well -- not only informative, but quite entertaining as well.

Language Agnostic Solutions


Solution 1 - Language Agnostic

Linked lists are very useful when you need to do a lot of insertions and removals, but not too much searching, on a list of arbitrary (unknown at compile-time) length.

Splitting and joining (bidirectionally-linked) lists is very efficient.

You can also combine linked lists - e.g. tree structures can be implemented as "vertical" linked lists (parent/child relationships) connecting together horizontal linked lists (siblings).

Using an array based list for these purposes has severe limitations:

  • Adding a new item means the array must be reallocated (or you must allocate more space than you need, to allow for future growth and reduce the number of reallocations)
  • Removing items leaves wasted space or requires a reallocation
  • Inserting items anywhere except the end involves (possibly reallocating and) copying lots of the data up one position
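A quick sketch of that joining efficiency using std::list (the helper name is illustrative): splicing one doubly-linked list onto another only relinks the boundary nodes, no matter how many elements are involved.

```cpp
#include <cassert>
#include <list>

// Joining two doubly-linked lists is O(1): splice relinks the
// boundary nodes, so no element is copied or reallocated.
std::list<int> join(std::list<int> a, std::list<int> b) {
    a.splice(a.end(), b);   // b is left empty afterwards
    return a;
}
```

Splitting works the same way in reverse: relink a few pointers and one list becomes two.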

Solution 2 - Language Agnostic

They can be useful for concurrent data structures. (There is now a non-concurrent real-world usage sample below - that would not be there if @Neil hadn't mentioned FORTRAN. ;-)

For example, ConcurrentDictionary<TKey, TValue> in .NET 4.0 RC uses linked lists to chain items that hash to the same bucket.

The underlying data structure for ConcurrentStack<T> is also a linked list.

ConcurrentStack<T> is one of the data structures that serve as the foundation for the new Thread Pool (with the local "queues" essentially implemented as stacks). (The other main supporting structure is ConcurrentQueue<T>.)

The new Thread Pool in turn provides the basis for the work scheduling of the new Task Parallel Library.

So they can certainly be useful - a linked list is currently serving as one of the main supporting structures of at least one great new technology.

(A singly-linked list makes a compelling lock-free - but not wait-free - choice in these cases, because the main operations can be carried out with a single CAS (+ retries). In a modern GC-ed environment - such as Java or .NET - the ABA problem can easily be avoided: just wrap the items you add in freshly created nodes and do not reuse those nodes - let the GC do its work. The page on the ABA problem also provides the implementation of a lock-free stack that actually works in .NET (and Java) with a (GC-ed) Node holding the items.)

Edit: @Neil: actually, what you mentioned about FORTRAN reminded me that the same kind of linked lists can be found in probably the most used and abused data structure in .NET: the plain .NET generic Dictionary<TKey, TValue>.

Not one, but many linked lists are stored in an array.

  • It avoids doing many small (de)allocations on inserts/deletes.
  • Initial loading of the hash table is pretty fast, because the array is filled sequentially (plays very nice with CPU cache).
  • Not to mention that a chaining hash table is expensive in terms of memory - and this "trick" cuts "pointer sizes" in half on x64.

Essentially, many linked lists are stored in an array (one for each bucket used). A free list of reusable nodes is "interwoven" between them (if there have been deletes). An array is allocated at the start/on rehash, and the nodes of the chains are kept in it. There is also a free pointer - an index into the array - that follows deletes. ;-) So - believe it or not - the FORTRAN technique still lives on (...and nowhere else than in one of the most commonly used .NET data structures ;-).
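A minimal sketch of that layout - not the actual .NET source, just an illustration of the technique, here as a set of ints in C++: all chain nodes live in one array, buckets hold indices, and deleted slots are threaded into a free list through the same `next` field.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Chained hash set with all chains packed into one array, plus an
// interwoven free list of deleted slots (illustrative sketch).
struct IntSet {
    struct Entry { int key; int next; };   // next: index of next entry, -1 = end
    std::vector<int>   buckets;            // first entry index per bucket, -1 = empty
    std::vector<Entry> entries;            // all chain nodes, packed sequentially
    int free_head = -1;                    // head of the free list, -1 = none

    explicit IntSet(std::size_t n) : buckets(n, -1) {}

    std::size_t bucket_of(int key) const { return std::hash<int>{}(key) % buckets.size(); }

    bool contains(int key) const {
        for (int i = buckets[bucket_of(key)]; i != -1; i = entries[i].next)
            if (entries[i].key == key) return true;
        return false;
    }

    void insert(int key) {
        if (contains(key)) return;
        std::size_t b = bucket_of(key);
        int slot;
        if (free_head != -1) {             // reuse a deleted slot
            slot = free_head;
            free_head = entries[slot].next;
            entries[slot] = {key, buckets[b]};
        } else {                           // append sequentially (cache friendly)
            slot = (int)entries.size();
            entries.push_back({key, buckets[b]});
        }
        buckets[b] = slot;
    }

    void erase(int key) {
        std::size_t b = bucket_of(key);
        int prev = -1;
        for (int i = buckets[b]; i != -1; prev = i, i = entries[i].next) {
            if (entries[i].key != key) continue;
            if (prev == -1) buckets[b] = entries[i].next;
            else entries[prev].next = entries[i].next;
            entries[i].next = free_head;   // push the slot onto the free list
            free_head = i;
            return;
        }
    }
};
```

Note how inserts never allocate individually: either a recycled slot is popped off the free list, or the shared array grows at the end.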

Solution 3 - Language Agnostic

Linked lists are very flexible: With the modification of one pointer, you can make a massive change, where the same operation would be very inefficient in an array list.

Solution 4 - Language Agnostic

Arrays are the data structures to which Linked Lists are usually compared.

Normally, linked lists are useful when you have to make many modifications to the list itself, while arrays perform better than lists on direct element access.

Here's a list of operations that can be performed on lists and arrays, compared with the relative operation cost (n = list/array length):

  • Adding an element:
  • on lists you just need to allocate memory for the new element and redirect pointers. O(1)
  • on arrays you have to reallocate the array. O(n)
  • Removing an element:
  • on lists you just redirect pointers. O(1)
  • on arrays you spend O(n) time shifting elements if the element to remove is not the first or the last element of the array; otherwise you can simply move the pointer to the start of the array or decrease the array length
  • Getting an element in a known position:
  • on lists you have to walk the list from the first element to the element at the given position. Worst case: O(n)
  • on arrays you can access the element immediately. O(1)

This is a very low-level comparison of these two popular and basic data structures: you can see that lists perform better in situations where you have to make a lot of modifications to the list itself (removing or adding elements). On the other hand, arrays perform better than lists when you have to access the elements directly.
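The costs listed above can be seen directly in code (a small illustrative sketch; the helper names are made up): inserting at a known position in a list only relinks pointers and invalidates nothing, while a vector insert shifts every later element and may reallocate.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Insert a value just before the last element.
void insert_before_last(std::list<int>& l, int x) {
    auto it = std::prev(l.end());   // iterator to the last element
    l.insert(it, x);                // O(1): relinks two pointers, nothing moves
}

void insert_before_last(std::vector<int>& v, int x) {
    v.insert(v.end() - 1, x);       // O(n): shifts the tail, may reallocate
}
```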

From the point of view of memory allocation, lists are better because there's no need to have all the elements next to each other. On the other hand, there's the (small) overhead of storing the pointers to the next (or even to the previous) element.

Knowing these differences is important to developers to choose between lists and arrays in their implementations.

Note that this is a comparison of lists and arrays. There are good solutions to the problems reported here (e.g. skip lists, dynamic arrays, etc.). In this answer I've taken into account the basic data structures every programmer should know about.

Solution 5 - Language Agnostic

They're useful when you need high-speed push, pop and rotate, and don't mind O(n) indexing.
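A rough sketch of what such an O(1) rotate looks like with std::list (the helper name is made up for illustration): the front node is moved to the back by relinking, without copying any element.

```cpp
#include <cassert>
#include <list>

// Rotate a list by one position: relink the front node to the back.
template <typename T>
void rotate_once(std::list<T>& l) {
    if (l.size() > 1)
        l.splice(l.end(), l, l.begin());   // O(1), no element is copied
}
```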

Solution 6 - Language Agnostic

Singly-linked list is a good choice for the free list in a cell allocator or object pool:

  1. You only need a stack, so a singly-linked list is sufficient.
  2. Everything is divided into nodes already. There is no allocation overhead for an intrusive list node, provided the cells are large enough to contain a pointer.
  3. A vector or deque would impose an overhead of one pointer per block. This is significant given that when you first create the heap, all cells are free, so it's an up-front cost. In the worst case it doubles the memory requirement per cell.
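A minimal sketch of such a free list (illustrative, not any particular allocator's code): the pointer to the next free cell is stored inside the free cell itself, so the list costs no extra memory while cells are in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-size cell pool: free cells form an intrusive singly-linked
// stack, with the link stored inside each free cell.
class Pool {
    union Cell { Cell* next; char storage[64]; };  // cell must fit a pointer
    std::vector<Cell> cells;
    Cell* free_head;
public:
    explicit Pool(std::size_t n) : cells(n) {
        free_head = nullptr;
        for (Cell& c : cells) { c.next = free_head; free_head = &c; }
    }
    void* allocate() {                 // pop from the free stack, O(1)
        if (!free_head) return nullptr;
        Cell* c = free_head;
        free_head = c->next;
        return c;
    }
    void deallocate(void* p) {         // push back onto the stack, O(1)
        Cell* c = static_cast<Cell*>(p);
        c->next = free_head;
        free_head = c;
    }
};
```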

Solution 7 - Language Agnostic

Doubly-linked list is a good choice to define the ordering of a hashmap which also defines an order on the elements (LinkedHashMap in Java), especially when ordered by last access:

  1. More memory overhead than an associated vector or deque (2 pointers instead of 1), but better insert/remove performance.
  2. No allocation overhead, since you need a node for a hash entry anyway.
  3. Locality of reference is no additional problem compared with a vector or deque of pointers, since you'd have to pull each object into memory either way.

Sure, you can argue about whether an LRU cache is a good idea in the first place, compared with something more sophisticated and tuneable, but if you're going to have one, this is a fairly decent implementation. You do not want to perform a delete-from-middle-and-add-to-the-end on a vector or deque at every read access, but moving a node to the tail is typically fine.

Solution 8 - Language Agnostic

Linked lists are one of the natural choices when you cannot control where your data is stored, but you still need to somehow get from one object to the next.

For example, when implementing memory tracking in C++ (a new/delete replacement), you either need some control data structure that keeps track of which pointers have been freed - which you have to implement entirely yourself - or you over-allocate and add a linked-list header to the beginning of each data chunk.

Because you always immediately know where you are in the list when delete is called, you can easily give up memory in O(1). Adding a chunk that has just been malloc-ed is also O(1). Walking the list is very rarely needed in this case, so the O(n) cost is not an issue here (walking a structure is O(n) anyway).
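A sketch of that scheme (all names invented for illustration): each allocation is over-allocated by a header that links it into a doubly-linked list of live blocks, so freeing unlinks in O(1) with no lookup structure at all.

```cpp
#include <cstddef>
#include <cstdlib>

// Each tracked block carries a header linking it into a global
// doubly-linked list of live allocations.
struct Header {
    Header* prev;
    Header* next;
    std::size_t size;
};

Header* live_head = nullptr;
std::size_t live_bytes = 0;

void* tracked_malloc(std::size_t n) {
    Header* h = static_cast<Header*>(std::malloc(sizeof(Header) + n));
    if (!h) return nullptr;
    h->size = n;
    h->prev = nullptr;
    h->next = live_head;                   // push onto the live list, O(1)
    if (live_head) live_head->prev = h;
    live_head = h;
    live_bytes += n;
    return h + 1;                          // user data starts after the header
}

void tracked_free(void* p) {
    if (!p) return;
    Header* h = static_cast<Header*>(p) - 1;
    if (h->prev) h->prev->next = h->next;  // unlink in O(1)
    else live_head = h->next;
    if (h->next) h->next->prev = h->prev;
    live_bytes -= h->size;
    std::free(h);
}
```

At any point, `live_bytes` (or a walk of `live_head`) reports what has leaked so far.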

Solution 9 - Language Agnostic

Singly-linked lists are the obvious implementation of the common "list" data type in functional programming languages:

  1. Adding to the head is fast, and (append (list x) (L)) and (append (list y) (L)) can share almost all of their data. No need for copy-on-write in a language with no writes. Functional programmers know how to take advantage of this.
  2. Adding to the tail is unfortunately slow, but so would be any other implementation.

By comparison, a vector or deque would typically be slow to add at either end, requiring (at least in my example of two distinct appends) that a copy be taken of the entire list (vector), or of the index block and the data block being appended to (deque). Actually, there may be something to be said there for deques on large lists which do need to add at the tail for some reason; I'm not sufficiently informed about functional programming to judge.
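To illustrate the sharing, here is a C++ sketch of cons cells (functional languages provide this natively; the names are illustrative): two lists built by prepending to the same tail share that entire tail.

```cpp
#include <cassert>
#include <memory>

// Persistent cons cells: prepending never copies, so lists built on
// the same tail share all of it.
struct Node {
    int head;
    std::shared_ptr<const Node> tail;
};
using List = std::shared_ptr<const Node>;

List cons(int x, List tail) {
    return std::make_shared<const Node>(Node{x, std::move(tail)});
}
```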

Solution 10 - Language Agnostic

One example of a good use for a linked list is where the list elements are very large, i.e. large enough that only one or two can fit in the CPU cache at the same time. At this point the advantage that contiguous-block containers like vectors or arrays have for iteration is more or less nullified, and a performance advantage may be possible if many insertions and removals are occurring in real time.

Solution 11 - Language Agnostic

One of the most useful cases I find for linked lists, working in performance-critical fields like mesh and image processing, physics engines, and raytracing, is when using linked lists actually improves locality of reference and reduces heap allocations - and sometimes even reduces memory use - compared to the straightforward alternatives.

Now it can seem like a complete oxymoron that linked lists could do all that, since they're notorious for often doing the opposite, but they have a unique property: each list node has a fixed size and alignment, which we can exploit to store nodes contiguously and remove them in constant time in ways that variable-sized things cannot be.

As a result, let's take a case where we want to store the equivalent of a variable-length sequence containing a million nested variable-length sub-sequences. A concrete example is an indexed mesh storing a million polygons (some triangles, some quads, some pentagons, some hexagons, etc.), where sometimes polygons are removed from anywhere in the mesh and sometimes polygons are rebuilt to insert a vertex into an existing polygon or remove one. In that case, if we store a million tiny std::vectors, we end up facing a heap allocation for every single vector as well as potentially explosive memory use. A million tiny SmallVectors might not suffer this problem as much in common cases, but their preallocated buffers, which aren't separately heap-allocated, might still cause explosive memory use.

The problem here is that a million std::vector instances would be trying to store a million variable-length things. Variable-length things tend to want a heap allocation since they cannot very effectively be stored contiguously and removed in constant-time (at least in a straightforward way without a very complex allocator) if they didn't store their contents elsewhere on the heap.

If, instead, we do this:

struct FaceVertex
{
    // Points to next vertex in polygon or -1
    // if we're at the end of the polygon.
    int next;
    ...
};

struct Polygon
{
     // Points to first vertex in polygon.
    int first_vertex;
    ...
};

struct Mesh
{
    // Stores all the face vertices for all polygons.
    std::vector<FaceVertex> fvs;

    // Stores all the polygons.
    std::vector<Polygon> polys;
};

... then we've dramatically reduced the number of heap allocations and cache misses. Instead of requiring a heap allocation and potentially compulsory cache misses for every single polygon we access, we now only require that heap allocation when one of the two vectors stored in the entire mesh exceeds their capacity (an amortized cost). And while the stride to get from one vertex to the next might still cause its share of cache misses, it's still often less than if every single polygon stored a separate dynamic array since the nodes are stored contiguously and there's a probability that a neighboring vertex might be accessed prior to eviction (especially considering that many polygons will add their vertices all at once which makes the lion's share of polygon vertices perfectly contiguous).

Here is another example:

[Image: a uniform grid where each cell stores only the index of the first particle in that cell, and each particle stores the index of the next particle in the same cell]

... where the grid cells are used to accelerate particle-particle collision for, say, 16 million particles moving every single frame. In that particle grid example, using linked lists we can move a particle from one grid cell to another by just changing 3 indices. Erasing from a vector and pushing back to another can be considerably more expensive and introduce more heap allocations. The linked lists also reduce the memory of a cell down to 32-bits. A vector, depending on implementation, can preallocate its dynamic array to the point where it can take 32 bytes for an empty vector. If we have around a million grid cells, that's quite a difference.

... and this is where I find linked lists most useful these days, and I specifically find the "indexed linked list" variety useful since 32-bit indices halve the memory requirements of the links on 64-bit machines and they imply that the nodes are stored contiguously in an array.

Often I also combine them with indexed free lists to allow constant-time removals and insertions anywhere:

[Image: an indexed free list, with removed nodes linked together through their next indices for constant-time reclamation]

In that case, the next index either points to the next free index if the node has been removed or the next used index if the node has not been removed.

And this is the number one use case I find for linked lists these days. When we want to store, say, a million variable-length sub-sequences averaging, say, 4 elements each (but sometimes with elements being removed and added to one of these sub-sequences), the linked list allows us to store 4 million linked list nodes contiguously instead of 1 million containers which are each individually heap-allocated: one giant vector, i.e., not a million small ones.
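A minimal sketch of such an indexed linked list with an interwoven free list (illustrative names): all nodes live contiguously in one vector, 32-bit indices serve as links, and the `next` field of a removed node doubles as the free-list link.

```cpp
#include <cassert>
#include <vector>

// Indexed linked list with an interwoven free list: chains of nodes
// packed into a single vector, with removed slots recycled in O(1).
struct IndexedList {
    struct Node { int value; int next; };  // next: node index, -1 = end of chain
    std::vector<Node> nodes;
    int free_head = -1;                    // head of the free list, -1 = none

    // Push a value onto the front of a chain; returns the new head index. O(1).
    int push(int chain_head, int value) {
        int i;
        if (free_head != -1) {             // recycle a removed slot
            i = free_head;
            free_head = nodes[i].next;
        } else {                           // grow the shared array at the end
            i = (int)nodes.size();
            nodes.push_back({});
        }
        nodes[i] = {value, chain_head};
        return i;
    }

    // Pop the head of a chain, recycling its slot; returns the new head. O(1).
    int pop(int chain_head) {
        int next = nodes[chain_head].next;
        nodes[chain_head].next = free_head;
        free_head = chain_head;
        return next;
    }
};
```

Many independent chains (one per polygon, per grid cell, etc.) can share the same `nodes` vector: one giant allocation instead of a million small ones.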

Solution 12 - Language Agnostic

From my experience: implementing sparse matrices and Fibonacci heaps. Linked lists give you more control over the overall structure of such data structures. Though I'm not sure whether sparse matrices are best implemented using linked lists - probably there is a better way - it really helped to learn the ins and outs of sparse matrices using linked lists in undergrad CS :)

Solution 13 - Language Agnostic

There are two complementary operations which are trivially O(1) on lists and very hard to implement in O(1) in other data structures: removing and inserting an element at an arbitrary position, assuming you need to maintain the order of the elements.

Hash maps can obviously do insertion and deletion in O(1) but then you cannot iterate over the elements in order.

Given the fact above, a hash map can be combined with a linked list to create a nifty LRU cache: a map that stores a fixed number of key-value pairs and drops the least recently accessed key to make room for new ones.

The entries in the hash map need to hold pointers to the linked-list nodes. When accessing the hash map, the linked-list node is unlinked from its current position and moved to the head of the list (O(1), yay for linked lists!). When the least recently used element needs to be removed, the one at the tail of the list is dropped (again O(1), assuming you keep a pointer to the tail node) together with the associated hash-map entry (so back-links from the list to the hash map are necessary).
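A minimal sketch of such an LRU cache (illustrative, built on std::list and std::unordered_map): the map stores list iterators, which are stable for std::list, so move-to-front is a single O(1) splice on every access.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// LRU cache: a doubly-linked list keeps recency order (most recent at
// the front), and a hash map points into it for O(1) lookup.
class LruCache {
    std::size_t capacity;
    std::list<std::pair<int,int>> order;   // (key, value), most recent first
    std::unordered_map<int, std::list<std::pair<int,int>>::iterator> index;
public:
    explicit LruCache(std::size_t cap) : capacity(cap) {}

    bool get(int key, int& value) {
        auto it = index.find(key);
        if (it == index.end()) return false;
        order.splice(order.begin(), order, it->second);  // move-to-front, O(1)
        value = it->second->second;
        return true;
    }

    void put(int key, int value) {
        int dummy;
        if (get(key, dummy)) {             // already present: refresh the value
            order.front().second = value;
            return;
        }
        if (order.size() == capacity) {    // evict the least recently used
            index.erase(order.back().first);
            order.pop_back();
        }
        order.emplace_front(key, value);
        index[key] = order.begin();
    }
};
```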

Solution 14 - Language Agnostic

Consider that a linked list might be very useful in a Domain Driven Design style implementation of a system that includes parts that interlock with repetition.

An example that comes to mind: modelling a hanging chain. If you wanted to know the tension on any particular link, your interface could include a getter for "apparent" weight; its implementation would have a link ask its next link for its apparent weight, then add its own weight to the result. This way, the whole length down to the bottom would be evaluated with a single call from the chain's client.

Being a proponent of code that reads like natural language, I like how this would let the programmer ask a chain link how much weight it's carrying. It also keeps the concern of calculating these kinds of properties within the boundary of the link implementation, removing the need for a "chain weight calculating service".
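A minimal sketch of such a chain link (names illustrative): each link recursively asks the next link for its apparent weight and adds its own.

```cpp
#include <cassert>

// A link in a hanging chain; apparent weight is the link's own weight
// plus everything hanging below it.
struct ChainLink {
    double own_weight;
    const ChainLink* next;   // nullptr at the bottom of the chain

    double apparent_weight() const {
        return own_weight + (next ? next->apparent_weight() : 0.0);
    }
};
```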

Solution 15 - Language Agnostic

I have used linked lists (even doubly-linked lists) in the past in a C/C++ application. This was prior to .NET and even the STL.

I probably wouldn't use a linked list now in a .NET language, because all the traversal code you need is provided for you via the LINQ extension methods.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: Jerry Coffin
Solution 1: Jason Williams
Solution 2: Andras Vass
Solution 3: Chris Lercher
Solution 4: Andrea Zilio
Solution 5: Ignacio Vazquez-Abrams
Solution 6: Steve Jessop
Solution 7: Steve Jessop
Solution 8: LiKao
Solution 9: Steve Jessop
Solution 10: metamorphosis
Solution 11: user4842163
Solution 12: zakishaheen
Solution 13: Rafał Dowgird
Solution 14: 780Farva
Solution 15: ChrisF