Amortized complexity in layman's terms?

Tags: Algorithm, Amortized Analysis

Algorithm Problem Overview


Can someone explain amortized complexity in layman's terms? I've been having a hard time finding a precise definition online and I don't know how it entirely relates to the analysis of algorithms. Anything useful, even if externally referenced, would be highly appreciated.

Algorithm Solutions


Solution 1 - Algorithm

> Amortized complexity is the total expense per operation, evaluated over a sequence of operations.

The idea is to guarantee the total expense of the entire sequence, while permitting individual operations to be much more expensive than the amortized cost.

Example:
The behavior of C++ std::vector<>. When push_back() needs to grow the vector beyond its pre-allocated capacity, it doubles the allocated capacity.

So a single push_back() may take O(N) time to execute (as the contents of the array are copied to the new memory allocation).

However, because the size of the allocation was doubled, the next N-1 calls to push_back() will each take O(1) time to execute. So, the total of N operations will still take O(N) time; thereby giving push_back() an amortized cost of O(1) per operation.
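
To see the bookkeeping concretely, here is a small C++ sketch (my own illustration, not the library's internals; real std::vector implementations may grow by a factor other than 2) that models the doubling strategy and counts the element copies caused by reallocation:

// Models the capacity-doubling strategy described above and counts how many
// element copies the reallocations cause over N appends.
#include <cstddef>
#include <iostream>

int main() {
    const std::size_t N = 1000000;   // number of simulated push_back() calls
    std::size_t size = 0, capacity = 0;
    std::size_t copies = 0;          // work done by reallocations

    for (std::size_t i = 0; i < N; ++i) {
        if (size == capacity) {      // full: reallocate at double the capacity
            copies += size;          // every existing element gets copied over
            capacity = (capacity == 0) ? 1 : capacity * 2;
        }
        ++size;                      // the O(1) append itself
    }

    std::cout << "appends: " << N << ", copies from reallocation: " << copies << '\n';
}

The reallocation work is 1 + 2 + 4 + ..., which stays below 2N, so the average cost per push_back() is a small constant even though individual calls are O(N).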


Unless otherwise specified, amortized complexity is an asymptotic worst-case guarantee for any sequence of operations. This means:

  • Just as with non-amortized complexity, the big-O notation used for amortized complexity ignores both fixed initial overhead and constant performance factors. So, for the purpose of evaluating big-O amortized performance, you can generally assume that any sequence of amortized operations will be "long enough" to amortize away a fixed startup expense. Specifically, for the std::vector<> example, this is why you don't need to worry about whether you will actually encounter N additional operations: the asymptotic nature of the analysis already assumes that you will.

  • Besides arbitrary length, amortized analysis doesn't make assumptions about the sequence of operations whose cost you are measuring -- it is a worst-case guarantee on any possible sequence of operations. No matter how badly the operations are chosen (say, by a malicious adversary!), an amortized analysis must guarantee that a sufficiently long sequence of operations cannot cost more in total than the sum of their amortized costs. This is why (unless specifically mentioned as a qualifier) "probability" and "average case" are not relevant to amortized analysis -- any more than they are to an ordinary worst-case big-O analysis!

Solution 2 - Algorithm

> In an amortized analysis, the time required to perform a sequence of data-structure operations is averaged over all the operations performed... Amortized analysis differs from average-case analysis in that probability is not involved; an amortized analysis guarantees the average performance of each operation in the worst case.

(from Cormen et al., "Introduction to Algorithms")

That might be a bit confusing since it says both that the time is averaged, and that it's not an average-case analysis. So let me try to explain this with a financial analogy (indeed, "amortized" is a word most commonly associated with banking and accounting.)

Suppose that you are operating a lottery. (Not buying a lottery ticket, which we'll get to in a moment, but operating the lottery itself.) You print 100,000 tickets, which you will sell for 1 currency unit each. One of those tickets will entitle the purchaser to 40,000 currency units.

Now, assuming you can sell all the tickets, you stand to earn 60,000 currency units: 100,000 currency units in sales, minus the 40,000 currency unit prize. For you, the value of each ticket is 0.60 currency units, amortized over all the tickets. This is a reliable value; you can bank on it. If you get tired of selling the tickets yourself, and someone comes along and offers to sell them for 0.30 currency units each, you know exactly where you stand.

For the lottery purchaser, the situation is different. The purchaser has an expected loss of 0.60 currency units when they purchase a lottery ticket. But that's probabilistic: the purchaser might buy ten lottery tickets every day for 30 years (a bit more than 100,000 tickets) without ever winning. Or they might spontaneously buy a single ticket one day, and win 39,999 currency units.

Applied to data structure analysis, we're talking about the first case, where we amortize the cost of some data structure operation (say, insert) over all the operations of that kind. Average-case analysis deals with the expected value of a stochastic operation (say, search), where we cannot compute the total cost of all the operations, but we can provide a probabilistic analysis of the expected cost of a single one.

It's often stated that amortized analysis applies to the situation where a high-cost operation is rare, and that's often the case. But not always. Consider, for example, the so-called "banker's queue", which is a first-in-first-out (FIFO) queue, made out of two stacks. (It's a classic functional data-structure; you can build cheap LIFO stacks out of immutable single-linked nodes, but cheap FIFOs are not so obvious). The operations are implemented as follows:

put(x):  Push x on the right-hand stack.
y=get(): If the left-hand stack is empty:
           Pop each element off the right-hand stack and
             push it onto the left-hand stack. This effectively
             reverses the right-hand stack onto the left-hand stack.
         Pop and return the top element of the left-hand stack.

Now, I claim that the amortized cost of put and get is O(1), assuming that I start and end with an empty queue. The analysis is simple: I always put onto the right-hand stack, and get from the left-hand stack. So aside from the If clause, each put is a push, and each get is a pop, both of which are O(1). I don't know how many times I will execute the If clause -- it depends on the pattern of puts and gets -- but I know that every element moves exactly once from the right-hand stack to the left-hand stack. So the total cost over the entire sequence of n puts and n gets is: n pushes, n pops, and n moves, where a move is a pop followed by a push: in other words, the 2n operations (n puts and n gets) result in 2n pushes and 2n pops. So the amortized cost of a single put or get is one push and one pop.
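
For concreteness, here is a minimal runnable C++ translation of that pseudocode (a sketch, my own code; no error handling beyond an assert on get() from an empty queue):

// A two-stack ("banker's") queue. put() is always a single push; get() is a
// single pop, plus an occasional reversal in which each element is moved
// from the in-stack to the out-stack exactly once in its lifetime.
#include <cassert>
#include <iostream>
#include <vector>

template <typename T>
class BankersQueue {
    std::vector<T> in;   // right-hand stack: receives puts
    std::vector<T> out;  // left-hand stack: serves gets
public:
    void put(const T& x) { in.push_back(x); }

    T get() {
        if (out.empty()) {
            while (!in.empty()) {            // reverse in onto out
                out.push_back(in.back());
                in.pop_back();
            }
        }
        assert(!out.empty() && "get() on an empty queue");
        T x = out.back();
        out.pop_back();
        return x;
    }
};

int main() {
    BankersQueue<int> q;
    for (int i = 1; i <= 5; ++i) q.put(i);
    for (int i = 0; i < 5; ++i) std::cout << q.get() << ' ';   // prints 1 2 3 4 5
    std::cout << '\n';
}

Each element is pushed once, moved once, and popped once, which is exactly the 2n pushes and 2n pops counted above.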

Note that banker's queues are called that precisely because of the amortized complexity analysis (and the association of the word "amortized" with finance). Banker's queues are the answer to what used to be a common interview question, although I think it's now considered too well-known: Come up with a queue which implements the following three operations in amortized O(1) time (one possible sketch follows the list):

  1. Get and remove the oldest element of the queue,

  2. Put a new element onto the queue,

  3. Find the value of the current maximum element.
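
One possible answer, sketched in C++ (my own code with made-up names, not part of the original answer): use the same two-stack queue, but have every stack entry also record the maximum of itself and everything beneath it, so the current maximum is just the larger of the two stack tops and all three operations remain amortized O(1).

#include <algorithm>
#include <cassert>
#include <iostream>
#include <utility>
#include <vector>

template <typename T>
class MaxQueue {
    // Each entry is {value, max of this value and everything under it}.
    std::vector<std::pair<T, T>> in, out;

    static void push_with_max(std::vector<std::pair<T, T>>& s, const T& x) {
        T m = s.empty() ? x : std::max(x, s.back().second);
        s.push_back({x, m});
    }
public:
    void put(const T& x) { push_with_max(in, x); }

    T get() {
        if (out.empty()) {
            while (!in.empty()) {             // move each element once
                push_with_max(out, in.back().first);
                in.pop_back();
            }
        }
        assert(!out.empty() && "get() on an empty queue");
        T x = out.back().first;
        out.pop_back();
        return x;
    }

    T max() const {                           // current maximum element
        assert(!in.empty() || !out.empty());
        if (in.empty())  return out.back().second;
        if (out.empty()) return in.back().second;
        return std::max(in.back().second, out.back().second);
    }
};

int main() {
    MaxQueue<int> q;
    q.put(3); q.put(1); q.put(4);
    std::cout << q.max() << ' ' << q.get() << ' ' << q.max() << '\n';   // 4 3 4
}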

Solution 3 - Algorithm

The principle of "amortized complexity" is that although something may be quite expensive when you do it, since it's not done very often, it's considered "not complex". For example, suppose you create a binary tree that needs rebalancing from time to time - say whenever its size reaches a power of two (e.g. once at insertion number 256, then again at the 512th, the 1024th, etc.). Although rebalancing the tree is quite expensive, it only happens about once in every n insertions, where n is the current size. On all other insertions the complexity is O(1) - yes, it takes O(n) once in every n insertions, but that cost is spread across those n insertions - so we multiply O(n) by 1/n and get O(1). So that is said to be an "amortized complexity of O(1)" - because as you add more elements, the time spent rebalancing the tree, averaged per insertion, is minimal.
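
To check that arithmetic, here is a small C++ sketch (my own illustration, assuming each ordinary insertion costs 1 unit of work and a rebalance at size n costs n units) that tallies the total work over N insertions:

#include <cstddef>
#include <iostream>

int main() {
    const std::size_t N = 1 << 20;       // number of insertions
    std::size_t work = 0;

    for (std::size_t n = 1; n <= N; ++n) {
        work += 1;                       // the ordinary O(1) insertion
        if ((n & (n - 1)) == 0) {        // n is a power of two: rebalance
            work += n;                   // the occasional O(n) rebalance
        }
    }

    // The rebalances add 1 + 2 + 4 + ... + N < 2N units, so work / N is a
    // small constant no matter how large N gets.
    std::cout << "insertions: " << N << ", total work: " << work
              << ", per insertion: " << double(work) / N << '\n';
}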

Solution 4 - Algorithm

Amortized means divided over repeated runs. The worst-case behavior is guaranteed not to happen often. For example, if the slowest case is O(N) but it occurs at most an O(1/N) fraction of the time, and the process is otherwise O(1), then the algorithm still has amortized constant O(1) time. Just consider the work of each O(N) run to be parceled out over N other runs.

The concept depends on having enough runs to divide the total time over. If the algorithm is only run once, or it has to meet a deadline each time it runs, then the worst-case complexity is more relevant.

Solution 5 - Algorithm

Say you are trying to find the kth smallest element of an unsorted array. Sorting the array takes O(n log n). Once it is sorted, finding the kth smallest element is just reading the entry at index k, which is O(1).

Since the array is already sorted, we never have to sort again. We will never hit the worst case scenario more than once.

If we perform n queries for the kth smallest element, the total time is still O(n log n), because the one-time sort dominates the O(1) lookups. If we average that total over the operations, each one costs:

O(n log n) / n = O(log n) - that is, total time divided by the number of operations.

This is amortized complexity.

I think this is how it goes, I'm just learning it too.
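
A minimal sketch of that scheme in C++ (class and member names are mine, purely illustrative): pay the O(n log n) sort once, on the first query, and answer every later query in O(1) by indexing into the sorted array.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

class KthSmallest {
    std::vector<int> data;
    bool sorted = false;
public:
    explicit KthSmallest(std::vector<int> v) : data(std::move(v)) {}

    // k is 1-based. The first call pays the O(n log n) sort; later calls are O(1).
    int query(std::size_t k) {
        if (!sorted) {
            std::sort(data.begin(), data.end());
            sorted = true;
        }
        return data[k - 1];
    }
};

int main() {
    KthSmallest s({5, 1, 4, 2, 3});
    std::cout << s.query(1) << ' ' << s.query(3) << ' ' << s.query(5) << '\n';   // 1 3 5
}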

Solution 6 - Algorithm

It is somewhat similar to multiplying the worst-case complexity of each branch in an algorithm by the probability of executing that branch, and adding up the results. So if some branch is very unlikely to be taken, it contributes less to the overall complexity.

Attributions

All content for this page is sourced from the original question and answers on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

  • Question: Bob John
  • Solution 1 - Algorithm: comingstorm
  • Solution 2 - Algorithm: rici
  • Solution 3 - Algorithm: Mats Petersson
  • Solution 4 - Algorithm: Potatoswatter
  • Solution 5 - Algorithm: user2968401
  • Solution 6 - Algorithm: perreal