Finding the median of an unsorted array

AlgorithmHeapMedian

Algorithm Problem Overview


To find the median of an unsorted array, we can make a min-heap in O(nlogn) time for n elements, and then we can extract one by one n/2 elements to get the median. But this approach would take O(nlogn) time.

Can we do the same by some method in O(n) time? If we can, then please tell or suggest some method.

Algorithm Solutions


Solution 1 - Algorithm

You can use the Median of Medians algorithm to find median of an unsorted array in linear time.

Solution 2 - Algorithm

I have already upvoted the @dasblinkenlight answer since the Median of Medians algorithm in fact solves this problem in O(n) time. I only want to add that this problem could be solved in O(n) time by using heaps also. Building a heap could be done in O(n) time by using the bottom-up. Take a look to the following article for a detailed explanation Heap sort

Supposing that your array has N elements, you have to build two heaps: A MaxHeap that contains the first N/2 elements (or (N/2)+1 if N is odd) and a MinHeap that contains the remaining elements. If N is odd then your median is the maximum element of MaxHeap (O(1) by getting the max). If N is even, then your median is (MaxHeap.max()+MinHeap.min())/2 this takes O(1) also. Thus, the real cost of the whole operation is the heaps building operation which is O(n).

BTW this MaxHeap/MinHeap algorithm works also when you don't know the number of the array elements beforehand (if you have to resolve the same problem for a stream of integers for e.g). You can see more details about how to resolve this problem in the following article Median Of integer streams

Solution 3 - Algorithm

Quickselect works in O(n), this is also used in the partition step of Quicksort.

Solution 4 - Algorithm

The quick select algorithm can find the k-th smallest element of an array in linear (O(n)) running time. Here is an implementation in python:

import random

def partition(L, v):
    smaller = []
    bigger = []
    for val in L:
        if val < v: smaller += [val]
        if val > v: bigger += [val]
    return (smaller, [v], bigger)

def top_k(L, k):
    v = L[random.randrange(len(L))]
    (left, middle, right) = partition(L, v)
    # middle used below (in place of [v]) for clarity
    if len(left) == k:   return left
    if len(left)+1 == k: return left + middle
    if len(left) > k:    return top_k(left, k)
    return left + middle + top_k(right, k - len(left) - len(middle))

def median(L):
    n = len(L)
    l = top_k(L, n / 2 + 1)
    return max(l)

Solution 5 - Algorithm

The answer is "No, one can't find the median of an arbitrary, unsorted dataset in linear time". The best one can do as a general rule (as far as I know) is Median of Medians (to get a decent start), followed by Quickselect. Ref: [https://en.wikipedia.org/wiki/Median_of_medians][1]

Solution 6 - Algorithm

It can be done using Quickselect Algorithm in O(n), do refer to Kth order statistics (randomized algorithms).

Solution 7 - Algorithm

As wikipedia says, Median-of-Medians is theoretically o(N), but it is not used in practice because the overhead of finding "good" pivots makes it too slow.
http://en.wikipedia.org/wiki/Selection_algorithm

Here is Java source for a Quickselect algorithm to find the k'th element in an array:

/**
 * Returns position of k'th largest element of sub-list.
 * 
 * @param list list to search, whose sub-list may be shuffled before
 *            returning
 * @param lo first element of sub-list in list
 * @param hi just after last element of sub-list in list
 * @param k
 * @return position of k'th largest element of (possibly shuffled) sub-list.
 */
static int select(double[] list, int lo, int hi, int k) {
	int n = hi - lo;
	if (n < 2)
		return lo;

	double pivot = list[lo + (k * 7919) % n]; // Pick a random pivot

	// Triage list to [<pivot][=pivot][>pivot]
	int nLess = 0, nSame = 0, nMore = 0;
	int lo3 = lo;
	int hi3 = hi;
	while (lo3 < hi3) {
		double e = list[lo3];
		int cmp = compare(e, pivot);
		if (cmp < 0) {
			nLess++;
			lo3++;
		} else if (cmp > 0) {
			swap(list, lo3, --hi3);
			if (nSame > 0)
				swap(list, hi3, hi3 + nSame);
			nMore++;
		} else {
			nSame++;
			swap(list, lo3, --hi3);
		}
	}
	assert (nSame > 0);
	assert (nLess + nSame + nMore == n);
	assert (list[lo + nLess] == pivot);
	assert (list[hi - nMore - 1] == pivot);
	if (k >= n - nMore)
		return select(list, hi - nMore, hi, k - nLess - nSame);
	else if (k < nLess)
		return select(list, lo, lo + nLess, k);
	return lo + k;
}

I have not included the source of the compare and swap methods, so it's easy to change the code to work with Object[] instead of double[].

In practice, you can expect the above code to be o(N).

Solution 8 - Algorithm

Let the problem be: finding the Kth largest element in an unsorted array.

Divide the array into n/5 groups where each group consisting of 5 elements.

Now a1,a2,a3....a(n/5) represent the medians of each group.

x = Median of the elements a1,a2,.....a(n/5).

Now if k

else if k>n/2 then we can remove the smallest ,2nd smallest and 3rd smallest element of the group whose median is smaller than the x. We can now call the function of again with 7n/10 elements and finding the (k-3n/10)th largest value.

Time Complexity Analysis: T(n) time complexity to find the kth largest in an array of size n.

T(n) = T(n/5) + T(7n/10) + O(n)

if you solve this you will find out that T(n) is actually O(n)

n/5 + 7n/10 = 9n/10 < n

Solution 9 - Algorithm

Notice that building a heap takes O(n) actually not O(nlogn), you can check this using amortized analysis or simply check in Youtube. Extract-Min takes O(logn), therefore, extracting n/2 will take (nlogn/2) = O(nlogn) amortized time.

About your question, you can simply check at Median of Medians.

Solution 10 - Algorithm

Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.

Example 1:

Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000
Explanation: merged array = [1,2,3] and median is 2.

Code:

class Solution:
    def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
        merged_array = sorted(nums1 + nums2)
        if len(merged_array) % 2 == 0:
            index = int(len(merged_array)/2)
            output =  (merged_array[index - 1] +  merged_array[index])/2
        else: 
            index = int(len(merged_array)/2)
            output = merged_array[index]
        return output

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionLuvView Question on Stackoverflow
Solution 1 - AlgorithmSergey KalinichenkoView Answer on Stackoverflow
Solution 2 - AlgorithmrkachachView Answer on Stackoverflow
Solution 3 - AlgorithmBrokenGlassView Answer on Stackoverflow
Solution 4 - AlgorithmdoizucView Answer on Stackoverflow
Solution 5 - AlgorithmAlanKView Answer on Stackoverflow
Solution 6 - AlgorithmImposterView Answer on Stackoverflow
Solution 7 - AlgorithmAdam Gawne-CainView Answer on Stackoverflow
Solution 8 - Algorithmhrithik maheshwView Answer on Stackoverflow
Solution 9 - AlgorithmTosa LogitechView Answer on Stackoverflow
Solution 10 - AlgorithmFilippo TeodoroView Answer on Stackoverflow