Find duplicate element in array in time O(n)

JavaArraysAlgorithm

Java Problem Overview


I have been asked this question in a job interview and I have been wondering about the right answer.

> You have an array of numbers from 0 to n-1, one of the numbers is removed, and replaced with a number already in the array which makes a duplicate of that number. How can we detect this duplicate in time O(n)?

For example, an array of 4,1,2,3 would become 4,1,2,2.

The easy solution of time O(n2) is to use a nested loop to look for the duplicate of each element.

Java Solutions


Solution 1 - Java

This can be done in O(n) time and O(1) space.

(The algorithm only works because the numbers are consecutive integers in a known range):

In a single pass through the vector, compute the sum of all the numbers, and the sum of the squares of all the numbers.

Subtract the sum of all the numbers from N(N-1)/2. Call this A.

Subtract the sum of the squares from N(N-1)(2N-1)/6. Divide this by A. Call the result B.

The number which was removed is (B + A)/2 and the number it was replaced with is (B - A)/2.

Example:

The vector is [0, 1, 1, 2, 3, 5]:

  • N = 6

  • Sum of the vector is 0 + 1 + 1 + 2 + 3 + 5 = 12. N(N-1)/2 is 15. A = 3.

  • Sum of the squares is 0 + 1 + 1 + 4 + 9 + 25 = 40. N(N-1)(2N-1)/6 is 55. B = (55 - 40)/A = 5.

  • The number which was removed is (5 + 3) / 2 = 4.

  • The number it was replaced by is (5 - 3) / 2 = 1.

Why it works:

  • The sum of the original vector [0, ..., N-1] is N(N-1)/2. Suppose the value a was removed and replaced by b. Now the sum of the modified vector will be N(N-1)/2 + b - a. If we subtract the sum of the modified vector from N(N-1)/2 we get a - b. So A = a - b.

  • Similarly, the sum of the squares of the original vector is N(N-1)(2N-1)/6. The sum of the squares of the modified vector is N(N-1)(2N-1)/6 + b2 - a2. Subtracting the sum of the squares of the modified vector from the original sum gives a2 - b2, which is the same as (a+b)(a-b). So if we divide it by a - b (i.e., A), we get B = a + b.

  • Now B + A = a + b + a - b = 2a and B - A = a + b - (a - b) = 2b.

Solution 2 - Java

We have the original array int A[N]; Create a second array bool B[N] too, of type bool=false. Iterate the first array and set B[A[i]]=true if was false, else bing!

Solution 3 - Java

You can do it in O(N) time without any extra space. Here is how the algorithm works :

Iterate through array in the following manner :

  1. For each element encountered, set its corresponding index value to negative. Eg : if you find a[0] = 2. Got to a[2] and negate the value.

By doing this you flag it to be encountered. Since you know you cannot have negative numbers, you also know that you are the one who negated it.

  1. Check if index corresponding to the value is already flagged negative, if yes you get the duplicated element. Eg : if a[0]=2 , go to a[2] and check if it is negative.

Lets say you have following array :

int a[]  = {2,1,2,3,4};

After first element your array will be :

int a[] = {2,1,-2,3,4};

After second element your array will be :

int a[] = {2,-1,-2,3,4};

When you reach third element you go to a[2] and see its already negative. You get the duplicate.

Solution 4 - Java

Scan the array 3 times:

  1. XOR together all the array elements -> A. XOR together all the numbers from 0 to N-1 -> B. Now A XOR B = X XOR D, where X is the removed element, and D is the duplicate element.
  2. Choose any non-zero bit in A XOR B. XOR together all the array elements where this bit is set -> A1. XOR together all the numbers from 0 to N-1 where this bit is set -> B1. Now either A1 XOR B1 = X or A1 XOR B1 = D.
  3. Scan the array once more and try to find A1 XOR B1. If it is found, this is the duplicate element. If not, the duplicate element is A XOR B XOR A1 XOR B1.

Solution 5 - Java

Use a HashSet to hold all numbers already seen. It operates in (amortized) O(1) time, so the total is O(N).

Solution 6 - Java

I suggest using a BitSet. We know N is small enough for array indexing, so the BitSet will be of reasonable size.

For each element of the array, check the bit corresponding to its value. If it is already set, that is the duplicate. If not, set the bit.

Solution 7 - Java

@rici is right about the time and space usage: "This can be done in O(n) time and O(1) space."

However, the question can be expanded to broader requirement: it's not necessary that there is only one duplicate number, and numbers might not be consecutive.

OJ puts it this way here: (note 3 apparently can be narrowed)

>Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

>Note:

> - You must not modify the array (assume the array is read only). > - You must use only constant, O(1) extra space. > - Your runtime complexity should be less than O(n2). > - There is only one duplicate number in the array, but it could be repeated more than once.

The question is very well explained and answered here by Keith Schwarz, using Floyd's cycle-finding algorithm:

> The main trick we need to use to solve this problem is to notice that because we have an array of n elements ranging from 0 to n - 2, we can think of the array as defining a function f from the set {0, 1, ..., n - 1} onto itself. This function is defined by f(i) = A[i]. Given this setup, a duplicated value corresponds to a pair of indices i != j such that f(i) = f(j). Our challenge, therefore, is to find this pair (i, j). Once we have it, we can easily find the duplicated value by just picking f(i) = A[i].

> But how are we to find this repeated value? It turns out that this is a well-studied problem in computer science called cycle detection. The general form of the problem is as follows. We are given a function f. Define the sequence x_i as

    x_0     = k       (for some k)
    x_1     = f(x_0)
    x_2     = f(f(x_0))
    ...
    x_{n+1} = f(x_n)

> Assuming that f maps from a domain into itself, this function will have one of three forms. First, if the domain is infinite, then the sequence could be infinitely long and nonrepeating. For example, the function f(n) = n + 1 on the integers has this property - no number is ever duplicated. Second, the sequence could be a closed loop, which means that there is some i so that x_0 = x_i. In this case, the sequence cycles through some fixed set of values indefinitely. Finally, the sequence could be "rho-shaped." In this case, the sequence looks something like this:

 x_0 -> x_1 -> ... x_k -> x_{k+1} ... -> x_{k+j}
                    ^                       |
                    |                       |
                    +-----------------------+

> That is, the sequence begins with a chain of elements that enters a cycle, then cycles around indefinitely. We'll denote the first element of the cycle that is reached in the sequence the "entry" of the cycle.

An python implementation can also be found here:

def findDuplicate(self, nums):
    # The "tortoise and hare" step.  We start at the end of the array and try
    # to find an intersection point in the cycle.
    slow = 0
    fast = 0

    # Keep advancing 'slow' by one step and 'fast' by two steps until they
    # meet inside the loop.
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]

        if slow == fast:
            break

    # Start up another pointer from the end of the array and march it forward
    # until it hits the pointer inside the array.
    finder = 0
    while True:
        slow   = nums[slow]
        finder = nums[finder]

        # If the two hit, the intersection index is the duplicate element.
        if slow == finder:
            return slow

Solution 8 - Java

Use hashtable. Including an element in a hashtable is O(1).

Solution 9 - Java

One working solution:

asume number are integers

create an array of [0 .. N]

int[] counter = new int[N];

Then iterate read and increment the counter:

 if (counter[val] >0) {
   // duplicate
 } else {
   counter[val]++;
 }

Solution 10 - Java

This can be done in O(n) time and O(1) space. Without modifying the input array

  1. The idea is similar to finding the starting node of a loop in a linked list.
  2. Maintain two pointers: fast and slow
slow = a[0]
fast = a[a[0]]
  1. loop till slow != fast
  2. Once we find the loop (slow == fast)
  3. Reset slow back to zero
slow = 0
  1. find the starting node
while(slow != fast){
    slow = a[slow];
    fast = a[fast];
}
  1. slow is your duplicate number.

Here's a Java implementation:

class Solution {
    public int findDuplicate(int[] nums) {
        if(nums.length <= 1) return -1;
        int slow = nums[0], fast = nums[nums[0]]; //slow = head.next, fast = head.next.next
        while(slow != fast){            //check for loop
            slow = nums[slow];
            fast = nums[nums[fast]];
        }
        if(slow != fast) return -1;
        slow = 0; //reset one pointer
        while(slow != fast){ //find starting point of loop
            slow = nums[slow];
            fast = nums[fast];
        }
        return slow;
    }
}

Solution 11 - Java

This is an alternative solution in O(n) time and O(1) space. It is similar to rici's. I find it a bit easier to understand but, in practice, it will overflow faster.

Let X be the missing number and R be the repeated number.

  1. We can assume the numbers are from [1..n], i.e. zero does not appear. In fact, while looping through the array, we can test if zero was found and return immediately if not.

  2. Now consider:

     sum(A) = n (n + 1) / 2 - X + R
    
     product(A) = n! R / X
    

where product(A) is the product of all element in A skipping the zero. We have two equations in two unknowns from which X and R can be derived algebraically.

Edit: by popular demand, here is a worked-out example:

Let's set:

S = sum(A) - n (n + 1) / 2
P = n! / product(A)

Then our equations become:

R - X = S
X = R P

which can be solved to:

R = S / (1 - P)
X = P R = P S / (1 - P)

Example:

A = [0 1 2 2 4]

n = A.length - 1 = 4
S = (1 + 2 + 2 + 4) - 4 * 5 / 2 = -1
P = 4! / (1 * 2 * 2 * 4) = 3 / 2

R = -1 / (1 - 3/2) = -1 / -1/2 = 2
X = 3/2 * 2 = 3

Solution 12 - Java

You could proceed as follows:

  1. sort your array by using a Linear-time sorting algorithm (e.g. Counting sort) - O(N)
  2. scan the sorted array and stop as soon as two consecutive elements are equal - O(N)

Solution 13 - Java

public class FindDuplicate {
    public static void main(String[] args) {
        // assume the array is sorted, otherwise first we have to sort it.
        // time efficiency is o(n)
        int elementData[] = new int[] { 1, 2, 3, 3, 4, 5, 6, 8, 8 };
        int count = 1;
        int element1;
        int element2;

        for (int i = 0; i < elementData.length - 1; i++) {
            element1 = elementData[i];
            element2 = elementData[count];
            count++;
            if (element1 == element2) {
                System.out.println(element2);
            }
        }
    }
}

Solution 14 - Java

  public void duplicateNumberInArray {
	int a[] = new int[10];
	Scanner inp = new Scanner(System.in);
	for(int i=1;i<=5;i++){	
		System.out.println("enter no. ");
		a[i] = inp.nextInt();
	}
	Set<Integer> st = new HashSet<Integer>();
	Set<Integer> s = new HashSet<Integer>();
	for(int i=1;i<=5;i++){			
		if(!st.add(a[i])){
			s.add(a[i]);
		}
	}
	
	Iterator<Integer> itr = s.iterator();
                System.out.println("Duplicate numbers are");
	while(itr.hasNext()){
		System.out.println(itr.next());
	}
}

First of all creating an array of integer using Scanner class. Then iterating a loop through the numbers and checking if the number can be added to set (Numbers can be added to set only when that particular number should not be in set already, means set does not allow duplicate no. to add and return a boolean vale FALSE on adding duplicate value).If no. cannot be added means it is duplicate so add that duplicate number into another set, so that we can print later. Please note onething that we are adding the duplicate number into a set because it might be possible that duplicate number might be repeated several times, hence add it only once.At last we are printing set using Iterator.

Solution 15 - Java

//This is similar to the HashSet approach but uses only one data structure:

	int[] a = { 1, 4, 6, 7, 4, 6, 5, 22, 33, 44, 11, 5 };

	LinkedHashMap<Integer, Integer> map = new LinkedHashMap<Integer, Integer>();

	for (int i : a) {
		map.put(i, map.containsKey(i) ? (map.get(i)) + 1 : 1);
	}

	Set<Entry<Integer, Integer>> es = map.entrySet();
	Iterator<Entry<Integer, Integer>> it = es.iterator();

	while (it.hasNext()) {
		Entry<Integer, Integer> e = it.next();
		if (e.getValue() > 1) {
			System.out.println("Dupe " + e.getKey());
		}
	}

Solution 16 - Java

We can do using hashMap efficiently:

Integer[] a = {1,2,3,4,0,1,5,2,1,1,1,};
HashMap<Integer,Integer> map = new HashMap<Integer,Integer>();
for(int x : a)
{
    if (map.containsKey(x))  map.put(x,map.get(x)+1);
    else map.put(x,1);
}

Integer [] keys = map.keySet().toArray(new Integer[map.size()]);
for(int x : keys)
{
    if(map.get(x)!=1)
    {
        System.out.println(x+" repeats : "+map.get(x));
    }
}

Solution 17 - Java

This program is based on c# and if you want to do this program using another programming language you have to firstly change an array in accending order and compare the first element to the second element.If it is equal then repeated number found.Program is

int[] array=new int[]{1,2,3,4,5,6,7,8,9,4};
Array.Sort(array);
for(int a=0;a<array.Length-1;a++)
{
  if(array[a]==array[a+1]
  {
     Console.WriteLine("This {0} element is repeated",array[a]);
   }
}
Console.WriteLine("Not repeated number in array");

Solution 18 - Java

  1. sort the array O(n ln n)

  2. using the sliding window trick to traverse the array O(n)

    Space is O(1)

     Arrays.sort(input);
     for(int i = 0, j = 1; j < input.length ; j++, i++){
     	if( input[i] == input[j]){
         	System.out.println(input[i]);
         	while(j < input.length && input[i] == input[j]) j++;
     	    i = j - 1;
         }
     }
    

Test case int[] { 1, 2, 3, 7, 7, 8, 3, 5, 7, 1, 2, 7 }

output 1, 2, 3, 7

Solution 19 - Java

Traverse through the array and check the sign of array[abs(array[i])], if positive make it as negative and if it is negative then print it, as follows:

import static java.lang.Math.abs;

public class FindRepeatedNumber {

	private static void findRepeatedNumber(int arr[]) {
		int i;
		for (i = 0; i < arr.length; i++) {
			if (arr[abs(arr[i])] > 0)
				arr[abs(arr[i])] = -arr[abs(arr[i])];
			else {
				System.out.print(abs(arr[i]) + ",");
			}
		}
	}

	public static void main(String[] args) {
		int arr[] = { 4, 2, 4, 5, 2, 3, 1 };
		findRepeatedNumber(arr);
	}
}

Reference: http://www.geeksforgeeks.org/find-duplicates-in-on-time-and-constant-extra-space/

Solution 20 - Java

As described,

> You have an array of numbers from 0 to n-1, one of the numbers is > removed, and replaced with a number already in the array which makes a > duplicate of that number.

I'm assuming elements in the array are sorted except the duplicate entry. If this is the scenario , we can achieve the goal easily as below :

		public static void main(String[] args) {
	//int arr[] = { 0, 1, 2, 2, 3 };
	int arr[] = { 1, 2, 3, 4, 3, 6 };
	int len = arr.length;
	int iMax = arr[0];
	for (int i = 1; i < len; i++) {
		iMax = Math.max(iMax, arr[i]);
		if (arr[i] < iMax) {
			System.out.println(arr[i]);
			break;
		}else if(arr[i+1] <= iMax) {
			System.out.println(arr[i+1]);
			break;
		}
	}
}
  • O(n) time and O(1) space ;please share your thoughts.

Solution 21 - Java

Here is the simple solution with hashmap in O(n) time.

#include<iostream>
#include<map>
using namespace std;

int main()
{
    int a[]={1,3,2,7,5,1,8,3,6,10};
    map<int,int> mp;
    for(int i=0;i<10;i++){
        
        if(mp.find(a[i]) == mp.end())
            mp.insert({a[i],1});
        else
            mp[a[i]]++;
    }

    for(auto i=mp.begin();i!=mp.end();++i){
        if(i->second > 1)
            cout<<i->first<<" ";
    }

}

Solution 22 - Java

int[] a = {5, 6, 8, 9, 3, 4, 2, 9 };
int[] b = {5, 6, 8, 9, 3, 6, 1, 9 };

 for (int i = 0; i < a.Length; i++)
  {
     if (a[i] != b[i])
      {
       Console.Write("Original Array manipulated at position {0}  + "\t\n"  
                             + "and the element is {1} replaced by {2} ", i, 
                             a[i],b[i] + "\t\n" );
       break;               
      }      
  }
   Console.Read();

   ///use break if want to check only one manipulation in original array.
   ///If want to check more then one manipulation in original array, remove break
   

Solution 23 - Java

This video If Programming Was An Anime is too fun not to share. It is the same problem and the video has the answers:

  1. Sorting
  2. Creating a hashmap/dictionary.
  3. Creating an array. (Though this is partially skipped over.)
  4. Using the Tortoise and Hare Algorithm.

Note: This problem is more of a trivia problem than it is real world. Any solution beyond a hashmap is premature optimization, except in rare limited ram situations, like embedded programming.

Furthermore, when is the last time you've seen in the real world an array where all of the variables within the array fit within the size of the array? Eg, if the data in the array is bytes (0-255) when do you have an array 256 elements or larger without nulls or inf within it, and you need to find a duplicate number? This scenario is so rare you will probably never get to use this trick in your entire career.

Because it is a trivia problem and is not real world the question, I'd be cautious accepting an offer from a company that asks trivia questions like this, because people will pass the interview by sheer luck instead of skill. This implies the devs there are not guaranteed to be skilled, which unless you're okay teaching your seniors skills, you might have a bad time.

Solution 24 - Java

int a[] = {2,1,2,3,4};

int b[] = {0};

for(int i = 0; i < a.size; i++)
{

    if(a[i] == a[i+1])
    {
         //duplicate found
         //copy it to second array
        b[i] = a[i];
    }
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMarwan TushyehView Question on Stackoverflow
Solution 1 - JavariciView Answer on Stackoverflow
Solution 2 - JavaqPCR4virView Answer on Stackoverflow
Solution 3 - Javauser1117564View Answer on Stackoverflow
Solution 4 - JavaEvgeny KluevView Answer on Stackoverflow
Solution 5 - JavaparsifalView Answer on Stackoverflow
Solution 6 - JavaPatricia ShanahanView Answer on Stackoverflow
Solution 7 - JavaStephenyeView Answer on Stackoverflow
Solution 8 - JavaJuan LopesView Answer on Stackoverflow
Solution 9 - JavaAlexWienView Answer on Stackoverflow
Solution 10 - JavaAnkit SharmaView Answer on Stackoverflow
Solution 11 - JavaGil VegliachView Answer on Stackoverflow
Solution 12 - JavaAndrea ScarcellaView Answer on Stackoverflow
Solution 13 - JavaSouravView Answer on Stackoverflow
Solution 14 - JavaNoorus KhanView Answer on Stackoverflow
Solution 15 - JavaMax TomlinsonView Answer on Stackoverflow
Solution 16 - Javahitesh141View Answer on Stackoverflow
Solution 17 - JavaRishabh RawatView Answer on Stackoverflow
Solution 18 - JavaAhDahView Answer on Stackoverflow
Solution 19 - JavaArpit AggarwalView Answer on Stackoverflow
Solution 20 - JavasahaSView Answer on Stackoverflow
Solution 21 - JavaShadab EqbalView Answer on Stackoverflow
Solution 22 - JavaPrakash KumarView Answer on Stackoverflow
Solution 23 - JavaDanielleView Answer on Stackoverflow
Solution 24 - Javamirfan899View Answer on Stackoverflow