Finding duplicates in O(n) time and O(1) space

C++ Problem Overview

Input: Given an array of n elements which contains elements from 0 to n-1, with any of these numbers appearing any number of times.

Goal : To find these repeating numbers in O(n) and using only constant memory space.

For example, let n be 7 and array be {1, 2, 3, 1, 3, 0, 6}, the answer should be 1 & 3. I checked similar questions here but the answers used some data structures like HashSet etc.

Any efficient algorithm for the same?

C++ Solutions

Solution 1 - C++

This is what I came up with, which doesn't require the additional sign bit:

for i := 0 to n - 1
    while A[A[i]] != A[i] 
        swap(A[i], A[A[i]])
    end while
end for

for i := 0 to n - 1
    if A[i] != i then 
        print A[i]
    end if
end for

The first loop permutes the array so that if element x is present at least once, then one of those entries will be at position A[x].

Note that it may not look O(n) at first blush, but it is - although it has a nested loop, it still runs in O(N) time. A swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.

The second loop prints the values of x for which A[x] doesn't equal x - since the first loop guarantees that if x exists at least once in the array, one of those instances will be at A[x], this means that it prints those values of x which are not present in the array.

(Ideone link so you can play with it)

Solution 2 - C++

caf's brilliant answer prints each number that appears k times in the array k-1 times. That's useful behaviour, but the question arguably calls for each duplicate to be printed once only, and he alludes to the possibility of doing this without blowing the linear time/constant space bounds. This can be done by replacing his second loop with the following pseudocode:

for (i = 0; i < N; ++i) {
    if (A[i] != i && A[A[i]] == A[i]) {
        print A[i];
        A[A[i]] = i;
    }
}

This exploits the property that after the first loop runs, if any value m appears more than once, then one of those appearances is guaranteed to be in the correct position, namely A[m]. If we are careful we can use that "home" location to store information about whether any duplicates have been printed yet or not.

In caf's version, as we went through the array, A[i] != i implied that A[i] is a duplicate. In my version, I rely on a slightly different invariant: that A[i] != i && A[A[i]] == A[i] implies that A[i] is a duplicate that we haven't seen before. (If you drop the "that we haven't seen before" part, the rest can be seen to be implied by the truth of caf's invariant, and the guarantee that all duplicates have some copy in a home location.) This property holds at the outset (after caf's 1st loop finishes) and I show below that it's maintained after each step.

As we go through the array, success on the A[i] != i part of the test implies that A[i] could be a duplicate that hasn't been seen before. If we haven't seen it before, then we expect A[i]'s home location to point to itself -- that's what's tested for by the second half of the if condition. If that's the case, we print it and alter the home location to point back to this first found duplicate, creating a 2-step "cycle".

To see that this operation doesn't alter our invariant, suppose m = A[i] for a particular position i satisfying A[i] != i && A[A[i]] == A[i]. It's obvious that the change we make (A[A[i]] = i) will work to prevent other non-home occurrences of m from being output as duplicates by causing the 2nd half of their if conditions to fail, but will it work when i arrives at the home location, m? Yes it will, because now, even though at this new i we find that the 1st half of the if condition, A[i] != i, is true, the 2nd half tests whether the location it points to is a home location and finds that it isn't. In this situation we no longer know whether m or A[m] was the duplicate value, but we know that either way, it has already been reported, because these 2-cycles are guaranteed not to appear in the result of caf's 1st loop. (Note that if m != A[m] then exactly one of m and A[m] occurs more than once, and the other does not occur at all.)

Solution 3 - C++

Here is the pseudocode

for i <- 0 to n-1:
   if (A[abs(A[i])]) >= 0 :
       (A[abs(A[i])]) = -(A[abs(A[i])])
   else
      print i
end for

Sample code in C++

Solution 4 - C++

For relatively small N we can use div/mod operations

n.times do |i|
  e = a[i]%n
  a[e] += n
end

n.times do |i| 
  count = a[i]/n
  puts i if count > 1
end

Not C/C++ but anyway

http://ideone.com/GRZPI

Solution 5 - C++

Not really pretty but at least it's easy to see the O(N) and O(1) properties. Basically we scan the array and, for each number we see if the corresponding position has been flagged already-seen-once (N) or already-seen-multiple-times (N+1). If it is flagged already-seen-once, we print it and flag it already-seen-multiple-times. If it is not flagged, we flag it already-seen-once and we move the original value of the corresponding index to the current position (flagging is a destructive operation).

for (i=0; i<a.length; i++) {
  value = a[i];
  if (value >= N)
    continue;
  if (a[value] == N)  {
    a[value] = N+1; 
    print value;
  } else if (a[value] < N) {
    if (value > i)
      a[i--] = a[value];
    a[value] = N;
  }
}

or, better yet (faster, despite the double loop):

for (i=0; i<a.length; i++) {
  value = a[i];
  while (value < N) {
    if (a[value] == N)  {
      a[value] = N+1; 
      print value;
      value = N;
    } else if (a[value] < N) {
      newvalue = value > i ? a[value] : N;
      a[value] = N;
      value = newvalue;
    }
  }
}

Solution 6 - C++

Let's assume that we present this array as a uni-directional graph data structure - each number is a vertex and its index in the array points to another vertex forming an edge of the graph.

For even more simplicity we have indices 0 to n-1 and range of number from 0..n-1. e.g.

   0  1  2  3  4 
 a[3, 2, 4, 3, 1]

0(3) --> 3(3) is a cycle.

Answer: Just traverse the array relying on indices. if a[x] = a[y] then it's a cycle and thus duplicate. Skip to the next index and continue again and so forth until the end of of an array. Complexity: O(n) time and O(1) space.

Solution 7 - C++

Check out the explanation here https://youtu.be/qJ_Y7pKP0e4

code here https://github.com/TechieExpress/DataStructures/blob/main/findDuplicates

Code snippet:

/**
*
* @author techieExpress
*
* You are given a list of n-1 integers and these integers are in the range * of 1 to n.
* Input: Given an array of n elements which contains elements 
* from 0 to n-1, with any of these numbers appearing any number of times.
* 
* Goal: To find these repeating numbers in O(n) and using only constant  * * memory space.
**/

public class findDuplicates {
	
	
	public static void main(String args[])
    {
        int arr[] = { 2,1,1,2 };
  
        for (int i = 0; i < arr.length; i++) {
        	arr[arr[i] % arr.length]
                = arr[arr[i] % arr.length]
                  + arr.length;
        }
        System.out.println("The repeating elements are : ");
        for (int i = 0; i < arr.length; i++) {
        	
        	//System.out.print(numRay[i]);
            if (arr[i] >= arr.length * 2) {
                System.out.println(i + " ");
                arr[i]=arr[i]%arr.length;
            }
        }
    }

}

Solution 8 - C++

We can do it O(n) time and O(1) space complexity by -

take the ith array element.
Make it +ve if it's negative
Last, multiply with -1 to the number getting from array index (ith element).

If the number positive, return the index.

 def findDuplicate(self, arr: List[int]) -> int:
     n=len(arr)
     for i in range(0,n):
     
         arr[(abs(arr[i]))-1]=arr[(abs(arr[i]))-1]*(-1)
         if arr[(abs(arr[i]))-1]>0:
             return abs(arr[i])

Solution 9 - C++

One solution in C is:

#include <stdio.h>

int finddup(int *arr,int len)
{
    int i;
    printf("Duplicate Elements ::");
    for(i = 0; i < len; i++)
    {
        if(arr[abs(arr[i])] > 0)
          arr[abs(arr[i])] = -arr[abs(arr[i])];
        else if(arr[abs(arr[i])] == 0)
        {
             arr[abs(arr[i])] = - len ;
        }
        else
          printf("%d ", abs(arr[i]));
    }
   
}
int main()
{   
    int arr1[]={0,1,1,2,2,0,2,0,0,5};
    finddup(arr1,sizeof(arr1)/sizeof(arr1[0]));
    return 0;
}

It is O(n) time and O(1) space complexity.

Solution 10 - C++

Algorithm can be readily seen in the following C function. Retrieving original array, although not required, will be possible taking each entry modulo n.

void print_repeats(unsigned a[], unsigned n)
{
    unsigned i, _2n = 2*n;
    for(i = 0; i < n; ++i) if(a[a[i] % n] < _2n) a[a[i] % n] += n;
    for(i = 0; i < n; ++i) if(a[i] >= _2n) printf("%u ", i);
    putchar('\n');
}

Ideone Link for testing.

Solution 11 - C++

static void findrepeat()
{
    int[] arr = new int[7] {0,2,1,0,0,4,4};
    
    for (int i = 0; i < arr.Length; i++)
    {
        if (i != arr[i])
        {
            if (arr[i] == arr[arr[i]])
            {
                Console.WriteLine(arr[i] + "!!!");
            }

            int t = arr[i];
            arr[i] = arr[arr[i]];
            arr[t] = t;
        }
    }

    for (int j = 0; j < arr.Length; j++)
    {
        Console.Write(arr[j] + " ");
    }
    Console.WriteLine();
    
    for (int j = 0; j < arr.Length; j++)
    {
        if (j == arr[j])
        {
            arr[j] = 1;
        }
        else
        {
            arr[arr[j]]++;
            arr[j] = 0;
        }
    }
    
    for (int j = 0; j < arr.Length; j++)
    {
        Console.Write(arr[j] + " ");
    }
    Console.WriteLine();
}

Solution 12 - C++

private static void printRepeating(int arr[], int size) {
		int i = 0;
		int j = 1;
		while (i < (size - 1)) {
			if (arr[i] == arr[j]) {
				System.out.println(arr[i] + " repeated at index " + j);
				j = size;
			}
			j++;
			if (j >= (size - 1)) {
				i++;
				j = i + 1;
			}
		}

	}

Solution 13 - C++

If the array is not too large this solution is simpler, It creates another array of same size for ticking.

1 Create a bitmap/array of same size as your input array

 int check_list[SIZE_OF_INPUT];
 for(n elements in checklist)
     check_list[i]=0;    //initialize to zero

2 scan your input array and increment its count in the above array

for(i=0;i<n;i++) // every element in input array
{
  check_list[a[i]]++; //increment its count  
}

3 Now scan the check_list array and print the duplicate either once or as many times they have been duplicated

for(i=0;i<n;i++)
{

    if(check_list[i]>1) // appeared as duplicate
    {
        printf(" ",i);  
    }
}

Of course it takes twice the space consumed by solution given above ,but time efficiency is O(2n) which is basically O(n).

Content Type	Original Author	Original Content on Stackoverflow
Question	Zaki	View Question on Stackoverflow
Solution 1 - C++	caf	View Answer on Stackoverflow
Solution 2 - C++	j_random_hacker	View Answer on Stackoverflow
Solution 3 - C++	Prasoon Saurav	View Answer on Stackoverflow
Solution 4 - C++	hoha	View Answer on Stackoverflow
Solution 5 - C++	CAFxX	View Answer on Stackoverflow
Solution 6 - C++	Ivan Voroshilin	View Answer on Stackoverflow
Solution 7 - C++	user3359003	View Answer on Stackoverflow
Solution 8 - C++	Mayank Maheshwari	View Answer on Stackoverflow
Solution 9 - C++	Anshul garg	View Answer on Stackoverflow
Solution 10 - C++	Apshir	View Answer on Stackoverflow
Solution 11 - C++	Eli	View Answer on Stackoverflow
Solution 12 - C++	user12704811	View Answer on Stackoverflow
Solution 13 - C++	Deepthought	View Answer on Stackoverflow