How to find a duplicate element in an array of shuffled consecutive integers?

ArraysAlgorithmDuplicates

Arrays Problem Overview


I recently came across a question somewhere:

>Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice. Assume that you can access each element of the array only once. Describe an algorithm to find the repeated number. If you used auxiliary storage in your algorithm, can you find an algorithm that does not require it?

What I am interested in to know is the second part, i.e., without using auxiliary storage. Do you have any idea?

Arrays Solutions


Solution 1 - Arrays

Just add them all up, and subtract the total you would expect if only 1001 numbers were used from that.

Eg:

Input: 1,2,3,2,4 => 12
Expected: 1,2,3,4 => 10

Input - Expected => 2

Solution 2 - Arrays

Update 2: Some people think that using XOR to find the duplicate number is a hack or trick. To which my official response is: "I am not looking for a duplicate number, I am looking for a duplicate pattern in an array of bit sets. And XOR is definitely suited better than ADD to manipulate bit sets". :-)

Update: Just for fun before I go to bed, here's "one-line" alternative solution that requires zero additional storage (not even a loop counter), touches each array element only once, is non-destructive and does not scale at all :-)

printf("Answer : %d\n",
           array[0] ^
           array[1] ^
           array[2] ^
           // continue typing...
           array[999] ^
           array[1000] ^
           1 ^
           2 ^
           // continue typing...
           999^
           1000
      );

Note that the compiler will actually calculate the second half of that expression at compile time, so the "algorithm" will execute in exactly 1002 operations.

And if the array element values are know at compile time as well, the compiler will optimize the whole statement to a constant. :-)

Original solution: Which does not meet the strict requirements of the questions, even though it works to find the correct answer. It uses one additional integer to keep the loop counter, and it accesses each array element three times - twice to read it and write it at the current iteration and once to read it for the next iteration.

Well, you need at least one additional variable (or a CPU register) to store the index of the current element as you go through the array.

Aside from that one though, here's a destructive algorithm that can safely scale for any N up to MAX_INT.

for (int i = 1; i < 1001; i++)
{
   array[i] = array[i] ^ array[i-1] ^ i;
}

printf("Answer : %d\n", array[1000]);

I will leave the exercise of figuring out why this works to you, with a simple hint :-):

a ^ a = 0
0 ^ a = a

Solution 3 - Arrays

A non destructive version of solution by Franci Penov.

This can be done by making use of the XOR operator.

Lets say we have an array of size 5: 4, 3, 1, 2, 2
Which are at the index:                        0, 1, 2, 3, 4

Now do an XOR of all the elements and all the indices. We get 2, which is the duplicate element. This happens because, 0 plays no role in the XORing. The remaining n-1 indices pair with same n-1 elements in the array and the only unpaired element in the array will be the duplicate.

int i;
int dupe = 0;
for(i = 0; i < N; i++) {
    dupe = dupe ^ arr[i] ^ i;
}
// dupe has the duplicate.

The best feature of this solution is that it does not suffer from overflow problems that is seen in the addition based solution.

Since this is an interview question, it would be best to start with the addition based solution, identify the overflow limitation and then give the XOR based solution :)

This makes use of an additional variable so does not meet the requirements in the question completely.

Solution 4 - Arrays

Add all the numbers together. The final sum will be the 1+2+...+1000+duplicate number.

Solution 5 - Arrays

To paraphrase Francis Penov's solution.

The (usual) problem is: given an array of integers of arbitrary length that contain only elements repeated an even times of times except for one value which is repeated an odd times of times, find out this value.

The solution is:

acc = 0
for i in array: acc = acc ^ i

Your current problem is an adaptation. The trick is that you are to find the element that is repeated twice so you need to adapt solution to compensate for this quirk.

acc = 0
for i in len(array): acc = acc ^ i ^ array[i]

Which is what Francis' solution does in the end, although it destroys the whole array (by the way, it could only destroy the first or last element...)

But since you need extra-storage for the index, I think you'll be forgiven if you also use an extra integer... The restriction is most probably because they want to prevent you from using an array.

It would have been phrased more accurately if they had required O(1) space (1000 can be seen as N since it's arbitrary here).

Solution 6 - Arrays

Add all numbers. The sum of integers 1..1000 is (1000*1001)/2. The difference from what you get is your number.

Solution 7 - Arrays

One line solution in Python

arr = [1,3,2,4,2]
print reduce(lambda acc, (i, x): acc ^ i ^ x, enumerate(arr), 0)
# -> 2

Explanation on why it works is in @Matthieu M.'s answer.

Solution 8 - Arrays

If you know that we have the exact numbers 1-1000, you can add up the results and subtract 500500 (sum(1, 1000)) from the total. This will give the repeated number because sum(array) = sum(1, 1000) + repeated number.

Solution 9 - Arrays

Well, there is a very simple way to do this... each of the numbers between 1 and 1000 occurs exactly once except for the number that is repeated.... so, the sum from 1....1000 is 500500. So, the algorithm is:

sum = 0
for each element of the array:
sum += that element of the array
number_that_occurred_twice = sum - 500500

Solution 10 - Arrays

No extra storage requirement (apart from loop variable).

int length = (sizeof array) / (sizeof array[0]);
for(int i = 1; i < length; i++) {
   array[0] += array[i];
}

printf(
    "Answer : %d\n",
    ( array[0] - (length * (length + 1)) / 2 )
);

Solution 11 - Arrays

Do arguments and callstacks count as auxiliary storage?

int sumRemaining(int* remaining, int count) {
    if (!count) {
        return 0;
    }
    return remaining[0] + sumRemaining(remaining + 1, count - 1);
}

printf("duplicate is %d", sumRemaining(array, 1001) - 500500);


Edit: tail call version

int sumRemaining(int* remaining, int count, int sumSoFar) {
    if (!count) {
        return sumSoFar;
    }
    return sumRemaining(remaining + 1, count - 1, sumSoFar + remaining[0]);
}
printf("duplicate is %d", sumRemaining(array, 1001, 0) - 500500);

Solution 12 - Arrays

n = 1000
s = sum(GivenList)
r = str(n/2)
duplicate = int( r + r ) - s

Solution 13 - Arrays

public static void main(String[] args) {
	int start = 1;
	int end = 10;
	int arr[] = {1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10};
	System.out.println(findDuplicate(arr, start, end));
}

static int findDuplicate(int arr[], int start, int end) {
	
	int sumAll = 0;
	for(int i = start; i <= end; i++) {
		sumAll += i;
	}
	System.out.println(sumAll);
	int sumArrElem = 0;
	for(int e : arr) {
		sumArrElem += e;
	}
	System.out.println(sumArrElem);
	return sumArrElem - sumAll;
}

Solution 14 - Arrays

public int duplicateNumber(int[] A) {
    int count = 0;
    for(int k = 0; k < A.Length; k++)
        count += A[k];
    return count - (A.Length * (A.Length - 1) >> 1);
}

Solution 15 - Arrays

A triangle number T(n) is the sum of the n natural numbers from 1 to n. It can be represented as n(n+1)/2. Thus, knowing that among given 1001 natural numbers, one and only one number is duplicated, you can easily sum all given numbers and subtract T(1000). The result will contain this duplicate.

For a triangular number T(n), if n is any power of 10, there is also beautiful method finding this T(n), based on base-10 representation:

n = 1000
s = sum(GivenList)
r = str(n/2)
duplicate = int( r + r ) - s

Solution 16 - Arrays

Improvement of Fraci's answer based on the property of XORing consecutive values:

int result = xor_sum(N);
for (i = 0; i < N+1; i++)
{
   result = result ^ array[i];
}

Where:

// Compute (((1 xor 2) xor 3) .. xor value)
int xor_sum(int value)
{
    int modulo = x % 4;
    if (modulo == 0)
        return value;
    else if (modulo == 1)
        return 1;
    else if (modulo == 2)
        return i + 1;
    else
        return 0;
}

Or in pseudocode/math lang f(n) defined as (optimized):

if n mod 4 = 0 then X = n
if n mod 4 = 1 then X = 1
if n mod 4 = 2 then X = n+1
if n mod 4 = 3 then X = 0

And in canonical form f(n) is:

f(0) = 0
f(n) = f(n-1) xor n

Solution 17 - Arrays

I support the addition of all the elements and then subtracting from it the sum of all the indices but this won't work if the number of elements is very large. I.e. It will cause an integer overflow! So I have devised this algorithm which may be will reduce the chances of an integer overflow to a large extent.

   for i=0 to n-1
        begin:  
              diff = a[i]-i;
              dup = dup + diff;
        end
   // where dup is the duplicate element..

But by this method I won't be able to find out the index at which the duplicate element is present!

For that I need to traverse the array another time which is not desirable.

Solution 18 - Arrays

My answer to question 2:

Find the sum and product of numbers from 1 -(to) N, say SUM, PROD.

Find the sum and product of Numbers from 1 - N- x -y, (assume x, y missing), say mySum, myProd,

Thus:

SUM = mySum + x + y;
PROD = myProd* x*y;

Thus:

x*y = PROD/myProd; x+y = SUM - mySum;

We can find x,y if solve this equation.

Solution 19 - Arrays

In the aux version, you first set all the values to -1 and as you iterate check if you have already inserted the value to the aux array. If not (value must be -1 then), insert. If you have a duplicate, here is your solution!

In the one without aux, you retrieve an element from the list and check if the rest of the list contains that value. If it contains, here you've found it.

private static int findDuplicated(int[] array) {
    if (array == null || array.length < 2) {
        System.out.println("invalid");
        return -1;
    }
    int[] checker = new int[array.length];
    Arrays.fill(checker, -1);
    for (int i = 0; i < array.length; i++) {
        int value = array[i];
        int checked = checker[value];
        if (checked == -1) {
            checker[value] = value;
        } else {
            return value;
        }
    }
    return -1;
}

private static int findDuplicatedWithoutAux(int[] array) {
    if (array == null || array.length < 2) {
        System.out.println("invalid");
        return -1;
    }
    for (int i = 0; i < array.length; i++) {
        int value = array[i];
        for (int j = i + 1; j < array.length; j++) {
            int toCompare = array[j];
            if (value == toCompare) {
                return array[i];
            }
        }
    }
    return -1;
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSysAdminView Question on Stackoverflow
Solution 1 - ArraysleppieView Answer on Stackoverflow
Solution 2 - ArraysFranci PenovView Answer on Stackoverflow
Solution 3 - ArrayscodaddictView Answer on Stackoverflow
Solution 4 - ArraysLaurynas BiveinisView Answer on Stackoverflow
Solution 5 - ArraysMatthieu M.View Answer on Stackoverflow
Solution 6 - ArrayskgiannakakisView Answer on Stackoverflow
Solution 7 - ArraysjfsView Answer on Stackoverflow
Solution 8 - ArraysJustin ArdiniView Answer on Stackoverflow
Solution 9 - ArraysMichael Aaron SafyanView Answer on Stackoverflow
Solution 10 - ArraysN 1.1View Answer on Stackoverflow
Solution 11 - ArrayscobbalView Answer on Stackoverflow
Solution 12 - ArraysSanthoshView Answer on Stackoverflow
Solution 13 - ArraysmRazaView Answer on Stackoverflow
Solution 14 - ArraysSunil B NView Answer on Stackoverflow
Solution 15 - ArrayspsihodeliaView Answer on Stackoverflow
Solution 16 - ArraysSeva ParfenovView Answer on Stackoverflow
Solution 17 - ArraysPoulamiView Answer on Stackoverflow
Solution 18 - ArraysZhixi ChenView Answer on Stackoverflow
Solution 19 - Arraysuser3743369View Answer on Stackoverflow