Find XOR of all numbers in a given range

Algorithm

Algorithm Problem Overview


You are given a large range [a,b] where 'a' and 'b' can be typically between 1 and 4,000,000,000 inclusive. You have to find out the XOR of all the numbers in the given range.

This problem was used in TopCoder SRM. I saw one of the solutions submitted in the match and I'm not able to figure out how its working.

Could someone help explain the winning solution:

long long f(long long a) {
     long long res[] = {a,1,a+1,0};
     return res[a%4];
}

long long getXor(long long a, long long b) {
     return f(b)^f(a-1);
}

Here, getXor() is the actual function to calculate the xor of all number in the passed range [a,b] and "f()" is a helper function.

Algorithm Solutions


Solution 1 - Algorithm

This is a pretty clever solution -- it exploits the fact that there is a pattern of results in the running XORs. The f() function calculates the XOR total run from [0, a]. Take a look at this table for 4-bit numbers:

0000 <- 0  [a]
0001 <- 1  [1]
0010 <- 3  [a+1]
0011 <- 0  [0]
0100 <- 4  [a]
0101 <- 1  [1]
0110 <- 7  [a+1]
0111 <- 0  [0]
1000 <- 8  [a]
1001 <- 1  [1]
1010 <- 11 [a+1]
1011 <- 0  [0]
1100 <- 12 [a]
1101 <- 1  [1]
1110 <- 15 [a+1]
1111 <- 0  [0]

Where the first column is the binary representation and then the decimal result and its relation to its index (a) into the XOR list. This happens because all the upper bits cancel and the lowest two bits cycle every 4. So, that's how to arrive at that little lookup table.

Now, consider for a general range of [a,b]. We can use f() to find the XOR for [0,a-1] and [0,b]. Since any value XOR'd with itself is zero, the f(a-1) just cancels out all the values in the XOR run less than a, leaving you with the XOR of the range [a,b].

Solution 2 - Algorithm

Adding to FatalError's great answer, the line return f(b)^f(a-1); could be explained better. In short, it's because XOR has these wonderful properties:

  • It's associative - Place brackets wherever you want
  • It's commutative - that means you can move the operators around (they can "commute")

Here's both in action:

(a ^ b ^ c) ^ (d ^ e ^ f) = (f ^ e) ^ (d ^ a ^ b) ^ c
  • It reverses itself

Like this:

a ^ b = c
c ^ a = b

Add and multiply are two examples of other associative/ commutative operators, but they don't reverse themselves. Ok, so, why are these properties important? Well, a simple route is to expand it out into what it really is, and then you can see these properties at work.

First, let's define what we want and call it n:

n      = (a ^ a+1 ^ a+2 .. ^ b)

If it helps, think of XOR (^) as if it was an add.

Let's also define the function:

f(b)   = 0 ^ 1 ^ 2 ^ 3 ^ 4 .. ^ b

b is greater than a, so just by safely dropping in a few extra brackets (which we can because it's associative), we can also say this:

f(b)   = ( 0 ^ 1 ^ 2 ^ 3 ^ 4 .. ^ (a-1) ) ^ (a ^ a+1 ^ a+2 .. ^ b)

Which simplifies to:

f(b)   = f(a-1) ^ (a ^ a+1 ^ a+2 .. ^ b)

f(b)   = f(a-1) ^ n

Next, we use that reversal property and commutivity to give us the magic line:

n      = f(b) ^ f(a-1)

If you've been thinking of XOR like an add, you would've dropped in a subtract there. XOR is to XOR what add is to subtract!

How do I come up with this myself?

Remember the properties of logical operators. Work with them almost like an add or multiply if it helps. It feels unusual that and (&), xor (^) and or (|) are associative, but they are!

Run the naive implementation through first, look for patterns in the output, then start finding rules which confirm the pattern is true. Simplify your implementation even further and repeat. This is probably the route that the original creator took, highlighted by the fact that it's not completely optimal (i.e. use a switch statement rather than an array).

Solution 3 - Algorithm

I found out that the below code is also working like the solution given in the question.

May be this is little optimized but its just what I got from observing repetition like given in the accepted answer,

I would like to know / understand the mathematical proof behind the given code, like explained in the answer by @Luke Briggs

Here is that JAVA code

public int findXORofRange(int m, int n) {
	int[] patternTracker;

	if(m % 2 == 0)
		patternTracker = new int[] {n, 1, n^1, 0};
	else
		patternTracker = new int[] {m, m^n, m-1, (m-1)^n};

	return patternTracker[(n-m) % 4];
}

Solution 4 - Algorithm

I have solved the problem with recursion. I simply divide the dataset into an almost equal part for every iteration.

public int recursion(int M, int N) {
    if (N - M == 1) {
        return M ^ N;
    } else {
        int pivot = this.calculatePivot(M, N);
        if (pivot + 1 == N) {
            return this.recursion(M, pivot) ^ N;
        } else {
            return this.recursion(M, pivot) ^ this.recursion(pivot + 1, N);
        }
    }
}
public int calculatePivot(int M, int N) {
    return (M + N) / 2;
}

Let me know your thoughts over the solution. Happy to get improvement feedbacks. The proposed solution calculates the XOR in 0(log N) complexity.

Thank you

Solution 5 - Algorithm

To support XOR from 0 to N the code given needed to be modified as below,

int f(int a) {
    int []res = {a, 1, a+1, 0};
    return res[a % 4];
}

int getXor(int a, int b) {
    return f(b) ^ f(a);
}

Solution 6 - Algorithm

Adding on even further to FatalError's answer, it's possible to prove (by induction) that the observed pattern in f() will cycle for every 4 numbers.

We're trying to prove that for every integer k >= 0,

f(4k + 1) = 1
f(4k + 2) = 4k + 3
f(4k + 3) = 0
f(4k + 4) = 4k + 4

where f(n) is 1 ^ 2 ^ ... ^ n.

As our base case, we can work out by hand that

f(1) = 1
f(2) = 1 ^ 2 = 3
f(3) = 3 ^ 3 = 0
f(4) = 0 ^ 4 = 4

For our inductive step, assume that these equations are true up to a particular integer 4x (i.e. f(4x) = 4x). We want to show that our equations are true for 4x + 1, 4x + 2, 4x + 3 and 4x + 4.

To help write and visualize the proof, we can let b(x) denote the binary (base-2) string representation of x, for example
b(7) = '111', b(9) = '1001'.

and

b(4x)     = 'b(x)00'
b(4x + 1) = 'b(x)01'
b(4x + 2) = 'b(x)10'
b(4x + 3) = 'b(x)11'

Here is the inductive step:

Assume: f(4x) = 4x = 'b(x)00'
Then:

f(4x + 1) = f(4x)    ^ (4x + 1)  // by definition
          = f(4x)    ^ 'b(x)01'  // by definition
          = 'b(x)00' ^ 'b(x)01'  // from assumption
          = '01'                 // as b(x) ^ b(x) = 0

f(4x + 2) = f(4x + 1) ^ (4x + 2)
          = f(4x + 1) ^ 'b(x)10'       
          = '01'      ^ 'b(x)10'
          = 'b(x)11'             // this is 4x + 3

f(4x + 3) = f(4x + 2) ^ (4x + 3)
          = f(4x + 2) ^ 'b(x)11' 
          = 'b(x)11'  ^ 'b(x)11'
          = '00'

For the last case, we don't use binary strings, 
since we don't know what b(4x + 4) is.

f(4x + 4) = f(4x + 3) ^ (4x + 4)
          = 0         ^ (4x + 4)
          = 4x + 4

So the pattern holds for the next four numbers after 4x, completing the proof.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionrajneesh2k10View Question on Stackoverflow
Solution 1 - AlgorithmFatalErrorView Answer on Stackoverflow
Solution 2 - AlgorithmLuke BriggsView Answer on Stackoverflow
Solution 3 - AlgorithmParth VishvajitView Answer on Stackoverflow
Solution 4 - AlgorithmAbhijeet SonawaneView Answer on Stackoverflow
Solution 5 - AlgorithmMohammad Nazmul HossainView Answer on Stackoverflow
Solution 6 - AlgorithmkcsquaredView Answer on Stackoverflow