Efficient list intersection algorithm

AlgorithmListSet Intersection

Algorithm Problem Overview


Given two lists (not necessarily sorted), what is the most efficient non-recursive algorithm to find the set intersection of those lists?
I don't believe I have access to hashing algorithms.

Algorithm Solutions


Solution 1 - Algorithm

You could put all elements of the first list into a hash set. Then, iterate the second one and, for each of its elements, check the hash to see if it exists in the first list. If so, output it as an element of the intersection.

Solution 2 - Algorithm

You might want to take a look at Bloom filters. They are bit vectors that give a probabilistic answer whether an element is a member of a set. Set intersection can be implemented with a simple bitwise AND operation. If you have a large number of null intersections, the Bloom filter can help you eliminate those quickly. You'll still have to resort to one of the other algorithms mentioned here to compute the actual intersection, however. http://en.wikipedia.org/wiki/Bloom_filter

Solution 3 - Algorithm

without hashing, I suppose you have two options:

  • The naive way is going to be compare each element to every other element. O(n^2)
  • Another way would be to sort the lists first, then iterate over them: O(n lg n) * 2 + 2 * O(n)

Solution 4 - Algorithm

From the eviews features list it seems that it supports complex merges and joins (if this is 'join' as in DB terminology, it will compute an intersection). Now dig through your documentation :-)

Additionally, eviews has their own user forum - why not ask there_

Solution 5 - Algorithm

in C++ the following can be tried using STL map

vector<int> set_intersection(vector<int> s1, vector<int> s2){
	
	vector<int> ret;
	map<int, bool> store;
	for(int i=0; i < s1.size(); i++){
		
		store[s1[i]] = true;
	}
	for(int i=0; i < s2.size(); i++){
		
		if(store[s2[i]] == true) ret.push_back(s2[i]);
		
	}
	return ret;
}

Solution 6 - Algorithm

with set 1 build a binary search tree with O(log n) and iterate set2 and search the BST m X O(log n) so total O(log n) + O(m)+O(log n) ==> O(log n)(m+1)

Solution 7 - Algorithm

Here is another possible solution I came up with takes O(nlogn) in time complexity and without any extra storage. You can check it out here https://gist.github.com/4455373

Here is how it works: Assuming that the sets do not contain any repetition, merge all the sets into one and sort it. Then loop through the merged set and on each iteration create a subset between the current index i and i+n where n is the number of sets available in the universe. What we look for as we loop is a repeating sequence of size n equal to the number of sets in the universe.

If that subset at i is equal to that subset at n this means that the element at i is repeated n times which is equal to the total number of sets. And since there are no repetitions in any set that means each of the sets contain that value so we add it to the intersection. Then we shift the index by i + whats remaining between it and n because definitely none of those indexes are going to form a repeating sequence.

Solution 8 - Algorithm

First, sort both lists using quicksort : O(n*log(n). Then, compare the lists by browsing the lowest values first, and add the common values. For example, in lua) :

function findIntersection(l1, l2)
	i, j = 1,1
	intersect = {}

	while i < #l1 and j < #l2 do
		if l1[i] == l2[i] then
			i, j = i + 1, j + 1
			table.insert(intersect, l1[i])
		else if l1[i] > l2[j] then
			l1, l2 = l2, l1
			i, j = j, i
		else
			i = i + 1
		end
	end
	
	return intersect
end

which is O(max(n, m)) where n and m are the sizes of the lists.

EDIT: quicksort is recursive, as said in the comments, but it looks like there are non-recursive implementations

Solution 9 - Algorithm

Using skip pointers and SSE instructions can improve list intersection efficiency.

Solution 10 - Algorithm

Why not implement your own simple hash table or hash set? It's worth it to avoid nlogn intersection if your lists are large as you say.

Since you know a bit about your data beforehand, you should be able to choose a good hash function.

Solution 11 - Algorithm

If there is a support for sets (as you call them in the title) as built-in usually there is a intersection method.

Anyway, as someone said you could do it easily (I will not post code, someone already did so) if you have the lists sorted. If you can't use recursion there is no problem. There are quick sort recursion-less implementations.

Solution 12 - Algorithm

I second the "sets" idea. In JavaScript, you could use the first list to populate an object, using the list elements as names. Then you use the list elements from the second list and see if those properties exist.

Solution 13 - Algorithm

In PHP, something like

function intersect($X) { // X is an array of arrays; returns intersection of all the arrays
  $counts = Array(); $result = Array();
  foreach ($X AS $x) {
    foreach ($x AS $y) { $counts[$y]++; }
  }
  foreach ($counts AS $x => $count) {
    if ($count == count($X)) { $result[] = $x; }
  }
  return $result;
}

Solution 14 - Algorithm

From the definition of Big-Oh notation:

> T(N) = O(f(N)) if there are positive constants c and n 0 such that > T(N) ≤ cf(N) when N ≥ n 0.

Which in practice means that if the two lists are relatively small in size say something less 100 elements in each two for loops works just fine. Loop the first list and look for similar object in the second. In my case it works just fine because I won't have more than 10 - 20 max elements in my lists. However, a good solution is the sort the first O(n log n), sort the second also O(n log n) and merge them, another O(n log n) roughly speeking O(3 n log n), say that the two lists are the same size.

Solution 15 - Algorithm

Time: O(n) Space: O(1) Solution for identifying points of intersection.

For example, the two given nodes will detect the point of intersection by swapping pointers every time they reach the end. Video Explanation Here.

public ListNode getIntersectionNode(ListNode headA, ListNode headB) {
	ListNode pA = headA;
	ListNode pB = headB;
	while (pA != pB) {
		pA = pA == null ? headB : pA.next;
		pB = pB == null ? headA : pB.next;
	}
	return pA;
}

Thanks.

Edit

My interpretation of intersection is finding the point of intersection.

For example:

Intersection

For the given lists A and B, A and B will "meet/intersect" at point c1, and the algo above will return c1. As OP stated that OP has NO access to Hashmaps or some sort, I believe OP is saying that the algo should have O(1) space complexity.

I got this idea from Leetcode some time ago, if interested: Intersection of Two Linked Lists.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDavidView Question on Stackoverflow
Solution 1 - AlgorithmFrankView Answer on Stackoverflow
Solution 2 - AlgorithmAneil MallavarapuView Answer on Stackoverflow
Solution 3 - AlgorithmTom RitterView Answer on Stackoverflow
Solution 4 - AlgorithmzvrbaView Answer on Stackoverflow
Solution 5 - AlgorithmquasarView Answer on Stackoverflow
Solution 6 - AlgorithmkhajaView Answer on Stackoverflow
Solution 7 - AlgorithmAyman FarhatView Answer on Stackoverflow
Solution 8 - AlgorithmWookaiView Answer on Stackoverflow
Solution 9 - AlgorithmWolf GarbeView Answer on Stackoverflow
Solution 10 - AlgorithmImranView Answer on Stackoverflow
Solution 11 - AlgorithmAndrea AmbuView Answer on Stackoverflow
Solution 12 - AlgorithmNosrednaView Answer on Stackoverflow
Solution 13 - AlgorithmView Answer on Stackoverflow
Solution 14 - AlgorithmAdelinView Answer on Stackoverflow
Solution 15 - AlgorithmminchaejView Answer on Stackoverflow