How can I compare two sets of 1000 numbers against each other?
PhpJavascriptSqlAlgorithmPhp Problem Overview
I must check approximately 1000 numbers against 1000 other numbers.
I loaded both and compared them server-side:
foreach( $numbers1 as $n1 ) {
foreach( $numbers2 as $n2 ) {
if( $n1 == $n2 ) {
doBla();
}
}
}
This took a long time, so I tried to do the same comparison client side using two hidden
div
elements. Then compared them using JavaScript. It still takes 45 seconds to load the page (using hidden div
elements).
I do not need to load the numbers that are not the same.
Is there a faster algorithm? I am thinking of comparing them database side and just load the error numbers, then do an Ajax call for the remaining non-error numbers. But is a MySQL database fast enough?
Php Solutions
Solution 1 - Php
Sort the lists first. Then you can walk up both lists from the start, comparing as you go.
The loop would look something like this:
var A = getFirstArray().sort(), B = getSecondArray().sort();
var i = 0, j = 0;
while (i < A.length && j < B.length) {
if (A[i] === B[j]) {
doBla(A[i]);
i++; j++;
}
else if (A[i] < B[j]) {
i++;
}
else
j++;
}
(That's JavaScript; you could do it server-side too, but I don't know PHP.)
Edit — just to be fair to all the hashtable fans (whom I respect of course), it's pretty easy to do that in JavaScript:
var map = {};
for (var i = 0; i < B.length; ++i) map[B[i]] = true; // Assume integers.
for (var i = 0; i < A.length; ++i) if (map[A[i]]) doBla(A[i]);
Or if the numbers are or might be floats:
var map = {};
for (var i = 0; i < B.length; ++i) map['' + B[i]] = true; // Assume integers.
for (var i = 0; i < A.length; ++i) if (map['' + A[i]]) doBla(A[i]);
Since numbers are pretty cheap to hash (even in JavaScript, converting to string before hashing is surprisingly cheap), this would be pretty fast.
Solution 2 - Php
Solution 3 - Php
In database terms this can a join of 1000 rows to another 1000 rows. Any modern database system can handle this.
select x from table1
inner join table2
on table1.x = table2.y
where table1
and table2
are the rows concerned and could be the same table.
Solution 4 - Php
What you have shouldnt take that long - what does doBla() do? I suspect that is taking the time? Comparing two sets of 1000000 numbers with the same algorithm takes no time at all..
This is hilarious - the number of optimisation techniques as answers - the problem is not your algorithm - it is whatever doBla() does that is taking the time by a factor many times greater than any optimisation would help you :) esp. given the sets are only 1000 long and you have to sort them first..
Solution 5 - Php
Maybe just intersect the array values to find numbers existing in both arrays?
$result = array_intersect($numbers1, $numbers2);
foreach ($result as $val)
doBla();
Solution 6 - Php
If you sort list2 first and then do a binary search for each number in list1 you'll see a huge speed increase.
I'm not a PHP guy, but this should give you the idea:
sort($numbers2);
foreach($numbers1 as $n1)
{
if (BinarySearch($numbers2, $n1) >= 0) {
doBla();
}
}
Obviously not being a PHP guy I don't know the library, but I'm sure sorting and binary searching should be easy enough to find.
Note: In case you're not familiar with a binary search; you're sorting list2 because binary searches need to operate on sorted lists.
Solution 7 - Php
Sort them first.
Solution 8 - Php
I'm not a PHP expert, so this may need some debugging, but you can do this easily in O(n) time:
// Load one array into a hashtable, keyed by the number: O(n).
$keys1 = [];
foreach($numbers1 as $n1) $keys1[$n1] = true;
// Find the intersections with the other array:
foreach($numbers2 as $n2) { // O(n)
if (isset($keys1[$n2]) { // O(1)
doBla();
}
}
Regardless, the intersection isn't where your time is going. Even a bad O(n^2) implementation like you have now should be able to go through 1000 numbers in a second.
Solution 9 - Php
Stop- why are you doing this?
If the numbers are already in a SQL database, then do a join and let the DB figure out the most efficient route.
If they aren't in a database, then I'm betting you've gone off track somewhere and really ought to reconsider how you got here.
Solution 10 - Php
$same_numbers = array_intersect($numbers1, $$numbers2);
foreach($same_numbers as $n)
{
doBla();
}
Solution 11 - Php
Sort both lists, then walk both lists at the same time using the old-master new-master sequential update pattern. As long as you can sort the data it is the fastest way since your really only walking the list once, to the longest length of the largest list.
Solution 12 - Php
Your code is simply more complicated then in needs to be.
Assuming what you're looking for is that the numbers in each position match (and not just that the array contains the same numbers), you can flatten your loop to a single for.
<?php
// Fill two arrays with random numbers as proof.
$first_array = array(1000);
$second_array = array(1000);
for($i=0; $i<1000; $i++) $first_array[$i] = rand(0, 1000);
for($i=0; $i<1000; $i++) $second_array[$i] = rand(0, 1000);
// The loop you care about.
for($i=0; $i<1000; $i++) if ($first_array[$i] != $second_array[$i]) echo "Error at $i: first_array was {$first_array[$i]}, second was {$second_array[$i]}<br>";
?>
Using the code above, you will only loop 1000 times, as opposed to looping 1000000 times.
Now, if you need to just check that a number appears or does not appear in the arrays, use array_diff and array_intersect as follows:
<?php
// Fill two arrays with random numbers as proof.
$first_array = array(1000);
$second_array = array(1000);
for($i=0; $i<1000; $i++) $first_array[$i] = rand(0, 1000);
for($i=0; $i<1000; $i++) $second_array[$i] = rand(0, 1000);
$matches = array_intersect($first_array, $second_array);
$differences = array_diff($first_array, $second_array);
?>
Solution 13 - Php
Maybe I'm not seeing something here but this looks like a classic case of set intersection. Here's a few lines in perl that'll do it.
> foreach $e (@a, @b) { $union{$e}++ && $isect{$e}++ } > > @union = keys %union; > @isect = keys %isect;
At the end of these lines of code @isect will contain all numbers that are in both @a and @b. I'm sure this is translatable to php more or less directly. FWIW, this is my favorite piece of code from the Perl Cookbook.
Solution 14 - Php
You can do it in O(n) time if you use bucket sort. Assuming you know the maximum value the numbers can take (although there are ways around that).
Solution 15 - Php
I think it would be much easier to use the built in array_intersect function. Using your example, you could do:
$results = array_intersect($numbers1, $numbers2);
foreach($results as $rk => $rv) {
doSomething($rv);
}
Solution 16 - Php
A better way would be to do something like this:
// 1. Create a hash map from one of the lists.
var hm = { };
for (var i in list1) {
if (!hm[list1[i]]) {
hm[list1[i]] = 1;
} else { hm[list1[i]] += 1; }
}
// 2. Lookup each element in the other list.
for (var i in list2) {
if (hm[list2[i]] >= 1) {
for (var j = 0; j < hm[list2[i]]; ++j) {
doBla();
}
}
}
This is guaranteed O(n) [assuming insertion an lookup in a hash map is O(1) amortized].
Update: The worst case of this algorithm would be O(n2) and there is no way to reduce -- unless you change the semantics of the program. This is because in the worst case, the program will call doBla() n2 number of times if all the numbers in both the lists are the same. However, if both the lists have unique numbers (i.e. generally unique within a list), then the runtime would tend towards O(n).
Solution 17 - Php
I'll create a GUI interface in Visual Basic, see if I can track the numbers
Solution 18 - Php
Mergesort both lists, start at the beginning of both lists, and then search through each list for similar numbers at the same time.
So, in pseudocode, it would go something like...
Mergesort (List A);
Mergesort (list B)
$Apos = 0;
$Bpos = 0;
while( $Apos != A.Length && $Bpos != B.length) // while you have not reached the end of either list
{
if (A[$Apos] == B[$Bpos])// found a match
doSomething();
else if (A[$Apos] > B[$Bpos]) // B is lower than A, so have B try and catch up to A.
$Bpos++;
else if (A[$Apos] < B[$Bpos]) // the value at A is less than the value at B, so increment B
$Apos++;
}
If I'm right, the speed of this algorithm is O(n logn).
Solution 19 - Php
I'm not sure why Mrk Mnl was downvoted but the function call is the overhead here.
Push out the matched numbers into another array and doBla() on them after the comparisons. As a test // out doBla() and see if you are experiencing the same performance issue.
Solution 20 - Php
Would it be possible to put these numbers into two database tables, and then do an INNER JOIN
? This will be very efficient and provide only the numbers which are contained in both tables. This is a perfect task for a database.
Solution 21 - Php
-
Create two duplicate collections, preferably ones with fast lookup times, like HashSet or perhaps TreeSet. Avoid Lists as they have very poor lookup times.
-
As you find elements, remove them from both sets. This can reduce lookup times by having fewer elements to sift through in later searches.
Solution 22 - Php
If you're trying to get a list of numbers without any duplicates, you can use a hash:
$unique = array();
foreach ($list1 as $num) {
$unique[$num] = $num;
}
foreach ($list2 as $num) {
$unique[$num] = $num;
}
$unique = array_keys($unique);
It's going to be slightly (very slightly) slower than the array walk method, but it's cleaner in my opinion.
Solution 23 - Php
Merge, sort and then count
<?php
$first = array('1001', '1002', '1003', '1004', '1005');
$second = array('1002', '1003', '1004', '1005', '1006');
$merged = array_merge($first, $first, $second);
sort($merged);
print_r(array_count_values($merged));
?>
Output / the values with a count of three are the ones you want
Array
(
[1001] => 2
[1002] => 3
[1003] => 3
[1004] => 3
[1005] => 3
[1006] => 1
)
Solution 24 - Php
This code will call doBla()
once for each time a value in $numbers1
is found in $numbers2
:
// get [val => occurences, ...] for $numbers2
$counts = array_count_values($numbers2);
foreach ($numbers1 as $n1) {
// if $n1 occurs in $numbers2...
if (isset($counts[$n1])) {
// call doBla() once for each occurence
for ($i=0; $i < $counts[$n1]; $i++) {
doBla();
}
}
}
If you only need to call doBla()
once if a match is found:
foreach ($numbers1 as $n1) {
if (in_array($n1, $numbers2))
doBla();
}
If $numbers1
and $numbers2
will only contain unique values, or if the number of times any specific value occurs in both arrays is not important, array_intersect()
will do the job:
$dups = array_intersect($numbers1, $numbers2);
foreach ($dups as $n)
doBla();
I agree with several earlier posts that the calls to doBla()
are probably taking more time than iterating over the arrays.
Solution 25 - Php
This problem can be break into 2 tasks. 1st task is finding all combinations (n^2-n)/2. For n=1000 the solution is x=499500. The 2nd task is to loop through all x numbers and compare them with the function doBla().
function getWayStr(curr) {
var nextAbove = -1;
for (var i = curr + 1; i < waypoints.length; ++i) {
if (nextAbove == -1) {
nextAbove = i;
} else {
wayStr.push(waypoints[i]);
wayStr.push(waypoints[curr]);
}
}
if (nextAbove != -1) {
wayStr.push(waypoints[nextAbove]);
getWayStr(nextAbove);
wayStr.push(waypoints[curr]);
}
}
Solution 26 - Php
Use WebAssembly rather than JavaScript.