How can I compare two sets of 1000 numbers against each other?

PhpJavascriptSqlAlgorithm

Php Problem Overview


I must check approximately 1000 numbers against 1000 other numbers.

I loaded both and compared them server-side:

foreach( $numbers1 as $n1 ) {
  foreach( $numbers2 as $n2 ) {
    if( $n1 == $n2 ) {
      doBla();
    }
  }
}

This took a long time, so I tried to do the same comparison client side using two hidden div elements. Then compared them using JavaScript. It still takes 45 seconds to load the page (using hidden div elements).

I do not need to load the numbers that are not the same.

Is there a faster algorithm? I am thinking of comparing them database side and just load the error numbers, then do an Ajax call for the remaining non-error numbers. But is a MySQL database fast enough?

Php Solutions


Solution 1 - Php

Sort the lists first. Then you can walk up both lists from the start, comparing as you go.

The loop would look something like this:

var A = getFirstArray().sort(), B = getSecondArray().sort();

var i = 0, j = 0;
while (i < A.length && j < B.length) {
    if (A[i] === B[j]) {
        doBla(A[i]);
        i++; j++;
    }
    else if (A[i] < B[j]) {
        i++;
    }
    else
        j++;
}

(That's JavaScript; you could do it server-side too, but I don't know PHP.)

Edit — just to be fair to all the hashtable fans (whom I respect of course), it's pretty easy to do that in JavaScript:

var map = {};
for (var i = 0; i < B.length; ++i) map[B[i]] = true; // Assume integers.
for (var i = 0; i < A.length; ++i) if (map[A[i]]) doBla(A[i]);

Or if the numbers are or might be floats:

var map = {};
for (var i = 0; i < B.length; ++i) map['' + B[i]] = true; // Assume integers.
for (var i = 0; i < A.length; ++i) if (map['' + A[i]]) doBla(A[i]);

Since numbers are pretty cheap to hash (even in JavaScript, converting to string before hashing is surprisingly cheap), this would be pretty fast.

Solution 2 - Php

Solution 3 - Php

In database terms this can a join of 1000 rows to another 1000 rows. Any modern database system can handle this.

select x from table1
inner join table2
on table1.x = table2.y

where table1 and table2 are the rows concerned and could be the same table.

Solution 4 - Php

What you have shouldnt take that long - what does doBla() do? I suspect that is taking the time? Comparing two sets of 1000000 numbers with the same algorithm takes no time at all..

This is hilarious - the number of optimisation techniques as answers - the problem is not your algorithm - it is whatever doBla() does that is taking the time by a factor many times greater than any optimisation would help you :) esp. given the sets are only 1000 long and you have to sort them first..

Solution 5 - Php

Maybe just intersect the array values to find numbers existing in both arrays?

$result = array_intersect($numbers1, $numbers2);
foreach ($result as $val)
  doBla();

Solution 6 - Php

If you sort list2 first and then do a binary search for each number in list1 you'll see a huge speed increase.

I'm not a PHP guy, but this should give you the idea:

sort($numbers2);

foreach($numbers1 as $n1)
{
   if (BinarySearch($numbers2, $n1) >= 0) {
     doBla();
 }
}

Obviously not being a PHP guy I don't know the library, but I'm sure sorting and binary searching should be easy enough to find.

Note: In case you're not familiar with a binary search; you're sorting list2 because binary searches need to operate on sorted lists.

Solution 7 - Php

Sort them first.

Solution 8 - Php

I'm not a PHP expert, so this may need some debugging, but you can do this easily in O(n) time:

// Load one array into a hashtable, keyed by the number: O(n).
$keys1 = [];
foreach($numbers1 as $n1) $keys1[$n1] = true;

// Find the intersections with the other array:
foreach($numbers2 as $n2) { // O(n)
  if (isset($keys1[$n2]) { // O(1)
     doBla();
  }
}

Regardless, the intersection isn't where your time is going. Even a bad O(n^2) implementation like you have now should be able to go through 1000 numbers in a second.

Solution 9 - Php

Stop- why are you doing this?

If the numbers are already in a SQL database, then do a join and let the DB figure out the most efficient route.

If they aren't in a database, then I'm betting you've gone off track somewhere and really ought to reconsider how you got here.

Solution 10 - Php

$same_numbers = array_intersect($numbers1, $$numbers2);

foreach($same_numbers as $n)
{
  doBla();
}

Solution 11 - Php

Sort both lists, then walk both lists at the same time using the old-master new-master sequential update pattern. As long as you can sort the data it is the fastest way since your really only walking the list once, to the longest length of the largest list.

Solution 12 - Php

Your code is simply more complicated then in needs to be.

Assuming what you're looking for is that the numbers in each position match (and not just that the array contains the same numbers), you can flatten your loop to a single for.

<?php
// Fill two arrays with random numbers as proof.
$first_array = array(1000);
$second_array = array(1000);
for($i=0; $i<1000; $i++) $first_array[$i] = rand(0, 1000);
for($i=0; $i<1000; $i++) $second_array[$i] = rand(0, 1000);

// The loop you care about.
for($i=0; $i<1000; $i++) if ($first_array[$i] != $second_array[$i]) echo "Error at $i: first_array was {$first_array[$i]}, second was {$second_array[$i]}<br>";

?>

Using the code above, you will only loop 1000 times, as opposed to looping 1000000 times.

Now, if you need to just check that a number appears or does not appear in the arrays, use array_diff and array_intersect as follows:

<?php
// Fill two arrays with random numbers as proof.
$first_array = array(1000);
$second_array = array(1000);
for($i=0; $i<1000; $i++) $first_array[$i] = rand(0, 1000);
for($i=0; $i<1000; $i++) $second_array[$i] = rand(0, 1000);

$matches = array_intersect($first_array, $second_array);
$differences = array_diff($first_array, $second_array);

?>

Solution 13 - Php

Maybe I'm not seeing something here but this looks like a classic case of set intersection. Here's a few lines in perl that'll do it.

> foreach $e (@a, @b) { $union{$e}++ && $isect{$e}++ } > > @union = keys %union; > @isect = keys %isect;

At the end of these lines of code @isect will contain all numbers that are in both @a and @b. I'm sure this is translatable to php more or less directly. FWIW, this is my favorite piece of code from the Perl Cookbook.

Solution 14 - Php

You can do it in O(n) time if you use bucket sort. Assuming you know the maximum value the numbers can take (although there are ways around that).

http://en.wikipedia.org/wiki/Bucket_sort

Solution 15 - Php

I think it would be much easier to use the built in array_intersect function. Using your example, you could do:

$results = array_intersect($numbers1, $numbers2);
foreach($results as $rk => $rv) {
    doSomething($rv);
}

Solution 16 - Php

A better way would be to do something like this:

// 1. Create a hash map from one of the lists.
var hm = { };
for (var i in list1) {
  if (!hm[list1[i]]) {
    hm[list1[i]] = 1;
  } else { hm[list1[i]] += 1; }
}

// 2. Lookup each element in the other list.
for (var i in list2) {
  if (hm[list2[i]] >= 1) {
    for (var j = 0; j < hm[list2[i]]; ++j) {
      doBla();
    }
  }
}

This is guaranteed O(n) [assuming insertion an lookup in a hash map is O(1) amortized].

Update: The worst case of this algorithm would be O(n2) and there is no way to reduce -- unless you change the semantics of the program. This is because in the worst case, the program will call doBla() n2 number of times if all the numbers in both the lists are the same. However, if both the lists have unique numbers (i.e. generally unique within a list), then the runtime would tend towards O(n).

Solution 17 - Php

I'll create a GUI interface in Visual Basic, see if I can track the numbers

Solution 18 - Php

Mergesort both lists, start at the beginning of both lists, and then search through each list for similar numbers at the same time.

So, in pseudocode, it would go something like...

Mergesort (List A);
Mergesort (list B)

$Apos = 0;
$Bpos = 0;

while( $Apos != A.Length && $Bpos != B.length) // while you have not reached the end of either list
{
if (A[$Apos] == B[$Bpos])// found a match
doSomething();

else if (A[$Apos] > B[$Bpos]) // B is lower than A, so have B try and catch up to A.
$Bpos++;

else if (A[$Apos] < B[$Bpos]) // the value at A is less than the value at B, so increment B
$Apos++;

}

If I'm right, the speed of this algorithm is O(n logn).

Solution 19 - Php

I'm not sure why Mrk Mnl was downvoted but the function call is the overhead here.

Push out the matched numbers into another array and doBla() on them after the comparisons. As a test // out doBla() and see if you are experiencing the same performance issue.

Solution 20 - Php

Would it be possible to put these numbers into two database tables, and then do an INNER JOIN? This will be very efficient and provide only the numbers which are contained in both tables. This is a perfect task for a database.

Solution 21 - Php

  1. Create two duplicate collections, preferably ones with fast lookup times, like HashSet or perhaps TreeSet. Avoid Lists as they have very poor lookup times.

  2. As you find elements, remove them from both sets. This can reduce lookup times by having fewer elements to sift through in later searches.

Solution 22 - Php

If you're trying to get a list of numbers without any duplicates, you can use a hash:

$unique = array();
foreach ($list1 as $num) {
  $unique[$num] = $num;
}
foreach ($list2 as $num) {
  $unique[$num] = $num;
}
$unique = array_keys($unique);

It's going to be slightly (very slightly) slower than the array walk method, but it's cleaner in my opinion.

Solution 23 - Php

Merge, sort and then count
<?php
    $first = array('1001', '1002', '1003', '1004', '1005');
    $second = array('1002', '1003', '1004', '1005', '1006');
    $merged = array_merge($first, $first, $second);
    sort($merged);
    print_r(array_count_values($merged));
?>
Output / the values with a count of three are the ones you want
Array
(
    [1001] => 2
    [1002] => 3
    [1003] => 3
    [1004] => 3
    [1005] => 3
    [1006] => 1
)

Solution 24 - Php

This code will call doBla() once for each time a value in $numbers1 is found in $numbers2:

// get [val => occurences, ...] for $numbers2
$counts = array_count_values($numbers2);
foreach ($numbers1 as $n1) {
    // if $n1 occurs in $numbers2...
    if (isset($counts[$n1])) {
        // call doBla() once for each occurence
        for ($i=0; $i < $counts[$n1]; $i++) {
            doBla();
        }
    }
}

If you only need to call doBla() once if a match is found:

foreach ($numbers1 as $n1) {
    if (in_array($n1, $numbers2))
        doBla();
}

If $numbers1 and $numbers2 will only contain unique values, or if the number of times any specific value occurs in both arrays is not important, array_intersect() will do the job:

$dups = array_intersect($numbers1, $numbers2);
foreach ($dups as $n)
    doBla();

I agree with several earlier posts that the calls to doBla() are probably taking more time than iterating over the arrays.

Solution 25 - Php

This problem can be break into 2 tasks. 1st task is finding all combinations (n^2-n)/2. For n=1000 the solution is x=499500. The 2nd task is to loop through all x numbers and compare them with the function doBla().

function getWayStr(curr) {
 var nextAbove = -1;
 for (var i = curr + 1; i < waypoints.length; ++i) {
  if (nextAbove == -1) {
    nextAbove = i;
   } else {
     wayStr.push(waypoints[i]);
     wayStr.push(waypoints[curr]);
   }
  }
  if (nextAbove != -1) {
    wayStr.push(waypoints[nextAbove]);
    getWayStr(nextAbove);
    wayStr.push(waypoints[curr]);
  }
 } 

Solution 26 - Php

Use WebAssembly rather than JavaScript.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionbaklapView Question on Stackoverflow
Solution 1 - PhpPointyView Answer on Stackoverflow
Solution 2 - PhpPhill PaffordView Answer on Stackoverflow
Solution 3 - PhpPreet SanghaView Answer on Stackoverflow
Solution 4 - PhpmarkmnlView Answer on Stackoverflow
Solution 5 - PhpsodView Answer on Stackoverflow
Solution 6 - PhpGiovanni GalboView Answer on Stackoverflow
Solution 7 - PhpJRLView Answer on Stackoverflow
Solution 8 - PhpmunificentView Answer on Stackoverflow
Solution 9 - PhpdethSwatchView Answer on Stackoverflow
Solution 10 - PhpCesarView Answer on Stackoverflow
Solution 11 - PhpskamradtView Answer on Stackoverflow
Solution 12 - PhpJack SheddView Answer on Stackoverflow
Solution 13 - PhpShahbazView Answer on Stackoverflow
Solution 14 - PhpashananView Answer on Stackoverflow
Solution 15 - PhpDavidView Answer on Stackoverflow
Solution 16 - PhpdhruvbirdView Answer on Stackoverflow
Solution 17 - PhpedahsView Answer on Stackoverflow
Solution 18 - PhpwafflesView Answer on Stackoverflow
Solution 19 - PhpMartin BlankView Answer on Stackoverflow
Solution 20 - Phpm.edmondsonView Answer on Stackoverflow
Solution 21 - PhpEdwin BuckView Answer on Stackoverflow
Solution 22 - PhpbrianloveswordsView Answer on Stackoverflow
Solution 23 - PhpjaymzView Answer on Stackoverflow
Solution 24 - PhpgregjorView Answer on Stackoverflow
Solution 25 - PhpMicromegaView Answer on Stackoverflow
Solution 26 - PhpHasan SavranView Answer on Stackoverflow