What is faster: in_array or isset?

Php, Performance, Micro Optimization

Php Problem Overview


This question is merely for me, as I always like to write optimized code that can also run on cheap, slow servers (or servers with A LOT of traffic).

I looked around but could not find an answer. I was wondering which of these two examples is faster, keeping in mind that the array's keys are not important in my case (pseudo-code, naturally):

<?php
$a = array();
// $emails: over 100k email addresses, already lowercased (hypothetical source)
foreach ($emails as $new_val) {
    if (!in_array($new_val, $a)) {
        $a[] = $new_val;
        //do other stuff
    }
}
?>

<?php
$a = array();
// $emails: over 100k email addresses, already lowercased (hypothetical source)
foreach ($emails as $new_val) {
    if (!isset($a[$new_val])) {
        $a[$new_val] = true;
        //do other stuff
    }
}
?>

As the point of the question is not array collisions, I would like to add that if you are afraid of colliding inserts for $a[$new_val], you can use $a[md5($new_val)]. It can still cause collisions, but it makes a possible DoS attack harder when reading from a user-provided file (http://nikic.github.com/2011/12/28/Supercolliding-a-PHP-array.html).
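A minimal sketch of the md5-keyed variant, assuming the same deduplication loop as above; the helper name `dedupe_md5` and the sample `$emails` array are hypothetical. Because an md5 digest cannot be turned back into the address, the original string is kept as the array value.

```php
<?php
// Hypothetical helper: deduplicates strings using their md5 digests as keys.
// PHP then hashes the digest rather than the raw, user-controlled string.
// The original string is stored as the value, since the digest is one-way.
function dedupe_md5(array $emails): array
{
    $seen = array();
    foreach ($emails as $new_val) {
        $key = md5($new_val);
        if (!isset($seen[$key])) {
            $seen[$key] = $new_val;
            // do other stuff
        }
    }
    return array_values($seen); // plain 0-based list of unique values
}

// Hypothetical sample input (already lowercased).
$emails = array('a@example.com', 'b@example.com', 'a@example.com');
echo implode(', ', dedupe_md5($emails)), PHP_EOL; // a@example.com, b@example.com
```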

Php Solutions


Solution 1 - Php

The answers so far are spot-on. Using isset in this case is faster because:

  • It uses an O(1) (on average) hash search on the key, whereas in_array must check every value until it finds a match.
  • Being an opcode, it has less overhead than calling the in_array built-in function.

These can be demonstrated by using an array filled with values (10,000 in the test below), which forces in_array to do more searching.

isset:    0.009623 s
in_array: 1.738441 s

This builds on Jason's benchmark by filling in some random values and occasionally finding a value that exists in the array. Everything is random, so expect the times to fluctuate.

<?php
// Fill $a with up to 10,000 random values; each value is also its own key,
// so isset() (key lookup) and in_array() (value scan) search the same set.
$a = array();
for ($i = 0; $i < 10000; ++$i) {
    $v = rand(1, 1000000);
    $a[$v] = $v;
}
echo "Size: ", count($a), PHP_EOL;

// isset(): hash lookup on the key
$start = microtime( true );

for ($i = 0; $i < 10000; ++$i) {
    isset($a[rand(1, 1000000)]);
}

$total_time = microtime( true ) - $start;
echo "Total time isset():    ", number_format($total_time, 6), PHP_EOL;

// in_array(): linear scan over the values
$start = microtime( true );

for ($i = 0; $i < 10000; ++$i) {
    in_array(rand(1, 1000000), $a);
}

$total_time = microtime( true ) - $start;
echo "Total time in_array(): ", number_format($total_time, 6), PHP_EOL;
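Speed aside, the two loops from the question build the same list when keys don't matter. The sketch below checks that; `unique_in_array` and `unique_isset` are hypothetical helper names. It passes `true` as in_array()'s third (`$strict`) argument so values are compared without type juggling, which is closer to how array keys are matched.

```php
<?php
// Hypothetical helpers contrasting the two approaches from the question.

// O(n) per lookup: in_array() scans the values collected so far.
function unique_in_array(array $values): array
{
    $a = array();
    foreach ($values as $new_val) {
        if (!in_array($new_val, $a, true)) { // strict comparison
            $a[] = $new_val;
        }
    }
    return $a;
}

// O(1) per lookup on average: isset() probes the hash table by key.
function unique_isset(array $values): array
{
    $a = array();
    foreach ($values as $new_val) {
        if (!isset($a[$new_val])) {
            $a[$new_val] = true;
        }
    }
    return array_keys($a); // the unique values live in the keys
}

$emails = array('a@example.com', 'b@example.com', 'a@example.com', 'c@example.com');
var_dump(unique_in_array($emails) === unique_isset($emails)); // bool(true)
```

One caveat of the key-based version: strings that look like decimal integers (such as '42') become integer keys, so array_keys() returns them as ints. For email addresses this does not arise.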

Solution 2 - Php

> Which is faster: isset() vs in_array()

isset() is faster.

It should be obvious, but isset() only tests a single value, whereas in_array() iterates over the array, testing the value of each element until it finds a match.

Rough benchmarking is quite easy using microtime().

Results (seconds):
Total time isset():    0.002857
Total time in_array(): 0.017103

Note: Results were similar regardless of whether the key existed or not.

Code:
<?php
// $a is empty, so this mostly measures per-call overhead:
// the isset() opcode versus a call to the in_array() function.
$a = array();

// isset(): key lookup
$start = microtime( true );

for ($i = 0; $i < 10000; ++$i) {
    isset($a['key']);
}

$total_time = microtime( true ) - $start;
echo "Total time isset():    ", number_format($total_time, 6), PHP_EOL;

// in_array(): value scan
$start = microtime( true );

for ($i = 0; $i < 10000; ++$i) {
    in_array('key', $a);
}

$total_time = microtime( true ) - $start;
echo "Total time in_array(): ", number_format($total_time, 6), PHP_EOL;

exit;
Additional Resources

I'd encourage you to also look at:

Solution 3 - Php

Using isset() gives faster lookups because it uses a hash table, avoiding the need for O(n) searches.

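The key is first hashed (string keys with a variant of the djb "times 33" hash function; integer keys are used as their own hash) to determine, in O(1), the bucket of keys that share the same hash slot. That bucket is then searched iteratively until the exact key is found; this step is O(n) only in the number of keys in the bucket, which stays small unless keys collide.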

Barring any intentional hash collisions, this approach yields much better performance than in_array().
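The two steps can be modelled with a toy hash table. This is a simplified sketch, not PHP's internal code: it assumes the classic 32-bit djb2 ("times 33") hash, 8 buckets with chaining, and a 64-bit PHP build (so the intermediate shift does not overflow). The names `djb_hash`, `toy_insert`, `toy_isset` and `TOY_BUCKETS` are hypothetical.

```php
<?php
// Hypothetical, simplified model of a hash lookup; PHP's real table differs.

// djb2 ("times 33") string hash, masked to 32 bits (assumes 64-bit PHP).
function djb_hash(string $key): int
{
    $h = 5381;
    $len = strlen($key);
    for ($i = 0; $i < $len; $i++) {
        $h = (($h << 5) + $h + ord($key[$i])) & 0xFFFFFFFF; // h * 33 + c
    }
    return $h;
}

const TOY_BUCKETS = 8; // power of two, so "& (size - 1)" selects a bucket

function toy_insert(array &$table, string $key): void
{
    $b = djb_hash($key) & (TOY_BUCKETS - 1); // step 1: O(1) bucket choice
    if (!toy_isset($table, $key)) {
        $table[$b][] = $key;
    }
}

function toy_isset(array $table, string $key): bool
{
    $b = djb_hash($key) & (TOY_BUCKETS - 1);
    // Step 2: compare only against the keys that landed in this bucket.
    foreach ($table[$b] ?? array() as $candidate) {
        if ($candidate === $key) {
            return true;
        }
    }
    return false;
}

$table = array();
toy_insert($table, 'a@example.com');
toy_insert($table, 'b@example.com');
var_dump(djb_hash('a'));                      // int(177670)
var_dump(toy_isset($table, 'a@example.com')); // bool(true)
var_dump(toy_isset($table, 'z@example.com')); // bool(false)
```

An attacker who can choose many keys with the same hash pushes them all into one bucket, which turns step 2 back into a linear scan; that is the collision case the previous paragraph sets aside.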

Note that when using isset() in the way that you've shown, passing the final values to another function requires using array_keys() to create a new array. Alternatively, you can trade some memory for convenience by storing the data in both the keys and the values.
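A small sketch of both options from that note, using a hypothetical `$emails` input: with `true` as the value, array_keys() is needed to get a plain list back; with the data stored in both key and value, the array can be handed to code that reads values (foreach, implode) as it is.

```php
<?php
$emails = array('a@example.com', 'b@example.com', 'a@example.com'); // hypothetical input

// Option 1: the value is only a flag, so array_keys() builds a new list.
$flags = array();
foreach ($emails as $new_val) {
    if (!isset($flags[$new_val])) {
        $flags[$new_val] = true;
    }
}
$list1 = array_keys($flags);

// Option 2: store the data in both key and value; no array_keys() call needed.
$both = array();
foreach ($emails as $new_val) {
    if (!isset($both[$new_val])) {
        $both[$new_val] = $new_val;
    }
}

echo implode(', ', $list1), PHP_EOL; // a@example.com, b@example.com
echo implode(', ', $both), PHP_EOL;  // a@example.com, b@example.com
```

If a function insists on a 0-based list, array_values($both) renumbers it.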

Update

To see how your code design decisions affect runtime performance, you can check out the compiled version of your script:

echo isset($arr[123])

compiled vars:  !0 = $arr
line     # *  op                           fetch      ext  return  operands
-----------------------------------------------------------------------------
   1     0  >   ZEND_ISSET_ISEMPTY_DIM_OBJ              2000000  ~0      !0, 123
         1      ECHO                                                 ~0
         2    > RETURN                                               null

echo in_array(123, $arr)

compiled vars:  !0 = $arr
line     # *  op                           fetch      ext  return  operands
-----------------------------------------------------------------------------
   1     0  >   SEND_VAL                                             123
         1      SEND_VAR                                             !0
         2      DO_FCALL                                 2  $0      'in_array'
         3      ECHO                                                 $0
         4    > RETURN                                               null

Not only does in_array() use a relatively inefficient O(n) search, it also needs to be called as a function (DO_FCALL) whereas isset() uses a single opcode (ZEND_ISSET_ISEMPTY_DIM_OBJ) for this.

Solution 4 - Php

The second would be faster, as it looks up only that specific array key and does not need to iterate over the array until the value is found (in_array() looks at every element if the value is not found).
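To make the "every element if it is not found" point concrete, here is a conceptual sketch that counts comparisons in a linear scan. `comparisons_needed` is a hypothetical helper that mirrors in_array()'s default loose (==) comparison; it is not PHP's actual C implementation.

```php
<?php
// Hypothetical helper modelling in_array() as a plain linear scan.
// Returns how many elements were compared before the search ended.
function comparisons_needed($needle, array $haystack): int
{
    $comparisons = 0;
    foreach ($haystack as $value) {
        $comparisons++;
        if ($value == $needle) { // loose, like in_array() without $strict
            return $comparisons; // found: stop early
        }
    }
    return $comparisons;         // not found: every element was checked
}

$a = range(1, 10000);
echo comparisons_needed(1, $a), PHP_EOL;     // 1
echo comparisons_needed(10000, $a), PHP_EOL; // 10000
echo comparisons_needed(-5, $a), PHP_EOL;    // 10000

// A key lookup answers without scanning the array.
$keys = array_flip($a);
var_dump(isset($keys[-5]));    // bool(false)
var_dump(isset($keys[10000])); // bool(true)
```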

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Fabrizio | View Question on Stackoverflow
Solution 1 - Php | David Harkness | View Answer on Stackoverflow
Solution 2 - Php | Jason McCreary | View Answer on Stackoverflow
Solution 3 - Php | Ja͢ck | View Answer on Stackoverflow
Solution 4 - Php | Mike Brant | View Answer on Stackoverflow