PHP: Fastest way to handle undefined array key

PhpPerformance

Php Problem Overview


in a very tight loop I need to access tens of thousands of values in an array containing millions of elements. The key can be undefined: In that case it shall be legal to return NULL without any error message:

Array key exists: return value of element. Array key does not exist: return null.

I do know multiple solutions:

    if (isset($lookup_table[$key])) {
        return $lookup_table[$key];
    } else {
        return;
    }

or

@return $lookup_table[$key];

or

error_reporting(0);
$return = $lookup_table[$key];
error_reporting(E_ALL);
return $return;

All solutions are far from optimal:

  • The first one requires 2 lookup in the B-TREE: One to check existence, another to retrieve value. That effectively doubles the runtime.
  • The second one uses the error suppression operator, and thus creates a massive overhead on that line.
  • The third one calls the error handler (that will check error_reporting setting and then display nothing) and thereby creates an overhead.

My question is if I miss a way to avoid error handling and yet work with a single Btree lookup?

To answer some questions:

The array caches the results of a complex calculation - to complex to be done in real time. Out of billions of possible values, only millions yield a valid result. The array looks like 1234567 => 23457, 1234999 => 74361, .... That is saved to a PHP file of several megabyte, and include_once-d at the beginning of the execution. Initial load time does not matter. If the key is not found, it simply means that this specific value will not return a valid result. The trouble is to get this done 50k+ per second.

Conclusion

As there is no way found to get the value with a single lookup and without error handling, I have trouble accepting a single answer. Instead I upvoted all the great contributions.

The most valuable inputs where:

  • use array_key_exists, as it is faster than alternatives
  • Check out PHP's QuickHash

There was a lot of confusion on how PHP handles arrays. If you check the source code, you will see that all arrays are balanced trees. Building own lookup methods is common in C and C++, but is not performant in higher script-languages like PHP.

Php Solutions


Solution 1 - Php

###Update

Since PHP 7 you can accomplish this with the null coalesce operator:

return $table[$key] ?? null;

###Old answer

First of all, arrays are not implemented as a B-tree, it's a hash table; an array of buckets (indexed via a hash function), each with a linked list of actual values (in case of hash collisions). This means that lookup times depend on how well the hash function has "spread" the values across the buckets, i.e. the number of hash collisions is an important factor.

Technically, this statement is the most correct:

return array_key_exists($key, $table) ? $table[$key] : null;

This introduces a function call and is therefore much slower than the optimized isset(). How much? ~2e3 times slower.

Next up is using a reference to avoid the second lookup:

$tmp = &$lookup_table[$key];

return isset($tmp) ? $tmp : null;

Unfortunately, this modifies the original $lookup_table array if the item does not exist, because references are always made valid by PHP.

That leaves the following method, which is much like your own:

return isset($lookup_table[$key]) ? $lookup_table[$key] : null;

Besides not having the side effect of references, it's also faster in runtime, even when performing the lookup twice.

You could look into dividing your arrays into smaller pieces as one way to mitigate long lookup times.

Solution 2 - Php

I did some bench marking with the following code:

set_time_limit(100);

$count = 2500000;
$search_index_end = $count * 1.5;
$search_index_start = $count * .5;

$array = array();
for ($i = 0; $i < $count; $i++)
	$array[md5($i)] = $i;

$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
	$key = md5($i);
	$test = isset($array[$key]) ? $array[$key] : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
	$key = md5($i);
	$test = array_key_exists($key, $array) ? $array[$key] : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";


$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
	$key = md5($i);
	$test = @$array[$key];
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

$error_reporting = error_reporting();
error_reporting(0);
$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
	$key = md5($i);
	$test = $array[$key];
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";
error_reporting($error_reporting);

$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
	$key = md5($i);
	$tmp = &$array[$key];
	$test = isset($tmp) ? $tmp : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

and I found that the fastest running test was the one that uses isset($array[$key]) ? $array[$key] : null followed closely by the solution that just disables error reporting.

Solution 3 - Php

This Work for me

{{ isset($array['key']) ? $array['key']: 'Default' }} 

but this is fast

{{ $array['key'] or 'Default' }}

Solution 4 - Php

There are two typical approaches to this.

  1. Define defaults for an undefined key.
  2. Check for undefined key.

Here is how to perform the first and as little code as possible.

$data = array_merge(array($key=>false),$data);
return $data[$key];

Here is how to perform the second.

return isset($data[$key]) ? $data[$key] : false;

Solution 5 - Php

Just a sudden idea that would have to be tested, but did you try using array_intersect_key() to get the existing values and a array_merge to fill() the rest ? It would remove the need of a loop to access the data. Something like that :

$searched_keys = array ('key1' => null, 'key2' => null); // the list of the keys to find

$exiting_values = array_intersect_key($lookup_table, $searched_keys);
$all_values = array_merge($searched_keys, $exiting_keys);

Please note that I did not tried it performance-wise.

Solution 6 - Php

The @ operator and the error_reporting methods will both be slower than using isset. With both of these methods, it modifies the error reporting setting for PHP, but PHP's error handler will still be called. The error handler will check against the error_reporting setting and exit without reporting anything, however this has still taken time.

Solution 7 - Php

I prefer using the isset function instead of escaping the error. I made a function to check the key exists and if not returns a default value, in the case of nested arrays you just need to add the other keys in order:

Nested array lookup:

/**
 * Lookup array value.
 *
 * @param array $array
 * @param array $keys
 * @param $defaultValue
 */
public static function array_key_lookup($array, $keys, $defaultValue)
{
    $value = $array;
    foreach ($keys as $key) {
        if (isset($value[$key])) {
            $value = $value[$key];
        } else {
            $value = $defaultValue;
            break;
        }
    }

    return $value;
}

Usage example:

$array = [
    'key1' => 'value1',
    'key2' => 'value2',
    'key3' => [
        'key3a' => 'value3a',
        'key3b' => 'value3b'
    ]
];

array_key_lookup($array, ['key3', 'key3a'], 'default')
'value3a'

array_key_lookup($array, ['key2', 'key2a'], 'default')
'default'

array_key_lookup($array, ['key2'], 'default')
'value2'

array_key_lookup($array, ['key5'], 'default')
'default'

Escaping the error:

$value = @$array[$key1][$key2] ?: $defaultValue;

Solution 8 - Php

First, re-organize the data for performance by saving a new array where the data is sorted by the keys, but the new array contains a regular numeric index.

This part will be time consuming, but only done once.

 // first sort the array by it's keys
 ksort($data);

 // second create a new array with numeric index
 $tmp = new array();
 foreach($data as $key=>$value)
 {
    $tmp[] = array('key'=>$key,'value'=>$value);
 }
 // now save and use this data instead
 save_to_file($tmp);

Once that is done it should be quick to find the key using a Binary Search. Later you can use a function like this.

  function findKey($key, $data, $start, $end)
  { 
    if($end < $start) 
    { 
        return null; 
    } 

    $mid = (int)(($end - $start) / 2) + $start; 

    if($data[$mid]['key'] > $key) 
    { 
        return findKey($key, $data, $start, $mid - 1); 
    } 
    else if($data[$mid]['key'] < $key) 
    { 
        return findKey($key, $data, $mid + 1, $end); 
    } 

    return $data[$mid]['value'];
 }

To perform a search for a key you would do this.

 $result = findKey($key, $data, 0, count($data));
 if($result === null)
 {
      // key not found.
 }

If the count($data) is done all the time, then you could cache that in the file that you stored the array data.

I suspect this method will be a lot faster in performance then a regular linear search that is repeated against the $data. I can't promise it's faster. Only an octree would be quicker, but the time to build the octree might cancel out the search performance (I've experienced that before). It depends on how much searching in the data you have to do.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionZsolt SzilagyiView Question on Stackoverflow
Solution 1 - PhpJa͢ckView Answer on Stackoverflow
Solution 2 - PhpDiverseAndRemote.comView Answer on Stackoverflow
Solution 3 - PhpAntonio Reyes MontufarView Answer on Stackoverflow
Solution 4 - PhpReactgularView Answer on Stackoverflow
Solution 5 - PhpAraliciaView Answer on Stackoverflow
Solution 6 - PhpthomasrutterView Answer on Stackoverflow
Solution 7 - PhpIvan BabicView Answer on Stackoverflow
Solution 8 - PhpReactgularView Answer on Stackoverflow