Tetris-ing an array

Tags: Php, String, Algorithm

Php Problem Overview


Consider the following array:

/www/htdocs/1/sites/lib/abcdedd
/www/htdocs/1/sites/conf/xyz
/www/htdocs/1/sites/conf/abc/def
/www/htdocs/1/sites/htdocs/xyz
/www/htdocs/1/sites/lib2/abcdedd

What is the shortest and most elegant way to detect the common base path, which in this case is

/www/htdocs/1/sites/

and remove it from all elements in the array, so that they become:

lib/abcdedd
conf/xyz
conf/abc/def
htdocs/xyz
lib2/abcdedd

Php Solutions


Solution 1 - Php

Write a function longest_common_prefix that takes two strings as input. Then apply it to the strings in any order to reduce them to their common prefix. Since the operation is associative and commutative, the order doesn't matter for the result.

This is the same as for other binary operations such as addition or the greatest common divisor.
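A minimal sketch of this idea, assuming absolute paths like the ones in the question. The helper names longest_common_prefix() and common_dir() are illustrative, not from the answer. The pairwise prefix is fed to array_reduce(), and because it works on characters, common_dir() trims the result back to the last / so that only whole directories are removed.

```php
<?php
// Longest common prefix of two strings, compared character by character.
function longest_common_prefix(string $a, string $b): string
{
    $n = min(strlen($a), strlen($b));
    $i = 0;
    while ($i < $n && $a[$i] === $b[$i]) {
        $i++;
    }
    return substr($a, 0, $i);
}

// Fold the binary operation over the whole array; the order does not matter.
function common_dir(array $paths): string
{
    if (!$paths) {
        return '';
    }
    $prefix = array_reduce($paths, 'longest_common_prefix', reset($paths));
    // Keep everything up to and including the last "/" (assumption: whole directories only).
    $slash = strrpos($prefix, '/');
    return $slash === false ? '' : substr($prefix, 0, $slash + 1);
}

$paths = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
);
$base = common_dir($paths);
$stripped = array_map(function ($p) use ($base) {
    return substr($p, strlen($base));
}, $paths);
print_r($stripped);
```

Trimming to the last / over-cuts when one path is a directory prefix of another: for /foo/bar and /foo/bar/baz this sketch returns /foo/.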

Solution 2 - Php

Load the paths into a trie data structure. Starting from the root node, walk down until you reach the first node that has more than one child. Once you find that node, discard the structure above it and treat that node as the new root.
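A sketch of the trie idea, assuming the trie is keyed by path segments (split on /) rather than by single characters. build_trie() and common_base() are hypothetical helper names, and array_key_first() needs PHP 7.3 or later. Instead of re-rooting the trie, this version collects the segments on the way down and strips them from the original strings. Each node also records whether a path ends there, so the walk stops at a node that is itself a complete path.

```php
<?php
// A node is array('children' => array(segment => node), 'end' => bool).
function build_trie(array $paths): array
{
    $root = array('children' => array(), 'end' => false);
    foreach ($paths as $path) {
        $node = &$root;
        foreach (explode('/', trim($path, '/')) as $segment) {
            if (!isset($node['children'][$segment])) {
                $node['children'][$segment] = array('children' => array(), 'end' => false);
            }
            $node = &$node['children'][$segment]; // descend one level
        }
        $node['end'] = true; // a complete path stops at this node
        unset($node);        // break the reference before the next path
    }
    return $root;
}

// Walk down from the root while there is exactly one way to go.
function common_base(array $root): string
{
    $segments = array();
    $node = $root;
    while (count($node['children']) === 1 && !$node['end']) {
        $segment = array_key_first($node['children']);
        $segments[] = (string) $segment; // numeric segments such as "1" become int keys
        $node = $node['children'][$segment];
    }
    // Assumes absolute paths, as in the question.
    return $segments ? '/' . implode('/', $segments) . '/' : '/';
}

$paths = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
);
$base = common_base(build_trie($paths));
$stripped = array_map(function ($p) use ($base) {
    return substr($p, strlen($base));
}, $paths);
print_r($stripped);
```

Because the trie works on whole segments, /foo/bar and /foo/bar/baz give the base /foo/bar/ rather than cutting into a directory name.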

Solution 3 - Php

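$common = PHP_INT_MAX;
foreach ($a as $item) {
        $common = min($common, str_common($a[0], $item, $common));
}

$result = array();
foreach ($a as $item) {
        $result[] = substr($item, $common);
}
print_r($result);

// Returns the length of the longest common prefix of $a and $b that ends
// with "/", looking at no more than $max characters.
function str_common($a, $b, $max)
{
        $pos = 0;
        $last_slash = 0;
        $len = min(strlen($a), strlen($b), $max);
        while ($pos < $len) {
                // $a[$pos] replaces the $a{$pos} offset syntax, which PHP 8 removed
                if ($a[$pos] !== $b[$pos]) return $last_slash;
                if ($a[$pos] === '/') $last_slash = $pos + 1; // keep the slash in the prefix
                $pos++;
        }
        return $last_slash;
}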

Solution 4 - Php

You can use XOR to find the common part of the strings: XORing two identical bytes always gives a null byte. So we can use that to our advantage:

$first = $array[0];
$length = strlen($first);
$count = count($array);
for ($i = 1; $i < $count; $i++) {
    $length = min($length, strspn($array[$i] ^ $first, chr(0)));
}

After that single loop, $length holds the length of the longest common prefix of all the strings in the array. Then we can extract the common part from the first element:

$common = substr($array[0], 0, $length);

And there you have it. As a function:

function commonPrefix(array $strings) {
    $first = $strings[0];
    $length = strlen($first);
    $count = count($strings);
    for ($i = 1; $i < $count; $i++) {
        $length = min($length, strspn($strings[$i] ^ $first, chr(0)));
    }
    return substr($first, 0, $length);
}

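Note that this still iterates over every character, but those iterations happen inside PHP's built-in ^ operator and strspn(), which are implemented in C, so in an interpreted language this should be considerably faster than a character-by-character PHP loop.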

If you only want full paths, truncate the prefix at its last / character:

$prefix = preg_replace('#/[^/]*$#', '', commonPrefix($paths));

This can cut too much, though: for /foo/bar and /foo/bar/baz the common prefix /foo/bar gets cut down to /foo. Short of adding another pass that checks whether the next character in each string is / or the end of the string, I can't see a way around that.
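The extra pass mentioned above can be sketched as follows. commonPathPrefix() is a hypothetical name; it repeats the XOR trick from commonPrefix() above and then checks, for every string, whether the character right after the prefix is / or the end of the string. Only if some string continues with another character is the partial last segment dropped. The result is returned without a trailing /.

```php
<?php
function commonPathPrefix(array $strings): string
{
    $first = (string) reset($strings);
    $length = strlen($first);
    foreach ($strings as $s) {
        // Identical bytes XOR to "\0", so strspn() counts the matching leading bytes.
        $length = min($length, strspn($s ^ $first, "\0"));
    }
    $prefix = substr($first, 0, $length);

    // The prefix ends on a directory boundary only if, in every string,
    // the next character is "/" or the string ends there.
    foreach ($strings as $s) {
        if (strlen($s) > $length && $s[$length] !== '/') {
            // Drop the partial last segment, e.g. "lib" in /x/lib vs /x/lib2.
            $slash = strrpos($prefix, '/');
            return $slash === false ? '' : substr($prefix, 0, $slash);
        }
    }
    return rtrim($prefix, '/');
}
```

With the question's paths this returns /www/htdocs/1/sites, and for /foo/bar and /foo/bar/baz it keeps /foo/bar.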

Solution 5 - Php

A naive approach would be to explode the paths at the / and successively compare the elements of all the resulting arrays. For example, the first element is empty in all arrays, so it is removed; the next element is www, which is the same in all arrays, so it is removed too; and so on.

Something like (untested)

$exploded_paths = array();

foreach($paths as $path) {
    $exploded_paths[] = explode('/', $path);
}

$equal = true;
$ref = &$exploded_paths[0]; // compare against the first path for simplicity

while($equal) {
    foreach($exploded_paths as $path_parts) {
        // stop at the first differing element, or when a path runs out of elements
        if(!isset($path_parts[0], $ref[0]) || $path_parts[0] !== $ref[0]) {
            $equal = false;
            break;
        }
    }
    if($equal) {
        foreach($exploded_paths as &$parts) {
            array_shift($parts); // remove the first element
        }
        unset($parts); // break the reference, otherwise a later by-value loop overwrites the last path
    }
}

Afterwards you just have to implode the elements in $exploded_paths again:

function impl($arr) {
    return '/' . implode('/', $arr);
}
$paths = array_map('impl', $exploded_paths);

Which gives me:

Array
(
    [0] => /lib/abcdedd
    [1] => /conf/xyz
    [2] => /conf/abc/def
    [3] => /htdocs/xyz
    [4] => /lib2/abcdedd
)

This might not scale well ;)

Solution 6 - Php

Ok, I'm not sure this is bullet-proof, but I think it works:

$result = array_reduce($array, function($reducedValue, $arrayValue) {
    if($reducedValue === NULL) return $arrayValue;
    for($i = 0; $i < strlen($reducedValue); $i++) {
        if(!isset($arrayValue[$i]) || $arrayValue[$i] !== $reducedValue[$i]) {
            return substr($reducedValue, 0, $i);
        }
    }
    return $reducedValue;
});
echo $result;

This takes the first value in the array as the reference string. It then iterates over the reference string and compares each character with the character at the same position in the next string. If a character doesn't match, the reference string is shortened to that position and the next string is compared. In the end, the function returns the longest prefix shared by all strings.

Performance depends on the strings given: the sooner the reference string gets shorter, the quicker the code finishes. I really have no clue how to put that in a formula, though.

I found that Artefacto's approach of sorting the strings increases performance. Adding

$sorted = $array; // copy, since $array is still needed for the removal step below
asort($sorted);
$sorted = array(array_shift($sorted), array_pop($sorted));

before the array_reduce, and reducing $sorted instead of $array, will significantly increase performance.

Also note that this returns the longest matching initial substring, which is more versatile but won't give you the common path. You have to run

$path = substr($result, 0, strrpos($result, '/'));

on the result. Then you can use $path to remove the common part from the values:

print_r(array_map(function($v) use ($path) {
    return substr($v, strlen($path)); // remove only the leading prefix
}, $array));

which should give:

[0] => /lib/abcdedd
[1] => /conf/xyz
[2] => /conf/abc/def
[3] => /htdocs/xyz
[4] => /lib2/abcdedd

Feedback welcome.

Solution 7 - Php

You can strip the prefix by walking the first string only once and comparing each of its characters with the same position in the other strings:

function findLongestWord($lines, $delim = "/")
{
    $max = 0;
    $len = strlen($lines[0]);
    $count = count($lines);

    // read first string once
    for($i = 0; $i < $len; $i++) {
        for($n = 1; $n < $count; $n++) {
            if(!isset($lines[$n][$i]) || $lines[0][$i] !== $lines[$n][$i]) {
                // we've found a difference (or a shorter line) in the current token:
                // stop search
                return $max;
            }
        }
        if($lines[0][$i] == $delim) {
            // we've found a complete token:
            $max = $i + 1;
        }
    }
    return $max;
}

$max = findLongestWord($lines);
// cut prefix of length $max
for($n = 0; $n < count($lines); $n++) {
    $lines[$n] = substr($lines[$n], $max);
}

Solution 8 - Php

$values = array('/www/htdocs/1/sites/lib/abcdedd',
				'/www/htdocs/1/sites/conf/xyz',
				'/www/htdocs/1/sites/conf/abc/def',
				'/www/htdocs/1/sites/htdocs/xyz',
				'/www/htdocs/1/sites/lib2/abcdedd'
);


function splitArrayValues($r) {
	return explode('/',$r);
}

function stripCommon($values) {
	$testValues = array_map('splitArrayValues',$values);

	$i = 0;
	foreach($testValues[0] as $key => $value) {
		foreach($testValues as $arraySetValues) {
			if (!isset($arraySetValues[$key]) || $arraySetValues[$key] !== $value) break 2;
		}
		$i++;
	}

	$returnArray = array();
	foreach($testValues as $value) {
		$returnArray[] = implode('/',array_slice($value,$i));
	}

	return $returnArray;
}


$newValues = stripCommon($values);

echo '<pre>';
var_dump($newValues);
echo '</pre>';

EDIT: A variant of my original method that uses array_walk to rebuild the array:

$values = array('/www/htdocs/1/sites/lib/abcdedd',
				'/www/htdocs/1/sites/conf/xyz',
				'/www/htdocs/1/sites/conf/abc/def',
				'/www/htdocs/1/sites/htdocs/xyz',
				'/www/htdocs/1/sites/lib2/abcdedd'
);


function splitArrayValues($r) {
	return explode('/',$r);
}

function rejoinArrayValues(&$r,$d,$i) {
	$r = implode('/',array_slice($r,$i));
}

function stripCommon($values) {
	$testValues = array_map('splitArrayValues',$values);

	$i = 0;
	foreach($testValues[0] as $key => $value) {
		foreach($testValues as $arraySetValues) {
			if (!isset($arraySetValues[$key]) || $arraySetValues[$key] !== $value) break 2;
		}
		$i++;
	}

	array_walk($testValues, 'rejoinArrayValues', $i);

	return $testValues;
}


$newValues = stripCommon($values);

echo '<pre>';
var_dump($newValues);
echo '</pre>';

EDIT

The most efficient and elegant answer is likely to combine functions and methods from several of the provided answers.

Solution 9 - Php

This has the disadvantage of not having linear time complexity; however, in most cases the sort will definitely not be the most time-consuming operation.

Basically, the clever part (at least I couldn't find a fault with it) here is that after sorting you will only have to compare the first path with the last.

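sort($a, SORT_STRING); // sorts $a in place, so $res comes out in sorted order
$a = array_map(function ($el) { return explode("/", $el); }, $a);
$first = reset($a);
$last = end($a);
// isset() is needed: if first and last are identical, the loop would otherwise never end
for ($eqdepth = 0; isset($first[$eqdepth], $last[$eqdepth]) && $first[$eqdepth] === $last[$eqdepth]; $eqdepth++) {}
array_walk($a,
    function (&$el) use ($eqdepth) {
        for ($i = 0; $i < $eqdepth; $i++) {
            array_shift($el);
        }
     });
$res = array_map(function ($el) { return implode("/", $el); }, $a);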
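Two details of the code above are worth a sketch. First, sort($a) reorders $a, so the result no longer lines up with the input. Second, with a plain string sort the first/last comparison can overshoot when a segment contains a character that sorts before / (such as - or .): for /a/b, /a/b-c/x and /a/b/x, sort() places /a/b first and /a/b/x last. Those two share /a/b, but the middle path does not. The sketch below swaps in a segment-by-segment comparison for usort() and sorts a copy, so the first and last entries really do bound the whole list. compare_segments() and strip_common_dir() are hypothetical names.

```php
<?php
// Compare two exploded paths segment by segment; on a tie, the shorter one sorts first.
function compare_segments(array $x, array $y): int
{
    $n = min(count($x), count($y));
    for ($i = 0; $i < $n; $i++) {
        $c = strcmp($x[$i], $y[$i]);
        if ($c !== 0) {
            return $c;
        }
    }
    return count($x) <=> count($y);
}

// Strip the common leading segments while keeping the input order.
function strip_common_dir(array $paths): array
{
    if (!$paths) {
        return array();
    }
    $split = array_map(function ($p) { return explode('/', $p); }, $paths);
    $sorted = $split; // sort a copy; $split keeps the original order
    usort($sorted, 'compare_segments');
    $first = reset($sorted);
    $last = end($sorted);
    // After a segment-wise sort, segments shared by first and last are shared by all.
    $depth = 0;
    while (isset($first[$depth], $last[$depth]) && $first[$depth] === $last[$depth]) {
        $depth++;
    }
    return array_map(function ($parts) use ($depth) {
        return implode('/', array_slice($parts, $depth));
    }, $split);
}

print_r(strip_common_dir(array('/a/b', '/a/b-c/x', '/a/b/x')));
```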

Solution 10 - Php

I would explode the values based on the / and then use array_intersect_assoc to detect the common elements and ensure they have the correct corresponding index in the array. The resulting array could be recombined to produce the common path.

function getCommonPath($pathArray)
{
    $pathElements = array();

    foreach($pathArray as $path)
    {
        $pathElements[] = explode("/",$path);
    }

    $commonPath = $pathElements[0];

    for($i=1;$i<count($pathElements);$i++)
    {
        $commonPath = array_intersect_assoc($commonPath,$pathElements[$i]);
    }

    // array_intersect_assoc also keeps segments that match again after a
    // mismatch (e.g. "c" in /a/x/c and /a/y/c), so keep only the unbroken
    // run of common segments starting at index 0
    $prefix = array();
    foreach($commonPath as $key => $segment)
    {
        if($key !== count($prefix)) break;
        $prefix[] = $segment;
    }

    return implode("/",$prefix);
}

function removeCommonPath($pathArray)
{
    $commonPath = getCommonPath($pathArray);

    for($i=0;$i<count($pathArray);$i++)
    {
        $pathArray[$i] = substr($pathArray[$i],strlen($commonPath));
    }

    return $pathArray;
}

This is untested, but the idea is that the $commonPath array only ever contains the path elements that appear in every path array compared against it. When the loop is complete, we simply recombine it with / to get the true $commonPath.

Update: As pointed out by Felix Kling, array_intersect would treat elements as common even when they appear at different positions in the paths. To solve this, I used array_intersect_assoc instead of array_intersect.

Update: Added code to remove the common path (or tetris it!) from the array as well.

Solution 11 - Php

The problem can be simplified if just viewed from the string comparison angle. This is probably faster than array-splitting:

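$longest = $tetris[0];  # or array_pop()
foreach ($tetris as $cmp) {
        // "." concatenates strings in PHP; "+" would attempt numeric addition
        while ($longest !== "" && strncmp($longest . "/", $cmp, strlen($longest) + 1) !== 0) {
                // drop the last segment; (int) turns a missing "/" (false) into 0
                $longest = substr($longest, 0, (int) strrpos($longest, "/"));
        }
}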
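The loop leaves the common directory, without a trailing slash, in $longest. A runnable sketch that wraps the loop in a hypothetical helper common_dir_by_strncmp() and adds the removal step the question asks for. The empty-string guard is an addition so that paths with nothing in common cannot loop forever.

```php
<?php
function common_dir_by_strncmp(array $tetris): string
{
    $longest = (string) reset($tetris);
    foreach ($tetris as $cmp) {
        // Shorten $longest one directory at a time until "$longest/" is a prefix of $cmp.
        while ($longest !== '' && strncmp($longest . '/', $cmp, strlen($longest) + 1) !== 0) {
            $longest = substr($longest, 0, (int) strrpos($longest, '/'));
        }
    }
    return $longest;
}

$tetris = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
);
$longest = common_dir_by_strncmp($tetris);
// Skip the common directory plus the "/" that follows it.
$stripped = array_map(function ($p) use ($longest) {
    return substr($p, strlen($longest) + 1);
}, $tetris);
print_r($stripped);
```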

Solution 12 - Php

Perhaps porting the algorithm Python's os.path.commonprefix(m) uses would work?

def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    s1 = min(m)
    s2 = max(m)
    n = min(len(s1), len(s2))
    for i in range(n):
        if s1[i] != s2[i]:
            return s1[:i]
    return s1[:n]

That is, uh... something like

function commonprefix($m) {
  if(!$m) return "";
  $s1 = min($m);
  $s2 = max($m);
  $n = min(strlen($s1), strlen($s2));
  for($i=0;$i<$n;$i++) if($s1[$i] != $s2[$i]) return substr($s1, 0, $i);
  return substr($s1, 0, $n);
}

After that you can just substr each element of the original list with the length of the common prefix as the start offset.
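A sketch of that last step, reusing the PHP port above (repeated here so the block runs on its own). It assumes the paths are not numeric-looking strings, since min() and max() follow PHP's comparison rules. The prefix is character-based, so /x/lib/a and /x/lib2/a would give /x/lib; trim it back to the last / first if only whole directories should be removed.

```php
<?php
// Port of commonprefix() from above: the smallest and largest strings bound the list.
function commonprefix(array $m): string
{
    if (!$m) return "";
    $s1 = min($m);
    $s2 = max($m);
    $n = min(strlen($s1), strlen($s2));
    for ($i = 0; $i < $n; $i++) {
        if ($s1[$i] !== $s2[$i]) return substr($s1, 0, $i);
    }
    return substr($s1, 0, $n);
}

$paths = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
);
$prefix = commonprefix($paths);
// Use the prefix length as the start offset for every element.
$stripped = array_map(function ($p) use ($prefix) {
    return substr($p, strlen($prefix));
}, $paths);
print_r($stripped);
```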

Solution 13 - Php

I'll throw my hat in the ring …

function longestCommonPrefix($a, $b) {
    $i = 0;
    $end = min(strlen($a), strlen($b));
    while ($i < $end && $a[$i] == $b[$i]) $i++;
    return substr($a, 0, $i);
}

function longestCommonPrefixFromArray(array $strings) {
    $count = count($strings);
    if (!$count) return '';
    $prefix = reset($strings);
    for ($i = 1; $i < $count; $i++)
        $prefix = longestCommonPrefix($prefix, $strings[$i]);
    return $prefix;
}

function stripPrefix(&$string, $foo, $length) {
    $string = substr($string, $length);
}

Usage:

$paths = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def',
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd',
);

$longComPref = longestCommonPrefixFromArray($paths);
array_walk($paths, 'stripPrefix', strlen($longComPref));
print_r($paths);

Solution 14 - Php

Well, there are already some solutions here but, just because it was fun:

$values = array(
    '/www/htdocs/1/sites/lib/abcdedd',
    '/www/htdocs/1/sites/conf/xyz',
    '/www/htdocs/1/sites/conf/abc/def', 
    '/www/htdocs/1/sites/htdocs/xyz',
    '/www/htdocs/1/sites/lib2/abcdedd' 
);

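function findCommon($values){
    $common = null;
    foreach($values as $p){
        $p = explode('/', $p);
        if($common === null){ // an empty intersection must not restart the search
            $common = $p;
        } else {
            $common = array_intersect_assoc($common, $p);
        }
    }
    // array_intersect_assoc also keeps segments that match again after a
    // mismatch, so keep only the unbroken run starting at index 0
    $prefix = array();
    foreach((array) $common as $key => $segment){
        if($key !== count($prefix)) break;
        $prefix[] = $segment;
    }
    return $prefix;
}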
function removeCommon($values, $common){
    foreach($values as &$p){
        $p = explode('/', $p);
        $p = array_diff_assoc($p, $common);
        $p = implode('/', $p);
    }
    
    return $values;
}

echo '<pre>';
print_r(removeCommon($values, findCommon($values)));
echo '</pre>';

Output:

Array
(
    [0] => lib/abcdedd
    [1] => conf/xyz
    [2] => conf/abc/def
    [3] => htdocs/xyz
    [4] => lib2/abcdedd
)

Solution 15 - Php

$arrMain = array(
            '/www/htdocs/1/sites/lib/abcdedd',
            '/www/htdocs/1/sites/conf/xyz',
            '/www/htdocs/1/sites/conf/abc/def',
            '/www/htdocs/1/sites/htdocs/xyz',
            '/www/htdocs/1/sites/lib2/abcdedd'
);
function explodePath( $strPath ){ 
    return explode("/", $strPath);
}

function removePath( $strPath)
{
    global $strCommon;
    return str_replace( $strCommon, '', $strPath );
}
$arrExplodedPaths = array_map( 'explodePath', $arrMain ) ;

// Collect the common segments, skipping index 0 (the empty string before the leading /)
$strCommon = '';
for( $i=1; $i< count( $arrExplodedPaths[0] ); $i++)
{
    for( $j = 0; $j < count( $arrExplodedPaths); $j++ )
    {
        if( !isset( $arrExplodedPaths[ $j ][ $i ] ) || $arrExplodedPaths[0][ $i ] !== $arrExplodedPaths[ $j ][ $i ] )
        {
            break 2;
        } 
    }
    $strCommon .= '/'.$arrExplodedPaths[0][$i];
}
print_r( array_map( 'removePath', $arrMain ) );

This works fine. It is similar to Mark Baker's answer but uses str_replace.

Solution 16 - Php

This is probably too naive, but it works. I used a longest common substring algorithm:

<?php

function strlcs($str1, $str2){
	$str1Len = strlen($str1);
	$str2Len = strlen($str2);
	$ret = array();
 
	if($str1Len == 0 || $str2Len == 0)
		return $ret; //no similarities
 
	$CSL = array(); //Common Sequence Length array
	$intLargestSize = 0;
 
	//initialize the CSL array to assume there are no similarities
	for($i=0; $i<$str1Len; $i++){
		$CSL[$i] = array();
		for($j=0; $j<$str2Len; $j++){
			$CSL[$i][$j] = 0;
		}
	}
 
	for($i=0; $i<$str1Len; $i++){
		for($j=0; $j<$str2Len; $j++){
			//check every combination of characters
			if( $str1[$i] == $str2[$j] ){
				//these are the same in both strings
				if($i == 0 || $j == 0)
					//it's the first character, so it's clearly only 1 character long
					$CSL[$i][$j] = 1; 
				else
					//it's one character longer than the string from the previous character
					$CSL[$i][$j] = $CSL[$i-1][$j-1] + 1; 
 
				if( $CSL[$i][$j] > $intLargestSize ){
					//remember this as the largest
					$intLargestSize = $CSL[$i][$j]; 
					//wipe any previous results
					$ret = array();
					//and then fall through to remember this new value
				}
				if( $CSL[$i][$j] == $intLargestSize )
					//remember the largest string(s)
					$ret[] = substr($str1, $i-$intLargestSize+1, $intLargestSize);
			}
			//else, $CSL should be set to 0, which it was already initialized to
		}
	}
	//return the list of matches
	return $ret;
}


$arr = array(
'/www/htdocs/1/sites/lib/abcdedd',
'/www/htdocs/1/sites/conf/xyz',
'/www/htdocs/1/sites/conf/abc/def',
'/www/htdocs/1/sites/htdocs/xyz',
'/www/htdocs/1/sites/lib2/abcdedd'
);

// find the longest common substring (only the first two paths are compared)
$longestCommonSubstring = strlcs( $arr[0], $arr[1] );

// remove the common substring
foreach ($arr as $k => $v) {
	$arr[$k] = str_replace($longestCommonSubstring[0], '', $v);
}
var_dump($arr);

Output:

array(5) {
  [0]=>
  string(11) "lib/abcdedd"
  [1]=>
  string(8) "conf/xyz"
  [2]=>
  string(12) "conf/abc/def"
  [3]=>
  string(10) "htdocs/xyz"
  [4]=>
  string(12) "lib2/abcdedd"
}

:)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Pekka | View Question on Stackoverflow
Solution 1 - Php | starblue | View Answer on Stackoverflow
Solution 2 - Php | bragboy | View Answer on Stackoverflow
Solution 3 - Php | Sjoerd | View Answer on Stackoverflow
Solution 4 - Php | ircmaxell | View Answer on Stackoverflow
Solution 5 - Php | Felix Kling | View Answer on Stackoverflow
Solution 6 - Php | Gordon | View Answer on Stackoverflow
Solution 7 - Php | Doomsday | View Answer on Stackoverflow
Solution 8 - Php | Mark Baker | View Answer on Stackoverflow
Solution 9 - Php | Artefacto | View Answer on Stackoverflow
Solution 10 - Php | Brendan Bullen | View Answer on Stackoverflow
Solution 11 - Php | mario | View Answer on Stackoverflow
Solution 12 - Php | AKX | View Answer on Stackoverflow
Solution 13 - Php | rik | View Answer on Stackoverflow
Solution 14 - Php | acm | View Answer on Stackoverflow
Solution 15 - Php | KoolKabin | View Answer on Stackoverflow
Solution 16 - Php | Richard Knop | View Answer on Stackoverflow