PHP explode the string, but treat words in quotes as a single word

Tags: Php, Quotes, Explode, Str Replace

Php Problem Overview


How can I explode the following string:

Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor

into

array("Lorem", "ipsum", "dolor sit amet", "consectetur", "adipiscing elit", "dolor")

so that the text in quotation marks is treated as a single word.

Here's what I have for now:

$mytext = "Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor";
$noquotes = str_replace("%22", "", $mytext);
$newarray = explode(" ", $noquotes);

but this code puts every word into its own array element. How do I make the words inside quotation marks be treated as one word?

Php Solutions


Solution 1 - Php

You could use preg_match_all(...):

$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing \\"elit" dolor';
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);
print_r($matches);

which will produce:

Array
(
    [0] => Array
        (
            [0] => Lorem
            [1] => ipsum
            [2] => "dolor sit amet"
            [3] => consectetur
            [4] => "adipiscing \"elit"
            [5] => dolor
        )

)

And as you can see, it also accounts for escaped quotes inside quoted strings.
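
The matches above keep their surrounding quotes, while the question asks for the bare phrases. Here is a minimal sketch of one way to post-process them. The helper name split_quoted is hypothetical and not part of the original answer, and it assumes that stripslashes() is an acceptable way to undo the backslash escapes:

```php
<?php
// Hypothetical helper (not from the original answer): tokenize with the same
// pattern, then strip the enclosing quotes and undo backslash escapes.
function split_quoted(string $text): array
{
    preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);

    return array_map(function (string $token): string {
        // Tokens that start and end with a double quote are treated as quoted phrases.
        if (strlen($token) >= 2 && $token[0] === '"' && substr($token, -1) === '"') {
            // Drop the outer quotes, then remove one level of backslash escaping.
            return stripslashes(substr($token, 1, -1));
        }
        return $token;
    }, $matches[0]);
}

$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing \\"elit" dolor';
$words = split_quoted($text);
print_r($words);
```

With the sample string, the quoted phrases should come back without their quotes, and the escaped quote inside the second phrase becomes a plain double quote.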

EDIT

A short explanation:

"           # match the character '"'
(?:         # start non-capture group 1 
  \\        #   match the character '\'
  .         #   match any character except line breaks
  |         #   OR
  [^\\"]    #   match any character except '\' and '"'
)*          # end non-capture group 1 and repeat it zero or more times
"           # match the character '"'
|           # OR
\S+         # match a non-whitespace character: [^\s] and repeat it one or more times

To match %22 instead of double quotes, you'd use:

preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $text, $matches);
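
As a rough sketch, the %22 pattern can be applied to the string from the question like this. Removing the %22 markers afterwards with str_replace() is an assumption about the desired output, not something the original answer prescribes:

```php
<?php
// The string from the question, with %22 standing in for double quotes.
$mytext = 'Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor';

// Either a %22...%22 run (kept together) or a run of non-whitespace characters.
preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $mytext, $matches);

// Remove the %22 markers from each token so only the words remain.
$words = array_map(function (string $token): string {
    return str_replace('%22', '', $token);
}, $matches[0]);

print_r($words);
```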

Solution 2 - Php

This would have been much easier with str_getcsv().

$test = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';
var_dump(str_getcsv($test, ' '));

Gives you

array(6) {
  [0]=>
  string(5) "Lorem"
  [1]=>
  string(5) "ipsum"
  [2]=>
  string(14) "dolor sit amet"
  [3]=>
  string(11) "consectetur"
  [4]=>
  string(15) "adipiscing elit"
  [5]=>
  string(5) "dolor"
}
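
One thing to watch for with this approach is repeated spaces, since each extra delimiter produces an empty field. A small sketch of filtering those out follows. The doubled space in the input is made up for illustration, and the delimiter, enclosure and escape arguments are spelled out rather than left to their defaults:

```php
<?php
// Input with a doubled space between "ipsum" and the quoted phrase.
$test = 'Lorem ipsum  "dolor sit amet" consectetur';

// Delimiter, enclosure and escape character are passed explicitly.
$fields = str_getcsv($test, ' ', '"', '\\');

// Drop empty fields produced by consecutive delimiters, then reindex.
$words = array_values(array_filter($fields, function ($field) {
    return $field !== null && $field !== '';
}));

print_r($words);
```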

Solution 3 - Php

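You can also try this multiple-delimiter explode function. Note that it splits on several delimiters at once but does not keep quoted phrases together: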

function multiexplode($delimiters, $string)
{
    // replace every delimiter with the first one, then split on that one
    $ready = str_replace($delimiters, $delimiters[0], $string);
    $launch = explode($delimiters[0], $ready);
    return $launch;
}

$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$exploded = multiexplode(array(",",".","|",":"),$text);

print_r($exploded);
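
To make the behaviour concrete, here is a small sketch that runs the function on the sample text and on a quoted string. The function is repeated behind a function_exists() guard only so the snippet runs on its own, and the trim() step is an addition for readability:

```php
<?php
// Same function as above, repeated so that this snippet runs on its own.
if (!function_exists('multiexplode')) {
    function multiexplode($delimiters, $string)
    {
        $ready = str_replace($delimiters, $delimiters[0], $string);
        return explode($delimiters[0], $ready);
    }
}

$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";

// trim() is added here only to remove the spaces left around each piece.
$pieces = array_map('trim', multiexplode(array(",", ".", "|", ":"), $text));
print_r($pieces);

// Splitting on spaces breaks a quoted phrase apart, quotes included.
$quoted = multiexplode(array(" "), 'Lorem "dolor sit"');
print_r($quoted);
```

The second call shows why this function alone does not answer the question: the quoted phrase is split at its inner space.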

Solution 4 - Php

I came here with a complex string splitting problem similar to this, but none of the answers here did exactly what I wanted - so I wrote my own.

I am posting it here just in case it is helpful to someone else.

This is probably a very slow and inefficient way to do it - but it works for me.

function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
	$chars = str_split($str);
	$parts = [];
	$nextpart = "";
	$toggle_states = array_fill_keys($togglers, false); // true = now inside, false = now outside
	$depth = 0;
	foreach($chars as $char)
	{
		if(in_array($char, $openers))
			$depth++;
		elseif(in_array($char, $closers))
			$depth--;
		elseif(in_array($char, $togglers))
		{
			if($toggle_states[$char])
				$depth--; // we are inside a toggle block, leave it and decrease the depth
			else
				// we are outside a toggle block, enter it and increase the depth
				$depth++;
			
			// invert the toggle block state
			$toggle_states[$char] = !$toggle_states[$char];
		}
		else
			$nextpart .= $char;
		
		if($depth < 0) $depth = 0;
		
		if(in_array($char, $delimiters) &&
		   $depth == 0 &&
		   !in_array($char, $closers))
		{
			$parts[] = substr($nextpart, 0, -1);
			$nextpart = "";
		}
	}
	if(strlen($nextpart) > 0)
		$parts[] = $nextpart;
	
	return $parts;
}

Usage is as follows. explode_adv takes 5 arguments:

  1. An array of characters that open a block - e.g. [, (, etc.
  2. An array of characters that close a block - e.g. ], ), etc.
  3. An array of characters that toggle a block - e.g. ", ', etc.
  4. An array of characters that should cause a split into the next part.
  5. The string to work on.
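
For example, the calls below sketch how the five arguments might be filled in for the string from the question and for a bracketed list. The function is restated in condensed form behind a function_exists() guard only so the snippet runs on its own, and the second input string is made up for illustration:

```php
<?php
// Condensed copy of explode_adv from above, so this snippet is self-contained.
if (!function_exists('explode_adv')) {
    function explode_adv($openers, $closers, $togglers, $delimiters, $str)
    {
        $parts = [];
        $nextpart = "";
        $toggle_states = array_fill_keys($togglers, false);
        $depth = 0;
        foreach (str_split($str) as $char) {
            if (in_array($char, $openers)) {
                $depth++;
            } elseif (in_array($char, $closers)) {
                $depth--;
            } elseif (in_array($char, $togglers)) {
                // leaving a toggle block lowers the depth, entering one raises it
                $depth += $toggle_states[$char] ? -1 : 1;
                $toggle_states[$char] = !$toggle_states[$char];
            } else {
                $nextpart .= $char;
            }
            if ($depth < 0) {
                $depth = 0;
            }
            if (in_array($char, $delimiters) && $depth == 0 && !in_array($char, $closers)) {
                // the delimiter was just appended, so cut it off again
                $parts[] = substr($nextpart, 0, -1);
                $nextpart = "";
            }
        }
        if (strlen($nextpart) > 0) {
            $parts[] = $nextpart;
        }
        return $parts;
    }
}

// No openers or closers, double quote as toggler, space as delimiter.
$str = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';
$parts = explode_adv([], [], ['"'], [' '], $str);
print_r($parts);

// Parentheses as a block, comma as delimiter.
$nested = explode_adv(['('], [')'], [], [','], 'a,(b,c),d');
print_r($nested);
```

Note that the opening, closing and toggling characters themselves are dropped from the result, so the quoted phrases come back without their quotes.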

This method probably has flaws - edits are welcome.

Solution 5 - Php

In some situations the little-known token_get_all() might prove useful:

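// $text holds the input, here the string from the question
$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';
$tokens = token_get_all("<?php $text ?>");
$separator = ' ';
$items = array();
$item = "";
$last = count($tokens) - 1;
foreach($tokens as $index => $token) {
	// skip the opening and closing PHP tags
	if($index != 0 && $index != $last) {
		// multi-character tokens are arrays, single characters are plain strings
		if(is_array($token)) {
			if($token[0] == T_CONSTANT_ENCAPSED_STRING) {
				// strip the surrounding quotes
				$token = substr($token[1], 1, -1);
			} else {
				$token = $token[1];
			}
		}
		if($token === $separator) {
			// the space before the closing tag flushes the final item
			$items[] = $item;
			$item = "";
		} else {
			$item .= $token;
		}
	}
}
print_r($items);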

Results:

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor sit amet
    [3] => consectetur
    [4] => adipiscing elit
    [5] => dolor
)

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | timofey | View Question on Stackoverflow
Solution 1 - Php | Bart Kiers | View Answer on Stackoverflow
Solution 2 - Php | Petah | View Answer on Stackoverflow
Solution 3 - Php | Nikz | View Answer on Stackoverflow
Solution 4 - Php | starbeamrainbowlabs | View Answer on Stackoverflow
Solution 5 - Php | cleong | View Answer on Stackoverflow