htmlentities in PHP but preserving html tags

Php, Html, String, Replace, Html Entities

Php Problem Overview


I want to convert all the text in a string into HTML entities while preserving the HTML tags. For example, this:

<p><font style="color:#FF0000">Camión español</font></p>

should be translated into this:

<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>

Any ideas?

Php Solutions


Solution 1 - Php

You can get the list of character => entity correspondences used by htmlentities with the function get_html_translation_table. Consider this code:

$list = get_html_translation_table(HTML_ENTITIES);
var_dump($list);

(You might want to check the second parameter of that function in the manual; you may need to set it to a value other than the default one.)

It will give you something like this:

array
  ' ' => string '&nbsp;' (length=6)
  '¡' => string '&iexcl;' (length=7)
  '¢' => string '&cent;' (length=6)
  '£' => string '&pound;' (length=7)
  '¤' => string '&curren;' (length=8)
  ....
  ....
  ....
  'ÿ' => string '&yuml;' (length=6)
  '"' => string '&quot;' (length=6)
  '<' => string '&lt;' (length=4)
  '>' => string '&gt;' (length=4)
  '&' => string '&amp;' (length=5)
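The flags and encoding mentioned above can also be passed explicitly, so that the table does not depend on the defaults of a given PHP version. The following is only a small sketch: it assumes the source file is saved as UTF-8, and it uses ENT_NOQUOTES so that neither kind of quote appears in the table.

```php
<?php
// Explicit flags and encoding (assumption: this file is saved as UTF-8).
// ENT_NOQUOTES keeps both single and double quotes out of the table.
$list = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

var_dump(isset($list['"'])); // is the double quote part of the table?
var_dump($list['ó']);        // entity for a UTF-8 encoded character
var_dump($list['<']);        // the markup characters are still in the table
```

With these arguments the quotes no longer need to be unset by hand; only <, > and & remain to be removed.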

Now, remove the correspondences you don't want:

unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

Your list now has all the character => entity correspondences used by htmlentities, except for the few characters you don't want to encode.

And now, you just have to extract the list of keys and values :

$search = array_keys($list);
$values = array_values($list);

And, finally, you can use str_replace to do the replacement :

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_out);

And you get :

string '<p><font style="color:#FF0000">Cami&Atilde;&sup3;n espa&Atilde;&plusmn;ol</font></p>' (length=84)

Which looks like what you wanted ;-)


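Edit: well, except for the encoding problem. The keys of the table are ISO-8859-1 while the input string is UTF-8, so each multi-byte character gets replaced byte by byte (I'm trying to find a solution for that, and will edit again).

Second edit, a couple of minutes later: it seems you'll have to use utf8_encode on the $search list before calling str_replace :-(

Which means using something like this:

$search = array_map('utf8_encode', $search);

between the call to array_keys and the call to str_replace. Note that this step is only needed when the table is returned in ISO-8859-1. Since PHP 5.4 the default encoding of get_html_translation_table is UTF-8, in which case the keys must not be re-encoded, and utf8_encode itself is deprecated as of PHP 8.2.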

And, this time, you should really get what you wanted :

string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)


And here is the full portion of code :

$list = get_html_translation_table(HTML_ENTITIES);
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

$search = array_keys($list);
$values = array_values($list);
$search = array_map('utf8_encode', $search);

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_in, $str_out);

And the full output :

string '<p><font style="color:#FF0000">Camión español</font></p>' (length=58)
string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)

This time, it should be ok ^^
It doesn't really fit in one line, and it might not be the most optimized solution, but it should work fine, and it has the advantage of letting you add or remove any character => entity correspondence you need.

Have fun !
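For more recent PHP versions, the same idea can be sketched without the utf8_encode step. This assumes PHP 5.4 or later and a UTF-8 source file and input string. The explicit flags are a choice made for this example, not part of the original answer.

```php
<?php
// Assumptions: PHP 5.4+, UTF-8 source file and UTF-8 input.
$list = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

// Keep the characters that make up the markup itself.
unset($list['<'], $list['>'], $list['&']);

$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';

// The keys are already UTF-8 here, so they are used as they are.
$str_out = str_replace(array_keys($list), array_values($list), $str_in);

var_dump($str_out);
```

The tags and the style attribute come through unchanged, and only the accented characters are replaced.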

Solution 2 - Php

Might not be terribly efficient, but it works

$sample = '<p><font style="color:#FF0000">Camión español</font></p>';

echo htmlspecialchars_decode(
    htmlentities($sample, ENT_NOQUOTES, 'UTF-8', false)
  , ENT_NOQUOTES
);
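The two steps can be read more easily when they are split up. Here is a small sketch that wraps them in a function; the name entities_keep_markup is made up for this example, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper name; the body is the same two-step call as above.
function entities_keep_markup(string $html): string
{
    // Step 1: encode everything except quotes.
    // The last argument (false) avoids re-encoding entities already present.
    $encoded = htmlentities($html, ENT_NOQUOTES, 'UTF-8', false);

    // Step 2: turn &lt; &gt; &amp; back into < > &, which restores the tags.
    return htmlspecialchars_decode($encoded, ENT_NOQUOTES);
}

$sample = '<p><font style="color:#FF0000">Camión español</font></p>';
echo entities_keep_markup($sample), "\n";
```

Because quotes are skipped in both steps, attribute values such as style="color:#FF0000" are left alone.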

Solution 3 - Php

This is an optimized version of the accepted answer.

$list = get_html_translation_table(HTML_ENTITIES);
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);

$string = strtr($string, $list);
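Besides being shorter, strtr behaves differently from str_replace when pairs overlap: str_replace applies its pairs one after another, while strtr goes through the input once. The sketch below first shows that difference on a toy example, then applies strtr to the entity table. The explicit flags, the UTF-8 encoding and the sample string are assumptions made for this example.

```php
<?php
// Toy example: the replacement of the first pair is the search of the second.
$pairs = ['a' => 'b', 'b' => 'c'];

// str_replace works pair by pair, so the freshly inserted "b" is replaced again.
$sequential = str_replace(array_keys($pairs), array_values($pairs), 'ab');

// strtr scans the input once and never re-reads what it has just written.
$singlePass = strtr('ab', $pairs);

var_dump($sequential, $singlePass);

// Same approach with the entity table (assumes a UTF-8 source file and input).
$list = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');
unset($list['<'], $list['>'], $list['&']);

$string = '<p><font style="color:#FF0000">Camión español</font></p>';
$string = strtr($string, $list);

var_dump($string);
```

In the toy example str_replace gives "cc" while strtr gives "bc".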

Solution 4 - Php

No solution short of a parser is going to be correct for all cases. Yours is a good case:

<p><font style="color:#FF0000">Camión español</font></p>

but do you also want to support:

<p><font>true if 5 < a && name == "joe"</font></p>

where you want it to come out as:

<p><font>true if 5 &lt; a &amp;&amp; name == &quot;joe&quot;</font></p>

Question: can you do the encoding BEFORE you build the HTML? In other words, can you do something like:

'<p><font>' . htmlentities($inner) . '</font></p>'

You'll save yourself lots of grief if you can do that. If you can't, you'll need some way to skip encoding <, >, and " (as described above), or simply encode it all and then undo the encoding of those characters (e.g. replace('&lt;', '<')).
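As a small worked example of encoding first, here is a sketch using the string from above. The variable name $inner and the explicit ENT_QUOTES and 'UTF-8' arguments are assumptions made for illustration.

```php
<?php
// $inner stands for the raw text, before any markup is added around it.
$inner = 'true if 5 < a && name == "joe"';

// Encode only the text, then wrap it in the markup you control.
$html = '<p><font>' . htmlentities($inner, ENT_QUOTES, 'UTF-8') . '</font></p>';

echo $html, "\n";
```

This produces the desired result shown earlier, since the tags are never passed through htmlentities.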

Solution 5 - Php

A one-line solution with NO translation table or custom function required.

I know this is an old question, but I recently had to import a static site into a WordPress site and had to overcome this issue.

Here is my solution, which does not require fiddling with translation tables:

htmlspecialchars_decode( htmlentities( html_entity_decode( $string ) ) );

when applied to the OP's string:

<p><font style="color:#FF0000">Camión español</font></p>

output:

<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>

when applied to Luca's string:

<b>Is 1 < 4?</b>è<br><i>"then"</i> <div style="some:style;"><p>gain some <strong>€</strong><img src="/some/path" /></p></div>

output:

<b>Is 1 < 4?</b>&egrave;<br><i>"then"</i> <div style="some:style;"><p>gain some <strong>&euro;</strong><img src="/some/path" /></p></div>
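One thing to keep in mind with this round trip: text that was escaped on purpose does not stay escaped. The sketch below wraps the one-liner in a function to show this. The function name is made up, and the flags and encoding are spelled out here as an assumption, whereas the original relies on the defaults.

```php
<?php
// Hypothetical wrapper around the one-liner, with explicit flags and encoding.
function roundtrip_entities(string $html): string
{
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    $encoded = htmlentities($decoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Restores < > & and both kinds of quotes, wherever they came from.
    return htmlspecialchars_decode($encoded, ENT_QUOTES);
}

// Accented characters are encoded, tags are kept.
echo roundtrip_entities('<p>Camión español</p>'), "\n";

// An escaped tag in the text comes back as real markup.
echo roundtrip_entities('<p>Use &lt;b&gt; for bold</p>'), "\n";
```

The second call returns <p>Use <b> for bold</p>, so this approach fits best when the input does not contain markup characters escaped as text.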

Solution 6 - Php

This is a function I've just written which solves this problem in a very elegant way:

First of all, the HTML tags are extracted from the string. Then htmlentities() is executed on every remaining substring, and after that the original HTML tags are inserted back at their old positions, so the HTML tags are not altered. :-)

Have fun:

function htmlentitiesOutsideHTMLTags ($htmlText)
{
	$matches = Array();
	$sep = '###HTMLTAG###';

	// Collect every tag, then swap each one for the placeholder.
	preg_match_all("@<[^>]*>@", $htmlText, $matches);
	$tmp = preg_replace("@(<[^>]*>)@", $sep, $htmlText);
	$tmp = explode($sep, $tmp);

	// Encode only the text between the tags.
	for ($i=0; $i<count($tmp); $i++)
		$tmp[$i] = htmlentities($tmp[$i]);

	$tmp = join($sep, $tmp);

	// Put the original tags back, one placeholder at a time.
	for ($i=0; $i<count($matches[0]); $i++)
		$tmp = preg_replace("@$sep@", $matches[0][$i], $tmp, 1);

	return $tmp;
}

Solution 7 - Php

Based on bflesch's answer, I made some changes to handle strings that contain a less-than sign, a greater-than sign, or single or double quotes.

function htmlentitiesOutsideHTMLTags ($htmlText, $ent)
{
    $matches = Array();
    $sep = '###HTMLTAG###';

    // A tag is "<", an optional "/", then letters; a bare "<" in the text does not match.
    preg_match_all(":</{0,1}[a-z]+[^>]*>:i", $htmlText, $matches);

    $tmp = preg_replace(":</{0,1}[a-z]+[^>]*>:i", $sep, $htmlText);
    $tmp = explode($sep, $tmp);

    // Encode the text segments; the last argument (false) leaves existing entities untouched.
    for ($i=0; $i<count($tmp); $i++)
        $tmp[$i] = htmlentities($tmp[$i], $ent, 'UTF-8', false);

    $tmp = join($sep, $tmp);

    // Restore the tags in their original order.
    for ($i=0; $i<count($matches[0]); $i++)
        $tmp = preg_replace(":$sep:", $matches[0][$i], $tmp, 1);

    return $tmp;
}



Example of use:

$string = '<b>Is 1 < 4?</b>è<br><i>"then"</i> <div style="some:style;"><p>gain some <strong>€</strong><img src="/some/path" /></p></div>';
$string_entities = htmlentitiesOutsideHTMLTags($string, ENT_QUOTES | ENT_HTML401);
var_dump( $string_entities );

Output is:

string '<b>Is 1 &lt; 4?</b>&egrave;<br><i>&quot;then&quot;</i> <div style="some:style;"><p>gain some <strong>&euro;</strong><img src="/some/path" /></p></div>' (length=150)



You can pass any of the ENT_* flags described in the htmlentities manual.
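The same split-and-encode idea can also be sketched with preg_split, which keeps the tags in the result and so avoids a placeholder string that might also occur in the input. This is only a sketch under the same assumptions as above (UTF-8 input, a regex rather than a real HTML parser). The function name is made up for this example.

```php
<?php
// Hypothetical function name; a regex-based sketch, not a full HTML parser.
function entities_outside_tags(string $html, int $flags = ENT_QUOTES | ENT_HTML401): string
{
    // Split on tag-like substrings and keep them in the result.
    // With one capturing group, text lands on even indexes and tags on odd ones.
    $parts = preg_split('~(</?[a-z][^>]*>)~i', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Text between tags: encode it, leaving existing entities alone.
            $parts[$i] = htmlentities($part, $flags, 'UTF-8', false);
        }
    }

    // Tags were never modified, so joining the parts restores the document.
    return implode('', $parts);
}

$string = '<b>Is 1 < 4?</b>è<br><i>"then"</i>';
var_dump(entities_outside_tags($string));
```

Like the functions above, this still treats a "<" directly followed by a letter as the start of a tag, so text such as 1 <b would be misread.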

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | fidoboy | View Question on Stackoverflow
Solution 1 - Php | Pascal MARTIN | View Answer on Stackoverflow
Solution 2 - Php | Peter Bailey | View Answer on Stackoverflow
Solution 3 - Php | SileNT | View Answer on Stackoverflow
Solution 4 - Php | ndp | View Answer on Stackoverflow
Solution 5 - Php | aequalsb | View Answer on Stackoverflow
Solution 6 - Php | bflesch | View Answer on Stackoverflow
Solution 7 - Php | Luca Borrione | View Answer on Stackoverflow