How do you make strings "XML safe"?

PHP, XML, CakePHP

Php Problem Overview


I am responding to an AJAX call by sending it an XML document through PHP echos. In order to form this XML document, I loop through the records of a database. The problem is that the database includes records that have '<' symbols in them. So naturally, the browser throws an error at that particular spot. How can this be fixed?

Php Solutions


Solution 1 - Php

Since PHP 5.4 you can use:

htmlspecialchars($string, ENT_XML1);

You should specify the encoding, such as:

htmlspecialchars($string, ENT_XML1, 'UTF-8');

Update

Note that the above will only convert:

  • & to &amp;
  • < to &lt;
  • > to &gt;

If you want to escape text for use in an attribute enclosed in double quotes:

htmlspecialchars($string, ENT_XML1 | ENT_COMPAT, 'UTF-8');

will convert " to &quot; in addition to &, < and >.


And if your attributes are enclosed in single quotes:

htmlspecialchars($string, ENT_XML1 | ENT_QUOTES, 'UTF-8');

will convert ' to &apos; in addition to &, <, > and ".

(Of course you can use this even outside of attributes).


See the manual entry for htmlspecialchars.
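
Applied to the situation in the question, a minimal sketch might look like the following. The `$records` array, the `records`/`record` element names, and the `xmlEscape()` helper are hypothetical stand-ins for the real database rows and document structure.

```php
<?php
// Hypothetical helper: escape a value for XML text or attribute content.
function xmlEscape(string $value): string
{
    // ENT_QUOTES also escapes " and ', so the result is safe inside
    // attributes delimited by either kind of quote.
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Stand-in for rows fetched from the database.
$records = [
    ['id' => '1', 'note' => '5 < 6 & 6 > 5'],
    ['id' => '2', 'note' => 'She said "hi"'],
];

$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n<records>\n";
foreach ($records as $record) {
    $xml .= sprintf(
        "  <record id=\"%s\">%s</record>\n",
        xmlEscape($record['id']),
        xmlEscape($record['note'])
    );
}
$xml .= "</records>\n";

echo $xml;
```

Routing every value through one helper keeps the flags and the encoding consistent across the whole document.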

Solution 2 - Php

Either escape those characters with htmlspecialchars or, perhaps more appropriately, use a library for building XML documents, such as DOMDocument or XMLWriter.
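
As a rough illustration of the library route, here is a sketch using XMLWriter. The sample data and the `records`/`record` element names are assumptions made for the example, not part of the question.

```php
<?php
// Hypothetical sample data standing in for database rows.
$records = [
    ['id' => '1', 'note' => '5 < 6 & 6 > 5'],
];

$writer = new XMLWriter();
$writer->openMemory();
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('records');
foreach ($records as $record) {
    $writer->startElement('record');
    $writer->writeAttribute('id', $record['id']);
    // text() receives the raw value; XMLWriter escapes it on output.
    $writer->text($record['note']);
    $writer->endElement(); // </record>
}
$writer->endElement(); // </records>
$writer->endDocument();

$xml = $writer->outputMemory();
echo $xml;
```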

Another alternative would be to use CDATA sections, but then you'd have to look out for occurrences of ]]>.
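
One common way to deal with ]]> inside the data is to end the section just before the closing > and start a new one. The `xmlCdata()` helper below is a hypothetical name used only for this sketch.

```php
<?php
// Hypothetical helper: wrap text in CDATA, splitting the terminator
// sequence so the data cannot end the section early.
function xmlCdata(string $text): string
{
    // The terminator becomes: "]]" + end of section + new section + ">".
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

echo xmlCdata('5 < 6 and 6 > 5'), "\n";
echo xmlCdata('a ]]> b'), "\n";
```

A parser concatenates the adjacent sections, so the original text comes back unchanged.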

Also keep in mind that you must respect the encoding you declare for the XML document (UTF-8 by default).

Solution 3 - Php

  1. You can wrap your text in a CDATA section like this:

    <![CDATA[5<6 and 6>5]]>

see http://www.w3schools.com/xml/xml_cdata.asp

  2. As someone already said, escape those characters, e.g. like so:

    5&lt;6 and 6&gt;5

Solution 4 - Php

Try this:

$str = htmlentities($str,ENT_QUOTES,'UTF-8');

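Be aware that with these flags htmlentities() also converts characters such as é into HTML named entities like &eacute;, which XML does not predefine, so an XML parser will reject them unless they are declared. After filtering your data with htmlentities(), you can use it inside an XML tag like: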

<mytag>$str</mytag>

Solution 5 - Php

If at all possible, it's always a good idea to create your XML using the XML classes rather than string manipulation. One of the benefits is that the classes automatically escape characters as needed.
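
A minimal sketch of that approach with DOMDocument follows. The element and attribute names are made up for the example.

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('records');
$doc->appendChild($root);

$record = $doc->createElement('record');
// createTextNode() and setAttribute() take raw text;
// the escaping happens when the document is serialized.
$record->appendChild($doc->createTextNode('5 < 6 & 6 > 5'));
$record->setAttribute('label', 'a "quoted" value');
$root->appendChild($record);

$xml = $doc->saveXML();
echo $xml;
```

The text goes in through createTextNode() rather than the second argument of createElement(), because the PHP manual notes that the latter is not escaped.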

Solution 6 - Php

Adding this in case it helps someone.

As I am working with Japanese characters, encoding has also been set appropriately. However, from time to time, I find that htmlentities and htmlspecialchars are not sufficient.

Some user inputs contain special characters that are not stripped by the above functions. In those cases I have to do this:

preg_replace('/[\x00-\x1f]/','',htmlspecialchars($string))

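This also removes XML-unsafe control characters such as the null character or EOT. Note that the range \x00-\x1f also covers tab (\x09), line feed (\x0A), and carriage return (\x0D), which XML 1.0 does allow, so narrow the range if you need to keep them. You can use this table to determine which characters you wish to omit.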
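
A narrower variant of the same idea is sketched below: it removes only the characters that fall outside the XML 1.0 Char range, so tabs and line breaks survive. The `xmlClean()` name is hypothetical, and the sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: drop characters outside the XML 1.0 Char range,
// then escape the markup characters. Assumes $value is valid UTF-8.
function xmlClean(string $value): string
{
    $stripped = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $value
    );
    // preg_replace() returns null on invalid UTF-8; fall back to an empty string.
    return htmlspecialchars($stripped ?? '', ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

echo xmlClean("a\x00b\x04c\tline\n<d>"), "\n";
```

The /u modifier makes the pattern operate on code points rather than bytes, which is what keeps multibyte text such as Japanese intact.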

Solution 7 - Php

I prefer the way Golang does quote escaping for XML (plus a few extras, such as escaping newlines and some other characters), so I have ported its XML escape function to PHP below:

function isInCharacterRange(int $r): bool {
	return $r == 0x09 ||
			$r == 0x0A ||
			$r == 0x0D ||
			$r >= 0x20 && $r <= 0xD7FF ||
			$r >= 0xE000 && $r <= 0xFFFD ||
			$r >= 0x10000 && $r <= 0x10FFFF;
}

function xml(string $s, bool $escapeNewline = true): string {
	$w = '';

	$Last = 0;
	$l = strlen($s);
	$i = 0;

	while ($i < $l) {
		$r = mb_substr(substr($s, $i), 0, 1);
		$Width = strlen($r);
		$i += $Width;
		switch ($r) {
			case '"':
				$esc = '&#34;';
				break;
			case "'":
				$esc = '&#39;';
				break;
			case '&':
				$esc = '&amp;';
				break;
			case '<':
				$esc = '&lt;';
				break;
			case '>':
				$esc = '&gt;';
				break;
			case "\t":
				$esc = '&#x9;';
				break;
			case "\n":
				if (!$escapeNewline) {
					continue 2;
				}
				$esc = '&#xA;';
				break;
			case "\r":
				$esc = '&#xD;';
				break;
			default:
				if (!isInCharacterRange(mb_ord($r)) || (mb_ord($r) === 0xFFFD && $Width === 1)) {
					$esc = "\u{FFFD}";
					break;
				}

				continue 2;
		}
		$w .= substr($s, $Last, $i - $Last - $Width) . $esc;
		$Last = $i;
	}
	$w .= substr($s, $Last);
	return $w;
}

Note that you'll need at least PHP 7.2 because of the mb_ord usage, or you'll have to swap in a polyfill, but these functions are working great for us!

For anyone curious, here is the relevant Go source https://golang.org/src/encoding/xml/xml.go?s=44219:44263#L1887

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | JayD3e | View Question on Stackoverflow
Solution 1 - Php | Sébastien | View Answer on Stackoverflow
Solution 2 - Php | Artefacto | View Answer on Stackoverflow
Solution 3 - Php | Elvith | View Answer on Stackoverflow
Solution 4 - Php | Mosiur | View Answer on Stackoverflow
Solution 5 - Php | Ed Schembor | View Answer on Stackoverflow
Solution 6 - Php | Reuben L. | View Answer on Stackoverflow
Solution 7 - Php | Brian Leishman | View Answer on Stackoverflow