How to escape XML entities in JavaScript?


Javascript Problem Overview


In JavaScript (server-side Node.js), I'm writing a program that generates XML as output.

I am building the XML by concatenating strings:

str += '<' + key + '>';
str += value;
str += '</' + key + '>';

The problem is: what if the value contains characters like '&', '>', or '<'? What's the best way to escape those characters?

Or is there a JavaScript library that can escape XML entities?

Javascript Solutions


Solution 1 - Javascript

HTML encoding simply replaces the &, ", ', < and > characters with their entity equivalents. Order matters: if you don't replace the & characters first, you'll double-encode some of the entities:

if (!String.prototype.encodeHTML) {
  String.prototype.encodeHTML = function () {
    return this.replace(/&/g, '&amp;')
               .replace(/</g, '&lt;')
               .replace(/>/g, '&gt;')
               .replace(/"/g, '&quot;')
               .replace(/'/g, '&apos;');
  };
}

As @Johan B.W. de Vries pointed out, this will have issues with tag names. To clarify, I assumed it was being used only for the value.
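The following minimal sketch shows why the & replacement has to come first. It compares two orders on the same input. `encodeAmpFirst` and `encodeAmpLast` are hypothetical helper names written only for this comparison. They handle just &, < and > to keep the example short.

```javascript
// Hypothetical helper: escapes "&" first, as the answer recommends.
function encodeAmpFirst(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;');
}

// Hypothetical helper: escapes "&" last. This re-escapes the "&" that the
// earlier replacements just introduced, so "<" ends up as "&amp;lt;".
function encodeAmpLast(s) {
  return s.replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/&/g, '&amp;');
}

console.log(encodeAmpFirst('a < b & c')); // a &lt; b &amp; c
console.log(encodeAmpLast('a < b & c'));  // a &amp;lt; b &amp; c
```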

Conversely, if you want to decode HTML entities[1], make sure you decode &amp; to & after everything else so that you don't double-decode any entities:

if (!String.prototype.decodeHTML) {
  String.prototype.decodeHTML = function () {
    return this.replace(/&apos;/g, "'")
               .replace(/&quot;/g, '"')
               .replace(/&gt;/g, '>')
               .replace(/&lt;/g, '<')
               .replace(/&amp;/g, '&');
  };
}

[1] Just the basics; this doesn't cover &copy; to © or other such entities.
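A similar comparison shows why &amp; has to be decoded last. Both helpers below are hypothetical, non-prototype versions written for this illustration. The input &amp;lt; is what the encoder produces for the literal text &lt;, so a correct decoder should give &lt; back rather than <.

```javascript
// Hypothetical helper: decodes "&amp;" last, as the answer recommends.
function decodeAmpLast(s) {
  return s.replace(/&apos;/g, "'")
          .replace(/&quot;/g, '"')
          .replace(/&gt;/g, '>')
          .replace(/&lt;/g, '<')
          .replace(/&amp;/g, '&');
}

// Hypothetical helper: decodes "&amp;" first, which turns "&amp;lt;" into
// "&lt;" and then, one step later, into "<" (a double decode).
function decodeAmpFirst(s) {
  return s.replace(/&amp;/g, '&')
          .replace(/&apos;/g, "'")
          .replace(/&quot;/g, '"')
          .replace(/&gt;/g, '>')
          .replace(/&lt;/g, '<');
}

console.log(decodeAmpLast('&amp;lt;'));  // &lt;
console.log(decodeAmpFirst('&amp;lt;')); // <
```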


As far as libraries are concerned, Underscore.js (or Lodash, if you prefer) provides an _.escape method that does this conversion.

Solution 2 - Javascript

This might be a bit more efficient, with the same outcome, because it scans the string once instead of running five separate replacements:

function escapeXml(unsafe) {
    return unsafe.replace(/[<>&'"]/g, function (c) {
        switch (c) {
            case '<': return '&lt;';
            case '>': return '&gt;';
            case '&': return '&amp;';
            case '\'': return '&apos;';
            case '"': return '&quot;';
        }
    });
}
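Applied to the question's string concatenation, the single-pass approach might look like the following sketch. It swaps the switch statement for a lookup object. This is an equivalent variant, not the answer's own code. `XML_ESCAPES`, `escapeXmlWithMap` and `toElement` are hypothetical names, and the key is assumed to already be a valid element name.

```javascript
// Same single-pass idea as escapeXml, but with a lookup object
// instead of a switch statement.
const XML_ESCAPES = {
  '<': '&lt;',
  '>': '&gt;',
  '&': '&amp;',
  "'": '&apos;',
  '"': '&quot;',
};

function escapeXmlWithMap(unsafe) {
  // String() guards against non-string values such as numbers.
  return String(unsafe).replace(/[<>&'"]/g, (c) => XML_ESCAPES[c]);
}

// Hypothetical helper mirroring the question's concatenation: only the
// value is escaped; the key is assumed to be a safe element name.
function toElement(key, value) {
  return '<' + key + '>' + escapeXmlWithMap(value) + '</' + key + '>';
}

console.log(toElement('note', 'Tom & Jerry <3')); // <note>Tom &amp; Jerry &lt;3</note>
```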

Solution 3 - Javascript

If you have jQuery, here's a simple solution:

  String.prototype.htmlEscape = function() {
    return $('<div/>').text(this.toString()).html();
  };

Use it like this:

"<foo&bar>".htmlEscape(); -> "&lt;foo&amp;bar&gt"

Solution 4 - Javascript

You can use the method below. I have added it to the prototype for easier access. I have also used a negative lookahead so that it won't double-escape anything if you call the method twice or more.

Usage:

 var original = "Hi&there";
 var escaped = original.EncodeXMLEscapeChars();  //Hi&amp;there

Decoding is handled automatically by the XML parser.

Method :

//String extension to format a string for XML content.
//Replaces XML escape characters with their equivalent entity references.
String.prototype.EncodeXMLEscapeChars = function () {
    var OutPut = String(this);
    // Native trim() instead of jQuery's $.trim, so this also runs in Node.js.
    if (OutPut.trim() !== "") {
        OutPut = OutPut.replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;").replace(/'/g, "&#39;");
        OutPut = OutPut.replace(/&(?!(amp;)|(lt;)|(gt;)|(quot;)|(#39;)|(apos;))/g, "&amp;");
        OutPut = OutPut.replace(/([^\\])((\\\\)*)\\(?![\\/{])/g, "$1\\\\$2");  //replaces odd backslash(\\) with even.
    }
    else {
        OutPut = "";
    }
    return OutPut;
};
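The negative-lookahead idea can be checked on its own with the sketch below. It is a stripped-down, hypothetical `escapeOnce` function rather than the prototype method above. It omits the backslash handling and has no jQuery dependency. Running it a second time on its own output leaves the string unchanged.

```javascript
// Hypothetical helper following the same idea as EncodeXMLEscapeChars:
// escape <, >, " and ' first, then escape only those "&" characters that
// do not already start one of the listed entities.
function escapeOnce(text) {
  return String(text)
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/&(?!(?:amp|lt|gt|quot|apos|#39);)/g, '&amp;');
}

const once = escapeOnce('Hi&there <b>');
const twice = escapeOnce(once);
console.log(once);           // Hi&amp;there &lt;b&gt;
console.log(once === twice); // true
```

The trade-off is that input which literally contains text such as &amp; is left untouched. After parsing it reads as & rather than &amp;.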

Solution 5 - Javascript

Caution: chained regex replacements aren't a good idea if you have XML inside XML.
Instead, loop over the string once and substitute each escape character.
That way, you can't process the same character twice.

function _xmlAttributeEscape(inputString)
{
    var output = [];

    for (var i = 0; i < inputString.length; ++i)
    {
        switch (inputString[i])
        {
            case '&':
                output.push("&amp;");
                break;
            case '"':
                output.push("&quot;");
                break;
            case "<":
                output.push("&lt;");
                break;
            case ">":
                output.push("&gt;");
                break;
            default:
                output.push(inputString[i]);
        }


    }

    return output.join("");
}
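The following sketch shows the single-pass loop applied to the "XML inside XML" case: an XML fragment stored in a double-quoted attribute of another element. `escapeAttributeValue` is a hypothetical name. It escapes the same four characters as `_xmlAttributeEscape` but builds the result with string concatenation and `for...of`.

```javascript
// Hypothetical single-pass escaper in the spirit of _xmlAttributeEscape:
// each input character is visited exactly once, so an "&" produced by a
// substitution is never escaped again.
function escapeAttributeValue(input) {
  let out = '';
  for (const ch of String(input)) {
    switch (ch) {
      case '&': out += '&amp;'; break;
      case '"': out += '&quot;'; break;
      case '<': out += '&lt;'; break;
      case '>': out += '&gt;'; break;
      default: out += ch;
    }
  }
  return out;
}

// An XML fragment stored inside a double-quoted attribute of another element.
const inner = '<a title="x">Tom &amp; Jerry</a>';
console.log('<item payload="' + escapeAttributeValue(inner) + '"/>');
// <item payload="&lt;a title=&quot;x&quot;&gt;Tom &amp;amp; Jerry&lt;/a&gt;"/>
```

The &amp;amp; in the output is intended. When the outer document is parsed, the attribute value decodes back to the original fragment, including its own &amp;.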

Solution 6 - Javascript

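I originally used the accepted answer in production code and found that it was actually really slow when used heavily. Here is a much faster solution (it runs at over twice the speed). Note that it relies on browser DOM APIs (document.implementation and XMLSerializer), so it won't run in plain Node.js without a DOM implementation: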

var escapeXml = (function() {
    var doc = document.implementation.createDocument("", "", null);
    var el = doc.createElement("temp");
    el.textContent = "temp";
    el = el.firstChild;
    var ser = new XMLSerializer();
    return function(text) {
        el.nodeValue = text;
        return ser.serializeToString(el);
    };
})();

console.log(escapeXml("<>&")); //&lt;&gt;&amp;

Solution 7 - Javascript

Maybe you can try this:

function encodeXML(s) {
  const dom = document.createElement('div')
  dom.textContent = s
  return dom.innerHTML
}


Solution 8 - Javascript

Technically, &, < and > aren't valid characters in XML element names. If you can't trust the key variable, you should filter them out.

If you want them escaped as HTML entities, you could use something like http://www.strictly-software.com/htmlencode.
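For the key itself, a minimal sketch of the "don't trust the key" check could reject unexpected names outright instead of silently stripping characters. The pattern is an assumption: it is a conservative ASCII-only subset of the XML 1.0 Name production, which also allows ":" and many non-ASCII characters. It will therefore reject some names that are actually valid. `SAFE_ELEMENT_NAME` and `assertSafeElementName` are hypothetical names.

```javascript
// Conservative, ASCII-only check for element names (an assumption: the
// real XML Name production is broader, e.g. it also allows ":" and
// non-ASCII letters).
const SAFE_ELEMENT_NAME = /^[A-Za-z_][A-Za-z0-9._-]*$/;

// Throws for anything outside the safe subset, otherwise returns the key.
function assertSafeElementName(key) {
  if (!SAFE_ELEMENT_NAME.test(key)) {
    throw new Error('Unsafe XML element name: ' + JSON.stringify(key));
  }
  return key;
}

console.log(SAFE_ELEMENT_NAME.test('user_name')); // true
console.log(SAFE_ELEMENT_NAME.test('a<b'));       // false
console.log(SAFE_ELEMENT_NAME.test('1abc'));      // false
```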

Solution 9 - Javascript

If some of the text may already be escaped, you could try this, since it won't double-escape like many of the others:

function escape(text) {
    // Match the special characters, but skip an "&" that already starts an
    // entity such as &amp; or &#39; so existing escapes are not doubled.
    return String(text).replace(/['"<>]|&(?!#?\w+;)/g, (char) => {
        switch(char) {
            case '\'': return '&apos;'
            case '"': return '&quot;'
            case '<': return '&lt;'
            case '>': return '&gt;'
            case '&': return '&amp;'
        }
    })
}

Solution 10 - Javascript

Adding on to zzzzBov's answer, I find this a bit cleaner and easier to read:

const encodeXML = (str) =>
	str
		.replace(/&/g, '&amp;')
		.replace(/</g, '&lt;')
		.replace(/>/g, '&gt;')
		.replace(/"/g, '&quot;')
		.replace(/'/g, '&apos;');

Additionally, all five characters are listed, for example, here: https://www.sitemaps.org/protocol.html

Note that this only encodes values (as others have stated).

Solution 11 - Javascript

This is simple:

sText = ("" + sText).split("<").join("&lt;").split(">").join("&gt;").split('"').join("&#34;").split("'").join("&#39;");
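On runtimes with String.prototype.replaceAll (ES2021, available in Node.js 15 and later), the same plain-string replacement can be written without split/join. The sketch below is a hypothetical `encodeXmlText` function. As with any such chain, it replaces & first so that the entities added afterwards are not escaped a second time.

```javascript
// Plain-string replacements with replaceAll (ES2021+). "&" goes first so
// the "&" characters introduced by later steps are not re-escaped.
function encodeXmlText(text) {
  return String(text)
    .replaceAll('&', '&amp;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;')
    .replaceAll('"', '&#34;')
    .replaceAll("'", '&#39;');
}

console.log(encodeXmlText('5 > 3 & "yes"')); // 5 &gt; 3 &amp; &#34;yes&#34;
```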

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Zo72 | View Question on Stackoverflow
Solution 1 - Javascript | zzzzBov | View Answer on Stackoverflow
Solution 2 - Javascript | hgoebl | View Answer on Stackoverflow
Solution 3 - Javascript | lambshaanxy | View Answer on Stackoverflow
Solution 4 - Javascript | sudhAnsu63 | View Answer on Stackoverflow
Solution 5 - Javascript | Stefan Steiger | View Answer on Stackoverflow
Solution 6 - Javascript | jordancpaul | View Answer on Stackoverflow
Solution 7 - Javascript | crown | View Answer on Stackoverflow
Solution 8 - Javascript | Johan B.W. de Vries | View Answer on Stackoverflow
Solution 9 - Javascript | Lostfields | View Answer on Stackoverflow
Solution 10 - Javascript | Justin | View Answer on Stackoverflow
Solution 11 - Javascript | Per Ghosh | View Answer on Stackoverflow