Replacing   from javascript dom text node

Javascript Problem Overview

I am processing xhtml using javascript. I am getting the text content for a div node by concatenating the nodeValue of all child nodes where nodeType == Node.TEXT_NODE.

The resulting string sometimes contains a non-breaking space entity. How do I replace this with a regular space character?

My div looks like this...

> <div><b>Expires On</b> Sep 30, 2009 06:30 AM</div>

The following suggestions found on the web did not work:

var cleanText = text.replace(/^\xa0*([^\xa0]*)\xa0*$/g,"");


var cleanText = replaceHtmlEntities(text);

var replaceHtmlEntites = (function() {
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {
    "nbsp": " ",
    "amp" : "&",
    "quot": "\"",
    "lt"  : "<",
    "gt"  : ">"
  };
  return function(s) {
    return ( s.replace(translate_re, function(match, entity) {
      return translate[entity];
    }) );
  }
})();

Any suggestions?

Javascript Solutions

Solution 1 - Javascript

This is much easier than you're making it. The text node will not have the literal string " " in it, it'll have have the corresponding character with code 160.

function replaceNbsps(str) {
  var re = new RegExp(String.fromCharCode(160), "g");
  return str.replace(re, " ");
}

textNode.nodeValue = replaceNbsps(textNode.nodeValue);

UPDATE

Even easier:

textNode.nodeValue = textNode.nodeValue.replace(/\u00a0/g, " ");

Solution 2 - Javascript

If you only need to replace   then you can use a far simpler regex:

var textWithNBSpaceReplaced = originalText.replace(/ /g, ' ');

Also, there is a typo in your div example, it says &nnbsp; instead of  .

Solution 3 - Javascript

That first line is pretty messed up. It only needs to be:

var cleanText = text.replace(/\xA0/g,' ');

That should be all you need.

Solution 4 - Javascript

I think when you define a function with "var foo = function() {...};", the function is only defined after that line. In other words, try this:

var replaceHtmlEntites = (function() {
  var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
  var translate = {
    "nbsp": " ",
    "amp" : "&",
    "quot": "\"",
    "lt"  : "<",
    "gt"  : ">"
  };
  return function(s) {
    return ( s.replace(translate_re, function(match, entity) {
      return translate[entity];
    }) );
  }
})();

var cleanText = text.replace(/^\xa0*([^\xa0]*)\xa0*$/g,"");
cleanText = replaceHtmlEntities(text);

Edit: Also, only use "var" the first time you declare a variable (you're using it twice on the cleanText variable).

Edit 2: The problem is the spelling of the function name. You have "var replaceHtmlEntites =". It should be "var replaceHtmlEntities ="

Solution 5 - Javascript

i used this, and it worked:

var cleanText = text.replace(/&amp;nbsp;/g,"");

Solution 6 - Javascript

var text = "&quot;&nbsp;&amp;&lt;&gt;";
text = text.replaceHtmlEntites();

String.prototype.replaceHtmlEntites = function() {
var s = this;
var translate_re = /&(nbsp|amp|quot|lt|gt);/g;
var translate = {"nbsp": " ","amp" : "&","quot": "\"","lt"  : "<","gt"  : ">"};
return ( s.replace(translate_re, function(match, entity) {
  return translate[entity];
}) );
};

try this.....this worked for me

Solution 7 - Javascript

Removes everything between & and ; which all such symbols have. if you juts want to get rid of them.

text.replace(/&.*;/g,'');

Solution 8 - Javascript

for me replace doesn't work... try this code:

str = str.split("&quot;").join('"');

Solution 9 - Javascript

A way to hack this in is to replace any empty line with two or more spaces with some newlines and a token. Then post markdown, replace paragraphs with just that token to line breaks.

// replace empty lines with "EMPTY_LINE"
rawMdText = rawMdText.replace(/\n  +(?=\n)/g, "\n\nEMPTY_LINE\n");
// put <br> at the end of any other line with two spaces
rawMdText = rawMdText.replace(/  +\n/, "<br>\n");

// parse
let rawHtml = markdownParse(rawMdText);

// for any paragraphs that end with a newline (injected above) 
// and are followed by multiple empty lines leading to
// another paragraph, condense them into one paragraph
mdHtml = mdHtml.replace(/(<br>\s*<\/p>\s*)(<p>EMPTY_LINE<\/p>\s*)+(<p>)/g, (match) => {
  return match.match(/EMPTY_LINE/g).map(() => "<br>").join("");
});

// for basic newlines, just replace them
mdHtml = mdHtml.replace(/<p>EMPTY_LINE<\/p>/g, "<br>");

What this does is finds every new line with nothing but a couple spaces+. It uses look ahead so that it starts at the right place for the next replace, it'll break on two lines in a row without that.

Then markdown will parse those lines into paragraphs containing nothing but the token "EMPTY_LINE". So you can go through the rawHtml and replace those with line breaks.

As a bonus, the replace function will condense all line break paragraphs into an uppper and lower paragraph if they exist.

In effect, you'd use it like this:

A line with spaces at end  
  
  
and empty lines with spaces in between will condense into a multi-line paragraph.

A line with no spaces at end
  
  
and lines with spaces in between will be two paragraphs with extra lines between.

And the output would be this:

<p>
  A line with spaces at end<br>
  <br>
  <br>
  and empty lines with spaces in between will condense into a multi-line paragraph.
</p>

<p>A line with no spaces at end</p>
<br>
<br>
<p>and lines with spaces in between will be two paragraphs with extra lines between.</p>

Content Type	Original Author	Original Content on Stackoverflow
Question	user158678	View Question on Stackoverflow
Solution 1 - Javascript	Tim Down	View Answer on Stackoverflow
Solution 2 - Javascript	bobbymcr	View Answer on Stackoverflow
Solution 3 - Javascript	brianary	View Answer on Stackoverflow
Solution 4 - Javascript	Kip	View Answer on Stackoverflow
Solution 5 - Javascript	mohamida	View Answer on Stackoverflow
Solution 6 - Javascript	Amit Sharma	View Answer on Stackoverflow
Solution 7 - Javascript	Andi Giga	View Answer on Stackoverflow
Solution 8 - Javascript	לבני מלכה	View Answer on Stackoverflow
Solution 9 - Javascript	Seph Reed	View Answer on Stackoverflow