Remove HTML Tags in Javascript with Regex

Javascript Problem Overview

I am trying to remove all the HTML tags from a string in JavaScript. Here's what I have. I can't figure out why it's not working. Does anyone know what I am doing wrong?

<script type="text/javascript">

var regex = "/<(.|\n)*?>/";
var body = "<p>test</p>";
var result = body.replace(regex, "");
alert(result);

</script>

Thanks a lot!

Javascript Solutions

Solution 1 - Javascript

Try this, noting that the grammar of HTML is too complex for regular expressions to be correct 100% of the time:

var regex = /(<([^>]+)>)/ig
,   body = "<p>test</p>"
,   result = body.replace(regex, "");

console.log(result);
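
For what it's worth, the code in the question most likely fails because the pattern is wrapped in quotes. That makes it a string, and replace() then searches for that literal text instead of treating it as a regular expression. A minimal sketch of the difference (the variable names asString, firstOnly, and allTags are only for illustration):

```javascript
var body = "<p>test</p>";

// Quoted pattern: a plain string, so replace() looks for this literal text
// and finds nothing to replace.
var asString = body.replace("/<(.|\n)*?>/", "");

// Regex literal without the g flag: only the first tag is removed.
var firstOnly = body.replace(/<(.|\n)*?>/, "");

// Regex literal with the g flag: every tag is removed.
var allTags = body.replace(/<(.|\n)*?>/g, "");

console.log(asString);  // <p>test</p>
console.log(firstOnly); // test</p>
console.log(allTags);   // test
```

So dropping the quotes is not enough on its own; the g flag is also needed to remove the closing tag.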

If you're willing to use a library such as jQuery, you could simply do this:

console.log($('<p>test</p>').text());

Solution 2 - Javascript

This is an old question, but I stumbled across it and thought I'd share the method I used:

var body = '<div id="anid">some <a href="link">text</a></div> and some more text';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;

sanitized will now contain: "some text and some more text"

Simple, no jQuery needed, and it shouldn't let you down even in more complex cases.

Warning

This can't safely deal with user content, because it's vulnerable to script injections. For example, running this:

var body = '<img src=fake onerror=alert("dangerous")> Hello';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;

Leads to an alert being emitted.
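
One commonly suggested way to reduce this risk is to parse the string with DOMParser instead of assigning it to innerHTML on an element created from the live page. As far as I know, the document that DOMParser returns is inert, so scripts in it are not run and images are not fetched, which means the onerror handler above should not fire. The sketch below assumes a browser environment; stripHtmlInert is a hypothetical name, and the result is plain text that still needs escaping if you insert it back into a page.

```javascript
// Hypothetical helper: extracts text without attaching the markup to the live page.
function stripHtmlInert(html) {
  // parseFromString builds a separate document for the markup.
  var doc = new DOMParser().parseFromString(String(html), "text/html");
  return doc.body.textContent || "";
}

// Only run the example where DOMParser exists (browsers).
if (typeof DOMParser !== "undefined") {
  console.log(stripHtmlInert('<img src=fake onerror=alert("dangerous")> Hello'));
}
```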

Solution 3 - Javascript

This worked for me.

var regex = /(&nbsp;|<([^>]+)>)/ig
,   body = tt   // tt: the input HTML string (defined elsewhere)
,   result = body.replace(regex, "");
alert(result);

Solution 4 - Javascript

Here is how textAngular (a WYSIWYG editor) does it. I also found this to be the most consistent answer, and it uses NO REGEX.

/*
@license textAngular
Author : Austin Anderson
License : 2013 MIT
Version 1.5.16
*/
// turn html into pure text that shows visibility
function stripHtmlToText(html)
{
	var tmp = document.createElement("DIV");
	tmp.innerHTML = html;
	var res = tmp.textContent || tmp.innerText || '';
	res = res.split('\u200B').join(''); // remove every zero width space (replace() alone returns a new string and leaves res unchanged)
	res = res.trim();
	return res;
}

Solution 5 - Javascript

This solution strips HTML tags and a few common entities. You can add or remove alternatives in the pattern to suit your input, and you can replace the matches with any string you like.

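function convertHtmlToText(passHtmlBlock)
{
  // Make sure we are working with a string
  var str = passHtmlBlock.toString();
  // Replace tags (including an unclosed tag at the end) and the listed entities
  return str.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, 'ReplaceIfYouWantOtherWiseKeepItEmpty');
}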

Solution 6 - Javascript

My simple JavaScript library, FuncJS, has a function called "strip_tags()" that does the task for you, without requiring you to enter any regular expressions.

For example, say that you want to remove tags from a sentence - with this function, you can do it simply like this:

strip_tags("This string <em>contains</em> <strong>a lot</strong> of tags!");

This will produce "This string contains a lot of tags!".

For a better understanding, please read the FuncJS documentation on GitHub.

Additionally, if you'd like, please provide some feedback through the form. It would be very helpful to me!

Solution 7 - Javascript

You can use underscore.string.js (https://github.com/epeli/underscore.string), a powerful library for string manipulation:

_('a <a href="#">link</a>').stripTags()

=> 'a link'

_('a <a href="#">link</a><script>alert("hello world!")</script>').stripTags()

=> 'a linkalert("hello world!")'

Don't forget to import the library as follows:

        <script src="underscore.js" type="text/javascript"></script>
        <script src="underscore.string.js" type="text/javascript"></script>
        <script type="text/javascript"> _.mixin(_.str.exports())</script>
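
As the second example shows, stripping tags keeps the text that was inside the script element. If that is not what you want, one option is to remove script and style blocks, contents included, before stripping the remaining tags. This is a rough regex-based sketch in plain JavaScript, not part of underscore.string; stripTagsAndScripts is a made-up name, and like any regex approach it can be fooled by unusual markup.

```javascript
// Hypothetical helper, independent of underscore.string.
function stripTagsAndScripts(html) {
  return String(html)
    // Drop <script>...</script> and <style>...</style> together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Then drop whatever tags remain.
    .replace(/<[^>]+>/g, "");
}

console.log(stripTagsAndScripts('a <a href="#">link</a><script>alert("hello world!")</script>'));
// a link
```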

Solution 8 - Javascript

For a proper HTML sanitizer in JS, see http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer

Solution 9 - Javascript

The selected answer doesn't always ensure that HTML is stripped, as it's still possible to construct an invalid HTML string through it by crafting a string like the following.

  "<<h1>h1>foo<<//</h1>h1/>"

This input will ensure that the stripping assembles a set of tags for you and will result in:

  "<h1>foo</h1>"

Additionally, jQuery's text function will strip text that is not surrounded by tags.

Here's a function that uses jQuery but should be more robust against both of these cases:

var stripHTML = function(s) {
    var lastString;

    do {
        s = $('<div>').html(lastString = s).text();
    } while (lastString !== s);

    return s;
};
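
The same repeat-until-nothing-changes idea can be sketched without jQuery. The example below uses a deliberately simple pattern, /<[^<>]*>/g, which is my own choice for illustration and not the regex from the selected answer. With it, a single pass over a crafted string reassembles a tag, while repeating the pass removes it. stripTagsUntilStable is a hypothetical name.

```javascript
// Illustrative pattern: a "<", then anything except "<" or ">", then ">".
var tagPattern = /<[^<>]*>/g;

function stripTagsUntilStable(input) {
  var current = String(input);
  var previous;
  // Keep stripping until a pass leaves the string unchanged.
  do {
    previous = current;
    current = current.replace(tagPattern, "");
  } while (current !== previous);
  return current;
}

var crafted = "<<b>p>hello<</b>/p>";
var singlePass = crafted.replace(tagPattern, "");

console.log(singlePass);                    // <p>hello</p>
console.log(stripTagsUntilStable(crafted)); // hello
```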

Solution 10 - Javascript

<html>
<head>
<script type="text/javascript">
function striptag(){
  var html = /(<([^>]+)>)/gi;
  // Strip tags from the value of every form field passed as an argument
  for (var i = 0; i < arguments.length; i++)
    arguments[i].value = arguments[i].value.replace(html, "");
}
</script>
</head> 
<body>
       <form name="myform">
<textarea class="comment" title="comment" name=comment rows=4 cols=40></textarea><br>
<input type="button" value="Remove HTML Tags" onClick="striptag(this.form.comment)">
</form>
</body>
</html>

Solution 11 - Javascript

The way I do it is practically a one-liner.

The function creates a Range object and then creates a DocumentFragment in the Range with the string as the child content.

Then it grabs the text of the fragment, removes any "invisible"/zero-width characters, and trims it of any leading/trailing white space.

I realize this question is old, I just thought my solution was unique and wanted to share. :)

function getTextFromString(htmlString) {
	return document
		.createRange()
		// Creates a fragment and turns the supplied string into HTML nodes
		.createContextualFragment(htmlString)
		// Gets the text from the fragment
		.textContent
		// Removes the Zero-Width Space, Zero-Width Non-Joiner, Zero-Width Joiner, Zero-Width No-Break Space, Left-To-Right Mark, and Right-To-Left Mark characters
		.replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, '')
		// Trims off any extra space on either end of the string
		.trim();
}

var cleanString = getTextFromString('<p>Hello world! I <em>love</em> <strong>JavaScript</strong>!!!</p>');

alert(cleanString);

Solution 12 - Javascript

If you want to do this with a library and are not using jQuery, the best JS library specifically for this purpose is striptags.

It is heavier than a regex (17.9kb), but if you need greater security than a regex can provide and don't mind the extra size, then it's the best solution.

Solution 13 - Javascript

As others have stated, regex will not work. Take a moment to read my article about why you cannot and should not try to parse HTML with regex, which is what you're doing when you attempt to strip HTML from your source string.

Content Type	Original Author	Original Content on Stackoverflow
Question	Gabe	View Question on Stackoverflow
Solution 1 - Javascript	karim79	View Answer on Stackoverflow
Solution 2 - Javascript	jsdw	View Answer on Stackoverflow
Solution 3 - Javascript	user1786058	View Answer on Stackoverflow
Solution 4 - Javascript	Rentering.com	View Answer on Stackoverflow
Solution 5 - Javascript	Sahil Ralkar	View Answer on Stackoverflow
Solution 6 - Javascript	Sharikul Islam	View Answer on Stackoverflow
Solution 7 - Javascript	Abdennour TOUMI	View Answer on Stackoverflow
Solution 8 - Javascript	Mike Samuel	View Answer on Stackoverflow
Solution 9 - Javascript	Rick Moynihan	View Answer on Stackoverflow
Solution 10 - Javascript	Surya R Praveen	View Answer on Stackoverflow
Solution 11 - Javascript	ElijahFowler	View Answer on Stackoverflow
Solution 12 - Javascript	Andrew	View Answer on Stackoverflow
Solution 13 - Javascript	Cole	View Answer on Stackoverflow