Remove HTML Tags in Javascript with Regex

Javascript Regex

Javascript Problem Overview


I am trying to remove all the HTML tags from a string in Javascript. Here's what I have. I can't figure out why it's not working. Does anyone know what I am doing wrong?

<script type="text/javascript">

var regex = "/<(.|\n)*?>/";
var body = "<p>test</p>";
var result = body.replace(regex, "");
alert(result);

</script>

Thanks a lot!

Javascript Solutions


Solution 1 - Javascript

Try this, noting that the grammar of HTML is too complex for regular expressions to be correct 100% of the time:

var regex = /(<([^>]+)>)/ig
,   body = "<p>test</p>"
,   result = body.replace(regex, "");

console.log(result);
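
For context on why the code in the question fails: there the pattern is wrapped in quotes, so replace() receives a plain string and searches for that literal text instead of treating it as a regular expression. A minimal sketch of the difference (the variable names asString and asRegex are only illustrative):

```javascript
var body = "<p>test</p>";

// Quoted pattern: replace() searches for the literal text "/<(.|\n)*?>/",
// which does not occur in body, so nothing is removed.
var asString = body.replace("/<(.|\n)*?>/", "");

// Regex literal with the g flag: every tag-like match is removed.
var asRegex = body.replace(/<(.|\n)*?>/g, "");

console.log(asString); // "<p>test</p>"
console.log(asRegex);  // "test"
```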

If you're willing to use a library such as jQuery, you could simply do this:

console.log($('<p>test</p>').text());

Solution 2 - Javascript

This is an old question, but I stumbled across it and thought I'd share the method I used:

var body = '<div id="anid">some <a href="link">text</a></div> and some more text';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;

sanitized will now contain: "some text and some more text"

Simple, no jQuery needed, and it shouldn't let you down even in more complex cases.

Warning

This can't safely deal with user content, because it's vulnerable to script injections. For example, running this:

var body = '<img src=fake onerror=alert("dangerous")> Hello';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;

Leads to an alert being emitted.
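
One way to sidestep this, sketched here on the assumption that the code runs in a browser, is to parse the string with DOMParser instead of assigning it to innerHTML on a live element. As far as I know, the document it returns is inert, so event-handler attributes such as onerror should not fire while it is parsed. The function name stripTagsInert and the regex fallback are illustrative additions so that the sketch also runs outside a browser; they are not part of the original answer.

```javascript
function stripTagsInert(html) {
    if (typeof DOMParser !== "undefined") {
        // Browser path: parse into a separate document that is not attached to the page.
        var doc = new DOMParser().parseFromString(html, "text/html");
        return doc.body.textContent || "";
    }
    // Assumed fallback for non-browser environments: the regex from Solution 1.
    return html.replace(/(<([^>]+)>)/ig, "");
}

console.log(stripTagsInert('<img src=fake onerror=alert("dangerous")> Hello'));
```

This only extracts text. It is not a sanitizer for HTML that you intend to insert back into a page.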

Solution 3 - Javascript

This worked for me.

var regex = /(&nbsp;|<([^>]+)>)/ig
,   body = tt // tt is assumed to hold the HTML string to clean
,   result = body.replace(regex, "");

alert(result);

Solution 4 - Javascript

Here is how TextAngular (a WYSIWYG editor) does it. I also found this to be the most consistent answer, and it uses NO REGEX.

/*
@license textAngular
Author : Austin Anderson
License : 2013 MIT
Version 1.5.16
*/
// turn html into pure text that shows visibility
function stripHtmlToText(html)
{
	var tmp = document.createElement("DIV");
	tmp.innerHTML = html;
	var res = tmp.textContent || tmp.innerText || '';
	res = res.replace(/\u200B/g, ''); // zero width space; replace() returns a new string, so assign it
	res = res.trim();
	return res;
}
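
A detail of String.prototype.replace that matters when stripping zero-width spaces this way: it returns a new string rather than modifying the original, and a string pattern replaces only the first occurrence. A short sketch (the sample string is made up for illustration):

```javascript
var text = "a\u200Bb\u200Bc";

// The return value is discarded here, so text is unchanged.
text.replace("\u200B", "");
console.log(text.length); // 5

// A string pattern removes only the first match.
var firstOnly = text.replace("\u200B", "");
console.log(firstOnly.length); // 4

// A regex with the g flag removes every match.
var all = text.replace(/\u200B/g, "");
console.log(all); // "abc"
```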

Solution 5 - Javascript

This solution handles HTML tags as well as entities such as &nbsp;. You can add or remove conditions in the pattern to suit your input, and you can replace the matches with any string you like.

function convertHtmlToText(passHtmlBlock)
{
  // Make sure we are working with a string
  var str = passHtmlBlock.toString();
  return str.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, 'ReplaceIfYouWantOtherWiseKeepItEmpty');
}
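
As a usage sketch of the same pattern, the version below takes the replacement as a second parameter. That parameter and the name stripTagsAndEntities are assumptions made here for illustration. Note that entities not listed in the pattern, such as &amp;, are left untouched.

```javascript
function stripTagsAndEntities(html, replacement) {
    var pattern = /<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g;
    // Default to removing matches entirely when no replacement is given.
    return String(html).replace(pattern, replacement === undefined ? "" : replacement);
}

console.log(stripTagsAndEntities("<p>Fish&nbsp;&amp;&nbsp;chips</p>")); // "Fish&amp;chips"
console.log(stripTagsAndEntities("one<br>two", " "));                   // "one two"
```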

Solution 6 - Javascript

My simple JavaScript library, FuncJS, has a function called "strip_tags()" that does the task for you, without requiring you to write any regular expressions.

For example, say that you want to remove tags from a sentence - with this function, you can do it simply like this:

strip_tags("This string <em>contains</em> <strong>a lot</strong> of tags!");

This will produce "This string contains a lot of tags!".

For a better understanding, please do read the documentation at GitHub FuncJS.

Additionally, if you'd like, please provide some feedback through the form. It would be very helpful to me!

Solution 7 - Javascript

You can use a powerful string-handling library, underscore.string.js (https://github.com/epeli/underscore.string):

_('a <a href="#">link</a>').stripTags()

=> 'a link'

_('a <a href="#">link</a><script>alert("hello world!")</script>').stripTags()

=> 'a linkalert("hello world!")'
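
As the second result shows, removing only the tags leaves the contents of the script element behind as text. If that is not wanted, one possible approach is to drop script and style blocks, contents included, before stripping the remaining tags. The sketch below uses plain regular expressions rather than underscore.string, the function name stripTagsAndScripts is hypothetical, and like any regex approach it should not be treated as a sanitizer.

```javascript
function stripTagsAndScripts(html) {
    return String(html)
        // Remove <script>...</script> and <style>...</style> including their contents.
        .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
        // Remove whatever tags remain.
        .replace(/<[^>]+>/g, "");
}

console.log(stripTagsAndScripts('a <a href="#">link</a><script>alert("hello world!")</script>'));
// "a link"
```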

Don't forget to import the library as follows:

        <script src="underscore.js" type="text/javascript"></script>
        <script src="underscore.string.js" type="text/javascript"></script>
        <script type="text/javascript"> _.mixin(_.str.exports())</script>

Solution 8 - Javascript

For a proper HTML sanitizer in JS, see http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer

Solution 9 - Javascript

The selected answer doesn't always ensure that HTML is stripped, as it's still possible to construct an invalid HTML string through it by crafting a string like the following.

  "<<h1>h1>foo<<//</h1>h1/>"

This input will ensure that the stripping assembles a set of tags for you and will result in:

  "<h1>foo</h1>"

Additionally, jQuery's text function will strip text not surrounded by tags.

Here's a function that uses jQuery but should be more robust against both of these cases:

var stripHTML = function(s) {
    var lastString;

    do {
        s = $('<div>').html(lastString = s).text();
    } while (lastString !== s);
        
    return s;
};

Solution 10 - Javascript

<html>
<head>
<script type="text/javascript">
function striptag() {
    var html = /(<([^>]+)>)/gi;
    // Strip tags from the value of every form field passed in
    for (var i = 0; i < arguments.length; i++) {
        arguments[i].value = arguments[i].value.replace(html, "");
    }
}
</script>
</head> 
<body>
       <form name="myform">
<textarea class="comment" title="comment" name=comment rows=4 cols=40></textarea><br>
<input type="button" value="Remove HTML Tags" onClick="striptag(this.form.comment)">
</form>
</body>
</html>

Solution 11 - Javascript

The way I do it is practically a one-liner.

The function creates a Range object and then creates a DocumentFragment in the Range with the string as the child content.

Then it grabs the text of the fragment, removes any "invisible"/zero-width characters, and trims it of any leading/trailing white space.

I realize this question is old; I just thought my solution was unique and wanted to share. :)

function getTextFromString(htmlString) {
	return document
		.createRange()
		// Creates a fragment and turns the supplied string into HTML nodes
		.createContextualFragment(htmlString)
		// Gets the text from the fragment
		.textContent
		// Removes the Zero-Width Space, Zero-Width Non-Joiner, Zero-Width Joiner, Zero-Width No-Break Space, Left-To-Right Mark, and Right-To-Left Mark characters
		.replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, '')
		// Trims off any extra space on either end of the string
		.trim();
}

var cleanString = getTextFromString('<p>Hello world! I <em>love</em> <strong>JavaScript</strong>!!!</p>');

alert(cleanString);
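
The zero-width cleanup and trim at the end of that chain are ordinary string operations, so they can be tried on their own without a DOM. A small sketch (the helper name cleanText and the sample input are illustrative):

```javascript
function cleanText(text) {
    return text
        // Same character class as above: U+200B to U+200D, U+FEFF, U+200E, U+200F.
        .replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, "")
        // Trim leading and trailing white space.
        .trim();
}

console.log(cleanText("  \u200BHello\u200D world\u200E  ")); // "Hello world"
```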

Solution 12 - Javascript

If you want to do this with a library and are not using jQuery, the best JS library specifically for this purpose is striptags.

It is heavier than a regex (17.9kb), but if you need greater security than a regex can provide, or don't mind the extra size, then it's the best solution.

Solution 13 - Javascript

Like others have stated, regex will not work. Take a moment to read my article about why you cannot and should not try to parse html with regex, which is what you're doing when you're attempting to strip html from your source string.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Gabe | View Question on Stackoverflow
Solution 1 - Javascript | karim79 | View Answer on Stackoverflow
Solution 2 - Javascript | jsdw | View Answer on Stackoverflow
Solution 3 - Javascript | user1786058 | View Answer on Stackoverflow
Solution 4 - Javascript | Rentering.com | View Answer on Stackoverflow
Solution 5 - Javascript | Sahil Ralkar | View Answer on Stackoverflow
Solution 6 - Javascript | Sharikul Islam | View Answer on Stackoverflow
Solution 7 - Javascript | Abdennour TOUMI | View Answer on Stackoverflow
Solution 8 - Javascript | Mike Samuel | View Answer on Stackoverflow
Solution 9 - Javascript | Rick Moynihan | View Answer on Stackoverflow
Solution 10 - Javascript | Surya R Praveen | View Answer on Stackoverflow
Solution 11 - Javascript | ElijahFowler | View Answer on Stackoverflow
Solution 12 - Javascript | Andrew | View Answer on Stackoverflow
Solution 13 - Javascript | Cole | View Answer on Stackoverflow