Remove HTML Tags in Javascript with Regex
Javascript Problem Overview
I am trying to remove all the HTML tags from a string in Javascript. Here's what I have. I can't figure out why it's not working. Does anyone know what I am doing wrong?
<script type="text/javascript">
var regex = "/<(.|\n)*?>/";
var body = "<p>test</p>";
var result = body.replace(regex, "");
alert(result);
</script>
Thanks a lot!
Javascript Solutions
Solution 1 - Javascript
Try this, noting that the grammar of HTML is too complex for regular expressions to be correct 100% of the time:
var regex = /(<([^>]+)>)/ig
, body = "<p>test</p>"
, result = body.replace(regex, "");
console.log(result);
If you're willing to use a library such as jQuery, you could simply do this:
console.log($('<p>test</p>').text());
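For reference, the code in the question fails because the pattern is wrapped in quotes, so replace() receives a plain string and searches for that literal text instead of treating it as a regular expression. A minimal sketch of the difference (the variable names asString and asRegex are illustrative, not from the question):

```javascript
var body = "<p>test</p>";

// A quoted pattern is just a string: replace() looks for that literal text.
var asString = body.replace("/<(.|\n)*?>/", "");

// A regex literal (no quotes) with the g flag matches every tag.
var asRegex = body.replace(/<(.|\n)*?>/g, "");

console.log(asString); // body is unchanged, the literal text never occurs in it
console.log(asRegex);  // tags removed
```

Without the g flag, only the first match would be replaced.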
Solution 2 - Javascript
This is an old question, but I stumbled across it and thought I'd share the method I used:
var body = '<div id="anid">some <a href="link">text</a></div> and some more text';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;
sanitized will now contain: "some text and some more text"
Simple, no jQuery needed, and it shouldn't let you down even in more complex cases.
Warning
This can't safely deal with user content, because it's vulnerable to script injections. For example, running this:
var body = '<img src=fake onerror=alert("dangerous")> Hello';
var temp = document.createElement("div");
temp.innerHTML = body;
var sanitized = temp.textContent || temp.innerText;
Leads to an alert being emitted.
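One way to reduce this risk is to parse the string into a separate document rather than a live element. As far as I know, documents produced by DOMParser do not run scripts or load images, so the onerror handler above should not fire, but treat that as an assumption to verify in your target browsers. A minimal sketch (the function name htmlToText and its optional parser parameter are illustrative):

```javascript
// Sketch: parse into a separate, inert document instead of a live element.
// The optional second parameter is an assumption added so the function
// can be exercised outside a browser with a stand-in parser.
function htmlToText(html, parser) {
  parser = parser || new DOMParser();
  var doc = parser.parseFromString(html, "text/html");
  return doc.body.textContent || "";
}

// Only run the demo where DOMParser exists (browsers).
if (typeof DOMParser !== "undefined") {
  console.log(htmlToText('<img src=fake onerror=alert("dangerous")> Hello'));
}
```

This only extracts text. It is not a sanitizer for HTML you intend to insert back into a page.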
Solution 3 - Javascript
This worked for me.
var regex = /(&nbsp;|<([^>]+)>)/ig
, body = tt // tt is the HTML string to clean, defined elsewhere
, result = body.replace(regex, "");
alert(result);
Solution 4 - Javascript
Here is how textAngular (a WYSIWYG editor) does it. I also found this to be the most consistent answer, and it uses NO REGEX to strip the tags.
/*
@license textAngular
Author : Austin Anderson
License : 2013 MIT
Version 1.5.16
*/
// turn html into pure text that shows visibility
function stripHtmlToText(html)
{
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
var res = tmp.textContent || tmp.innerText || '';
res = res.replace(/\u200B/g, ''); // zero width space; replace() returns a new string, so assign it
res = res.trim();
return res;
}
Solution 5 - Javascript
This solution removes HTML tags and a few common entities. You can add or remove alternatives in the pattern to suit your input, and you can replace the matches with any string you like.
function convertHtmlToText(str)
{
str = str.toString();
return str.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, 'ReplaceIfYouWantOtherWiseKeepItEmpty');
}
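A runnable sketch of this approach, assuming the pattern is meant to match tags plus the entities &nbsp;, &zwnj;, &raquo;, &laquo; and &gt;. The two-argument signature and the default empty replacement are illustrative assumptions:

```javascript
// Illustrative version: the replacement parameter is an assumption.
function convertHtmlToText(html, replacement) {
  var str = String(html);
  // <[^>]*(>|$) also catches an unclosed tag at the very end of the input.
  return str.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, replacement || "");
}

var converted = convertHtmlToText("<p>a &raquo; b</p><span");
console.log(JSON.stringify(converted));
```

Note that removing an entity outright can leave doubled spaces or join neighboring words, so you may prefer to replace &nbsp; with a space in a separate step.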
Solution 6 - Javascript
My simple JavaScript library, FuncJS, has a function called "strip_tags()" which does the task for you without requiring you to enter any regular expressions.
For example, say that you want to remove tags from a sentence - with this function, you can do it simply like this:
strip_tags("This string <em>contains</em> <strong>a lot</strong> of tags!");
This will produce "This string contains a lot of tags!".
For a better understanding, please do read the documentation at GitHub FuncJS.
Additionally, if you'd like, please provide some feedback through the form. It would be very helpful to me!
Solution 7 - Javascript
You can use a powerful string-handling library, underscore.string.js (https://github.com/epeli/underscore.string):
_('a <a href="#">link</a>').stripTags()
=> 'a link'
_('a <a href="#">link</a><script>alert("hello world!")</script>').stripTags()
=> 'a linkalert("hello world!")'
Don't forget to import this lib as follows:
<script src="underscore.js" type="text/javascript"></script>
<script src="underscore.string.js" type="text/javascript"></script>
<script type="text/javascript"> _.mixin(_.str.exports())</script>
Solution 8 - Javascript
For a proper HTML sanitizer in JS, see http://code.google.com/p/google-caja/wiki/JsHtmlSanitizer
Solution 9 - Javascript
The selected answer doesn't always ensure that HTML is stripped, as it's still possible to construct an invalid HTML string through it by crafting a string like the following.
"<<h1>h1>foo<<//</h1>h1/>"
This input will ensure that the stripping assembles a set of tags for you and will result in:
"<h1>foo</h1>"
Additionally, jQuery's text function will strip text that is not surrounded by tags.
Here's a function that uses jQuery but should be more robust against both of these cases:
var stripHTML = function(s) {
    var lastString;
    // Keep stripping until a pass no longer changes the string
    do {
        s = $('<div>').html(lastString = s).text();
    } while (lastString !== s);
    return s;
};
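The repeat-until-stable idea does not depend on jQuery. Here is a minimal sketch in plain JavaScript. It uses a hypothetical single-pass stripper, stripOnce, that only removes runs with no nested angle brackets, so a single pass can leave a newly assembled tag behind. Both function names are illustrative:

```javascript
// Hypothetical single-pass stripper: removes "<...>" runs that contain
// no further angle brackets. One pass can assemble new tags.
function stripOnce(s) {
  return s.replace(/<[^<>]+>/g, "");
}

// Re-apply the stripper until a pass leaves the string unchanged.
function stripUntilStable(s) {
  var previous;
  do {
    previous = s;
    s = stripOnce(s);
  } while (s !== previous);
  return s;
}

var crafted = "<scr<b>ipt>x</scr<b>ipt>";
console.log(stripOnce(crafted));        // one pass leaves assembled tags
console.log(stripUntilStable(crafted)); // repeated passes remove them
```

The loop terminates because each pass either shortens the string or leaves it unchanged.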
Solution 10 - Javascript
<html>
<head>
<script type="text/javascript">
function striptag(){
var html = /(<([^>]+)>)/gi;
for (var i = 0; i < arguments.length; i++)
arguments[i].value=arguments[i].value.replace(html, "")
}
</script>
</head>
<body>
<form name="myform">
<textarea class="comment" title="comment" name=comment rows=4 cols=40></textarea><br>
<input type="button" value="Remove HTML Tags" onClick="striptag(this.form.comment)">
</form>
</body>
</html>
Solution 11 - Javascript
The way I do it is practically a one-liner.
The function creates a Range object and then creates a DocumentFragment in the Range with the string as the child content.
Then it grabs the text of the fragment, removes any "invisible"/zero-width characters, and trims it of any leading/trailing white space.
I realize this question is old, but I thought my solution was unique and wanted to share. :)
function getTextFromString(htmlString) {
return document
.createRange()
// Creates a fragment and turns the supplied string into HTML nodes
.createContextualFragment(htmlString)
// Gets the text from the fragment
.textContent
// Removes the Zero-Width Space, Zero-Width Non-Joiner, Zero-Width Joiner, Zero-Width No-Break Space, Left-To-Right Mark, and Right-To-Left Mark characters
.replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, '')
// Trims off any extra space on either end of the string
.trim();
}
var cleanString = getTextFromString('<p>Hello world! I <em>love</em> <strong>JavaScript</strong>!!!</p>');
alert(cleanString);
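The cleanup step can be tried on its own, without a DOM. A small sketch using the same character class on a string that contains invisible characters (the function name removeInvisible and the sample string are illustrative):

```javascript
// Same character class as above: U+200B to U+200D, U+FEFF, U+200E, U+200F.
function removeInvisible(text) {
  return text.replace(/[\u200B-\u200D\uFEFF\u200E\u200F]/g, '').trim();
}

// Sample with a zero-width space, a zero-width joiner, a left-to-right mark,
// and a zero-width no-break space mixed into ordinary text.
var cleaned = removeInvisible('  Hel\u200Blo\u200D wor\u200Eld\uFEFF  ');
console.log(cleaned, cleaned.length);
```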
Solution 12 - Javascript
If you want to do this with a library and are not using jQuery, the best JS library specifically for this purpose is striptags.
It is heavier than a regex (17.9kb), but if you need greater security than a regex can provide and don't mind the extra size, it's the best solution.
Solution 13 - Javascript
Like others have stated, regex will not work. Take a moment to read my article about why you cannot and should not try to parse html with regex, which is what you're doing when you're attempting to strip html from your source string.