How do I remove all HTML tags from a string without knowing which tags are in it?

C#Html

C# Problem Overview


Is there any easy way to remove all HTML tags or ANYTHING HTML related from a string?

For example:

string title = "<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">[Proj # 206010]</font></b>&nbsp;&nbsp;&nbsp; (Reality Series, &nbsp;)"

The above should really be:

"Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] (Reality Series)"

C# Solutions


Solution 1 - C#

You can use a simple regex like this:

public static string StripHTML(string input)
{
   return Regex.Replace(input, "<.*?>", String.Empty);
}

Be aware that this solution has its own flaw. See https://stackoverflow.com/questions/4878452/remove-html-tags-in-string for more information (especially the comments of 'Mark E. Haase'/@mehaase)

Another solution would be to use the HTML Agility Pack.
You can find an example using the library here: https://stackoverflow.com/questions/12787449/html-agility-pack-removing-unwanted-tags-without-removing-content

Solution 2 - C#

You can parse the string using Html Agility pack and get the InnerText.

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(@"<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">[Proj # 206010]</font></b>&nbsp;&nbsp;&nbsp; (Reality Series, &nbsp;)");
    string result = htmlDoc.DocumentNode.InnerText;

Solution 3 - C#

You can use the below code on your string and you will get the complete string without html part.

string title = "<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">[Proj # 206010]</font></b>&nbsp;&nbsp;&nbsp; (Reality Series, &nbsp;)".Replace("&nbsp;",string.Empty);            
        string s = Regex.Replace(title, "<.*?>", String.Empty);

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJJ.View Question on Stackoverflow
Solution 1 - C#BidouView Answer on Stackoverflow
Solution 2 - C#ssilas777View Answer on Stackoverflow
Solution 3 - C#VinayView Answer on Stackoverflow