Objective C HTML escape/unescape

IphoneHtmlObjective CCocoa TouchEscaping

Iphone Problem Overview


Wondering if there is an easy way to do a simple HTML escape/unescape in Objective C. What I want is something like this psuedo code:

NSString *string = @"<span>Foo</span>";
[string stringByUnescapingHTML];

Which returns

<span>Foo</span>

Hopefully unescaping all other HTML entities as well and even ASCII codes like Ӓ and the like.

Is there any methods in Cocoa Touch/UIKit to do this?

Iphone Solutions


Solution 1 - Iphone

Check out my NSString category for XMLEntities. There's methods to decode XML entities (including all HTML character references), encode XML entities, stripping tags and removing newlines and whitespace from a string:

- (NSString *)stringByStrippingTags;
- (NSString *)stringByDecodingXMLEntities; // Including all HTML character references
- (NSString *)stringByEncodingXMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;

Solution 2 - Iphone

Another HTML NSString category from Google Toolbox for Mac
Despite the name, this works on iOS too.

http://google-toolbox-for-mac.googlecode.com/svn/trunk/Foundation/GTMNSString+HTML.h

/// Get a string where internal characters that are escaped for HTML are unescaped 
//
///  For example, '&amp;' becomes '&'
///  Handles &#32; and &#x32; cases as well
///
//  Returns:
//    Autoreleased NSString
//
- (NSString *)gtm_stringByUnescapingFromHTML;

And I had to include only three files in the project: header, implementation and GTMDefines.h.

Solution 3 - Iphone

This link contains the solution below. Cocoa CF has the CFXMLCreateStringByUnescapingEntities function but that's not available on the iPhone.

@interface MREntitiesConverter : NSObject <NSXMLParserDelegate>{
	NSMutableString* resultString;
}

@property (nonatomic, retain) NSMutableString* resultString;

- (NSString*)convertEntitiesInString:(NSString*)s;

@end


@implementation MREntitiesConverter

@synthesize resultString;

- (id)init
{
	if([super init]) {
		resultString = [[NSMutableString alloc] init];
	}
	return self;
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
		[self.resultString appendString:s];
}

- (NSString*)convertEntitiesInString:(NSString*)s {
	if (!s) {
		NSLog(@"ERROR : Parameter string is nil");
	}
	NSString* xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
	NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
	NSXMLParser* xmlParse = [[[NSXMLParser alloc] initWithData:data] autorelease];
	[xmlParse setDelegate:self];
	[xmlParse parse];
	return [NSString stringWithFormat:@"%@",resultString];
}

- (void)dealloc {
	[resultString release];
	[super dealloc];
}

@end

Solution 4 - Iphone

This is an incredibly hacked together solution I did, but if you want to simply escape a string without worrying about parsing, do this:

-(NSString *)htmlEntityDecode:(NSString *)string
    {
        string = [string stringByReplacingOccurrencesOfString:@"&quot;" withString:@"\""];
        string = [string stringByReplacingOccurrencesOfString:@"&apos;" withString:@"'"];
        string = [string stringByReplacingOccurrencesOfString:@"&lt;" withString:@"<"];
        string = [string stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
        string = [string stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"]; // Do this last so that, e.g. @"&amp;lt;" goes to @"&lt;" not @"<"
        
        return string;
    }

I know it's by no means elegant, but it gets the job done. You can then decode an element by calling:

string = [self htmlEntityDecode:string];

Like I said, it's hacky but it works. IF you want to encode a string, just reverse the stringByReplacingOccurencesOfString parameters.

Solution 5 - Iphone

In iOS 7 you can use NSAttributedString's ability to import HTML to convert HTML entities to an NSString.

Eg:

@interface NSAttributedString (HTML)
+ (instancetype)attributedStringWithHTMLString:(NSString *)htmlString;
@end

@implementation NSAttributedString (HTML)
+ (instancetype)attributedStringWithHTMLString:(NSString *)htmlString
{
    NSDictionary *options = @{ NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType,
                               NSCharacterEncodingDocumentAttribute :@(NSUTF8StringEncoding) };

    NSData *data = [htmlString dataUsingEncoding:NSUTF8StringEncoding];

    return [[NSAttributedString alloc] initWithData:data options:options documentAttributes:nil error:nil];
}

@end

Then in your code when you want to clean up the entities:

NSString *cleanString = [[NSAttributedString attributedStringWithHTMLString:question.title] string];

This is probably the simplest way, but I don't know how performant it is. You should probably be pretty damn sure the content your "cleaning" doesn't contain any <img> tags or stuff like that because this method will download those images during the HTML to NSAttributedString conversion. :)

Solution 6 - Iphone

Here's a solution that neutralizes all characters (by making them all HTML encoded entities for their unicode value)... Used this for my need (making sure a string that came from the user but was placed inside of a webview couldn't have any XSS attacks):

Interface:

@interface NSString (escape)
- (NSString*)stringByEncodingHTMLEntities;
@end

Implementation:

@implementation NSString (escape)

- (NSString*)stringByEncodingHTMLEntities {
    // Rather then mapping each individual entity and checking if it needs to be replaced, we simply replace every character with the hex entity
        
    NSMutableString *resultString = [NSMutableString string];
    for(int pos = 0; pos<[self length]; pos++)
        [resultString appendFormat:@"&#x%x;",[self characterAtIndex:pos]];
    return [NSString stringWithString:resultString];
}

@end

Usage Example:

UIWebView *webView = [[UIWebView alloc] init];
NSString *userInput = @"<script>alert('This is an XSS ATTACK!');</script>";
NSString *safeInput = [userInput stringByEncodingHTMLEntities];
[webView loadHTMLString:safeInput baseURL:nil];

Your mileage will vary.

Solution 7 - Iphone

The least invasive and most lightweight way to encode and decode HTML or XML strings is to use the GTMNSStringHTMLAdditions CocoaPod.

It is simply the Google Toolbox for Mac NSString category GTMNSString+HTML, stripped of the dependency on GTMDefines.h. So all you need to add is one .h and one .m, and you're good to go.

Example:

#import "GTMNSString+HTML.h"

// Encoding a string with XML / HTML elements
NSString *stringToEncode = @"<TheBeat>Goes On</TheBeat>";
NSString *encodedString = [stringToEncode gtm_stringByEscapingForHTML];

// encodedString looks like this now:
// &lt;TheBeat&gt;Goes On&lt;/TheBeat&gt;

// Decoding a string with XML / HTML encoded elements
NSString *stringToDecode = @"&lt;TheBeat&gt;Goes On&lt;/TheBeat&gt;";
NSString *decodedString = [stringToDecode gtm_stringByUnescapingFromHTML];

// decodedString looks like this now:
// <TheBeat>Goes On</TheBeat>

Solution 8 - Iphone

This is an easy to use NSString category implementation:

It is far from complete but you can add some missing entities from here: http://code.google.com/p/statz/source/browse/trunk/NSString%2BHTML.m

Usage:

#import "NSString+HTML.h"

NSString *raw = [NSString stringWithFormat:@"<div></div>"];
NSString *escaped = [raw htmlEscapedString];

Solution 9 - Iphone

The MREntitiesConverter above is an HTML stripper, not encoder.

If you need an encoder, go here: https://stackoverflow.com/questions/803676/encode-nsstring-for-xml-html

Solution 10 - Iphone

MREntitiesConverter doesn't work for escaping malformed xml. It will fail on a simple URL:

http://www.google.com/search?client=safari&rls=en&q=fail&ie=UTF-8&oe=UTF-8

Solution 11 - Iphone

If you need to generate a literal you might consider using a tool like this:

http://www.freeformatter.com/java-dotnet-escape.html#ad-output

to accomplish the work for you.

See also this answer.

Solution 12 - Iphone

This easiest solution is to create a category as below:

Here’s the category’s header file:

#import <Foundation/Foundation.h>
@interface NSString (URLEncoding)
-(NSString *)urlEncodeUsingEncoding:(NSStringEncoding)encoding;
@end

And here’s the implementation:

#import "NSString+URLEncoding.h"
@implementation NSString (URLEncoding)
-(NSString *)urlEncodeUsingEncoding:(NSStringEncoding)encoding {
	return (NSString *)CFURLCreateStringByAddingPercentEscapes(NULL,
			   (CFStringRef)self,
			   NULL,
			   (CFStringRef)@"!*'\"();:@&=+$,/?%#[]% ",
			   CFStringConvertNSStringEncodingToEncoding(encoding));
}
@end

And now we can simply do this:

NSString *raw = @"hell & brimstone + earthly/delight";
NSString *url = [NSString stringWithFormat:@"http://example.com/example?param=%@",
			[raw urlEncodeUsingEncoding:NSUTF8Encoding]];
NSLog(url);

The credits for this answer goes to the website below:-

http://madebymany.com/blog/url-encoding-an-nsstring-on-ios

Solution 13 - Iphone

Why not just using ?

NSData *data = [s dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
NSString *result = [[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] autorelease];
return result;

Noob question but in my case it works...

Solution 14 - Iphone

> This is an old answer that I posted some years ago. My intention was > not to provide a "good" and "respectable" solution, but a "hacky" one > that might be useful under some circunstances. Please, don't use this solution unless nothing else works. > > Actually, it works perfectly fine in many situations that other > answers don't because the UIWebView is doing all the work. And you can > even inject some javascript (which can be dangerous and/or useful). The performance should be horrible, but actually is not that bad.

There is another solution that has to be mentioned. Just create a UIWebView, load the encoded string and get the text back. It escapes tags "<>", and also decodes all html entities (e.g. "&gt;") and it might work where other's don't (e.g. using cyrillics). I don't think it's the best solution, but it can be useful if the above solutions doesn't work.

Here is a small example using ARC:

@interface YourClass() <UIWebViewDelegate>

    @property UIWebView *webView;

@end

@implementation YourClass 

- (void)someMethodWhereYouGetTheHtmlString:(NSString *)htmlString {
    self.webView = [[UIWebView alloc] init];
    NSString *htmlString = [NSString stringWithFormat:@"<html><body>%@</body></html>", self.description];
    [self.webView loadHTMLString:htmlString baseURL:nil];
    self.webView.delegate = self;
}

- (void)webView:(UIWebView *)webView didFailLoadWithError:(NSError *)error {
    self.webView = nil;
}

- (void)webViewDidFinishLoad:(UIWebView *)webView {
    self.webView = nil;
    NSString *escapedString = [self.webView stringByEvaluatingJavaScriptFromString:@"document.body.textContent;"];
}

- (void)webViewDidStartLoad:(UIWebView *)webView {
    // Do Nothing
}

@end

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAlex WayneView Question on Stackoverflow
Solution 1 - IphoneMichael WaterfallView Answer on Stackoverflow
Solution 2 - IphoneNikita RybakView Answer on Stackoverflow
Solution 3 - IphoneAndrew GrantView Answer on Stackoverflow
Solution 4 - IphoneAndrew KozlikView Answer on Stackoverflow
Solution 5 - IphoneorjView Answer on Stackoverflow
Solution 6 - IphoneBadPirateView Answer on Stackoverflow
Solution 7 - IphoneT BlankView Answer on Stackoverflow
Solution 8 - IphoneBlagoView Answer on Stackoverflow
Solution 9 - IphoneBrain2000View Answer on Stackoverflow
Solution 10 - IphonerichcollinsView Answer on Stackoverflow
Solution 11 - IphonediadyneView Answer on Stackoverflow
Solution 12 - IphoneHashim AkhtarView Answer on Stackoverflow
Solution 13 - IphonekheraudView Answer on Stackoverflow
Solution 14 - IphoneFranMowinckelView Answer on Stackoverflow