Easy way to remove accents from a Unicode string?

Java, String, Unicode, Diacritics

Java Problem Overview


I want to change this sentence:

> Et ça sera sa moitié.

To:

> Et ca sera sa moitie.

Is there an easy way to do this in Java, as I would in Objective-C?

NSString *str = @"Et ça sera sa moitié.";
NSData *data = [str dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *newStr = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];

Java Solutions


Solution 1 - Java

In the end, I solved it using the Normalizer class.

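import java.text.Normalizer;

public static String stripAccents(String s)
{
    // Decompose each accented character into its base letter plus combining marks (NFD).
    s = Normalizer.normalize(s, Normalizer.Form.NFD);
    // Remove the combining marks, leaving only the base letters.
    s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    return s;
}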
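
As a quick check, the method can be run against the sentence from the question. This is only a minimal sketch: the class name StripAccentsDemo is made up for illustration, and the input is written with Unicode escapes so the source file stays ASCII-only.

```java
import java.text.Normalizer;

class StripAccentsDemo {

    // Same approach as the solution: decompose to NFD, then drop the combining marks.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    }

    public static void main(String[] args) {
        // The sentence from the question: U+00E7 is c with cedilla, U+00E9 is e with acute.
        String input = "Et \u00e7a sera sa moiti\u00e9.";
        System.out.println(stripAccents(input)); // prints: Et ca sera sa moitie.
    }
}
```

Text that contains no accents passes through unchanged, so the method is safe to call on plain ASCII input as well.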

Solution 2 - Java

Perhaps the easiest and safest way is to use StringUtils from Apache Commons Lang:

StringUtils.stripAccents(String input)

> Removes diacritics (~= accents) from a string. The case will not be altered. For instance, 'à' will be replaced by 'a'. Note that ligatures will be left as is.


Solution 3 - Java

The only difference from the Normalizer solution above is that I use a + quantifier instead of a [] character class. Both work, but it is worth having this variant here as well.

String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
String accentRemoved = normalized.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
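
To check that the two patterns behave the same, here is a small sketch that runs both against one input. The class and method names are hypothetical, and precompiling the patterns is my own addition rather than part of the answer: String.replaceAll compiles its regex on every call, so a static Pattern may be preferable when the method is called often.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class DiacriticPatterns {

    // Compiled once and reused across calls.
    private static final Pattern MARKS_PLUS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    private static final Pattern MARKS_CLASS =
            Pattern.compile("[\\p{InCombiningDiacriticalMarks}]");

    // Variant with the + quantifier: removes runs of combining marks in one match.
    static String stripWithPlus(String s) {
        String normalized = Normalizer.normalize(s, Normalizer.Form.NFD);
        return MARKS_PLUS.matcher(normalized).replaceAll("");
    }

    // Variant with the [] character class: removes combining marks one at a time.
    static String stripWithClass(String s) {
        String normalized = Normalizer.normalize(s, Normalizer.Form.NFD);
        return MARKS_CLASS.matcher(normalized).replaceAll("");
    }

    public static void main(String[] args) {
        // "creme brulee" with its accents, written as Unicode escapes.
        String input = "cr\u00e8me br\u00fbl\u00e9e";
        System.out.println(stripWithPlus(input));  // prints: creme brulee
        System.out.println(stripWithClass(input)); // prints: creme brulee
    }
}
```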

Solution 4 - Java

For Kotlin:

import java.text.Normalizer

fun stripAccents(s: String): String
{
    // Decompose accented characters (NFD), then strip the combining marks.
    val decomposed = Normalizer.normalize(s, Normalizer.Form.NFD)
    return Regex("\\p{InCombiningDiacriticalMarks}+").replace(decomposed, "")
}

Solution 5 - Java

Assuming you are using Java 6 or newer, you might want to take a look at Normalizer, which can decompose accents, then use a regex to strip the combining accents.

Otherwise, you should be able to achieve the same result using ICU4J.
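
To make the decomposition step concrete, the sketch below shows what Normalizer does to a single accented character before any regex is applied. The class and method names are hypothetical. It also illustrates a limit of this approach: a letter with no canonical decomposition, such as U+00F8 (o with stroke), comes back unchanged, so the regex has nothing to strip.

```java
import java.text.Normalizer;

class DecompositionDemo {

    // Canonical decomposition (NFD): precomposed characters become base letter + combining marks.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";            // e with acute as a single char
        String decomposed = decompose(precomposed);

        System.out.println(precomposed.length()); // prints: 1
        System.out.println(decomposed.length());  // prints: 2
        // The second char is U+0301, COMBINING ACUTE ACCENT.
        System.out.println(Integer.toHexString(decomposed.charAt(1))); // prints: 301

        // U+00F8 has no canonical decomposition, so NFD leaves it as is.
        String oWithStroke = "\u00f8";
        System.out.println(decompose(oWithStroke).equals(oWithStroke)); // prints: true
    }
}
```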

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stack Overflow |
| --- | --- | --- |
| Question | Rob | View Question on Stack Overflow |
| Solution 1 - Java | Rob | View Answer on Stack Overflow |
| Solution 2 - Java | Ondrej Bozek | View Answer on Stack Overflow |
| Solution 3 - Java | EpicPandaForce | View Answer on Stack Overflow |
| Solution 4 - Java | Tristan Richard | View Answer on Stack Overflow |
| Solution 5 - Java | hertzsprung | View Answer on Stack Overflow |