Java. Ignore accents when comparing strings

Java, String, Compare

Java Problem Overview


The problem is easy. Is there any function in Java to compare two Strings and return true while ignoring accented characters?

For example:

String x = "Joao";
String y = "João";

should be reported as equal.

Thanks

Java Solutions


Solution 1 - Java

I think you should be using the java.text.Collator class. It lets you set a strength and a locale, and it compares characters according to that locale's rules.

From the Java 1.6 API:

> You can set a Collator's strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. The exact assignment of strengths to language features is locale dependant. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ě" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical.

I think the important point here (which others are trying to make) is that "Joao" and "João" should never be considered equal. However, if you are sorting, you don't want them compared by their raw character (code point) values, because then you get an order like Joao, John, João, which is not what a reader expects. The Collator class handles this correctly.
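To make the sorting point concrete, here is a minimal sketch contrasting String's natural order with a Collator. The class and method names and the choice of Locale.ENGLISH are illustrative assumptions, not part of the original answer.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class CollatorSortDemo {
    // Sorts by String.compareTo, i.e. by raw UTF-16 code unit values
    static List<String> naturalSort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Sorts with a locale-aware Collator (default strength);
    // Collator implements Comparator<Object>, so it can be passed to List.sort
    static List<String> collatorSort(List<String> names, Locale locale) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = List.of("João", "John", "Joao");
        System.out.println(naturalSort(names));                  // [Joao, John, João]
        System.out.println(collatorSort(names, Locale.ENGLISH)); // John sorts last
    }
}
```

With the Collator, "Joao" and "João" still compare as different (the accent is only a secondary difference at the default strength), but both sort before "John", which is the behavior the answer describes.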

Solution 2 - Java

You didn't hear this from me (because I disagree with the premise of the question), but you can use java.text.Normalizer and normalize with NFD: this splits the accent off from the letter it's attached to, leaving it as a separate combining character. You can then filter out the combining accent characters and compare.
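A minimal JDK-only sketch of that approach follows. Filtering with the regex \p{M} (any Unicode mark) is one reasonable choice; \p{InCombiningDiacriticalMarks} is a common, narrower alternative. The class and method names are hypothetical.

```java
import java.text.Normalizer;

public class AccentStripper {
    // NFD turns "ã" into "a" followed by U+0303 COMBINING TILDE;
    // removing every mark character then leaves only the base letters.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Plain equals after stripping, so case differences still matter
    static boolean equalsIgnoringAccents(String a, String b) {
        return stripAccents(a).equals(stripAccents(b));
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoringAccents("Joao", "João")); // true
        System.out.println(equalsIgnoringAccents("joao", "João")); // false
    }
}
```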

Solution 3 - Java

Java's Collator returns 0 when comparing "a" and "á" if you configure it to ignore diacritics by setting its strength to PRIMARY:

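import java.text.Collator;

public boolean isSame(String a, String b) {
	// Uses the collation rules of the JVM's default locale
	Collator insensitiveStringComparator = Collator.getInstance();
	// PRIMARY strength compares base letters only, ignoring accents (and case)
	insensitiveStringComparator.setStrength(Collator.PRIMARY);
	return insensitiveStringComparator.compare(a, b) == 0;
}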

isSame("a", "á") yields true
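Below is a self-contained variant that pins the locale instead of relying on the JVM default; Locale.ENGLISH and the class name are assumptions for illustration. It also shows a side effect worth knowing: PRIMARY strength ignores case as well as accents.

```java
import java.text.Collator;
import java.util.Locale;

public class PrimaryStrengthDemo {
    static boolean isSame(String a, String b) {
        // Locale.ENGLISH is an assumption; the original uses the default locale
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // Only primary (base-letter) differences count
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isSame("a", "á"));       // true
        System.out.println(isSame("Joao", "JOÃO")); // true: case is ignored too
        System.out.println(isSame("Joao", "John")); // false
    }
}
```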

Solution 4 - Java

Alternatively, use stripAccents from Apache Commons Lang's StringUtils if you want to compare or sort while ignoring accents:

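import org.apache.commons.lang3.StringUtils;

// Strips accents from both strings, then compares with String.compareTo (still case-sensitive)
public int compareStripAccent(String a, String b) {
    return StringUtils.stripAccents(a).compareTo(StringUtils.stripAccents(b));
}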

Solution 5 - Java

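public boolean insensitiveStringComparator(String a, String b) {
	java.text.Collator collate = java.text.Collator.getInstance();
	// Ignore accent and case differences; only base letters count
	collate.setStrength(java.text.Collator.PRIMARY);
	// Decompose input first, so precomposed and combining-mark forms compare alike
	collate.setDecomposition(java.text.Collator.CANONICAL_DECOMPOSITION);
	return collate.equals(a, b);
}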
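The CANONICAL_DECOMPOSITION setting matters when the same accented letter can arrive in two Unicode forms. This sketch (class name and Locale.ENGLISH are assumptions) compares a precomposed "ã" (U+00E3) with "a" followed by U+0303 COMBINING TILDE; the strings differ for String.equals but match under the Collator.

```java
import java.text.Collator;
import java.util.Locale;

public class DecompositionDemo {
    static boolean equalsIgnoringAccents(String a, String b) {
        // Locale.ENGLISH is an assumption; the original uses the default locale
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.equals(a, b);
    }

    public static void main(String[] args) {
        String precomposed = "Jo\u00E3o"; // U+00E3 LATIN SMALL LETTER A WITH TILDE
        String decomposed = "Joa\u0303o"; // "a" followed by U+0303 COMBINING TILDE
        System.out.println(precomposed.equals(decomposed));                 // false
        System.out.println(equalsIgnoringAccents(precomposed, decomposed)); // true
        System.out.println(equalsIgnoringAccents("Joao", decomposed));      // true
    }
}
```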

Solution 6 - Java

The problem with this sort of conversion is that there isn't always a clear-cut mapping from accented to non-accented characters. It depends on code pages, localization, etc. For example, is a given "a" with an accent equivalent to a plain "a"? That is not a problem for a human, but it is trickier for a computer.

AFAIK Java does not have a built-in conversion that can look up the current localization options and make this sort of conversion. You may need an external library that handles Unicode better, such as ICU (http://site.icu-project.org/).
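One concrete illustration of the missing mapping, assuming the Normalizer-based stripping described in Solution 2: "ø" (U+00F8) has no Unicode decomposition, so removing combining marks leaves it untouched and "Søren" never becomes "Soren". The helper name stripMarks is hypothetical.

```java
import java.text.Normalizer;

public class NoDecompositionDemo {
    // Decompose to NFD, then drop combining marks
    static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(stripMarks("João"));                  // Joao
        // "ø" has no decomposition, so there is no mark to strip
        System.out.println(stripMarks("Søren"));                 // Søren
        System.out.println(stripMarks("Søren").equals("Soren")); // false
    }
}
```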

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | framara | View Question on Stackoverflow
Solution 1 - Java | DaveJohnston | View Answer on Stackoverflow
Solution 2 - Java | Chris Jester-Young | View Answer on Stackoverflow
Solution 3 - Java | Benny Bottema | View Answer on Stackoverflow
Solution 4 - Java | Daniel | View Answer on Stackoverflow
Solution 5 - Java | Carlos Federico Lopez Spindola | View Answer on Stackoverflow
Solution 6 - Java | Uri | View Answer on Stackoverflow