How to check if a String contains only ASCII?

Java, String, Character Encoding, ASCII

Java Problem Overview


The call Character.isLetter(c) returns true if the character is a letter. But is there a way to quickly find if a String only contains the base characters of ASCII?

Java Solutions


Solution 1 - Java

From Guava 19.0 onward, you may use:

boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);

This calls matchesAllOf(someString) on the matcher returned by the factory method ascii(), rather than on the now-deprecated ASCII singleton.

Here ASCII means the full 7-bit range: the printable characters plus the non-printable characters below 0x20 (space), such as tab, line feed and carriage return, as well as BEL (0x07) and DEL (0x7F).

This check operates on chars (UTF-16 code units) rather than code points, even though the comments of earlier versions mention code points. Fortunately, a code point with a value of U+10000 or above is represented by two surrogate chars, both of which have values outside the ASCII range. So the method still succeeds in testing for ASCII, even for strings containing emoji.

For earlier Guava versions without the ascii() method you may write:

boolean isAscii = CharMatcher.ASCII.matchesAllOf(someString);

Solution 2 - Java

You can do it with java.nio.charset.Charset.

import java.nio.charset.Charset;

public class StringUtils {
  
  public static boolean isPureAscii(String v) {
    return Charset.forName("US-ASCII").newEncoder().canEncode(v);
    // or "ISO-8859-1" for ISO Latin 1
    // or StandardCharsets.US_ASCII with JDK1.7+
  }

  public static void main (String args[])
    throws Exception {

     String test = "Réal";
     System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
     test = "Real";
     System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
     
     /*
      * output :
      *   Réal isPureAscii() : false
      *   Real isPureAscii() : true
      */
  }
}

Detect non-ASCII character in a String

Solution 3 - Java

Here is another way not depending on a library but using a regex.

You can use this single line:

text.matches("\\A\\p{ASCII}*\\z")

Whole example program:

public class Main {
	public static void main(String[] args) {
		char nonAscii = 0x00FF;
		String asciiText = "Hello";
		String nonAsciiText = "Buy: " + nonAscii;
		System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));
		System.out.println(nonAsciiText.matches("\\A\\p{ASCII}*\\z"));
	}
}

Understanding the regex :

  • \\A : Beginning of input
  • \\p{ASCII} : Any ASCII character
  • * : Zero or more repetitions
  • \\z : End of input
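
Because String.matches always tests the entire input, the \\A and \\z anchors are optional in that call. A sketch of a reusable variant is shown below; it compiles the pattern once instead of on every call. The class name AsciiRegex and the method isAscii are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

class AsciiRegex {

    // Compiled once; Matcher.matches() requires the whole input to match,
    // so no explicit \A and \z anchors are needed.
    private static final Pattern ASCII_ONLY = Pattern.compile("\\p{ASCII}*");

    static boolean isAscii(String text) {
        return ASCII_ONLY.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello"));        // true
        System.out.println(isAscii("Buy: \u00FF"));  // false
        System.out.println(isAscii(""));             // true
    }
}
```

This may be preferable when the check runs often, since String.matches recompiles the regex each time it is called.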

Solution 4 - Java

Iterate through the string and make sure all the characters have a value less than 128.

Java Strings are conceptually encoded as UTF-16. In UTF-16, the ASCII character set is encoded as the values 0 to 127, and the encoding of any non-ASCII character (which may consist of more than one Java char) is guaranteed not to include the values 0 to 127.
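
The UTF-16 claim can be checked directly. The following minimal sketch applies the "less than 128" test per char and shows that both halves of an emoji's surrogate pair fall outside the ASCII range. The class name Utf16AsciiDemo and the method isAscii are illustrative, not taken from any library.

```java
class Utf16AsciiDemo {

    // Hypothetical helper: true if every UTF-16 code unit is below 128.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) >= 128) {
                return false; // stop at the first non-ASCII char
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+1F600 (grinning face) is stored as two chars, a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                              // 2
        System.out.println(Character.isHighSurrogate(emoji.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(emoji.charAt(1)));   // true
        System.out.println(isAscii("Hello\t"));                          // true
        System.out.println(isAscii("Hi " + emoji));                      // false
    }
}
```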

Solution 5 - Java

Or you can copy the code from the java.net.IDN class.

// Checks whether a string contains only US-ASCII code points.
private static boolean isAllASCII(String input) {
    boolean isASCII = true;
    for (int i = 0; i < input.length(); i++) {
        int c = input.charAt(i);
        if (c > 0x7F) {
            isASCII = false;
            break;
        }
    }
    return isASCII;
}

Solution 6 - Java

commons-lang3 from Apache contains valuable utility/convenience methods for all kinds of 'problems', including this one.

System.out.println(StringUtils.isAsciiPrintable("!@£$%^&!@£$%^"));
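
Note that isAsciiPrintable is stricter than a plain ASCII test: it accepts only the printable range (codes 32 to 126), so tabs and line feeds are rejected, and the sample string above yields false because of the £ sign. A plain-Java sketch of the two checks side by side follows. The class and method names are illustrative, and this is not the commons-lang3 implementation.

```java
class AsciiRanges {

    // Full 7-bit ASCII: codes 0 to 127, control characters included.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Printable ASCII only: space (32) through tilde (126).
    static boolean isPrintableAscii(String s) {
        return s.chars().allMatch(c -> c >= 32 && c < 127);
    }

    public static void main(String[] args) {
        System.out.println(isAscii("a\tb"));                  // true
        System.out.println(isPrintableAscii("a\tb"));         // false
        // \u00A3 is the pound sign, which is outside ASCII.
        System.out.println(isPrintableAscii("!@\u00A3$%^"));  // false
    }
}
```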

Solution 7 - Java

Try this:

static boolean isAscii(String string) {
  for (char c : string.toCharArray()) {
    if (c > 127) {
      return false; // first non-ASCII char found
    }
  }
  return true;
}

Solution 8 - Java

This returns true if the String contains only ASCII characters and false otherwise:

Charset.forName("US-ASCII").newEncoder().canEncode(str)

If you want to remove the non-ASCII characters, here is a snippet:

if (!Charset.forName("US-ASCII").newEncoder().canEncode(str)) {
    str = str.replaceAll("[^\\p{ASCII}]", "");
}
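
As a quick check of the replacement step on its own, here is a small sketch (the class name StripNonAscii and the method strip are illustrative). The canEncode guard is not needed for correctness, since replaceAll leaves a pure-ASCII string unchanged.

```java
class StripNonAscii {

    // Removes every char outside the ASCII range.
    static String strip(String str) {
        return str.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("R\u00E9al")); // Ral
        System.out.println(strip("Real"));      // Real
    }
}
```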

Solution 9 - Java

In Kotlin (note that this accepts only printable ASCII, from space to '~'):

fun String.isAsciiString() : Boolean =
    this.toCharArray().none { it < ' ' || it > '~' }

Solution 10 - Java

Iterate through the string and use charAt() to get each char. Then treat it as an int and check whether its Unicode value (Unicode is a superset of ASCII) is one you accept.

Break at the first one you don't like.
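
One way to sketch this is to express "which you like" as a caller-supplied predicate. The class CharScan and the method allCharsMatch are hypothetical names used only for illustration.

```java
import java.util.function.IntPredicate;

class CharScan {

    // Returns false at the first char whose value the predicate rejects.
    static boolean allCharsMatch(String s, IntPredicate accepted) {
        for (int i = 0; i < s.length(); i++) {
            int value = s.charAt(i); // widen the char to its numeric value
            if (!accepted.test(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        IntPredicate ascii = v -> v <= 0x7F;
        IntPredicate printableAscii = v -> v >= 0x20 && v <= 0x7E;
        System.out.println(allCharsMatch("Real", ascii));           // true
        System.out.println(allCharsMatch("R\u00E9al", ascii));      // false
        System.out.println(allCharsMatch("a\nb", printableAscii));  // false
    }
}
```

Passing the accepted range as a predicate keeps the loop itself unchanged whether the caller wants full ASCII, printable ASCII, or some other subset.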

Solution 11 - Java

private static boolean isASCII(String s) 
{
    for (int i = 0; i < s.length(); i++) 
        if (s.charAt(i) > 127) 
            return false;
    return true;
}

Solution 12 - Java

In Java 8 and above, one can use String#codePoints in conjunction with IntStream#allMatch.

boolean allASCII = str.codePoints().allMatch(c -> c < 128);

Solution 13 - Java

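It is possible. The example below first reinterprets the ISO-8859-1 bytes of each part as Big5, then checks whether the result can be encoded as US-ASCII.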

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingTest {

	static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII")
			.newEncoder();

	public static void main(String[] args) {

		String testStr = "¤EÀsÆW°ê»Ú®i¶T¤¤¤ß3¼Ó®i¶TÆU2~~KITEC 3/F Rotunda 2";
		String[] strArr = testStr.split("~~", 2);
		int count = 0;
		boolean encodeFlag = false;

		do {
			encodeFlag = asciiEncoderTest(strArr[count]);
			System.out.println(encodeFlag);
			count++;
		} while (count < strArr.length);
	}

	public static boolean asciiEncoderTest(String test) {
		boolean encodeFlag = false;
		try {
			encodeFlag = asciiEncoder.canEncode(new String(test
					.getBytes("ISO8859_1"), "BIG5"));
		} catch (UnsupportedEncodingException e) {
			e.printStackTrace();
		}
		return encodeFlag;
	}
}

Solution 14 - Java

// Returns true if c is an ASCII uppercase (A-Z) or lowercase (a-z) letter
public boolean isASCIILetter(char c) {
  return (c > 64 && c < 91) || (c > 96 && c < 123);
}

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | TambourineMan | View Question on Stack Overflow
Solution 1 - Java | ColinD | View Answer on Stack Overflow
Solution 2 - Java | RealHowTo | View Answer on Stack Overflow
Solution 3 - Java | Arne Deutsch | View Answer on Stack Overflow
Solution 4 - Java | JeremyP | View Answer on Stack Overflow
Solution 5 - Java | Zarathustra | View Answer on Stack Overflow
Solution 6 - Java | fjkjava | View Answer on Stack Overflow
Solution 7 - Java | pforyogurt | View Answer on Stack Overflow
Solution 8 - Java | mike oganyan | View Answer on Stack Overflow
Solution 9 - Java | steven smith | View Answer on Stack Overflow
Solution 10 - Java | Thorbjørn Ravn Andersen | View Answer on Stack Overflow
Solution 11 - Java | Phil | View Answer on Stack Overflow
Solution 12 - Java | Unmitigated | View Answer on Stack Overflow
Solution 13 - Java | user3614583 | View Answer on Stack Overflow
Solution 14 - Java | Lukas Greblikas | View Answer on Stack Overflow