How to check if a String contains another String in a case insensitive manner in Java?

Java Problem Overview


Say I have two strings,

String s1 = "AbBaCca";
String s2 = "bac";

I want to perform a check returning that s2 is contained within s1. I can do this with:

return s1.contains(s2);

I am pretty sure that contains() is case sensitive, but I can't confirm this from reading the documentation. If it is, then I suppose my best option would be something like:

return s1.toLowerCase().contains(s2.toLowerCase());

All this aside, is there another (possibly better) way to accomplish this without caring about case-sensitivity?

Java Solutions


Solution 1 - Java

Yes, contains is case sensitive. You can use java.util.regex.Pattern with the CASE_INSENSITIVE flag for case insensitive matching:

Pattern.compile(Pattern.quote(wantedStr), Pattern.CASE_INSENSITIVE).matcher(source).find();

EDIT: If s2 contains regex special characters (of which there are many) it's important to quote it first. I've corrected my answer since it is the first one people will see, but vote up Matt Quail's since he pointed this out.
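
A minimal runnable sketch of the one-liner above, wrapped in a helper. The class and method names (RegexContains, containsIgnoreCase, containsIgnoreCaseUnicode) are illustrative and not from the original answer. It also shows one detail of java.util.regex: CASE_INSENSITIVE on its own only folds US-ASCII letters, so non-ASCII text needs UNICODE_CASE as well.

```java
import java.util.regex.Pattern;

class RegexContains {
    // Hypothetical helper: literal, case-insensitive substring test (ASCII case folding only).
    static boolean containsIgnoreCase(String source, String wanted) {
        return Pattern.compile(Pattern.quote(wanted), Pattern.CASE_INSENSITIVE)
                .matcher(source)
                .find();
    }

    // Same idea, with UNICODE_CASE added so non-ASCII letters are folded too.
    static boolean containsIgnoreCaseUnicode(String source, String wanted) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile(Pattern.quote(wanted), flags).matcher(source).find();
    }

    public static void main(String[] args) {
        System.out.println("AbBaCca".contains("bac"));            // false
        System.out.println(containsIgnoreCase("AbBaCca", "bac")); // true
        // A-umlaut + "PFEL" versus a-umlaut + "pfel", written with escapes
        System.out.println(containsIgnoreCase("\u00c4PFEL", "\u00e4pfel"));        // false
        System.out.println(containsIgnoreCaseUnicode("\u00c4PFEL", "\u00e4pfel")); // true
    }
}
```

If the same search string is used many times, compiling the Pattern once and reusing it avoids repeated compilation.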

Solution 2 - Java

One problem with the answer by Dave L. is when s2 contains regex markup such as \d, etc.

You want to call Pattern.quote() on s2:

Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE).matcher(s1).find();
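
To make the difference concrete, here is a small sketch comparing the quoted and unquoted forms. The class QuoteDemo and its two methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class QuoteDemo {
    // s2 is interpreted as a regular expression.
    static boolean unquoted(String s1, String s2) {
        return Pattern.compile(s2, Pattern.CASE_INSENSITIVE).matcher(s1).find();
    }

    // s2 is matched literally, character by character.
    static boolean quoted(String s1, String s2) {
        return Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE).matcher(s1).find();
    }

    public static void main(String[] args) {
        System.out.println(unquoted("abc", "a.c")); // true: '.' matches any character
        System.out.println(quoted("abc", "a.c"));   // false: there is no literal "a.c"
        System.out.println(quoted("A.C", "a.c"));   // true: literal match, case ignored
        try {
            unquoted("x", "(");                     // "(" alone is not a valid regex
        } catch (PatternSyntaxException e) {
            System.out.println("invalid pattern");
        }
    }
}
```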

Solution 3 - Java

You can use

org.apache.commons.lang3.StringUtils.containsIgnoreCase("AbBaCca", "bac");

The Apache Commons library is very useful for this sort of thing. This particular method may also be preferable to regular expressions, which tend to be comparatively expensive in terms of performance.

Solution 4 - Java

A Faster Implementation: Utilizing String.regionMatches()

Using regexp can be relatively slow. That doesn't matter if you only need a single check, but if you have an array or a collection of thousands or hundreds of thousands of strings, things can get pretty slow.

The solution presented below uses neither regular expressions nor toLowerCase() (which is also slow because it creates new strings and throws them away right after the check).

The solution builds on the String.regionMatches() method, which seems to be little known. It checks whether two String regions match, and importantly it has an overload with a handy ignoreCase parameter.

public static boolean containsIgnoreCase(String src, String what) {
	final int length = what.length();
	if (length == 0)
		return true; // Empty string is contained
		
	final char firstLo = Character.toLowerCase(what.charAt(0));
	final char firstUp = Character.toUpperCase(what.charAt(0));
	
	for (int i = src.length() - length; i >= 0; i--) {
		// Quick check before calling the more expensive regionMatches() method:
		final char ch = src.charAt(i);
		if (ch != firstLo && ch != firstUp)
			continue;
		
		if (src.regionMatches(true, i, what, 0, length))
			return true;
	}
	
	return false;
}
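
For readers who have not used regionMatches() before, the following sketch isolates the call that the method above relies on. RegionMatchesDemo and matchesAt are hypothetical names chosen for this illustration.

```java
class RegionMatchesDemo {
    // Hypothetical wrapper: does `what` occur in `src` starting exactly at `offset`?
    static boolean matchesAt(String src, int offset, String what, boolean ignoreCase) {
        return src.regionMatches(ignoreCase, offset, what, 0, what.length());
    }

    public static void main(String[] args) {
        String src = "Hi, I am Adam";
        System.out.println(matchesAt(src, 4, "i am", true));  // true
        System.out.println(matchesAt(src, 4, "i am", false)); // false: 'I' vs 'i'
        System.out.println(matchesAt(src, 10, "i am", true)); // false: region runs past the end
    }
}
```

Note that an out-of-range region makes regionMatches() return false instead of throwing, and no intermediate String is allocated for the comparison.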

Speed Analysis

This speed analysis is not meant to be rocket science, just a rough picture of how fast the different methods are.

I compare 5 methods.

  1. Our containsIgnoreCase() method.
  2. Converting both strings to lower-case and calling String.contains().
  3. Converting the source string to lower-case and calling String.contains() with the pre-cached, lower-cased substring. This solution is already less flexible because it tests a predefined substring.
  4. Using a regular expression (the accepted answer, Pattern.compile().matcher().find()...)
  5. Using a regular expression but with a pre-created and cached Pattern. This solution is already less flexible because it tests a predefined substring.

Results (by calling the method 10 million times):

  1. Our method: 670 ms
  2. 2x toLowerCase() and contains(): 2829 ms
  3. 1x toLowerCase() and contains() with cached substring: 2446 ms
  4. Regexp: 7180 ms
  5. Regexp with cached Pattern: 1845 ms

Results in a table:

                                            SPEEDUP VS.      SLOWDOWN VS.
 METHOD                          EXEC TIME    SLOWEST         FASTEST (#1)
------------------------------------------------------------------------------
 1. Using regionMatches()          670 ms       10.7x            1.0x
 2. 2x lowercase+contains         2829 ms        2.5x            4.2x
 3. 1x lowercase+contains cache   2446 ms        2.9x            3.7x
 4. Regexp                        7180 ms        1.0x           10.7x
 5. Regexp+cached pattern         1845 ms        3.9x            2.8x

Our method is about 4x faster than lowercasing and using contains(), about 10x faster than using regular expressions, and still about 3x faster than a pre-cached Pattern (which also gives up the flexibility of checking for an arbitrary substring).


Analysis Test Code

If you're interested in how the analysis was performed, here is the complete runnable application:

import java.util.regex.Pattern;

public class ContainsAnalysis {
	
	// Case 1 utilizing String.regionMatches()
	public static boolean containsIgnoreCase(String src, String what) {
		final int length = what.length();
		if (length == 0)
			return true; // Empty string is contained
			
		final char firstLo = Character.toLowerCase(what.charAt(0));
		final char firstUp = Character.toUpperCase(what.charAt(0));
		
		for (int i = src.length() - length; i >= 0; i--) {
			// Quick check before calling the more expensive regionMatches()
			// method:
			final char ch = src.charAt(i);
			if (ch != firstLo && ch != firstUp)
				continue;
			
			if (src.regionMatches(true, i, what, 0, length))
				return true;
		}
		
		return false;
	}
	
	// Case 2 with 2x toLowerCase() and contains()
	public static boolean containsConverting(String src, String what) {
		return src.toLowerCase().contains(what.toLowerCase());
	}
	
	// The cached substring for case 3
	private static final String S = "i am".toLowerCase();
	
	// Case 3 with pre-cached substring and 1x toLowerCase() and contains()
	public static boolean containsConverting(String src) {
		return src.toLowerCase().contains(S);
	}
	
	// Case 4 with regexp
	public static boolean containsIgnoreCaseRegexp(String src, String what) {
		return Pattern.compile(Pattern.quote(what), Pattern.CASE_INSENSITIVE)
					.matcher(src).find();
	}
	
	// The cached pattern for case 5
	private static final Pattern P = Pattern.compile(
			Pattern.quote("i am"), Pattern.CASE_INSENSITIVE);
	
	// Case 5 with pre-cached Pattern
	public static boolean containsIgnoreCaseRegexp(String src) {
		return P.matcher(src).find();
	}
	
	// Main method: performs speed analysis on different contains methods
	// (case ignored)
	public static void main(String[] args) throws Exception {
		final String src = "Hi, I am Adam";
		final String what = "i am";
		
		long start, end;
		final int N = 10_000_000;
		
		start = System.nanoTime();
		for (int i = 0; i < N; i++)
			containsIgnoreCase(src, what);
		end = System.nanoTime();
		System.out.println("Case 1 took " + ((end - start) / 1000000) + "ms");
		
		start = System.nanoTime();
		for (int i = 0; i < N; i++)
			containsConverting(src, what);
		end = System.nanoTime();
		System.out.println("Case 2 took " + ((end - start) / 1000000) + "ms");
		
		start = System.nanoTime();
		for (int i = 0; i < N; i++)
			containsConverting(src);
		end = System.nanoTime();
		System.out.println("Case 3 took " + ((end - start) / 1000000) + "ms");
		
		start = System.nanoTime();
		for (int i = 0; i < N; i++)
			containsIgnoreCaseRegexp(src, what);
		end = System.nanoTime();
		System.out.println("Case 4 took " + ((end - start) / 1000000) + "ms");
		
		start = System.nanoTime();
		for (int i = 0; i < N; i++)
			containsIgnoreCaseRegexp(src);
		end = System.nanoTime();
		System.out.println("Case 5 took " + ((end - start) / 1000000) + "ms");
	}
	
}

Solution 5 - Java

A simpler way of doing this (without worrying about pattern matching) would be converting both Strings to lowercase:

String foobar = "fooBar";
String bar = "FOO";
if (foobar.toLowerCase().contains(bar.toLowerCase())) {
    System.out.println("It's a match!");
}
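
One caveat with this approach is that the no-argument toLowerCase() uses the JVM's default locale. The sketch below, with the hypothetical names LocaleLowercase, containsIgnoreCase and containsWithLocale, shows how a Turkish locale changes the result and how passing Locale.ROOT keeps the behaviour independent of the machine's settings.

```java
import java.util.Locale;

class LocaleLowercase {
    // Hypothetical helper: lowercase with a fixed, locale-neutral rule set.
    static boolean containsIgnoreCase(String s1, String s2) {
        return s1.toLowerCase(Locale.ROOT).contains(s2.toLowerCase(Locale.ROOT));
    }

    // Same check, but lowercasing under an explicitly chosen locale.
    static boolean containsWithLocale(String s1, String s2, Locale locale) {
        return s1.toLowerCase(locale).contains(s2.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // In Turkish, 'I' lowercases to dotless i (U+0131), while 'i' stays 'i'.
        System.out.println(containsWithLocale("TITLE", "title", turkish)); // false
        System.out.println(containsIgnoreCase("TITLE", "title"));          // true
    }
}
```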

Solution 6 - Java

Yes, this is achievable:

String s1 = "abBaCca";
String s2 = "bac";

// Lowercase a copy of s1; s1 itself is left intact for printing if needed.
String s1Lower = s1.toLowerCase();

String trueStatement = "FALSE!";
// s2 is lowercased as well so the check does not depend on its original case.
if (s1Lower.contains(s2.toLowerCase())) {

    // This branch is taken for the strings above.
    trueStatement = "TRUE!";
}

return trueStatement;

This code will return the String "TRUE!" as it found that your characters were contained.

Solution 7 - Java

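You can use regular expressions. Pass s2 through Pattern.quote() (from java.util.regex.Pattern) so that any regex metacharacters in it are matched literally:

boolean found = s1.matches("(?i).*" + Pattern.quote(s2) + ".*");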

Solution 8 - Java

Here are some Unicode-friendly versions you can write if you pull in ICU4J. "Ignore case" is arguably a questionable choice for the method names: primary-strength comparisons do ignore case, but the specifics are described as locale-dependent. Hopefully they are locale-dependent in a way the user would expect.

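import com.ibm.icu.text.Collator;
import com.ibm.icu.text.StringSearch;

public static boolean containsIgnoreCase(String haystack, String needle) {
    return indexOfIgnoreCase(haystack, needle) >= 0;
}

public static int indexOfIgnoreCase(String haystack, String needle) {
    StringSearch stringSearch = new StringSearch(needle, haystack);
    stringSearch.getCollator().setStrength(Collator.PRIMARY);
    // The ICU4J docs ask for reset() after changing the collator so the change takes effect.
    stringSearch.reset();
    return stringSearch.first();
}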

Solution 9 - Java

I did a test finding a case-insensitive match of a string. I have a Vector of 150,000 objects all with a String as one field and wanted to find the subset which matched a string. I tried three methods:

  1. Convert all to lower case

     for (SongInformation song: songs) {
         if (song.artist.toLowerCase().indexOf(pattern.toLowerCase()) > -1) {
                 ...
         }
     }
    
  2. Use the String matches() method

     for (SongInformation song: songs) {
         if (song.artist.matches("(?i).*" + pattern + ".*")) {
         ...
         }
     }
    
  3. Use a precompiled Pattern and reuse a single Matcher

     Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
     Matcher m = p.matcher("");
     for (SongInformation song: songs) {
         m.reset(song.artist);
         if (m.find()) {
         ...
         }
     }
    

Timing results are:

  • No attempted match: 20 msecs

  • To lower match: 182 msecs

  • String matches: 278 msecs

  • Regular expression: 65 msecs

The regular expression looks to be the fastest for this use case.
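
The third variant can be sketched as a self-contained filter. This is an illustration only: ArtistFilter and filter are hypothetical names, a plain list of strings stands in for the SongInformation objects, and the needle is passed through Pattern.quote(), which the test above did not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArtistFilter {
    // Compile once, then reuse one Matcher for every candidate string.
    static List<String> filter(List<String> artists, String needle) {
        Pattern p = Pattern.compile(Pattern.quote(needle), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher("");
        List<String> hits = new ArrayList<>();
        for (String artist : artists) {
            // reset(CharSequence) points the existing Matcher at new input.
            if (m.reset(artist).find()) {
                hits.add(artist);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> artists = Arrays.asList("The Beatles", "Beastie Boys", "Queen");
        System.out.println(filter(artists, "bea")); // [The Beatles, Beastie Boys]
    }
}
```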

Solution 10 - Java

There is a simple, concise way, using the inline regex flag (?i) for case insensitivity:

 String s1 = "hello abc efg";
 String s2 = "ABC";
 s1.matches(".*(?i)"+s2+".*");

/*
 * .*   matches any sequence of characters other than line terminators
 * (?i) enables case-insensitive matching for everything after it, here s2
 */
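
Two edge cases of the matches() approach are worth checking before relying on it: input that contains line breaks, and search strings that contain regex metacharacters. The sketch below uses hypothetical names (MatchesPitfall, viaMatches, viaMatchesSafe) and is one possible way to handle both, not the original answer's code.

```java
import java.util.regex.Pattern;

class MatchesPitfall {
    // The pattern as written in the answer above.
    static boolean viaMatches(String s1, String s2) {
        return s1.matches(".*(?i)" + s2 + ".*");
    }

    // Variant: (?s) lets '.' match line terminators, Pattern.quote makes s2 literal.
    static boolean viaMatchesSafe(String s1, String s2) {
        return s1.matches("(?is).*" + Pattern.quote(s2) + ".*");
    }

    public static void main(String[] args) {
        System.out.println(viaMatches("hello abc efg", "ABC"));             // true
        System.out.println(viaMatches("first line\nhello abc", "ABC"));     // false
        System.out.println(viaMatchesSafe("first line\nhello abc", "ABC")); // true
        System.out.println(viaMatches("a+b", "A+B"));                       // false
        System.out.println(viaMatchesSafe("a+b", "A+B"));                   // true
    }
}
```

The second call fails because matches() must consume the whole input and '.' stops at the line break.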

Solution 11 - Java

"AbCd".toLowerCase().contains("abcD".toLowerCase())

Solution 12 - Java

I'm not sure what your main question is here, but yes, .contains is case sensitive.

Solution 13 - Java

String container = " Case SeNsitive ";
String sub = "sen";
if (rcontains(container, sub)) {
    System.out.println("no case");
}

public static boolean rcontains(String container, String sub) {
    // Slide a window of sub.length() characters over container.
    for (int a = 0; a <= container.length() - sub.length(); a++) {
        if (sub.equalsIgnoreCase(container.substring(a, a + sub.length()))) {
            return true; // stop at the first match
        }
    }
    return false;
}

Basically, it is a method that takes two strings. It is meant to be a case-insensitive version of contains(): it tells you whether one string is contained in the other, ignoring case.

The method takes the string sub and compares it against every substring of container that has the same length as sub. The for loop slides a window of that length over the container string.

Each iteration uses equalsIgnoreCase to check whether the current window equals sub.

Solution 14 - Java

If you have to search for an ASCII string in another ASCII string, such as a URL, you may find my solution faster. I tested icza's method and mine for speed, and here are the results:

  • Case 1 took 2788 ms - regionMatches
  • Case 2 took 1520 ms - mine

The code:

public static String lowerCaseAscii(String s) {
    if (s == null)
        return null;

    int len = s.length();
    char[] buf = new char[len];
    s.getChars(0, len, buf, 0);
    for (int i=0; i<len; i++) {
        if (buf[i] >= 'A' && buf[i] <= 'Z')
            buf[i] += 0x20;
    }

    return new String(buf);
}

public static boolean containsIgnoreCaseAscii(String str, String searchStr) {
    return StringUtils.contains(lowerCaseAscii(str), lowerCaseAscii(searchStr));
}

Solution 15 - Java

import java.text.Normalizer;

import org.apache.commons.lang3.StringUtils;

public class ContainsIgnoreCase {

    public static void main(String[] args) {

        String in = "   Annulée ";
        String key = "annulee";

        // 100% java
        if (Normalizer.normalize(in, Normalizer.Form.NFD).replaceAll("[\\p{InCombiningDiacriticalMarks}]", "").toLowerCase().contains(key)) {
            System.out.println("OK");
        } else {
            System.out.println("KO");
        }

        // use commons.lang lib
        if (StringUtils.containsIgnoreCase(Normalizer.normalize(in, Normalizer.Form.NFD).replaceAll("[\\p{InCombiningDiacriticalMarks}]", ""), key)) {
            System.out.println("OK");
        } else {
            System.out.println("KO");
        }

    }

}
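
The accent-stripping step used in both branches above can be looked at on its own. In this sketch, AccentStripper and stripAccents are hypothetical names: NFD normalization splits an accented letter into a base letter plus a combining mark, and the regex then removes the combining marks.

```java
import java.text.Normalizer;

class AccentStripper {
    // Decompose accented characters (NFD), then drop the combining marks.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "Annul\u00e9e" is "Annulee" with an acute accent on the first 'e' of "ee"
        String in = "   Annul\u00e9e ";
        System.out.println(stripAccents(in).trim());                            // Annulee
        System.out.println(stripAccents(in).toLowerCase().contains("annulee")); // true
    }
}
```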

Solution 16 - Java

We can use a Java 8 stream with anyMatch and contains:

import java.util.Arrays;

public class Test2 {
    public static void main(String[] args) {

        String a = "Gina Gini Protijayi Soudipta";
        String b = "Gini";

        System.out.println(WordPresentOrNot(a, b));
    }// main

    private static boolean WordPresentOrNot(String a, String b) {
        // contains() is case sensitive, so both strings are lowercased first.
        String needle = b.toLowerCase();
        // Check whether any space-separated word of a contains the needle.
        return Arrays.stream(a.toLowerCase().split(" ")).anyMatch(word -> word.contains(needle));
    }

}

Solution 17 - Java

Alternatively, take the simple approach: convert both the string and the substring to the same case, then use the contains method.

Solution 18 - Java

String x="abCd";
System.out.println(Pattern.compile("c",Pattern.CASE_INSENSITIVE).matcher(x).find());

Solution 19 - Java

You could simply do something like this:

String s1 = "AbBaCca";
String s2 = "bac";
String toLower = s1.toLowerCase();
return toLower.contains(s2.toLowerCase());

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author       Original Content on Stackoverflow
Question            Aaron                 View Question on Stackoverflow
Solution 1 - Java   Dave L.               View Answer on Stackoverflow
Solution 2 - Java   Matt Quail            View Answer on Stackoverflow
Solution 3 - Java   muhamadto             View Answer on Stackoverflow
Solution 4 - Java   icza                  View Answer on Stackoverflow
Solution 5 - Java   Phil                  View Answer on Stackoverflow
Solution 6 - Java   Bilbo Baggins         View Answer on Stackoverflow
Solution 7 - Java   Shiv                  View Answer on Stackoverflow
Solution 8 - Java   Hakanai               View Answer on Stackoverflow
Solution 9 - Java   Jan Newmarch          View Answer on Stackoverflow
Solution 10 - Java  Mr.Q                  View Answer on Stackoverflow
Solution 11 - Java  Takhir Atamuratov     View Answer on Stackoverflow
Solution 12 - Java  SCdF                  View Answer on Stackoverflow
Solution 13 - Java  seth                  View Answer on Stackoverflow
Solution 14 - Java  Revertron             View Answer on Stackoverflow
Solution 15 - Java  Stéphane GRILLON      View Answer on Stackoverflow
Solution 16 - Java  Soudipta Dutta        View Answer on Stackoverflow
Solution 17 - Java  Syed Salman Hassan    View Answer on Stackoverflow
Solution 18 - Java  IVY                   View Answer on Stackoverflow
Solution 19 - Java  Erick Kondela         View Answer on Stackoverflow