Using Java to find substring of a bigger string using Regular Expression

Java, Regex, String

Java Problem Overview


If I have a string like this:

FOO[BAR]

I need a generic way to extract "BAR" from the string, so that whatever text appears between the square brackets is returned.

e.g.

FOO[DOG] = DOG
FOO[CAT] = CAT

Java Solutions


Solution 1 - Java

You should be able to use non-greedy (reluctant) quantifiers, specifically *?. You're probably going to want the following:

Pattern MY_PATTERN = Pattern.compile("\\[(.*?)\\]");

This will give you a pattern that will match your string and put the text within the square brackets in the first group. Have a look at the Pattern API Documentation for more information.

To extract the string, you could use something like the following:

Matcher m = MY_PATTERN.matcher("FOO[BAR]");
while (m.find()) {
    String s = m.group(1);
    // s now contains "BAR"
}
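To show why the reluctant *? matters, this sketch runs the pattern above next to its greedy counterpart \[(.*)\] on an input that contains two bracketed values. The class GreedyVsReluctant, its helper allGroups, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Reluctant: .*? expands one character at a time, so each match stops at the first ']'
    static final Pattern RELUCTANT = Pattern.compile("\\[(.*?)\\]");
    // Greedy: .* first grabs the rest of the input, then backtracks to the LAST ']'
    static final Pattern GREEDY = Pattern.compile("\\[(.*)\\]");

    // Collects capturing group 1 of every match of pattern in input.
    static List<String> allGroups(Pattern pattern, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "FOO[BAR] BAZ[QUX]";
        System.out.println(allGroups(RELUCTANT, input)); // [BAR, QUX]
        System.out.println(allGroups(GREEDY, input));    // [BAR] BAZ[QUX]
    }
}
```

With the greedy version, .* swallows everything up to the last ] in the input, so the two values come back as one string.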

Solution 2 - Java

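The non-regex way:

String input = "FOO[BAR]", extracted;
// +1 skips the '[' itself; the end index is exclusive, so ']' is left out
extracted = input.substring(input.indexOf("[") + 1, input.indexOf("]"));

Alternatively, for slightly better performance/memory usage (thanks Hosam), use the char overloads. Note that lastIndexOf picks the last ']' in the string, so nested brackets are kept:

String input = "FOO[BAR]", extracted;
extracted = input.substring(input.indexOf('[') + 1, input.lastIndexOf(']'));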

Solution 3 - Java

This is a working example:

RegexpExample.java

package org.regexp.replace;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExample
{
	public static void main(String[] args)
	{
		String string = "var1[value1], var2[value2], var3[value3]";
		Pattern pattern = Pattern.compile("(\\[)(.*?)(\\])");
		Matcher matcher = pattern.matcher(string);
		
		List<String> listMatches = new ArrayList<String>();
		
		while(matcher.find())
		{
			listMatches.add(matcher.group(2));
		}

		for(String s : listMatches)
		{
			System.out.println(s);
		}
	}
}

It displays:

value1
value2
value3
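On Java 9 or later, the same loop can be written with Matcher.results(), which returns a stream of MatchResult objects. This is a minimal sketch (StreamMatches and values are hypothetical names); it also drops the two extra groups around the brackets, so the content is group 1 instead of group 2.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StreamMatches {
    // One capturing group is enough; the brackets themselves need no group.
    static final Pattern BRACKETED = Pattern.compile("\\[(.*?)\\]");

    // Returns the content of every [...] in input, in order of appearance.
    static List<String> values(String input) {
        return BRACKETED.matcher(input)
                .results()                  // Stream<MatchResult>, available since Java 9
                .map(r -> r.group(1))       // keep only the captured content
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints value1, value2 and value3 on separate lines
        values("var1[value1], var2[value2], var3[value3]").forEach(System.out::println);
    }
}
```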

Solution 4 - Java

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    // Returns group 1 of the first match of p in s, or "" if there is no match.
    public static String get_match(String s, String p) {
        Matcher m = Pattern.compile(p).matcher(s);
        return m.find() ? m.group(1) : "";
    }

    // Returns group 1 of every match of p in s.
    public static List<String> get_matches(String s, String p) {
        List<String> matches = new ArrayList<String>();
        Matcher m = Pattern.compile(p).matcher(s);
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(get_match("FOO[BAR]", "\\[(.*?)\\]"));            // BAR
        System.out.println(get_matches("FOO[BAR] FOO[CAT]", "\\[(.*?)\\]"));  // [BAR, CAT]
    }
}
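get_match compiles its pattern on every call and signals "no match" with an empty string, which looks the same as empty brackets. A possible variation, sketched here with hypothetical names (NamedGroupMatch, firstContent), precompiles the pattern once, reads the content through a named group (supported since Java 7), and returns an Optional instead.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupMatch {
    // Compiled once and reused; (?<content>...) is a named capturing group.
    static final Pattern BRACKETED = Pattern.compile("\\[(?<content>.*?)\\]");

    // Returns the content of the first [...] in s, or Optional.empty() if there is none.
    static Optional<String> firstContent(String s) {
        Matcher m = BRACKETED.matcher(s);
        return m.find() ? Optional.of(m.group("content")) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstContent("FOO[BAR]")); // Optional[BAR]
        System.out.println(firstContent("FOO[]"));    // Optional[]
        System.out.println(firstContent("FOO"));      // Optional.empty
    }
}
```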

Solution 5 - Java

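If you simply need to get whatever is between [], then you can use \[([^\]]*)\] like this:

String str = "FOO[BAR]";
String result = null;
Pattern regex = Pattern.compile("\\[([^\\]]*)\\]");
Matcher m = regex.matcher(str);
if (m.find()) {
    result = m.group(1); // "BAR"; m.group() would return the whole match, "[BAR]"
}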

If you need it to be of the form identifier + [ + content + ], then you can limit extraction to cases where the identifier is alphanumeric:

[a-zA-Z][a-zA-Z0-9_]*\s*\[([^\]]*)\]

This will validate things like Foo [Bar] or myDevice_123["input"], for instance.
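In Java, the identifier pattern can capture the identifier as well as the content. A minimal sketch, assuming you want both parts (the class IdentifierBracket and its parse method are illustrative names, not part of the original answer):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentifierBracket {
    // Group 1: identifier (a letter, then letters, digits or underscores)
    // Group 2: everything up to the first ']' after the '['
    static final Pattern IDENT_BRACKET =
            Pattern.compile("([a-zA-Z][a-zA-Z0-9_]*)\\s*\\[([^\\]]*)\\]");

    // Returns {identifier, content} for the first match, or null if nothing matches.
    static String[] parse(String input) {
        Matcher m = IDENT_BRACKET.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Foo [Bar]", "myDevice_123[\"input\"]", "[orphan]"}) {
            String[] parts = parse(s);
            System.out.println(s + " -> " + (parts == null ? "no match" : parts[0] + " / " + parts[1]));
        }
    }
}
```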

Main issue

The main problem is when you want to extract the content of something like this:

FOO[BAR[CAT[123]]+DOG[FOO]]

The regex won't work: it will return BAR[CAT[123 and FOO.
If we change the regex to \[(.*)\], then we're OK, but if you're trying to extract the content from more complex things like:

FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]

None of the Regexes will work.
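Both failure modes can be reproduced with a short sketch (NestedBracketsDemo and allGroups are illustrative names) that runs \[([^\]]*)\] and \[(.*)\] over the two sample inputs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedBracketsDemo {
    static final Pattern NEGATED = Pattern.compile("\\[([^\\]]*)\\]"); // stops at the first ']'
    static final Pattern GREEDY = Pattern.compile("\\[(.*)\\]");        // runs to the last ']'

    // Collects capturing group 1 of every match of p in input.
    static List<String> allGroups(Pattern p, String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String nested = "FOO[BAR[CAT[123]]+DOG[FOO]]";
        String twoGroups = nested + " = myOtherFoo[BAR[5]]";
        System.out.println(allGroups(NEGATED, nested));   // [BAR[CAT[123, FOO]
        System.out.println(allGroups(GREEDY, nested));    // [BAR[CAT[123]]+DOG[FOO]]
        System.out.println(allGroups(GREEDY, twoGroups)); // [BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]
    }
}
```

The last line shows the greedy pattern merging both top-level groups into a single result.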

The most accurate regex for extracting the proper content in all cases would need to balance [] pairs and return their content, and java.util.regex has no direct support for that (it lacks the recursive patterns of PCRE and the balancing groups of .NET).

A simpler solution

If your problem is getting complex and the content of the [] is arbitrary, you could instead balance the pairs of [] and extract the string using plain old code rather than a regex:

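String input = "FOO[BAR[CAT[123]]+DOG[FOO]]";
StringBuilder result = new StringBuilder();
int brackets = 0; // current nesting depth
int start = input.indexOf('[');
if (start >= 0) {
    for (int i = start; i < input.length(); i++) {
        char c = input.charAt(i);
        if (c == '[') {
            brackets++;
            if (brackets == 1) {
                continue; // don't include the outermost '['
            }
        } else if (c == ']') {
            brackets--;
            if (brackets == 0) {
                break; // matching ']' of the outermost '[' found
            }
        }
        result.append(c);
    }
}
// result.toString() is now "BAR[CAT[123]]+DOG[FOO]"

This walks the characters one at a time and tracks the bracket depth, so it extracts the content of the outermost [] however complex it is. If the brackets may be unbalanced, add error checking (as written, an unclosed '[' yields everything up to the end of the string).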
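The same depth-counting idea extends to inputs with several top-level groups, such as the myOtherFoo example that none of the regexes handled. This is a sketch under stated assumptions: TopLevelBrackets and topLevelGroups are hypothetical names, a stray ] is ignored, and an unclosed [ is dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelBrackets {
    // Returns the content of every outermost [...] pair, keeping nested brackets intact.
    // A ']' with no open '[' is ignored; a '[' that is never closed is dropped.
    static List<String> topLevelGroups(String input) {
        List<String> groups = new ArrayList<>();
        int depth = 0;
        int start = -1; // index just after the current outermost '['
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '[') {
                if (depth == 0) {
                    start = i + 1;
                }
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
                if (depth == 0) {
                    groups.add(input.substring(start, i));
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints [BAR[CAT[123]]+DOG[FOO], BAR[5]]
        System.out.println(topLevelGroups("FOO[BAR[CAT[123]]+DOG[FOO]] = myOtherFoo[BAR[5]]"));
    }
}
```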

Solution 6 - Java

I think your regular expression would look like:

/FOO\[(.+)\]/

This assumes that FOO is going to be constant.

So, to put this in Java:

Pattern p = Pattern.compile("FOO\\[(.+)\\]");
Matcher m = p.matcher(inputLine);

Solution 7 - Java

String input = "FOO[BAR]";
String result = input.substring(input.indexOf("[")+1,input.lastIndexOf("]"));

This will return the value between the first '[' and the last ']':

Foo[Bar] => Bar

Foo[Bar[test]] => Bar[test]

Note: You should add error checking if the input string is not well formed.
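One way to add that error checking, sketched with hypothetical names (OutermostBrackets, outermost): return an Optional and bail out when a bracket is missing or out of order. Without the check, substring throws StringIndexOutOfBoundsException when indexOf or lastIndexOf returns -1.

```java
import java.util.Optional;

public class OutermostBrackets {
    // Text between the first '[' and the last ']', or Optional.empty() when the
    // input lacks either bracket or the last ']' comes before the first '['.
    static Optional<String> outermost(String input) {
        int open = input.indexOf('[');
        int close = input.lastIndexOf(']');
        if (open < 0 || close < open) { // close < open also covers close == -1
            return Optional.empty();
        }
        return Optional.of(input.substring(open + 1, close));
    }

    public static void main(String[] args) {
        System.out.println(outermost("Foo[Bar]"));       // Optional[Bar]
        System.out.println(outermost("Foo[Bar[test]]")); // Optional[Bar[test]]
        System.out.println(outermost("Foo[Bar"));        // Optional.empty
    }
}
```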

Solution 8 - Java

The same pattern also works if you want to parse a string such as the one returned by mYearInDB.toString(), e.g. "[2013]", which gives 2013:

// mYearInDB.toString() is expected to return something like "[2013]"
String extractedYear = null;
Matcher n = MY_PATTERN.matcher(mYearInDB.toString());
while (n.find()) {
    extractedYear = n.group(1); // "2013" for "[2013]"
}
System.out.println("Extracted output is : " + extractedYear);

Solution 9 - Java

Assuming that no other closing square bracket is allowed within: /FOO\[([^\]]*)\]/

Solution 10 - Java

I'd match as many non-] characters as possible between [ and ]. The brackets need to be escaped with backslashes (and in Java, those backslashes need to be escaped again), and "non-]" is a negated character class, which is itself written inside [ and ] (i.e. [^\\]]). The result:

FOO\\[([^\\]]+)\\]
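The only difference between this pattern and the one in Solution 9 is + versus *, which decides whether empty brackets match. A small sketch (NegatedClassDemo and first are illustrative names) compares the two:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // Solution 9: zero or more non-']' characters, so empty brackets match
    static final Pattern STAR = Pattern.compile("FOO\\[([^\\]]*)\\]");
    // Solution 10: one or more non-']' characters, so empty brackets do not match
    static final Pattern PLUS = Pattern.compile("FOO\\[([^\\]]+)\\]");

    // Returns group 1 of the first match, or null if there is none.
    static String first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("'" + first(STAR, "FOO[]") + "'"); // ''
        System.out.println(first(PLUS, "FOO[]"));             // null
        System.out.println(first(PLUS, "FOO[A] FOO[B]"));     // A
    }
}
```

Because the class excludes ], both variants stop at the first closing bracket, so neither needs a reluctant quantifier.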

Solution 11 - Java

"FOO[DOG]".replaceAll("^.*?\\[|\\].*", "");

This returns only the text inside the square brackets by removing everything outside them.

You can test this java sample code online: http://tpcg.io/wZoFu0

You can test this regex from here: https://regex101.com/r/oUAzsS/1
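This replaceAll approach has two edge cases worth knowing: only the first bracket pair survives, and an input with no brackets at all comes back unchanged instead of empty. A quick sketch (the wrapper ReplaceAllCaveat.inside is a hypothetical name):

```java
public class ReplaceAllCaveat {
    // Deletes everything up to and including the first '[' (^.*?\[),
    // and everything from a ']' to the end of the line (\].*).
    static String inside(String input) {
        return input.replaceAll("^.*?\\[|\\].*", "");
    }

    public static void main(String[] args) {
        System.out.println(inside("FOO[DOG]"));  // DOG
        System.out.println(inside("A[x] B[y]")); // x   (only the first pair survives)
        System.out.println(inside("FOO"));       // FOO (no brackets: returned unchanged)
    }
}
```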

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content type: Original author (Original content on Stackoverflow)
Question: digiarnie (View Question on Stackoverflow)
Solution 1 - Java: Bryan Kyle (View Answer on Stackoverflow)
Solution 2 - Java: zaczap (View Answer on Stackoverflow)
Solution 3 - Java: Djahid Bekka (View Answer on Stackoverflow)
Solution 4 - Java: dansalmo (View Answer on Stackoverflow)
Solution 5 - Java: Renaud Bompuis (View Answer on Stackoverflow)
Solution 6 - Java: Kevin Lacquement (View Answer on Stackoverflow)
Solution 7 - Java: amit (View Answer on Stackoverflow)
Solution 8 - Java: user665270 (View Answer on Stackoverflow)
Solution 9 - Java: Manu (View Answer on Stackoverflow)
Solution 10 - Java: Fabian Steeg (View Answer on Stackoverflow)
Solution 11 - Java: Jorge Wander Santana Ureña (View Answer on Stackoverflow)