How to split a string between letters and digits (or between digits and letters)?

Java, Regex, String

Java Problem Overview


I'm trying to work out a way of splitting up a string in Java that follows a pattern like so:

String a = "123abc345def";

The results from this should be the following:

x[0] = "123";
x[1] = "abc";
x[2] = "345";
x[3] = "def";

However, I'm completely stumped as to how to achieve this. Can someone please help me out? I have tried searching online for a similar problem, but it's very difficult to phrase it correctly in a search.

Please note: the number of letters and digits may vary (e.g. there could be a string like '1234a5bcdef').

Java Solutions


Solution 1 - Java

You could try to split on (?<=\D)(?=\d)|(?<=\d)(?=\D), like:

str.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

It matches positions between a number and not-a-number (in any order).

  • (?<=\D)(?=\d) - matches a position between a non-digit (\D) and a digit (\d)
  • (?<=\d)(?=\D) - matches a position between a digit and a non-digit.
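As a quick check of the pattern above, here is a minimal, self-contained sketch. The class name LookaroundSplitDemo and the method splitRuns are made up for illustration; only String.split and the regex itself come from the answer.

```java
import java.util.Arrays;

class LookaroundSplitDemo {
    // Splits at every zero-width position where a digit meets a non-digit.
    // No characters are consumed, so every character ends up in some piece.
    static String[] splitRuns(String str) {
        return str.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRuns("123abc345def"))); // [123, abc, 345, def]
        System.out.println(Arrays.toString(splitRuns("1234a5bcdef")));  // [1234, a, 5, bcdef]
    }
}
```

Because \D matches any non-digit, whitespace and punctuation stay attached to the neighbouring letter runs rather than being split off.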

Solution 2 - Java

How about:

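import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects each run of digits, lowercase letters, or uppercase letters, in order.
// Note: [a-z]+|[A-Z]+ also splits where the letter case changes;
// use [a-zA-Z]+ instead to keep mixed-case runs together.
private List<String> parse(String str) {
    List<String> output = new ArrayList<String>();
    Matcher match = Pattern.compile("[0-9]+|[a-z]+|[A-Z]+").matcher(str);
    while (match.find()) {
        output.add(match.group());
    }
    return output;
}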

Solution 3 - Java

You can try this:

Pattern p = Pattern.compile("[a-z]+|\\d+");
Matcher m = p.matcher("123abc345def");
ArrayList<String> allMatches = new ArrayList<>();
while (m.find()) {
    allMatches.add(m.group());
}

The result (allMatches) will be:

["123", "abc", "345", "def"]

Solution 4 - Java

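Use two different patterns, [0-9]+ and [a-zA-Z]+, and split the string once by each of them: splitting on the digits leaves the letter runs, and splitting on the letters leaves the digit runs.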
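A minimal sketch of this idea, assuming one-or-more quantifiers ([0-9]+ and [a-zA-Z]+), since a pattern that can match the empty string would make split cut between individual characters. The class SplitTwiceDemo and the helper nonEmptyParts are hypothetical names, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;

class SplitTwiceDemo {
    // Splits on the given delimiter regex and drops empty pieces
    // (split() yields a leading "" when the input starts with a delimiter).
    static List<String> nonEmptyParts(String input, String delimiterRegex) {
        List<String> parts = new ArrayList<>();
        for (String piece : input.split(delimiterRegex)) {
            if (!piece.isEmpty()) {
                parts.add(piece);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String a = "123abc345def";
        System.out.println(nonEmptyParts(a, "[0-9]+"));    // letter runs: [abc, def]
        System.out.println(nonEmptyParts(a, "[a-zA-Z]+")); // digit runs:  [123, 345]
    }
}
```

Note that this produces two separate lists, so the original interleaving of digits and letters is lost unless it is reconstructed separately.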

Solution 5 - Java

If you are looking for a solution that does not use Java's built-in string-splitting functionality (e.g. split or regex matching), the following should help:

List<String> splitString(String string) {
    List<String> list = new ArrayList<String>();
    int i = 0;
    while (i < string.length()) {
        // The first character of a run decides whether it is a number or a 'word'.
        boolean numeric = isNumber(string.charAt(i));
        StringBuilder token = new StringBuilder();
        // Consume characters for as long as they belong to the same kind of run.
        while (i < string.length() && isNumber(string.charAt(i)) == numeric) {
            token.append(string.charAt(i++));
        }
        list.add(token.toString());
    }
    return list; // empty input yields an empty list
}

boolean isNumber(char c) {
    return c >= '0' && c <= '9';
}

This solution splits the input into numbers and 'words', where a 'word' is any run of characters that contains no digits. If you want 'words' to contain only English letters, you can add further checks alongside isNumber, depending on your requirements (for example, you may wish to skip words that contain non-English letters). Also note that splitString returns a List<String> (backed by an ArrayList), which can later be converted to a String array.
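The two follow-up points can be sketched as below. Both isEnglishLetter and toArray are hypothetical helpers written for illustration (the answer only mentions them in prose), and "English letter" is assumed to mean the ASCII ranges a-z and A-Z.

```java
import java.util.Arrays;
import java.util.List;

class TokenHelpers {
    // Hypothetical companion to isNumber: true only for ASCII letters.
    static boolean isEnglishLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Converts the list returned by a splitter into a String array.
    static String[] toArray(List<String> tokens) {
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] x = toArray(Arrays.asList("123", "abc", "345", "def"));
        System.out.println(x[1]);                 // abc
        System.out.println(isEnglishLetter('_')); // false
    }
}
```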

Solution 6 - Java

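I haven't used Java in ages, so take this as a rough sketch to get you started: repeatedly match a run of digits or letters at the start of the string, then cut that part off.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String a = "123abc345def";
List<String> result = new ArrayList<>();
Pattern digits = Pattern.compile("\\d+");
Pattern letters = Pattern.compile("[a-zA-Z]+");
while (!a.isEmpty()) {
    Matcher m = digits.matcher(a); // match digits at the start
    if (!m.lookingAt()) {
        m = letters.matcher(a); // otherwise match letters at the start
        if (!m.lookingAt()) {
            break; // something invalid - neither digit nor letter
        }
    }
    result.add(m.group());
    a = a.substring(m.end()); // remove the part we've found
}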

Solution 7 - Java

I was doing this sort of thing in mission-critical code, where every fraction of a second counts because I need to process 180k entries in an unnoticeable amount of time. So I skipped regex and split altogether and processed each element inline (though adding them to an ArrayList<String> would be fine). If you want to do this exact thing but need it to be something like 20x faster...

void parseGroups(String text) {
    int last = 0;  // start index of the current group
    int state = 0; // 0 = nothing seen yet, 1 = in digits, 2 = in non-digits
    for (int i = 0, s = text.length(); i < s; i++) {
        switch (text.charAt(i)) {
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                if (state == 2) {
                    processElement(text.substring(last, i));
                    last = i;
                }
                state = 1;
                break;
            default:
                if (state == 1) {
                    processElement(text.substring(last, i));
                    last = i;
                }
                state = 2;
                break;
        }
    }
    processElement(text.substring(last));
}
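The answer leaves processElement undefined. One way to plug it in, sketched here under assumptions, is to pass the callback as a Consumer<String> and collect the groups into a list. The class GroupScanner, the isDigit helper, the boolean flag in place of the numeric state, and the early return for empty input are all illustrative choices rather than the author's code, and no timing claim is made for this variant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class GroupScanner {
    // Same single-pass idea as above, with the per-element callback passed in.
    static void parseGroups(String text, Consumer<String> processElement) {
        if (text.isEmpty()) {
            return; // assumption: empty input produces no elements
        }
        int last = 0; // start index of the current group
        boolean prevDigit = isDigit(text.charAt(0));
        for (int i = 1, s = text.length(); i < s; i++) {
            boolean digit = isDigit(text.charAt(i));
            if (digit != prevDigit) { // character class changed: a group just ended
                processElement.accept(text.substring(last, i));
                last = i;
                prevDigit = digit;
            }
        }
        processElement.accept(text.substring(last)); // emit the final group
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        List<String> groups = new ArrayList<>();
        parseGroups("123abc345def", groups::add);
        System.out.println(groups); // [123, abc, 345, def]
    }
}
```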

Solution 8 - Java

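Wouldn't "\\d+|\\D+" do the job instead of the cumbersome "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"? Note that it has to be used with Matcher.find() rather than split(), because as a split() delimiter it would consume the whole string.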
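A minimal sketch of that suggestion, assuming the pattern is meant with backslashes (\d and \D) and is used to find the runs rather than to split on them. FindRunsDemo and findRuns are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindRunsDemo {
    // A run is either one or more digits or one or more non-digits.
    private static final Pattern RUN = Pattern.compile("\\d+|\\D+");

    static List<String> findRuns(String str) {
        List<String> runs = new ArrayList<>();
        Matcher m = RUN.matcher(str);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("123abc345def")); // [123, abc, 345, def]
        // Used as a split() delimiter, the same pattern matches every character,
        // so only empty pieces remain and split() drops them all.
        System.out.println(Arrays.toString("123abc345def".split("\\d+|\\D+"))); // []
    }
}
```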

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user843337 | View Question on Stackoverflow
Solution 1 - Java | Qtax | View Answer on Stackoverflow
Solution 2 - Java | nullpotent | View Answer on Stackoverflow
Solution 3 - Java | The Anh Nguyen | View Answer on Stackoverflow
Solution 4 - Java | mishadoff | View Answer on Stackoverflow
Solution 5 - Java | sergeyan | View Answer on Stackoverflow
Solution 6 - Java | Mario | View Answer on Stackoverflow
Solution 7 - Java | Tatarize | View Answer on Stackoverflow
Solution 8 - Java | Andrew Anderson | View Answer on Stackoverflow