Split string to equal length substrings in Java

Java, Regex, String, Split

Java Problem Overview


How can I split the string "Thequickbrownfoxjumps" into substrings of equal size in Java? For example, splitting "Thequickbrownfoxjumps" into chunks of size 4 should give this output:

["Theq","uick","brow","nfox","jump","s"]

Similar Question:

Split string into equal-length substrings in Scala

Java Solutions


Solution 1 - Java

Here's the regex one-liner version:

System.out.println(Arrays.toString(
    "Thequickbrownfoxjumps".split("(?<=\\G.{4})")
));

\G is a zero-width assertion that matches the position where the previous match ended. If there was no previous match, it matches the beginning of the input, the same as \A. The enclosing lookbehind matches the position that's four characters along from the end of the last match.

Both lookbehind and \G are advanced regex features, not supported by all flavors. Furthermore, \G is not implemented consistently across the flavors that do support it. This trick will work (for example) in Java, Perl, .NET and JGSoft, but not in PHP (PCRE), Ruby 1.9+ or TextMate (both Oniguruma). JavaScript's /y (sticky flag) isn't as flexible as \G, and couldn't be used this way even if JS did support lookbehind.

I should mention that I don't necessarily recommend this solution if you have other options. The non-regex solutions in the other answers may be longer, but they're also self-documenting; this one's just about the opposite of that. ;)

Also, this doesn't work in Android, which doesn't support the use of \G in lookbehinds.
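If the chunk size varies, the pattern can be built from a parameter. The following is only a minimal sketch: the class name `RegexChunker` and its `split` method are made up for illustration, the size is assumed to be positive, and `Pattern.DOTALL` is added on the assumption that line terminators should count as ordinary characters. It relies on the same `\G`-in-lookbehind trick, so the portability caveats above still apply.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class RegexChunker {

    // Splits text after every `size` characters using the \G lookbehind shown above.
    static String[] split(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        // DOTALL lets '.' match line terminators too (an assumption about the desired behaviour).
        Pattern boundary = Pattern.compile("(?<=\\G.{" + size + "})", Pattern.DOTALL);
        return boundary.split(text);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Thequickbrownfoxjumps", 4)));
    }
}
```

Compiling a pattern on every call is wasteful if the size rarely changes; caching the compiled `Pattern` per size would be a reasonable refinement.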

Solution 2 - Java

Well, it's fairly easy to do this with simple arithmetic and string operations:

public static List<String> splitEqually(String text, int size) {
    // Give the list the right capacity to start with. You could use an array
    // instead if you wanted.
    List<String> ret = new ArrayList<String>((text.length() + size - 1) / size);

    for (int start = 0; start < text.length(); start += size) {
        ret.add(text.substring(start, Math.min(text.length(), start + size)));
    }
    return ret;
}

Note: this assumes a 1:1 mapping of UTF-16 code unit (char, effectively) with "character". That assumption breaks down for characters outside the Basic Multilingual Plane, such as emoji, and (depending on how you want to count things) combining characters.
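If supplementary characters matter, the same loop can advance by code points instead of `char` values. This is a rough sketch rather than a drop-in replacement: `CodePointChunker` is a hypothetical name, the size is assumed to be positive, and combining characters are still counted individually, so it does not split by user-perceived characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; counts Unicode code points, not chars.
final class CodePointChunker {

    static List<String> splitEqually(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = start;
            // Advance by up to `size` code points; a supplementary character spans two chars.
            for (int n = 0; n < size && end < text.length(); n++) {
                end += Character.charCount(text.codePointAt(end));
            }
            parts.add(text.substring(start, end));
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        // U+1F637 FACE WITH MEDICAL MASK written as its surrogate pair.
        System.out.println(splitEqually("The\uD83D\uDE37uickbrownfoxjumps", 4));
    }
}
```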

I don't think it's really worth using a regex for this.

EDIT: My reasoning for not using a regex:

  • This doesn't use any of the real pattern matching of regexes. It's just counting.
  • I suspect the above will be more efficient, although in most cases it won't matter
  • If you need to use variable sizes in different places, you've either got repetition or a helper function to build the regex itself based on a parameter - ick.
  • The regex provided in another answer firstly didn't compile (invalid escaping), and then didn't work. My code worked first time. That's more a testament to the usability of regexes vs plain code, IMO.

Solution 3 - Java

This is very easy with Google Guava:

for(final String token :
    Splitter
        .fixedLength(4)
        .split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

Output:

Theq
uick
brow
nfox
jump
s

Or if you need the result as an array, you can use this code:

String[] tokens =
    Iterables.toArray(
        Splitter
            .fixedLength(4)
            .split("Thequickbrownfoxjumps"),
        String.class
    );

Note: Splitter construction is shown inline above, but since Splitters are immutable and reusable, it's a good practice to store them in constants:

private static final Splitter FOUR_LETTERS = Splitter.fixedLength(4);

// more code

for(final String token : FOUR_LETTERS.split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

Solution 4 - Java

If you're using Google's guava general-purpose libraries (and quite honestly, any new Java project probably should be), this is insanely trivial with the Splitter class:

for (String substring : Splitter.fixedLength(4).split(inputString)) {
    doSomethingWith(substring);
}

and that's it. Easy as!

Solution 5 - Java

public static String[] split(String src, int len) {
	String[] result = new String[(int)Math.ceil((double)src.length()/(double)len)];
	for (int i=0; i<result.length; i++)
		result[i] = src.substring(i*len, Math.min(src.length(), (i+1)*len));
	return result;
}

Solution 6 - Java

public String[] splitInParts(String s, int partLength)
{
    int len = s.length();

    // Number of parts
    int nparts = (len + partLength - 1) / partLength;
    String parts[] = new String[nparts];

    // Break into parts
    int offset= 0;
    int i = 0;
    while (i < nparts)
    {
        parts[i] = s.substring(offset, Math.min(offset + partLength, len));
        offset += partLength;
        i++;
    }

    return parts;
}

Solution 7 - Java

Here's a one-liner version which uses Java 8 IntStream to determine the indexes of the slice beginnings:

String x = "Thequickbrownfoxjumps";

String[] result = IntStream
                    .iterate(0, i -> i + 4)
                    .limit((int) Math.ceil(x.length() / 4.0))
                    .mapToObj(i ->
                        x.substring(i, Math.min(i + 4, x.length()))
                    )
                    .toArray(String[]::new);
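On Java 9 or later, the three-argument `IntStream.iterate` can stop at the end of the string by itself, which removes the separate `limit` calculation. A small sketch, assuming a positive chunk size; the class name `IterateChunker` is invented for the example:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical helper for illustration; requires Java 9+ for the three-argument iterate.
final class IterateChunker {

    static String[] split(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return IntStream
                // Start indexes 0, size, 2*size, ... while they are still inside the string.
                .iterate(0, i -> i < text.length(), i -> i + size)
                .mapToObj(i -> text.substring(i, Math.min(i + size, text.length())))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Thequickbrownfoxjumps", 4)));
    }
}
```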

Solution 8 - Java

A StringBuilder version:

public static List<String> getChunks(String s, int chunkSize)
{
    List<String> chunks = new ArrayList<>();
    StringBuilder sb = new StringBuilder(s);

    while (sb.length() > 0)
    {
        // Clamp the end index so the last, shorter chunk does not throw.
        int end = Math.min(chunkSize, sb.length());
        chunks.add(sb.substring(0, end));
        sb.delete(0, end);
    }
    return chunks;
}

Solution 9 - Java

I'd prefer this simple solution:

String content = "Thequickbrownfoxjumps";
while(content.length() > 4) {
    System.out.println(content.substring(0, 4));
    content = content.substring(4);
}
System.out.println(content);

Solution 10 - Java

I use the following Java 8 solution:

public static List<String> splitString(final String string, final int chunkSize) {
  final int numberOfChunks = (string.length() + chunkSize - 1) / chunkSize;
  return IntStream.range(0, numberOfChunks)
                  .mapToObj(index -> string.substring(index * chunkSize, Math.min((index + 1) * chunkSize, string.length())))
                  .collect(toList());
}

Solution 11 - Java

You can use substring from the String class (handling the exceptions yourself) or StringUtils.substring from Apache Commons Lang (which handles out-of-range indexes for you):

static String substring(String str, int start, int end)

Put it inside a loop and you are good to go.
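The loop might look roughly like the sketch below. To keep it free of dependencies it uses a hypothetical `safeSubstring` helper that only clamps the indexes; it is a simplified stand-in, not a reimplementation of the Commons Lang method. The class and method names are assumptions, and the size is assumed to be positive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; safeSubstring is a simplified stand-in for a lenient substring.
final class SubstringLoop {

    // Clamps both indexes into [0, length] instead of throwing.
    static String safeSubstring(String str, int start, int end) {
        int from = Math.max(0, Math.min(start, str.length()));
        int to = Math.max(from, Math.min(end, str.length()));
        return str.substring(from, to);
    }

    static List<String> split(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size) {
            // The end index may run past the string; the helper trims it.
            parts.add(safeSubstring(text, start, start + size));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("Thequickbrownfoxjumps", 4));
    }
}
```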

Solution 12 - Java

In case you want to split the string equally backwards, i.e. from right to left, for example, to split 1010001111 to [10, 1000, 1111], here's the code:

/**
 * @param s         the string to be split
 * @param subLen    length of the equal-length substrings.
 * @param backwards true if the splitting is from right to left, false otherwise
 * @return an array of equal-length substrings
 * @throws ArithmeticException: / by zero when subLen == 0
 */
public static String[] split(String s, int subLen, boolean backwards) {
    assert s != null;
    int groups = s.length() % subLen == 0 ? s.length() / subLen : s.length() / subLen + 1;
    String[] strs = new String[groups];
    if (backwards) {
        for (int i = 0; i < groups; i++) {
            int beginIndex = s.length() - subLen * (i + 1);
            int endIndex = beginIndex + subLen;
            if (beginIndex < 0)
                beginIndex = 0;
            strs[groups - i - 1] = s.substring(beginIndex, endIndex);
        }
    } else {
        for (int i = 0; i < groups; i++) {
            int beginIndex = subLen * i;
            int endIndex = beginIndex + subLen;
            if (endIndex > s.length())
                endIndex = s.length();
            strs[i] = s.substring(beginIndex, endIndex);
        }
    }
    return strs;
}

Solution 13 - Java

Here is a one-liner implementation using Java 8 streams:

String input = "Thequickbrownfoxjumps";
final AtomicInteger atomicInteger = new AtomicInteger(0);
Collection<String> result = input.chars()
                                    .mapToObj(c -> String.valueOf((char)c) )
                                    .collect(Collectors.groupingBy(c -> atomicInteger.getAndIncrement() / 4
                                                                ,Collectors.joining()))
                                    .values();

It gives the following output:

[Theq, uick, brow, nfox, jump, s]

Solution 14 - Java

Java 8 solution (similar to the other stream-based answers, but a bit simpler):

public static List<String> partition(String string, int partSize) {
  List<String> parts = IntStream.range(0, string.length() / partSize)
    .mapToObj(i -> string.substring(i * partSize, (i + 1) * partSize))
    .collect(toList());
  if ((string.length() % partSize) != 0)
    parts.add(string.substring(string.length() / partSize * partSize));
  return parts;
}

Solution 15 - Java

Use code points to handle all characters

Here is a solution:

  • Works with all 143,859 Unicode characters
  • Allows you to examine or manipulate each resulting string, if you have further logic to process.

To work with all Unicode characters, avoid the obsolete char type. And avoid char-based utilities. Instead, use code point integer numbers.
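To see what goes wrong with char-based slicing, compare the two ways of counting and check where a char index lands. This is just an illustrative sketch; `CodeUnitDemo` and `splitsSurrogatePair` are hypothetical names, and the sample string writes U+1F637 as its surrogate pair.

```java
// Hypothetical demo for illustration only.
final class CodeUnitDemo {

    // True if cutting `text` at char index `end` would separate a surrogate pair.
    static boolean splitsSurrogatePair(String text, int end) {
        return end > 0 && end < text.length()
                && Character.isHighSurrogate(text.charAt(end - 1))
                && Character.isLowSurrogate(text.charAt(end));
    }

    public static void main(String[] args) {
        String text = "The\uD83D\uDE37uickbrownfoxjumps";
        System.out.println(text.length());                          // UTF-16 code units
        System.out.println(text.codePointCount(0, text.length()));  // Unicode code points
        System.out.println(splitsSurrogatePair(text, 4));           // would substring(0, 4) cut the emoji?
    }
}
```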

Call String#codePoints to get an IntStream object, a stream of int values. In the code below, we collect those int values into an array. Then we loop the array, for each integer we append the character assigned to that number to our StringBuilder object. Every nth character, we add a string to our master list, and empty the StringBuilder.

String input = "Thequickbrownfoxjumps";

int chunkSize = 4 ;
int[] codePoints = input.codePoints().toArray();  // `String#codePoints` returns an `IntStream`. Collect the elements of that stream into an array.
int initialCapacity = ( ( codePoints.length / chunkSize ) + 1 );
List < String > strings = new ArrayList <>( initialCapacity );

StringBuilder sb = new StringBuilder();
for ( int i = 0 ; i < codePoints.length ; i++ )
{
    sb.appendCodePoint( codePoints[ i ] );
    if ( 0 == ( ( i + 1 ) % chunkSize ) ) // Every nth code point.
    {
        strings.add( sb.toString() ); // Remember this iteration's value.
        sb.setLength( 0 ); // Clear the contents of the `StringBuilder` object.
    }
}
if ( sb.length() > 0 ) // If partial string leftover, save it too. Or not… just delete this `if` block.
{
    strings.add( sb.toString() ); // Remember last iteration's value.
}

System.out.println( "strings = " + strings );

>strings = [Theq, uick, brow, nfox, jump, s]

This works with non-Latin characters. Here we replace q with FACE WITH MEDICAL MASK.

String text = "The😷uickbrownfoxjumps";

>strings = [The😷, uick, brow, nfox, jump, s]

Solution 16 - Java

Here is my version based on regex and Java 8 streams. It's worth mentioning that the Matcher.results() method is only available since Java 9.

Test included.

public static List<String> splitString(String input, int splitSize) {
    Matcher matcher = Pattern.compile("(?:(.{" + splitSize + "}))+?").matcher(input);
    return matcher.results().map(MatchResult::group).collect(Collectors.toList());
}

@Test
public void shouldSplitStringToEqualLengthParts() {
    String anyValidString = "Split me equally!";
    String[] expectedTokens2 = {"Sp", "li", "t ", "me", " e", "qu", "al", "ly"};
    String[] expectedTokens3 = {"Spl", "it ", "me ", "equ", "all"};

    Assert.assertArrayEquals(expectedTokens2, splitString(anyValidString, 2).toArray());
    Assert.assertArrayEquals(expectedTokens3, splitString(anyValidString, 3).toArray());
}
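Note that the pattern above matches only full groups of splitSize characters, so a trailing partial chunk is not returned (the expected tokens in the test omit the final "!" and "y!"). If the remainder should be kept, one possible variation is sketched below; it assumes Java 9+, a positive size, and that line terminators should be treated as ordinary characters. The class name is made up for the example.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical variation for illustration; keeps the shorter final chunk.
final class RemainderKeepingSplitter {

    static List<String> splitString(String input, int splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        // Greedy {1,n} takes n characters where possible and whatever is left at the end.
        Pattern chunk = Pattern.compile(".{1," + splitSize + "}", Pattern.DOTALL);
        return chunk.matcher(input)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitString("Split me equally!", 3));
    }
}
```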

Solution 17 - Java

The simplest solution is:

  /**
   * Slices a string by the passed-in slice length.
   * Throws IllegalArgumentException if the passed-in string is null or the slice length is less than 0.
   * @param toSlice string to slice
   * @param sliceLength slice length
   * @return List of slices
   */
  public static List<String> stringSlicer(String toSlice, int sliceLength) {
    if (toSlice == null) {
      throw new IllegalArgumentException("Passed-in string is null");
    }
    if (sliceLength < 0) {
      throw new IllegalArgumentException("Slice length cannot be less than 0");
    }
    if (toSlice.isEmpty() || toSlice.length() <= sliceLength) {
      return List.of(toSlice);
    }

    return Arrays.stream(toSlice.split(String.format("(?s)(?<=\\G.{%d})", sliceLength))).collect(Collectors.toList());
  }

Solution 18 - Java

import java.util.Arrays;
import java.util.Scanner;

public class string123 {

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter String");
        String r = sc.nextLine();
        int len = r.length();
        System.out.println("Enter length Of Sub-string");
        int l = sc.nextInt();

        // Size the array to the number of chunks (rounded up) so no null slots are printed.
        String[] s = new String[(len + l - 1) / l];
        int f = 0;
        for (int i = 0; i < s.length; i++) {
            // Clamp the end index for the last, possibly shorter chunk.
            int last = Math.min(f + l, len);
            s[i] = r.substring(f, last);
            f = last;
        }
        System.out.print(Arrays.toString(s));
    }
}

Result

 Enter String
 Thequickbrownfoxjumps
 Enter length Of Sub-string
 4
  
 [Theq, uick, brow, nfox, jump, s]

Solution 19 - Java

I asked @Alan Moore in a comment to the accepted solution how strings with newlines could be handled. He suggested using DOTALL.

Using his suggestion I created a small sample of how that works:

public void regexDotAllExample() throws UnsupportedEncodingException {
    final String input = "The\nquick\nbrown\r\nfox\rjumps";
    final String regex = "(?<=\\G.{4})";

    Pattern splitByLengthPattern;
    String[] split;

    splitByLengthPattern = Pattern.compile(regex);
    split = splitByLengthPattern.split(input);
    System.out.println("---- Without DOTALL ----");
    for (int i = 0; i < split.length; i++) {
        byte[] s = split[i].getBytes("utf-8");
        System.out.println("[Idx: "+i+", length: "+s.length+"] - " + s);
    }
    /* Output is a single entry longer than the desired split size:
    ---- Without DOTALL ----
    [Idx: 0, length: 26] - [B@17cdc4a5
     */


    // DOTALL suggested in Alan Moore's comment on SO: https://stackoverflow.com/a/3761521/1237974
    splitByLengthPattern = Pattern.compile(regex, Pattern.DOTALL);
    split = splitByLengthPattern.split(input);
    System.out.println("---- With DOTALL ----");
    for (int i = 0; i < split.length; i++) {
        byte[] s = split[i].getBytes("utf-8");
        System.out.println("[Idx: "+i+", length: "+s.length+"] - " + s);
    }
    /* Output is as desired 7 entries with each entry having a max length of 4:
    ---- With DOTALL ----
    [Idx: 0, length: 4] - [B@77b22abc
    [Idx: 1, length: 4] - [B@5213da08
    [Idx: 2, length: 4] - [B@154f6d51
    [Idx: 3, length: 4] - [B@1191ebc5
    [Idx: 4, length: 4] - [B@30ddb86
    [Idx: 5, length: 4] - [B@2c73bfb
    [Idx: 6, length: 2] - [B@6632dd29
     */

}

But I also like @Jon Skeet's solution in https://stackoverflow.com/a/3760193/1237974. For maintainability in larger projects, where not everyone is equally experienced with regular expressions, I would probably use Jon's solution.

Solution 20 - Java

Another brute-force solution could be:

	String input = "thequickbrownfoxjumps";
	int n = (input.length() + 3) / 4; // round up so the trailing partial chunk is kept
	String[] num = new String[n];

	for (int i = 0, x = 0; i < n; i++, x += 4) {
		int y = Math.min(x + 4, input.length()); // clamp the end index for the last chunk
		num[i] = input.substring(x, y);
		System.out.println(num[i]);
	}

The code simply steps through the string, taking one substring per step.

Solution 21 - Java

@Test
public void regexSplit() {
    String source = "Thequickbrownfoxjumps";
    // define matcher, any char, min length 1, max length 4
    Matcher matcher = Pattern.compile(".{1,4}").matcher(source);
    List<String> result = new ArrayList<>();
    while (matcher.find()) {
        result.add(source.substring(matcher.start(), matcher.end()));
    }
    String[] expected = {"Theq", "uick", "brow", "nfox", "jump", "s"};
    assertArrayEquals(expected, result.toArray());
}

Solution 22 - Java

public static String[] split(String input, int length) throws IllegalArgumentException {

    if(length == 0 || input == null)
        return new String[0];

    int lengthD = length * 2;

    int size = input.length();
    if(size == 0)
        return new String[0];

    int rep = (int) Math.ceil(size * 1d / length);

    ByteArrayInputStream stream = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_16LE));

    String[] out = new String[rep];
    byte[]  buf = new byte[lengthD];

    int d = 0;
    for (int i = 0; i < rep; i++) {

        try {
            d = stream.read(buf);
        } catch (IOException e) {
            e.printStackTrace();
        }

        if(d != lengthD)
        {
            out[i] = new String(buf,0,d, StandardCharsets.UTF_16LE);
            continue;
        }

        out[i] = new String(buf, StandardCharsets.UTF_16LE);
    }
    return out;
}

Solution 23 - Java

public static List<String> getSplittedString(String stringtoSplit, int length) {

    List<String> returnStringList = new ArrayList<String>(
            (stringtoSplit.length() + length - 1) / length);

    for (int start = 0; start < stringtoSplit.length(); start += length) {
        returnStringList.add(stringtoSplit.substring(start,
                Math.min(stringtoSplit.length(), start + length)));
    }

    return returnStringList;
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Emil | View Question on Stackoverflow
Solution 1 - Java | Alan Moore | View Answer on Stackoverflow
Solution 2 - Java | Jon Skeet | View Answer on Stackoverflow
Solution 3 - Java | Sean Patrick Floyd | View Answer on Stackoverflow
Solution 4 - Java | Cowan | View Answer on Stackoverflow
Solution 5 - Java | Saul | View Answer on Stackoverflow
Solution 6 - Java | Grodriguez | View Answer on Stackoverflow
Solution 7 - Java | Marko Previsic | View Answer on Stackoverflow
Solution 8 - Java | FSm | View Answer on Stackoverflow
Solution 9 - Java | Cheetah Coder | View Answer on Stackoverflow
Solution 10 - Java | rloeffel | View Answer on Stackoverflow
Solution 11 - Java | pakore | View Answer on Stackoverflow
Solution 12 - Java | Ivan Huang | View Answer on Stackoverflow
Solution 13 - Java | Pankaj Singhal | View Answer on Stackoverflow
Solution 14 - Java | Timofey Gorshkov | View Answer on Stackoverflow
Solution 15 - Java | Basil Bourque | View Answer on Stackoverflow
Solution 16 - Java | itachi | View Answer on Stackoverflow
Solution 17 - Java | Jackkobec | View Answer on Stackoverflow
Solution 18 - Java | Ravichandra | View Answer on Stackoverflow
Solution 19 - Java | joensson | View Answer on Stackoverflow
Solution 20 - Java | Hubbly | View Answer on Stackoverflow
Solution 21 - Java | Adrian-Bogdan Ionescu | View Answer on Stackoverflow
Solution 22 - Java | User8461 | View Answer on Stackoverflow
Solution 23 - Java | Raj Hirani | View Answer on Stackoverflow