Time complexity of Java's substring()

JavaSubstringTime Complexity

Java Problem Overview


What is the time complexity of the String#substring() method in Java?

Java Solutions


Solution 1 - Java

New answer

As of update 6 within Java 7's lifetime, the behaviour of substring changed to create a copy - so every String refers to a char[] which is not shared with any other object, as far as I'm aware. So at that point, substring() became an O(n) operation where n is the numbers in the substring.

Old answer: pre-Java 7

Undocumented - but in practice O(1) if you assume no garbage collection is required, etc.

It simply builds a new String object referring to the same underlying char[] but with different offset and count values. So the cost is the time taken to perform validation and construct a single new (reasonably small) object. That's O(1) as far as it's sensible to talk about the complexity of operations which can vary in time based on garbage collection, CPU caches etc. In particular, it doesn't directly depend on the length of the original string or the substring.

Solution 2 - Java

It was O(1) in older versions of Java - as Jon stated, it just created a new String with the same underlying char[], and a different offset and length.

However, this has actually changed started with Java 7 update 6.

The char[] sharing was eliminated, and the offset and length fields were removed. substring() now just copies all the characters into a new String.

Ergo, substring is O(n) in Java 7 update 6

Solution 3 - Java

It's now linear complexity. This is after fixing a memory leak issue for substring.

So from Java 1.7.0_06 remember that String.substring now has a linear complexity instead of a constant one.

Solution 4 - Java

Adding proof to Jon's answer. I had same doubt and wanted to check whether length of string has any effects on substring function. Written following code to check on which parameter substring actually depends on.

import org.apache.commons.lang.RandomStringUtils;

public class Dummy {

    private static final String pool[] = new String[3];
    private static int substringLength;

    public static void main(String args[]) {
        pool[0] = RandomStringUtils.random(2000);
        pool[1] = RandomStringUtils.random(10000);
        pool[2] = RandomStringUtils.random(100000);
        test(10);
        test(100);
        test(1000);
    }

    public static void test(int val) {
        substringLength = val;
        StatsCopy statsCopy[] = new StatsCopy[3];
        for (int j = 0; j < 3; j++) {
            statsCopy[j] = new StatsCopy();
        }
        long latency[] = new long[3];
        for (int i = 0; i < 10000; i++) {
            for (int j = 0; j < 3; j++) {
                latency[j] = latency(pool[j]);
                statsCopy[j].send(latency[j]);
            }
        }
        for (int i = 0; i < 3; i++) {
            System.out.println(
                    " Avg: "
                            + (int) statsCopy[i].getAvg()
                            + "\t String length: "
                            + pool[i].length()
                            + "\tSubstring Length: "
                            + substringLength);
        }
        System.out.println();
    }

    private static long latency(String a) {
        long startTime = System.nanoTime();
        a.substring(0, substringLength);
        long endtime = System.nanoTime();
        return endtime - startTime;
    }

    private static class StatsCopy {
        private  long count = 0;
        private  long min = Integer.MAX_VALUE;
        private  long max = 0;
        private  double avg = 0;

        public  void send(long latency) {
            computeStats(latency);
            count++;
        }

        private  void computeStats(long latency) {
            if (min > latency) min = latency;
            if (max < latency) max = latency;
            avg = ((float) count / (count + 1)) * avg + (float) latency / (count + 1);
        }

        public  double getAvg() {
            return avg;
        }

        public  long getMin() {
            return min;
        }

        public  long getMax() {
            return max;
        }

        public  long getCount() {
            return count;
        }
    }

}

Output on execution in Java 8 is:

 Avg: 128	 String length: 2000	Substring Length: 10
 Avg: 127	 String length: 10000	Substring Length: 10
 Avg: 124	 String length: 100000	Substring Length: 10

 Avg: 172	 String length: 2000	Substring Length: 100
 Avg: 175	 String length: 10000	Substring Length: 100
 Avg: 177	 String length: 100000	Substring Length: 100

 Avg: 1199	 String length: 2000	Substring Length: 1000
 Avg: 1186	 String length: 10000	Substring Length: 1000
 Avg: 1339	 String length: 100000	Substring Length: 1000

Proving substring function depends on length of substring requested not on the length of the string.

Solution 5 - Java

O(1) because no copying of the original string is done, it just creates a new wrapper object with different offset information.

Solution 6 - Java

Judge for yourself from following, but Java's performance drawbacks lie somewhere else, not here in substring of a string. Code:

public static void main(String[] args) throws IOException {

        String longStr = "asjf97zcv.1jm2497z20`1829182oqiwure92874nvcxz,nvz.,xo" + 
                "aihf[oiefjkas';./.,z][p\\°°°°°°°°?!(*#&(@*&#!)^(*&(*&)(*&" +
                "fasdznmcxzvvcxz,vc,mvczvcz,mvcz,mcvcxvc,mvcxcvcxvcxvcxvcx";
        int[] indices = new int[32 * 1024];
        int[] lengths = new int[indices.length];
        Random r = new Random();
        final int minLength = 6;
        for (int i = 0; i < indices.length; ++i)
        {
            indices[i] = r.nextInt(longStr.length() - minLength);
            lengths[i] = minLength + r.nextInt(longStr.length() - indices[i] - minLength);
        }

        long start = System.nanoTime();

        int avoidOptimization = 0;
        for (int i = 0; i < indices.length; ++i)
            //avoidOptimization += lengths[i]; //tested - this was cheap
            avoidOptimization += longStr.substring(indices[i],
                    indices[i] + lengths[i]).length();

        long end = System.nanoTime();
        System.out.println("substring " + indices.length + " times");
        System.out.println("Sum of lengths of splits = " + avoidOptimization);
        System.out.println("Elapsed " + (end - start) / 1.0e6 + " ms");
    }

Output:

substring 32768 times
Sum of lengths of splits = 1494414
Elapsed 2.446679 ms

If it is O(1) or not, depends. If you just reference same String in memory, then imagine very long String, you make substring and stop referencing long one. Wouldn't be nice to release memory for long one?

Solution 7 - Java

Before Java 1.7.0_06: O(1).

After Java 1.7.0_06: O(n). This was changed, because of memory leak. After the fields offset and count were removed from String, the substring implementation became O(n).

For more details, please refer to : http://java-performance.info/changes-to-string-java-1-7-0_06/

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionTimeToCodeTheRoadView Question on Stackoverflow
Solution 1 - JavaJon SkeetView Answer on Stackoverflow
Solution 2 - JavaChiView Answer on Stackoverflow
Solution 3 - Javafriends.pallavView Answer on Stackoverflow
Solution 4 - JavaPavan KonduriView Answer on Stackoverflow
Solution 5 - JavarodionView Answer on Stackoverflow
Solution 6 - JavapeenutView Answer on Stackoverflow
Solution 7 - JavaomilusView Answer on Stackoverflow