Is Java's System.arraycopy() efficient for small arrays?

Java Problem Overview

Is Java's System.arraycopy() efficient for small arrays, or does the fact that it's a native method make it likely to be substantially less efficient than a simple loop and a function call?

Do native methods incur additional performance overhead for crossing some kind of Java-system bridge?

Java Solutions

Solution 1 - Java

Expanding a little on what Sid has written, it's very likely that System.arraycopy is just a JIT intrinsic; meaning that when code calls System.arraycopy, it will most probably be calling a JIT-specific implementation (once the JIT tags System.arraycopy as being "hot") that is not executed through the JNI interface, so it doesn't incur the normal overhead of native methods.

In general, executing native methods does have some overhead (going through the JNI interface, also some internal JVM operations cannot happen when native methods are being executed). But it's not because a method is marked as "native" that you're actually executing it using JNI. The JIT can do some crazy things.

Easiest way to check is, as has been suggested, writing a small benchmark, being careful with the normal caveats of Java microbenchmarks (warm up the code first, avoid code with no side-effects since the JIT just optimizes it as a no-op, etc).

Solution 2 - Java

Here is my benchmark code:

public void test(int copySize, int copyCount, int testRep) {
    System.out.println("Copy size = " + copySize);
    System.out.println("Copy count = " + copyCount);
    System.out.println();
    for (int i = testRep; i > 0; --i) {
        copy(copySize, copyCount);
        loop(copySize, copyCount);
    }
    System.out.println();
}

public void copy(int copySize, int copyCount) {
    int[] src = newSrc(copySize + 1);
    int[] dst = new int[copySize + 1];
    long begin = System.nanoTime();
    for (int count = copyCount; count > 0; --count) {
        System.arraycopy(src, 1, dst, 0, copySize);
        dst[copySize] = src[copySize] + 1;
        System.arraycopy(dst, 0, src, 0, copySize);
        src[copySize] = dst[copySize];
    }
    long end = System.nanoTime();
    System.out.println("Arraycopy: " + (end - begin) / 1e9 + " s");
}

public void loop(int copySize, int copyCount) {
    int[] src = newSrc(copySize + 1);
    int[] dst = new int[copySize + 1];
    long begin = System.nanoTime();
    for (int count = copyCount; count > 0; --count) {
        for (int i = copySize - 1; i >= 0; --i) {
            dst[i] = src[i + 1];
        }
        dst[copySize] = src[copySize] + 1;
        for (int i = copySize - 1; i >= 0; --i) {
            src[i] = dst[i];
        }
        src[copySize] = dst[copySize];
    }
    long end = System.nanoTime();
    System.out.println("Man. loop: " + (end - begin) / 1e9 + " s");
}

public int[] newSrc(int arraySize) {
    int[] src = new int[arraySize];
    for (int i = arraySize - 1; i >= 0; --i) {
        src[i] = i;
    }
    return src;
}

From my tests, calling test() with copyCount = 10000000 (1e7) or greater allows the warm-up to be achieved during the first copy/loop call, so using testRep = 5 is enough; With copyCount = 1000000 (1e6) the warm-up need at least 2 or 3 iterations so testRep shall be increased in order to obtain usable results.

With my configuration (CPU Intel Core 2 Duo E8500 @ 3.16GHz, Java SE 1.6.0_35-b10 and Eclipse 3.7.2) it appears from the benchmark that:

When copySize = 24, System.arraycopy() and the manual loop take almost the same time (sometimes one is very slightly faster than the other, other times it’s the contrary),
When copySize < 24, the manual loop is faster than System.arraycopy() (slightly faster with copySize = 23, really faster with copySize < 5),
When copySize > 24, System.arraycopy() is faster than the manual loop (slightly faster with copySize = 25, the ratio loop-time/arraycopy-time increasing as copySize increases).

Note: I’m not English native speaker, please excuse all my grammar/vocabulary errors.

Solution 3 - Java

This is a valid concern. For example, in java.nio.DirectByteBuffer.put(byte[]), the author tries to avoid a JNI copy for small number of elements

// These numbers represent the point at which we have empirically
// determined that the average cost of a JNI call exceeds the expense
// of an element by element copy.  These numbers may change over time.
static final int JNI_COPY_TO_ARRAY_THRESHOLD   = 6;
static final int JNI_COPY_FROM_ARRAY_THRESHOLD = 6;

For System.arraycopy(), we can examine how JDK uses it. For example, in ArrayList, System.arraycopy() is always used, never "element by element copy", regardless of length (even if it's 0). Since ArrayList is very performance conscious, we can derive that System.arraycopy() is the most effecient way of array copying regardless of length.

Solution 4 - Java

System.arraycopy use a memmove operation for moving words and assembly for moving other primitive types in C behind the scene. So it makes its best effort to move as much as efficient it can reach.

Solution 5 - Java

Instead of relying on speculation and possibly outdated information, I ran some benchmarks using [tag:caliper]. In fact, Caliper comes with some examples, including a CopyArrayBenchmark that measures exactly this question! All you have to do is run

mvn exec:java -Dexec.mainClass=com.google.caliper.runner.CaliperMain -Dexec.args=examples.CopyArrayBenchmark

My results are based on Oracle's Java HotSpot(TM) 64-Bit Server VM, 1.8.0_31-b13, running on a mid-2010 MacBook Pro (macOS 10.11.6 with an Intel Arrandale i7, 8 GiB RAM). I don't believe that it's useful to post the raw timing data. Rather, I'll summarize the conclusions with the supporting visualizations.

In summary:

Writing a manual for loop to copy each element into a newly instantiated array is never advantageous, even for arrays as short as 5 elements.
Arrays.copyOf(array, array.length) and array.clone() are both consistently fast. These two techniques are nearly identical in performance; which one you choose is a matter of taste.
System.arraycopy(src, 0, dest, 0, src.length) is almost as fast as Arrays.copyOf(array, array.length) and array.clone(), but not quite consistently so. (See the case for 50000 ints.) Because of that, and the verbosity of the call, I would recommend System.arraycopy() if you need fine control over which elements get copied where.

Here are the timing plots:

Solution 6 - Java

Byte codes are executed natively anyways so it's likely that performance would be better than a loop.

So in case of a loop it would have to execute byte codes which will incur an overhead. While array copy should be straight memcopy.

Solution 7 - Java

Native functions should be faster than JVM functions, since there is no VM overhead. However for a lot of(>1000) very small(len<10) arrays it might be slower.

Content Type	Original Author	Original Content on Stackoverflow
Question	Gravity	View Question on Stackoverflow
Solution 1 - Java	vanza	View Answer on Stackoverflow
Solution 2 - Java	Ethaniel	View Answer on Stackoverflow
Solution 3 - Java	irreputable	View Answer on Stackoverflow
Solution 4 - Java	lichenbo	View Answer on Stackoverflow
Solution 5 - Java	200_success	View Answer on Stackoverflow
Solution 6 - Java	Sid Malani	View Answer on Stackoverflow
Solution 7 - Java	laci37	View Answer on Stackoverflow