56

Is Java's System.arraycopy() efficient for small arrays, or does the fact that it's a native method make it likely to be substantially less efficient than a simple loop and a function call?

Do native methods incur additional performance overhead for crossing some kind of Java-system bridge?

7
  • 10
    Have you tried it and ran a benchmark?
    – corsiKa
    Commented Dec 15, 2011 at 21:36
  • 2
    I'd love to see a microbenchmark on this. Commented Dec 15, 2011 at 21:36
  • 1
    I think that built-in native code is not affected by JNI latencies
    – gd1
    Commented Dec 15, 2011 at 21:48
  • @glowcoder, i wanna see a micro benchmark of yours, that's exactly a very hard one to get right
    – bestsss
    Commented Dec 19, 2011 at 11:42
  • to answer: there is a certain benchmark that does tons of small (less than a cache line) copies , all JDK implementors know about it and hopefully target the case.
    – bestsss
    Commented Dec 19, 2011 at 11:54

7 Answers 7

29

Expanding a little on what Sid has written, it's very likely that System.arraycopy is just a JIT intrinsic; meaning that when code calls System.arraycopy, it will most probably be calling a JIT-specific implementation (once the JIT tags System.arraycopy as being "hot") that is not executed through the JNI interface, so it doesn't incur the normal overhead of native methods.

In general, executing native methods does have some overhead (going through the JNI interface, also some internal JVM operations cannot happen when native methods are being executed). But it's not because a method is marked as "native" that you're actually executing it using JNI. The JIT can do some crazy things.

Easiest way to check is, as has been suggested, writing a small benchmark, being careful with the normal caveats of Java microbenchmarks (warm up the code first, avoid code with no side-effects since the JIT just optimizes it as a no-op, etc).

3
  • @vanze, small benchmarks running entirely in L1 cache are just wrong.
    – bestsss
    Commented Dec 19, 2011 at 11:50
  • 2
    @bestsss: He may have meant small in terms of the amount of code, not necessarily small in terms of the numbers of arrays I'd do the copy on.
    – Gravity
    Commented Dec 19, 2011 at 20:09
  • @Gravity, most people almost always get the micro benchmarks wrong, esp Java ones. basically you need to know what kind of code the JIT emits to write a proper micro-benchmark. Usually the easiest way is looking into the generated assembler and compare the result vs. expected.
    – bestsss
    Commented Dec 19, 2011 at 21:49
24

Here is my benchmark code:

public void test(int copySize, int copyCount, int testRep) {
    System.out.println("Copy size = " + copySize);
    System.out.println("Copy count = " + copyCount);
    System.out.println();
    for (int i = testRep; i > 0; --i) {
        copy(copySize, copyCount);
        loop(copySize, copyCount);
    }
    System.out.println();
}

public void copy(int copySize, int copyCount) {
    int[] src = newSrc(copySize + 1);
    int[] dst = new int[copySize + 1];
    long begin = System.nanoTime();
    for (int count = copyCount; count > 0; --count) {
        System.arraycopy(src, 1, dst, 0, copySize);
        dst[copySize] = src[copySize] + 1;
        System.arraycopy(dst, 0, src, 0, copySize);
        src[copySize] = dst[copySize];
    }
    long end = System.nanoTime();
    System.out.println("Arraycopy: " + (end - begin) / 1e9 + " s");
}

public void loop(int copySize, int copyCount) {
    int[] src = newSrc(copySize + 1);
    int[] dst = new int[copySize + 1];
    long begin = System.nanoTime();
    for (int count = copyCount; count > 0; --count) {
        for (int i = copySize - 1; i >= 0; --i) {
            dst[i] = src[i + 1];
        }
        dst[copySize] = src[copySize] + 1;
        for (int i = copySize - 1; i >= 0; --i) {
            src[i] = dst[i];
        }
        src[copySize] = dst[copySize];
    }
    long end = System.nanoTime();
    System.out.println("Man. loop: " + (end - begin) / 1e9 + " s");
}

public int[] newSrc(int arraySize) {
    int[] src = new int[arraySize];
    for (int i = arraySize - 1; i >= 0; --i) {
        src[i] = i;
    }
    return src;
}

From my tests, calling test() with copyCount = 10000000 (1e7) or greater allows the warm-up to be achieved during the first copy/loop call, so using testRep = 5 is enough; With copyCount = 1000000 (1e6) the warm-up need at least 2 or 3 iterations so testRep shall be increased in order to obtain usable results.

With my configuration (CPU Intel Core 2 Duo E8500 @ 3.16GHz, Java SE 1.6.0_35-b10 and Eclipse 3.7.2) it appears from the benchmark that:

  • When copySize = 24, System.arraycopy() and the manual loop take almost the same time (sometimes one is very slightly faster than the other, other times it’s the contrary),
  • When copySize < 24, the manual loop is faster than System.arraycopy() (slightly faster with copySize = 23, really faster with copySize < 5),
  • When copySize > 24, System.arraycopy() is faster than the manual loop (slightly faster with copySize = 25, the ratio loop-time/arraycopy-time increasing as copySize increases).

Note: I’m not English native speaker, please excuse all my grammar/vocabulary errors.

3
  • 1
    I got different results. It does not seem to matter even with small size arrays. Large arrays into the thousands, systemarraycopy is much faster. small arrays I see no diff..
    – RickHigh
    Commented Dec 30, 2013 at 4:56
  • 1
    I had different results as well. With an array size of 5000 System.arrayCopy was about 4 times faster and with an array size of only 5 it was still about 20% faster. It seemed to even out at a size of 2. So I don't think there is ever a reason not to use System.arrayCopy
    – nikdeapen
    Commented Nov 6, 2015 at 20:16
  • 1
    @nikdeapen seems that situation changed over years Commented Feb 29, 2016 at 13:26
19

This is a valid concern. For example, in java.nio.DirectByteBuffer.put(byte[]), the author tries to avoid a JNI copy for small number of elements

// These numbers represent the point at which we have empirically
// determined that the average cost of a JNI call exceeds the expense
// of an element by element copy.  These numbers may change over time.
static final int JNI_COPY_TO_ARRAY_THRESHOLD   = 6;
static final int JNI_COPY_FROM_ARRAY_THRESHOLD = 6;

For System.arraycopy(), we can examine how JDK uses it. For example, in ArrayList, System.arraycopy() is always used, never "element by element copy", regardless of length (even if it's 0). Since ArrayList is very performance conscious, we can derive that System.arraycopy() is the most effecient way of array copying regardless of length.

7
  • 11
    I suppose part of the question is whether System.arraycopy() goes through JNI at all. As someone pointed out, the mere fact that it's declared native doesn't mean anything because the JVM is allowed to have all kinds of special optimizations.
    – Gravity
    Commented Dec 16, 2011 at 5:32
  • I did think exactly the same thought regarding the ArrayList class, though, even as I was asking this question: that ArrayList, supposedly being a very well-optimized class, wouldn't do anything that's needlessly expensive. Still, maybe the authors figured the performance of small ArrayLists didn't matter all that much (since it's the operations on big datasets that are most costly anyway), or maybe it's not uniformly efficient across all JVMs. I'm going to have to benchmark to be convinced.
    – Gravity
    Commented Dec 16, 2011 at 5:39
  • 2
    they will care about performance of many small array lists. so each small array list must perform well. Commented Dec 16, 2011 at 19:46
  • 3
    @Gravity, System.arraycopy is not JNI, it's intrnsics
    – bestsss
    Commented Dec 19, 2011 at 11:44
  • DirectBuffer copy is a special case, it doesn't use System.arraycopy at all nor JNI, the documentation is just old. It's intrinsics, Unsafe.copymemory is basically memmove/memcpy but simple copy can be just as fast if the range checks are eliminated.
    – bestsss
    Commented Dec 19, 2011 at 11:49
8

Instead of relying on speculation and possibly outdated information, I ran some benchmarks using . In fact, Caliper comes with some examples, including a CopyArrayBenchmark that measures exactly this question! All you have to do is run

mvn exec:java -Dexec.mainClass=com.google.caliper.runner.CaliperMain -Dexec.args=examples.CopyArrayBenchmark

My results are based on Oracle's Java HotSpot(TM) 64-Bit Server VM, 1.8.0_31-b13, running on a mid-2010 MacBook Pro (macOS 10.11.6 with an Intel Arrandale i7, 8 GiB RAM). I don't believe that it's useful to post the raw timing data. Rather, I'll summarize the conclusions with the supporting visualizations.

In summary:

  • Writing a manual for loop to copy each element into a newly instantiated array is never advantageous, even for arrays as short as 5 elements.
  • Arrays.copyOf(array, array.length) and array.clone() are both consistently fast. These two techniques are nearly identical in performance; which one you choose is a matter of taste.
  • System.arraycopy(src, 0, dest, 0, src.length) is almost as fast as Arrays.copyOf(array, array.length) and array.clone(), but not quite consistently so. (See the case for 50000 ints.) Because of that, and the verbosity of the call, I would recommend System.arraycopy() if you need fine control over which elements get copied where.

Here are the timing plots:

Timings for copying arrays of length 5 Timings for copying arrays of length 500 Timings for copying arrays of length 50000

7

System.arraycopy use a memmove operation for moving words and assembly for moving other primitive types in C behind the scene. So it makes its best effort to move as much as efficient it can reach.

0
5

Byte codes are executed natively anyways so it's likely that performance would be better than a loop.

So in case of a loop it would have to execute byte codes which will incur an overhead. While array copy should be straight memcopy.

5
  • 2
    There is an additional cost to crossing from Java to native code. Commented Dec 15, 2011 at 21:44
  • @AndyThomas-Cramer, there is no native code at all in terms of context switch to JNI. And side note: JIT is smart enough to optimize array copy via loop to the same code as System.arraycopy
    – bestsss
    Commented Dec 19, 2011 at 11:56
  • "JIT is smart enough to optimize array copy via loop to the same code as System.arraycopy". -- Is it? In theory that makes a lot of sense, but I haven't heard of such a thing. Can you point to a reference?
    – Gravity
    Commented Dec 19, 2011 at 20:14
  • All byte codes are run as machine code eventually anyway. This can be either interpreted or compiled by a JIT. So on that basis system.array copy is probably likely to be converted to memcpy. While a loop would probably be more expensive on that basis.
    – Sid Malani
    Commented Dec 20, 2011 at 20:07
  • @SidMalani: there's a huge difference between compiling the code to a JNI call and compiling it to a call to memcpy (in fact memmove), and while the second options gets you faster code, a human needs to add a special case to the JVM for each intrinsic. Commented Feb 16, 2014 at 14:31
-1

Native functions should be faster than JVM functions, since there is no VM overhead. However for a lot of(>1000) very small(len<10) arrays it might be slower.

3
  • Can you please ellaborate this ? Why would it be slower for a lot of small arrays ? Commented Dec 15, 2011 at 21:46
  • I'm guessing because of the function call overhead? Unless even native functions that are JIT intrinsic can be inlined...
    – Gravity
    Commented Dec 15, 2011 at 22:49
  • Yes the call overhead might be larger on native functions. As it adds up, it might outweigh the time saved by eliminating the JVM overhead
    – laci37
    Commented Dec 16, 2011 at 13:14

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.