What is microbenchmarking?

JavaPerformanceBenchmarkingJitMicrobenchmark

Java Problem Overview


I've heard this term used, but I'm not entirely sure what it means, so:

  • What DOES it mean and what DOESN'T it mean?
  • What are some examples of what IS and ISN'T microbenchmarking?
  • What are the dangers of microbenchmarking and how do you avoid it?
    • (or is it a good thing?)

Java Solutions


Solution 1 - Java

It means exactly what it says on the tin can - it's measuring the performance of something "small", like a system call to the kernel of an operating system.

The danger is that people may use whatever results they obtain from microbenchmarking to dictate optimizations. And as we all know:

> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of > all evil" -- Donald Knuth

There can be many factors that skew the result of microbenchmarks. Compiler optimizations is one of them. If the operation being measured takes so little time that whatever you use to measure it takes longer than the actual operation itself, your microbenchmarks will be skewed also.

For example, someone might take a microbenchmark of the overhead of for loops:

void TestForLoop()
{
    time start = GetTime();

    for(int i = 0; i < 1000000000; ++i)
    {
    }

    time elapsed = GetTime() - start;
    time elapsedPerIteration = elapsed / 1000000000;
    printf("Time elapsed for each iteration: %d\n", elapsedPerIteration);
}

Obviously compilers can see that the loop does absolutely nothing and not generate any code for the loop at all. So the value of elapsed and elapsedPerIteration is pretty much useless.

Even if the loop does something:

void TestForLoop()
{
    int sum = 0;
    time start = GetTime();

    for(int i = 0; i < 1000000000; ++i)
    {
        ++sum;
    }

    time elapsed = GetTime() - start;
    time elapsedPerIteration = elapsed / 1000000000;
    printf("Time elapsed for each iteration: %d\n", elapsedPerIteration);
}

The compiler may see that the variable sum isn't going to be used for anything and optimize it away, and optimize away the for loop as well. But wait! What if we do this:

void TestForLoop()
{
    int sum = 0;
    time start = GetTime();

    for(int i = 0; i < 1000000000; ++i)
    {
        ++sum;
    }

    time elapsed = GetTime() - start;
    time elapsedPerIteration = elapsed / 1000000000;
    printf("Time elapsed for each iteration: %d\n", elapsedPerIteration);
    printf("Sum: %d\n", sum); // Added
}

The compiler might be smart enough to realize that sum will always be a constant value, and optimize all that away as well. Many would be surprised at the optimizing capabilities of compilers these days.

But what about things that compilers can't optimize away?

void TestFileOpenPerformance()
{
    FILE* file = NULL;
    time start = GetTime();

    for(int i = 0; i < 1000000000; ++i)
    {
        file = fopen("testfile.dat");
        fclose(file);
    }

    time elapsed = GetTime() - start;
    time elapsedPerIteration = elapsed / 1000000000;
    printf("Time elapsed for each file open: %d\n", elapsedPerIteration);
}

Even this is not a useful test! The operating system may see that the file is being opened very frequently, so it may preload it in memory to improve performance. Pretty much all operating systems do this. The same thing happens when you open applications - operating systems may figure out the top ~5 applications you open the most and preload the application code in memory when you boot up the computer!

In fact, there are countless variables that come into play: locality of reference (e.g. arrays vs. linked lists), effects of caches and memory bandwidth, compiler inlining, compiler implementation, compiler switches, number of processor cores, optimizations at the processor level, operating system schedulers, operating system background processes, etc.

So microbenchmarking isn't exactly a useful metric in a lot of cases. It definitely does not replace whole-program benchmarks with well-defined test cases (profiling). Write readable code first, then profile to see what needs to be done, if any.

I would like to emphasize that microbenchmarks are not evil per se, but one has to use them carefully (that's true for lots of other things related to computers)

Solution 2 - Java

There is no definition of micro-benchmarking, but when I use it I mean a small artificial benchmark designed to test the performance of some specific hardware1 or language feature. By contrast, a better benchmark is a real program designed to perform a real task. (Drawing a hard line between the two cases is pointless, IMO, and I won't try.)

The danger of micro benchmarking is that it is easy to write a benchmark that gives results that are totally misleading. Some common traps in Java micro-benchmarks are:

  • writing code that the compiler can deduce does not useful work, and therefore optimize away entirely,
  • not taking account of the "lumpy" nature of Java memory management, and
  • not taking account of JVM startup effects; e.g. the time take to load and JIT compile classes, and (conversely) the execution speedup that happens once the methods have been JIT compiled.

However, even once you have addressed the issues above, there is a systemic issue with benchmarking that is impossible to address. The code and behavior of a benchmark usually bears little relation to what you really care about; i.e. how your application is going to perform. There are far too many "hidden variables" for you to be able to generalize from a benchmark to typical programs, let alone to your program.

For these reasons, we regularly advise people NOT to waste their time with micro-benchmarks. Instead, it is best to write simple and natural code, and use a profiler to identify areas that need to be hand optimized. Interestingly, it usually turns out that the most significant performance problems in real applications are due to bad design of data structures and algorithms (including networking, database and threading-related bottlenecks) rather than the kind of things that typical micro-benchmarks are trying to test.

@BalusC has provided an excellent link to material on this topic in the Hotspot FAQ page. And here is a link to an IBM whitepaper by Brian Goetz.


1 - Experts would not even try to do hardware benchmarking in Java. There are too many "complex things" happening between the bytecodes and the hardware to draw valid / useful conclusions about hardware from the raw results. You would be better off using a language that is closer to the hardware; e.g. C or even assembly code.

Solution 3 - Java

> * What DOES it mean and what DOESN'T it mean?

I would say micro-benchmarking simply means measuring something tiny. Tiny is probably context dependent, but typically on the level of a single system call or something similar. Benchmarking refers to everything above.

> * What are some examples of what IS and ISN'T microbenchmarking?

This (archived) article lists measuring time of a getpid() system call and measuring the time to copy memory using memcpy() as examples of micro-benchmarking.

Any measurement of an algorithm implementation etc would not count as micro-benchmarking. Especially result reports listing tasks with decreasing execution time probably seldom count as micro benchmarking.

> * What are the dangers of microbenchmarking and how do you avoid it?

The obvious danger is that it tempts developers to optimize the wrong parts of a program. Another danger is that it is notoriously difficult to do measurements of something small accurately. The easiest way to avoid it is probably just to get a good picture of where the most time is spent in the program.

People usually say "don't do micro-benchmarking" but what they probably mean is "don't make optimization decisions based on micro-benchmarks".

> - (or is it a good thing?)

It's not at all a bad thing per se as others here, and many webpages seem to suggest. It has it's places. I work with program rewriting and runtime aspect weaving etc. We usually publish micro-benchmarks of our added instructions, not to guide any optimizations, but making sure that our extra code has close to no impact on the execution of the rewritten program.

It's an art however, especially in the context of a VM that has JIT, warm-up times etc. A well described approach for Java is descrbed here (archived).

Solution 4 - Java

The book 'Java Performance: The Definitive Guide' has this definition and example about microbenchmarks:

  1. Microbenchmarks

    A microbenchmark is a test designed to measure a very small unit performance: the time to call a synchronized method versus a non-synchronized method; the overhead in creating a thread versus using a thread pool; the time to execute one arithmetic algoritm versus an alternate implementation; and so on.

    Microbenchmarks may seem like a good idea, but they are very difficult to write correctly. Consider the following code, which is an attempt to write a microbenchmark that tests the performance of different implementations of a method to compute the 50th Fibonacci number:

> public void doTest(){ > double l; > long then = System.currentTimeMillis(); >
> for(int i = 0; i < nLoops; i++){ > l = fibImpl1(50); > } >
> long now = system.currentTimeMillis(); > System.out.println("Elapsed time: " + (now - then)) >
> } >
> ... >
> private double fibImpl1(int n){ > if(n < 0) throw new IllegalArgumentException("Must be > 0"); > if(n == 0) return 0d; > if(n == 1) return 1d; > double d = fibImpl1(n - 2) + fibImpl(n - 1); > if(Double.isInfinited(d)) throw new ArithmeticException("Overflow"); > return d; > }

Microbenchmarks must use their results.

The biggest problem with this code is that it never actually changes any program state. Because the result of the Fibonacci calculation is never used, the compiler is free to discard that calculation, A smart compiler (including current Java 7 and 8 compilers) will end up executing this code:

> long then = System.currentTimeMillis(); > long now = System.currentTimeMillis(); > System.out.println("Elapsed time: " + (now - then));

As a result, the elapsed time will be only a few milliseconds, regardless of the implementation of the Fibonacci method, or the number of times the loop is supposed to be executed.

There is a way around that particular issue: ensure that each result is read, nor simply written. In practice, changing the definition of l from a local variable to an instance variable (declared with the volatile keyword) will allow the performance of the method to be measured.

Solution 5 - Java

Here are some good articles by Brian Goetz that explain why (micro)benchmarking is especially hard in Java:

Solution 6 - Java

Microbenchmarking is benchmarking I don't think is worthwhile. Effective benchmarking is benchmarking I think is worth the time.

Generally speaking, microbenchmarking is (as in silico says) attempting to measure the performance of some very granular task, which is both hard to do well and usually pointless in the context of actual performance headaches.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionpolygenelubricantsView Question on Stackoverflow
Solution 1 - JavaIn silicoView Answer on Stackoverflow
Solution 2 - JavaStephen CView Answer on Stackoverflow
Solution 3 - JavaaioobeView Answer on Stackoverflow
Solution 4 - JavacastilloRTView Answer on Stackoverflow
Solution 5 - JavaJesperView Answer on Stackoverflow
Solution 6 - JavaDan Davies BrackettView Answer on Stackoverflow