Java thread executing remainder operation in a loop blocks all other threads

JavaMultithreading

Java Problem Overview


The following code snippet executes two threads, one is a simple timer logging every second, the second is an infinite loop that executes a remainder operation:

public class TestBlockingThread {
    private static final Logger LOGGER = LoggerFactory.getLogger(TestBlockingThread.class);

    public static final void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            int i = 0;
            while (true) {
                i++;
                if (i != 0) {
                    boolean b = 1 % i == 0;
                }
            }
        };

        new Thread(new LogTimer()).start();
        Thread.sleep(2000);
        new Thread(task).start();
    }

    public static class LogTimer implements Runnable {
        @Override
        public void run() {
            while (true) {
                long start = System.currentTimeMillis();
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    // do nothing
                }
                LOGGER.info("timeElapsed={}", System.currentTimeMillis() - start);
            }
        }
    }
}

This gives the following result:

[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1004
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1003
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=13331
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1006
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1003
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1004
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1004

I don't understand why the infinite task blocks all other threads for 13.3 seconds. I tried to change thread priorities and other settings, nothing worked.

If you have any suggestions to fix this (including tweaking OS context switching settings) please let me know.

Java Solutions


Solution 1 - Java

After all the explanations here (thanks to Peter Lawrey) we found that the main source of this pause is that safepoint inside the loop is reached rather rarely so it takes a long time to stop all threads for JIT-compiled code replacement.

But I decided to go deeper and find why safepoint is reached rarely. I found it a bit confusing why the back jump of while loop is not "safe" in this case.

So I summon -XX:+PrintAssembly in all its glory to help

-XX:+UnlockDiagnosticVMOptions \
-XX:+TraceClassLoading \
-XX:+DebugNonSafepoints \
-XX:+PrintCompilation \
-XX:+PrintGCDetails \
-XX:+PrintStubCode \
-XX:+PrintAssembly \
-XX:PrintAssemblyOptions=-Mintel

After some investigation I found that after third recompilation of lambda C2 compiler threw away safepoint polls inside loop completely.

UPDATE

During the profiling stage variable i was never seen equal to 0. That's why C2 speculatively optimized this branch away, so that the loop was transformed to something like

for (int i = OSR_value; i != 0; i++) {
    if (1 % i == 0) {
        uncommon_trap();
    }
}
uncommon_trap();

Note that originally infinite loop was reshaped to a regular finite loop with a counter! Due to JIT optimization to eliminate safepoint polls in finite counted loops, there was no safepoint poll in this loop either.

After some time, i wrapped back to 0, and the uncommon trap was taken. The method was deoptimized and continued execution in the interpreter. During recompilation with a new knowledge C2 recognized the infinite loop and gave up compilation. The rest of the method proceeded in the interpreter with proper safepoints.

There is a great must-read blog post "Safepoints: Meaning, Side Effects and Overheads" by Nitsan Wakart covering safepoints and this particular issue.

Safepoint elimination in very long counted loops is known to be a problem. The bug JDK-5014723 (thanks to Vladimir Ivanov) addresses this problem.

The workaround is available until the bug is finally fixed.

  1. You can try using -XX:+UseCountedLoopSafepoints (it will cause overall performance penalty and may lead to JVM crash JDK-8161147). After using it C2 compiler continue keeping safepoints at the back jumps and original pause disappears completely.

  2. You can explicitly disable compilation of problematic method by using
    -XX:CompileCommand='exclude,binary/class/Name,methodName'

  3. Or you can rewrite your code by adding safepoint manually. For example Thread.yield() call at the end of cycle or even changing int i to long i (thanks, Nitsan Wakart) will also fix pause.

Solution 2 - Java

In short, the loop you have has no safe point inside it except when i == 0 is reached. When this method is compiled and triggers the code to be replaced it needs to bring all the threads to a safe point, but this takes a very long time, locking up not just the thread running the code but all threads in the JVM.

I added the following command line options.

-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintCompilation

I also modified the code to use floating point which appears to take longer.

boolean b = 1.0 / i == 0;

And what I see in the output is

timeElapsed=100
Application time: 0.9560686 seconds
  41423  280 %     4       TestBlockingThread::lambda$main$0 @ -2 (27 bytes)   made not entrant
Total time for which application threads were stopped: 40.3971116 seconds, Stopping threads took: 40.3967755 seconds
Application time: 0.0000219 seconds
Total time for which application threads were stopped: 0.0005840 seconds, Stopping threads took: 0.0000383 seconds
  41424  281 %     3       TestBlockingThread::lambda$main$0 @ 2 (27 bytes)
timeElapsed=40473
  41425  282 %     4       TestBlockingThread::lambda$main$0 @ 2 (27 bytes)
  41426  281 %     3       TestBlockingThread::lambda$main$0 @ -2 (27 bytes)   made not entrant
timeElapsed=100

Note: for code to be replaced, threads have to be stopped at a safe point. However it appears here that such a safe point is reached very rarely (possibly only when i == 0 Changing the task to

Runnable task = () -> {
    for (int i = 1; i != 0 ; i++) {
        boolean b = 1.0 / i == 0;
    }
};

I see a similar delay.

timeElapsed=100
Application time: 0.9587419 seconds
  39044  280 %     4       TestBlockingThread::lambda$main$0 @ -2 (28 bytes)   made not entrant
Total time for which application threads were stopped: 38.0227039 seconds, Stopping threads took: 38.0225761 seconds
Application time: 0.0000087 seconds
Total time for which application threads were stopped: 0.0003102 seconds, Stopping threads took: 0.0000105 seconds
timeElapsed=38100
timeElapsed=100

Adding code to the loop carefully you get a longer delay.

for (int i = 1; i != 0 ; i++) {
    boolean b = 1.0 / i / i == 0;
}

gets

 Total time for which application threads were stopped: 59.6034546 seconds, Stopping threads took: 59.6030773 seconds

However, change the code to use a native method which always has a safe point (if it is not an intrinsic)

for (int i = 1; i != 0 ; i++) {
    boolean b = Math.cos(1.0 / i) == 0;
}

prints

Total time for which application threads were stopped: 0.0001444 seconds, Stopping threads took: 0.0000615 seconds

Note: adding if (Thread.currentThread().isInterrupted()) { ... } to a loop adds a safe point.

Note: This happened on a 16 core machine so there is no lack of CPU resources.

Solution 3 - Java

Found the answer of why. They are called safepoints, and are best known as the Stop-The-World that happens because of GC.

See this articles: Logging stop-the-world pauses in JVM

> Different events can cause the JVM to pause all the application threads. Such pauses are called Stop-The-World (STW) pauses. The most common cause for an STW pause to be triggered is garbage collection (example in github) , but different JIT actions (example), biased lock revocation (example), certain JVMTI operations , and many more also require the application to be stopped. > > The points at which the application threads may be safely stopped are called, surprise, safepoints. This term is also often used to refer to all the STW pauses. > > It is more or less common that GC logs are enabled. However, this does not capture information on all the safepoints. To get it all, use these JVM options: > > -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime > > If you are wondering about the naming explicitly referring to GC, don’t be alarmed – turning on these options logs all of the safepoints, not just garbage collection pauses. If you run a following example (source in github) with the flags specified above.

Reading the HotSpot Glossary of Terms, it defines this:

> safepoint > > A point during program execution at which all GC roots are known and all heap object contents are consistent. From a global point of view, all threads must block at a safepoint before the GC can run. (As a special case, threads running JNI code can continue to run, because they use only handles. During a safepoint they must block instead of loading the contents of the handle.) From a local point of view, a safepoint is a distinguished point in a block of code where the executing thread may block for the GC. Most call sites qualify as safepoints. There are strong invariants which hold true at every safepoint, which may be disregarded at non-safepoints. Both compiled Java code and C/C++ code be optimized between safepoints, but less so across safepoints. The JIT compiler emits a GC map at each safepoint. C/C++ code in the VM uses stylized macro-based conventions (e.g., TRAPS) to mark potential safepoints.

Running with the above mentioned flags, I get this output:

Application time: 0.9668750 seconds
Total time for which application threads were stopped: 0.0000747 seconds, Stopping threads took: 0.0000291 seconds
timeElapsed=1015
Application time: 1.0148568 seconds
Total time for which application threads were stopped: 0.0000556 seconds, Stopping threads took: 0.0000168 seconds
timeElapsed=1015
timeElapsed=1014
Application time: 2.0453971 seconds
Total time for which application threads were stopped: 10.7951187 seconds, Stopping threads took: 10.7950774 seconds
timeElapsed=11732
Application time: 1.0149263 seconds
Total time for which application threads were stopped: 0.0000644 seconds, Stopping threads took: 0.0000368 seconds
timeElapsed=1015

Notice the third STW event:
Total time stopped: 10.7951187 seconds
Stopping threads took: 10.7950774 seconds

JIT itself took virtually no time, but once the JVM had decided to perform a JIT compilation, it entered STW mode, however since the code to be compiled (the infinite loop) doesn't have a call site, no safepoint was ever reached.

The STW ends when JIT eventually gives up waiting and concludes the code is in an infinite loop.

Solution 4 - Java

After following the comment threads and some testing on my own, I believe that the pause is caused by the JIT compiler. Why the JIT compiler is taking such a long time is beyond my ability to debug.

However, since you only asked for how to prevent this, I have a solution:

Pull your infinite loop into a method where it can be excluded from the JIT compiler

public class TestBlockingThread {
	private static final Logger LOGGER = Logger.getLogger(TestBlockingThread.class.getName());

	public static final void main(String[] args) throws InterruptedException     {
		Runnable task = () -> {
			infLoop();
		};
		new Thread(new LogTimer()).start();
		Thread.sleep(2000);
		new Thread(task).start();
	}
	
	private static void infLoop()
	{
		int i = 0;
		while (true) {
			i++;
			if (i != 0) {
				boolean b = 1 % i == 0;
			}
		}
	}

Run your program with this VM argument:

-XX:CompileCommand=exclude,PACKAGE.TestBlockingThread::infLoop (replace PACKAGE with your package information)

You should get a message like this to indicate when the method would have been JIT-compiled:
### Excluding compile: static blocking.TestBlockingThread::infLoop
you may notice that I put the class into a package called blocking

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionkms333View Question on Stackoverflow
Solution 1 - JavavsminkovView Answer on Stackoverflow
Solution 2 - JavaPeter LawreyView Answer on Stackoverflow
Solution 3 - JavaAndreasView Answer on Stackoverflow
Solution 4 - JavaJeutnargView Answer on Stackoverflow