Can the JVM recover from an OutOfMemoryError without a restart

JavaJvmOut of-Memory

Java Problem Overview


  1. Can the JVM recover from an OutOfMemoryError without a restart if it gets a chance to run the GC before more object allocation requests come in?

  2. Do the various JVM implementations differ in this aspect?

My question is about the JVM recovering and not the user program trying to recover by catching the error. In other words if an OOME is thrown in an application server (jboss/websphere/..) do I have to restart it? Or can I let it run if further requests seem to work without a problem.

Java Solutions


Solution 1 - Java

It may work, but it is generally a bad idea. There is no guarantee that your application will succeed in recovering, or that it will know if it has not succeeded. For example:

  • There really may be not enough memory to do the requested tasks, even after taking recovery steps like releasing block of reserved memory. In this situation, your application may get stuck in a loop where it repeatedly appears to recover and then runs out of memory again.

  • The OOME may be thrown on any thread. If an application thread or library is not designed to cope with it, this might leave some long-lived data structure in an incomplete or inconsistent state.

  • If threads die as a result of the OOME, the application may need to restart them as part of the OOME recovery. At the very least, this makes the application more complicated.

  • Suppose that a thread synchronizes with other threads using notify/wait or some higher level mechanism. If that thread dies from an OOME, other threads may be left waiting for notifies (etc) that never come ... for example. Designing for this could make the application significantly more complicated.

In summary, designing, implementing and testing an application to recover from OOMEs can be difficult, especially if the application (or the framework in which it runs, or any of the libraries it uses) is multi-threaded. It is a better idea to treat OOME as a fatal error.

See also my answer to a related question:

EDIT - in response to this followup question:

> In other words if an OOME is thrown in an application server (jboss/websphere/..) do I have to restart it?

No you don't have to restart. But it is probably wise to, especially if you don't have a good / automated way of checking that the service is running correctly.

The JVM will recover just fine. But the application server and the application itself may or may not recover, depending on how well they are designed to cope with this situation. (My experience is that some app servers are not designed to cope with this, and that designing and implementing a complicated application to recover from OOMEs is hard, and testing it properly is even harder.)

EDIT 2

In response to this comment:

> "other threads may be left waiting for notifies (etc) that never come" Really? Wouldn't the killed thread unwind its stacks, releasing resources as it goes, including held locks?

Yes really! Consider this:

Thread #1 runs this:

    synchronized(lock) {
         while (!someCondition) {
             lock.wait();
         }
    }
    // ...

Thread #2 runs this:

    synchronized(lock) {
         // do something
         lock.notify();
    }

If Thread #1 is waiting on the notify, and Thread #2 gets an OOME in the // do something section, then Thread #2 won't make the notify() call, and Thread #1 may get stuck forever waiting for a notification that won't ever occur. Sure, Thread #2 is guaranteed to release the mutex on the lock object ... but that is not sufficient!

> If not the code ran by the thread is not exception safe, which is a more general problem.

"Exception safe" is not a term I've heard of (though I know what you mean). Java programs are not normally designed to be resilient to unexpected exceptions. Indeed, in a scenario like the above, it is likely to be somewhere between hard and impossible to make the application exception safe.

You'd need some mechanism whereby the failure of Thread #1 (due to the OOME) gets turned into an inter-thread communication failure notification to Thread #2. Erlang does this ... but not Java. The reason they can do this in Erlang is that Erlang processes communicate using strict CSP-like primitives; i.e. there is no sharing of data structures!

(Note that you could get the above problem for just about any unexpected exception ... not just Error exceptions. There are certain kinds of Java code where attempting to recover from an unexpected exception is likely to end badly.)

Solution 2 - Java

The JVM will run the GC when it's on edge of the OutOfMemoryError. If the GC didn't help at all, then the JVM will throw OOME.

You can however catch it and if necessary take an alternative path. Any allocations inside the try block will be GC'ed.

Since the OOME is "just" an Error which you could just catch, I would expect the different JVM implementations to behave the same. I can at least confirm from experience that the above is true for the Sun JVM.

See also:

Solution 3 - Java

I'd say it depends partly on what caused the OutOfMemoryError. If the JVM truly is running low on memory, it might be a good idea to restart it, and with more memory if possible (or a more efficient app). However, I've seen a fair amount of OOMEs that were caused by allocating 2GB arrays and such. In that case, if it's something like a J2EE web app, the effects of the error should be constrained to that particular app, and a JVM-wide restart wouldn't do any good.

Solution 4 - Java

Can it recover? Possibly. Any well-written JVM is only going to throw an OOME after it's tried everything it can to reclaim enough memory to do what you tell it to do. There's a very good chance that this means you can't recover. But...

It depends on a lot of things. For example if the garbage collector isn't a copying collector, the "out of memory" condition may actually be "no chunk big enough left to allocate". The very act of unwinding the stack may have objects cleaned up in a later GC round that leave open chunks big enough for your purposes. In that situation you may be able to restart. It's probably worth at least retrying once as a result. But...

You probably don't want to rely on this. If you're getting an OOME with any regularity, you'd better look over your server and find out what's going on and why. Maybe you have to clean up your code (you could be leaking or making too many temporary objects). Maybe you have to raise your memory ceiling when invoking the JVM. Treat the OOME, even if it's recoverable, as a sign that something bad has hit the fan somewhere in your code and act accordingly. Maybe your server doesn't have to come down NOWNOWNOWNOWNOW, but you will have to fix something before you get into deeper trouble.

Solution 5 - Java

You can increase your odds of recovering from this scenario although its not recommended that you try. What you do is pre-allocate some fixed amount of memory on startup thats dedicated to doing your recovery work, and when you catch the OOM, null out that pre-allocated reference and you're more likely to have some memory to use in your recovery sequence.

I don't know about different JVM implementations.

Solution 6 - Java

Any sane JVM will throw an OutOfMemoryError only if there is nothing the Garbage collector can do. However, if you catch the OutOfMemoryError early enough on the stack frame it can be likely enough that the cause was itself became unreachable and was garbage collected (unless the problem is not in the current thread).

Generally frameworks that run other code, like application servers, attempting to continue in the face of an OME makes sense (as long as it can reasonably release the third-party code), but otherwise, in the general case, recovery should probably consist of bailing and telling the user why, rather than trying to go on as if nothing happened.

To answer your newly updated question: There is no reason to think you need to shut down the server if all is working well. My experience with JBoss is that as long as the OME didn't affect a deployment, things work fine. Sometimes JBoss runs out of permgen space if you do a lot of hot deployment. Then indeed the situation is hopeless and an immediate restart (which will have to be forced with a kill) is inevitable.

Of course each app server (and deployment scenario) will vary and it is really something learned from experience in each case.

Solution 7 - Java

You cannot fully a JVM that had OutOfMemoryError. At least with the oracle JVM you can add -XX:OnOutOfMemoryError="cmd args;cmd args" and take recovery actions, like kill the JVM or send the event somewhere.

Reference: https://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionsengsView Question on Stackoverflow
Solution 1 - JavaStephen CView Answer on Stackoverflow
Solution 2 - JavaBalusCView Answer on Stackoverflow
Solution 3 - JavaAdam CrumeView Answer on Stackoverflow
Solution 4 - JavaJUST MY correct OPINIONView Answer on Stackoverflow
Solution 5 - JavaAmir AfghaniView Answer on Stackoverflow
Solution 6 - JavaYishaiView Answer on Stackoverflow
Solution 7 - Javai000174View Answer on Stackoverflow