Can the JVM GC move objects in the middle of a reference comparison, causing a comparison to fail even when both sides refer to the same object?

Java, Garbage Collection, JVM

Java Problem Overview


It's well known that GCs will sometimes move objects around in memory. It's my understanding that as long as all references are updated when the object is moved (before any user code is called), this should be perfectly safe.

However, I saw someone mention that reference comparison could be unsafe because the GC might move the object in the middle of the comparison, so that the comparison could fail even when both references refer to the same object.

I.e., is there any situation under which the following code would not print "true"?

Foo foo = new Foo();
Foo bar = foo;
if(foo == bar) {
    System.out.println("true");
}

I tried googling this and the lack of reliable results leads me to believe that the person who stated this was wrong, but I did find an assortment of forum posts (like this one) that seemed to indicate that he was correct. But that thread also has people saying that it shouldn't be the case.

Java Solutions


Solution 1 - Java

Java bytecode instructions are always atomic in relation to the GC (i.e. no GC cycle can happen while a single instruction is being executed).

The only time the GC will run is between two bytecode instructions.

Looking at the bytecode that javac generates for the if statement in your code, we can simply check whether a GC would have any effect:

// a GC here wouldn't change anything
ALOAD 1
// a GC cycle here would update all references accordingly, even the one on the stack
ALOAD 2
// same here. A GC cycle will update all references to the object on the stack
IF_ACMPNE L3
// this is the comparison of the two references. no cycle can happen while this comparison
// "is running" so there won't be any problems with this either

Additionally, even if the GC were able to run during the execution of a bytecode instruction, the identity of the object would not change. It's still the same object before and after the cycle.

So, in short, the answer to your question is no: it will always print "true".
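For anyone who wants to try this empirically, here is a minimal sketch that compares two aliases of one object while creating garbage and requesting collections. The class and method names (`GcCompareDemo`, `countMismatches`) are made up for illustration, `System.gc()` is only a request that the JVM may ignore, and a JIT compiler may fold the comparison away entirely, so this is a demonstration rather than a proof.

```java
class GcCompareDemo {
    static class Foo { }

    // Compares two aliases of the same object under allocation pressure.
    // Returns how many comparisons reported "different objects".
    static int countMismatches(int iterations) {
        Foo foo = new Foo();
        Foo bar = foo;
        int mismatches = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[64 * 1024]; // short-lived allocation to create GC pressure
            garbage[0] = (byte) i;
            if (i % 1_000 == 0) {
                System.gc(); // a hint only; the JVM is free to ignore it
            }
            if (foo != bar) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(20_000)); // prints "mismatches: 0"
    }
}
```

Running it with GC logging enabled (for example `-Xlog:gc` on recent JDKs) shows whether collections actually happened during the loop.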

Solution 2 - Java

Source:

https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.21.3

The short answer, going by the Java 8 specification, is: no.

The == operator, applied to references, always performs an identity check: the result is true exactly when both operands are null or both refer to the same object. Even if the object is moved, it is still the same object.

If you see such an effect, you have just found a JVM bug. Go submit it.

It could, of course, be that some obscure implementation of the JVM does not enforce this for whatever strange performance reason. If that is the case, it would be wise to simply move on from that JVM...
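As a small illustration of the rule in JLS 15.21.3, the sketch below contrasts `==` on references (identity) with `equals` (content). The class and method names are invented for this example.

```java
class IdentityVsEquals {
    // True only when both references denote the same object (or both are null).
    static boolean sameObject(Object left, Object right) {
        return left == right;
    }

    public static void main(String[] args) {
        String a = new String("foo"); // 'new' always creates a distinct object
        String b = new String("foo");
        String alias = a;

        System.out.println(sameObject(a, alias));   // true: one object, two references
        System.out.println(sameObject(a, b));       // false: two objects with equal content
        System.out.println(a.equals(b));            // true: content comparison
        System.out.println(sameObject(null, null)); // true: both operands are null
    }
}
```

Nothing in this outcome depends on where the objects live in memory, which is why a moving collector cannot change it.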

Solution 3 - Java

TL;DR

You should not think about that kind of stuff at all; it's a dark place. Java has clearly laid out its specification, and you should not doubt it, ever.

> 2.7. Representation of Objects
>
> The Java Virtual Machine does not mandate any particular internal structure for objects.

Source: JVMS SE8.

I doubt it! If you doubt this very basic operator, you may find yourself doubting everything else. Getting frustrated and paranoid, with trust issues, is not where you want to be.

What if it happens to me? Such a bug should not exist. The Oracle discussion you linked reports a bug that happened years ago, and the poster in that discussion decided to bring it up for no apparent reason, without any reliable documentation that such a bug exists nowadays. However, if such a bug, or any other, has happened to you, please submit it here.

To put your worries to rest: some JVM implementations have handled this with a pointer-to-pointer approach, in which a reference points into a handle table maintained by the JVM; you can read more about its efficiency here.
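To make the pointer-to-pointer idea concrete, here is a toy model, not a description of how any particular JVM works. A "reference" is an index into a handle table, and only the table knows the current heap slot. All names (`HandleTableSketch`, `allocate`, `compact`, `slotOf`, `read`) are hypothetical.

```java
class HandleTableSketch {
    private String[] heap = new String[16];        // toy heap: one payload per slot
    private final int[] handleToSlot = new int[8]; // handle id -> current heap slot
    private int handleCount = 0;
    private int nextFreeSlot = 0;

    // Stores the payload and returns a handle; every other slot is skipped
    // to mimic a fragmented heap.
    int allocate(String payload) {
        int slot = nextFreeSlot;
        nextFreeSlot += 2;
        heap[slot] = payload;
        handleToSlot[handleCount] = slot;
        return handleCount++;
    }

    // "Compacting collection": moves every object to the front of the heap
    // and updates only the handle table. Handles held by the program are untouched.
    void compact() {
        String[] compacted = new String[heap.length];
        for (int handle = 0; handle < handleCount; handle++) {
            compacted[handle] = heap[handleToSlot[handle]];
            handleToSlot[handle] = handle;
        }
        heap = compacted;
        nextFreeSlot = handleCount;
    }

    int slotOf(int handle) { return handleToSlot[handle]; }

    String read(int handle) { return heap[handleToSlot[handle]]; }

    public static void main(String[] args) {
        HandleTableSketch vm = new HandleTableSketch();
        vm.allocate("other");
        int foo = vm.allocate("foo");
        int bar = foo;                     // second "reference" to the same object

        int slotBefore = vm.slotOf(foo);
        vm.compact();
        int slotAfter = vm.slotOf(foo);

        System.out.println(slotBefore + " -> " + slotAfter); // prints "2 -> 1": the object moved
        System.out.println(foo == bar);                      // true: handles are unchanged
        System.out.println(vm.read(bar));                    // foo
    }
}
```

In this model a comparison never looks at the heap slot at all, so moving the object cannot affect it. JVMs that use direct pointers get the same guarantee by updating every reference while the affected threads are stopped.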

Solution 4 - Java

GCs only happen at points in the program where the state is well-defined and the JVM has exact knowledge where everything is in registers/the stack/on the heap so all references can be fixed up when an object gets moved.

I.e. they cannot occur between arbitrary assembly instructions. Conceptually, you can think of them as occurring between bytecode instructions of the JVM, with the GC adjusting all references that have been generated by previous instructions.

Solution 5 - Java

You are asking a question with a wrong premise. Since the == operator does not compare memory locations, it isn’t sensitive to changes of memory location per se. The == operator, applied to references, compares the identity of the referred objects, regardless of how the JVM implements it.

To name an example that counteracts the usual understanding, a distributed JVM may have objects held in the RAM of different computers, including the possibility of local copies. So simply comparing addresses won’t work. Of course, it’s up to the JVM implementation to ensure that the semantics, as defined in the Java Language Specification, do not change.

If a particular JVM implementation implements a reference comparison by directly comparing memory locations of objects and has a garbage collector that can change memory locations, of course, it’s up to the JVM to ensure that these two features can’t interfere with each other in an incompatible way.

If you are curious about how this can work, e.g. inside optimized, JIT-compiled code: the granularity isn’t as fine as you might think. Any sequential code, including forward branches, can be considered to run fast enough that garbage collection can be delayed until it completes. So garbage collection can’t happen at arbitrary times inside optimized code, but must be allowed at certain points, e.g.

  • backward branches (note that due to loop unrolling, not every loop iteration implies a backward branch)
  • memory allocations
  • thread synchronization actions
  • invoking a method that hasn’t been inlined/analyzed
  • maybe something special, I forgot

So the JVM emits code containing certain “safe points” at which it is known which references are currently held and how to replace them if necessary, so that changing locations has no impact on correctness. Between these points, the code can run without having to care about memory locations changing, whereas the garbage collector will wait for the code to reach a safe point when necessary, which is guaranteed to happen in finite, rather short time.
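The safe point protocol described above can be imitated in plain Java. This is a toy simulation under stated assumptions, not how a real JVM implements its polls. A "mutator" thread reads a simulated address twice between polls, and a "collector" changes that address only while the mutator is parked at a poll. Every name here (`SafepointSketch`, `pollSafepoint`, `moveObject`, `simulatedAddress`) is hypothetical.

```java
class SafepointSketch {
    private final Object lock = new Object();
    private volatile boolean safepointRequested = false;
    private volatile boolean running = true;
    private boolean mutatorParked = false;            // guarded by lock
    private volatile long simulatedAddress = 0x1000L; // stands in for an object's location
    private long mismatches = 0;                      // written only by the mutator thread

    // Called by the mutator at each loop back edge.
    private void pollSafepoint() throws InterruptedException {
        if (!safepointRequested) {
            return; // fast path: nobody asked us to stop
        }
        synchronized (lock) {
            mutatorParked = true;
            lock.notifyAll();          // tell the collector we are parked
            while (safepointRequested) {
                lock.wait();           // stay parked until the move is done
            }
            mutatorParked = false;
        }
    }

    // Mutator: loads the "address" twice and compares, like comparing two
    // raw pointers to the same object between safe points.
    private void mutatorLoop() {
        try {
            while (running) {
                long left = simulatedAddress;
                long right = simulatedAddress;
                if (left != right) {
                    mismatches++;
                }
                pollSafepoint();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Collector: requests a safe point, waits for the mutator to park,
    // then "moves" the object and releases the mutator.
    private void moveObject() throws InterruptedException {
        synchronized (lock) {
            safepointRequested = true;
            while (!mutatorParked) {
                lock.wait();
            }
            simulatedAddress += 0x100;
            safepointRequested = false;
            lock.notifyAll();
        }
    }

    static long run(int moves) throws InterruptedException {
        SafepointSketch vm = new SafepointSketch();
        Thread mutator = new Thread(vm::mutatorLoop, "mutator");
        mutator.start();
        for (int i = 0; i < moves; i++) {
            vm.moveObject();
        }
        vm.running = false;
        mutator.join(); // join makes the mutator's final count visible here
        return vm.mismatches;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("mismatches: " + run(1_000)); // prints "mismatches: 0"
    }
}
```

Because the address changes only while the mutator is parked inside `pollSafepoint`, the two loads between polls always see the same value. That is the same reasoning that makes a direct pointer comparison safe between safe points.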

But, as said, these are implementation details. On the formal level, things like changing memory locations do not exist, so there is no need to explicitly specify that they are not allowed to change the semantics of Java code. No implementation detail is allowed to do that.

Solution 6 - Java

I understand you are asking this question because someone said it behaves that way, but asking whether it does behave that way isn't really the right approach to evaluating what they said.

What you should really be asking (primarily yourself, others only if you can't decide on an answer) is whether it makes sense for the GC to be allowed to cause a comparison to fail that logically should succeed (basically any comparison that doesn't include a weak reference).

The answer to that is obviously "no", as it would break pretty much anything beyond "hello, world" and probably even that.

So, if allowed, it is a bug -- either in the spec or the implementation. Now since both the spec and the implementation were written by humans, it is possible such a bug exists. If so, it will be reported and almost certainly fixed.

Solution 7 - Java

No, because that would be flagrantly ridiculous and a patent bug.

The GC takes a great deal of care behind the scenes to avoid catastrophically breaking everything. In particular, it will only move objects when threads are paused at safepoints, which are specific places in the running code generated by the JVM for threads to be paused at. A thread at a safepoint is in a known state, where the positions of all the possible object references in registers and memory are known, so the GC can update them to point to the object's new address. Garbage collection won't break your comparison operations.

Solution 8 - Java

A Java variable holds a reference to the "object", not to the memory location where the object is stored.

Java does this because it allows the JVM to manage memory on its own (e.g. via the garbage collector) and to improve overall usage without directly affecting the client program.

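As an example of such an improvement, boxed Integer values in a small range (-128 to 127 by default) are cached, so autoboxing them does not allocate a new object each time. Note that this applies to boxed values, not to a primitive int loop counter such as the one in for (int i = 0; i < 10; i++).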

And as an example of an object reference, just try to create an array and print it:

int[] i = {1,2,3};
System.out.println(i);

You will see that Java prints something starting with [I@. The "[I" part means "array of int", and what follows the "@" is the object's identity hash code in hexadecimal. It is not the memory address!
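The sketch below spells out where that string comes from. `Object.toString` is documented as the class name, then `@`, then the hash code in hexadecimal, and arrays do not override it. The class name `ArrayToStringDemo` and the helper `describe` are made up for this example.

```java
class ArrayToStringDemo {
    // Rebuilds the default Object.toString() format by hand.
    static String describe(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o));
    }

    public static void main(String[] args) {
        int[] i = {1, 2, 3};

        System.out.println(i.getClass().getName());           // [I
        System.out.println(i.toString().equals(describe(i))); // true

        int hashBefore = System.identityHashCode(i);
        System.gc(); // only a request; the array may or may not be moved
        System.out.println(hashBefore == System.identityHashCode(i)); // true
    }
}
```

The identity hash code stays the same for the lifetime of the object, whether or not a collection moves it, so it cannot simply be the current address.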

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Kat | View Question on Stackoverflow
Solution 1 - Java | mhlz | View Answer on Stackoverflow
Solution 2 - Java | Erik Nyström | View Answer on Stackoverflow
Solution 3 - Java | homerun | View Answer on Stackoverflow
Solution 4 - Java | the8472 | View Answer on Stackoverflow
Solution 5 - Java | Holger | View Answer on Stackoverflow
Solution 6 - Java | jmoreno | View Answer on Stackoverflow
Solution 7 - Java | Boann | View Answer on Stackoverflow
Solution 8 - Java | Kraiss | View Answer on Stackoverflow