Do any compilers for the JVM use the "wide" goto?
Java Problem Overview
I figure most of you know that goto is a reserved keyword in the Java language but is not actually used. You probably also know that goto is a Java Virtual Machine (JVM) opcode. I reckon all the sophisticated control flow structures of Java, Scala and Kotlin are, at the JVM level, implemented using some combination of goto and ifeq, ifle, iflt, etc.
Looking at the JVM spec (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.goto_w), I see there's also a goto_w opcode. Whereas goto takes a 2-byte branch offset, goto_w takes a 4-byte branch offset. The spec states that
> Although the goto_w instruction takes a 4-byte branch offset, other factors limit the size of a method to 65535 bytes (§4.11). This limit may be raised in a future release of the Java Virtual Machine.
It sounds to me like goto_w is future-proofing, like some of the other *_w opcodes. But it also occurs to me that maybe goto_w could be used with the two more significant bytes zeroed out and the two less significant bytes the same as for goto, with adjustments as needed.
For example, given the bytecode for this Java switch-case (or Scala match-case):
12: lookupswitch {
       112785: 48 // case "red"
      3027034: 76 // case "blue"
     98619139: 62 // case "green"
      default: 87
    }
48: aload_2
49: ldc #17 // String red
51: invokevirtual #18 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
54: ifeq 87
57: iconst_0
58: istore_3
59: goto 87
62: aload_2
63: ldc #19 // String green
65: invokevirtual #18 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
68: ifeq 87
71: iconst_1
72: istore_3
73: goto 87
76: aload_2
77: ldc #20 // String blue
79: invokevirtual #18
// etc.
we could rewrite it as
12: lookupswitch {
       112785: 48
      3027034: 80
     98619139: 64
      default: 91
    }
48: aload_2
49: ldc #17 // String red
51: invokevirtual #18 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
54: ifeq 91 // 00 5B
57: iconst_0
58: istore_3
59: goto_w 91 // 00 00 00 5B
64: aload_2
65: ldc #19 // String green
67: invokevirtual #18 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
70: ifeq 91
73: iconst_1
74: istore_3
75: goto_w 91
80: aload_2
81: ldc #20 // String blue
83: invokevirtual #18
// etc.
I haven't actually tried this, since I've probably made a mistake changing the "line numbers" to accommodate the goto_ws. But since it's in the spec, it should be possible to do it.
My question is whether there is a reason a compiler or other generator of bytecode might use goto_w with the current 65535 limit, other than to show that it can be done?
Java Solutions
Solution 1 - Java
The size of the method code can be as large as 64K.
The branch offset of the short goto is a signed 16-bit integer: from -32768 to 32767.
So, the short offset is not enough to make a jump from the beginning of a 64K method to the end.
Even javac sometimes emits goto_w. Here is an example:
public class WideGoto {
    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000_000; ) {
            i += 123456;
            // ... repeat 10K times ...
        }
    }
}
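Typing the loop body out by hand is impractical, so here is a minimal sketch of a generator for that source file. The class name WideGotoSourceGenerator and the method generate are hypothetical, and the repeat count of 10,000 is only my reading of the "repeat 10K times" comment above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: writes a WideGoto.java whose loop body repeats
// "i += 123456;" the requested number of times.
public class WideGotoSourceGenerator {

    static String generate(int repeats) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class WideGoto {\n");
        sb.append("    public static void main(String[] args) {\n");
        sb.append("        for (int i = 0; i < 1_000_000_000; ) {\n");
        for (int n = 0; n < repeats; n++) {
            sb.append("            i += 123456;\n");
        }
        sb.append("        }\n");
        sb.append("    }\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: 10_000 repeats, taken from the "repeat 10K times" comment.
        String source = generate(10_000);
        Files.write(Paths.get("WideGoto.java"), source.getBytes(StandardCharsets.UTF_8));
    }
}
```

Running the generator and then compiling the resulting WideGoto.java with javac should give a class file to inspect.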
Disassembling with javap -c:
public static void main(java.lang.String[]);
Code:
0: iconst_0
1: istore_1
2: iload_1
3: ldc #2
5: if_icmplt 13
8: goto_w 50018 // <<< Here it is! A jump to the end of the loop
...
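To make the range argument concrete, here is a small sketch that encodes both instructions by hand. The opcode values 0xA7 (goto) and 0xC8 (goto_w) come from the JVM spec; the class and method names are made up for illustration, and the offset 50010 is derived from the listing above (target 50018 minus the instruction's own address 8).

```java
import java.nio.ByteBuffer;

// Illustrative only: hand-encodes goto and goto_w with a relative branch offset.
public class GotoEncoding {
    static final int GOTO = 0xA7;   // goto: opcode + signed 16-bit offset
    static final int GOTO_W = 0xC8; // goto_w: opcode + signed 32-bit offset

    // Encodes a 3-byte goto; rejects offsets outside -32768..32767.
    static byte[] encodeGoto(int offset) {
        if (offset < Short.MIN_VALUE || offset > Short.MAX_VALUE) {
            throw new IllegalArgumentException("offset does not fit in 16 bits: " + offset);
        }
        // ByteBuffer is big-endian by default, which matches bytecode operand order.
        return ByteBuffer.allocate(3).put((byte) GOTO).putShort((short) offset).array();
    }

    // Encodes a 5-byte goto_w; any int offset is representable.
    static byte[] encodeGotoW(int offset) {
        return ByteBuffer.allocate(5).put((byte) GOTO_W).putInt(offset).array();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int offset = 50018 - 8; // relative offset of the jump in the listing above
        System.out.println(hex(encodeGotoW(offset))); // C8 00 00 C3 5A
        // Truncating to 16 bits would turn the forward jump into a backward one:
        System.out.println((short) offset);           // -15526
        try {
            encodeGoto(offset);
        } catch (IllegalArgumentException e) {
            System.out.println("goto cannot encode it: " + e.getMessage());
        }
    }
}
```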
Solution 2 - Java
There is no reason to use goto_w when the branch fits into a goto. But you seem to have missed that the branches are relative, using a signed offset, as a branch can also go backward.
You don’t notice it when looking at the output of a tool like javap, as it calculates the resulting absolute target address before printing.
So goto’s range of -32768 … +32767 is not always enough to address each possible target location in the 0 … +65535 range.
For example, the following method will have a goto_w instruction at the beginning:
public static void methodWithLargeJump(int i) {
    for(; i == 0;) {
        try {x();} finally { switch(i){ case 1: try {x();} finally { switch(i){ case 1:
        try {x();} finally { switch(i){ case 1: try {x();} finally { switch(i){ case 1:
        try {x();} finally { switch(i){ case 1: try {x();} finally { switch(i){ case 1:
        try {x();} finally { switch(i){ case 1: try {x();} finally { switch(i){ case 1:
        try {x();} finally { switch(i){ case 1: try {x();} finally { switch(i){ case 1:
        } } } } } } } } } } } } } } } } } } } }
    }
}
static void x() {}
Compiled from "Main.java"
class LargeJump {
public static void methodWithLargeJump(int);
Code:
0: iload_0
1: ifeq 9
4: goto_w 57567
…
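As a rough illustration of the absolute-address calculation described above, the sketch below reads the raw operand of a goto or goto_w and adds it to the instruction's own address. The class and method names are hypothetical, and the byte sequence is reconstructed from the listing (57567 - 4 = 57563 = 0x0000E0DB) rather than dumped from a real class file.

```java
import java.nio.ByteBuffer;

// Illustrative only: turns the relative operand of goto/goto_w into the
// absolute target address that a disassembler would print.
public class BranchTargets {

    static int absoluteTarget(byte[] code, int pc) {
        int opcode = code[pc] & 0xFF;
        // Operand bytes start right after the opcode; ByteBuffer reads big-endian.
        ByteBuffer operand = ByteBuffer.wrap(code, pc + 1, code.length - pc - 1);
        switch (opcode) {
            case 0xA7: // goto: signed 16-bit offset
                return pc + operand.getShort();
            case 0xC8: // goto_w: signed 32-bit offset
                return pc + operand.getInt();
            default:
                throw new IllegalArgumentException("not a goto/goto_w at pc " + pc);
        }
    }

    public static void main(String[] args) {
        // Bytes 0..3 are placeholders for iload_0 and ifeq; goto_w sits at pc 4.
        byte[] code = new byte[9];
        code[4] = (byte) 0xC8;
        code[5] = 0x00;
        code[6] = 0x00;
        code[7] = (byte) 0xE0;
        code[8] = (byte) 0xDB;
        System.out.println(absoluteTarget(code, 4)); // 57567
    }
}
```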
Solution 3 - Java
It appears that in some compilers (tried in 1.6.0 and 11.0.7), if a method is large enough to ever need goto_w, the compiler uses goto_w exclusively. Even for very local jumps, it still uses goto_w.