Do any compilers for the JVM use the "wide" goto?

Java, JVM, Goto

Java Problem Overview


I figure most of you know that goto is a reserved keyword in the Java language but is not actually used. And you probably also know that goto is a Java Virtual Machine (JVM) opcode. I reckon all the sophisticated control flow structures of Java, Scala and Kotlin are, at the JVM level, implemented using some combination of goto and ifeq, ifle, iflt, etc.

Looking at the JVM spec https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.goto_w I see there's also a goto_w opcode. Whereas goto takes a 2-byte branch offset, goto_w takes a 4-byte branch offset. The spec states that

> Although the goto_w instruction takes a 4-byte branch offset, other factors limit the size of a method to 65535 bytes (§4.11). This limit may be raised in a future release of the Java Virtual Machine.

It sounds to me like goto_w is future-proofing, like some of the other *_w opcodes. But it also occurs to me that maybe goto_w could be used with the two more significant bytes zeroed out and the two less significant bytes the same as for goto, with adjustments as needed.

For example, given this bytecode for a Java switch-case (or Scala match-case) on strings:

     12: lookupswitch  {
                112785: 48 // case "red"
               3027034: 76 // case "blue"
              98619139: 62 // case "green"
               default: 87
          }
      48: aload_2
      49: ldc           #17                 // String red
      51: invokevirtual #18
            // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      54: ifeq          87
      57: iconst_0
      58: istore_3
      59: goto          87
      62: aload_2
      63: ldc           #19                 // String green
      65: invokevirtual #18
            // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      68: ifeq          87
      71: iconst_1
      72: istore_3
      73: goto          87
      76: aload_2
      77: ldc           #20                 // String blue
      79: invokevirtual #18 
      // etc.

we could rewrite it as

     12: lookupswitch  { 
                112785: 48
               3027034: 80
              98619139: 64
               default: 91
          }
      48: aload_2
      49: ldc           #17                 // String red
      51: invokevirtual #18
            // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      54: ifeq          91 // 00 5B
      57: iconst_0
      58: istore_3
      59: goto_w        91 // 00 00 00 5B
      64: aload_2
      65: ldc           #19                 // String green
      67: invokevirtual #18
            // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      70: ifeq          91
      73: iconst_1
      74: istore_3
      75: goto_w        91
      80: aload_2
      81: ldc           #20                 // String blue
      83: invokevirtual #18 
      // etc.

I haven't actually tried this, since I've probably made a mistake changing the "line numbers" to accommodate the goto_ws. But since it's in the spec, it should be possible to do it.

My question is whether there is a reason a compiler or other generator of bytecode might use goto_w with the current 65535 limit other than to show that it can be done?

Java Solutions


Solution 1 - Java

The size of the method code can be as large as 64K.

The branch offset of the short goto is a signed 16-bit integer: from -32768 to 32767.

So the short offset is not enough to jump from the beginning of a 64K method to the end.
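The range argument can be checked with a few lines of Java. This is only a sketch: `BranchRange`, `fitsInGoto` and `opcodeFor` are hypothetical names made up for illustration, and picking the narrowest opcode is an assumption about what a bytecode generator could do, not a description of what javac does. The opcode values 0xA7 (goto) and 0xC8 (goto_w) are the ones listed in the JVM spec.

```java
final class BranchRange {
    static final int GOTO = 0xA7;    // goto: signed 2-byte branch offset
    static final int GOTO_W = 0xC8;  // goto_w: signed 4-byte branch offset

    /** True if a jump from the instruction at 'from' to 'target' fits in a signed 16-bit offset. */
    static boolean fitsInGoto(int from, int target) {
        int offset = target - from; // branch offsets are relative to the branching instruction
        return offset >= Short.MIN_VALUE && offset <= Short.MAX_VALUE;
    }

    /** Hypothetical policy: use the short form whenever the offset fits. */
    static int opcodeFor(int from, int target) {
        return fitsInGoto(from, target) ? GOTO : GOTO_W;
    }

    public static void main(String[] args) {
        System.out.println(fitsInGoto(59, 87));    // true: offset is 28
        System.out.println(fitsInGoto(8, 50018));  // false: offset is 50010, above 32767
    }
}
```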

Even javac sometimes emits goto_w. Here is an example:

public class WideGoto {

    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000_000; ) {
            i += 123456;
            // ... repeat 10K times ...
        }
    }
}
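Typing the loop body by hand is tedious, so the source file can be generated instead. A minimal sketch, assuming exactly 10,000 repetitions as the comment suggests; `WideGotoGenerator` and `generate` are hypothetical names, not part of the original answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class WideGotoGenerator {
    /** Builds the source of WideGoto with the increment repeated 'repetitions' times. */
    static String generate(int repetitions) {
        StringBuilder src = new StringBuilder();
        src.append("public class WideGoto {\n");
        src.append("    public static void main(String[] args) {\n");
        src.append("        for (int i = 0; i < 1_000_000_000; ) {\n");
        for (int n = 0; n < repetitions; n++) {
            src.append("            i += 123456;\n");
        }
        src.append("        }\n");
        src.append("    }\n");
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) throws IOException {
        // Writes WideGoto.java into the current working directory.
        Files.write(Paths.get("WideGoto.java"),
                generate(10_000).getBytes(StandardCharsets.UTF_8));
    }
}
```

After running the generator, `javac WideGoto.java` produces the class file that is inspected next.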

Disassembling with javap -c:

  public static void main(java.lang.String[]);
    Code:
       0: iconst_0
       1: istore_1
       2: iload_1
       3: ldc           #2
       5: if_icmplt     13
       8: goto_w        50018     // <<< Here it is! A jump to the end of the loop
          ...

Solution 2 - Java

There is no reason to use goto_w when the branch fits into a goto. But you seem to have missed that the branches are relative, using a signed offset, as a branch can also go backward.

You don’t notice it when looking at the output of a tool like javap, as it calculates the resulting absolute target address before printing.

So goto’s range of -32768 … +32767 is not always enough to address each possible target location in the 0 … +65535 range.
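To make the relative encoding concrete, here is a rough sketch of how a javap-like tool could turn the raw operand bytes into the absolute target it prints. `BranchDecoder` and `target` are hypothetical names chosen for this illustration; only the opcode values (0xA7 for goto, 0xC8 for goto_w) and the big-endian signed offsets come from the JVM spec.

```java
final class BranchDecoder {
    /** Returns the absolute target of the goto or goto_w located at index pc. */
    static int target(byte[] code, int pc) {
        int opcode = code[pc] & 0xFF;
        switch (opcode) {
            case 0xA7: { // goto: signed 16-bit offset, big-endian
                int offset = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                return pc + offset;
            }
            case 0xC8: { // goto_w: signed 32-bit offset, big-endian
                int offset = ((code[pc + 1] & 0xFF) << 24)
                        | ((code[pc + 2] & 0xFF) << 16)
                        | ((code[pc + 3] & 0xFF) << 8)
                        | (code[pc + 4] & 0xFF);
                return pc + offset;
            }
            default:
                throw new IllegalArgumentException("not a goto/goto_w at pc " + pc);
        }
    }

    public static void main(String[] args) {
        byte[] code = new byte[64];
        // A goto at 59 that jumps to 87 stores the offset 28 (0x001C), not 87.
        code[59] = (byte) 0xA7;
        code[60] = 0x00;
        code[61] = 0x1C;
        System.out.println(target(code, 59)); // 87
    }
}
```

The same holds for goto_w: a jump from 59 to 91 would be encoded as C8 00 00 00 20, because the stored offset is 32.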

For example, the following method will have a goto_w instruction at the beginning:

public static void methodWithLargeJump(int i) {
    for(; i == 0;) {
        try {x();} finally { switch(i){ case 1: try {x();} finally { switch(i){ case 1: 
        try {x();} finally { switch(i){ case 1: try {x();} finally { switch(i){ case 1: 
        try {x();} finally { switch(i){ case 1: try {x();} finally { switch(i){ case 1: 
        try {x();} finally { switch(i){ case 1: try {x();} finally { switch(i){ case 1: 
        try {x();} finally { switch(i){ case 1: try {x();} finally { switch(i){ case 1: 
        } } } } } } } } } } } } } } } } } } } } 
    }
}
static void x() {}

Demo on Ideone

Compiled from "Main.java"
class LargeJump {
  public static void methodWithLargeJump(int);
    Code:
       0: iload_0
       1: ifeq          9
       4: goto_w        57567

Solution 3 - Java

It appears that with some compilers (tried with 1.6.0 and 11.0.7), if a method is large enough to ever need goto_w, the compiler uses goto_w exclusively. Even very local jumps are emitted as goto_w.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Alonso del Arte | View Question on Stackoverflow |
| Solution 1 - Java | apangin | View Answer on Stackoverflow |
| Solution 2 - Java | Holger | View Answer on Stackoverflow |
| Solution 3 - Java | David G. | View Answer on Stackoverflow |