Can Java class files use reserved keywords as names?

Tags: Java, Reflection, JVM, .class File

Java Problem Overview


I'm aware that Java-the-compilable-programming-language is not one and the same as Java-the-bytecode-format-for-JVM-execution. There are examples of things that are valid in the .class format but not in the .java source code, such as constructor-less classes and synthetic methods.

  1. If we hand-craft a .class file with a reserved Java language keyword (e.g. int, while) as the class, method, or field name, will the Java virtual machine accept it for loading?

  2. If the class is loaded, does it imply that the only way to access this class or member is through Java reflection, because the name is syntactically illegal in the Java programming language?

Java Solutions


Solution 1 - Java

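Yes, you can use reserved words. Reserved words are a concept of the Java language and matter only to the compiler. The keywords themselves do not appear in the generated bytecode, so the JVM has no list of forbidden names to check against.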

An example of using reserved Java words is in the JVM-based Scala language. Scala has different constructs and syntax than Java, but compiles to Java byte code, for running on a JVM.

This is legal Scala:

class `class`

This defines a class named class with a no-arg constructor. Running javap (a disassembler) on the compiled class.class file shows:

public class class {
    public class();
}

Scala can do the same with any other Java reserved word.

class int
class `while`
class goto

They can also be used for method or field names.

As you suspected, you would not be able to use these classes from Java source code, except through reflection. You could also use them from a similarly "customized" class file, e.g. one generated by the Scala compiler.

In summary, this is a limitation of javac (the compiler), not java (the VM/runtime environment).
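
To make both halves of the question concrete without Scala or an assembler, here is a minimal sketch in plain Java. It writes a tiny class file by hand, loads it, and reads a member through reflection. The names ReservedNameDemo, BytesLoader, buildClassFile, define and readStaticInt are made up for this illustration, and the class file version (52, i.e. Java 8) and the constant 42 are arbitrary choices. The generated class has no methods at all, only one public static final int field.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReservedNameDemo {

    // Emits the bytes of: public class <className> { public static final int <fieldName> = 42; }
    static byte[] buildClassFile(String className, String fieldName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);                           // magic
            out.writeShort(0);                                  // minor_version
            out.writeShort(52);                                 // major_version (Java 8)
            out.writeShort(9);                                  // constant_pool_count = entries + 1
            out.writeByte(1); out.writeUTF(className);          // #1 Utf8: class name
            out.writeByte(7); out.writeShort(1);                // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #3 Utf8: superclass name
            out.writeByte(7); out.writeShort(3);                // #4 Class -> #3
            out.writeByte(1); out.writeUTF(fieldName);          // #5 Utf8: field name
            out.writeByte(1); out.writeUTF("I");                // #6 Utf8: descriptor for int
            out.writeByte(1); out.writeUTF("ConstantValue");    // #7 Utf8: attribute name
            out.writeByte(3); out.writeInt(42);                 // #8 Integer constant
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                  // this_class
            out.writeShort(4);                                  // super_class
            out.writeShort(0);                                  // interfaces_count
            out.writeShort(1);                                  // fields_count
            out.writeShort(0x0019);                             // ACC_PUBLIC | ACC_STATIC | ACC_FINAL
            out.writeShort(5);                                  // field name_index
            out.writeShort(6);                                  // field descriptor_index
            out.writeShort(1);                                  // field attributes_count
            out.writeShort(7);                                  // attribute_name_index
            out.writeInt(2);                                    // attribute_length
            out.writeShort(8);                                  // constantvalue_index
            out.writeShort(0);                                  // methods_count
            out.writeShort(0);                                  // class attributes_count
        } catch (IOException e) {
            throw new UncheckedIOException(e);                  // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Minimal loader that exposes the protected defineClass method.
    static final class BytesLoader extends ClassLoader {
        Class<?> define(String name, byte[] classFile) {
            return defineClass(name, classFile, 0, classFile.length);
        }
    }

    static Class<?> define(String name, byte[] classFile) {
        return new BytesLoader().define(name, classFile);
    }

    // Reads a public static int field by name; the name is just a string here.
    static int readStaticInt(Class<?> type, String fieldName) {
        try {
            return type.getField(fieldName).getInt(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> cls = define("int", buildClassFile("int", "while"));
        System.out.println(cls.getName());
        System.out.println(readStaticInt(cls, "while"));
    }
}
```

On a typical JVM this should print the class name int followed by the field value 42. The loaded class is an ordinary reference type and is unrelated to the primitive int.class. There is no way to write `int.while` in Java source, so the string-based reflective lookup is the only route from regular Java code.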

Solution 2 - Java

The only restrictions on class names at the bytecode level are that they can't contain the characters [, . or ; and that they're at most 65535 bytes long. Among other things, this means that you can freely use reserved words, whitespace, special characters, Unicode, or even weird stuff like newlines.

You can theoretically even use null characters in a class name, but since it's impossible to have a null character in the filename, you can't include such a classfile in a jar. You might be able to create and load one dynamically though.

Here's an example of some of the things that you can do (written in Krakatau assembly):

; Entry point for the jar
.class Main
.super java/lang/Object

.method public static main : ([Ljava/lang/String;)V
    .limit stack 10
    .limit locals 10
	invokestatic int 								hello ()V
	invokestatic "-42" 								hello ()V
	invokestatic "" 								hello ()V
	invokestatic "  some  whitespace and \t tabs" 	hello ()V
	invokestatic "new\nline" 						hello ()V
	invokestatic 'name with "Quotes" in it' 		hello ()V
	return
.end method
.end class


.class int
.super java/lang/Object
.method public static hello : ()V
    .limit stack 2
    .limit locals 0
    getstatic java/lang/System out Ljava/io/PrintStream;
    ldc "Hello from int"
    invokevirtual java/io/PrintStream println (Ljava/lang/Object;)V
	return
.end method
.end class

.class "-42"
.super java/lang/Object
.method public static hello : ()V
    .limit stack 2
    .limit locals 0
    getstatic java/lang/System out Ljava/io/PrintStream;
    ldc "Hello from -42"
    invokevirtual java/io/PrintStream println (Ljava/lang/Object;)V
	return
.end method
.end class

; Even the empty string can be a class name!
.class ""
.super java/lang/Object
.method public static hello : ()V
    .limit stack 2
    .limit locals 0
    getstatic java/lang/System out Ljava/io/PrintStream;
    ldc "Hello from "
    invokevirtual java/io/PrintStream println (Ljava/lang/Object;)V
	return
.end method
.end class

.class "  some  whitespace and \t tabs"
.super java/lang/Object
.method public static hello : ()V
    .limit stack 2
    .limit locals 0
    getstatic java/lang/System out Ljava/io/PrintStream;
    ldc "Hello from   some  whitespace and \t tabs"
    invokevirtual java/io/PrintStream println (Ljava/lang/Object;)V
	return
.end method
.end class

.class "new\nline"
.super java/lang/Object
.method public static hello : ()V
    .limit stack 2
    .limit locals 0
    getstatic java/lang/System out Ljava/io/PrintStream;
    ldc "Hello from new\nline"
    invokevirtual java/io/PrintStream println (Ljava/lang/Object;)V
	return
.end method
.end class

.class 'name with "Quotes" in it'
.super java/lang/Object
.method public static hello : ()V
    .limit stack 2
    .limit locals 0
    getstatic java/lang/System out Ljava/io/PrintStream;
    ldc "Hello from name with \"Quotes\" in it"
    invokevirtual java/io/PrintStream println (Ljava/lang/Object;)V
	return
.end method
.end class

Execution output:

Hello from int
Hello from -42
Hello from
Hello from   some  whitespace and        tabs
Hello from new
line
Hello from name with "Quotes" in it

See Holger's answer (Solution 3 below) for the exact wording of the rules in the JVM specification.

Solution 3 - Java

The restrictions on names are laid down in the JVM specification:

> ### §4.2.1. Binary Class and Interface Names

> Class and interface names that appear in class file structures are always represented in a fully qualified form known as binary names (JLS §13.1). Such names are always represented as CONSTANT_Utf8_info structures (§4.4.7) and thus may be drawn, where not further constrained, from the entire Unicode codespace…

> For historical reasons, the syntax of binary names that appear in class file structures differs from the syntax of binary names documented in JLS §13.1. In this internal form, the ASCII periods (.) that normally separate the identifiers which make up the binary name are replaced by ASCII forward slashes (/). The identifiers themselves must be unqualified names (§4.2.2).

> ### §4.2.2. Unqualified Names

> Names of methods, fields, local variables, and formal parameters are stored as unqualified names. An unqualified name must contain at least one Unicode code point and must not contain any of the ASCII characters . ; [ / (that is, period or semicolon or left square bracket or forward slash).

> Method names are further constrained so that, with the exception of the special method names <init> and <clinit> (§2.9), they must not contain the ASCII characters < or > (that is, left angle bracket or right angle bracket).

So the answer is that there are only a few characters you can’t use at the binary level. First, / is the package separator. Then, ; and [ can’t be used because they have special meaning in field signatures and method signatures, which may contain type names. In these signatures, [ starts an array type and ; marks the end of a reference type name.
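
The quoted rules are small enough to write down as code. The following is a sketch that follows the wording of §4.2.1 and §4.2.2 literally. The class and method names (NameRules, isUnqualifiedName, isMethodName, isBinaryClassName) are invented for this illustration and are not part of any JDK API. An actual JVM may be more lenient than the text, as the empty class name in Solution 2 suggests.

```java
class NameRules {

    // §4.2.2: at least one code point, and none of . ; [ /
    static boolean isUnqualifiedName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '.' || c == ';' || c == '[' || c == '/') {
                return false;
            }
        }
        return true;
    }

    // §4.2.2: method names additionally must not contain < or >,
    // except for the two special names <init> and <clinit>.
    static boolean isMethodName(String name) {
        if (name.equals("<init>") || name.equals("<clinit>")) {
            return true;
        }
        return isUnqualifiedName(name) && name.indexOf('<') < 0 && name.indexOf('>') < 0;
    }

    // §4.2.1: a binary class name in internal form is a sequence of
    // unqualified names separated by forward slashes.
    static boolean isBinaryClassName(String internalName) {
        // Limit -1 keeps empty segments, so "a//b" and "a/" are rejected.
        for (String part : internalName.split("/", -1)) {
            if (!isUnqualifiedName(part)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBinaryClassName("int"));              // reserved word: allowed
        System.out.println(isBinaryClassName("java/lang/Object")); // ordinary name: allowed
        System.out.println(isBinaryClassName("a.b"));              // contains '.': rejected
        System.out.println(isMethodName("while"));                 // reserved word: allowed
        System.out.println(isMethodName("<foo>"));                 // angle brackets: rejected
    }
}
```

Note that nothing in these checks mentions Java keywords, which is the point of the answer: int, while and goto pass like any other string.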

There is no clear reason why . is forbidden. It isn’t used within the JVM and only has a meaning within generic signatures. But if you are using generic signatures, the type names are further restricted anyway: they must not contain <, > or : either, because these characters also have a special meaning within generic signatures.

Consequently, violating the specification by using . within identifiers has no impact on the primary function of the JVM. Some obfuscators do exactly that. The resulting code works, but you may encounter problems with reflection when asking for generic type signatures. Also, converting binary names to source names by replacing every / with . becomes irreversible if the binary name already contains a . character.
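
The irreversibility is easy to see in a small sketch. The helper names toSourceName and toInternalName and the sample name pkg/Outer.Inner are made up for this illustration:

```java
class NameRoundTrip {

    // Internal form (slashes) to source-style form (dots).
    static String toSourceName(String internalName) {
        return internalName.replace('/', '.');
    }

    // Source-style form (dots) back to internal form (slashes).
    static String toInternalName(String sourceName) {
        return sourceName.replace('.', '/');
    }

    public static void main(String[] args) {
        String wellBehaved = "java/lang/Object";
        String withDot = "pkg/Outer.Inner"; // a '.' inside an identifier, as some obfuscators produce

        // Round trip succeeds when the identifiers contain no dots.
        System.out.println(toInternalName(toSourceName(wellBehaved)).equals(wellBehaved));

        // Round trip fails: the original dot comes back as a slash.
        System.out.println(toInternalName(toSourceName(withDot)));
    }
}
```

After the first conversion, the dot that was part of the identifier is indistinguishable from a package separator, so the second conversion yields pkg/Outer/Inner instead of the original name.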


It might be interesting that there was a proposal to support all possible identifiers within Java syntax (see point 3, “exotic identifiers”), but it didn’t make it into the final Java 7, and it seems that no one is currently making a new attempt to bring it in.


There is an additional technical limitation: the Modified UTF-8 representation of a name can’t be longer than 65535 bytes, because the number of bytes is stored as an unsigned short value.
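
Because the limit counts encoded bytes rather than characters, the usable length depends on the characters involved. Here is a rough sketch of the calculation, assuming the standard Modified UTF-8 rules (U+0000 takes two bytes, and each half of a surrogate pair takes three). The names Utf8Limit, modifiedUtf8Length and fitsInConstantPool are invented for this example:

```java
class Utf8Limit {

    // Number of bytes the string occupies in Modified UTF-8.
    static int modifiedUtf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1; // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                bytes += 2; // U+0000 and U+0080..U+07FF
            } else {
                bytes += 3; // everything else, including each surrogate half
            }
        }
        return bytes;
    }

    // The length prefix is an unsigned 16-bit value, hence the 65535 ceiling.
    static boolean fitsInConstantPool(String name) {
        return modifiedUtf8Length(name) <= 65535;
    }

    public static void main(String[] args) {
        System.out.println(modifiedUtf8Length("int"));          // three ASCII characters
        System.out.println(modifiedUtf8Length("\u0000"));       // NUL is encoded in two bytes
        System.out.println(modifiedUtf8Length("\uD83D\uDE00")); // one supplementary code point, two surrogates
    }
}
```

A name made only of ASCII letters can therefore be up to 65535 characters long, while a name made of characters above U+07FF runs out of room after 21845 of them.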

Solution 4 - Java

  1. Keywords are known only to the compiler, which translates them into the corresponding bytecode. They don't exist at runtime in the compiled bytecode and, consequently, are not checked by the JVM.
  2. Of course, you can't directly access class members that are not known at compile time. But you can use reflection for that purpose if you are sure that such a class member will exist in the compiled code (because you will "hand-craft" it there), since access by reflection is not verified by the compiler.
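
The second point can be illustrated with a short sketch. A reflective lookup takes the member name as an ordinary string, so the compiler accepts any name, and whether the member exists is only decided at runtime. The helper hasPublicNoArgMethod is a made-up name for this example:

```java
class ReflectiveLookup {

    // Returns true if the type has a public method with this name and no parameters.
    static boolean hasPublicNoArgMethod(Class<?> type, String name) {
        try {
            type.getMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // discovered at runtime, not at compile time
        }
    }

    public static void main(String[] args) {
        // Both calls compile, although "while" could never be written as a call in Java source.
        System.out.println(hasPublicNoArgMethod(String.class, "length"));
        System.out.println(hasPublicNoArgMethod(String.class, "while"));
    }
}
```

For java.lang.String the first lookup succeeds and the second fails. With a hand-crafted class that really declares a method named while, the same call would succeed.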

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Nayuki | View Question on Stackoverflow
Solution 1 - Java | Paul Draper | View Answer on Stackoverflow
Solution 2 - Java | Antimony | View Answer on Stackoverflow
Solution 3 - Java | Holger | View Answer on Stackoverflow
Solution 4 - Java | Sergei | View Answer on Stackoverflow