How is hashCode() calculated in Java

JavaHashcode

Java Problem Overview


What value does the hashCode() method return in java?

I read that it is a memory reference of an object... The hash value for new Integer(1) is 1; the hash value for String("a") is 97.

I am confused: is it ASCII or what type of value is?

Java Solutions


Solution 1 - Java

The value returned by hashCode() is by no means guaranteed to be the memory address of the object. I'm not sure of the implementation in the Object class, but keep in mind most classes will override hashCode() such that two instances that are semantically equivalent (but are not the same instance) will hash to the same value. This is especially important if the classes may be used within another data structure, such as Set, that relies on hashCode being consistent with equals.

There is no hashCode() that uniquely identifies an instance of an object no matter what. If you want a hashcode based on the underlying pointer (e.g. in Sun's implementation), use System.identityHashCode() - this will delegate to the default hashCode method regardless of whether it has been overridden.

Nevertheless, even System.identityHashCode() can return the same hash for multiple objects. See the comments for an explanation, but here is an example program that continuously generates objects until it finds two with the same System.identityHashCode(). When I run it, it quickly finds two System.identityHashCode()s that match, on average after adding about 86,000 Long wrapper objects (and Integer wrappers for the key) to a map.

public static void main(String[] args) {
    Map<Integer,Long> map = new HashMap<>();
    Random generator = new Random();
    Collection<Integer> counts = new LinkedList<>();

    Long object = generator.nextLong();
    // We use the identityHashCode as the key into the map
    // This makes it easier to check if any other objects
    // have the same key.
    int hash = System.identityHashCode(object);
    while (!map.containsKey(hash)) {
        map.put(hash, object);
        object = generator.nextLong();
        hash = System.identityHashCode(object);
    }
    System.out.println("Identical maps for size:  " + map.size());
    System.out.println("First object value: " + object);
    System.out.println("Second object value: " + map.get(hash));
    System.out.println("First object identityHash:  " + System.identityHashCode(object));
    System.out.println("Second object identityHash: " + System.identityHashCode(map.get(hash)));
}

Example output:

Identical maps for size:  105822
First object value: 7446391633043190962
Second object value: -8143651927768852586
First object identityHash:  2134400190
Second object identityHash: 2134400190

Solution 2 - Java

A hashcode is an integer value that represents the state of the object upon which it was called. That is why an Integer that is set to 1 will return a hashcode of "1" because an Integer's hashcode and its value are the same thing. A character's hashcode is equal to it's ASCII character code. If you write a custom type you are responsible for creating a good hashCode implementation that will best represent the state of the current instance.

Solution 3 - Java

If you want to know how they are implmented, I suggest you read the source. If you are using an IDE you can just + on a method you are interested in and see how a method is implemented. If you cannot do that, you can google for the source.

For example, Integer.hashCode() is implemented as

   public int hashCode() {
       return value;
   }

and String.hashCode()

   public int hashCode() {
       int h = hash;
       if (h == 0) {
           int off = offset;
           char val[] = value;
           int len = count;

           for (int i = 0; i < len; i++) {
               h = 31*h + val[off++];
           }
           hash = h;
       }
       return h;
   }

Solution 4 - Java

The hashCode() method is often used for identifying an object. I think the Object implementation returns the pointer (not a real pointer but a unique id or something like that) of the object. But most classes override the method. Like the String class. Two String objects have not the same pointer but they are equal:

new String("a").hashCode() == new String("a").hashCode()

I think the most common use for hashCode() is in Hashtable, HashSet, etc..

Java API Object hashCode()

Edit: (due to a recent downvote and based on an article I read about JVM parameters)

With the JVM parameter -XX:hashCode you can change the way how the hashCode is calculated (see the Issue 222 of the Java Specialists' Newsletter).

> HashCode==0: Simply returns random numbers with no relation to where > in memory the object is found. As far as I can make out, the global > read-write of the seed is not optimal for systems with lots of > processors. > > HashCode==1: Counts up the hash code values, not sure at what value > they start, but it seems quite high. > > HashCode==2: Always returns the exact same identity hash code of 1. > This can be used to test code that relies on object identity. The > reason why JavaChampionTest returned Kirk's URL in the example above > is that all objects were returning the same hash code. > > HashCode==3: Counts up the hash code values, starting from zero. It > does not look to be thread safe, so multiple threads could generate > objects with the same hash code. > > HashCode==4: This seems to have some relation to the memory location > at which the object was created. > > HashCode>=5: This is the default algorithm for Java 8 and has a > per-thread seed. It uses Marsaglia's xor-shift scheme to produce > pseudo-random numbers.

Solution 5 - Java

> I read that it is an memory reference of an object..

No. Object.hashCode() used to return a memory address about 14 years ago. Not since.

> what type of value is

What it is depends entirely on what class you're talking about and whether or not it has overridden `Object.hashCode().

Solution 6 - Java

From OpenJDK sources (JDK8):

Use default of 5 to generate hash codes:

product(intx, hashCode, 5,                                                
      "(Unstable) select hashCode generation algorithm")       

Some constant data and a random generated number with a seed initiated per thread:

// thread-specific hashCode stream generator state - Marsaglia shift-xor form
  _hashStateX = os::random() ;
  _hashStateY = 842502087 ;
  _hashStateZ = 0x8767 ;    // (int)(3579807591LL & 0xffff) ;
  _hashStateW = 273326509 ;

Then, this function creates the hashCode (defaulted to 5 as specified above):

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = cast_from_oop<intptr_t>(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

So we can see that at least in JDK8 the default is set to random thread specific.

Solution 7 - Java

Definition: The String hashCode() method returns the hashcode value of the String as an Integer.

Syntax: public int hashCode()

Hashcode is calculated using below formula

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

where:

s is ith character in the string
n is length of the string
^ is exponential operand

Example: For example if you want to calculate hashcode for string "abc" then we have below details

s[] = {'a', 'b', 'c'}
n = 3

So the hashcode value will be calculated as:

s[0]*31^(2) + s[1]*31^1 + s[2]
= a*31^2 + b*31^1 + c*31^0
= (ASCII value of a = 97, b = 98 and c = 99)
= 97*961 + 98*31 + 99 
= 93217 + 3038 + 99
= 96354

So the hashcode value for 'abc' is 96354

Solution 8 - Java

Object.hashCode(), if memory serves correctly (check the JavaDoc for java.lang.Object), is implementation-dependent, and will change depending on the object (the Sun JVM derives the value from the value of the reference to the object).

Note that if you are implementing any nontrivial object, and want to correctly store them in a HashMap or HashSet, you MUST override hashCode() and equals(). hashCode() can do whatever you like (it's entirely legal, but suboptimal to have it return 1.), but it's vital that if your equals() method returns true, then the value returned by hashCode() for both objects are equal.

Confusion and lack of understanding of hashCode() and equals() is a big source of bugs. Make sure that you thoroughly familiarize yourself with the JavaDocs for Object.hashCode() and Object.equals(), and I guarantee that the time spent will pay for itself.

Solution 9 - Java

From the Javadoc:

> As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#hashCode--

Solution 10 - Java

I'm surprised that no one mentioned this but although its obvious for any non Object class your first action should be to read the source code for many classes .hashcode() is simply extended from Object in which case there are several different interesting things that may happen depending on your JVM implementation. Object.hashcode() calls to System.identityHashcode(object).

Indeed using object address in memory is ancient history but many do not realise they can control this behaviour and how Object.hashcode() is computed via jvm argument -XX:hashCode=N where N can be a number from [0-5]...

0 – Park-Miller RNG (default, blocking)
1 – f(address, global_statement)
2 – constant 1
3 – serial counter
4object address
5 – Thread-local Xorshift

Depending on an application you may see unexpected performance hits when .hashcode() is called, when that happens it is likely you are using one of the algorithms that shares global state and/or blocks.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJothiView Question on Stackoverflow
Solution 1 - JavadanbenView Answer on Stackoverflow
Solution 2 - JavaAndrew HareView Answer on Stackoverflow
Solution 3 - JavaPeter LawreyView Answer on Stackoverflow
Solution 4 - JavaalexvetterView Answer on Stackoverflow
Solution 5 - Javauser207421View Answer on Stackoverflow
Solution 6 - JavaJordan SheinfeldView Answer on Stackoverflow
Solution 7 - JavaShubham Kumar SinghView Answer on Stackoverflow
Solution 8 - JavaBen FowlerView Answer on Stackoverflow
Solution 9 - JavaSamView Answer on Stackoverflow
Solution 10 - JavavachView Answer on Stackoverflow