Java HashSet vs HashMap

Java, Collections

Java Problem Overview


I understand that HashSet is based on the HashMap implementation but is used when you need a unique set of elements. So why, in the following code, when I put the same objects into both the map and the set, is the size of both collections equal to 1? Shouldn't the map's size be 2? If the sizes of both collections are equal, I don't see any difference between using these two collections.

Set<SimpleObject> testSet = new HashSet<SimpleObject>();
Map<Integer, SimpleObject> testMap = new HashMap<Integer, SimpleObject>();

SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
testSet.add(simpleObject1);
testSet.add(simplObject2);

Integer key = new Integer(10);

testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);

System.out.println(testSet.size());
System.out.println(testMap.size());

The output is 1 and 1.

SimpleObject code

public class SimpleObject {

    private String dataField1;
    private int dataField2;

    public SimpleObject() {}

    public SimpleObject(String data1, int data2) {
        this.dataField1 = data1;
        this.dataField2 = data2;
    }

    public String getDataField1() {
        return dataField1;
    }

    public int getDataField2() {
        return dataField2;
    }

    // hashCode() and equals() consider both fields, so two instances built
    // from the same arguments are equal and hash to the same bucket.
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result
                + ((dataField1 == null) ? 0 : dataField1.hashCode());
        result = prime * result + dataField2;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        SimpleObject other = (SimpleObject) obj;
        if (dataField1 == null) {
            if (other.dataField1 != null)
                return false;
        } else if (!dataField1.equals(other.dataField1))
            return false;
        if (dataField2 != other.dataField2)
            return false;
        return true;
    }
}

Java Solutions


Solution 1 - Java

The map holds unique keys. When you invoke put with a key that exists in the map, the object under that key is replaced with the new object. Hence the size 1.

The difference between the two should be obvious:

  • in a Map you store key-value pairs
  • in a Set you store only the keys

In fact, a HashSet has a HashMap field, and whenever add(obj) is invoked, it calls put on the underlying map: map.put(obj, PRESENT), where the sentinel value is declared as private static final Object PRESENT = new Object(). So the map is populated with your object as the key, and a value that is of no interest.
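Here is a minimal sketch of that delegation, assuming nothing about the JDK source beyond the PRESENT sentinel described above (the class name is made up for illustration):

import java.util.HashMap;

public class SketchHashSet<E> {

    // Shared sentinel stored as the value for every key.
    private static final Object PRESENT = new Object();

    private final HashMap<E, Object> map = new HashMap<E, Object>();

    // put() returns the previous value for the key, which is null only
    // when the key was not in the map yet - hence add() returns true
    // for a new element and false for a duplicate.
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    public int size() {
        return map.size();
    }
}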

Solution 2 - Java

A key in a Map can only map to a single value. So the second time you put into the map with the same key, it overwrites the first entry.
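You can watch the overwrite happen, because put returns the value it replaces (a small demo; the class name is hypothetical):

import java.util.HashMap;
import java.util.Map;

public class PutReturnDemo {
    public static void main(String[] args) {
        Map<Integer, String> testMap = new HashMap<Integer, String>();
        System.out.println(testMap.put(10, "first"));  // null: there was no previous mapping
        System.out.println(testMap.put(10, "second")); // first: the value that was overwritten
        System.out.println(testMap.size());            // 1: still a single entry
    }
}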

Solution 3 - Java

In the case of a HashSet, adding the same object is more or less a no-op. In the case of a HashMap, putting a key-value pair with an existing key overwrites the existing value for that key. Below I've added equals() checks to your code:

SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
// If the below prints true, the 2nd add will not add anything
System.out.println("Are the objects equal? " + simpleObject1.equals(simplObject2));
testSet.add(simpleObject1);
testSet.add(simplObject2);

Integer key = new Integer(10);
// This is a no-brainer since you have the exact same key, but let's keep it consistent.
// If this prints true, the 2nd put will overwrite the 1st key-value pair.
System.out.println("Are the keys equal? " + key.equals(key));
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println(testSet.size());
System.out.println(testMap.size());

Solution 4 - Java

I just want to add to these great answers the answer to your last dilemma: what is the difference between these two collections if they return the same size after your insertions? You can't really see the difference here, because you are inserting two values into the map under the same key, so the second value replaces the first. You would see the real difference (among others) if you inserted the same value into the map under two different keys. Then you would see that a map can hold duplicate values but not duplicate keys, while a set cannot hold duplicate values at all. That is the main difference here.
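To make that concrete, here is a small demo reusing the question's SimpleObject (the demo class name is made up): the map accepts the same value under two different keys, while the set rejects an equal element.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DuplicateValuesDemo {
    public static void main(String[] args) {
        SimpleObject value = new SimpleObject("Igor", 1);

        // The map happily stores the same value under two different keys.
        Map<Integer, SimpleObject> testMap = new HashMap<Integer, SimpleObject>();
        testMap.put(1, value);
        testMap.put(2, value);
        System.out.println(testMap.size()); // 2

        // The set rejects an equal element, so only one copy remains.
        Set<SimpleObject> testSet = new HashSet<SimpleObject>();
        testSet.add(value);
        testSet.add(new SimpleObject("Igor", 1));
        System.out.println(testSet.size()); // 1
    }
}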

Solution 5 - Java

The answer is simple, and it follows from the nature of HashSet: a HashSet internally uses a HashMap, with a dummy object named PRESENT as the value and your object as the key.

hash(simpleObject1) and hash(simplObject2) return the same int. So what happens?

When you add simpleObject1 to the HashSet, it puts the object into its internal HashMap with simpleObject1 as the key. Then, when you call add(simplObject2), the call returns false because an equal key is already present in the internal HashMap.

As a little extra info, HashSet effectively uses hashing to provide O(1) performance for the basic operations, relying on its elements' equals() and hashCode() contract. (Note that HashSet does, in fact, permit a single null element, because the backing HashMap allows one null key.)
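A quick check of those add() return values, reusing the question's SimpleObject (the demo class name is made up):

import java.util.HashSet;
import java.util.Set;

public class AddReturnDemo {
    public static void main(String[] args) {
        Set<SimpleObject> testSet = new HashSet<SimpleObject>();
        System.out.println(testSet.add(new SimpleObject("Igor", 1))); // true: newly added
        System.out.println(testSet.add(new SimpleObject("Igor", 1))); // false: an equal element is present
        System.out.println(testSet.add(null));                        // true: a single null is allowed
        System.out.println(testSet.size());                           // 2
    }
}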

Solution 6 - Java

I think the major difference is that HashSet is stable in the sense that it doesn't replace a duplicate value: once the first unique element has been inserted, all subsequent duplicates are simply discarded. A HashMap, on the other hand, makes the effort to replace the old value with the new one. So there is some overhead in HashMap when inserting under an existing key.

Solution 7 - Java

public class HashSet<E> extends AbstractSet<E> implements Set<E>, Cloneable, Serializable
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
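For illustration, both capacity and load factor can be passed to the constructor (a hypothetical sizing example):

import java.util.HashSet;
import java.util.Set;

public class CapacityDemo {
    public static void main(String[] args) {
        // The defaults: 16 buckets, 0.75 load factor.
        Set<String> typical = new HashSet<String>(16, 0.75f);

        // Iteration visits every bucket of the backing HashMap, so an
        // oversized capacity slows iteration even for a near-empty set.
        Set<String> oversized = new HashSet<String>(1000000);
        typical.add("a");
        oversized.add("a");
    }
}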

Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set.
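A short sketch of that wrapping (note that iteration over the wrapped set must still be synchronized manually, as the Collections javadoc warns):

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class SynchronizedSetDemo {
    public static void main(String[] args) {
        // Wrap at creation time so no unsynchronized reference escapes.
        Set<String> set = Collections.synchronizedSet(new HashSet<String>());
        set.add("a");

        // Individual calls are synchronized by the wrapper, but iteration
        // must be guarded by the client.
        synchronized (set) {
            for (String s : set) {
                System.out.println(s);
            }
        }
    }
}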

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | IgorDiy | View Question on Stackoverflow
Solution 1 - Java | Bozho | View Answer on Stackoverflow
Solution 2 - Java | ColinD | View Answer on Stackoverflow
Solution 3 - Java | lobster1234 | View Answer on Stackoverflow
Solution 4 - Java | Nemanja Paripovic | View Answer on Stackoverflow
Solution 5 - Java | huseyin | View Answer on Stackoverflow
Solution 6 - Java | David Prun | View Answer on Stackoverflow
Solution 7 - Java | Jitendra | View Answer on Stackoverflow