Which data structure would you use: TreeMap or HashMap? (Java)

JavaData StructuresMapHashmapTreemap

Java Problem Overview


Description | A Java program to read a text file and print each of the unique words in alphabetical order together with the number of times the word occurs in the text.

The program should declare a variable of type Map<String, Integer> to store the words and corresponding frequency of occurrence. Which concrete type, though? TreeMap<String, Number> or HashMap<String, Number> ?

The input should be converted to lower case.

A word does not contain any of these characters: \t\t\n]f.,!?:;\"()'

Example output |

 Word            Frequency
  a                 1
  and               5
  appearances       1
  as                1
         .
         .
         .
     

Remark | I know, I've seen elegant solutions to this in Perl with roughly two lines of code. However, I want to see it in Java.

Edit: Oh yeah, it be helpful to show an implementation using one of these structures (in Java).

Java Solutions


Solution 1 - Java

TreeMap seems a no-brainer to me - simply because of the "in alphabetical order" requirement. HashMap has no ordering when you iterate through it; TreeMap iterates in the natural key order.

EDIT: I think Konrad's comment may have been suggesting "use HashMap, then sort." This is good because although we'll have N iterations initially, we'll have K <= N keys by the end due to duplicates. We might as well save the expensive bit (sorting) until the end when we've got fewer keys than take the small-but-non-constant hit of keeping it sorted as we go.

Having said that, I'm sticking to my answer for the moment: because it's the simplest way of achieving the goal. We don't really know that the OP is particularly worried about performance, but the question implies that he's concerned about the elegance and brevity. Using a TreeMap makes this incredibly brief, which appeals to me. I suspect that if performance is really an issue, there may be a better way of attacking it than either TreeMap or HashMap :)

Solution 2 - Java

TreeMap beats HashMap because TreeMap is already sorted for you.

However, you might want to consider using a more appropriate data structure, a bag. See Commons Collections - and the TreeBag class:

This has a nice optimised internal structure and API:

bag.add("big")
bag.add("small")
bag.add("big")
int count = bag.getCount("big")

EDIT: The question of HashMap vs TreeMap performance was answered by Jon - HashMap and sort may be quicker (try it!), but TreeBag is easier. The same is true for bags. There is a HashBag as well as a TreeBag. Based on the implementation (uses a mutable integer) a bag should outperform the equivalent plain map of Integer. The only way to know for sure is to test, as with any performance question.

Solution 3 - Java

I see quite a few people saying "TreeMap look-up takes O(n log n)"!! How come?

I don't know how it has been implemented but in my head it takes O(log n).

This is because look-up in a tree can be done in O(log n). You don't sort the entire tree every time you insert an item in it. That's the whole idea of using a tree!

Hence, going back to the original question, the figures for comparison turn out to be:

HashMap approach: O(n + k log k) average case, worst case could be much larger

TreeMap approach: O(k + n log k) worst case

where n = number of words in the text , k = number of distinct words in the text.

Solution 4 - Java

Hash map should be much faster. You should not choose a container based on how you want the items to be arranged eventually; Just sort the list of (word, frequency)-pairs at the end. There will usually be less such pairs to be sorted than words in the files, so asymptotic (and real) performance with a hash map will be better.

Solution 5 - Java

import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.ObjectInputStream.GetField;
import java.util.Iterator;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class TreeMapExample {
	
	public static void main (String args[]){
		Map<String,Integer> tm = new TreeMap<String,Integer>();
		try {
			
			FileInputStream fis = new FileInputStream("Test.txt");
			DataInputStream in = new DataInputStream(fis);
			BufferedReader br = new BufferedReader(new InputStreamReader(in));
			String line;
			int countValue = 1;
			while((line = br.readLine())!= null ){
				line = line.replaceAll("[-+.^:;,()\"\\[\\]]","");
				StringTokenizer st = new StringTokenizer(line, " ");	
				while(st.hasMoreTokens()){
					String nextElement = (String) st.nextElement();
					
					if(tm.size()>0 && tm.containsKey(nextElement)){
						int val = 0;
						if(tm.get(nextElement)!= null){
						val = (Integer) tm.get(nextElement);
						val = val+1;
						}
						tm.put(nextElement, val);
					}else{
					tm.put(nextElement, 1);
					}
					
				}
			}
			for(Map.Entry<String,Integer> entry : tm.entrySet()) {
			System.out.println(entry.getKey() + " : " + entry.getValue());
			}
			
		} catch (FileNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
	}

}

Solution 6 - Java

You can't assign a TreeMap<String,Number> to a variable with the type Map<String,Integer>. Double, Long, etc. can be "put" into a TreeMap<String,Number>. When I "get" a value from a Map<String,Integer>, it must be an Integer.

Completely ignoring any i18n issues, memory constraints, and error handling, here goes:

class Counter {

  public static void main(String... argv)
    throws Exception
  {
    FileChannel fc = new FileInputStream(argv[0]).getChannel();
    ByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
    CharBuffer cb = Charset.defaultCharset().decode(bb);
    Pattern p = Pattern.compile("[^ \t\r\n\f.,!?:;\"()']+");
    Map<String, Integer> counts = new TreeMap<String, Integer>();
    Matcher m = p.matcher(cb);
    while (m.find()) {
      String word = m.group();
      Integer count = counts.get(word);
      count = (count == null) ? 1 : count + 1;
      counts.put(word, count);
    }
    fc.close();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      System.out.printf("%s: %d%n", e.getKey(), e.getValue());
    }
  }

}

Solution 7 - Java

"When a key already exists it has the same performance as a HashMap." - That is just plain wrong. HashMap has O(1) insertion and TreeMap O(n log n). It'll take at least n log n checks to find out if it's in the table!

Solution 8 - Java

For this way, in my opinion, better use HashBag from Apache Commons Collections or HashMultiset from Guava or HashBag from Eclipse Collections (formaly GS Collections) or any following classes:

    Order    |	Guava           |	Apache  | Eclipse(GS) | JDK analog
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Not define   | HashMultiset     |	HashBag | HashBag     | HashMap<String, Integer>
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Sorted       | TreeMultiset     |	TreeBag | TreeBag     | TreeMap<String, Integer>
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Linked       |LinkedHashMultiset|	  -     |     -       | LinkedHashMap<String, Integere>
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Concurrent & | ConcurrentHash-  |Synchroniz-|Synchroniz-  | Collections.synchronizedMap(
not define   | Multiset         |	edBag   | edBag       |       HashMap<String, Integer>)
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Concurrent   |         -        |Synchroniz-|Synchroniz-  | Collections.synchronizedSorted-
and sorted   |                  |edSortedBag| edSortedBag |       Map(TreeMap<>))
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Immutable and| ImmutableMultiset|Unmodifiab-|Unmodifiab-  | Collections.unmodifiableMap(
not define   |                  |	leBag   | leBag       | HashMap<String, Integer>)
─────────────┼──────────────────┼───────────┼─────────────┼─────────────
Immutable and| ImmutableSorted- |Unmodifiab-|Unmodifiab-  | Collections.unmodifiableSorted-
sorted       | Multiset         |leSortedBag| leSortedBag | Map(TreeMap<String, Integer>))
────────────────────────────────────────────────────────────────────────

Examples:

1. Using SynchronizedSortedBag from Apache:
    // Parse text to separate words
    String INPUT_TEXT = "Hello World! Hello All! Hi World!";
    // Create Multiset
    Bag bag = SynchronizedSortedBag.synchronizedBag(new TreeBag(Arrays.asList(INPUT_TEXT.split(" "))));

    // Print count words
    System.out.println(bag); // print [1:All!,2:Hello,1:Hi,2:World!]- in natural (alphabet) order
    // Print all unique words
    System.out.println(bag.uniqueSet());    // print [All!, Hello, Hi, World!]- in natural (alphabet) order


    // Print count occurrences of words
    System.out.println("Hello = " + bag.getCount("Hello"));    // print 2
    System.out.println("World = " + bag.getCount("World!"));    // print 2
    System.out.println("All = " + bag.getCount("All!"));    // print 1
    System.out.println("Hi = " + bag.getCount("Hi"));    // print 1
    System.out.println("Empty = " + bag.getCount("Empty"));    // print 0

    // Print count all words
    System.out.println(bag.size());    //print 6

    // Print count unique words
    System.out.println(bag.uniqueSet().size());    //print 4
2. Using TreeBag from Eclipse(GC):
    // Parse text to separate words
    String INPUT_TEXT = "Hello World! Hello All! Hi World!";
    // Create Multiset
    MutableSortedBag<String> bag =  TreeBag.newBag(Arrays.asList(INPUT_TEXT.split(" ")));

    // Print count words
    System.out.println(bag); // print [All!, Hello, Hello, Hi, World!, World!]- in natural order
    // Print all unique words
    System.out.println(bag.toSortedSet());    // print [All!, Hello, Hi, World!]- in natural order

    // Print count occurrences of words
    System.out.println("Hello = " + bag.occurrencesOf("Hello"));    // print 2
    System.out.println("World = " + bag.occurrencesOf("World!"));    // print 2
    System.out.println("All = " + bag.occurrencesOf("All!"));    // print 1
    System.out.println("Hi = " + bag.occurrencesOf("Hi"));    // print 1
    System.out.println("Empty = " + bag.occurrencesOf("Empty"));    // print 0

    // Print count all words
    System.out.println(bag.size());    //print 6

    // Print count unique words
    System.out.println(bag.toSet().size());    //print 4
3. Using LinkedHashMultiset from Guava:
    // Parse text to separate words
    String INPUT_TEXT = "Hello World! Hello All! Hi World!";
    // Create Multiset
    Multiset<String> multiset = LinkedHashMultiset.create(Arrays.asList(INPUT_TEXT.split(" ")));

    // Print count words
    System.out.println(multiset); // print [Hello x 2, World! x 2, All!, Hi]- in predictable iteration order
    // Print all unique words
    System.out.println(multiset.elementSet());    // print [Hello, World!, All!, Hi] - in predictable iteration order

    // Print count occurrences of words
    System.out.println("Hello = " + multiset.count("Hello"));    // print 2
    System.out.println("World = " + multiset.count("World!"));    // print 2
    System.out.println("All = " + multiset.count("All!"));    // print 1
    System.out.println("Hi = " + multiset.count("Hi"));    // print 1
    System.out.println("Empty = " + multiset.count("Empty"));    // print 0

    // Print count all words
    System.out.println(multiset.size());    //print 6

    // Print count unique words
    System.out.println(multiset.elementSet().size());    //print 4

More examples you can find in my github projects

Solution 9 - Java

I would definitely choose a TreeMap:

  • TreeMap automatically sorts new keys on insertion, no sorting afterwards is needed.
  • When a key already exists it has the same performance as a HashMap.

A TreeSet internally uses a TreeMap so why not use TreeMap directly.

Solution 10 - Java

Depending on what the speed requirements are, you could also use a Trie. But there's no point in implementing one of those if a TreeMap is quick enough.

Solution 11 - Java

consider the frequency of addition or deletion to the data structure. TreeMap would not be ideal if it is high. Apart from the search for existing entry nLn it also undergoes frequent rebalancing.

on the other hand Hash structures are bit flamboyant on memory (over allocates). If you can bite that bullet then go for hash structure and sort when required.

Solution 12 - Java

Here is the java example for reading a text file, sorting based on key, then upon values; depending on the number of occurrence of a words in the file.

public class SortFileWords {

	public static void main(String[] args) {
		HashMap<String, Integer> map = new HashMap<String, Integer>();
		ValueCompare vc = new ValueCompare(map);
		TreeMap<String, Integer> sorted_map = new TreeMap<String, Integer>(map);
		List<String> list = new ArrayList<>();
		Scanner sc;
		try {
			sc = new Scanner(new File("c:\\ReadMe1.txt"));
			while (sc.hasNext()) {
				list.add(sc.next());
			}
			sc.close();
		} catch (FileNotFoundException e) {
			e.printStackTrace();
		}

		for (String s : list) {
			if (map.containsKey(s)) {
				map.put(s, map.get(s) + 1);
			} else
				map.put(s, 1);
		}
		
		System.out.println("Unsorted map: " + map);
		sorted_map.putAll(map);
		System.out.println("Sorted map on keys: " + sorted_map);
		
		TreeMap<String, Integer> sorted_value_map = new TreeMap<>(vc);
		sorted_value_map.putAll(map);
		System.out.println("Sorted map on values: " + sorted_value_map);
	}
}

class ValueCompare implements Comparator<String> {

	Map<String, Integer> map;

	public ValueCompare(Map<String, Integer> map) {
		this.map = map;
	}

	@Override
	public int compare(String s1, String s2) {
		if (map.get(s1) >= map.get(s2))
			return -1;
		else
			return 1;
	}
}

Solution 13 - Java

Why not use TreeSet?

Same ordering concept as a TreeMap, except it's a Set - which, by definition, is "A collection that contains no duplicate elements".

From your problem description, it sounds as if you need a Set, I don't see what keys and values you are mapping together.

>This class implements the Set interface, backed by a TreeMap instance. This class guarantees that the sorted set will be in ascending element order, sorted according to the natural order of the elements (see Comparable), or by the comparator provided at set creation time, depending on which constructor is used.

Solution 14 - Java

Basically it depend on the requirement. Sometimes hash map is good sometimes treemap. but hash map is better to use only their is some constraint for overhead to sort it.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJohnZajView Question on Stackoverflow
Solution 1 - JavaJon SkeetView Answer on Stackoverflow
Solution 2 - JavaJodaStephenView Answer on Stackoverflow
Solution 3 - JavasaurabhView Answer on Stackoverflow
Solution 4 - JavahashloverView Answer on Stackoverflow
Solution 5 - JavaBaluView Answer on Stackoverflow
Solution 6 - JavaericksonView Answer on Stackoverflow
Solution 7 - JavacoderzView Answer on Stackoverflow
Solution 8 - JavaSlava VedeninView Answer on Stackoverflow
Solution 9 - JavaChrisView Answer on Stackoverflow
Solution 10 - JavaCAdakerView Answer on Stackoverflow
Solution 11 - JavaG KumarView Answer on Stackoverflow
Solution 12 - Javahardeep thakurView Answer on Stackoverflow
Solution 13 - Javamatt bView Answer on Stackoverflow
Solution 14 - JavaAnit SinghView Answer on Stackoverflow