Why Java Collection Framework doesn't contain Tree and Graph
JavaCollectionsGraphTreeJava Problem Overview
I am familiar with Java Collection Framework which contains basic interfaces: Collection
and Map
. I am wondering why the Framework doesn't contain structures as Tree and Graph which are basic collections. Both can be regarded as sub types of Collection
.
By the way, I know TreeSet
is implemented by Red-Black Tree underlying. However, the TreeSet
is not a Tree but a Set
, so there's no real Tree in the framework.
Java Solutions
Solution 1 - Java
> I am wondering why the Framework doesn't contain structures as Tree and Graph which are basic collections. Both can be regarded as sub types of Collection
.
This is a good question. I think it simply boils down to scoping. The core features that Collections API provides classes for are:
-
iteration order: Lists and sorted maps have specified iteration order, most sets don't.
-
duplicates: Lists allow duplicates, sets do not
-
index: List values are indexed by integers, map values are indexed by other objects.
This gets us very far and I assume Joshua Bloch et al argued that more feature rich collections (graphs and trees which require internal relationship between elements, sets with multiplicity, bi-directional maps, ...) can be implemented on top of these three core features, and are thus better off in libraries.
Solution 2 - Java
The java.util
package contains data structures to organize data of any kind. It deals basically with abstract data structures (like List
, Set
, Map
) which are defined via their methods and behavior (e.g. a Set does contain no elements twice, a List maintains order and allows duplicates, etc.).
You as a developer are free to choose which implementation of these data structures are best suited for the kind of data you deal with (HashSet vs. TreeSet / LinkedList vs. ArrayList / etc.). For example for Sets and Maps you may choose between hash-based implementations and tree-based implementations, which may or may not be suited for what you want to do (in most cases a hash-based implementation will be the best choice, while sometimes, when order is important, a tree may be better suited for your needs - See also HashSet vs TreeSet (here at Stackoverflow)).
If you are thinking of a Tree as a special kind of Graph (which it is), then you’re interested in specific properties which apply to graphs, not to collections in general (which, essentially, are lists, and are in turn used to implement things like graphs).
As mentioned in this thread, if you’re interested in modeling Graphs, there are plenty of choices for Graph libraries. Personally I can recommend JGraphT.
I don’t know why there is no graph library within the JDK (and I don’t know whether that’s a good thing to ask?), but I guess Sun decided to leave this to developers, since most applications that require graphs also require very unique implementations.
Solution 3 - Java
I suspect that the answer is that it is a combination of two things:
- A generic tree or graph interface would be "feature poor".
- It is easier and more efficient to implement a tree or graph using fields to represent child and (if you need them) parent pointers.
Note that neither Apache commons or Google commons have generic graph or tree support. However, I did come across a couple of generic tree/graph hierarchies:
- The jsfcompounds project on JavaNet has the "com.truchsess.util" graph and tree framework.
- The OpenJGraph project (inactive) on SourceForge includes tree and graph libraries.
Solution 4 - Java
Original answer
The main problem with graphs (and more concretely trees, as you know, a special case where there's a one only root, there can be no loops and all nodes are connected) is that each item in that supposed collection should be hold in another structure: the node. It's tough to make standard that kind of treatment as it requires a double layer of "containers".
If anyone ever happens to work with the TreeModel
for JTree
would notice that is almost impossible to avoid the fact that there are TreeNode
s behind (inside?) and you have to manipulate both, nodes and trees.
Anyway, I agree that the structure would be useful but it's very hard to make it standard, just notice that, for instance, neither Apache Commons Collections nor Google Guava, two big collection extensions for the API, don't have them either.
Update
According to the overview of the API:
> The main design goal was to produce an API that was reasonably small, > both in size, and, more importantly, in "conceptual weight." It was > critical that the new functionality not seem alien to current Java > programmers; it had to augment current facilities, rather than > replacing them. At the same time, the new API had to be powerful > enough to provide all the advantages described above.
So while graphs are proper collections and it'd be maybe possible to make it standard, the size and complexity of the implementation would be too big to fit this design goal, because of the previously explained need of a double layer of containers.
Also, Google Guava has added support for graphs. Apache had a sandbox (development in progress) for graphs but seems dormant since 2011.
Solution 5 - Java
I am afraid the answer is: it's too troublesome to design and maintain a generic tree structure and interfaces (see answer to this post). Most users will need the performance of tree-based implementations of lists and sets, but will not be concerned with the internals, so most of them are hidden.
Solution 6 - Java
There is an interface for trees - javax.swing.tree.TreeModel
, which in fact works for arbitrary directed graphs (with a distinguished "root" node). It does not implement the Collection
interface, though (since Swing is a bit older than the collection framework, and it is not really clear that being a collection would really be appropriate here).
(One implementation of TreeModel is DefaultTreeModel
, which uses a tree build from TreeNode
objects, which is itself an interface implementable by users.)
Another type of trees are given by the XML-DOM frameworks.
Some specific use trees (or mostly "tree nodes") are defined by java.io.File
, java.awt.Component
(and subclasses), the Compiler tree API (com.sun.source.tree.*
), the Doclet API (com.sun.javadoc.*
), the Reflection API (java.lang.Class
), the language model API (javax.lang.**
).
If you compare these API
The problem is, there is no clear general-purpose useful interface for trees - what should such a tree be able to do? Is a tree simply a collection of other trees (those one level lower), or simply a pointer to the root node, where each node itself contains more nodes? Or is a tree a collection of all the nodes? Or a collection of all the contents of all the nodes? Do all nodes have to have "content", or only the leaf nodes? If a tree were a collection of some content elements (and not the nodes itself), how should an iterator behave? What would be the size of a tree?
Here is a proposal for a tree node (or in fact it could be a general directed graph node, if not for the parent-pointer) interface:
/**
* A general interface for a tree node.
* @param <N> the concrete node type.
*/
public interface Node<N extends Node<N>> {
/**
* equivalent to {@link #children children()}.{@link Collection#isEmpty isEmpty()}.
* @returns true, if this is a leaf node, else false.
*/
public boolean isLeaf();
/**
* returns a collection of all children of this node.
* This collection can be changed, if this node is mutable.
*/
public Collection<N> children();
/**
* returns the parent of this node.
* @return null, if this is the root node, or the node does not know its parent, or has multiple parents.
*/
public N parent();
}
/**
* a tree node with content objects.
* @param <N> the concrete node type
* @param <C> the type of the content of this node.
*/
public interface ContentNode<C,N extends ContentNode<C,N>>
extends Node<N>
{
/**
* returns the content of this node, if any.
*/
public C content();
}
Would this be enough for all types of trees, for example the ones listed above? I don't think so.