Java: Most efficient method to iterate over all elements in a org.w3c.dom.Document?

Java, XML, DOM, Iteration

Java Problem Overview


What is the most efficient way to iterate through all DOM elements in Java?

Something like this, but for every element in the current org.w3c.dom.Document?

for(Node childNode = node.getFirstChild(); childNode!=null;){
    Node nextChild = childNode.getNextSibling();
    // Do something with childNode, including move or delete...
    childNode = nextChild;
}

Java Solutions


Solution 1 - Java

Basically you have two ways to iterate over all elements:

1. Using recursion (the most common way I think):

import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public static void main(String[] args) throws SAXException, IOException,
        ParserConfigurationException {

    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
            .newInstance();
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    Document document = docBuilder.parse(new File("document.xml"));
    // start at the root element and walk down from there
    doSomething(document.getDocumentElement());
}

public static void doSomething(Node node) {
    // do something with the current node instead of System.out
    System.out.println(node.getNodeName());

    NodeList nodeList = node.getChildNodes();
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node currentNode = nodeList.item(i);
        if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
            // recurse only into children that are elements
            doSomething(currentNode);
        }
    }
}

2. Avoiding recursion by calling getElementsByTagName() with "*" as the parameter:

public static void main(String[] args) throws SAXException, IOException,
        ParserConfigurationException {

    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
            .newInstance();
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    Document document = docBuilder.parse(new File("document.xml"));

    // "*" matches every element in the document
    NodeList nodeList = document.getElementsByTagName("*");
    for (int i = 0; i < nodeList.getLength(); i++) {
        Node node = nodeList.item(i);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // do something with the current element
            System.out.println(node.getNodeName());
        }
    }
}

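I think both ways are efficient: each one hands you every element once, in document order. Keep in mind that the NodeList returned by getElementsByTagName() is live, so be careful if you add or remove nodes while looping over it.
Hope this helps.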
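To make the comparison concrete, here is a minimal sketch that runs both approaches on a small in-memory document and collects the element names. The class name ElementOrderDemo, the parse helper, and the sample XML are made up for illustration; the recursive variant uses the getFirstChild()/getNextSibling() pattern from the question instead of indexing into a NodeList.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementOrderDemo {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Approach 1: recursion, walking siblings with getFirstChild()/getNextSibling().
    static void collectRecursively(Node element, List<String> names) {
        names.add(element.getNodeName());
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collectRecursively(child, names);
            }
        }
    }

    // Approach 2: flat loop over getElementsByTagName("*").
    static List<String> collectWithWildcard(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        List<String> names = new ArrayList<>(all.getLength());
        for (int i = 0, len = all.getLength(); i < len; i++) {
            names.add(all.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<root><a><b/></a><c>text</c></root>");
        List<String> recursive = new ArrayList<>();
        collectRecursively(doc.getDocumentElement(), recursive);
        System.out.println(recursive);                 // [root, a, b, c]
        System.out.println(collectWithWildcard(doc));  // [root, a, b, c]
    }
}
```

Both methods return the elements in the same order, so the choice between them is mostly about whether you need to prune subtrees on the way down (easier with recursion) or just want a flat loop.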

Solution 2 - Java

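Change

for (int i = 0; i < nodeList.getLength(); i++)

to

for (int i = 0, len = nodeList.getLength(); i < len; i++)

to be more efficient: getLength() is then called once instead of on every iteration. This is only safe if the loop body does not add or remove nodes, because a live NodeList changes its length as the tree changes.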
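One caveat with a cached length: DOM NodeList objects are live, so if the loop body moves or deletes nodes (as in the question), the list can shrink under the loop and item(i) may return null. A minimal sketch of one way around that is to copy the elements into a plain list first and mutate the tree afterwards. The class name SnapshotDemo and the helpers parse, snapshotElements and removeAll are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SnapshotDemo {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Copies the live NodeList into a plain list so later tree edits cannot shift it.
    static List<Element> snapshotElements(Document doc) {
        NodeList live = doc.getElementsByTagName("*");
        List<Element> copy = new ArrayList<>(live.getLength());
        for (int i = 0, len = live.getLength(); i < len; i++) {
            copy.add((Element) live.item(i));
        }
        return copy;
    }

    // Detaches every element with the given tag name; returns how many were removed.
    static int removeAll(Document doc, String tagName) {
        int removed = 0;
        for (Element e : snapshotElements(doc)) {
            Node parent = e.getParentNode();
            if (parent != null && e.getTagName().equals(tagName)) {
                parent.removeChild(e);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse("<root><a><b/></a><b/><c/></root>");
        System.out.println(removeAll(doc, "b"));                       // 2
        System.out.println(doc.getElementsByTagName("*").getLength()); // 3
    }
}
```

The cached length is safe inside snapshotElements because nothing modifies the tree while the copy is being made.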

The second approach in javanna's answer may be the best, as it tends to use a flatter, more predictable memory model.
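Another flat, non-recursive option is the DOM Traversal API (org.w3c.dom.traversal). The sketch below assumes the Document produced by your parser also implements DocumentTraversal, which is the case for the parser bundled with the JDK but is not guaranteed for every DOM implementation, hence the instanceof check. The class name NodeIteratorDemo and the parse helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

class NodeIteratorDemo {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Collects the names of all elements, in document order, without recursion.
    static List<String> elementNames(Document doc) {
        // Assumption: the DOM implementation supports the Traversal feature.
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("DOM implementation lacks DocumentTraversal");
        }
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(),  // root of the iteration
                NodeFilter.SHOW_ELEMENT,   // only report element nodes
                null,                      // no additional filter
                true);                     // expand entity references
        List<String> names = new ArrayList<>();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            names.add(n.getNodeName());
        }
        it.detach(); // release the iterator once done
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<root><a><b/></a><c>text</c></root>");
        System.out.println(elementNames(doc)); // [root, a, b, c]
    }
}
```

Whether this is faster than getElementsByTagName("*") depends on the DOM implementation, so measure with your own documents before switching.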

Solution 3 - Java

I also stumbled over this problem recently. Here is my solution. I wanted to avoid recursion, so I used a while loop.

Because of the adds and removes in arbitrary places on the list, I went with the LinkedList implementation.

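  // imports needed by the methods below
  // import java.util.ArrayList;
  // import java.util.Arrays;
  // import java.util.Iterator;
  // import java.util.LinkedList;
  // import java.util.List;
  // import java.util.ListIterator;
  // import org.w3c.dom.Node;
  // import org.w3c.dom.NodeList;

  /* traverses tree starting with given node */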
  private static List<Node> traverse(Node n)
  {
    return traverse(Arrays.asList(n));
  }

  /* traverses tree starting with given nodes */
  private static List<Node> traverse(List<Node> nodes)
  {
    List<Node> open = new LinkedList<Node>(nodes);
    List<Node> visited = new LinkedList<Node>();

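    // "open" holds nodes still to be visited; the iterator's cursor marks
    // where the next node is taken from and where its children are inserted
    ListIterator<Node> it = open.listIterator();
    while (it.hasNext() || it.hasPrevious())
    {
      Node unvisited;
      if (it.hasNext())
        unvisited = it.next();
      else
        unvisited = it.previous();

      // take the node out of the open list ...
      it.remove();

      // ... and put its element children in its place; the most recently
      // added child is picked up next, so siblings are visited last-to-first
      List<Node> children = getChildren(unvisited);
      for (Node child : children)
        it.add(child);

      visited.add(unvisited);
    }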

    return visited;
  }

  private static List<Node> getChildren(Node n)
  {
    List<Node> children = asList(n.getChildNodes());
    Iterator<Node> it = children.iterator();
    while (it.hasNext())
      if (it.next().getNodeType() != Node.ELEMENT_NODE)
        it.remove();
    return children;
  }

  private static List<Node> asList(NodeList nodes)
  {
    List<Node> list = new ArrayList<Node>(nodes.getLength());
    for (int i = 0, l = nodes.getLength(); i < l; i++)
      list.add(nodes.item(i));
    return list;
  }
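For comparison, here is a minimal sketch of the same idea (an explicit work list instead of recursion) written with an ArrayDeque used as a stack. This is an alternative to the code above, not a rewrite of it: children are pushed last-to-first so that elements come out in document order. The class name StackTraversalDemo and the parse helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class StackTraversalDemo {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Returns the root element and all descendant elements in document order.
    static List<Node> traverse(Node root) {
        List<Node> visited = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            visited.add(current);
            // Push children last-to-first so the first child is popped next.
            for (Node child = current.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    stack.push(child);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Document doc = parse("<root><a><b/></a><c>text</c></root>");
        for (Node n : traverse(doc.getDocumentElement())) {
            System.out.println(n.getNodeName()); // root, a, b, c (one per line)
        }
    }
}
```

Because the result is a separate list, it is safe to move or delete the returned nodes afterwards.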

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | KJW | View Question on Stackoverflow
Solution 1 - Java | javanna | View Answer on Stackoverflow
Solution 2 - Java | Andrew | View Answer on Stackoverflow
Solution 3 - Java | mike | View Answer on Stackoverflow