How to query XML using namespaces in Java with XPath?

Java Problem Overview


When my XML looks like this (no xmlns), I can easily query it with an XPath like /workbook/sheets/sheet[1]:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook>
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>

But when it looks like this, I can't:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>

Any ideas?

Java Solutions


Solution 1 - Java

In the second example XML file the elements are bound to a namespace. Your XPath is attempting to address elements that are bound to the default "no namespace" namespace, so they don't match.

The preferred method is to register the namespace with a namespace-prefix. It makes your XPath much easier to develop, read, and maintain.

However, it is not mandatory that you register the namespace and use the namespace-prefix in your XPath.

You can formulate an XPath expression that uses a generic match for an element and a predicate filter that restricts the match for the desired local-name() and the namespace-uri(). For example:

/*[local-name()='workbook'
    and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
  /*[local-name()='sheets'
      and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
  /*[local-name()='sheet'
      and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]

As you can see, it produces an extremely long and verbose XPath statement that is very difficult to read (and maintain).

You could also just match on the local-name() of the element and ignore the namespace. For example:

/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]

However, you run the risk of matching the wrong elements. If your XML has mixed vocabularies (which may not be an issue for this instance) that use the same local-name(), your XPath could match on the wrong elements and select the wrong content.
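To make the local-name() approach concrete, here is a minimal sketch that evaluates the short expression from this answer against the question's second document. The class name LocalNameXPathDemo and the helper firstSheet are made up for illustration, and the XML is inlined as a string only to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
final class LocalNameXPathDemo {

    // The namespaced document from the question, inlined for the sketch.
    static final String XML =
        "<workbook xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
        + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    // Namespace-agnostic expression: matches on local names only.
    static final String BY_LOCAL_NAME =
        "/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]";

    static Element firstSheet(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Parse namespace-aware so elements carry a namespace URI and a local name.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // No NamespaceContext is registered: the expression contains no prefixes.
            return (Element) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        Element sheet = firstSheet(XML, BY_LOCAL_NAME);
        System.out.println(sheet.getAttribute("name"));
    }
}
```

The same helper also accepts the longer expression that additionally checks namespace-uri(), which is the safer choice when the document mixes vocabularies.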

Solution 2 - Java

Your problem is the default namespace. Check out this article for how to deal with namespaces in your XPath: http://www.edankert.com/defaultnamespaces.html

One of the conclusions they draw is:

> So, to be able to use XPath expressions on XML content defined in a (default) namespace, we need to specify a namespace prefix mapping

Note that this doesn't mean that you have to change your source document in any way (though you're free to put the namespace prefixes in there if you so desire). Sounds strange, right? What you will do is create a namespace prefix mapping in your Java code and use that prefix in your XPath expression. Here, we'll create a mapping from the prefix spreadsheet to your default namespace.

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();

// there's no default implementation for NamespaceContext...seems kind of silly, no?
xpath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        else if ("spreadsheet".equals(prefix)) return "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    public Iterator<String> getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
});

// note that all the elements in the expression are prefixed with our namespace mapping!
XPathExpression expr = xpath.compile("/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]");

// assuming you've got your XML document in a variable named doc...
Node result = (Node) expr.evaluate(doc, XPathConstants.NODE);

And voilà: now you've got your element saved in the result variable.

Caveat: if you're parsing your XML as a DOM with the standard JAXP classes, be sure to call setNamespaceAware(true) on your DocumentBuilderFactory. Otherwise, this code won't work!
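The caveat is easiest to see when parsing and querying sit side by side. The following is only a sketch: the class NamespaceAwareXPathDemo and its firstSheet helper are invented for illustration, and the document is inlined as a string. It parses the XML with the namespace-aware flag passed in and then evaluates the prefixed expression from above.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
final class NamespaceAwareXPathDemo {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    static final String XML =
        "<workbook xmlns=\"" + MAIN_NS + "\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\"/></sheets>"
        + "</workbook>";

    static Node firstSheet(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware); // the flag the caveat is about
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    if (prefix == null) throw new IllegalArgumentException("Null prefix");
                    return "spreadsheet".equals(prefix) ? MAIN_NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    throw new UnsupportedOperationException();
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    throw new UnsupportedOperationException();
                }
            });
            return (Node) xpath.evaluate(
                    "/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]",
                    doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespace-aware:   " + firstSheet(true));
        System.out.println("namespace-unaware: " + firstSheet(false));
    }
}
```

With firstSheet(true) the sheet element is returned. Running main shows what the same query yields when the factory is left namespace-unaware.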

Solution 3 - Java

All namespaces that you intend to select from in the source XML must be associated with a prefix in the host language. In Java/JAXP this is done by specifying the URI for each namespace prefix using an instance of javax.xml.namespace.NamespaceContext. Unfortunately, there is no implementation of NamespaceContext provided in the SDK.

Fortunately, it's very easy to write your own:

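import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

public class SimpleNamespaceContext implements NamespaceContext {

	private final Map<String, String> prefMap = new HashMap<String, String>();

	public SimpleNamespaceContext(final Map<String, String> prefMap) {
		this.prefMap.putAll(prefMap);
	}

	public String getNamespaceURI(String prefix) {
		// NamespaceContext contract: null is illegal, unbound prefixes map to "".
		if (prefix == null) {
			throw new IllegalArgumentException("prefix must not be null");
		}
		String uri = prefMap.get(prefix);
		return uri != null ? uri : XMLConstants.NULL_NS_URI;
	}

	// Not needed for XPath evaluation.
	public String getPrefix(String uri) {
		throw new UnsupportedOperationException();
	}

	// Not needed for XPath evaluation.
	public Iterator<String> getPrefixes(String uri) {
		throw new UnsupportedOperationException();
	}

}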

Use it like this:

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
Map<String, String> prefMap = new HashMap<String, String>();
prefMap.put("main", "http://schemas.openxmlformats.org/spreadsheetml/2006/main");
prefMap.put("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
SimpleNamespaceContext namespaces = new SimpleNamespaceContext(prefMap);
xpath.setNamespaceContext(namespaces);
XPathExpression expr = xpath
		.compile("/main:workbook/main:sheets/main:sheet[1]");
// NODESET results come back as an org.w3c.dom.NodeList
NodeList result = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

Note that even though the first namespace does not specify a prefix in the source document (i.e. it is the default namespace) you must associate it with a prefix anyway. Your expression should then reference nodes in that namespace using the prefix you've chosen, like this:

/main:workbook/main:sheets/main:sheet[1]

The prefix names you choose to associate with each namespace are arbitrary; they do not need to match what appears in the source XML. This mapping is just a way to tell the XPath engine that a given prefix name in an expression correlates with a specific namespace in the source document.
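As a rough illustration that the prefixes really are arbitrary, the sketch below evaluates the same query twice with two different prefix mappings and also reads the r:id attribute, which lives in the second namespace. The class ArbitraryPrefixDemo, its evaluate helper, and the prefixes x and rel are made up for this example.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
final class ArbitraryPrefixDemo {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    static final String REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    static final String XML =
        "<workbook xmlns=\"" + MAIN_NS + "\" xmlns:r=\"" + REL_NS + "\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    // Evaluates the expression as a string, resolving prefixes through the given map.
    static String evaluate(final Map<String, String> uriByPrefix, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    if (prefix == null) throw new IllegalArgumentException("Null prefix");
                    String uri = uriByPrefix.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    throw new UnsupportedOperationException();
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    throw new UnsupportedOperationException();
                }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

Calling evaluate with the mapping {x: MAIN_NS, rel: REL_NS} and the expression /x:workbook/x:sheets/x:sheet[1]/@rel:id selects the same attribute as the mapping {main: MAIN_NS, r: REL_NS} with /main:workbook/main:sheets/main:sheet[1]/@r:id. Only the namespace URIs have to agree with the document.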

Solution 4 - Java

If you are using Spring, it already contains org.springframework.util.xml.SimpleNamespaceContext.

        import org.springframework.util.xml.SimpleNamespaceContext;
        ...

        XPathFactory xPathfactory = XPathFactory.newInstance();
        XPath xpath = xPathfactory.newXPath();
        SimpleNamespaceContext nsc = new SimpleNamespaceContext();

        nsc.bindNamespaceUri("a", "http://some.namespace.com/nsContext");
        xpath.setNamespaceContext(nsc);

        XPathExpression xpathExpr = xpath.compile("//a:first/a:second");

        String result = (String) xpathExpr.evaluate(object, XPathConstants.STRING);

Solution 5 - Java

I've written a simple NamespaceContext implementation (here), that takes a Map<String, String> as input, where the key is a prefix, and the value is a namespace.

It follows the NamespaceContext specification, and you can see how it works in the unit tests.

Map<String, String> mappings = new HashMap<>();
mappings.put("foo", "http://foo");
mappings.put("foo2", "http://foo");
mappings.put("bar", "http://bar");

NamespaceContext context = new SimpleNamespaceContext(mappings);

context.getNamespaceURI("foo");    // "http://foo"
context.getPrefix("http://foo");   // "foo" or "foo2"
context.getPrefixes("http://foo"); // ["foo", "foo2"]

Note that it has a dependency on Google Guava.
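If pulling in Guava is not an option, a map-backed context can be sketched with the JDK alone. This is not the linked implementation, just an approximation of the same idea; the class name MapNamespaceContext is hypothetical, and it assumes the caller passes an insertion-ordered map if a deterministic getPrefix result matters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

// Hypothetical JDK-only sketch; not the Guava-based class from the answer.
final class MapNamespaceContext implements NamespaceContext {

    // Keeps the iteration order of the map passed in.
    private final Map<String, String> uriByPrefix = new LinkedHashMap<>();

    MapNamespaceContext(Map<String, String> mappings) {
        uriByPrefix.putAll(mappings);
        // The two bindings that the NamespaceContext contract fixes for every context.
        uriByPrefix.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
        uriByPrefix.put(XMLConstants.XMLNS_ATTRIBUTE, XMLConstants.XMLNS_ATTRIBUTE_NS_URI);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        // Unbound prefixes resolve to the empty string, not null.
        return uriByPrefix.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        // Any bound prefix is acceptable; this returns the first one found, or null.
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("namespaceURI must not be null");
        }
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (namespaceURI.equals(entry.getValue())) {
                prefixes.add(entry.getKey());
            }
        }
        // The contract asks for an iterator that does not support remove().
        return Collections.unmodifiableList(prefixes).iterator();
    }
}
```

For XPath evaluation only getNamespaceURI is called, so the two reverse-lookup methods matter mainly when the context is shared with other APIs.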

Solution 6 - Java

Make sure that you are referencing the namespace in your XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
             xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
             xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"       >

Solution 7 - Java

Surprisingly, if I don't call factory.setNamespaceAware(true), the XPath you mentioned works both with and without namespaces in the document. You just can't select by namespace; only generic, prefix-less XPaths work. Go figure. So this may be an option:

 DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
 factory.setNamespaceAware(false);
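A small sketch of that behaviour, assuming the JDK's built-in DOM parser and XPath implementation (another implementation on the classpath may behave differently). The class NamespaceUnawareXPathDemo and its firstSheet helper are invented for this example; the expression is the unprefixed one from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
final class NamespaceUnawareXPathDemo {

    // The namespaced document from the question, inlined for the sketch.
    static final String XML =
        "<workbook xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
        + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    // Evaluates the question's original, unprefixed XPath against the document.
    static Node firstSheet(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate("/workbook/sheets/sheet[1]", doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Unaware parsing: xmlns declarations are treated as plain attributes.
        System.out.println("unaware: " + firstSheet(false));
        // Aware parsing: the unprefixed path no longer matches the namespaced elements.
        System.out.println("aware:   " + firstSheet(true));
    }
}
```

The trade-off is the one described above: with namespace-unaware parsing you give up the ability to tell apart elements that share a local name but belong to different namespaces.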

Solution 8 - Java

Two things to add to the existing answers:

  • I don't know whether this was the case when you asked the question: With Java 10, your XPath actually works for the second document if you don't use setNamespaceAware(true) on the document builder factory (false is the default).

  • If you do want to use setNamespaceAware(true), other answers have already shown how to do this using a namespace context. However, you don't need to provide the mapping of prefixes to namespaces yourself, as these answers do: It's already there in the document element, and you can use that for your namespace context:

import java.util.Iterator;

import javax.xml.namespace.NamespaceContext;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DocumentNamespaceContext implements NamespaceContext {
    Element documentElement;

    public DocumentNamespaceContext (Document document) {
        documentElement = document.getDocumentElement();
    }

    public String getNamespaceURI(String prefix) {
        return documentElement.getAttribute(prefix.isEmpty() ? "xmlns" : "xmlns:" + prefix);
    }

    public String getPrefix(String namespaceURI) {
        throw new UnsupportedOperationException();
    }

    public Iterator<String> getPrefixes(String namespaceURI) {
        throw new UnsupportedOperationException();
    }
}

The rest of the code is as in the other answers. Then the XPath /:workbook/:sheets/:sheet[1] yields the sheet element. (You could also use a non-empty prefix for the default namespace, as the other answers do, by replacing prefix.isEmpty() by e.g. prefix.equals("spreadsheet") and using the XPath /spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1].)

P.S.: I just found here that there's actually a method Node.lookupNamespaceURI(String prefix), so you could use that instead of the attribute lookup:

    public String getNamespaceURI(String prefix) {
        return documentElement.lookupNamespaceURI(prefix.isEmpty() ? null : prefix);
    }

Also, note that namespaces can be declared on elements other than the document element, and those wouldn't be recognized (by either version).
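Putting the lookupNamespaceURI variant together with parsing and evaluation might look like the sketch below. It uses the non-empty prefix option mentioned above: the prefix main (an arbitrary choice for this example) is mapped to the document's default namespace, and every other prefix is resolved from the document element. The class DocumentBackedNamespaceDemo and its helpers are hypothetical names.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
final class DocumentBackedNamespaceDemo {

    static final String XML =
        "<workbook xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
        + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    // Resolves prefixes from declarations in scope on the document element.
    // defaultNsPrefix is the prefix the XPath uses for the document's default namespace.
    static NamespaceContext contextFor(Document doc, final String defaultNsPrefix) {
        final Element root = doc.getDocumentElement();
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("Null prefix");
                // lookupNamespaceURI(null) asks for the default namespace.
                String uri = root.lookupNamespaceURI(prefix.equals(defaultNsPrefix) ? null : prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String namespaceURI) {
                throw new UnsupportedOperationException();
            }
            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Parses the XML namespace-aware and evaluates the expression as a string.
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(contextFor(doc, "main"));
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

For example, evaluate(XML, "/main:workbook/main:sheets/main:sheet[1]/@r:id") resolves main to the default namespace and r to the relationships namespace without either URI appearing in the Java code. As noted above, declarations made on elements below the document element are still not seen.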

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Inez | View Question on Stackoverflow
Solution 1 - Java | Mads Hansen | View Answer on Stackoverflow
Solution 2 - Java | stevevls | View Answer on Stackoverflow
Solution 3 - Java | Wayne | View Answer on Stackoverflow
Solution 4 - Java | kasi | View Answer on Stackoverflow
Solution 5 - Java | tomaj | View Answer on Stackoverflow
Solution 6 - Java | cordsen | View Answer on Stackoverflow
Solution 7 - Java | rogerdpack | View Answer on Stackoverflow
Solution 8 - Java | joriki | View Answer on Stackoverflow