xmllint failing to properly query with xpath

Xml, Xpath, Xmllint

Xml Problem Overview


I'm trying to query an XML file generated by Adium. xmlwf says that it's well formed. Using xmllint's --debug option, I get the following:

$ xmllint --debug doc.xml
DOCUMENT
version=1.0
encoding=UTF-8
URL=doc.xml
standalone=true
  ELEMENT chat
    default namespace href=http://purl.org/net/ulf/ns/0.4-02
    ATTRIBUTE account
      TEXT
        content=[email protected]
    ATTRIBUTE service
      TEXT compact
        content=MSN
    TEXT compact
      content= 
    ELEMENT event
      ATTRIBUTE type

Everything seems to parse just fine. However, when I try to query even the simplest things, I don't get anything:

$ xmllint --xpath '/chat' doc.xml 
XPath set is empty

What's happening? Running that exact same query using xpath returns the correct results (though with no newline between results). Am I doing something wrong, or is xmllint just not working properly?

Here's a shorter, anonymized version of the XML that shows the same behavior:

<?xml version="1.0" encoding="UTF-8" ?>
<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" account="[email protected]" service="MSN">
<event type="windowOpened" sender="[email protected]" time="2011-11-22T00:34:43-03:00"></event>
<message sender="[email protected]" time="2011-11-22T00:34:43-03:00" alias="foo"><div><span style="color: #000000; font-family: Helvetica; font-size: 12pt;">hi</span></div></message>
</chat>

Xml Solutions


Solution 1 - Xml

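I don't use xmllint, but I think your XPath isn't working because your doc.xml file uses a default namespace (http://purl.org/net/ulf/ns/0.4-02). In XPath 1.0 an unprefixed name such as chat only matches elements that are in no namespace, so /chat selects nothing here.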
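To make the effect of the default namespace visible, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than xmllint. The document is a shortened copy of the sample from the question, and user@example.com is a placeholder for the redacted addresses.

```python
import xml.etree.ElementTree as ET

# Shortened sample; user@example.com is a placeholder address.
DOC = """<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" account="user@example.com" service="MSN">
<event type="windowOpened" sender="user@example.com"></event>
<message sender="user@example.com" alias="foo"><div><span>hi</span></div></message>
</chat>"""

root = ET.fromstring(DOC)

# The default namespace is part of every element's name.
print(root.tag)              # {http://purl.org/net/ulf/ns/0.4-02}chat

# An unqualified name therefore matches nothing.
print(root.find("message"))  # None
```

The same thing happens with /chat in xmllint: the element is really named chat in the namespace http://purl.org/net/ulf/ns/0.4-02, not plain chat.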

From what I can see, you have two options.

A. Use xmllint in shell mode and declare the namespace with a prefix. You can then use that prefix in your XPath.

    xmllint --shell doc.xml
    / > setns x=http://purl.org/net/ulf/ns/0.4-02
    / > xpath /x:chat

B. Use local-name() to match element names.

    xmllint --xpath "/*[local-name()='chat']" doc.xml

You may also want to test namespace-uri()='http://purl.org/net/ulf/ns/0.4-02' along with local-name(), so that you only match elements in the namespace you intend.
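The two options can also be sketched with Python's standard xml.etree.ElementTree, which is an assumption of this example and not part of the original answer. ElementTree does not implement XPath functions such as local-name(), so option B is approximated with its {*} namespace wildcard (Python 3.8 or later) plus an explicit check of the namespace URI. The prefix x is arbitrary, just as in the setns command.

```python
import xml.etree.ElementTree as ET

NS = "http://purl.org/net/ulf/ns/0.4-02"
DOC = f'<chat xmlns="{NS}" service="MSN"><event type="windowOpened"/><message alias="foo"/></chat>'
root = ET.fromstring(DOC)

# Option A: bind a prefix of our choosing ("x") to the namespace URI.
by_prefix = root.findall("x:message", {"x": NS})

# Option B: match on the local name only via the {*} wildcard, then
# confirm the namespace URI, similar to local-name() plus namespace-uri().
by_local = [el for el in root.findall("{*}message")
            if el.tag == f"{{{NS}}}message"]

print(len(by_prefix), len(by_local))  # 1 1
```

Both lists contain the same single message element; the prefix mapping is usually the more readable of the two.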

Solution 2 - Xml

I realize this question is very old now, but in case it helps someone...

Had the same problem and it was due to the XML having a namespace (and sometimes it was duplicated in various places in the XML). Found it easiest to just remove the namespace before using xmllint:

sed -e 's/xmlns="[^"]*"//g' file.xml | xmllint --xpath "..." -
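As a rough Python equivalent of that sed expression (a sketch with the standard library, not what the answer itself used), the declarations can be removed as text before parsing. The helper name strip_default_namespaces is made up for illustration. Like the sed version, it works on raw text, so it would also rewrite any character data that happens to contain xmlns="...", and it leaves prefixed declarations such as xmlns:foo="..." alone.

```python
import re
import xml.etree.ElementTree as ET

def strip_default_namespaces(xml_text: str) -> str:
    """Remove every xmlns="..." declaration from the raw XML text."""
    return re.sub(r'xmlns="[^"]*"', "", xml_text)

DOC = '<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" service="MSN"><message alias="foo"/></chat>'
root = ET.fromstring(strip_default_namespaces(DOC))

# With the namespace gone, plain names match again.
print(root.tag)                            # chat
print(root.find("message").get("alias"))   # foo
```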

In my case the XML was UTF-16 so I had to convert to UTF-8 first (for sed):

iconv -f utf16 -t utf8 file.xml | sed -e 's/encoding="UTF-16"?>/encoding="UTF-8"?>/' | sed -e 's/xmlns="[^"]*"//g' | xmllint --xpath "..." -
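The same pipeline can be sketched in Python, assuming the input is UTF-16 with a byte-order mark and declares encoding="UTF-16". The function name utf16_to_plain_utf8 and the urn:example namespace are placeholders for this example only.

```python
import re
import xml.etree.ElementTree as ET

def utf16_to_plain_utf8(raw: bytes) -> bytes:
    """Decode UTF-16, fix the declaration, drop xmlns="...", re-encode as UTF-8."""
    text = raw.decode("utf-16")  # uses the byte-order mark to pick endianness
    text = text.replace('encoding="UTF-16"', 'encoding="UTF-8"')
    text = re.sub(r'xmlns="[^"]*"', "", text)
    return text.encode("utf-8")

# Build a small UTF-16 sample in memory instead of reading file.xml.
sample = ('<?xml version="1.0" encoding="UTF-16"?>'
          '<chat xmlns="urn:example"><message alias="foo"/></chat>').encode("utf-16")

root = ET.fromstring(utf16_to_plain_utf8(sample))
print(root.tag)  # chat
```

Rewriting the declaration matters for the same reason as in the shell version: after conversion the bytes are UTF-8, and a parser that trusts a stale UTF-16 declaration may reject them.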

Solution 3 - Xml

If you're allowed to install PowerShell in your environment (it's also available for Linux), you can map a prefix to the namespace in a hashtable and query like this:

$Namespace = @{ ns = 'http://purl.org/net/ulf/ns/0.4-02' }
Select-Xml -XPath '/ns:chat' -Namespace $Namespace .\doc.xml | foreach { $_.Node }
   xmlns   : http://purl.org/net/ulf/ns/0.4-02
   account : [email protected]
   service : MSN
   event   : event
   message : message

Of course, all the usual XPath rules apply here. To access the inner content of a node:

Select-Xml -XPath '/ns:chat/ns:message' -Namespace $Namespace .\doc.xml | foreach { $_.Node.InnerXML }
<div xmlns="http://purl.org/net/ulf/ns/0.4-02"><span style="color: #000000; font-family: Helvetica; font-size: 12pt;">hi</span></div>

Or the content of the sender attribute:

Select-Xml -XPath '/ns:chat/ns:message/@sender' -Namespace $Namespace .\doc.xml | foreach { $_.Node }

#text
-----
[email protected]
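For comparison, reading an attribute and the text content through a prefix-to-namespace map looks much the same in Python's standard xml.etree.ElementTree. This is only an illustrative sketch: the prefix ns mirrors the PowerShell hashtable key, and user@example.com stands in for the redacted sender address.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map, playing the role of the -Namespace hashtable.
NS = {"ns": "http://purl.org/net/ulf/ns/0.4-02"}

DOC = """<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" service="MSN">
<message sender="user@example.com" alias="foo"><div><span>hi</span></div></message>
</chat>"""
root = ET.fromstring(DOC)

message = root.find("ns:message", NS)

# Attributes without a prefix are in no namespace, so no prefix is needed.
sender = message.get("sender")

# Concatenate all text nodes below the message element.
text = "".join(message.itertext())

print(sender, text)  # user@example.com hi
```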

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | ailnlv | View Question on Stackoverflow
Solution 1 - Xml | Daniel Haley | View Answer on Stackoverflow
Solution 2 - Xml | codesniffer | View Answer on Stackoverflow
Solution 3 - Xml | Juan Carlos Muñoz | View Answer on Stackoverflow