xmllint failing to properly query with xpath
Xml Problem Overview
I'm trying to query an XML file generated by Adium. xmlwf says that it's well formed. Using xmllint's --debug option, I get the following:
$ xmllint --debug doc.xml
DOCUMENT
version=1.0
encoding=UTF-8
URL=doc.xml
standalone=true
ELEMENT chat
default namespace href=http://purl.org/net/ulf/ns/0.4-02
ATTRIBUTE account
TEXT
content[email protected]
ATTRIBUTE service
TEXT compact
content=MSN
TEXT compact
content=
ELEMENT event
ATTRIBUTE type
Everything seems to parse just fine. However, when I try to query even the simplest things, I don't get anything:
$ xmllint --xpath '/chat' doc.xml
XPath set is empty
What's happening? Running the exact same query with xpath returns the correct results (although with no newline between results). Am I doing something wrong, or is xmllint just not working properly?
Here's a shorter, anonymized version of the XML that shows the same behavior:
<?xml version="1.0" encoding="UTF-8" ?>
<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" account="[email protected]" service="MSN">
<event type="windowOpened" sender="[email protected]" time="2011-11-22T00:34:43-03:00"></event>
<message sender="[email protected]" time="2011-11-22T00:34:43-03:00" alias="foo"><div><span style="color: #000000; font-family: Helvetica; font-size: 12pt;">hi</span></div></message>
</chat>
Xml Solutions
Solution 1 - Xml
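I don't use xmllint, but I think your XPath isn't working because your doc.xml file uses a default namespace (http://purl.org/net/ulf/ns/0.4-02). In XPath 1.0, an unprefixed name such as chat only matches elements that are in no namespace, so /chat selects nothing.
From what I can see, you have two options.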
A. Use xmllint in shell mode and declare the namespace with a prefix. You can then use that prefix in your XPath.
xmllint --shell doc.xml
/ > setns x=http://purl.org/net/ulf/ns/0.4-02
/ > xpath /x:chat
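B. Use local-name() to match element names. Quote the expression so the shell does not interpret the brackets and asterisk, and pass the file name:
xmllint --xpath "/*[local-name()='chat']" doc.xml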
You may also want to use namespace-uri()='http://purl.org/net/ulf/ns/0.4-02' along with local-name() so you are sure to return exactly what you intend to return.
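Both options can also be run non-interactively. The following is a minimal sketch, assuming an xmllint build (from libxml2) that supports --xpath and --shell is on the PATH. The temporary file, the cut-down sample document with placeholder attribute values, and the variable names are illustrative assumptions, not part of the original question.

```shell
#!/bin/sh
# Cut-down stand-in for doc.xml (attribute values are placeholders).
doc=$(mktemp)
cat > "$doc" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" service="MSN">
<message alias="foo"><div><span>hi</span></div></message>
</chat>
EOF

ns='http://purl.org/net/ulf/ns/0.4-02'

# An unprefixed name matches only elements in no namespace,
# so this count is 0 even though <chat> is the root element.
plain=$(xmllint --xpath 'count(/chat)' "$doc")

# Option A, scripted: feed the shell-mode commands on standard input.
shell_out=$(printf 'setns x=%s\nxpath /x:chat\n' "$ns" | xmllint --shell "$doc")

# Option B: match on the local name and the namespace URI together.
strict=$(xmllint --xpath "count(/*[local-name()='chat' and namespace-uri()='$ns'])" "$doc")

echo "plain=$plain strict=$strict"
echo "$shell_out"
rm -f "$doc"
```

Using count() here is only a convenience for comparing the two queries; drop it to print the matched nodes themselves.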
Solution 2 - Xml
I realize this question is very old now, but in case it helps someone...
I had the same problem, and it was due to the XML having a namespace (sometimes declared repeatedly in various places in the XML). I found it easiest to just remove the namespace declarations before using xmllint:
sed -e 's/xmlns="[^"]*"//g' file.xml | xmllint --xpath "..." -
In my case the XML was UTF-16 so I had to convert to UTF-8 first (for sed):
iconv -f utf16 -t utf8 file.xml | sed -e 's/encoding="UTF-16"?>/encoding="UTF-8"?>/' | sed -e 's/xmlns="[^"]*"//g' | xmllint --xpath "..." -
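To see what the sed expression does and does not remove, here is a small sketch that runs it on a single element. The sample element, the xmlns:x declaration, and the urn:example URI are made up for illustration. Note that the pattern only matches double-quoted default declarations (xmlns="..."), so prefixed declarations such as xmlns:x="..." are left in place.

```shell
#!/bin/sh
# Placeholder root element; the extra xmlns:x declaration is hypothetical.
sample='<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" xmlns:x="urn:example" service="MSN"/>'

# Same expression as in the pipeline above: drop default-namespace declarations.
stripped=$(printf '%s\n' "$sample" | sed -e 's/xmlns="[^"]*"//g')

# The default declaration is gone; the prefixed xmlns:x one is untouched.
printf '%s\n' "$stripped"
```

Because this rewrites the document as text rather than parsing it, it suits quick one-off queries better than data that mixes several namespaces.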
Solution 3 - Xml
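If you're allowed to install PowerShell in your environment (it's also available for Linux), you can do it like this, where $Namespace is a hashtable that maps a prefix (here ns) to the document's namespace URI:
$Namespace = @{ ns = 'http://purl.org/net/ulf/ns/0.4-02' }
Select-Xml -XPath '/ns:chat' -Namespace $Namespace .\doc.xml | foreach { $_.Node }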
xmlns : http://purl.org/net/ulf/ns/0.4-02
account : [email protected]
service : MSN
event : event
message : message
Of course, all the same XPath rules apply here. To access the inner XML of a node:
Select-Xml -XPath '/ns:chat/ns:message' -Namespace $Namespace .\doc.xml | foreach { $_.Node.InnerXml }
<div xmlns="http://purl.org/net/ulf/ns/0.4-02"><span style="color: #000000; font-family: Helvetica; font-size: 12pt;">hi</span></div>
Or the content of the sender attribute:
Select-Xml -XPath '/ns:chat/ns:message/@sender' -Namespace $Namespace .\doc.xml | foreach { $_.Node }
#text
-----
[email protected]
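For comparison, the same attribute and text lookups can be sketched with xmllint itself by wrapping the path in string(). This assumes xmllint with --xpath support is installed. The sample file and the sender value are placeholders, not the data from the question.

```shell
#!/bin/sh
# Cut-down stand-in for doc.xml; the sender value is a placeholder.
doc=$(mktemp)
cat > "$doc" <<'EOF'
<chat xmlns="http://purl.org/net/ulf/ns/0.4-02">
<message sender="placeholder-sender" alias="foo"><div><span>hi</span></div></message>
</chat>
EOF

# string() on an attribute yields its value as plain text.
sender=$(xmllint --xpath "string(/*[local-name()='chat']/*[local-name()='message']/@sender)" "$doc")

# string() on an element concatenates its descendant text nodes.
text=$(xmllint --xpath "string(/*[local-name()='chat']/*[local-name()='message'])" "$doc")

echo "sender=$sender text=$text"
rm -f "$doc"
```

string() only considers the first node in document order, so a document with several message elements would need a positional predicate or a loop.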