Grep and Sed Equivalent for XML Command Line Processing

Xml, Command Line, Scripting

Xml Problem Overview


When doing shell scripting, data is typically in files of single-line records, such as CSV, and that kind of data is really simple to handle with grep and sed. But I often have to deal with XML, so I'd really like a way to script access to that XML data from the command line. What are the best tools?

Xml Solutions


Solution 1 - Xml

I've found xmlstarlet to be pretty good at this sort of thing.

http://xmlstar.sourceforge.net/

Should be available in most distro repositories, too. An introductory tutorial is here:

http://www.ibm.com/developerworks/library/x-starlet.html

Solution 2 - Xml

Some promising tools:

  • nokogiri: parsing HTML/XML DOMs in Ruby using XPath and CSS selectors

  • hpricot: deprecated

  • fxgrep: Uses its own XPath-like syntax to query documents. Written in SML, so installation may be difficult.

  • LT XML: XML toolkit derived from SGML tools, including sggrep, sgsort, xmlnorm and others. Uses its own query syntax. The documentation is very formal. Written in C. LT XML 2 claims support of XPath, XInclude and other W3C standards.

  • xmlgrep2: simple and powerful searching with XPath. Written in Perl using XML::LibXML and libxml2.

  • XQSharp: Supports XQuery, which extends XPath 2.0. Written for the .NET Framework.

  • xml-coreutils: Laird Breyer's toolkit equivalent to GNU coreutils. Discussed in an interesting essay on what the ideal toolkit should include.

  • xmldiff: Simple tool for comparing two XML files.

  • xmltk: doesn't seem to have a package in Debian, Ubuntu, Fedora, or MacPorts, hasn't had a release since 2007, and uses non-portable build automation.

xml-coreutils seems the best documented and most UNIX-oriented.

Solution 3 - Xml

There is also the xml2 and 2xml pair. Together they let the usual line-oriented text tools (grep, sed and friends) process XML.

Example input, q.xml:

<?xml version="1.0"?>
<foo>
	text
	more text
	<textnode>ddd</textnode><textnode a="bv">dsss</textnode>
	<![CDATA[ asfdasdsa <foo> sdfsdfdsf <bar> ]]>
</foo>

xml2 < q.xml

/foo=
/foo=	text
/foo=	more text
/foo=	
/foo/textnode=ddd
/foo/textnode
/foo/textnode/@a=bv
/foo/textnode=dsss
/foo=
/foo=	 asfdasdsa <foo> sdfsdfdsf <bar> 
/foo=

xml2 < q.xml | grep textnode | sed 's!/foo!/bar/baz!' | 2xml

<bar><baz><textnode>ddd</textnode><textnode a="bv">dsss</textnode></baz></bar>

P.S. There are also html2 / 2html.
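For a feel of what xml2 is doing, here is a rough Python sketch using only the standard library that flattens a document into the same kind of /path=value and /path/@attr=value lines. It is an approximation, not xml2 itself: the function name flatten is made up for illustration, it strips surrounding whitespace and skips whitespace-only text, and it does not emit the bare separator lines xml2 uses between repeated siblings, so its output would not round-trip through 2xml reliably.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Return xml2-style "/path=value" lines (a rough approximation of xml2)."""
    root = ET.fromstring(xml_text)
    lines = []

    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        # Attributes come first, as "/path/@name=value".
        for name, value in elem.attrib.items():
            lines.append(f"{path}/@{name}={value}")
        # Text directly inside the element (whitespace-only text is skipped).
        if elem.text and elem.text.strip():
            lines.append(f"{path}={elem.text.strip()}")
        for child in elem:
            walk(child, path)
            # Text after a child (its "tail") belongs to the parent element.
            if child.tail and child.tail.strip():
                lines.append(f"{path}={child.tail.strip()}")

    walk(root, "")
    return lines


doc = '<foo>text<textnode>ddd</textnode><textnode a="bv">dsss</textnode></foo>'
for line in flatten(doc):
    print(line)
```

Once the tree is flattened like this, each line can be filtered with ordinary string tools, which is the whole idea behind the xml2 pipeline above.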

Solution 4 - Xml

To Joseph Holsten's excellent list, I add the xpath command-line script that comes with the Perl library XML::XPath. It is a great way to extract information from XML files:

 xpath -q -e '/entry[@xml:lang="fr"]' *xml
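One detail worth knowing when doing the same selection from a program: xml:lang is a namespaced attribute. The xml prefix is predeclared and always maps to http://www.w3.org/XML/1998/namespace, which is what Python's standard-library ElementTree uses as the attribute key. A minimal sketch (the function name entry_matches is hypothetical, and this is an illustration rather than a port of the xpath script); like /entry above, it only tests the document element:

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the fixed XML namespace URI, in {uri}name form.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def entry_matches(xml_text, lang):
    """Rough equivalent of /entry[@xml:lang=LANG]: test the document element only."""
    root = ET.fromstring(xml_text)
    return root.tag == "entry" and root.get(XML_LANG) == lang


print(entry_matches('<entry xml:lang="fr"><title>Bonjour</title></entry>', "fr"))  # True
print(entry_matches('<entry xml:lang="en"><title>Hello</title></entry>', "fr"))    # False
```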

Solution 5 - Xml

You can use xmllint:

xmllint --xpath //title books.xml

Should be bundled with most distros, and is also bundled with Cygwin.

$ xmllint --version
xmllint: using libxml version 20900

See:

$ xmllint
Usage : xmllint [options] XMLfiles ...
        Parse the XML files and output the result of the parsing
        --version : display the version of the XML library used
        --debug : dump a debug tree of the in-memory document
        ...
        --schematron schema : do validation against a schematron
        --sax1: use the old SAX1 interfaces for processing
        --sax: do not build a tree but work just at the SAX level
        --oldxml10: use XML-1.0 parsing rules before the 5th edition
        --xpath expr: evaluate the XPath expression, inply --noout
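If you would rather do this kind of lookup from a script, Python's standard-library ElementTree supports a similar descendant search. A minimal sketch, assuming a made-up sample document in place of books.xml (the function name titles is also hypothetical); note that it returns the text content, whereas xmllint --xpath prints the matching elements' markup:

```python
import xml.etree.ElementTree as ET


def titles(xml_text):
    """Return the text of every <title> element at any depth (roughly //title)."""
    root = ET.fromstring(xml_text)
    # iter() visits the root and all its descendants, like // from the document root.
    return [elem.text for elem in root.iter("title")]


# Hypothetical sample standing in for books.xml.
books = """<books>
  <book><title>Dune</title></book>
  <book><title>Emma</title></book>
</books>"""
print(titles(books))  # ['Dune', 'Emma']
```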

Solution 6 - Xml

If you're looking for a solution on Windows, PowerShell has built-in functionality for reading and writing XML.

test.xml:

<root>
  <one>I like applesauce</one>
  <two>You sure bet I do!</two>
</root>

PowerShell script:

# Load the XML file into a local variable, cast as an XML document.
$doc = [xml](Get-Content ./test.xml)

$doc.root.one                                   # echoes "I like applesauce"
$doc.root.one = "Who doesn't like applesauce?"  # replace inner text of <one> node

# Create a new node...
$newNode = $doc.CreateElement("three")
$newNode.InnerText = "And don't you forget it!"

# ...and position it in the hierarchy ([void] discards the returned node).
[void]$doc.root.AppendChild($newNode)

# Write results to disk. .NET resolves relative paths against the process's
# working directory, not PowerShell's current location, so build a full path.
$doc.Save((Join-Path $PWD "testNew.xml"))

testNew.xml:

<root>
  <one>Who doesn't like applesauce?</one>
  <two>You sure bet I do!</two>
  <three>And don't you forget it!</three>
</root>

Source: https://serverfault.com/questions/26976/update-xml-from-the-command-line-windows
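Outside Windows, the same edit can be sketched with Python's standard-library ElementTree. This is an illustrative equivalent, not part of the original answer: the function name update is hypothetical, and it works on a string to stay self-contained (ET.parse and ElementTree.write would read test.xml and write testNew.xml instead).

```python
import xml.etree.ElementTree as ET


def update(xml_text):
    """Mirror the PowerShell steps: edit <one>, append <three>, return the new XML."""
    root = ET.fromstring(xml_text)
    # Replace the inner text of the <one> node.
    root.find("one").text = "Who doesn't like applesauce?"
    # Create a <three> node as the last child of <root> and set its text.
    three = ET.SubElement(root, "three")
    three.text = "And don't you forget it!"
    return ET.tostring(root, encoding="unicode")


src = "<root><one>I like applesauce</one><two>You sure bet I do!</two></root>"
print(update(src))
```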

Solution 7 - Xml

There are also xmlsed and xmlgrep from the NetBSD xmltools!

http://blog.huoc.org/xmltools-not-dead.html

Solution 8 - Xml

Depends on exactly what you want to do.

XSLT may be the way to go, but there is a learning curve. Try xsltproc, and note that you can pass in parameters (--param and --stringparam).

Solution 9 - Xml

There's also saxon-lint, a command-line tool that can use XPath 3.0/XQuery 3.0 (many other command-line tools only support XPath 1.0).

Examples:

HTTP/HTML:

$ saxon-lint --html --xpath 'count(//a)' http://stackoverflow.com/q/91791
328

XML:

$ saxon-lint --xpath '//a[@class="x"]' file.xml

Solution 10 - Xml

D. Bohdan maintains an open-source GitHub repo that lists command-line tools for structured text, including a section for XML/HTML tools:

https://github.com/dbohdan/structured-text-tools#xml-html

Solution 11 - Xml

XQuery might be a good solution. It is (relatively) easy to learn and is a W3C standard.

I would recommend XQSharp for a command line processor.

Solution 12 - Xml

I started with xmlstarlet and still use it. When the query gets tough and I need XPath 2 and XQuery support, I turn to xidel: http://www.videlibri.de/xidel.html

Solution 13 - Xml

Grep Equivalent

You can define a bash function, say "xp" (for "xpath"), that wraps some python3 code. To use it you need python3 and python-lxml installed. Benefits:

  1. Regex matching (via the EXSLT regular-expressions extension), which you lack in, e.g., xmllint.
  2. It can be used as a filter (in a pipe) on the command line.

It's easy and powerful to use like this:

xmldoc=$(cat <<EOF
<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>
EOF
)
selection='//*[namespace-uri()="http://www.sample.com/" and local-name()="job" and re:test(.,"^pro.*ing$")]/text()'
echo "$xmldoc" | xp "$selection"
# prints programming

xp() looks something like this:

xp()
{
  local selection="$1" xmldoc
  # Read the document from stdin when piped; otherwise take it from $2.
  if ! [[ -t 0 ]]; then read -rd '' xmldoc; else xmldoc="$2"; fi
  # Pass the XPath as argv[1] so quotes or backslashes in it cannot break the Python code.
  python3 -c '
import sys
from lxml import etree
regexpNS = "http://exslt.org/regular-expressions"
tree = etree.parse(sys.stdin.buffer)
res = tree.xpath(sys.argv[1], namespaces={"re": regexpNS})
for e in (res if isinstance(res, list) else [res]):
    print(etree.tostring(e).decode("UTF-8") if etree.iselement(e) else e)
' "$selection" <<< "$xmldoc"
}
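If installing lxml is not an option, a rough standard-library substitute for the selection above is possible in Python, with caveats. ElementTree's XPath subset has no regex support, so this sketch (the function name grep_text is hypothetical) walks the tree, matches the namespace and local name through ElementTree's {uri}name tag notation, and applies Python's re.search to each element's string value. It returns those string values rather than text() nodes.

```python
import re
import xml.etree.ElementTree as ET


def grep_text(xml_text, namespace, local_name, pattern):
    """Return string values of {namespace}local_name elements that match pattern.

    A rough stand-in for
    //*[namespace-uri()=NS and local-name()=NAME and re:test(., PATTERN)]
    """
    root = ET.fromstring(xml_text)
    wanted = f"{{{namespace}}}{local_name}"  # ElementTree's {uri}name tag notation
    regex = re.compile(pattern)
    matches = []
    for elem in root.iter(wanted):
        value = "".join(elem.itertext())  # string value of the element, like "."
        if regex.search(value):
            matches.append(value)
    return matches


doc = """<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>"""
print(grep_text(doc, "http://www.sample.com/", "job", r"^pro.*ing$"))  # ['programming']
```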

Sed Equivalent

Consider using xq, which gives you the full power of the jq "programming language". If you have python-pip installed, you can install xq with pip install yq. The example below replaces "Keep Accounts" with "Keep Accounts 2":

xmldoc=$(cat <<'EOF'
<resources>
    <string name="app_name">Keep Accounts</string>
    <string name="login">"login"</string>
    <string name="login_password">"password:"</string>
    <string name="login_account_hint">input to login</string>
    <string name="login_password_hint">input your password</string>
    <string name="login_fail">login failed</string>
</resources>
EOF
)
echo "$xmldoc" | xq -x '(.resources.string[] | select(."#text" == "Keep Accounts") | ."#text") = "Keep Accounts 2"'
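Without jq, a comparable targeted replacement can be sketched with Python's standard-library ElementTree. This is an alternative technique, not what xq does internally: the function name replace_string_text is hypothetical, it matches the text exactly (like the select in the xq filter), and ElementTree drops comments and the XML declaration when re-serializing.

```python
import xml.etree.ElementTree as ET


def replace_string_text(xml_text, old, new):
    """Set the text of every <string> whose text equals old to new."""
    root = ET.fromstring(xml_text)
    for elem in root.iter("string"):
        # Exact comparison, mirroring select(."#text" == "Keep Accounts").
        if elem.text == old:
            elem.text = new
    return ET.tostring(root, encoding="unicode")


doc = """<resources>
    <string name="app_name">Keep Accounts</string>
    <string name="login">"login"</string>
</resources>"""
print(replace_string_text(doc, "Keep Accounts", "Keep Accounts 2"))
```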

Solution 14 - Xml

jEdit has a plugin called "XQuery" that provides querying functionality for XML documents.

Not quite the command line, but it works!

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Joseph Holsten | View Question on Stackoverflow
Solution 1 - Xml | Russ | View Answer on Stackoverflow
Solution 2 - Xml | Joseph Holsten | View Answer on Stackoverflow
Solution 3 - Xml | Vi. | View Answer on Stackoverflow
Solution 4 - Xml | bortzmeyer | View Answer on Stackoverflow
Solution 5 - Xml | Dave Jarvis | View Answer on Stackoverflow
Solution 6 - Xml | Clay | View Answer on Stackoverflow
Solution 7 - Xml | taggo | View Answer on Stackoverflow
Solution 8 - Xml | Adrian Mouat | View Answer on Stackoverflow
Solution 9 - Xml | Gilles Quenot | View Answer on Stackoverflow
Solution 10 - Xml | Devy | View Answer on Stackoverflow
Solution 11 - Xml | Oliver Hallam | View Answer on Stackoverflow
Solution 12 - Xml | eigenfield | View Answer on Stackoverflow
Solution 13 - Xml | methuselah-0 | View Answer on Stackoverflow
Solution 14 - Xml | Ben | View Answer on Stackoverflow