What's the difference between PHP's DOM and SimpleXML extensions?


Php Problem Overview


I fail to understand why we need two XML parsers in PHP.

Can someone explain the difference between those two?

Php Solutions


Solution 1 - Php

In a nutshell:

SimpleXml

  • is for simple XML and/or simple use cases
  • has a limited API for working with nodes (e.g. you cannot program to an interface much)
  • all nodes are of the same kind (element node is the same as attribute node)
  • nodes are magically accessible, e.g. $root->foo->bar['attribute']
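
As a minimal sketch of the "magic" access described in this list (the sample document and variable names are made up for illustration), element names act as properties and attributes use array syntax:

```php
<?php
// Hypothetical sample document, for illustration only.
$root = simplexml_load_string(
    '<root><foo><bar attribute="x">first</bar><bar attribute="y">second</bar></foo></root>'
);

// Property access walks child elements by name; array access with a
// string key reads an attribute, with an integer key picks a sibling.
$firstAttr  = (string) $root->foo->bar['attribute'];    // attribute of the first <bar>
$secondAttr = (string) $root->foo->bar[1]['attribute']; // attribute of the second <bar>
$secondText = (string) $root->foo->bar[1];              // text of the second <bar>
$barCount   = count($root->foo->bar);                   // number of <bar> siblings

echo "$firstAttr $secondAttr $secondText $barCount", PHP_EOL; // x y second 2
```

Note that every value comes back as a SimpleXMLElement until it is cast, which is why the sketch casts to string explicitly.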

DOM

  • is for any XML use case you might have
  • is an implementation of the W3C DOM API (found implemented in many languages)
  • differentiates between various Node Types (more control)
  • much more verbose due to explicit API (can code to an interface)
  • can parse broken HTML
  • allows you to use PHP functions in XPath queries
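
A short sketch of two of the points in this list, assuming a made-up sample document: the explicit, typed node API, and calling a PHP function from an XPath query through DOMXPath::registerPhpFunctions().

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><foo><bar attribute="Mixed">hello</bar></foo></root>');

// Explicit API: each step is a method call that returns a specific node class.
$bar  = $doc->getElementsByTagName('bar')->item(0); // DOMElement
$attr = $bar->getAttributeNode('attribute');        // DOMAttr

// PHP functions in XPath: bind the "php" prefix, then allow specific functions.
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('strtolower');

// functionString() passes the string value of @attribute to strtolower().
$matches = $xpath->query(
    '//bar[php:functionString("strtolower", @attribute) = "mixed"]'
);

echo get_class($bar), ' ', get_class($attr), ' ', $matches->length, PHP_EOL;
```

Passing an explicit function name to registerPhpFunctions() keeps the set of callable functions small, which seems the safer default when the XPath expression is not fully under your control.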

Both of these are based on libxml and can be influenced to some extent by the libxml functions.


Personally, I don't like SimpleXml too much. That's because I don't like the implicit access to the nodes, e.g. $foo->bar[1]->baz['attribute']. It ties the actual XML structure to the programming interface. The one-node-type-for-everything approach is also somewhat unintuitive because the behavior of the SimpleXmlElement magically changes depending on its contents.

For instance, when you have <foo bar="1"/> the object dump of /foo/@bar will be identical to that of /foo but doing an echo of them will print different results. Moreover, because both of them are SimpleXml elements, you can call the same methods on them, but they will only get applied when the SimpleXmlElement supports it, e.g. trying to do $el->addAttribute('foo', 'bar') on the first SimpleXmlElement will do nothing. Now of course it is correct that you cannot add an attribute to an Attribute Node, but the point is, an attribute node would not expose that method in the first place.
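
The /foo versus /foo/@bar case can be sketched as follows. This is only an illustration; it uses dom_import_simplexml() to reveal which kind of node each SimpleXMLElement wraps.

```php
<?php
$foo  = simplexml_load_string('<foo bar="1"/>');
$attr = $foo->xpath('/foo/@bar')[0];

// Both objects are plain SimpleXMLElement instances...
$sameClass = get_class($foo) === get_class($attr);

// ...but they stringify differently: the element has no text,
// the attribute yields its value.
$elementText   = (string) $foo;
$attributeText = (string) $attr;

// Importing into DOM exposes the underlying node type.
$elementNode   = dom_import_simplexml($foo);
$attributeNode = dom_import_simplexml($attr);

var_dump($sameClass, $elementText, $attributeText);
echo get_class($elementNode), ' / ', get_class($attributeNode), PHP_EOL;
```

In DOM the two nodes have different classes, so a method that only makes sense on an element is simply not available on the attribute.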

But that's just my 2c. Make up your own mind :)


On a side note, there are not just two parsers in PHP, but a couple more. SimpleXml and DOM are just the two that parse a document into a tree structure. The others are either pull-based or event-based parsers/readers/writers.

Also see my answer to

Solution 2 - Php

I'm going to give the shortest answer possible so that beginners can take it away easily. I'm also slightly simplifying things for brevity's sake. Jump to the end of this answer for the overstated TL;DR version.


DOM and SimpleXML aren't actually two different parsers. The real parser is libxml2, which is used internally by DOM and SimpleXML. So DOM/SimpleXML are just two ways to use the same parser and they provide ways to convert one object to another.

SimpleXML is intended to be very simple, so it has a small set of functions and is focused on reading and writing data. That is, you can easily read or write an XML file, you can update some values or remove some nodes (with some limitations!), and that's it. No fancy manipulation, and you don't have access to the less common node types. For instance, SimpleXML cannot create a CDATA section, although it can read them.

DOM offers a full-fledged implementation of the DOM plus a couple of non-standard methods such as appendXML. If you're used to manipulating the DOM in JavaScript, you'll find the same methods in PHP's DOM. There's basically no limit to what you can do, and it even handles HTML. The flip side of this richness of features is that it is more complex and more verbose than SimpleXML.
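
As an illustration of the non-standard appendXML method, here is a minimal sketch (the document content is made up) that appends raw XML through a DOMDocumentFragment:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<list><item>one</item></list>');

// appendXML() lives on DOMDocumentFragment: parse raw XML into the
// fragment, then attach the fragment wherever it is needed.
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<item>two</item><item>three</item>');
$doc->documentElement->appendChild($fragment);

$result = $doc->saveXML($doc->documentElement);
echo $result, PHP_EOL;
// <list><item>one</item><item>two</item><item>three</item></list>
```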


Side-note

People often wonder/ask what extension they should use to handle their XML or HTML content. Actually the choice is easy because there isn't much of a choice to begin with:

  • if you need to deal with HTML, you don't really have a choice: you have to use DOM
  • if you have to do anything fancy such as moving nodes or appending some raw XML, again you pretty much have to use DOM
  • if all you need to do is read and/or write some basic XML (e.g. exchanging data with an XML service or reading an RSS feed) then you can use either. Or both.
  • if your XML document is so big that it doesn't fit in memory, you can't use either and you have to use XMLReader, which is also based on libxml2. It is even more annoying to use, but it still plays nice with the others

TL;DR

  • SimpleXML is super easy to use but only good for 90% of use cases.
  • DOM is more complex, but can do everything.
  • XMLReader is super complicated, but uses very little memory. Very situational.
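
For the XMLReader case, a rough sketch under stated assumptions: the feed below is a tiny inlined string, whereas a genuinely large file would be opened with open() instead. Only one entry is expanded into a tree at a time, and each expanded node is handed to SimpleXML.

```php
<?php
// Hypothetical feed, inlined for illustration only.
$xml = '<feed><entry id="1">first</entry><entry id="2">second</entry></feed>';

$reader = new XMLReader();
$reader->XML($xml);

$doc   = new DOMDocument(); // owner document for the expanded nodes
$ids   = [];
$texts = [];

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'entry') {
        // Cheap, streaming access to the current node.
        $ids[] = $reader->getAttribute('id');

        // Expand only this element into DOM, then wrap it with SimpleXML.
        $entry   = simplexml_import_dom($doc->importNode($reader->expand(), true));
        $texts[] = (string) $entry;
    }
}
$reader->close();

print_r($ids);
print_r($texts);
```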

Solution 3 - Php

As others have pointed out, the DOM and SimpleXML extensions are not strictly "XML parsers"; rather, they are different interfaces to the structure generated by the underlying libxml2 parser.

The SimpleXML interface treats XML as a serialized data structure, in the same way you would treat a decoded JSON string. So it provides quick access to the contents of a document, with emphasis on accessing elements by name, and reading their attributes and text content (including automatically folding in entities and CDATA sections). It supports documents containing multiple namespaces (primarily using the children() and attributes() methods), and can search a document using an XPath expression. It also includes support for basic manipulation of the content - e.g. adding or overwriting elements or attributes with a new string.
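
A small sketch of these SimpleXML features, using a made-up feed with one namespaced element:

```php
<?php
$xml = <<<XML
<feed xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item lang="en">
    <title>Hello &amp; welcome</title>
    <dc:creator><![CDATA[A. Writer]]></dc:creator>
  </item>
</feed>
XML;

$feed = simplexml_load_string($xml);
$item = $feed->item;

$title = (string) $item->title;  // entity folded into the text
$lang  = (string) $item['lang']; // attribute via array access

// Namespaced children are reached through children() with the namespace URI.
$creator = (string) $item->children('http://purl.org/dc/elements/1.1/')->creator;

// XPath search returns an array of SimpleXMLElement objects.
$found = $feed->xpath('//item[@lang="en"]/title');

// Basic manipulation: overwrite text and add an attribute with new strings.
$item->title = 'Updated';
$item->addAttribute('id', '42');

echo "$title | $lang | $creator | ", count($found), PHP_EOL;
```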

The DOM interface, on the other hand, treats XML as a structured document, where the representation used is as important as the data represented. It therefore provides much more granular and explicit access to different types of "node", such as entities and CDATA sections, as well as some which are ignored by SimpleXML, such as comments and processing instructions. It also provides a much richer set of manipulation functions, allowing you to rearrange nodes and choose how to represent text content, for instance. The tradeoff is a fairly complex API, with a large number of classes and methods; since it implements a standard API (originally developed for manipulating HTML in JavaScript), there may be less of a "natural PHP" feel, but some programmers may be familiar with it from other contexts.
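
To illustrate the node types involved, here is a hedged sketch with a made-up document containing a processing instruction and a comment. DOM exposes each as its own class and lets you reorder them, while SimpleXML counts only the element.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<doc><?render mode="fast"?><!-- a note --><p>text</p></doc>');
$root = $doc->documentElement;

// Each child is an instance of a class matching its node type.
$classes = [];
foreach ($root->childNodes as $node) {
    $classes[] = get_class($node);
}

// SimpleXML, wrapping the same tree, only sees the <p> element.
$visibleToSimpleXml = count(simplexml_import_dom($doc)->children());

// Rearranging nodes: move <p> in front of the processing instruction.
$p = $root->getElementsByTagName('p')->item(0);
$root->insertBefore($p, $root->firstChild);
$rearranged = $doc->saveXML($root);

print_r($classes);
echo $visibleToSimpleXml, PHP_EOL, $rearranged, PHP_EOL;
```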

Both interfaces require the full document to be parsed into memory, and effectively wrap up pointers into that parsed representation; you can even switch between the two wrappers with simplexml_import_dom() and dom_import_simplexml(), for instance to add a "missing" feature to SimpleXML using a function from the DOM API. For larger documents, the "pull-based" XMLReader or the "event-based" XML Parser may be more appropriate.
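
Switching wrappers to add a "missing" feature might look like this. It is a sketch with a made-up document, which borrows createCDATASection() from DOM because SimpleXML has no way to create a CDATA section itself.

```php
<?php
$sx = simplexml_load_string('<note><body/></note>');

// Get the DOM wrapper for the same underlying <body> node.
$body = dom_import_simplexml($sx->body);
$body->appendChild(
    $body->ownerDocument->createCDATASection('<b>raw</b> & unescaped')
);

// The SimpleXML object reflects the change, since both wrap one tree.
$bodyText = (string) $sx->body;
$output   = $sx->asXML();

echo $output;
```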

Solution 4 - Php

Which DOMNodes can be represented by SimpleXMLElement?

The biggest difference between the two libraries is that SimpleXML is mainly a single class: SimpleXMLElement. In contrast, the DOM extension has many classes, most of them a subtype of DOMNode.

So one core question when comparing those two libraries is which of the many classes DOM offers can be represented by a SimpleXMLElement in the end?

The following is a comparison table containing those DOMNode types that are actually useful as far as dealing with XML is concerned (useful node types). Your mileage may vary, for example when you need to deal with DTDs:

+-------------------------+----+--------------------------+-----------+
| LIBXML Constant         |  # | DOMNode Classname        | SimpleXML |
+-------------------------+----+--------------------------+-----------+
| XML_ELEMENT_NODE        |  1 | DOMElement               |    yes    |
| XML_ATTRIBUTE_NODE      |  2 | DOMAttr                  |    yes    |
| XML_TEXT_NODE           |  3 | DOMText                  |  no [1]   |
| XML_CDATA_SECTION_NODE  |  4 | DOMCdataSection          |  no [2]   |
| XML_PI_NODE             |  7 | DOMProcessingInstruction |    no     |
| XML_COMMENT_NODE        |  8 | DOMComment               |    no     |
| XML_DOCUMENT_NODE       |  9 | DOMDocument              |    no     |
| XML_DOCUMENT_FRAG_NODE  | 11 | DOMDocumentFragment      |    no     |
+-------------------------+----+--------------------------+-----------+

As this table shows, SimpleXML has a really limited interface compared to DOM. Beyond the node types in the table, SimpleXMLElement also abstracts access to children and attribute lists, and it provides traversal via element names (property access) and attributes (array access). It is also a Traversable that iterates its "own" children (elements or attributes), and it offers namespaced access via the children() and attributes() methods.

All this magic interface is fine as far as it goes; however, it cannot be changed by extending SimpleXMLElement, so it is as limited as it is magic.

To find out which nodetype a SimpleXMLElement object represents, please see:

DOM here follows the DOM Core Level 1 specs. You can do nearly every imaginable kind of XML handling with that interface. However, it's only Level 1, so compared with newer DOM Levels such as 3, it's somewhat limited for some of the cooler stuff. SimpleXML certainly loses out here as well.

SimpleXMLElement allows casting to subtypes. This is very special in PHP. DOM allows this as well, although it's a little more work and a more specific node type needs to be chosen.
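
One way to read "casting to subtypes" is sketched below. This is an interpretation, and both subclasses are hypothetical. SimpleXML takes a class name when loading; DOM needs a subclass registered for one specific node class via registerNodeClass().

```php
<?php
// Hypothetical subclasses, for illustration only.
class Book extends SimpleXMLElement
{
    public function label(): string
    {
        return $this->title . ' (' . $this['year'] . ')';
    }
}

class BookElement extends DOMElement
{
    public function year(): string
    {
        return $this->getAttribute('year');
    }
}

$xml = '<book year="2010"><title>B</title></book>';

// SimpleXML: pass the subclass name when loading.
$book  = simplexml_load_string($xml, Book::class);
$label = $book->label();

// DOM: register the subclass for DOMElement before nodes are fetched.
$doc = new DOMDocument();
$doc->registerNodeClass(DOMElement::class, BookElement::class);
$doc->loadXML($xml);
$year = $doc->documentElement->year();

echo $label, ' / ', $year, PHP_EOL; // B (2010) / 2010
```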

XPath 1.0 is supported by both; the result in SimpleXML is an array of SimpleXMLElements, in DOM a DOMNodeList.
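
The difference in result types can be seen by running the same query through both extensions. This is a minimal sketch with a made-up document:

```php
<?php
$xml   = '<lib><book year="1999">A</book><book year="2010">B</book></lib>';
$query = '//book[@year > 2000]';

// SimpleXML: xpath() returns a plain PHP array of SimpleXMLElement objects.
$sxResult = simplexml_load_string($xml)->xpath($query);

// DOM: DOMXPath::query() returns a DOMNodeList.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domResult = (new DOMXPath($doc))->query($query);

echo (string) $sxResult[0], ' ', $domResult->item(0)->textContent, PHP_EOL; // B B
```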

SimpleXMLElement supports casting to string and array (json), the DOMNode classes in DOM do not. They offer casting to array, but only like any other object does (public properties as keys/values).
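
A brief sketch of those casts, using a made-up document. On the DOM side it reads text through the textContent property rather than a cast.

```php
<?php
$sx = simplexml_load_string('<user id="7"><name>Ann</name><role>admin</role></user>');

// String cast yields the element's text content.
$name = (string) $sx->name;

// JSON round trip turns the element into a nested array;
// attributes end up under the "@attributes" key.
$viaJson = json_decode(json_encode($sx), true);

// DOM nodes expose their text through a property instead of a cast.
$domText = dom_import_simplexml($sx)->textContent;

echo $name, ' ', $viaJson['role'], ' ', $viaJson['@attributes']['id'], ' ', $domText, PHP_EOL;
```

The JSON route is convenient for flat data, but it drops information such as mixed content and node order, so treat it as a shortcut rather than a faithful conversion.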

Common usage patterns of those two extensions in PHP are:
  • You normally start out using SimpleXMLElement, while your knowledge of both XML and XPath is still fairly basic.
  • After fighting with the magic of its interfaces, a certain level of frustration is reached sooner or later.
  • You discover that you can import SimpleXMLElements into DOM and vice-versa. You learn more about DOM and how to use the extension to do stuff you were not able (or not able to find out how) to do with SimpleXMLElement.
  • You notice that you can load HTML documents with the DOM extension. And invalid XML. And do output formatting. Things SimpleXMLElement just can't do. Not even with the dirty tricks.
  • You probably even switch to the DOM extension fully, because at least you know that the interface is more differentiated and allows you to do stuff. You also see a benefit in learning DOM Level 1, because you can use it in JavaScript and other languages as well (a huge benefit of the DOM extension for many).

You can have fun with both extensions and I think you should know both. The more the better. All the libxml-based extensions in PHP are very good and powerful. And on Stack Overflow, under the php tag, there is a good tradition of covering these libraries well and in detail.

Solution 5 - Php

SimpleXML is, as the name states, a simple parser for XML content, and nothing else. You cannot use it to parse, say, standard HTML content. It's easy and quick, and therefore a great tool for creating simple applications.

The DOM extension, on the other hand, is much more powerful. It lets you parse almost any markup document, including HTML, XHTML and XML. It lets you open, write and even correct the output code, supports XPath and allows more manipulation overall. Its usage is more complicated because the library is quite complex, but that also makes it a good fit for bigger projects where heavy data manipulation is needed.
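
The HTML point can be illustrated with a deliberately broken snippet. This is only a sketch; the markup is made up, and libxml errors are collected internally so the parser warnings do not clutter the output.

```php
<?php
// Deliberately malformed markup: unclosed <p> and <b> tags.
$html = '<html><body><p>Unclosed paragraph<p>Another <b>bold</body></html>';

$previous = libxml_use_internal_errors(true);

// DOM's HTML parser repairs the markup into a usable tree.
$doc = new DOMDocument();
$doc->loadHTML($html);
$paragraphs = $doc->getElementsByTagName('p');

// SimpleXML only accepts well-formed XML, so the same input fails.
$sx = simplexml_load_string($html);

libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $paragraphs->length, ' paragraphs; SimpleXML loaded: ';
var_dump($sx !== false);
```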

Hope that answers your question :)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

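+------------------+-----------------+-----------------------------------+
| Content Type     | Original Author | Original Content on Stackoverflow |
+------------------+-----------------+-----------------------------------+
| Question         | Stann           | View Question on Stackoverflow    |
| Solution 1 - Php | Gordon          | View Answer on Stackoverflow      |
| Solution 2 - Php | Josh Davis      | View Answer on Stackoverflow      |
| Solution 3 - Php | IMSoP           | View Answer on Stackoverflow      |
| Solution 4 - Php | hakre           | View Answer on Stackoverflow      |
| Solution 5 - Php | usoban          | View Answer on Stackoverflow      |
+------------------+-----------------+-----------------------------------+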