How do I get Python's ElementTree to pretty print to an XML file?

PythonXmlPython 2.6ElementtreePretty Print

Python Problem Overview


Background

I am using SQLite to access a database and retrieve the desired information. I'm using ElementTree in Python version 2.6 to create an XML file with that information.

Code

import sqlite3
import xml.etree.ElementTree as ET

# NOTE: Omitted code where I acccess the database,
# pull data, and add elements to the tree

tree = ET.ElementTree(root)

# Pretty printing to Python shell for testing purposes
from xml.dom import minidom
print minidom.parseString(ET.tostring(root)).toprettyxml(indent = "   ")

#######  Here lies my problem  #######
tree.write("New_Database.xml")

Attempts

I've tried using tree.write("New_Database.xml", "utf-8") in place of the last line of code above, but it did not edit the XML's layout at all - it's still a jumbled mess.

I also decided to fiddle around and tried doing:
tree = minidom.parseString(ET.tostring(root)).toprettyxml(indent = " ")
instead of printing this to the Python shell, which gives the error AttributeError: 'unicode' object has no attribute 'write'.

Questions

When I write my tree to an XML file on the last line, is there a way to pretty print to the XML file as it does to the Python shell?

Can I use toprettyxml() here or is there a different way to do this?

Python Solutions


Solution 1 - Python

Whatever your XML string is, you can write it to the file of your choice by opening a file for writing and writing the string to the file.

from xml.dom import minidom

xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml(indent="   ")
with open("New_Database.xml", "w") as f:
    f.write(xmlstr)
    

There is one possible complication, especially in Python 2, which is both less strict and less sophisticated about Unicode characters in strings. If your toprettyxml method hands back a Unicode string (u"something"), then you may want to cast it to a suitable file encoding, such as UTF-8. E.g. replace the one write line with:

f.write(xmlstr.encode('utf-8'))

Solution 2 - Python

I simply solved it with the indent() function:

> xml.etree.ElementTree.indent(tree, space=" ", level=0) Appends > whitespace to the subtree to indent the tree visually. This can be > used to generate pretty-printed XML output. tree can be an Element or > ElementTree. space is the whitespace string that will be inserted for > each indentation level, two space characters by default. For indenting > partial subtrees inside of an already indented tree, pass the initial > indentation level as level.

tree = ET.ElementTree(root)
ET.indent(tree, space="\t", level=0)
tree.write(file_name, encoding="utf-8")

Note, the indent() function was added in Python 3.9.

Solution 3 - Python

I found a way using straight ElementTree, but it is rather complex.

ElementTree has functions that edit the text and tail of elements, for example, element.text="text" and element.tail="tail". You have to use these in a specific way to get things to line up, so make sure you know your escape characters.

As a basic example:

I have the following file:

<?xml version='1.0' encoding='utf-8'?>
<root>
    <data version="1">
        <data>76939</data>
    </data>
    <data version="2">
        <data>266720</data>
        <newdata>3569</newdata>
    </data>
</root>

To place a third element in and keep it pretty, you need the following code:

addElement = ET.Element("data")             # Make a new element
addElement.set("version", "3")              # Set the element's attribute
addElement.tail = "\n"                      # Edit the element's tail
addElement.text = "\n\t\t"                  # Edit the element's text
newData = ET.SubElement(addElement, "data") # Make a subelement and attach it to our element
newData.tail = "\n\t"                       # Edit the subelement's tail
newData.text = "5431"                       # Edit the subelement's text
root[-1].tail = "\n\t"                      # Edit the previous element's tail, so that our new element is properly placed
root.append(addElement)                     # Add the element to the tree.

To indent the internal tags (like the internal data tag), you have to add it to the text of the parent element. If you want to indent anything after an element (usually after subelements), you put it in the tail.

This code give the following result when you write it to a file:

<?xml version='1.0' encoding='utf-8'?>
<root>
    <data version="1">
        <data>76939</data>
    </data>
    <data version="2">
        <data>266720</data>
        <newdata>3569</newdata>
    </data> <!--root[-1].tail-->
    <data version="3"> <!--addElement's text-->
	    <data>5431</data> <!--newData's tail-->
    </data> <!--addElement's tail-->
</root>

As another note, if you wish to make the program uniformally use \t, you may want to parse the file as a string first, and replace all of the spaces for indentations with \t.

This code was made in Python3.7, but still works in Python2.7.

Solution 4 - Python

Install bs4

pip install bs4

Use this code to pretty print:

from bs4 import BeautifulSoup

x = your xml

print(BeautifulSoup(x, "xml").prettify())

Solution 5 - Python

If one wants to use lxml, it could be done in the following way:

from lxml import etree

xml_object = etree.tostring(root,
                            pretty_print=True,
                            xml_declaration=True,
                            encoding='UTF-8')

with open("xmlfile.xml", "wb") as writter:
    writter.write(xml_object)`

If you see xml namespaces e.g. py:pytype="TREE", one might want to add before the creation of xml_object

etree.cleanup_namespaces(root) 

This should be sufficient for any adaptation in your code.

Solution 6 - Python

Riffing on Ben Anderson answer as a function.

def _pretty_print(current, parent=None, index=-1, depth=0):
    for i, node in enumerate(current):
        _pretty_print(node, current, i, depth + 1)
    if parent is not None:
        if index == 0:
            parent.text = '\n' + ('\t' * depth)
        else:
            parent[index - 1].tail = '\n' + ('\t' * depth)
        if index == len(parent) - 1:
            current.tail = '\n' + ('\t' * (depth - 1))

So running the test on unpretty data:

import xml.etree.ElementTree as ET
root = ET.fromstring('''<?xml version='1.0' encoding='utf-8'?>
<root>
    <data version="1"><data>76939</data>
</data><data version="2">
        <data>266720</data><newdata>3569</newdata>
    </data> <!--root[-1].tail-->
    <data version="3"> <!--addElement's text-->
<data>5431</data> <!--newData's tail-->
    </data> <!--addElement's tail-->
</root>
''')
_pretty_print(root)

tree = ET.ElementTree(root)
tree.write("pretty.xml")
with open("pretty.xml", 'r') as f:
    print(f.read())

We get:

<root>
	<data version="1">
		<data>76939</data>
	</data>
	<data version="2">
		<data>266720</data>
		<newdata>3569</newdata>
	</data>
	<data version="3">
		<data>5431</data>
	</data>
</root>

Solution 7 - Python

One liner(*) to read, parse (once) and pretty print XML from file named fname:

from xml.dom import minidom
print(minidom.parseString(open(fname).read()).toprettyxml(indent="  "))

(* not counting import)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionKimblueyView Question on Stackoverflow
Solution 1 - PythonJonathan EuniceView Answer on Stackoverflow
Solution 2 - PythonRafal.PyView Answer on Stackoverflow
Solution 3 - PythonBen AndersonView Answer on Stackoverflow
Solution 4 - PythonRJXView Answer on Stackoverflow
Solution 5 - PythonNickView Answer on Stackoverflow
Solution 6 - PythonTatarizeView Answer on Stackoverflow
Solution 7 - PythonqneillView Answer on Stackoverflow