Pretty-printing output from javax.xml.transform.Transformer with only standard java api (Indentation and Doctype positioning)

Java Problem Overview


Using the following simple code:

package test;

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

public class TestOutputKeys {
	public static void main(String[] args) throws TransformerException {

		// Instantiate transformer input
		Source xmlInput = new StreamSource(new StringReader(
				"<!-- Document comment --><aaa><bbb/><ccc/></aaa>"));
		StreamResult xmlOutput = new StreamResult(new StringWriter());

		// Configure transformer
		Transformer transformer = TransformerFactory.newInstance()
				.newTransformer(); // An identity transformer
		transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "testing.dtd");
		transformer.setOutputProperty(OutputKeys.INDENT, "yes");
		transformer.transform(xmlInput, xmlOutput);

		System.out.println(xmlOutput.getWriter().toString());
	}

}

I get the output:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Document comment --><!DOCTYPE aaa SYSTEM "testing.dtd">

<aaa>
<bbb/>
<ccc/>
</aaa>

Question A: The doctype tag appears after the document comment. Is it possible to make it appear before the document comment?

Question B: How do I achieve indentation using only the Java SE 5.0 API? This question is essentially identical to "How to pretty-print xml from java" (https://stackoverflow.com/questions/139076/how-to-pretty-print-xml-from-java); however, almost all answers to that question depend on external libraries. The only applicable answer (posted by a user named Lorenzo Boccaccia), which uses only Java's API, is basically equal to the code posted above, but it does not work for me (as shown in the output, I get no indentation).

I am guessing that you have to set the number of spaces to use for indentation, as many of the answers with external libraries do, but I just cannot find where to specify that in the Java API. Given that the Java API lets you set an indentation property to "yes", it must be possible to perform indentation somehow. I just can't figure out how.

Java Solutions


Solution 1 - Java

The missing part is the indent amount. You can enable indentation and set the indent amount as follows:

transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.transform(xmlInput, xmlOutput);
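
For reference, here is a minimal, self-contained sketch of the same idea. The class name IndentAmountExample and the helper prettyPrint are made up for illustration. Note that the indent-amount key is not part of OutputKeys; it is an implementation-specific property in a Xalan namespace, which the transformer bundled with the JDK is assumed to recognize. According to the Transformer Javadoc, a namespace-qualified key that an implementation does not support is ignored rather than rejected, so on other implementations this may simply have no effect.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IndentAmountExample {

    // Implementation-specific key (Xalan namespace); assumed to be honoured
    // by the JDK's built-in transformer. It is not a constant in OutputKeys.
    private static final String INDENT_AMOUNT =
            "{http://xml.apache.org/xslt}indent-amount";

    /** Runs an identity transform with indentation switched on. */
    public static String prettyPrint(String xml, int spaces) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(INDENT_AMOUNT, String.valueOf(spaces));

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(prettyPrint("<aaa><bbb/><ccc/></aaa>", 2));
    }
}
```

If the input already contains whitespace between elements, the result can differ between JDK versions, so it may be worth testing with your own documents.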

Solution 2 - Java

A little util class as an example...

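import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;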

public class XmlUtil {

public static Document file2Document(File file) throws Exception {
	if (file == null || !file.exists()) {
		// The conditional must be parenthesized: '+' binds tighter than '==' and '?:'
		throw new IllegalArgumentException("File must exist! ["
		        + (file == null ? "NULL" : "Could not be found: " + file.getAbsolutePath()) + "]");
	}
	DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
	dbFactory.setNamespaceAware(true);
	return dbFactory.newDocumentBuilder().parse(new FileInputStream(file));
}

public static Document string2Document(String xml) throws Exception {
	InputSource src = new InputSource(new StringReader(xml));
	DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
	dbFactory.setNamespaceAware(true);
	return dbFactory.newDocumentBuilder().parse(src);
}

public static OutputFormat getPrettyPrintFormat() {
	OutputFormat format = new OutputFormat();
	format.setLineWidth(120);
	format.setIndenting(true);
	format.setIndent(2);
	format.setEncoding("UTF-8");
	return format;
}

public static String document2String(Document doc, OutputFormat format) throws Exception {
	StringWriter stringOut = new StringWriter();
	XMLSerializer serial = new XMLSerializer(stringOut, format);
	serial.serialize(doc);
	return stringOut.toString();
}

public static String document2String(Document doc) throws Exception {
	return XmlUtil.document2String(doc, XmlUtil.getPrettyPrintFormat());
}

public static void document2File(Document doc, File file) throws Exception {
	// Delegate to the overload that actually writes to the file
	XmlUtil.document2File(doc, file, XmlUtil.getPrettyPrintFormat());
}

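public static void document2File(Document doc, File file, OutputFormat format) throws Exception {
	FileOutputStream out = new FileOutputStream(file);
	try {
		XMLSerializer serializer = new XMLSerializer(out, format);
		serializer.serialize(doc);
	} finally {
		// Close the stream even if serialization fails
		out.close();
	}
}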
}

XMLSerializer is provided by xercesImpl from the Apache Software Foundation. Here is the Maven dependency:

<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.11.0</version>
</dependency>

You can find the dependency for your favourite build tool here: http://mvnrepository.com/artifact/xerces/xercesImpl/2.11.0.
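
Since the question asks for a standard-API-only approach, a similar result may be possible without the Xerces dependency by using the DOM Level 3 Load and Save interfaces (org.w3c.dom.ls), which ship with the JDK. The Xerces XMLSerializer class has been marked deprecated since Xerces 2.9.0 in favour of this API. The following is only a sketch: the class name LsPrettyPrinter is made up, "format-pretty-print" is an optional DOM configuration parameter that an implementation is free not to support (hence the canSetParameter check), and the indentation width is chosen by the implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class LsPrettyPrinter {

    /** Parses the XML string and serializes it again with pretty-printing requested. */
    public static String prettyPrint(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Ask the DOM implementation for its Load and Save feature
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();

        // Optional parameter: only set it if this implementation supports it
        if (serializer.getDomConfig().canSetParameter("format-pretty-print", Boolean.TRUE)) {
            serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        }

        // Write through an LSOutput so the declared encoding can be chosen explicitly
        StringWriter writer = new StringWriter();
        LSOutput output = ls.createLSOutput();
        output.setCharacterStream(writer);
        output.setEncoding("UTF-8");
        serializer.write(doc, output);
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint("<aaa><bbb/><ccc/></aaa>"));
    }
}
```

Unlike the OutputFormat used above, this API offers no standard way to choose the indent width or line width, so it is a trade-off between control and avoiding the extra dependency.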

Solution 3 - Java

You could probably prettify everything with an XSLT file. Google throws up a few results, but I can't comment on their correctness.
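
As a rough sketch of what the XSLT route could look like with only the JDK: an identity stylesheet that requests indented output and strips whitespace-only text nodes, so that existing formatting in the input does not get in the way. The class name XsltPrettyPrinter and the inline stylesheet are illustrative assumptions, not taken from any particular search result. With the JDK's built-in transformer, indent="yes" alone is assumed to leave the indent width at zero, so the implementation-specific indent-amount property is still set from Java.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltPrettyPrinter {

    // Identity transform: copies every node and attribute unchanged.
    // xsl:strip-space drops whitespace-only text nodes from the input.
    private static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" indent=\"yes\"/>"
          + "<xsl:strip-space elements=\"*\"/>"
          + "<xsl:template match=\"@*|node()\">"
          + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    public static String prettyPrint(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        // Implementation-specific key (Xalan namespace); assumed to be
        // honoured by the JDK's default transformer.
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(prettyPrint("<aaa><bbb/><ccc/></aaa>"));
    }
}
```

In practice the stylesheet would normally live in its own .xsl file and be loaded with new StreamSource(file); it is inlined here only to keep the example self-contained.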

Solution 4 - Java

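Regarding Question A: as far as well-formedness goes, there is no need to. The only thing with a fixed position is the XML declaration (which looks like a processing instruction): if present, it must be the very first thing in the document. After it, the prolog may contain comments and processing instructions both before and after the doctype declaration, so the output shown in the question is already well-formed. See the XML specification http://www.w3.org/TR/REC-xml/#sec-prolog-dtd for more details.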
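
The prolog rules in the specification section linked above can be checked with the parser that ships with the JDK. The sketch below uses a hypothetical helper, isWellFormed, and an internal DTD subset instead of testing.dtd so that no external file needs to be resolved. It parses three orderings: comment before the doctype, doctype before the comment, and a comment placed before the XML declaration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PrologOrderCheck {

    /** Returns true if the parser accepts the text as well-formed XML. */
    public static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors, i.e. well-formedness violations
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\"?>";
        String comment = "<!-- Document comment -->";
        String doctype = "<!DOCTYPE aaa [<!ELEMENT aaa EMPTY>]>";
        String root = "<aaa/>";

        // Comment before the doctype (the order the transformer produced)
        System.out.println(isWellFormed(decl + comment + doctype + root)); // true
        // Doctype before the comment (the order asked for in Question A)
        System.out.println(isWellFormed(decl + doctype + comment + root)); // true
        // Anything before the XML declaration is rejected
        System.out.println(isWellFormed(comment + decl + doctype + root)); // false
    }
}
```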

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author   Original Content on Stackoverflow
Question            Alderath          View Question on Stackoverflow
Solution 1 - Java   Rich Seller       View Answer on Stackoverflow
Solution 2 - Java   Rob               View Answer on Stackoverflow
Solution 3 - Java   McDowell          View Answer on Stackoverflow
Solution 4 - Java   Oskar             View Answer on Stackoverflow