Best way to compare 2 XML documents in Java

Tags: Java, XML, Testing, Parsing, Comparison

Java Problem Overview


I'm trying to write an automated test of an application that basically translates a custom message format into an XML message and sends it out the other end. I've got a good set of input/output message pairs so all I need to do is send the input messages in and listen for the XML message to come out the other end.

When it comes time to compare the actual output to the expected output, I'm running into some problems. My first thought was just to do string comparisons on the expected and actual messages. This doesn't work very well because the example data we have isn't always formatted consistently, and different aliases are often used for the XML namespace (and sometimes namespaces aren't used at all).

I know I can parse both strings and then walk through each element and compare them myself and this wouldn't be too difficult to do, but I get the feeling there's a better way or a library I could leverage.

So, boiled down, the question is:

Given two Java Strings which both contain valid XML how would you go about determining if they are semantically equivalent? Bonus points if you have a way to determine what the differences are.

Java Solutions


Solution 1 - Java

Sounds like a job for XMLUnit

Example:

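import org.custommonkey.xmlunit.XMLTestCase;
import org.custommonkey.xmlunit.XMLUnit;

public class SomeTest extends XMLTestCase {
  // XMLTestCase is a JUnit 3 style TestCase: test methods are found by their "test" name prefix
  public void test() throws Exception {
    String xml1 = ...
    String xml2 = ...

    XMLUnit.setIgnoreWhitespace(true); // ignore whitespace differences

    // can also compare xml Documents, InputSources, Readers, Diffs
    assertXMLEqual(xml1, xml2);  // assertXMLEqual comes from XMLTestCase
  }
}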

Solution 2 - Java

The following will check if the documents are equal using standard JDK libraries.

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();

Document doc1 = db.parse(new File("file1.xml"));
doc1.normalizeDocument();

Document doc2 = db.parse(new File("file2.xml"));
doc2.normalizeDocument();

Assert.assertTrue(doc1.isEqualNode(doc2));

normalizeDocument() puts each document into a normalized form (for example, adjacent text nodes are merged), so that both trees are compared on the same footing.

The above code still requires the whitespace inside elements to match, because the parser preserves it and isEqualNode() compares it. The standard XML parser that comes with Java does not let you set a feature to produce a canonical version or to honor xml:space. If that is going to be a problem, you may need a replacement XML parser such as Xerces, or use JDOM.
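To see which differences this approach tolerates, here is a small self-contained sketch that uses the same factory settings on in-memory strings. The class and method names (DomEquality, parse, sameDom) are made up for illustration. One thing worth knowing: setIgnoringElementContentWhitespace(true) only takes effect when the parser validates against a DTD, which is why whitespace still counts here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
final class DomEquality {

    // Parses an XML string with the same factory settings as the snippet above.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setCoalescing(true);
        dbf.setIgnoringElementContentWhitespace(true); // no effect without DTD validation
        dbf.setIgnoringComments(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        doc.normalizeDocument();
        return doc;
    }

    // True when the two parsed DOM trees are equal according to isEqualNode().
    static boolean sameDom(String xml1, String xml2) throws Exception {
        return parse(xml1).isEqualNode(parse(xml2));
    }

    public static void main(String[] args) throws Exception {
        // Attribute order is not significant in the DOM: prints true
        System.out.println(sameDom("<a x=\"1\" y=\"2\"><b/></a>", "<a y=\"2\" x=\"1\"><b/></a>"));
        // The extra whitespace text node makes the trees differ: prints false
        System.out.println(sameDom("<a><b/></a>", "<a> <b/></a>"));
    }
}
```

Note that isEqualNode() also compares namespace prefixes, so two documents that use different aliases for the same namespace are reported as different.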

Solution 3 - Java

XOM has a Canonicalizer utility which turns your documents into a canonical form, which you can then stringify and compare. So regardless of whitespace irregularities or attribute ordering, you can get regular, predictable comparisons of your documents.

This works especially well in IDEs that have dedicated visual String comparators, like Eclipse. You get a visual representation of the semantic differences between the documents.

Solution 4 - Java

The latest version of XMLUnit can do the job of asserting that two XML documents are equal. Depending on your case, XMLUnit.setIgnoreWhitespace() and XMLUnit.setIgnoreAttributeOrder() may also be necessary.

See the working code of a simple XMLUnit example below.

import java.util.List;

import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.XMLUnit;
import org.junit.Assert;

public class TestXml {
	
	public static void main(String[] args) throws Exception {
		String result = "<abc             attr=\"value1\"                title=\"something\">            </abc>";
		// will be ok
		assertXMLEquals("<abc attr=\"value1\" title=\"something\"></abc>", result);
	}
	
	public static void assertXMLEquals(String expectedXML, String actualXML) throws Exception {
		XMLUnit.setIgnoreWhitespace(true);
		XMLUnit.setIgnoreAttributeOrder(true);

		DetailedDiff diff = new DetailedDiff(XMLUnit.compareXML(expectedXML, actualXML));
		
		List<?> allDifferences = diff.getAllDifferences();
		Assert.assertEquals("Differences found: "+ diff.toString(), 0, allDifferences.size());
	}

}

If using Maven, add this to your pom.xml:

<dependency>
	<groupId>xmlunit</groupId>
	<artifactId>xmlunit</artifactId>
	<version>1.4</version>
</dependency>

Solution 5 - Java

Building on Tom's answer, here's an example using XMLUnit v2.

It uses these Maven dependencies:

	<dependency>
		<groupId>org.xmlunit</groupId>
		<artifactId>xmlunit-core</artifactId>
		<version>2.0.0</version>
		<scope>test</scope>
	</dependency>
	<dependency>
		<groupId>org.xmlunit</groupId>
		<artifactId>xmlunit-matchers</artifactId>
		<version>2.0.0</version>
		<scope>test</scope>
	</dependency>

...and here's the test code:

import static org.junit.Assert.assertThat;
import static org.xmlunit.matchers.CompareMatcher.isIdenticalTo;

import org.junit.Test;
import org.xmlunit.builder.Input;
import org.xmlunit.input.WhitespaceStrippedSource;

// XMLTestCase belongs to XMLUnit 1.x (or the xmlunit-legacy module), so no base class is used here
public class SomeTest {
	@Test
	public void test() {
		String result = "<root></root>";
		String expected = "<root>  </root>";

		// ignore whitespace differences
		// https://github.com/xmlunit/user-guide/wiki/Providing-Input-to-XMLUnit#whitespacestrippedsource
		assertThat(result, isIdenticalTo(new WhitespaceStrippedSource(Input.from(expected).build())));
		
		assertThat(result, isIdenticalTo(Input.from(expected).build())); // will fail due to whitespace differences
	}
}

The documentation that outlines this is https://github.com/xmlunit/xmlunit#comparing-two-documents

Solution 6 - Java

Thanks, I extended this, try this ...

import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class XmlDiff 
{
	private boolean nodeTypeDiff = true;
	private boolean nodeValueDiff = true;
	
	public boolean diff( String xml1, String xml2, List<String> diffs ) throws Exception
	{
		DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
		dbf.setNamespaceAware(true);
		dbf.setCoalescing(true);
		dbf.setIgnoringElementContentWhitespace(true);
		dbf.setIgnoringComments(true);
		DocumentBuilder db = dbf.newDocumentBuilder();

		
		Document doc1 = db.parse(new ByteArrayInputStream(xml1.getBytes()));
		Document doc2 = db.parse(new ByteArrayInputStream(xml2.getBytes()));

		doc1.normalizeDocument();
		doc2.normalizeDocument();

		return diff( doc1, doc2, diffs );

	}
	
	/**
	 * Diff 2 nodes and put the diffs in the list 
	 */
	public boolean diff( Node node1, Node node2, List<String> diffs ) throws Exception
	{
		if( diffNodeExists( node1, node2, diffs ) )
		{
			return true;
		}
		
		if( nodeTypeDiff )
		{
			diffNodeType(node1, node2, diffs );
		}

		if( nodeValueDiff )
		{
			diffNodeValue(node1, node2, diffs );
		}
		
		
		System.out.println(node1.getNodeName() + "/" + node2.getNodeName());
		
		diffAttributes( node1, node2, diffs );
		diffNodes( node1, node2, diffs );
		
		return diffs.size() > 0;
	}
	
	/**
	 * Diff the nodes
	 */
	public boolean diffNodes( Node node1, Node node2, List<String> diffs ) throws Exception
	{
		//Index children by name (insertion order is kept; siblings sharing a name overwrite each other)
		Map<String,Node> children1 = new LinkedHashMap<String,Node>();		
		for( Node child1 = node1.getFirstChild(); child1 != null; child1 = child1.getNextSibling() )
		{
			children1.put( child1.getNodeName(), child1 );
		}

		//Index children by name (insertion order is kept; siblings sharing a name overwrite each other)
		Map<String,Node> children2 = new LinkedHashMap<String,Node>();		
		for( Node child2 = node2.getFirstChild(); child2!= null; child2 = child2.getNextSibling() )
		{
			children2.put( child2.getNodeName(), child2 );
		}
		
		//Diff all the children1
		for( Node child1 : children1.values() )
		{
			Node child2 = children2.remove( child1.getNodeName() );
			diff( child1, child2, diffs );
		}
		
		//Diff all the children2 left over
		for( Node child2 : children2.values() )
		{
			Node child1 = children1.get( child2.getNodeName() );
			diff( child1, child2, diffs );
		}
		
		return diffs.size() > 0;
	}
	
	
	/**
	 * Diff the attributes
	 */
	public boolean diffAttributes( Node node1, Node node2, List<String> diffs ) throws Exception
	{		 
		//Index attributes by name
		NamedNodeMap nodeMap1 = node1.getAttributes();
		Map<String,Node> attributes1 = new LinkedHashMap<String,Node>();		
		for( int index = 0; nodeMap1 != null && index < nodeMap1.getLength(); index++ )
		{
			attributes1.put( nodeMap1.item(index).getNodeName(), nodeMap1.item(index) );
		}

		//Index attributes by name
		NamedNodeMap nodeMap2 = node2.getAttributes();
		Map<String,Node> attributes2 = new LinkedHashMap<String,Node>();		
		for( int index = 0; nodeMap2 != null && index < nodeMap2.getLength(); index++ )
		{
			attributes2.put( nodeMap2.item(index).getNodeName(), nodeMap2.item(index) );

		}
		
		//Diff all the attributes1
		for( Node attribute1 : attributes1.values() )
		{
			Node attribute2 = attributes2.remove( attribute1.getNodeName() );
			diff( attribute1, attribute2, diffs );
		}
		
		//Diff all the attributes2 left over
		for( Node attribute2 : attributes2.values() )
		{
			Node attribute1 = attributes1.get( attribute2.getNodeName() );
			diff( attribute1, attribute2, diffs );
		}
		
		return diffs.size() > 0;
	}
	/**
	 * Check that the nodes exist
	 */
	public boolean diffNodeExists( Node node1, Node node2, List<String> diffs ) throws Exception
	{
		if( node1 == null && node2 == null )
		{
			//Do not call getPath() here: it would throw a NullPointerException on a null node
			diffs.add( "(unknown path):node both nodes are null, nothing to compare" );
			return true;
		}
		
		if( node1 == null && node2 != null )
		{
			diffs.add( getPath(node2) + ":node " + node1 + "!=" + node2.getNodeName() );
			return true;
		}
			
		if( node1 != null && node2 == null )
		{
			diffs.add( getPath(node1) + ":node " + node1.getNodeName() + "!=" + node2 );
			return true;
		}
		
		return false;
	}
	
	/**
	 * Diff the Node Type
	 */
	public boolean diffNodeType( Node node1, Node node2, List<String> diffs ) throws Exception
	{		
		if( node1.getNodeType() != node2.getNodeType() ) 
		{
			diffs.add( getPath(node1) + ":type " + node1.getNodeType() + "!=" + node2.getNodeType() );
			return true;
		}
		
		return false;
	}

	/**
	 * Diff the Node Value
	 */
	public boolean diffNodeValue( Node node1, Node node2, List<String> diffs ) throws Exception
	{		
		if( node1.getNodeValue() == null && node2.getNodeValue() == null )
		{
			return false;
		}
		
		if( node1.getNodeValue() == null && node2.getNodeValue() != null )
		{
			diffs.add( getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
			return true;
		}
		
		if( node1.getNodeValue() != null && node2.getNodeValue() == null )
		{
			diffs.add( getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
			return true;
		}
		
		if( !node1.getNodeValue().equals( node2.getNodeValue() ) )
		{
			diffs.add( getPath(node1) + ":value " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
			return true;
		}
		
		return false;
	}
	

	/**
	 * Get the node path
	 */
	public String getPath( Node node )
	{
		StringBuilder path = new StringBuilder();
		
		do
		{			
			path.insert(0, node.getNodeName() );
			path.insert( 0, "/" );
		}
		while( ( node = node.getParentNode() ) != null );
		
		return path.toString();
	}
}

Solution 7 - Java

AssertJ 1.4+ has specific assertions to compare XML content:

String expectedXml = "<foo />";
String actualXml = "<bar />";
assertThat(actualXml).isXmlEqualTo(expectedXml);

Here is the Documentation

Solution 8 - Java

The code below works for me:

String xml1 = ...
String xml2 = ...
XMLUnit.setIgnoreWhitespace(true);
XMLUnit.setIgnoreAttributeOrder(true);
XMLAssert.assertXMLEqual(xml1, xml2);

Solution 9 - Java

skaffman seems to be giving a good answer.

Another way is probably to format both XML strings with a command-line utility like xmlstarlet (http://xmlstar.sourceforge.net/) and then use any diff utility (or library) to diff the resulting output files. I don't know if this is a good solution when the issues are with namespaces.

Solution 10 - Java

I'm using Altova DiffDog which has options to compare XML files structurally (ignoring string data).

This means that (if checking the 'ignore text' option):

<foo a="xxx" b="xxx">xxx</foo>

and

<foo b="yyy" a="yyy">yyy</foo> 

are equal in the sense that they have structural equality. This is handy if you have example files that differ in data, but not structure!
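DiffDog is a GUI tool, but the idea of structural equality can be roughly illustrated in plain Java. The sketch below is only my reading of "ignore text" (it is not DiffDog's algorithm): two documents match when element names, attribute names and the order of child elements agree, while text and attribute values are ignored. The class name StructuralXml and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of DiffDog or any library.
final class StructuralXml {

    // Compares element names, attribute names and child order; ignores all values.
    static boolean sameStructure(String xml1, String xml2) throws Exception {
        return sameStructure(root(xml1), root(xml2));
    }

    private static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    private static boolean sameStructure(Element e1, Element e2) {
        if (!e1.getNodeName().equals(e2.getNodeName())) return false;
        if (!attributeNames(e1).equals(attributeNames(e2))) return false;
        List<Element> c1 = childElements(e1);
        List<Element> c2 = childElements(e2);
        if (c1.size() != c2.size()) return false;
        for (int i = 0; i < c1.size(); i++) {
            if (!sameStructure(c1.get(i), c2.get(i))) return false;
        }
        return true;
    }

    // Attribute names as a sorted set, so attribute order does not matter.
    private static Set<String> attributeNames(Element e) {
        Set<String> names = new TreeSet<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            names.add(attrs.item(i).getNodeName());
        }
        return names;
    }

    // Child elements only; text, comments and other node types are skipped.
    private static List<Element> childElements(Element e) {
        List<Element> children = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                children.add((Element) n);
            }
        }
        return children;
    }
}
```

With this sketch, sameStructure applied to the two foo examples above returns true, while a document that renames or drops an attribute does not match.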

Solution 11 - Java

I required the same functionality as requested in the main question. As I was not allowed to use any third-party libraries, I created my own solution based on @Archimedes Trajano's solution.

Here is my solution:

import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.junit.Assert;
import org.w3c.dom.Document;

/**
 * Asserts for asserting XML strings.
 */
public final class AssertXml {

    private AssertXml() {
    }

    private static Pattern NAMESPACE_PATTERN = Pattern.compile("xmlns:(ns\\d+)=\"(.*?)\"");

    /**
     * Asserts that two XML are of identical content (namespace aliases are ignored).
     * 
     * @param expectedXml expected XML
     * @param actualXml actual XML
     * @throws Exception thrown if XML parsing fails
     */
    public static void assertEqualXmls(String expectedXml, String actualXml) throws Exception {
        // Find all namespace mappings
        Map<String, String> fullnamespace2newAlias = new HashMap<String, String>();
        generateNewAliasesForNamespacesFromXml(expectedXml, fullnamespace2newAlias);
        generateNewAliasesForNamespacesFromXml(actualXml, fullnamespace2newAlias);

        // normalize namespace aliases according to the mapping collected above
        for (Entry<String, String> entry : fullnamespace2newAlias.entrySet()) {
            String newAlias = entry.getValue();
            String namespace = entry.getKey();
            // Pattern.quote: namespace URIs contain regex metacharacters such as '.'
            Pattern nsReplacePattern = Pattern.compile("xmlns:(ns\\d+)=\"" + Pattern.quote(namespace) + "\"");
            expectedXml = translateNamespaceAliasesToNewAlias(expectedXml, newAlias, nsReplacePattern);
            actualXml = translateNamespaceAliasesToNewAlias(actualXml, newAlias, nsReplacePattern);
        }

        DocumentBuilder db = initDocumentParserFactory();

        Document expectedDocument = db.parse(new ByteArrayInputStream(expectedXml.getBytes(Charset.forName("UTF-8"))));
        expectedDocument.normalizeDocument();

        Document actualDocument = db.parse(new ByteArrayInputStream(actualXml.getBytes(Charset.forName("UTF-8"))));
        actualDocument.normalizeDocument();

        if (!expectedDocument.isEqualNode(actualDocument)) {
            Assert.assertEquals(expectedXml, actualXml); // just to better visualize the differences, e.g. in Eclipse
        }
    }


    private static DocumentBuilder initDocumentParserFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(false);
        dbf.setCoalescing(true);
        dbf.setIgnoringElementContentWhitespace(true);
        dbf.setIgnoringComments(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db;
    }

    private static String translateNamespaceAliasesToNewAlias(String xml, String newAlias, Pattern namespacePattern) {
        Matcher nsMatcherExp = namespacePattern.matcher(xml);
        if (nsMatcherExp.find()) {
            // rewrite both the prefixed names ("ns1:foo") and the declaration ("xmlns:ns1=")
            xml = xml.replaceAll(nsMatcherExp.group(1) + "[:]", newAlias + ":");
            xml = xml.replaceAll(nsMatcherExp.group(1) + "=", newAlias + "=");
        }
        return xml;
    }

    private static void generateNewAliasesForNamespacesFromXml(String xml, Map<String, String> fullnamespace2newAlias) {
        Matcher nsMatcher = NAMESPACE_PATTERN.matcher(xml);
        while (nsMatcher.find()) {
            if (!fullnamespace2newAlias.containsKey(nsMatcher.group(2))) {
                fullnamespace2newAlias.put(nsMatcher.group(2), "nsTr" + (fullnamespace2newAlias.size() + 1));
            }
        }
    }

}

It compares two XML strings and takes care of mismatching namespace aliases by translating them to the same generated aliases in both input strings. Note that only aliases of the form ns1, ns2, ... are recognized, since that is what the regular expressions match.

The namespace translation could be fine-tuned, but for my requirements it does the job.
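If a namespace-aware parser is acceptable, a possible alternative to rewriting aliases with regular expressions is to compare elements by namespace URI and local name, so the prefix never enters the comparison. The following is a minimal JDK-only sketch under that assumption. The class PrefixInsensitiveXml and its methods are hypothetical names. It skips whitespace-only text nodes, and compares other text exactly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: compares two XML strings while ignoring namespace prefixes.
final class PrefixInsensitiveXml {

    static boolean equivalent(String xml1, String xml2) throws Exception {
        return sameNode(root(xml1), root(xml2));
    }

    private static Element root(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);   // needed for getNamespaceURI()/getLocalName()
        dbf.setCoalescing(true);       // CDATA sections become plain text nodes
        dbf.setIgnoringComments(true);
        return dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    private static boolean sameNode(Node n1, Node n2) {
        if (n1.getNodeType() != n2.getNodeType()) return false;
        if (n1.getNodeType() != Node.ELEMENT_NODE) {
            // text and other non-element nodes are compared by value only
            return Objects.equals(n1.getNodeValue(), n2.getNodeValue());
        }
        if (!key(n1).equals(key(n2))) return false;
        if (!attributes(n1).equals(attributes(n2))) return false;
        List<Node> c1 = significantChildren(n1);
        List<Node> c2 = significantChildren(n2);
        if (c1.size() != c2.size()) return false;
        for (int i = 0; i < c1.size(); i++) {
            if (!sameNode(c1.get(i), c2.get(i))) return false;
        }
        return true;
    }

    // "{namespaceURI}localName": identifies a name independently of its prefix.
    private static String key(Node n) {
        return "{" + n.getNamespaceURI() + "}" + n.getLocalName();
    }

    // Attributes keyed by prefix-independent name; xmlns declarations are skipped.
    private static Map<String, String> attributes(Node element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) continue;
            result.put(key(a), a.getNodeValue());
        }
        return result;
    }

    // Child nodes except whitespace-only text (typically indentation).
    private static List<Node> significantChildren(Node parent) {
        List<Node> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty()) continue;
            children.add(n);
        }
        return children;
    }
}
```

Unlike the regex approach, this does not depend on how the aliases are spelled, and it treats a default namespace declaration and a prefixed one as the same thing when they point to the same URI.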

Solution 12 - Java

This will compare full string XMLs (reformatting them on the way). It makes it easy to work with your IDE (IntelliJ, Eclipse), because you can just click and visually see the difference in the XML files.

import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import java.io.IOException;
import java.io.StringReader;

import static org.apache.xml.security.Init.init;
import static org.junit.Assert.assertEquals;

public class XmlUtils {
    static {
        init();
    }

    public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
        Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
        byte canonXmlBytes[] = canon.canonicalize(xml.getBytes());
        return new String(canonXmlBytes);
    }

    public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
        InputSource src = new InputSource(new StringReader(input));
        Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
        Boolean keepDeclaration = input.startsWith("<?xml");
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
        return writer.writeToString(document);
    }

    public static void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
        String canonicalExpected = prettyFormat(toCanonicalXml(expected));
        String canonicalActual = prettyFormat(toCanonicalXml(actual));
        assertEquals(canonicalExpected, canonicalActual);
    }
}

I prefer this to XMLUnit because the client code (test code) is cleaner.

Solution 13 - Java

Using JExamXML with a Java application:

    import com.a7soft.examxml.ExamXML;
    import com.a7soft.examxml.Options;
    
       .................
    
       // Reads two XML files into two strings
       String s1 = readFile("orders1.xml");
       String s2 = readFile("orders.xml");
    
       // Loads options saved in a property file
       Options.loadOptions("options");
    
       // Compares two Strings representing XML entities
       System.out.println( ExamXML.compareXMLString( s1, s2 ) );

Solution 14 - Java

Using XMLUnit 2.x

In the pom.xml

<dependency>
    <groupId>org.xmlunit</groupId>
    <artifactId>xmlunit-assertj3</artifactId>
    <version>2.9.0</version>
</dependency>

Test implementation (using JUnit 5):

import org.junit.jupiter.api.Test;
import org.xmlunit.assertj3.XmlAssert;

public class FooTest {

    @Test
    public void compareXml() {
        //
        String xmlContentA = "<foo></foo>";
        String xmlContentB = "<foo></foo>";
        //
        XmlAssert.assertThat(xmlContentA).and(xmlContentB).areSimilar();
    }
}

Other methods: areIdentical(), areNotIdentical(), areNotSimilar()

More details (configuration of assertThat(~).and(~) and examples) in this documentation page.

XMLUnit also has (among other features) a DifferenceEvaluator to do more precise comparisons.

XMLUnit website

Solution 15 - Java

Since you say "semantically equivalent", I assume you mean that you want to do more than just literally verify that the XML outputs are (string) equal, and that you'd want something like

<foo> some stuff here</foo>

and

<foo>some stuff here</foo>

to read as equivalent. Ultimately it's going to matter how you're defining "semantically equivalent" on whatever object you're reconstituting the message from. Simply build that object from the messages and use a custom equals() to define what you're looking for.
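As a rough sketch of that idea for the foo example: the class below is a hypothetical message type (FooMessage is not from the question) that is rebuilt from the XML and defines equality on the trimmed text content. What counts as "equivalent" lives entirely in fromXml() and equals(), so it should be adapted to the real message.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical domain object reconstituted from a <foo> message.
final class FooMessage {

    private final String text;

    private FooMessage(String text) {
        this.text = text;
    }

    // Builds the object from XML; leading/trailing whitespace is treated as insignificant.
    static FooMessage fromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return new FooMessage(doc.getDocumentElement().getTextContent().trim());
    }

    // Equality is defined on the normalized content, not on the XML string.
    @Override
    public boolean equals(Object other) {
        return other instanceof FooMessage && text.equals(((FooMessage) other).text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }

    @Override
    public String toString() {
        return "FooMessage[" + text + "]";
    }
}
```

In a test, the two strings would then be compared with something like assertEquals(FooMessage.fromXml(expected), FooMessage.fromXml(actual)).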

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Mike Deck | View Question on Stackoverflow
Solution 1 - Java | Tom | View Answer on Stackoverflow
Solution 2 - Java | Archimedes Trajano | View Answer on Stackoverflow
Solution 3 - Java | skaffman | View Answer on Stackoverflow
Solution 4 - Java | acdcjunior | View Answer on Stackoverflow
Solution 5 - Java | Tom Saleeba | View Answer on Stackoverflow
Solution 6 - Java | Javelin | View Answer on Stackoverflow
Solution 7 - Java | Gian Marco | View Answer on Stackoverflow
Solution 8 - Java | arunkumar sambu | View Answer on Stackoverflow
Solution 9 - Java | anjanb | View Answer on Stackoverflow
Solution 10 - Java | Pimin Konstantin Kefaloukos | View Answer on Stackoverflow
Solution 11 - Java | TouDick | View Answer on Stackoverflow
Solution 12 - Java | Wojtek | View Answer on Stackoverflow
Solution 13 - Java | Sree | View Answer on Stackoverflow
Solution 14 - Java | Nicolas Sénave | View Answer on Stackoverflow
Solution 15 - Java | Steve B. | View Answer on Stackoverflow