Anyone have a diff algorithm for rendered HTML?

Javascript, Html, Diff

Javascript Problem Overview


I'm interested in seeing a good diff algorithm, possibly in Javascript, for rendering a side-by-side diff of two HTML pages. The idea would be that the diff would show the differences of the rendered HTML.

To clarify, I want to be able to see the side-by-side diffs as rendered output. So if I delete a paragraph, the side-by-side view would know to space things correctly.


@Josh exactly. Though maybe it would show the deleted text in red or something. The idea is that if I use a WYSIWYG editor for my HTML content, I don't want to have to switch to HTML to do diffs. I want to do it with two WYSIWYG editors side by side, maybe. Or at least display diffs side by side in an end-user-friendly manner.

Javascript Solutions


Solution 1 - Javascript

There's another nice trick you can use to significantly improve the look of a rendered HTML diff. It doesn't fully solve the initial problem, but it makes side-by-side rendered output much easier to read.

When two rendered HTML versions sit side by side, it is very difficult to keep them lined up vertically, and vertical alignment is crucial for comparing side-by-side diffs. To improve it, you can insert invisible HTML elements in each version of the diff at "checkpoints" where the two sides should be vertically aligned. Then you can use a bit of client-side JavaScript to add vertical spacing around each checkpoint until the sides line up.

Explained in a little more detail:

If you want to use this technique, run your diff algorithm and insert a bunch of visibility:hidden <span>s or tiny <div>s wherever your side-by-side versions should match up, according to the diff. Then run JavaScript that finds each checkpoint (and its side-by-side neighbor) and adds vertical spacing to the checkpoint that is higher-up (shallower) on the page. Now your rendered HTML diff will be vertically aligned up to that checkpoint, and you can continue repairing vertical alignment down the rest of your side-by-side page.
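The checkpoint pass described above might look roughly like the sketch below. It assumes the diff step has already emitted the same number of zero-height, block-level `<div class="diff-checkpoint">` markers on each side, in matching order; the class name and both function names are made up for illustration. The arithmetic is kept in a pure function so it can be checked without a browser.

```javascript
// Given the vertical offsets (in px) of matching checkpoints on each side,
// work out how much extra space to insert at each checkpoint so that both
// sides line up. Spacing added earlier shifts every later checkpoint on the
// same side, so a running shift is tracked per side.
function computeSpacers(leftTops, rightTops) {
  const count = Math.min(leftTops.length, rightTops.length);
  const left = [];
  const right = [];
  let shiftLeft = 0;
  let shiftRight = 0;
  for (let i = 0; i < count; i++) {
    const l = leftTops[i] + shiftLeft;
    const r = rightTops[i] + shiftRight;
    const gap = Math.abs(l - r);
    // Pad whichever side is higher up (shallower) on the page.
    left.push(l < r ? gap : 0);
    right.push(r < l ? gap : 0);
    if (l < r) shiftLeft += gap;
    else shiftRight += gap;
  }
  return { left, right };
}

// Browser-only wiring (hypothetical markup): measure every checkpoint
// relative to its column, then grow the checkpoint divs by the computed gaps.
function alignCheckpoints(leftRoot, rightRoot) {
  const offsets = (root) => {
    const base = root.getBoundingClientRect().top;
    return [...root.querySelectorAll(".diff-checkpoint")].map(
      (el) => el.getBoundingClientRect().top - base
    );
  };
  const spacers = computeSpacers(offsets(leftRoot), offsets(rightRoot));
  const apply = (root, heights) => {
    const els = root.querySelectorAll(".diff-checkpoint");
    heights.forEach((h, i) => {
      els[i].style.height = `${h}px`;
    });
  };
  apply(leftRoot, spacers.left);
  apply(rightRoot, spacers.right);
}
```

For example, `computeSpacers([10, 50, 120], [10, 80, 100])` pads the left side by 30px at the second checkpoint and the right side by 50px at the third. Inline `<span>` checkpoints would need a different way of applying the spacing, since a height set on an inline element has no effect on layout.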

Solution 2 - Javascript

Over the weekend I posted a new project on CodePlex that implements an HTML diff algorithm in C#. The original algorithm was written in Ruby. I understand you were looking for a JavaScript implementation, but having one available in C# with source code might help you port the algorithm. Here is the link if you are interested: htmldiff.codeplex.com. You can read more about it here.

UPDATE: This library has been moved to GitHub.

Solution 3 - Javascript

I ended up needing something similar a while back. To get the HTML to line up side by side, you could use two iframes, but you'd then have to tie their scrolling together via JavaScript as you scroll (if you allow scrolling).
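Tying the scrolling together could be sketched as follows. This is a minimal version that assumes both iframes are same-origin, so their `contentWindow` objects can be read and scrolled from the parent page; `syncScroll` is a name made up for this example.

```javascript
// Mirror the scroll position between two window-like objects, for example
// the contentWindow of two same-origin iframes.
function syncScroll(winA, winB) {
  const mirror = (source, target) => () => {
    // Only scroll when the positions differ. The scroll event that the
    // target fires in response then finds both sides equal and stops,
    // which prevents the two handlers from bouncing back and forth.
    if (target.scrollX !== source.scrollX || target.scrollY !== source.scrollY) {
      target.scrollTo(source.scrollX, source.scrollY);
    }
  };
  winA.addEventListener("scroll", mirror(winA, winB));
  winB.addEventListener("scroll", mirror(winB, winA));
}
```

In a page this would be called as `syncScroll(leftFrame.contentWindow, rightFrame.contentWindow)` once both frames have loaded. If one document is shorter than the other, the browser clamps its scroll position and the longer side gets pulled back to match, so this works best when both sides have the same height.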

To see the diff, however, you will more than likely want to use someone else's library. I used DaisyDiff, a Java library, for a similar project where my client was happy with seeing a single HTML rendering of the content with MS Word "track changes"-like markup.

HTH

Solution 4 - Javascript

Consider using the output of links or lynx to render a text-only version of the HTML, and then diff that.
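Once both pages have been dumped to plain text (for instance with `lynx -dump`), the "diff that" step can be any line diff. As a rough illustration only, here is a small longest-common-subsequence line diff; `diffLines` is a name invented for this sketch, and a real project would more likely reach for an existing diff tool or library.

```javascript
// Minimal LCS-based line diff. Returns a list of { op, line } entries where
// op is " " (unchanged), "-" (only in the old text) or "+" (only in the new).
function diffLines(oldText, newText) {
  const a = oldText.split("\n");
  const b = newText.split("\n");
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table from the start, emitting one entry per line.
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push({ op: " ", line: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push({ op: "-", line: a[i++] });
    } else {
      out.push({ op: "+", line: b[j++] });
    }
  }
  while (i < a.length) out.push({ op: "-", line: a[i++] });
  while (j < b.length) out.push({ op: "+", line: b[j++] });
  return out;
}
```

The table is quadratic in the number of lines, which is fine for a page or two of text but not for very large documents.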

Solution 5 - Javascript

What about DaisyDiff (Java and PHP versions available)?

The following features are really nice:

  • Works with badly formed HTML that can be found "in the wild".

  • The diffing is more specialized in HTML than XML tree differs. Changing part of a text node will not cause the entire node to be changed.

  • In addition to the default visual diff, HTML source can be diffed coherently.

  • Provides easy to understand descriptions of the changes.

  • The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.

Solution 6 - Javascript

So, you expect

<font face="Arial">Hi Mom</font>

and

<span style="font-family:Arial;">Hi Mom</span>

to be considered the same?

The output depends very much on the user agent. As Ionut Anghelcovici suggests, make an image. Do one for every browser you care about.

Solution 7 - Javascript

Use the markup mode of Pretty Diff for HTML. It is written entirely in JavaScript.

http://prettydiff.com/

Solution 8 - Javascript

If it is XHTML (which assumes a lot on my part), would the Xml Diff Patch Toolkit help? http://msdn.microsoft.com/en-us/library/aa302294.aspx

Solution 9 - Javascript

For smaller differences you might be able to do a normal text-diff, and then analyse the missing or inserted pieces to see how to resolve it, but for any larger differences you're going to have a very tough time doing this.

For instance, how would you detect, and show, that a left-aligned image (floating left of a paragraph of text) has suddenly become right-aligned?

Solution 10 - Javascript

Using a text differ will break on non-trivial documents. Depending on what you think is intuitive, XML differs will probably generate diffs that aren't very good for text with markup. AFAIK, DaisyDiff is the only library specialized in HTML. It works great for a subset of HTML.

Solution 11 - Javascript

If you were working with Java and XHTML, XMLUnit allows you to compare two XML documents via the org.custommonkey.xmlunit.DetailedDiff class:

> Compares and describes all the differences between two XML documents. The document comparison does not stop once the first unrecoverable difference is found, unlike the Diff class.

Solution 12 - Javascript

I believe a good way to do this is to render the HTML to an image and then use some diff tool that can compare images to spot the differences.
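However the screenshots are captured, the comparison step itself is simple. The sketch below assumes both renders have already been decoded into RGBA byte arrays of identical dimensions (the same layout as `ImageData.data` from a canvas); `countChangedPixels` and its `tolerance` parameter are names made up for this example.

```javascript
// Count the pixels that differ between two RGBA buffers of the same size.
// A pixel counts as changed when any of its four channels differs by more
// than `tolerance`, which helps ignore tiny anti-aliasing differences.
function countChangedPixels(a, b, tolerance = 0) {
  if (a.length !== b.length) {
    throw new Error("images must have the same dimensions");
  }
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    const delta = Math.max(
      Math.abs(a[i] - b[i]),         // red
      Math.abs(a[i + 1] - b[i + 1]), // green
      Math.abs(a[i + 2] - b[i + 2]), // blue
      Math.abs(a[i + 3] - b[i + 3])  // alpha
    );
    if (delta > tolerance) changed++;
  }
  return changed;
}
```

A count of zero means the two renders are visually identical at that tolerance. Recording the coordinates of changed pixels instead of just counting them would let you highlight the affected regions.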

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
--- | --- | ---
Question | Haacked | View Question on Stackoverflow
Solution 1 - Javascript | kamens | View Answer on Stackoverflow
Solution 2 - Javascript | Rohland | View Answer on Stackoverflow
Solution 3 - Javascript | kooshmoose | View Answer on Stackoverflow
Solution 4 - Javascript | Arafangion | View Answer on Stackoverflow
Solution 5 - Javascript | elhoim | View Answer on Stackoverflow
Solution 6 - Javascript | Josh | View Answer on Stackoverflow
Solution 7 - Javascript | austin cheney | View Answer on Stackoverflow
Solution 8 - Javascript | MotoWilliams | View Answer on Stackoverflow
Solution 9 - Javascript | Lasse V. Karlsen | View Answer on Stackoverflow
Solution 10 - Javascript | guyvdb | View Answer on Stackoverflow
Solution 11 - Javascript | Ates Goral | View Answer on Stackoverflow
Solution 12 - Javascript | Ionut Anghelcovici | View Answer on Stackoverflow