Wanted: Command line HTML5 beautifier

Html, Command Line, Indentation, Pretty Print

Html Problem Overview


Wanted

A command line HTML5 beautifier running under Linux.

Input

Garbled, ugly HTML5 code. Possibly the result of multiple templates. You don't love it, it doesn't love you.

Output

Pure beauty. The code is nicely indented, has enough line breaks, and cares for its whitespace. Rather than viewing it in a web browser, you would like to display the code on your website directly.

Suspects
  • tidy does too much (heck, it alters my doctype!), and it doesn't work well with HTML5. Maybe there is a way to make it cooperate and not alter anything?
  • vim does too little. It only indents. I want the program to add and remove line breaks, and to play with the whitespace inside of tags.

DEAD OR ALIVE!

Html Solutions


Solution 1 - Html

HTML Tidy has been forked by the W3C, and the fork now supports HTML5 validation.

https://github.com/w3c/tidy-html5
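
As a rough illustration of how the fork might be configured, the sketch below writes a tidy config file (one `name: value` pair per line) and returns the command line that would load it. The specific settings, the helper names `render_config` and `write_config`, and the assumption that the fork is installed as `tidy` are illustrative and do not come from the answer.

```python
from pathlib import Path

# Assumed settings for the HTML5 fork; adjust to taste.
HTML5_SETTINGS = {
    "doctype": "html5",
    "indent": "auto",
    "indent-spaces": 2,
    "wrap": 0,
    "tidy-mark": "no",
    "quiet": "yes",
}


def render_config(settings):
    """Render settings in tidy's config-file syntax: one 'name: value' pair per line."""
    return "".join(f"{name}: {value}\n" for name, value in settings.items())


def write_config(path, settings=HTML5_SETTINGS):
    """Write the config file and return the command line that would use it."""
    Path(path).write_text(render_config(settings), encoding="utf-8")
    # Assumes the binary is installed under the name "tidy".
    return ["tidy", "-config", str(path)]
```

The returned list can be extended with an input file name and passed to `subprocess.run`; tidy then writes the beautified markup to standard output.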

Solution 2 - Html

I suspect tidy can be made to work with the right command-line parameters.

http://tidy.sourceforge.net/docs/quickref.html

You can specify an arbitrary doctype and add new block, inline, and empty tags, and turn on and off lots of tidy's cleaning options.

Depending on what you want it to "beautify", you can probably get decent results. It probably won't manage some of the more advanced cleanups, such as eliminating or combining spurious elements, if it doesn't recognize those elements.
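
A minimal sketch of what "the right command-line parameters" could look like, driven from Python. The option names come from tidy's quick reference; the chosen values and the helper names `build_tidy_command` and `beautify` are assumptions made for illustration, not a tested recipe.

```python
import shutil
import subprocess

# Illustrative values: leave the doctype alone, indent, never wrap,
# and declare some HTML5 elements so classic tidy does not reject them.
TIDY_OPTIONS = {
    "indent": "yes",
    "indent-spaces": "2",
    "wrap": "0",
    "tidy-mark": "no",
    "doctype": "omit",
    "new-blocklevel-tags": "article,header,footer,section,nav",
    "new-inline-tags": "video,audio,canvas",
}


def build_tidy_command(options, executable="tidy"):
    """Turn an options mapping into an argv list of '--name value' pairs."""
    command = [executable, "-quiet"]
    for name, value in options.items():
        command += [f"--{name}", value]
    return command


def beautify(html, options=TIDY_OPTIONS):
    """Pipe html through tidy; return its output, or None if tidy is not installed."""
    if shutil.which("tidy") is None:
        return None
    result = subprocess.run(
        build_tidy_command(options),
        input=html,
        capture_output=True,
        text=True,
    )
    # tidy exits with 1 for warnings and 2 for errors, so a non-zero
    # exit code is not treated as fatal here.
    return result.stdout
```

Calling `beautify("<p>hi")` returns whatever tidy prints for that fragment. With `doctype` set to `omit`, tidy leaves the doctype out of its output, so the original one would have to be re-added by hand if it matters.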

Solution 3 - Html

Copied from a live HTML5 website of mine whose pages all validate as proper HTML5 thanks to this snippet (PHP in this case, but the options and logic are the same for any language):

	// Options handed to tidy; the new-*-tags entries declare HTML5 elements it would otherwise reject.
	$options = array(
		'hide-comments' => true,
		'tidy-mark' => false,
		'indent' => true,
		'indent-spaces' => 4,
		'new-blocklevel-tags' => 'article,header,footer,section,nav',
		'new-inline-tags' => 'video,audio,canvas,ruby,rt,rp',
		'new-empty-tags' => 'source',
		'doctype' => '<!DOCTYPE HTML>',
		'sort-attributes' => 'alpha',
		'vertical-space' => false,
		'output-xhtml' => true,
		'wrap' => 180,
		'wrap-attributes' => false,
		'break-before-br' => false,
	);

	// tidy_parse_string() returns a tidy object; clean it, then read the markup back as a string.
	$tidy = tidy_parse_string($buffer, $options, 'utf8');
	tidy_clean_repair($tidy);
	// Fix a tidy doctype bug
	$buffer = str_replace('<html lang="en" xmlns="http://www.w3.org/1999/xhtml">', '<!DOCTYPE HTML>', tidy_get_output($tidy));

Solution 4 - Html

If you use Haml as your nanoc filter, your HTML will automatically be pretty-printed. You can set HTML5 output as an option.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | blinry | View Question on Stack Overflow
Solution 1 - Html | mhansen | View Answer on Stack Overflow
Solution 2 - Html | Mr. Shiny and New 安宇 | View Answer on Stack Overflow
Solution 3 - Html | Philipp | View Answer on Stack Overflow
Solution 4 - Html | Dan Brendstrup | View Answer on Stack Overflow