PHP DOMDocument errors/warnings on html5-tags

Php Problem Overview


I've been attempting to parse HTML5 code so I can set attributes/values within it, but it seems DOMDocument (PHP 5.3) doesn't support tags like <nav> and <section>.

Is there any way to parse this as HTML in PHP and manipulate the code?


Code to reproduce:

<?php
$dom = new DOMDocument();
$dom->loadHTML("<!DOCTYPE HTML>
<html><head><title>test</title></head>
<body>
<nav>
  <ul>
    <li>first
    <li>second
  </ul>
</nav>
<section>
  ...
</section>
</body>
</html>");

Error

Warning: DOMDocument::loadHTML(): Tag nav invalid in Entity, line: 4 in /home/wbkrnl/public_html/new-mvc/1.php on line 17

Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 10 in /home/wbkrnl/public_html/new-mvc/1.php on line 17

Php Solutions


Solution 1 - Php

No, there is no way of specifying a particular doctype to use, or to modify the requirements of the existing one.

Your best workable solution is to stop libxml from emitting PHP warnings by switching on its internal error handling with libxml_use_internal_errors, and then to discard the collected errors:

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('...');
libxml_clear_errors();
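
A slightly fuller sketch of the same approach, which also restores the previous libxml error setting and then sets an attribute on the parsed <nav> element, as the question wants to do. The helper name loadHtml5 and the class value main-menu are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: parse HTML while libxml collects its errors
// internally instead of raising PHP warnings.
function loadHtml5($html)
{
    $dom = new DOMDocument();
    // Remember the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    // Discard whatever was collected (e.g. "Tag nav invalid").
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

$dom = loadHtml5('<!DOCTYPE html><html><body><nav><ul><li>first</li></ul></nav></body></html>');

// The unknown tag is still part of the tree and can be manipulated.
$nav = $dom->getElementsByTagName('nav')->item(0);
$nav->setAttribute('class', 'main-menu');

echo $dom->saveHTML();
```

Restoring the previous setting keeps the change local to the helper, so other code that relies on PHP warnings from libxml is not affected.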

Solution 2 - Php

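You could also suppress the warnings with the error control operator (@), although this hides every warning the call raises: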

@$dom->loadHTML($htmlString);

Solution 3 - Php

You can filter the errors you get from the parser. As per other answers here, turn off error reporting to the screen, and then iterate through the errors and only show the ones you want:

libxml_use_internal_errors(TRUE);
// Do your load here
$errors = libxml_get_errors();

foreach ($errors as $error)
{
    /* @var $error LibXMLError */
}

Here is a print_r() of a single error:

LibXMLError Object
(
    [level] => 2
    [code] => 801
    [column] => 17
    [message] => Tag section invalid

    [file] => 
    [line] => 39
)

By matching on the message and/or the code, these can be filtered out quite easily.
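
As a minimal sketch of that filtering, assuming the code 801 seen in the print_r() output above is the one reported for every unknown tag, the loop below skips those entries and prints whatever remains. The variable names $unknownTagCode and $remaining are hypothetical.

```php
<?php
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<!DOCTYPE html><html><body><section><p>text</p></section></body></html>');

// Assumption: 801 is the code shown in the print_r() output ("Tag ... invalid").
$unknownTagCode = 801;

$remaining = array();
foreach (libxml_get_errors() as $error) {
    if ($error->code === $unknownTagCode) {
        continue; // ignore unknown-tag errors such as <section> or <nav>
    }
    $remaining[] = $error;
}
libxml_clear_errors();

// Report only the errors that were not filtered out.
foreach ($remaining as $error) {
    printf("Line %d: %s\n", $error->line, trim($error->message));
}
```

Matching on the numeric code rather than the message text avoids depending on the exact wording of the message.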

Solution 4 - Php

There doesn't seem to be a way to suppress warnings but not errors. PHP has constants that are supposed to do this, but they don't seem to work. Here is what SHOULD work, but doesn't (possibly a bug):

 $doc=new DOMDocument();
 $doc->loadHTML("<tagthatdoesnotexist><h1>Hi</h1></tagthatdoesnotexist>", LIBXML_NOWARNING );
 echo $doc->saveHTML();

http://php.net/manual/en/libxml.constants.php
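
One possible explanation, offered as an assumption and not a confirmed diagnosis: the error printed in Solution 3 has level 2, which corresponds to LIBXML_ERR_ERROR and not LIBXML_ERR_WARNING, so LIBXML_NOWARNING alone would not be expected to cover it. The sketch below passes LIBXML_NOERROR as well. Note that this silences errors too, and whether the flags are honoured may depend on the PHP and libxml versions in use (the options argument of loadHTML() needs PHP 5.4 or later).

```php
<?php
$doc = new DOMDocument();
// Ask libxml to report neither warnings nor errors for this call only.
$doc->loadHTML(
    '<tagthatdoesnotexist><h1>Hi</h1></tagthatdoesnotexist>',
    LIBXML_NOERROR | LIBXML_NOWARNING
);
echo $doc->saveHTML();
```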

Solution 5 - Php

This worked for me:

$html = file_get_contents($url);
        
$search = array("<header>", "</header>", "<nav>", "</nav>", "<section>", "</section>");
$replace = array("<div>", "</div>","<div>", "</div>", "<div>", "</div>");
$html = str_replace($search, $replace, $html);
            
$dom = new DOMDocument();
$dom->loadHTML($html);

If you need to find the header later, replace the header tag with a div that carries an id. For instance:

$search = array("<header>", "</header>");
$replace = array("<div id='header1'>", "</div>");

It's not the best solution but depending on the situation it can be useful.

Good luck.

Solution 6 - Php

HTML5 tags usually carry attributes such as id and class, so the search strings should match the opening tag name without its closing bracket. The replacement code then becomes:

$html = file_get_contents($url);
$search = array(
	"<header", "</header>", 
	"<nav", "</nav>", 
	"<section", "</section>",
	"<article", "</article>",
	"<footer", "</footer>",
	"<aside", "</aside>",
	"<noindex", "</noindex>",
);
$replace = array(
	"<div", "</div>",
	"<div", "</div>", 
	"<div", "</div>",
	"<div", "</div>",
	"<div", "</div>",
	"<div", "</div>",
	"<div", "</div>",
);
$html = str_replace($search, $replace, $html);
$dom = new DOMDocument();
$dom->loadHTML($html);
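
A plain str_replace is case-sensitive and would also rewrite any tag whose name merely starts with one of the search strings. As a hedged alternative sketch (a regular expression swapped in for str_replace, not part of the answer above), the version below matches whole tag names case-insensitively. The sample input and the tag list are hypothetical.

```php
<?php
// Hypothetical sample input; in the answer above $html comes from file_get_contents($url).
$html = '<HEADER id="top"><nav class="menu">links</nav></HEADER>';

$tags = array('header', 'nav', 'section', 'article', 'footer', 'aside');

// Matches "<nav", "</nav" and "<NAV", but not "<navbar", because of \b.
$pattern = '~<(/?)(?:' . implode('|', $tags) . ')\b~i';

// ${1} re-inserts the optional slash so closing tags become "</div".
$html = preg_replace($pattern, '<${1}div', $html);

$dom = new DOMDocument();
$dom->loadHTML($html);
```

As with the str_replace version, the original tag names are lost, so this only suits cases where the semantic elements are not needed in the output.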

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Klaas Sangers | View Question on Stackoverflow
Solution 1 - Php | lonesomeday | View Answer on Stackoverflow
Solution 2 - Php | Ilker Mutlu | View Answer on Stackoverflow
Solution 3 - Php | halfer | View Answer on Stackoverflow
Solution 4 - Php | user2782001 | View Answer on Stackoverflow
Solution 5 - Php | Emiliano Sangoi | View Answer on Stackoverflow
Solution 6 - Php | Sergey Kaluzhsky | View Answer on Stackoverflow