Best way to automatically remove comments from PHP code

Php, Comments, Strip

Php Problem Overview


What’s the best way to remove comments from a PHP file?

I want to do something similar to php_strip_whitespace(), but it shouldn't remove the line breaks as well.

For example,

I want this:

<?PHP
// something
if ($whatsit) {
    do_something(); # we do something here
    echo '<html>Some embedded HTML</html>';
}
/* another long
comment
*/
some_more_code();
?>

to become:

<?PHP
if ($whatsit) {
    do_something();
    echo '<html>Some embedded HTML</html>';
}
some_more_code();
?>

(Although if the empty lines remain where comments are removed, that wouldn't be OK.)

It may not be possible because of the requirement to preserve embedded HTML; that is what has tripped up the tools I found through Google.

Php Solutions


Solution 1 - Php

I'd use the tokenizer. Here's my solution. It should work on both PHP 4 and 5:

$fileStr = file_get_contents('path/to/file');
$newStr  = '';

$commentTokens = array(T_COMMENT);
    
if (defined('T_DOC_COMMENT')) {
    $commentTokens[] = T_DOC_COMMENT; // PHP 5
}

if (defined('T_ML_COMMENT')) {
    $commentTokens[] = T_ML_COMMENT;  // PHP 4
}

$tokens = token_get_all($fileStr);

foreach ($tokens as $token) {    
    if (is_array($token)) {
        if (in_array($token[0], $commentTokens)) {
            continue;
        }
        
        $token = $token[1];
    }

    $newStr .= $token;
}

echo $newStr;
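
The loop above drops the comment tokens but leaves behind the lines they occupied and any whitespace that preceded an inline comment, which the question asks to avoid. One possible way around that, as a minimal sketch: replace each comment with only its line breaks so the line count stays the same, then drop every line that changed and is now blank. The function name stripCommentsAndBlankLines() is made up for this illustration, and the sketch assumes PHP 7 or later and "\n" line endings.

```php
<?php
// Sketch only: stripCommentsAndBlankLines() is a hypothetical helper,
// not part of the answer above. Assumes "\n" line endings.
function stripCommentsAndBlankLines(string $source): string
{
    $stripped = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
                // Keep only the comment's line breaks so that line N of the
                // result still corresponds to line N of the input.
                $stripped .= str_repeat("\n", substr_count($token[1], "\n"));
                continue;
            }
            $token = $token[1];
        }
        $stripped .= $token;
    }

    $original = explode("\n", $source);
    $result = [];
    foreach (explode("\n", $stripped) as $i => $line) {
        if ($line !== $original[$i]) {
            // This line lost a comment: trim the leftover trailing whitespace
            // and drop the line if nothing else was on it.
            $line = rtrim($line);
            if ($line === '') {
                continue;
            }
        }
        $result[] = $line;
    }

    return implode("\n", $result);
}
```

Lines that were already blank in the input are identical before and after, so they are kept; only lines emptied by comment removal disappear. Calling echo stripCommentsAndBlankLines(file_get_contents('path/to/file')); would print the cleaned source.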

Solution 2 - Php

Use php -w <sourcefile> to generate a file stripped of comments and whitespace, and then use a beautifier like PHP_Beautifier to reformat for readability.

Solution 3 - Php

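$fileStr = file_get_contents('file.php');
foreach (token_get_all($fileStr) as $token) {
    // Plain-string tokens (e.g. ";" or "{") are never comments
    if (!is_array($token) || !in_array($token[0], array(T_COMMENT, T_DOC_COMMENT))) {
        continue;
    }
    // Caveat: str_replace() removes every occurrence of the comment text,
    // including identical text inside a string literal
    $fileStr = str_replace($token[1], '', $fileStr);
}

echo $fileStr;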

Solution 4 - Php

Here's the function posted above, modified to recursively remove all comments from all PHP files within a directory and all its subdirectories:

function rmcomments($id) {
    if (!file_exists($id)) {
        return;
    }
    if (is_dir($id)) {
        $handle = opendir($id);
        // Compare against false so that an entry named "0" does not end the loop
        while (($file = readdir($handle)) !== false) {
            if ($file != "." && $file != "..") {
                rmcomments($id . "/" . $file);
            }
        }
        closedir($handle);
    } elseif (is_file($id) && pathinfo($id, PATHINFO_EXTENSION) == "php") {
        // pathinfo() avoids passing the result of explode() to end() by reference
        if (!is_writable($id)) { chmod($id, 0777); }
        if (is_writable($id)) {
            $fileStr = file_get_contents($id);
            $newStr  = '';
            $commentTokens = array(T_COMMENT);
            if (defined('T_DOC_COMMENT')) { $commentTokens[] = T_DOC_COMMENT; }
            if (defined('T_ML_COMMENT')) { $commentTokens[] = T_ML_COMMENT; }
            foreach (token_get_all($fileStr) as $token) {
                if (is_array($token)) {
                    if (in_array($token[0], $commentTokens)) { continue; }
                    $token = $token[1];
                }
                $newStr .= $token;
            }
            // file_put_contents() returns 0 for an empty string, so test for false explicitly
            if (file_put_contents($id, $newStr) === false) {
                $open = fopen($id, "w");
                fwrite($open, $newStr);
                fclose($open);
            }
        }
    }
}

rmcomments("path/to/directory");

Solution 5 - Php

A more powerful version: remove all comments in the folder

<?php
    $di = new RecursiveDirectoryIterator(__DIR__, RecursiveDirectoryIterator::SKIP_DOTS);
    $it = new RecursiveIteratorIterator($di);
    $fileArr = [];
    foreach($it as $file) {
        // Collect every .php file except this script itself
        if($file->getExtension() == "php" && $file->getPathname() != __FILE__) {
            $fileArr[] = $file->getPathname();
        }
    }
    $arr = [T_COMMENT, T_DOC_COMMENT];
    // Process every collected file (the original loop started at index 1 and skipped the first one)
    foreach($fileArr as $path) {
        $fileStr = file_get_contents($path);
        foreach(token_get_all($fileStr) as $token) {
            if(is_array($token) && in_array($token[0], $arr)) {
                $fileStr = str_replace($token[1], '', $fileStr);
            }
        }
        file_put_contents($path, $fileStr);
    }

Solution 6 - Php

If you already use an editor like UltraEdit, you can open one or multiple PHP file(s) and then use a simple Find&Replace (Ctrl + R) with the following Perl regular expression:

(?s)/\*.*?\*/

Beware: this regular expression also removes comment-like text inside strings; for example, in echo "hello/*babe*/"; the /*babe*/ would be removed too. (The quantifier is lazy, .*?, so that each match stops at the nearest */ instead of running on to the last */ in the file.) It is therefore only practical when you have a few files to clean. To be absolutely sure that nothing is wrongly replaced, step through the Find&Replace and approve each replacement individually.
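
To make the two pitfalls of such a pattern concrete, here is a small sketch using PHP's preg_replace() rather than UltraEdit. The sample code in $code is made up for the demonstration. A greedy .* swallows everything between the first /* and the last */, while a lazy .*? handles each block comment separately but still cannot tell a comment from a string literal.

```php
<?php
// Illustration only: $code is a made-up sample with two block comments
// and a string literal that merely looks like it contains a comment.
$code = "<?php\n/* first */\necho \"hello/*babe*/\";\n/* second */\n";

// Greedy: a single match from the first "/*" to the last "*/",
// so the echo statement between the two comments is deleted as well.
$greedy = preg_replace('~/\*.*\*/~s', '', $code);

// Lazy: every "/* ... */" pair is matched on its own, which keeps the echo
// statement but still removes the "/*babe*/" text inside the string.
$lazy = preg_replace('~/\*.*?\*/~s', '', $code);

var_dump($greedy === "<?php\n\n");                     // bool(true)
var_dump($lazy === "<?php\n\necho \"hello\";\n\n");    // bool(true)
```

Neither variant touches // or # comments, which need patterns of their own.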

Solution 7 - Php

Bash solution: to remove comments recursively from all PHP files under the current directory, you can run this one-liner in the terminal. (It uses a file named temp1 to hold the PHP content during processing.)

Note that this strips whitespace along with the comments.

 find . -type f -name '*.php' | while IFS= read -r VAR; do php -wq "$VAR" > temp1; cat temp1 > "$VAR"; done

Remove the temp1 file afterwards.

If PHP_Beautifier is installed, you can get nicely formatted code without comments with:

 find . -type f -name '*.php' | while IFS= read -r VAR; do php -wq "$VAR" > temp1; php_beautifier temp1 > temp2; cat temp2 > "$VAR"; done

Then remove the two temporary files (temp1 and temp2).

Solution 8 - Php

Following up on the accepted answer: I needed to preserve the line numbers of the file too, so here is a variation of it:

    /**
     * Removes the php comments from the given valid php string, and returns the result.
     *
     * Note: a valid php string must start with <?php.
     *
     * If the preserveWhiteSpace option is true, it will replace the comments with some whitespaces, so that
     * the line numbers are preserved.
     *
     *
     * @param string $str
     * @param bool $preserveWhiteSpace
     * @return string
     */
    function removePhpComments(string $str, bool $preserveWhiteSpace = true): string
    {
        $commentTokens = [
            \T_COMMENT,
            \T_DOC_COMMENT,
        ];
        $tokens = token_get_all($str);


        if (true === $preserveWhiteSpace) {
            $lines = explode(PHP_EOL, $str);
        }


        $s = '';
        foreach ($tokens as $token) {
            if (is_array($token)) {
                if (in_array($token[0], $commentTokens)) {
                    if (true === $preserveWhiteSpace) {
                        $comment = $token[1];
                        $lineNb = $token[2];
                        $firstLine = $lines[$lineNb - 1];
                        $p = explode(PHP_EOL, $comment);
                        $nbLineComments = count($p);
                        if ($nbLineComments < 1) {
                            $nbLineComments = 1;
                        }
                        $firstCommentLine = array_shift($p);

                        $isStandAlone = (trim($firstLine) === trim($firstCommentLine));

                        if (false === $isStandAlone) {
                            if (2 === $nbLineComments) {
                                $s .= PHP_EOL;
                            }

                            continue; // Just remove inline comments
                        }

                        // Stand-alone case
                        $s .= str_repeat(PHP_EOL, $nbLineComments - 1);
                    }
                    continue;
                }
                $token = $token[1];
            }

            $s .= $token;
        }
        return $s;
    }

Note: this is for PHP 7+ (I didn't care about backward compatibility with older PHP versions).

Solution 9 - Php

/*
* T_ML_COMMENT does not exist in PHP 5.
* The following three lines define it in order to
* preserve backwards compatibility.
*
* The next two lines define the PHP 5 only T_DOC_COMMENT,
* which we will mask as T_ML_COMMENT for PHP 4.
*/

if (! defined('T_ML_COMMENT')) {
	define('T_ML_COMMENT', T_COMMENT);
} else {
	define('T_DOC_COMMENT', T_ML_COMMENT);
}

/*
 * Remove all comments in $file
 */

function remove_comment($file) {
	$comment_token = array(T_COMMENT, T_ML_COMMENT, T_DOC_COMMENT);
	
	$input = file_get_contents($file);
	$tokens = token_get_all($input);
	$output = '';
	
	foreach ($tokens as $token) {
		if (is_string($token)) {
			$output .= $token;
		} else {
			list($id, $text) = $token;
	
			// Keep every token except the comment tokens
			if (! in_array($id, $comment_token)) {
				$output .= $text;
			}
		}
	}
	
	file_put_contents($file, $output);
}

/*
 * Glob recursive
 * @return ['dir/filename', ...]
 */

function glob_recursive($pattern, $flags = 0) {
	$file_list = glob($pattern, $flags);

	$sub_dir = glob(dirname($pattern) . '/*', GLOB_ONLYDIR);
	// If sub directory exist
	if (count($sub_dir) > 0) {
		$file_list = array_merge(
			glob_recursive(dirname($pattern) . '/*/' . basename($pattern), $flags),
			$file_list
		);
	}
   
	return $file_list;
}

// Remove all comments from '*.php' files, including those in subdirectories
foreach (glob_recursive('*.php') as $file) {
	remove_comment($file);
}

Solution 10 - Php

For Ajax and JSON responses, I use the following PHP code to remove comments from HTML/JavaScript code so that it is smaller (about a 15% gain for my code).

// Replace doubled spaces with single ones (ignored in HTML any way)
$html = preg_replace('@(\s){2,}@', '\1', $html);
// Remove single and multiline comments, tabs and newline chars
$html = preg_replace(
    '@(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|((?<!:)//.*)|[\t\r\n]@i',
    '',
    $html
);

It is short and effective, but it can produce unexpected results if your code has bad syntax.

Solution 11 - Php

Run the command php --strip file.php in a command prompt (for example, cmd.exe), and then browse to WriteCodeOnline.

Here, file.php is your own file.


Solution 12 - Php

In 2019 it could work like this:

<?php
/*   hi there !!!
here are the comments */
//another try

echo removecomments('index.php');

/*   hi there !!!
here are the comments */
//another try
function removecomments($f){
    $w=Array(';','{','}');
    $ts = token_get_all(php_strip_whitespace($f));
    $s='';
    foreach($ts as $t){
        if(is_array($t)){
            $s .=$t[1];
        }else{
            $s .=$t;
            if( in_array($t,$w) ) $s.=chr(13).chr(10);
        }
    }

    return $s;
}

?>

To see the result, run the script (for example under XAMPP). The browser shows a blank page, but if you right-click and choose View Source, you will see your PHP script: it loads itself and removes all comments as well as the tabs.

I prefer this solution too, because I use it to speed up the single-file engine of my framework, "m.php". In my tests, the source produced by php_strip_whitespace alone, without this script restoring the line breaks, was the slowest: I ran 10 benchmarks and took the arithmetic mean. (I think PHP 7 either restores the missing CR/LFs while parsing or takes longer when they are missing.)

Solution 13 - Php

The catch is that a less robust matching algorithm (simple regex, for instance) will start stripping here when it clearly shouldn't:

if (preg_match('#^/*' . $this->index . '#', $this->permalink_structure)) {  

It might not affect your code, but eventually someone will get bit by your script. So you will have to use a utility that understands more of the language than you might otherwise expect.
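
As a rough illustration of that point, the sketch below runs the line quoted above through a naive block-comment regex and through PHP's tokenizer. The surrounding sample (the run() call and the "a real comment" text) is made up for the demonstration; only the preg_match line comes from the answer.

```php
<?php
// Sample input: the line from above plus one genuine comment (made up for the demo).
$source = <<<'PHP'
<?php
if (preg_match('#^/*' . $this->index . '#', $this->permalink_structure)) {
    run(); /* a real comment */
}
PHP;

// Naive approach: a regex that knows nothing about string literals.
$byRegex = preg_replace('~/\*.*?\*/~s', '', $source);

// Tokenizer approach: only tokens the PHP lexer classifies as comments are dropped.
$byTokenizer = '';
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
            continue;
        }
        $token = $token[1];
    }
    $byTokenizer .= $token;
}

var_dump(strpos($byRegex, "'#^/*'") !== false);     // bool(false): the pattern string was damaged
var_dump(strpos($byTokenizer, "'#^/*'") !== false); // bool(true): the pattern string survived
```

The regex starts its match at the /* inside the '#^/*' string and deletes everything up to the end of the real comment, so the result no longer parses. The tokenizer sees that /* as part of a string literal and leaves it alone.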

Solution 14 - Php

Use php -w on the command line, or call php_strip_whitespace($filename); from PHP code.

documentation
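
A minimal sketch of the function form, assuming a throwaway file created with tempnam() purely for the demonstration (the sample contents are made up):

```php
<?php
// Demo setup: write a small PHP source file to a temporary location.
$path = tempnam(sys_get_temp_dir(), 'strip');
file_put_contents(
    $path,
    "<?php\n// a comment\necho 'hi';   /* another */\necho 'there';\n"
);

// php_strip_whitespace() takes a file name (not a string of code) and
// returns the source with comments and surplus whitespace removed.
$stripped = php_strip_whitespace($path);

// Clean up the temporary file.
unlink($path);

echo $stripped;
```

Both comments are gone from $stripped, but line breaks are collapsed along with them, so this route does not meet the question's requirement to keep the line breaks.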

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | benlumley | View Question on Stackoverflow
Solution 1 - Php | Ionuț G. Stan | View Answer on Stackoverflow
Solution 2 - Php | Paul Dixon | View Answer on Stackoverflow
Solution 3 - Php | Tom Haigh | View Answer on Stackoverflow
Solution 4 - Php | John Tyler | View Answer on Stackoverflow
Solution 5 - Php | ZhiJia Tang | View Answer on Stackoverflow
Solution 6 - Php | Marco Demaio | View Answer on Stackoverflow
Solution 7 - Php | Pawel Dubiel | View Answer on Stackoverflow
Solution 8 - Php | ling | View Answer on Stackoverflow
Solution 9 - Php | Steely Wing | View Answer on Stackoverflow
Solution 10 - Php | Deele | View Answer on Stackoverflow
Solution 11 - Php | Robi Parvez | View Answer on Stackoverflow
Solution 12 - Php | Constantin | View Answer on Stackoverflow
Solution 13 - Php | Adam Davis | View Answer on Stackoverflow
Solution 14 - Php | Gam Sengie | View Answer on Stackoverflow