Remove comments from C/C++ code

C++, C, Comments

C++ Problem Overview


Is there an easy way to remove comments from a C/C++ source file without doing any preprocessing? (I.e., I think you can use gcc -E, but this will expand macros.) I just want the source code with comments stripped; nothing else should be changed.

EDIT:

I would prefer an existing tool. I don't want to have to write this myself with regexes; I foresee too many surprises in the code.

C++ Solutions


Solution 1 - C++

Run the following command on your source file:

gcc -fpreprocessed -dD -E test.c

Thanks to KennyTM for finding the right flags. Here’s the result for completeness:

test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo
/* comments? comments. */
// c++ style comments

gcc -fpreprocessed -dD -E test.c:

#define foo bar
foo foo foo
#ifdef foo
#undef foo
#define foo baz
#endif
foo foo

Solution 2 - C++

It depends on how perverse your comments are. I have a program scc to strip C and C++ comments. I also have a test file for it, and I tried GCC (4.2.1 on MacOS X) with the options in the currently selected answer - and GCC doesn't seem to do a perfect job on some of the horribly butchered comments in the test case.

NB: This isn't a real-life problem - people don't write such ghastly code.

Consider this subset (36 of 135 lines in total) of the test case:

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

On my Mac, the output from GCC (gcc -fpreprocessed -dD -E subset.c) is:

/\
*\
Regular
comment
*\
/
The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.
/\
*/ This is a regular C comment *\
but this is just a routine continuation *\
and that was not the end either - but this is *\
\
/
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
/\
\
\
\
* C comment */

The output from 'scc' is:

The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.
/\
\
\
/ But this is a C++/C99 comment!
The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.
 
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.

The output from 'scc -C' (which recognizes double-slash comments) is:

The regular C comment number 1 has finished.

/\
\/ This is not a C++/C99 comment!

This is followed by C++/C99 comment number 3.

The C++/C99 comment number 3 has finished.

/\
\* This is not a C or C++ comment!

This is followed by regular C comment number 2.
 
The regular C comment number 2 has finished.

This is followed by regular C comment number 3.
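
The butchered comments above work because a C compiler deletes every backslash that is immediately followed by a newline before it looks for comments. As a rough illustration only (this is not how scc is implemented, and the name splice_lines is made up for this sketch), the splicing step could be written like this:

```c
#include <stddef.h>

/* Copy src to dst, dropping every backslash that is immediately followed
 * by a newline (line splicing). dst must have room for strlen(src) + 1
 * bytes. Returns the length of the result.
 * Hypothetical helper: the name and interface are assumptions. */
size_t splice_lines(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;               /* skip the backslash-newline pair */
        } else {
            dst[n++] = *src++;      /* ordinary character: copy it */
        }
    }
    dst[n] = '\0';
    return n;
}
```

After splicing, the first block of the test case starts with a plain slash-star and ends with star-slash, so it is a regular C comment, whereas slash, backslash, newline, backslash, slash leaves a backslash between the two slashes and is therefore not a comment. A real stripper should not apply this to the whole file, since it would also join lines outside comments.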
 


Source for SCC now available on GitHub

The current version of SCC is 6.60 (dated 2016-06-12), though the Git versions were created on 2017-01-18 (in the US/Pacific time zone). The code is available from GitHub at https://github.com/jleffler/scc-snapshots. You can also find snapshots of the previous releases (4.03, 4.04, 5.05) and two pre-releases (6.16, 6.50) — these are all tagged release/x.yz.

The code is still primarily developed under RCS. I'm still working out how I want to use sub-modules or a similar mechanism to handle common library files like stderr.c and stderr.h (which can also be found in https://github.com/jleffler/soq).

SCC version 6.60 attempts to understand C++11, C++14 and C++17 constructs such as binary constants, numeric punctuation, raw strings, and hexadecimal floats. It defaults to C11 mode operation. (Note that the meaning of the -C flag — mentioned above — flipped between version 4.0x described in the main body of the answer and version 6.60 which is currently the latest release.)

Solution 3 - C++

gcc -fpreprocessed -dD -E did not work for me but this program does it:

#include <stdio.h>

static void process(FILE *f)
{
 int c;
 while ( (c=getc(f)) != EOF )
 {
  if (c=='\'' || c=='"')			/* literal */
  {
   int q=c;
   do
   {
    putchar(c);
    if (c=='\\' && (c=getc(f))!=EOF) putchar(c);	/* copy escaped character */
    c=getc(f);
   } while (c!=q && c!=EOF);			/* also stop at end of input */
   if (c!=EOF) putchar(c);
  }
  else if (c=='/')				/* opening comment ? */
  {
   c=getc(f);
   if (c!='*')					/* no, recover */
   {
    putchar('/');
    ungetc(c,f);
   }
   else
   {
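    int p;
    putchar(' ');				/* replace comment with space */
    c=getc(f);					/* first character of the comment body */
    do
    {
     p=c;
     c=getc(f);
    } while (c!=EOF && (c!='/' || p!='*'));	/* stop at end of comment or input */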
   }
  }
  else
  {
   putchar(c);
  }
 }
}

int main(int argc, char *argv[])
{
 process(stdin);
 return 0;
}

Solution 4 - C++

There is a stripcmt program (http://www.bdc.cx/software/stripcmt/) that can do this:

> StripCmt is a simple utility written in C to remove comments from C, C++, and Java source files. In the grand tradition of Unix text processing programs, it can function either as a FIFO (First In - First Out) filter or accept arguments on the command line.

(per hlovdal's (https://stackoverflow.com/users/23118/hlovdal) answer (https://stackoverflow.com/questions/241327/python-snippet-to-remove-c-and-c-comments/1294188#1294188) to a question about Python code for this)

Solution 5 - C++

This is a Perl script that removes // one-line and /* multi-line */ comments:

  #!/usr/bin/perl
   
  undef $/;
  $text = <>;
  
  $text =~ s/\/\/[^\n\r]*(\n\r)?//g;
  $text =~ s/\/\*+([^*]|\*(?!\/))*\*+\///g;
   
  print $text;

It takes your source file as a command-line argument. Save the script to a file, say remove_comments.pl, and call it with the following command: perl -w remove_comments.pl [your source file]

Hope it will be helpful
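
One of the surprises with plain substitutions like these is that they do not know about string literals, so the // in a literal such as "http://x" is taken for a comment. The following is only a sketch of how that case could be handled, not part of the original answer; strip_line_comments is a made-up name, and block comments, raw strings and line continuations are deliberately left out:

```c
#include <stddef.h>

/* Copy src to dst without // comments, leaving string and character
 * literals untouched. dst needs room for strlen(src) + 1 bytes.
 * Hypothetical helper for illustration only. */
size_t strip_line_comments(const char *src, char *dst)
{
    size_t n = 0;
    char quote = '\0';                  /* active quote character, or 0 */

    while (*src != '\0') {
        if (quote != '\0') {
            if (*src == '\\' && src[1] != '\0') {
                dst[n++] = *src++;      /* copy the backslash first */
            } else if (*src == quote) {
                quote = '\0';           /* the literal ends here */
            }
            dst[n++] = *src++;          /* copy current (or escaped) char */
        } else if (*src == '"' || *src == '\'') {
            quote = *src;               /* a literal starts here */
            dst[n++] = *src++;
        } else if (src[0] == '/' && src[1] == '/') {
            while (*src != '\0' && *src != '\n')
                src++;                  /* drop text up to the newline */
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

The newline that ends the comment is kept, so the number of lines does not change.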

Solution 6 - C++

I had this problem as well. I found this tool (Cpp-Decomment), which worked for me. However, it does not handle a // comment that is continued onto the next line with a trailing backslash. E.g.:

// this is my comment \
comment continues ...

In this case, I couldn't find a way to handle it in the program, so I just searched for the lines it had missed and fixed them manually. I believe there may be an option for that, or you could change the program's source code to do so.
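
For anyone changing the source, the missing piece is small. This is a sketch under my own assumptions, not code from Cpp-Decomment: end_of_line_comment is a hypothetical helper, and it only looks at LF line endings.

```c
/* Given p pointing at the "//" that opens a comment, return a pointer to
 * the newline (or terminating NUL) that really ends the comment. A
 * backslash directly before a newline continues the comment on the next
 * line. Hypothetical helper; CR-LF line endings are not handled. */
const char *end_of_line_comment(const char *p)
{
    while (*p != '\0') {
        if (p[0] == '\\' && p[1] == '\n') {
            p += 2;                 /* continuation: the comment goes on */
        } else if (*p == '\n') {
            break;                  /* real end of the comment */
        } else {
            p++;
        }
    }
    return p;
}
```

A stripper would skip everything from the opening // up to the returned pointer and resume copying from there.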

Solution 7 - C++

Because you use C, you might want to use something that's "natural" to C. You can use the C preprocessor to just remove comments. The examples given below work with the C preprocessor from GCC. They should work the same or in similar ways with other C preprocessors as well.

For C, use

cpp -dD -fpreprocessed -o output.c input.c

It also works for removing comments from JSON, for example like this:

cpp -P -o - - <input.json >output.json

In case your C preprocessor is not accessible directly, you can try to replace cpp with cc -E, which calls the C compiler telling it to stop after the preprocessor stage. In case your C compiler binary is not cc you can replace cc with the name of your C compiler binary, for example clang. Note that not all preprocessors support -fpreprocessed.

Solution 8 - C++

I wrote a C program of around 200 lines, using only the standard C library, that removes comments from a C source file. The code is at qeatzy/removeccomments.

behavior

  1. A C-style comment that spans multiple lines or occupies an entire line is zeroed out.
  2. A C-style comment in the middle of a line remains unchanged, e.g. void init(/* do initialization */) {...}
  3. A C++-style comment that occupies an entire line is zeroed out.
  4. C string literals are respected, by checking for " and \".
  5. Line continuation is handled. If the previous line ends with \, the current line is part of the previous line.
  6. Line numbers remain the same. Zeroed-out lines or parts of lines become empty.
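
To illustrate point 6 only, here is a simplified sketch of dropping comment text while keeping every newline. It is not the code from removeccomments: the name strip_keep_lines is made up, it also removes comments in the middle of a line (unlike point 2), and it ignores string literals and line continuations.

```c
#include <stddef.h>

/* Copy src to dst, dropping comment text but keeping every newline, so
 * the output has as many lines as the input. dst needs room for
 * strlen(src) + 1 bytes. Hypothetical, simplified helper. */
size_t strip_keep_lines(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*') {
            src += 2;                       /* skip the opener */
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/')) {
                if (*src == '\n')
                    dst[n++] = '\n';        /* keep line breaks */
                src++;
            }
            if (*src != '\0')
                src += 2;                   /* skip the closer */
        } else if (src[0] == '/' && src[1] == '/') {
            while (*src != '\0' && *src != '\n')
                src++;                      /* newline is copied below */
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Because the line count is unchanged, compiler diagnostics for the stripped file still point at the right lines of the original.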

testing & profiling

I tested it with the largest file in the CPython source code, which contains many comments. In this case it does the job correctly and quickly, 2-5 times faster than gcc:

time gcc -fpreprocessed -dD -E Modules/unicodeobject.c > res.c 2>/dev/null
time ./removeccomments < Modules/unicodeobject.c > result.c

usage

/path/to/removeccomments < input_file > output_file

Solution 9 - C++

I believe you can easily remove comments from C code with a single statement:

perl -i -pe 's/\/\*.*?\*\///g' file.c removes /* C-style */ comments.
perl -i -pe 's/\/\/.*//' file.cpp removes // C++-style comments.

The only problem with these commands is that they cannot remove comments that span more than one line, but you can use the same regex as a starting point for logic that removes multi-line comments.
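
The multi-line logic comes down to remembering, from one line to the next, whether a comment is still open. A minimal sketch in C, with a made-up function name and with string literals deliberately ignored:

```c
/* Remove block-comment text from one line, in place. *in_comment carries
 * the "inside a comment" state from one line to the next, which is what a
 * single per-line substitution lacks. Hypothetical helper; string
 * literals are not handled. */
void strip_block_comment_line(char *line, int *in_comment)
{
    char *out = line;           /* write position, never ahead of p */
    const char *p = line;       /* read position */

    while (*p != '\0') {
        if (*in_comment) {
            if (p[0] == '*' && p[1] == '/') {
                *in_comment = 0;        /* comment closes on this line */
                p += 2;
            } else {
                if (*p == '\n')
                    *out++ = '\n';      /* keep the line break */
                p++;
            }
        } else if (p[0] == '/' && p[1] == '*') {
            *in_comment = 1;            /* comment opens on this line */
            p += 2;
        } else {
            *out++ = *p++;
        }
    }
    *out = '\0';
}
```

Call it once per line with the same in_comment variable, initialised to 0 before the first line.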

Solution 10 - C++

Recently I wrote some Ruby code to solve this problem. I have considered the following special cases:

  • comment markers inside strings
  • several multi-line comments on one line (fixes the greedy match)
  • multi-line comments that span several lines

Here is the code:

It uses the following placeholders to preprocess each line in case comment markers appear inside strings. If one of these placeholders appears in your code, uh, bad luck. You can replace them with more complex strings.

  • MUL_REPLACE_LEFT = "MUL_REPLACE_LEFT"
  • MUL_REPLACE_RIGHT = "MUL_REPLACE_RIGHT"
  • SIG_REPLACE = "SIG_REPLACE"

USAGE: ruby -w inputfile outputfile

Solution 11 - C++

I know it's late, but I thought I'd share my code and my first attempt at writing a compiler.

Note: this does not account for "*/" inside a multiline comment, e.g. /*...."*/"...*/. Then again, gcc 4.8.1 doesn't either.

#include <stdio.h>

void function_removeComments(char *pchar_sourceFile, long long_sourceFileSize)
{
    long long_sourceFileIndex = 0;
    long long_logIndex = 0;

    int int_EOF = 0;

    for (long_sourceFileIndex = 0; long_sourceFileIndex < long_sourceFileSize; long_sourceFileIndex++)
    {
        if (pchar_sourceFile[long_sourceFileIndex] == '/' && int_EOF == 0)
        {
            long_logIndex = long_sourceFileIndex;  // log "possible" start of comment

            if (long_sourceFileIndex + 1 < long_sourceFileSize)  // array bounds check given we want to peek at the next character
            {
                if (pchar_sourceFile[long_sourceFileIndex + 1] == '*')  // multiline comment
                {
                    int int_terminated = 0;

                    // stop one short of the end so the peek at index+1 stays in bounds
                    for (long_sourceFileIndex += 2; long_sourceFileIndex + 1 < long_sourceFileSize; long_sourceFileIndex++)
                    {
                        if (pchar_sourceFile[long_sourceFileIndex] == '*' && pchar_sourceFile[long_sourceFileIndex + 1] == '/')
                        {
                            // since we've found the end of multiline comment
                            // we want to increment the pointer position two characters
                            // accounting for "*" and "/"
                            long_sourceFileIndex += 2;
                            int_terminated = 1;

                            break;  // terminating sequence found
                        }
                    }

                    // didn't find terminating sequence so it must be eof.
                    // set file pointer position to initial comment start position
                    // so we can display file contents.
                    if (!int_terminated)
                    {
                        long_sourceFileIndex = long_logIndex;

                        int_EOF = 1;
                    }
                }
                else if (pchar_sourceFile[long_sourceFileIndex + 1] == '/')  // single line comment
                {
                    // since we know its a single line comment, increment file pointer
                    // until we encounter a new line or its the eof
                    for (long_sourceFileIndex++; long_sourceFileIndex < long_sourceFileSize && pchar_sourceFile[long_sourceFileIndex] != '\n' && pchar_sourceFile[long_sourceFileIndex] != '\0'; long_sourceFileIndex++);
                }
            }
        }

        // a comment may have ended exactly at the end of the buffer
        if (long_sourceFileIndex < long_sourceFileSize)
            printf("%c", pchar_sourceFile[long_sourceFileIndex]);
    }
}

Solution 12 - C++

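#include <stdio.h>

int main(void)
{
        int c;                   // int rather than char, so EOF can be told apart from real characters
        int tmp = '\0';          // holds a '/' that might open a comment
        int inside_comment = 0;  // A flag to check whether we are inside comment
        while((c = getchar()) != EOF) {
                if(tmp) {
                        if(c == '/') {
                                // line comment: skip to the end of the line (or of the input)
                                while((c = getchar()) != '\n' && c != EOF);
                                tmp = '\0';
                                if(c == '\n')
                                        putchar('\n');
                                continue;
                        }else if(c == '*') {
                                int prev = '\0';  // previous character inside the comment
                                inside_comment = 1;
                                while(inside_comment && (c = getchar()) != EOF) {
                                        if(prev == '*' && c == '/')
                                                inside_comment = 0;
                                        prev = c;
                                }
                                tmp = '\0';
                                continue;
                        }else {
                                putchar(tmp);  // the '/' was an ordinary character after all
                                putchar(c);
                                tmp = '\0';
                                continue;
                        }
                }
                if(c == '/') {
                        tmp = c;
                } else {
                        putchar(c);
                }
        }
        if(tmp)
                putchar(tmp);  // input ended right after a lone '/'
        return 0;
}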

This program handles both comment styles, i.e. // and /* ... */. It does not look at string literals.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Mike | View Question on Stackoverflow
Solution 1 - C++ | Josh Lee | View Answer on Stackoverflow
Solution 2 - C++ | Jonathan Leffler | View Answer on Stackoverflow
Solution 3 - C++ | lhf | View Answer on Stackoverflow
Solution 4 - C++ | che | View Answer on Stackoverflow
Solution 5 - C++ | Vladimir | View Answer on Stackoverflow
Solution 6 - C++ | Halil Kaskavalci | View Answer on Stackoverflow
Solution 7 - C++ | Christian Hujer | View Answer on Stackoverflow
Solution 8 - C++ | qeatzy | View Answer on Stackoverflow
Solution 9 - C++ | Poseidon_Geek | View Answer on Stackoverflow
Solution 10 - C++ | chunyang.wen | View Answer on Stackoverflow
Solution 11 - C++ | johnny | View Answer on Stackoverflow
Solution 12 - C++ | Vivek Patel | View Answer on Stackoverflow