Regular expression for a string literal in flex/lex


C Problem Overview


I'm experimenting to learn flex and would like to match string literals. My code currently looks like:

"\""([^\n\"\\]*(\\[.\n])*)*"\""        {/*matches string-literal*/;}

I've been struggling with variations for an hour or so and can't get it working the way it should. I'm essentially hoping to match a string literal that can't contain a new-line (unless it's escaped) and supports escaped characters.

I am probably just writing a poor regular expression or one incompatible with flex. Please advise!

C Solutions


Solution 1 - C

A string consists of a quote mark

"

followed by zero or more of either an escaped anything

\\.

or a non-quote, non-backslash character

[^"\\]

and finally a terminating quote

"

Put it all together, and you've got

\"(\\.|[^"\\])*\"

The delimiting quotes are escaped because they are Flex meta-characters.
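To make the structure of that pattern concrete, here is a minimal hand-written C sketch that accepts the same inputs as `\"(\\.|[^"\\])*\"`. The function name `match_string_literal` is hypothetical and not part of flex; the sketch also assumes a NUL-terminated input buffer. Note that in flex `.` does not match a newline, so a backslash followed by a newline is rejected, while an unescaped newline is accepted by `[^"\\]`.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the pattern \"(\\.|[^"\\])*\"
 * Returns the length of the string literal that starts at s
 * (including both quotes), or 0 if no literal starts there. */
size_t match_string_literal(const char *s)
{
    size_t i = 0;

    if (s[i] != '"')
        return 0;                 /* must start with a quote */
    i++;

    for (;;) {
        char c = s[i];

        if (c == '\0')
            return 0;             /* ran out of input: unterminated */
        if (c == '"')
            return i + 1;         /* terminating quote */
        if (c == '\\') {
            /* \\. : a backslash plus any character except newline */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else {
            i++;                  /* [^"\\] : any other character */
        }
    }
}
```

For example, on the input `"a\"b" rest` the function returns 6, the length of the literal up to and including the closing quote.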

Solution 2 - C

For a single line... you can use this:

\"([^\\\"]|\\.)*\"  {/*matches string-literal on a single line*/;}

Solution 3 - C

How about using a start state...

%{
int enter_dblquotes = 0;
%}

%x DBLQUOTES

%%

\"  { BEGIN(DBLQUOTES); enter_dblquotes++; }

<DBLQUOTES>\"  {
    if (enter_dblquotes) {
        handle_this_dblquotes(yytext);
        BEGIN(INITIAL);  /* revert back to normal */
        enter_dblquotes--;
    }
}
...more rules follow...

Something to that effect. Flex uses %s or %x to declare start states (inclusive and exclusive, respectively). When the scanner sees a quote, it switches to another state, then continues lexing until it reaches another quote, at which point it reverts to the normal state.
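The state switching described above can be sketched in plain C as a two-state loop. This is only an illustration of the idea, not flex's generated code: the names `scan_state` and `count_quoted_strings` are hypothetical, and the handling of backslash escapes inside the quoted state is an added assumption that the flex rules above do not include.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the INITIAL and DBLQUOTES start states. */
enum scan_state { STATE_INITIAL, STATE_DBLQUOTES };

/* Counts the complete double-quoted strings in a NUL-terminated input. */
size_t count_quoted_strings(const char *input)
{
    enum scan_state state = STATE_INITIAL;
    size_t count = 0;

    for (const char *p = input; *p != '\0'; p++) {
        switch (state) {
        case STATE_INITIAL:
            if (*p == '"')
                state = STATE_DBLQUOTES;   /* like BEGIN(DBLQUOTES) */
            break;
        case STATE_DBLQUOTES:
            if (*p == '\\' && p[1] != '\0') {
                p++;                       /* assumption: skip the escaped character */
            } else if (*p == '"') {
                count++;
                state = STATE_INITIAL;     /* like BEGIN(INITIAL) */
            }
            break;
        }
    }
    return count;
}
```

A string that is still open when the input ends is not counted, which corresponds to the scanner never leaving the quoted state.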

Solution 4 - C

Here is my code snippet for handling strings in flex; I hope it gives you some ideas.

Using a start condition to handle string literals is more scalable and clearer.

%x SINGLE_STRING

%%

\"                          BEGIN(SINGLE_STRING);
<SINGLE_STRING>{
  \n                        yyerror("the string misses \" to terminate before newline");
  <<EOF>>                   yyerror("the string misses \" to terminate before EOF");
  ([^\\\"]|\\.)*            {/* do your work like save in here */}
  \"                        BEGIN(INITIAL);
  .                         ;
}
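As one possible way to fill in the "do your work" action, the sketch below copies a matched chunk into a buffer while translating a few escapes. The helper name `unescape_chunk` and the set of escapes it recognises are assumptions made for illustration; the rules above do not specify how the text should be stored.

```c
#include <stddef.h>

/* Hypothetical helper: copies src into dst, translating \n and \t and
 * dropping the backslash in front of any other escaped character.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL. */
size_t unescape_chunk(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (*src == '\\' && src[1] != '\0') {
            src++;                          /* step over the backslash */
            switch (*src) {
            case 'n': dst[n++] = '\n'; break;
            case 't': dst[n++] = '\t'; break;
            default:  dst[n++] = *src; break;  /* \" and \\ keep the character */
            }
            src++;
        } else {
            dst[n++] = *src++;              /* ordinary character */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Inside the flex action it could be called with `yytext` as `src`, appending the result to whatever buffer the scanner uses to accumulate the string.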

Solution 5 - C

This is what we use in Zolang for single-line string literals with embedded templates ${...}:

\"(\$\{.*\}|\\.|[^\"\\])*\"

Solution 6 - C

A late answer, but it may be useful to the next person who needs it:

\"(([^\"]|\\\")*[^\\])?\"

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Thomas | View Question on Stackoverflow
Solution 1 - C | Jonathan Feinberg | View Answer on Stackoverflow
Solution 2 - C | Pete | View Answer on Stackoverflow
Solution 3 - C | t0mm13b | View Answer on Stackoverflow
Solution 4 - C | pwxcoo | View Answer on Stackoverflow
Solution 5 - C | Þorvaldur Rúnarsson | View Answer on Stackoverflow
Solution 6 - C | david | View Answer on Stackoverflow