Regular expression for a string literal in flex/lex
C, Regex, Lex, String Literals, Flex Lexer
C Problem Overview
I'm experimenting to learn flex and would like to match string literals. My code currently looks like:
"\""([^\n\"\\]*(\\[.\n])*)*"\"" {/*matches string-literal*/;}
I've been struggling with variations for an hour or so and can't get it working the way it should. I'm essentially hoping to match a string literal that can't contain a newline (unless it's escaped) and that supports escaped characters.
I am probably just writing a poor regular expression or one incompatible with flex. Please advise!
C Solutions
Solution 1 - C
A string consists of a quote mark
"
followed by zero or more of either an escaped anything
\\.
or a character that is neither a quote nor a backslash
[^"\\]
and finally a terminating quote
"
Put it all together, and you've got
\"(\\.|[^"\\])*\"
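The delimiting quotes are escaped because they are Flex meta-characters. Note that in flex . does not match a newline, while a negated class such as [^"\\] does, so this pattern accepts an unescaped newline inside the string but not a backslash followed by a newline.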
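To make the structure of that pattern concrete, here is a minimal hand-written sketch in C that walks the input the same way the regular expression does. The function name match_string_literal is hypothetical (it is not part of flex), and the sketch assumes a NUL-terminated buffer where NUL stands in for end of input.

```c
#include <stddef.h>

/*
 * Hand-written counterpart of the pattern  \"(\\.|[^"\\])*\"
 * Returns the length of the string literal that starts at s[0],
 * or 0 if no complete literal starts there.
 */
size_t match_string_literal(const char *s)
{
    size_t i = 0;

    if (s[i] != '"')                /* opening quote: \" */
        return 0;
    i++;

    for (;;) {
        if (s[i] == '\\') {         /* escaped character: \\. */
            /* flex's '.' matches any character except newline */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else if (s[i] == '"') {   /* closing quote: \" */
            return i + 1;
        } else if (s[i] == '\0') {  /* input ended before the closing quote */
            return 0;
        } else {                    /* ordinary character: [^"\\] */
            i++;
        }
    }
}
```

For example, calling it on the text "a\"b" followed by other input should report a length of 6, because the escaped quote does not end the literal.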
Solution 2 - C
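For a single line, you can use this (the \n inside the negated class is needed because a negated character class in flex otherwise matches a newline):
\"([^\\\"\n]|\\.)*\" {/*matches string-literal on a single line*/;}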
Solution 3 - C
How about using a start state...
%{
int enter_dblquotes = 0;
%}
%x DBLQUOTES
%%
\" { BEGIN(DBLQUOTES); enter_dblquotes++; }
<DBLQUOTES>\" {
    if (enter_dblquotes) {
        handle_this_dblquotes(yytext);
        BEGIN(INITIAL); /* revert back to normal */
        enter_dblquotes--;
    }
}
...more rules follow...
That is the general idea. Flex uses %s or %x to declare start states (%s for inclusive states, %x for exclusive ones). When the scanner sees a quote, it switches to another state and continues lexing until it reaches another quote, at which point it reverts to the normal state.
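The state switching described above can be sketched in plain C as a two-state loop. This is only an illustration of the idea, not what flex generates: the names first_quoted and enum state are hypothetical, and, like the rules in this answer, the sketch does not handle backslash escapes.

```c
#include <stddef.h>

enum state { INITIAL, DBLQUOTES };  /* mirrors flex's INITIAL and %x DBLQUOTES */

/*
 * Copies the body of the first double-quoted string in `input` into `out`
 * (at most out_size - 1 bytes, always NUL-terminated).
 * Returns 1 if a closing quote was found, 0 otherwise.
 */
int first_quoted(const char *input, char *out, size_t out_size)
{
    enum state st = INITIAL;
    size_t n = 0;

    if (out_size == 0)
        return 0;

    for (const char *p = input; *p != '\0'; p++) {
        if (st == INITIAL) {
            if (*p == '"')
                st = DBLQUOTES;     /* like BEGIN(DBLQUOTES) */
        } else if (*p == '"') {
            out[n] = '\0';          /* like BEGIN(INITIAL): string complete */
            return 1;
        } else if (n + 1 < out_size) {
            out[n++] = *p;          /* accumulate the string body */
        }
    }
    out[n] = '\0';
    return 0;                       /* input ended before a closing quote */
}
```

In a real scanner the accumulation step would typically be an action on a rule inside the DBLQUOTES state that appends yytext to a buffer.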
Solution 4 - C
Here is my code snippet for handling strings in flex; I hope it gives you some ideas.
Using a start condition to handle string literals is more scalable and clearer.
%x SINGLE_STRING
%%
\" BEGIN(SINGLE_STRING);
<SINGLE_STRING>{
\n yyerror("the string misses \" to terminate before newline");
<<EOF>> { yyerror("the string misses \" to terminate before EOF"); yyterminate(); }
([^\\\"\n]|\\.)* {/* do your work here, e.g. save the text; \n is excluded so the newline rule above can fire */}
\" BEGIN(INITIAL);
. ;
}
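As a rough illustration of what the rules inside that start condition decide, the following C sketch scans a string body and reports which case ended it. The names scan_string_body and enum str_status are hypothetical, and the sketch assumes a NUL-terminated buffer where NUL plays the role of EOF and a newline inside the body is always treated as an error.

```c
#include <stddef.h>

enum str_status { STR_OK, STR_ERR_NEWLINE, STR_ERR_EOF };

/*
 * Scans a string body, starting just after the opening quote.
 * On STR_OK, *len is the number of body bytes before the closing quote.
 */
enum str_status scan_string_body(const char *s, size_t *len)
{
    size_t i = 0;

    for (;;) {
        switch (s[i]) {
        case '"':                   /* closing quote: back to INITIAL */
            *len = i;
            return STR_OK;
        case '\n':                  /* newline before the closing quote */
            return STR_ERR_NEWLINE;
        case '\0':                  /* input ended: the <<EOF>> case */
            return STR_ERR_EOF;
        case '\\':                  /* escape: backslash plus one character */
            if (s[i + 1] == '\0')
                return STR_ERR_EOF;
            if (s[i + 1] == '\n')   /* \\. does not match backslash-newline */
                return STR_ERR_NEWLINE;
            i += 2;
            break;
        default:                    /* ordinary body character */
            i++;
        }
    }
}
```

Returning a status instead of calling yyerror directly keeps the sketch self-contained; a caller could map each status to the corresponding error message.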
Solution 5 - C
This is what we use in Zolang for single-line string literals with embedded templates ${...}:
\"(\$\{.*\}|\\.|[^\"\\])*\"
Solution 6 - C
A late answer, but it may be useful to the next person who needs it:
\"(([^\"]|\\\")*[^\\])?\"