Why doesn't the compiler report a missing semicolon?

C Problem Overview


I have this simple program:

#include <stdio.h>

struct S
{
    int i;
};

void swap(struct S *a, struct S *b)
{
    struct S temp;
    temp = *a    /* Oops, missing a semicolon here... */
    *a = *b;
    *b = temp;
}

int main(void)
{
    struct S a = { 1 };
    struct S b = { 2 };

    swap(&a, &b);
}

As seen on e.g. ideone.com this gives an error:

> prog.c: In function 'swap':
> prog.c:12:5: error: invalid operands to binary * (have 'struct S' and 'struct S *')
>      *a = *b;
>      ^

Why doesn't the compiler detect the missing semicolon?


Note: This question and its answer are motivated by this question. While there are other questions similar to this one, I didn't find anything mentioning the free-form nature of the C language, which is what causes this and related errors.

C Solutions


Solution 1 - C

C is a free-form language. That means you can lay out a program in many ways and it will still be a legal program.

For example a statement like

a = b * c;

could be written like

a=b*c;

or like

a
=
b
*
c
;

So when the compiler sees the lines

temp = *a
*a = *b;

it thinks it means

temp = *a * a = *b;

That is of course not a valid expression, and the compiler complains about that instead of the missing semicolon. It's not valid because a is a pointer to a structure, so *a * a tries to multiply a structure instance (*a) by a pointer to a structure (a).
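To see the free-form rule at work without the type error getting in the way, here is a minimal sketch using plain int values. The function names are made up for illustration. All three functions contain the same expression and differ only in whitespace, so each should return the same product.

```c
/* Hypothetical helpers: the same multiplication laid out three ways. */

int product_one_line(const int *a, int b)
{
    return *a * b;
}

int product_two_lines(const int *a, int b)
{
    /* The line break is ordinary whitespace, so the * that starts the
       second line is still the binary multiplication operator. */
    return *a
    * b;
}

int product_many_lines(const int *a, int b)
{
    return
    *
    a
    *
    b
    ;
}
```

Calling any of them with a pointer to 6 and the value 7 should give 42. The two-line version has the same layout as the code in the question; the only difference is that here the operand types happen to allow multiplication.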

Not only does the compiler fail to detect the missing semicolon, it also reports a seemingly unrelated error on the wrong line. This is important to notice, because no matter how long you look at the line where the error is reported, there is no error there. Problems like this sometimes require you to look at the preceding lines to check that they are okay.

Sometimes you even have to look in another file to find the error. For example, if the last thing a header file does is define a structure, and the semicolon terminating the structure is missing, then the error will be reported not in the header file but in the file that includes it.

And sometimes it gets even worse: if you include two (or more) header files, and the first one contains an incomplete declaration, most probably the syntax error will be indicated in the second header file.
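The header-file case can be sketched in a single listing. The file names shape.h and main.c, and the names shape and area, are assumptions made up for this illustration. The code below is the correct version and compiles as written; the comments describe what would probably happen if the marked semicolon were dropped. The exact diagnostic depends on the compiler, and some compilers do manage to suggest the missing semicolon.

```c
/* ---- shape.h (hypothetical header): a struct definition is the last thing in it ---- */
struct shape
{
    int width;
    int height;
};  /* <- if this semicolon were missing, nothing inside shape.h would look wrong yet */

/* ---- main.c (hypothetical): starts with #include "shape.h", then continues below ---- */

/* With the semicolon missing, the compiler would keep reading and see
   "struct shape { ... } int area(...)", so the diagnostic would most
   likely point at the following line, which lives in this file. */
int area(const struct shape *s)
{
    return s->width * s->height;
}
```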


Related to this is the concept of follow-up errors. Some errors, typically due to missing semicolons actually, are reported as multiple errors. This is why it's important to start from the top when fixing errors, as fixing the first error might make multiple errors disappear.

This of course can lead to fixing one error at a time and frequent recompiles which can be cumbersome with large projects. Recognizing such follow-up errors is something that comes with experience though, and after seeing them a few times it's easier to dig out the real errors and fix more than one error per recompile.

Solution 2 - C

> Why doesn't the compiler detect the missing semicolon?

There are three things to remember.

  1. Line endings in C are just ordinary whitespace.
  2. * in C can be both a unary and a binary operator. As a unary operator it means "dereference", as a binary operator it means "multiply".
  3. The difference between unary and binary operators is determined from the context in which they are seen.

To see the result of these three facts, consider what happens when we parse:

 temp = *a    /* Oops, missing a semicolon here... */
 *a = *b;

The first and last * are interpreted as unary but the second * is interpreted as binary. From a syntax perspective, this looks OK.

It is only after parsing, when the compiler tries to interpret the operators in the context of their operand types, that an error is seen.
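As a small illustration of how context decides between the two meanings, here is a sketch with int operands, where the same three-star pattern is perfectly valid. The function name is hypothetical, and the comments give an informal rule of thumb, not a description of how any particular compiler is implemented.

```c
/* Hypothetical helper: three * tokens, two different operators. */
int deref_times_deref(const int *a, const int *b)
{
    /* Token by token:
       first  * : starts the expression, so it is unary (dereference a)
       second * : follows an operand, so it is binary (multiply)
       third  * : follows an operator, so it is unary again (dereference b) */
    return *a * *b;
}
```

With pointers to 5 and 9 this should return 45. Replace the int operands with a struct and a struct pointer, as in the question, and the parse stays the same; only the later type check fails.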

Solution 3 - C

Some good answers above, but I will elaborate.

temp = *a *a = *b;

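This has the shape of x = y = z; where both x and y are assigned the value of z.

Here y would be *a * a. Unary * binds more tightly than binary *, so that is (*a) * a: the value a points to, multiplied by a itself. What you are saying is that this product becomes equal to the contents of b, as does temp.

In short, *a *a = <any integer value> looks like an assignment to the parser. As previously pointed out, the first * dereferences a pointer, while the second multiplies two values. A product is not an lvalue, though, so even with integer operands the compiler would reject the assignment once it examines the operands.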
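For comparison, here is a minimal sketch of a chained assignment that does compile. The function name is made up for illustration. Note that each left-hand side is a plain dereference, which is an lvalue; a product on the left-hand side would not be accepted.

```c
/* Hypothetical helper showing a well-formed chained assignment. */
int chain_assign(int *x, int *y, int z)
{
    /* Assignment groups right to left: *x = (*y = z). */
    *x = *y = z;

    /* By contrast, something like
           *x * 2 = z;
       is rejected: the product on the left is a value, not an lvalue. */
    return *x + *y;
}
```

After calling it with z equal to 7, both pointed-to variables should hold 7 and the function should return 14.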

Solution 4 - C

Most compilers parse source files in order, and report the line where they discover that something was wrong. The first 12 lines of your C program could be the start of a valid (error-free) C program. The first 13 lines of your program cannot. Some compilers will note the location of things they encounter which are not errors in and of themselves, and in most cases won't trigger errors later in the code, but might not be valid in combination with something else. For example:

int foo;
...
float foo;

The declaration int foo; by itself would be perfectly fine. Likewise the declaration float foo;. Some compilers may record the line number where the first declaration appeared, and associate an informational message with that line, to help the programmer identify cases where the earlier definition is actually the erroneous one. Compilers may also keep the line numbers associated with something like a do, which can be reported if the associated while does not appear in the right place. For cases where the likely location of the problem would be immediately preceding the line where the error is discovered, however, compilers generally don't bother adding an extra report for the position.

Solution 5 - C

There's a Polish movie titled "Nic Śmiesznego" ("Nothing Funny"). Here's an excerpt of relevant dialogue from a scene that shows exactly why the compiler developers may be a bit shy to proclaim such missing semicolons with reckless abandon.

Director: What do you mean "this one"?! Are you saying that this object is in my field of view? Point it out with your finger, because I want to believe I'm dreaming.

Adam: This, right here (points).

Director: This? What is this?!

Adam: What do you mean? It's a forest.

Director: Can you tell me why the bloody hell would I need a forest?

Adam: How come "bloody hell"? Here, in the screenplay, it says a forest, it says...

Director: In the screenplay? Find it in this screenplay for me.

Adam: Here: (reads) "When they came upon the crest of the road, in front of them appeared a forest"

Director: Flip the page.

Adam: Oh crap...

Director: Read it for me.

Adam: in front of them appeared a forest... of headstones.

See, it's not generally possible to tell in advance that you really meant a forest and not a forest of headstones.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author                 Original Content on Stackoverflow
Question          Some programmer dude            View Question on Stackoverflow
Solution 1 - C    Some programmer dude            View Answer on Stackoverflow
Solution 2 - C    plugwash                        View Answer on Stackoverflow
Solution 3 - C    Mawg says reinstate Monica      View Answer on Stackoverflow
Solution 4 - C    supercat                        View Answer on Stackoverflow
Solution 5 - C    Kuba hasn't forgotten Monica    View Answer on Stackoverflow