Why can't dead code detection be fully solved by a compiler?

Compiler Theory

Compiler Theory Problem Overview


The compilers I've been using in C or Java have dead code prevention (warning when a line won't ever be executed). My professor says that this problem can never be fully solved by compilers though. I was wondering why that is. I am not too familiar with the actual coding of compilers as this is a theory-based class. But I was wondering what they check (such as possible input strings vs acceptable inputs, etc.), and why that is insufficient.

Compiler Theory Solutions


Solution 1 - Compiler Theory

The dead code problem is related to the https://en.wikipedia.org/wiki/Halting_problem">Halting problem.

Alan Turing proved that it is impossible to write a general algorithm that will be given a program and be able to decide whether that program halts for all inputs. You may be able to write such an algorithm for specific types of programs, but not for all programs.

How does this relate to dead code?

The Halting problem is reducible to the problem of finding dead code. That is, if you find an algorithm that can detect dead code in any program, then you can use that algorithm to test whether any program will halt. Since that has been proven to be impossible, it follows that writing an algorithm for dead code is impossible as well.

How do you transfer an algorithm for dead code into an algorithm for the Halting problem?

Simple: you add a line of code after the end of the program you want to check for halt. If your dead-code detector detects that this line is dead, then you know that the program does not halt. If it doesn't, then you know that your program halts (gets to the last line, and then to your added line of code).


Compilers usually check for things that can be proven at compile-time to be dead. For example, blocks that are dependent on conditions that can be determined to be false at compile time. Or any statement after a return (within the same scope).

These are specific cases, and therefore it's possible to write an algorithm for them. It may be possible to write algorithms for more complicated cases (like an algorithm that checks whether a condition is syntactically a contradiction and therefore will always return false), but still, that wouldn't cover all possible cases.

Solution 2 - Compiler Theory

Well, let's take the classical proof of the undecidability of the halting problem and change the halting-detector to a dead-code detector!

C# program

using System;
using YourVendor.Compiler;

class Program
{
    static void Main(string[] args)
    {
        string quine_text = @"using System;
using YourVendor.Compiler;
            
class Program
{{
    static void Main(string[] args)
    {{
        string quine_text = @{0}{1}{0};
        quine_text = string.Format(quine_text, (char)34, quine_text);
            
        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {{
            System.Console.WriteLine({0}Dead code!{0});
        }}
    }}
}}";
        quine_text = string.Format(quine_text, (char)34, quine_text);

        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {
            System.Console.WriteLine("Dead code!");
        }
    }
}

If YourVendor.Compiler.HasDeadCode(quine_text) returns false, then the line System.Console.WriteLn("Dead code!"); won't be ever executed, so this program actually does have dead code, and the detector was wrong.

But if it returns true, then the line System.Console.WriteLn("Dead code!"); will be executed, and since there is no more code in the program, there is no dead code at all, so again, the detector was wrong.

So there you have it, a dead-code detector that returns only "There is dead code" or "There is no dead code" must sometimes yield wrong answers.

Solution 3 - Compiler Theory

If the halting problem is too obscure, think of it this way.

Take a mathematical problem that is believed to be true for all positive integer's n, but hasn't been proven to be true for every n. A good example would be Goldbach's conjecture, that any positive even integer greater than two can be represented by the sum of two primes. Then (with an appropriate bigint library) run this program (pseudocode follows):

 for (BigInt n = 4; ; n+=2) {
     if (!isGoldbachsConjectureTrueFor(n)) {
         print("Conjecture is false for at least one value of n\n");
         exit(0);
     }
 }

Implementation of isGoldbachsConjectureTrueFor() is left as an exercise for the reader but for this purpose could be a simple iteration over all primes less than n

Now, logically the above must either be the equivalent of:

 for (; ;) {
 }

(i.e. an infinite loop) or

print("Conjecture is false for at least one value of n\n");

as Goldbach's conjecture must either be true or not true. If a compiler could always eliminate dead code, there would definitely be dead code to eliminate here in either case. However, in doing so at the very least your compiler would need to solve arbitrarily hard problems. We could provide problems provably hard that it would have to solve (e.g. NP-complete problems) to determine which bit of code to eliminate. For instance if we take this program:

 String target = "f3c5ac5a63d50099f3b5147cabbbd81e89211513a92e3dcd2565d8c7d302ba9c";
 for (BigInt n = 0; n < 2**2048; n++) {
     String s = n.toString();
     if (sha256(s).equals(target)) {
         print("Found SHA value\n");
         exit(0);
     }
 }
 print("Not found SHA value\n");

we know that the program will either print out "Found SHA value" or "Not found SHA value" (bonus points if you can tell me which one is true). However, for a compiler to be able to reasonably optimise that would take of the order of 2^2048 iterations. It would in fact be a great optimisation as I predict the above program would (or might) run until the heat death of the universe rather than printing anything without optimisation.

Solution 4 - Compiler Theory

I don't know if C++ or Java have an Eval type function, but many languages do allow you do call methods by name. Consider the following (contrived) VBA example.

Dim methodName As String

If foo Then
    methodName = "Bar"
Else
    methodName = "Qux"
End If

Application.Run(methodName)

The name of the method to be called is impossible to know until runtime. Therefore, by definition, the compiler cannot know with absolute certainty that a particular method is never called.

Actually, given the example of calling a method by name, the branching logic isn't even necessary. Simply saying

Application.Run("Bar")

Is more than the compiler can determine. When the code is compiled, all the compiler knows is that a certain string value is being passed to that method. It doesn't check to see if that method exists until runtime. If the method isn't called elsewhere, through more normal methods, an attempt to find dead methods can return false positives. The same issue exists in any language that allows code to be called via reflection.

Solution 5 - Compiler Theory

A simple example:

int readValueFromPort(const unsigned int portNum);

int x = readValueFromPort(0x100); // just an example, nothing meaningful
if (x < 2)
{
    std::cout << "Hey! X < 2" << std::endl;
}
else
{
    std::cout << "X is too big!" << std::endl;
}

Now assume that the port 0x100 is designed to return only 0 or 1. In that case the compiler cannot figure out that the else block will never be executed.

However in this basic example:

bool boolVal = /*anything boolean*/;

if (boolVal)
{
  // Do A
}
else if (!boolVal)
{
  // Do B
}
else
{
  // Do C
}

Here the compiler can calculate out the the else block is a dead code. So the compiler can warn about the dead code only if it has enough data to to figure out the dead code and also it should know how to apply that data in order to figure out if the given block is a dead code.

EDIT

Sometimes the data is just not available at the compilation time:

// File a.cpp
bool boolMethod();

bool boolVal = boolMethod();

if (boolVal)
{
  // Do A
}
else
{
  // Do B
}

//............
// File b.cpp
bool boolMethod()
{
    return true;
}

While compiling a.cpp the compiler cannot know that boolMethod always returns true.

Solution 6 - Compiler Theory

Unconditional dead code can be detected and removed by advanced compilers.

But there is also conditional dead code. That is code that cannot be known at the time of compilation and can only be detected during runtime. For example, a software may be configurable to include or exclude certain features depending on user preference, making certain sections of code seemingly dead in particular scenarios. That is not be real dead code.

There are specific tools that can do testing, resolve dependencies, remove conditional dead code and recombine the useful code at runtime for efficiency. This is called dynamic dead code elimination. But as you can see it is beyond the scope of compilers.

Solution 7 - Compiler Theory

The compiler will always lack some context information. E.g. you might know, that a double value never exeeds 2, because that is a feature of the mathematical function, you use from a library. The compiler does not even see the code in the library, and it can never know all features of all mathematical functions, and detect all weired and complicated ways to implement them.

Solution 8 - Compiler Theory

The compiler doesn't necessarily see the whole program. I could have a program that calls a shared library, which calls back into a function in my program which isn't called directly.

So a function which is dead with respect to the library it's compiled against could become alive if that library was changed at runtime.

Solution 9 - Compiler Theory

If a compiler could eliminate all dead code accurately, it would be called an interpreter.

Consider this simple scenario:

if (my_func()) {
  am_i_dead();
}

my_func() can contain arbitrary code and in order for the compiler to determine whether it returns true or false, it will either have to run the code or do something that is functionally equivalent to running the code.

The idea of a compiler is that it only performs a partial analysis of the code, thus simplifying the job of a separate running environment. If you perform a full analysis, that isn't a compiler any more.


If you consider the compiler as a function c(), where c(source)=compiled code, and the running environment as r(), where r(compiled code)=program output, then to determine the output for any source code you have to compute the value of r(c(source code)). If calculating c() requires the knowledge of the value of r(c()) for any input, there is no need for a separate r() and c(): you can just derive a function i() from c() such that i(source)=program output.

Solution 10 - Compiler Theory

Take a function

void DoSomeAction(int actnumber) 
{
    switch(actnumber) 
    {
        case 1: Action1(); break;
        case 2: Action2(); break;
        case 3: Action3(); break;
    }
}

Can you prove that actnumber will never be 2 so that Action2() is never called...?

Solution 11 - Compiler Theory

Others have commented on the halting problem and so forth. These generally apply to portions of functions. However it can be hard/impossible to know whether even an entire type (class/etc) is used or not.

In .NET/Java/JavaScript and other runtime driven environments there's nothing stopping types being loaded via reflection. This is popular with dependency injection frameworks, and is even harder to reason about in the face of deserialisation or dynamic module loading.

The compiler cannot know whether such types would be loaded. Their names could come from external config files at runtime.

You might like to search around for tree shaking which is a common term for tools that attempt to safely remove unused subgraphs of code.

Solution 12 - Compiler Theory

I disagree about the halting problem. I wouldn't call such code dead even though in reality it will never be reached.

Instead, lets consider:

for (int N = 3;;N++)
  for (int A = 2; A < int.MaxValue; A++)
    for (int B = 2; B < int.MaxValue; B++)
    {
      int Square = Math.Pow(A, N) + Math.Pow(B, N);
      float Test = Math.Sqrt(Square);
      if (Test == Math.Trunc(Test))
        FermatWasWrong();
    }

private void FermatWasWrong()
{
  Press.Announce("Fermat was wrong!");
  Nobel.Claim();
}

(Ignore the type and overflow errors) Dead code?

Solution 13 - Compiler Theory

Look at this example:

public boolean isEven(int i){

    if(i % 2 == 0)
        return true;
    if(i % 2 == 1)
        return false;
    return false;
}

The compiler can't know that an int can only be even or odd. Therefore the compiler must be able to understand the semantics of your code. How should this be implemented? The compiler can't ensure that the lowest return will never be executed. Therefore the compiler can't detect the dead code.

Categories

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionLearnerView Question on Stackoverflow
Solution 1 - Compiler TheoryRealSkepticView Answer on Stackoverflow
Solution 2 - Compiler TheoryJoker_vDView Answer on Stackoverflow
Solution 3 - Compiler TheoryablighView Answer on Stackoverflow
Solution 4 - Compiler TheoryRubberDuckView Answer on Stackoverflow
Solution 5 - Compiler TheoryAlex Lop.View Answer on Stackoverflow
Solution 6 - Compiler TheorydspfnderView Answer on Stackoverflow
Solution 7 - Compiler TheorycdonatView Answer on Stackoverflow
Solution 8 - Compiler TheorySteve SanbegView Answer on Stackoverflow
Solution 9 - Compiler TheorybiziclopView Answer on Stackoverflow
Solution 10 - Compiler TheoryCiaPanView Answer on Stackoverflow
Solution 11 - Compiler TheoryDrew NoakesView Answer on Stackoverflow
Solution 12 - Compiler TheoryLoren PechtelView Answer on Stackoverflow
Solution 13 - Compiler TheoryuserView Answer on Stackoverflow