Antonin Carette

A journey into a wild pointer

The weird unreachable code

Posted at — Feb 12, 2023

Last week I heard about a weird behaviour of a C++ program on Twitter.
The code is:

#include <iostream>

int main() {
    while (1);
}

void unreachable() {
    std::cout << "wait... WHAT?!" << std::endl;
}

Pretty simple, right? Two functions, one infinite loop, and the unreachable function that is never called…
As you might expect, the program will run until the user forces it to quit, without printing anything to standard output. No catch at all.

However, under certain conditions, it can also finish in less than a millisecond and print

wait... WHAT?!

I reproduced it on godbolt here for you if you do not believe me: https://godbolt.org/z/MYhb9Ezv1.

Now, let me explain what is really happening under the hood.

The singular points of the program

To explain the behaviour, let’s cut out the different parts of the program.

The main function

int main() {
    while (1);
}

The main function runs an infinite loop that does absolutely nothing, with no instruction to break out of it. The objective of the main function is very simple: loop forever and never stop.
Also, there is no explicit return statement, but in C++ reaching the end of main is equivalent to return 0, so the compiler implicitly appends one at the end of the code block.

So, the final code, as understood by the compiler, is:

int main() {
    while (1) {};
    return 0;
}

The unreachable function

The unreachable function is also very simple: print the message “wait… WHAT?!” and return. The important thing to notice here is that the function is defined after the main function, which is essential to understanding the behaviour of the final execution of the program.

The compiler settings

Here, we are using clang13.0 (c++ frontend), with the following arguments:

Let’s play

The main issue here does not come from the program, or from the developer. Actually, it comes from the compiler.

On godbolt, looking at the code generated by clang, the issue shows up in the first lines:

main:                                   # @main
unreachable():                          # @unreachable()
        push    rbx
        mov     edi, offset std::cout
        mov     esi, offset .L.str
        ...

As you can see, the @main label is empty here: main contains no instruction at all, not even a ret. Execution therefore falls through into the instructions under the next label… which is @unreachable.

Let’s imagine what the compiler’s reasoning might be here…

  1. I am executing the main function…
  2. Hey, I don’t need to return anything from the main function, as the loop is running forever! Ok, let’s remove it…
  3. Hey, the loop is doing nothing, and the dev asks me to optimize the code… let’s remove the loop then…

And then, we have a main function that does not contain any return statement, or any other instruction.
Just an empty function.

This behaviour is quite hard to understand at first, as developers may want to say: “but this is the same as any other void function, right?”.
Absolutely… not.

If you take a look at the generated asm code from the following C++ function

void do_nothing() {
    return;
}

it does contain a return instruction:

do_nothing():                        # @do_nothing()
        ret                                  # <<<--- INSTRUCTION TO EXIT do_nothing()

So, the following program, using clang13.0 and -O1 optimization, does not execute the loop, but does not print anything either:

#include <iostream>

int main() {
    while (1) {};
    return 0;
}

void do_nothing() {
    return;
}

void unreachable() {
    std::cout << "wait... WHAT?!" << std::endl;
}

as the generated asm code proves:

main:                                   # @main
do_nothing():                           # @do_nothing()
        ret                                     # <<<--- QUIT HERE
unreachable():                          # @unreachable() - NOT EXECUTED
        push    rbx
        mov     edi, offset std::cout
        mov     esi, offset .L.str
        ...

The strange case of an infinite loop

So, why does this occur with an infinite loop?

Because an infinite loop with no side effects is simply… undefined behavior.

The C standard (C11 to be more precise) says this:

An iteration statement whose controlling expression is not a constant expression, that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate… This is intended to allow compiler transformations such as removal of empty loops even when termination cannot be proven.

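In C, while(1) has a constant controlling expression, so this clause does not apply: a C compiler may not assume that the loop terminates. Our program is C++, though, and the C++ standard (since C++11) has a similar forward-progress rule without the constant-expression exception: the implementation may assume that every thread eventually terminates, performs input/output, accesses a volatile object, or performs a synchronization or atomic operation. A while(1) with an empty body does none of these, so the behaviour is undefined and the compiler may (or may not) decide to remove the loop from the code… and clang does.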
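The quoted clause lists what protects a loop from this assumption: input/output, volatile accesses, synchronization or atomic operations. As a minimal sketch of the atomic case (wait_until_ready is a hypothetical helper, not part of the original program), the following loop performs an atomic load on every iteration, so it stays well-defined even if the flag never becomes true:

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical helper, for illustration only: spins until `ready` becomes
// true and returns how many times the flag had to be re-checked.
// Each iteration performs an atomic load, which is one of the operations
// the standard lists, so a never-ending wait here is not undefined behaviour.
std::size_t wait_until_ready(const std::atomic<bool>& ready) {
    std::size_t spins = 0;
    while (!ready.load(std::memory_order_acquire)) {
        ++spins;
    }
    return spins;
}
```

If the flag is already set when the function is called, the loop body never runs and the function returns 0.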

This behaviour was implemented in clang 13.0, and is still in use today in clang 15.0.

A fix that clang could implement for this case would be:

  1. emit a jump back to a label (like loop: jmp loop) to keep the infinite loop instead of removing it,
  2. or remove the infinite loop, but still emit the instructions for return 0 in main.
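On the developer side, one way to get the effect of the first option is to give the loop an observable side effect. The sketch below is only an illustration under that assumption: park_while is a hypothetical helper, not something from the original program, and it relies on the fact that a read of a volatile object cannot be optimized away.

```cpp
#include <cstddef>

// Hypothetical helper, for illustration only: loops for as long as the
// volatile flag reads true, and returns the number of iterations executed.
// The controlling expression reads a volatile object on every iteration,
// so the compiler has to keep the loop in the generated code.
std::size_t park_while(const volatile bool& keep_going) {
    std::size_t iterations = 0;
    while (keep_going) {
        ++iterations;
    }
    return iterations;
}
```

Calling park_while on a volatile bool that stays true should then hang, as the original main was expected to, while a flag that is already false makes it return 0 immediately.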