Technically, how do variadic functions work? How does printf work?

C++, C, Variadic Functions

C++ Problem Overview


I know I can use va_arg to write my own variadic functions, but how do variadic functions work under the hood, i.e. on the assembly instruction level?

E.g., how is it possible that printf takes a variable number of arguments?


  • No rule without exception. There is no such language as "C/C++"; however, this question can be answered for both C and C++.

C++ Solutions


Solution 1 - C++

Neither the C nor the C++ standard imposes any requirement on how this has to work. A conforming compiler may well decide to emit linked lists, std::stack<boost::any> or even magical pony dust (as per @Xeo's comment) under the hood.

However, it is usually implemented as described below, although optimizations such as inlining or passing arguments in CPU registers may leave little or nothing of the code discussed here.

Please also note that this answer assumes a downward-growing stack in the visuals below, and that it is a simplification meant only to demonstrate the scheme (see https://en.wikipedia.org/wiki/Stack_frame).

How can a function be called with a non-fixed number of arguments

This is possible because the underlying machine architecture has a so-called "stack" for every thread. The stack is used to pass arguments to functions. For example, when you have:

foobar("%d%d%d", 3,2,1);

Then this compiles to assembly code like the following (a schematic example; actual code may look different). Note that the arguments are pushed from right to left:

push 1
push 2
push 3
push "%d%d%d"
call foobar

Those push-operations fill up the stack:

              []   // empty stack
-------------------------------
push 1:       [1]  
-------------------------------
push 2:       [1]
              [2]
-------------------------------
push 3:       [1]
              [2]
              [3]  // there is now 1, 2, 3 in the stack
-------------------------------
push "%d%d%d":[1]
              [2]
              [3]
              ["%d%d%d"]
-------------------------------
call foobar   ...  // foobar uses the same stack!

Because the stack grows downwards in this diagram, the bottom element (the one pushed last) is called the "Top of Stack", often abbreviated "TOS".

The foobar function now accesses the stack, beginning at the TOS, i.e. the format string, which as you remember was pushed last. Imagine stack is your stack pointer, stack[0] is the value at the TOS, stack[1] is one above the TOS, and so forth:

format_string <- stack[0]

... and then parses the format string. While parsing, it recognizes the %d tokens and, for each one, loads one more value from the stack:

format_string <- stack[0]
offset <- 1
while (parsing):
    token = tokenize_one_more(format_string)
    if (needs_integer (token)):
        value <- stack[offset]
        offset = offset + 1
    ...

This is of course very incomplete pseudo-code, but it demonstrates how the function has to rely on the arguments passed to find out how much it has to load and remove from the stack.
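The scheme above can be imitated in plain C as a toy model. This is only a sketch: sim_stack, sp, push, foobar and call_foobar_demo are hypothetical names, the "stack" is an ordinary array, and real calling conventions also place a return address and possibly registers between caller and callee.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy model: a "stack" of machine-word slots that grows downward. */
enum { SLOTS = 16 };
static intptr_t sim_stack[SLOTS];
static size_t sp = SLOTS;              /* index of the TOS; SLOTS means empty */

static void push(intptr_t word)
{
    assert(sp > 0);                    /* running out of slots = stack overflow */
    sim_stack[--sp] = word;
}

/* The callee: reads the format string at the TOS, then one more slot per
   "%d". Returns the number of slots it consumed. */
static size_t foobar(char *out, size_t out_size)
{
    assert(out_size > 0 && sp < SLOTS);
    const intptr_t *stack = &sim_stack[sp];              /* stack[0] is the TOS */
    const char *format_string = (const char *)stack[0];
    size_t offset = 1;
    size_t used = 0;

    for (const char *p = format_string; *p != '\0' && used + 1 < out_size; ++p) {
        if (p[0] == '%' && p[1] == 'd') {
            /* A real callee has no such check: it simply trusts the format. */
            assert(sp + offset < SLOTS);
            int n = snprintf(out + used, out_size - used, "%d", (int)stack[offset++]);
            if (n < 0 || (size_t)n >= out_size - used)
                break;                                   /* output buffer is full */
            used += (size_t)n;
            ++p;                                         /* skip the 'd' */
        } else {
            out[used++] = *p;
        }
    }
    out[used] = '\0';
    return offset;
}

/* The caller: models foobar("%d%d%d", 3, 2, 1). */
static size_t call_foobar_demo(char *out, size_t out_size)
{
    push(1);                           /* arguments go on right to left */
    push(2);
    push(3);
    push((intptr_t)"%d%d%d");
    size_t consumed = foobar(out, out_size);
    sp += 4;                           /* the caller pushed four slots, so it pops four */
    return consumed;
}
```

Calling call_foobar_demo with a sufficiently large buffer fills it with "321" and reports four consumed slots. Note that foobar never learns how many slots were actually pushed; it only knows what the format string claims.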

Security

This reliance on user-provided arguments is also a major source of security issues (see https://cwe.mitre.org/top25/). Callers can easily misuse a variadic function, whether because they did not read the documentation, forgot to adjust the format string or the argument list, or are acting maliciously. See also Format String Attack.
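One common defensive habit is to never pass untrusted text as the format string itself. The following is a minimal sketch of that idea; copy_untrusted is a hypothetical helper name, not a library function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies untrusted text into out without ever
   interpreting it as a format string.
   Returns 0 on success, -1 if the text did not fit. */
static int copy_untrusted(char *out, size_t out_size, const char *untrusted)
{
    assert(out != NULL && untrusted != NULL && out_size > 0);

    /* Unsafe alternative (do not do this):
           snprintf(out, out_size, untrusted);
       A "%s" or "%n" inside untrusted would make snprintf fetch arguments
       that were never passed, which is undefined behaviour. */
    int n = snprintf(out, out_size, "%s", untrusted);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

With a constant "%s" format, input such as "%x%x%n" is copied literally instead of being parsed as conversion specifiers.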

C Implementation

In C and C++, variadic functions are used together with the va_list interface. While pushing onto the stack is intrinsic to those languages (in K&R C you could even forward-declare a function without stating its arguments, but still call it with any number and kind of arguments), reading from such an unknown argument list goes through the va_... macros and the va_list type, which basically abstract the low-level stack-frame access.

Solution 2 - C++

Variadic functions are defined by the standard, with very few explicit restrictions. Here is an example, lifted from cplusplus.com.

/* va_start example */
#include <stdio.h>      /* printf */
#include <stdarg.h>     /* va_list, va_start, va_arg, va_end */

void PrintFloats (int n, ...)
{
  int i;
  double val;
  printf ("Printing floats:");
  va_list vl;
  va_start(vl,n);
  for (i=0;i<n;i++)
  {
    val=va_arg(vl,double);
    printf (" [%.2f]",val);
  }
  va_end(vl);
  printf ("\n");
}

int main ()
{
  PrintFloats (3,3.14159,2.71828,1.41421);
  return 0;
}

The assumptions are roughly as follows.

  1. There must be (at least one) first, fixed, named argument. The ... actually does nothing, except tell the compiler to do the right thing.
  2. The fixed argument(s) provide information about how many variadic arguments there are, by an unspecified mechanism.
  3. From the fixed argument it is possible for the va_start macro to return an object that allows arguments to be retrieved. The type is va_list.
  4. From the va_list object it is possible for va_arg to iterate over each variadic argument, and coerce its value into a compatible type.
  5. Something weird might have happened in va_start so va_end makes things right again.

In the most usual stack-based situation, the va_list is merely a pointer to the arguments sitting on the stack, and va_arg increments the pointer, casts it and dereferences it to a value. Then va_start initialises that pointer by some simple arithmetic (and inside knowledge) and va_end does nothing. There is no strange assembly language, just some inside knowledge of where things lie on the stack. Read the macros in the standard headers to find out what that is.
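To make the "pointer that is incremented, cast and dereferenced" idea concrete, here is a toy imitation that does not use stdarg.h at all. Everything in it is an assumption made for illustration: the toy_ names are hypothetical, the argument area is a plain byte buffer, and every argument is assumed to occupy one 8-byte slot. On ABIs that pass arguments in registers, the real va_list is typically more elaborate than a single pointer.

```c
#include <assert.h>
#include <string.h>

enum { SLOT = 8 };   /* assumed slot size: each argument is padded to 8 bytes */

/* Hypothetical stand-in for va_list: a cursor into the argument area. */
typedef const unsigned char *toy_va_list;

/* Caller side: store one argument in a slot, return the following slot. */
static unsigned char *toy_push(unsigned char *slot, const void *value, size_t size)
{
    assert(size <= SLOT);
    memset(slot, 0, SLOT);
    memcpy(slot, value, size);
    return slot + SLOT;
}

/* Stand-ins for va_arg(ap, int) and va_arg(ap, double): read, then advance. */
static int toy_arg_int(toy_va_list *ap)
{
    int value;
    memcpy(&value, *ap, sizeof value);
    *ap += SLOT;
    return value;
}

static double toy_arg_double(toy_va_list *ap)
{
    double value;
    memcpy(&value, *ap, sizeof value);
    *ap += SLOT;
    return value;
}

/* Callee side: expects an int followed by a double in the argument area. */
static double toy_callee(const unsigned char *args)
{
    toy_va_list ap = args;               /* "va_start": point at the first slot */
    int    first  = toy_arg_int(&ap);
    double second = toy_arg_double(&ap);
    return first + second;               /* "va_end" has nothing to do here */
}

/* Caller side: lays out the arguments 40 and 2.5, then calls the callee. */
static double toy_demo(void)
{
    unsigned char area[2 * SLOT];
    int    first  = 40;
    double second = 2.5;
    unsigned char *next = toy_push(area, &first, sizeof first);
    toy_push(next, &second, sizeof second);
    return toy_callee(area);
}
```

If toy_callee asked for a double where the caller stored an int, it would silently read the wrong bytes, which mirrors what happens when va_arg is given the wrong type.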

Some compilers (MSVC) require a specific calling sequence for variadic functions, whereby the caller releases the stack rather than the callee, since only the caller knows how many arguments it passed.

Functions like printf work exactly like this. The fixed argument is a format string, which allows the number and types of the remaining arguments to be determined.
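As a rough illustration of how a printf-like function can drive va_arg from its format string, here is a deliberately tiny formatter. mini_format is a hypothetical name, and it understands only %d, %s and %%; a real printf handles far more.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mini_format: understands only %d, %s and %%.
   Always terminates out and returns the number of characters written. */
static size_t mini_format(char *out, size_t out_size, const char *fmt, ...)
{
    assert(out != NULL && fmt != NULL && out_size > 0);

    va_list args;
    size_t used = 0;
    va_start(args, fmt);

    for (const char *p = fmt; *p != '\0' && used + 1 < out_size; ++p) {
        if (*p != '%') {                 /* ordinary character: copy it */
            out[used++] = *p;
            continue;
        }
        ++p;                             /* look at the conversion character */
        int n;
        if (*p == 'd') {
            /* The format string says "an int comes next", so fetch an int. */
            n = snprintf(out + used, out_size - used, "%d", va_arg(args, int));
        } else if (*p == 's') {
            n = snprintf(out + used, out_size - used, "%s",
                         va_arg(args, const char *));
        } else if (*p == '%') {
            out[used++] = '%';           /* "%%" consumes no argument */
            continue;
        } else {
            break;                       /* unknown conversion or trailing '%' */
        }
        if (n < 0 || (size_t)n >= out_size - used)
            break;                       /* output buffer is full */
        used += (size_t)n;
    }

    va_end(args);
    out[used] = '\0';
    return used;
}
```

For example, mini_format(buf, sizeof buf, "%s has %d items (100%%)", "cart", 3) produces "cart has 3 items (100%)". The function has no way to verify that the caller really passed a string and an int; it trusts the format string completely.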

Functions like vsprintf pass the va_list object as a normal argument type.
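A typical use of the v... family is a wrapper that adds something of its own and forwards the caller's variadic arguments unchanged. The sketch below uses the standard vsnprintf; format_tagged is a hypothetical helper name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "[tag] " followed by the formatted message.
   Returns the total length, or -1 if the result did not fit in out. */
static int format_tagged(char *out, size_t out_size,
                         const char *tag, const char *fmt, ...)
{
    assert(out != NULL && tag != NULL && fmt != NULL && out_size > 0);

    int prefix = snprintf(out, out_size, "[%s] ", tag);
    if (prefix < 0 || (size_t)prefix >= out_size)
        return -1;

    va_list args;
    va_start(args, fmt);
    /* The va_list is handed over as an ordinary argument. */
    int body = vsnprintf(out + prefix, out_size - (size_t)prefix, fmt, args);
    va_end(args);

    if (body < 0 || (size_t)body >= out_size - (size_t)prefix)
        return -1;
    return prefix + body;
}
```

For example, format_tagged(buf, sizeof buf, "warn", "%d of %d failed", 2, 5) yields "[warn] 2 of 5 failed". The wrapper itself never inspects the variadic arguments; only vsnprintf, guided by fmt, reads them.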

If you need more or lower level detail, please add to the question.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Sebastian Mach | View Question on Stackoverflow
Solution 1 - C++ | Sebastian Mach | View Answer on Stackoverflow
Solution 2 - C++ | david.pfx | View Answer on Stackoverflow