Why does this .c file #include itself?

C, C Preprocessor

C Problem Overview



vsimple.c

#define USIZE 8
#include "vsimple.c"
#undef USIZE

#define USIZE 16
#include "vsimple.c"
#undef USIZE

#define USIZE 32
#include "vsimple.c"
#undef USIZE

#define USIZE 64
#include "vsimple.c"
#undef USIZE

C Solutions


Solution 1 - C

The file includes itself so the same source code can be used to generate 4 different sets of functions for specific values of the macro USIZE.

The #include directives are actually enclosed in an #ifndef, which limits the recursion to a single level:

#ifndef USIZE

// common definitions
...
//

#define VSENC vsenc
#define VSDEC vsdec

#define USIZE 8
#include "vsimple.c"
#undef USIZE

#define USIZE 16
#include "vsimple.c"
#undef USIZE

#define USIZE 32
#include "vsimple.c"
#undef USIZE

#define USIZE 64
#include "vsimple.c"
#undef USIZE

#else // defined(USIZE)

// macro expanded size specific functions using token pasting

...

#define uint_t TEMPLATE3(uint, USIZE, _t)

unsigned char *TEMPLATE2(VSENC, USIZE)(uint_t *__restrict in, size_t n, unsigned char *__restrict out) {
   ...
}

unsigned char *TEMPLATE2(VSDEC, USIZE)(unsigned char *__restrict ip, size_t n, uint_t *__restrict op) {
   ...
}

#endif

The functions defined in this module are:

// vsencNN: compress array with n unsigned (NN bits in[n]) values to the buffer out. Return value = end of compressed output buffer out
unsigned char *vsenc8( unsigned char  *__restrict in, size_t n, unsigned char  *__restrict out);
unsigned char *vsenc16(unsigned short *__restrict in, size_t n, unsigned char  *__restrict out);
unsigned char *vsenc32(unsigned       *__restrict in, size_t n, unsigned char  *__restrict out);
unsigned char *vsenc64(uint64_t       *__restrict in, size_t n, unsigned char  *__restrict out);

// vsdecNN: decompress buffer into an array of n unsigned values. Return value = end of compressed input buffer in
unsigned char *vsdec8( unsigned char  *__restrict in, size_t n, unsigned char  *__restrict out);
unsigned char *vsdec16(unsigned char  *__restrict in, size_t n, unsigned short *__restrict out);
unsigned char *vsdec32(unsigned char  *__restrict in, size_t n, unsigned       *__restrict out);
unsigned char *vsdec64(unsigned char  *__restrict in, size_t n, uint64_t       *__restrict out);

They are all expanded from the two function definitions in vsimple.c:

unsigned char *TEMPLATE2(VSENC, USIZE)(uint_t *__restrict in, size_t n, unsigned char *__restrict out) {
   ...
}

unsigned char *TEMPLATE2(VSDEC, USIZE)(unsigned char *__restrict ip, size_t n, uint_t *__restrict op) {
   ...
}

The TEMPLATE2 and TEMPLATE3 macros are defined in conf.h as:

#define TEMPLATE2_(_x_, _y_) _x_##_y_
#define TEMPLATE2(_x_, _y_) TEMPLATE2_(_x_,_y_)

#define TEMPLATE3_(_x_,_y_,_z_) _x_##_y_##_z_
#define TEMPLATE3(_x_,_y_,_z_) TEMPLATE3_(_x_, _y_, _z_)

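These macros are the classic two-level preprocessor construction for creating identifiers via token pasting. The extra level is needed because operands of ## are not macro-expanded: the outer macro lets its arguments expand first (so USIZE becomes 8), and the inner macro then pastes the results. TEMPLATE2 and TEMPLATE2_ are more commonly called GLUE and XGLUE.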
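To make the role of the second macro level concrete, here is a minimal sketch that compares pasting through TEMPLATE2_ directly with pasting through TEMPLATE2. The STR and STR_ helpers and the function names are hypothetical, added only so the resulting identifier can be inspected as a string; they are not part of conf.h.

```c
#include <assert.h>
#include <string.h>

#define TEMPLATE2_(x, y) x##y
#define TEMPLATE2(x, y) TEMPLATE2_(x, y)

/* Hypothetical helpers: turn the pasted identifier into a string. */
#define STR_(x) #x
#define STR(x) STR_(x)

#define USIZE 16

/* Operands of ## are not expanded, so USIZE is pasted as-is. */
const char *direct_name(void) { return STR(TEMPLATE2_(vsenc, USIZE)); }

/* TEMPLATE2 expands USIZE to 16 first, then TEMPLATE2_ pastes it. */
const char *expanded_name(void) { return STR(TEMPLATE2(vsenc, USIZE)); }

/* Small self-check a caller could run. */
void check_pasting(void) {
    assert(strcmp(direct_name(), "vsencUSIZE") == 0);
    assert(strcmp(expanded_name(), "vsenc16") == 0);
}
```

With only the one-level macro, every inclusion would try to define the same function vsencUSIZE, which is why the indirection matters for this technique.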

The function template starts as:

unsigned char *TEMPLATE2(VSENC, USIZE)(uint_t *__restrict in, size_t n, unsigned char *__restrict out) ...

It is expanded in the first recursive inclusion with USIZE defined as 8 into:

unsigned char *vsenc8(uint8_t *__restrict in, size_t n, unsigned char *__restrict out) ...

The second recursive inclusion, with USIZE defined as 16, expands the template as:

unsigned char *vsenc16(uint16_t *__restrict in, size_t n, unsigned char *__restrict out) ...

and 2 more inclusions define vsenc32 and vsenc64.
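The expansion steps above can be reproduced in a single file by writing out by hand what the preprocessor sees after the four inclusions. This is only a sketch: elem_size is a hypothetical stand-in for the real template body, and the define/undef pairs take the place of the #include "vsimple.c" lines.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

#define TEMPLATE2_(x, y) x##y
#define TEMPLATE2(x, y) TEMPLATE2_(x, y)
#define TEMPLATE3_(x, y, z) x##y##z
#define TEMPLATE3(x, y, z) TEMPLATE3_(x, y, z)

/* Defined once; USIZE is looked up again at every use of uint_t. */
#define uint_t TEMPLATE3(uint, USIZE, _t)

#define USIZE 8
static_assert(sizeof(uint_t) == sizeof(uint8_t), "uint_t is uint8_t here");
size_t TEMPLATE2(elem_size, USIZE)(void) { return sizeof(uint_t); } /* elem_size8 */
#undef USIZE

#define USIZE 16
static_assert(sizeof(uint_t) == sizeof(uint16_t), "uint_t is uint16_t here");
size_t TEMPLATE2(elem_size, USIZE)(void) { return sizeof(uint_t); } /* elem_size16 */
#undef USIZE

#define USIZE 32
size_t TEMPLATE2(elem_size, USIZE)(void) { return sizeof(uint_t); } /* elem_size32 */
#undef USIZE

#define USIZE 64
size_t TEMPLATE2(elem_size, USIZE)(void) { return sizeof(uint_t); } /* elem_size64 */
#undef USIZE
```

The repeated function line is exactly what the self-inclusion avoids: in vsimple.c the body is written once in the #else branch and re-read for each value of USIZE.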

This usage of preprocessed source code is more common with separate files: one for the instantiating part that has all the common definitions, especially the macros, and a separate file for the code and data templates, which is included multiple times with different macro definitions.

A good example is the generation of enums, string arrays, and structure arrays from the atom and opcode definitions in QuickJS.
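As an illustration of that kind of generation, the sketch below builds an enum and a matching string array from one list of entries. It is not QuickJS's actual code: the opcode names are hypothetical, and the list is kept in a macro (the "X-macro" variant) so the example fits in one file, whereas the answer describes the list living in a separately included file.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>

/* Hypothetical opcode list: each entry is X(identifier, printable name). */
#define OPCODE_LIST(X) \
    X(NOP,  "nop")     \
    X(PUSH, "push")    \
    X(POP,  "pop")

/* First expansion: one enumerator per entry. */
#define AS_ENUM(id, str) OP_##id,
enum opcode { OPCODE_LIST(AS_ENUM) OP_COUNT };
#undef AS_ENUM

/* Second expansion: one string per entry, in the same order. */
#define AS_STRING(id, str) str,
static const char *const opcode_names[] = { OPCODE_LIST(AS_STRING) };
#undef AS_STRING

static_assert(sizeof opcode_names / sizeof opcode_names[0] == OP_COUNT,
              "enum and name table stay in sync");

/* Map an opcode to its name; "?" for out-of-range values. */
const char *opcode_name(enum opcode op) {
    return (unsigned)op < OP_COUNT ? opcode_names[op] : "?";
}

/* Map a name back to its opcode; returns -1 if the name is unknown. */
int opcode_from_name(const char *name) {
    for (int i = 0; i < OP_COUNT; i++)
        if (strcmp(opcode_names[i], name) == 0)
            return i;
    return -1;
}
```

Adding an entry to the list updates the enum and the table together, which is the main appeal of the pattern.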

Solution 2 - C

The accepted answer by @chqrlie fully explains what is happening. This is just a complementary commentary.

In C++, we could define two function templates to provide all the implementations of vsenc8, vsenc16, vsenc32, vsenc64 and vsdec8, vsdec16, vsdec32, vsdec64. C, in contrast, is a very simple language and does not support templates. A common trick to get the same power (in uglier packaging) is to use the language's dumb macro facility and let the C preprocessor do the equivalent job for us. Most C programmers of some experience will encounter and use this kind of construct repeatedly during their careers.
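One form of this macro trick, different from the re-inclusion used in vsimple.c, is a function-defining macro that is invoked once per width. The sketch below assumes a hypothetical sum function as the "template"; TEMPLATE2 and TEMPLATE3 follow the definitions quoted from conf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEMPLATE2_(x, y) x##y
#define TEMPLATE2(x, y) TEMPLATE2_(x, y)
#define TEMPLATE3_(x, y, z) x##y##z
#define TEMPLATE3(x, y, z) TEMPLATE3_(x, y, z)

/* Hypothetical template: defines sumNN(), which adds up n values
   of type uintNN_t into a 64-bit total. */
#define DEFINE_SUM(BITS)                                                \
    uint64_t TEMPLATE2(sum, BITS)(const TEMPLATE3(uint, BITS, _t) *in,  \
                                  size_t n) {                           \
        uint64_t total = 0;                                             \
        for (size_t i = 0; i < n; i++)                                  \
            total += in[i];                                             \
        return total;                                                   \
    }

/* Instantiate the template for each width. */
DEFINE_SUM(8)
DEFINE_SUM(16)
DEFINE_SUM(32)
DEFINE_SUM(64)

/* Small self-check a caller could run. */
void check_sums(void) {
    uint8_t  bytes[] = { 250, 10 };
    uint16_t words[] = { 1000, 2000 };
    assert(sum8(bytes, 2) == 260);    /* the total is 64-bit, so no wraparound */
    assert(sum16(words, 2) == 3000);
}
```

The trade-off is that the whole body lives in backslash-continued lines, which is awkward to edit and debug for long functions; re-including a file keeps the template as ordinary source text.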

What makes this particular example a bit tedious to understand is that the implementation file is unconventionally parsed 5 times: first to make some preparatory definitions, and then to generate the four variants of the two functions. The first pass (inside the #ifndef USIZE block) defines the needed macros and the non-variant code, and recursively #includes the file itself four times with different USIZE values (8, 16, 32, 64) as template values. In each recursive inclusion, the corresponding #else block is parsed instead, generating two functions according to the value of the USIZE macro for that pass.

A more conventional, conceptually clearer, and instantly understandable way would be to include the template functions from a different file, say vsimple.impl:

#define USIZE 8
/* Generate vsenc8(), vsdec8()... */ 
#include "vsimple.impl"

#undef USIZE
#define USIZE 16
/* Generate vsenc16(), vsdec16()... */ 
#include "vsimple.impl"

#undef USIZE
#define USIZE 32
/* Generate vsenc32(), vsdec32()... */ 
#include "vsimple.impl"

#undef USIZE
#define USIZE 64
/* Generate vsenc64(), vsdec64()... */ 
#include "vsimple.impl"

The including file vsimple.c and the included file vsimple.impl could then also be organized to be much clearer about what they define and when. Most C programmers would recognize the implementation pattern and immediately know what is happening.

Recursively and repeatedly including itself this way evokes a feeling of hocus-pocus that would earn applause for an obfuscated C competition entry but not for mission-critical production code.

Solution 3 - C

It is recursion. Recursion is useful here because C preprocessing doesn't have looping. Moreover, it's desirable to perpetrate a trick using one file rather than proliferating multiple files.

Suppose you were required to write a function which interpolates the integers from 1 to 5 into a template string and prints the result on standard output. Suppose you were required to write exactly one function and were prohibited from using loops or copy-pasted printf statements. You might do this:

#include <stdio.h>

void template_print(const char *fmt, int n)
{
    if (n == 0) {
        /* top-level call: "instantiate" the body once per value */
        template_print(fmt, 1);
        template_print(fmt, 2);
        template_print(fmt, 3);
        template_print(fmt, 4);
        template_print(fmt, 5);
    } else {
        /* imagine there are 30 lines of statements here we don't want
           to repeat five times. */
        printf(fmt, n);
    }
}

The top-level call is then template_print("whatever %d\n", 0), distinguished by passing zero for the n parameter.

The top-level call with 0 is like the initial processing of vsimple.c without USIZE being defined.

The requirement for one function is analogous to being required to produce a single, self-contained .c file rather than an "interface" file which #includes an implementation.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Rodrigo Belli | View Question on Stackoverflow
Solution 1 - C | chqrlie | View Answer on Stackoverflow
Solution 2 - C | FooF | View Answer on Stackoverflow
Solution 3 - C | Kaz | View Answer on Stackoverflow