"#include" a text file in a C program as a char[]

Tags: C, Include, C Preprocessor

C Problem Overview


Is there a way to include an entire text file as a string in a C program at compile-time?

something like:

  • file.txt:

      This is
      a little
      text file
    
  • main.c:

      #include <stdio.h>
      int main(void) {
         #blackmagicinclude("file.txt", content)
         /*
         equiv: char content[] = "This is\na little\ntext file";
         */
         printf("%s", content);
      }
    

The result should be a little program that prints "This is a little text file" on stdout.

At the moment I use a hackish Python script, but it's butt-ugly and limited to a single variable name. Can you tell me another way to do it?

C Solutions


Solution 1 - C

I'd suggest using xxd (a Unix utility) for this. You can use it like so:

$ echo hello world > a
$ xxd -i a

outputs:

unsigned char a[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int a_len = 12;
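The generated array is not NUL-terminated (a_len holds the byte count), so it can't be handed to printf("%s", ...) as-is. A minimal C sketch of one way to use it, with the xxd output pasted in so the example stands alone; in a real build you would #include the generated file instead, and the helper name a_to_string is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Output of `xxd -i a` for a file containing "hello world\n".
   In a real build, #include the generated file instead of pasting it. */
unsigned char a[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int a_len = 12;

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of the
   embedded bytes, because the xxd array has no terminating zero.
   Returns NULL if allocation fails; the caller must free() the result. */
char *a_to_string(void)
{
    char *s = malloc((size_t)a_len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, a, a_len);
    s[a_len] = '\0';
    return s;
}
```

If you only need to print the data, fwrite(a, 1, a_len, stdout) avoids the copy entirely.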

Solution 2 - C

The question was about C, but in case someone tries to do this with C++11, it can be done with only small changes to the included text file, thanks to the new raw string literals:

In C++ do this:

const char *s =
#include "test.txt"
;

In the text file do this:

R"(Line 1
Line 2
Line 3
Line 4
Line 5
Line 6)"

So there only needs to be a prefix at the top of the file and a suffix at the end of it. In between you can put whatever you want; no special escaping is necessary as long as you don't need the character sequence )". Even that can work if you specify your own custom delimiter (up to 16 characters):

R"=====(Line 1
Line 2
Line 3
Now you can use "( and )" in the text file, too.
Line 5
Line 6)====="

Solution 3 - C

I like kayahr's answer. If you don't want to touch the input files, however, and you are using CMake, you can have CMake add the delimiter sequences around the file's content. The following CMake code, for instance, copies the input files and wraps their content accordingly:

function(make_includable input_file output_file)
    file(READ ${input_file} content)
    set(delim "for_c++_include")
    set(content "R\"${delim}(\n${content})${delim}\"")
    file(WRITE ${output_file} "${content}")
endfunction(make_includable)

# Use like
make_includable(external/shaders/cool.frag generated/cool.frag)

Then include it in C++ like this:

constexpr const char *test =
#include "generated/cool.frag"
;

Solution 4 - C

You have two possibilities:

  1. Make use of compiler/linker extensions to convert a file into a binary file, with proper symbols pointing to the begin and end of the binary data. See this answer: Include binary file with GNU ld linker script.
  2. Convert your file into a sequence of character constants that can initialize an array. Note that you can't just use "" and span multiple lines: you would need a line continuation character (\), escaped " characters and more to make that work. It's easier to just write a little program that converts the bytes into a sequence like '\xFF', '\xAB', ...., '\0' (or use the Unix tool xxd described in another answer, if you have it available!):

Code:

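#include <stdio.h>

/* bin2c: reads bytes from stdin and writes them to stdout as a
   comma-separated list of character constants, ending with '\0' */
int main(void) {
    int c;
    while((c = fgetc(stdin)) != EOF) {
        printf("'\\x%X',", (unsigned)c);
    }
    printf("'\\0'\n"); // put terminating zero
    return 0;
}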

(not tested). Then do:

char my_file[] = {
#include "data.h"
};

where data.h is generated by compiling the program above as bin2c and running:

cat file.bin | ./bin2c > data.h

Solution 5 - C

OK, inspired by Daemin's post, I tested the following simple example:

a.data:

"this is test\n file\n"

test.c:

int main(void)
{
    char *test = 
#include "a.data"
    ;
    return 0;
}

gcc -E test.c output:

# 1 "test.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "test.c"

int main(void)
{
    char *test =
# 1 "a.data" 1
"this is test\n file\n"
# 6 "test.c" 2
    ;
    return 0;
}

So it works, but it requires the data to be surrounded by quotation marks.
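A small variation, assuming the same a.data: declaring test as a const array instead of a plain char pointer avoids a writable pointer to a string literal and lets sizeof give the length at compile time. In this sketch the literal stands in for what #include "a.data" expands to, and test_len is a hypothetical name:

```c
#include <stddef.h>

/* Equivalent to:
 *     static const char test[] =
 *     #include "a.data"
 *     ;
 * with the file's literal pasted in place of the #include. */
static const char test[] = "this is test\n file\n";

/* sizeof counts the terminating NUL the compiler appends, so subtract 1 */
static const size_t test_len = sizeof(test) - 1;
```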

Solution 6 - C

You can do this using objcopy:

objcopy --input binary --output elf64-x86-64 myfile.txt myfile.o

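Now you have an object file that you can link into your executable. It contains the symbols _binary_myfile_txt_start, _binary_myfile_txt_end and _binary_myfile_txt_size, which mark the beginning, end, and size of the content of myfile.txt (the names are derived from the input file name, with non-alphanumeric characters replaced by underscores).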

Solution 7 - C

If you're willing to resort to some dirty tricks you can get creative with raw string literals and #include for certain types of files.

For example, say I want to include some SQL scripts for SQLite in my project and I want to get syntax highlighting but don't want any special build infrastructure. I can have this file test.sql which is valid SQL for SQLite where -- starts a comment:

--x, R"(--
SELECT * from TestTable
WHERE field = 5
--)"

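And then in my C++ code I can have the following. The variable x is only there so that the leading --x, of the included file compiles: it becomes a pre-decrement of x followed by the comma operator, which evaluates to the raw string literal: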

#include <iostream>
using namespace std;

int main()
{
    auto x = 0; // consumed by the leading "--x," in test.sql
    const char* mysql = (
#include "test.sql"
    );

    cout << mysql << endl;
}

The output is:

--
SELECT * from TestTable
WHERE field = 5
--

Or to include some Python code from a file test.py which is a valid Python script (because # starts a comment in Python and pass is a no-op):

#define pass R"(
pass
def myfunc():
    print("Some Python code")

myfunc()
#undef pass
#define pass )"
pass

And then in the C++ code:

#include <iostream>
using namespace std;

int main()
{
    const char* mypython = (
#include "test.py"
    );

    cout << mypython << endl;
}

Which will output:

pass
def myfunc():
    print("Some Python code")

myfunc()
#undef pass
#define pass

It should be possible to play similar tricks for various other types of code you might want to include as a string. Whether or not it is a good idea I'm not sure. It's kind of a neat hack but probably not something you'd want in real production code. Might be ok for a weekend hack project though.

Solution 8 - C

You need my xtr utility, but with it you can do this with a bash script. This is a script I call bin2inc. The first parameter is the name of the resulting char[] variable; the second is the name of the file. The output is a C include file with the file content encoded (in lowercase hex) as the given variable name. The char array is zero-terminated, and the length of the data (not counting the terminating zero) is stored in ${variableName}_length.

#!/bin/bash

fileSize ()
{
    [ -e "$1" ] && {
        set -- `ls -l "$1"`;
        echo $5;
    }
}

echo unsigned char $1'[] = {'
./xtr -fhex -p 0x -s ', ' < "$2";
echo '0x00'
echo '};';
echo '';
echo unsigned long int ${1}_length = $(fileSize "$2")';'

xtr (character eXTRapolator) is licensed under the GPLv3.

Solution 9 - C

Why not link the text into the program and use it as a global variable? Here is an example. I'm considering using this to include OpenGL shader files within an executable, since GL shaders need to be compiled for the GPU at runtime.

Solution 10 - C

I reimplemented xxd in python3, fixing all of xxd's annoyances:

  • Const correctness
  • string length datatype: int → size_t
  • Null termination (in case you might want that)
  • C string compatible: drops unsigned on the array.
  • Smaller, readable output, as you would have written it: printable ASCII is output as-is; other bytes are hex-encoded.

Here is the script, filtered by itself, so you can see what it does:

pyxxd.c

#include <stddef.h>

extern const char pyxxd[];
extern const size_t pyxxd_len;

const char pyxxd[] =
"#!/usr/bin/env python3\n"
"\n"
"import sys\n"
"import re\n"
"\n"
"def is_printable_ascii(byte):\n"
"    return byte >= ord(' ') and byte <= ord('~')\n"
"\n"
"def needs_escaping(byte):\n"
"    return byte == ord('\\\"') or byte == ord('\\\\')\n"
"\n"
"def stringify_nibble(nibble):\n"
"    if nibble < 10:\n"
"        return chr(nibble + ord('0'))\n"
"    return chr(nibble - 10 + ord('a'))\n"
"\n"
"def write_byte(of, byte):\n"
"    if is_printable_ascii(byte):\n"
"        if needs_escaping(byte):\n"
"            of.write('\\\\')\n"
"        of.write(chr(byte))\n"
"    elif byte == ord('\\n'):\n"
"        of.write('\\\\n\"\\n\"')\n"
"    else:\n"
"        of.write('\\\\x')\n"
"        of.write(stringify_nibble(byte >> 4))\n"
"        of.write(stringify_nibble(byte & 0xf))\n"
"\n"
"def mk_valid_identifier(s):\n"
"    s = re.sub('^[^_a-z]', '_', s)\n"
"    s = re.sub('[^_a-z0-9]', '_', s)\n"
"    return s\n"
"\n"
"def main():\n"
"    # `xxd -i` compatibility\n"
"    if len(sys.argv) != 4 or sys.argv[1] != \"-i\":\n"
"        print(\"Usage: xxd -i infile outfile\")\n"
"        exit(2)\n"
"\n"
"    with open(sys.argv[2], \"rb\") as infile:\n"
"        with open(sys.argv[3], \"w\") as outfile:\n"
"\n"
"            identifier = mk_valid_identifier(sys.argv[2]);\n"
"            outfile.write('#include <stddef.h>\\n\\n');\n"
"            outfile.write('extern const char {}[];\\n'.format(identifier));\n"
"            outfile.write('extern const size_t {}_len;\\n\\n'.format(identifier));\n"
"            outfile.write('const char {}[] =\\n\"'.format(identifier));\n"
"\n"
"            while True:\n"
"                byte = infile.read(1)\n"
"                if byte == b\"\":\n"
"                    break\n"
"                write_byte(outfile, ord(byte))\n"
"\n"
"            outfile.write('\";\\n\\n');\n"
"            outfile.write('const size_t {}_len = sizeof({}) - 1;\\n'.format(identifier, identifier));\n"
"\n"
"if __name__ == '__main__':\n"
"    main()\n"
"";

const size_t pyxxd_len = sizeof(pyxxd) - 1;

Usage (this extracts the script):

#include <stdio.h>

extern const char pyxxd[];
extern const size_t pyxxd_len;

int main()
{
    fwrite(pyxxd, 1, pyxxd_len, stdout);
}
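One pitfall of emitting \x escapes inside a string literal, as the script above does for non-printable bytes, is that a C hex escape consumes every hex digit that follows it: a byte 0x01 followed by the letter a becomes "\x01a", which the compiler reads as the single character 0x1A. The following is a minimal C sketch of the same idea that sidesteps this with three-digit octal escapes; write_c_string is a hypothetical name, the variable name passed in is assumed to be a valid C identifier, and it is not a port of the script above:

```c
#include <stdio.h>

/* Writes the bytes of `in` to `out` as a C string literal definition:
 *     const char NAME[] = "...";
 *     const size_t NAME_len = sizeof(NAME) - 1;
 * Printable ASCII is copied as-is (with ", \ and ? escaped), a newline
 * ends the current literal chunk, and every other byte becomes a
 * three-digit octal escape. Octal escapes stop after three digits, so a
 * following digit character cannot be swallowed, and escaping '?'
 * prevents trigraphs such as "??=". `name` must be a valid identifier. */
void write_c_string(FILE *in, FILE *out, const char *name)
{
    int c;

    fputs("#include <stddef.h>\n\n", out);
    fprintf(out, "const char %s[] =\n\"", name);
    while ((c = fgetc(in)) != EOF) {
        if (c == '"' || c == '\\' || c == '?')
            fprintf(out, "\\%c", c);
        else if (c == '\n')
            fputs("\\n\"\n\"", out);          /* close and reopen the literal */
        else if (c >= ' ' && c <= '~')
            fputc(c, out);
        else
            fprintf(out, "\\%03o", (unsigned)c);
    }
    fputs("\";\n\n", out);
    fprintf(out, "const size_t %s_len = sizeof(%s) - 1;\n", name, name);
}
```

A small driver could call write_c_string(stdin, stdout, argv[1]) from main, in the same spirit as the bin2c program from another answer.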

Solution 11 - C

Here's a hack I use for Visual C++. I add the following Pre-Build Event (where file.txt is the input and file_txt.h is the output):

@(
  echo const char text[] = R"***(
  type file.txt
  echo ^^^)***";
) > file_txt.h

I then include file_txt.h where I need it.

This isn't perfect, as it adds \n at the start and \n^ at the end, but that's easy to handle, and I like the simplicity of this solution. If anyone can refine it to get rid of the extra characters, that would be nice.

Solution 12 - C

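A first idea might be something like the following, but it does not compile: an ordinary string literal cannot span multiple lines, and the preprocessor does not process directives inside a string literal: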

int main()
{
    const char* text = "
#include "file.txt"
";
    printf("%s", text);
    return 0;
}

Even if that worked, you would have to be careful with what is actually in the file: no double quotes, all appropriate characters escaped, etc.

Therefore it might be easier if you just load the text from a file at runtime, or embed the text directly into the code.

If you still wanted the text in another file you could have it in there, but it would have to be represented there as a string. You would use the code as above but without the double quotes in it. For example:

file.txt

"Something evil\n"\
"this way comes!"

main.cpp

#include <cstdio>

int main()
{
    const char* text =
#include "file.txt"
;
    printf("%s", text);
    return 0;
}

So basically you have a C or C++ style string in a text file that you include. This keeps the code neater, because there isn't a huge block of text at the start of the source file.

Solution 13 - C

Even if it can be done at compile time (I don't think it can in general), the text would likely be the preprocessed header rather than the file's contents verbatim. I expect you'll have to load the text from the file at runtime or do a nasty cut-and-paste job.

Solution 14 - C

Hasturkun's answer using the xxd -i option is excellent. If you want to incorporate the conversion process (text -> hex include file) directly into your build, the hexdump.c tool/library recently added a capability similar to xxd's -i option. It doesn't give you the full header (you need to provide the char array definition yourself), but that has the advantage of letting you pick the name of the char array:

http://25thandclement.com/~william/projects/hexdump.c.html

Its license is a lot more "standard" than xxd's and is very liberal. An example of using it to embed an init file in a program can be seen in the CMakeLists.txt and scheme.c files here:

https://github.com/starseeker/tinyscheme-cmake

There are pros and cons both to including generated files in source trees and bundling utilities - how to handle it will depend on the specific goals and needs of your project. hexdump.c opens up the bundling option for this application.

Solution 15 - C

I think it is not possible with the compiler and preprocessor alone. gcc allows this:

#define _STRGF(x) # x
#define STRGF(x) _STRGF(x)

	printk ( MODULE_NAME " built " __DATE__ " at " __TIME__ " on host "
			STRGF(
#				define hostname my_dear_hostname
				hostname
			)
			"\n" );

But unfortunately not this:

#define _STRGF(x) # x
#define STRGF(x) _STRGF(x)

	printk ( MODULE_NAME " built " __DATE__ " at " __TIME__ " on host "
			STRGF(
#				include "/etc/hostname"
			)
			"\n" );

The error is:

/etc/hostname: In function ‘init_module’:
/etc/hostname:1:0: error: unterminated argument list invoking macro "STRGF"
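The working half relies on the usual two-level stringification idiom: # turns its argument into a string literal without macro-expanding it first, so the extra STRGF level is what forces the expansion. The failing half is not just a GCC quirk: the C standard leaves the behavior undefined when something that would act as a preprocessing directive appears inside a macro's argument list, so even the #define variant only works because GCC happens to accept it. A minimal sketch of the idiom, with hypothetical macro names (avoiding the leading underscore plus uppercase letter, which is reserved):

```c
/* # stringifies its argument as written, without expanding it */
#define STR_NOEXPAND(x) #x
/* The extra level lets the argument be macro-expanded before # applies */
#define STR(x) STR_NOEXPAND(x)

/* Hypothetical value, standing in for the answer's hostname macro */
#define HOSTNAME my_dear_hostname

const char *raw = STR_NOEXPAND(HOSTNAME);   /* the name itself */
const char *expanded = STR(HOSTNAME);       /* the name's replacement */
```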

Solution 16 - C

I had similar issues, and for small files the aforementioned solution of Johannes Schaub worked like a charm for me.

However, for files that are a bit larger, it ran into issues with the character array limit of the compiler. Therefore, I wrote a small encoder application that converts file content into a 2D character array of equally sized chunks (and possibly padding zeros). It produces output textfiles with 2D array data like this:

const char main_js_file_data[8][4]= {
    {'\x69','\x73','\x20','\0'},
    {'\x69','\x73','\x20','\0'},
    {'\x61','\x20','\x74','\0'},
    {'\x65','\x73','\x74','\0'},
    {'\x20','\x66','\x6f','\0'},
    {'\x72','\x20','\x79','\0'},
    {'\x6f','\x75','\xd','\0'},
    {'\xa','\0','\0','\0'}};

where 4 is the value of MAX_CHARS_PER_ARRAY in the encoder (the source below uses 2048). The file with the resulting C code, called, for example, "main_js_file_data.h", can then easily be included into the C++ application, for example like this:

#include "main_js_file_data.h"

Here is the source code of the encoder:

#include <cmath>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>
#include <algorithm>


#define MAX_CHARS_PER_ARRAY 2048


int main(int argc, char * argv[])
{
    // three parameters: input filename, output filename, variable name
    if (argc < 4)
    {
        return 1;
    }

    // buffer data, packaged into chunks
    std::vector<char> bufferedData;

    // open input file, in binary mode
    {    
        std::ifstream fStr(argv[1], std::ios::binary);
        if (!fStr.is_open())
        {
            return 1;
        }
        
        bufferedData.assign(std::istreambuf_iterator<char>(fStr), 
                            std::istreambuf_iterator<char>()     );
    }

    // write output text file, containing a variable declaration,
    // which will be a fixed-size two-dimensional plain array
    {
        std::ofstream fStr(argv[2]);
        if (!fStr.is_open())
        {
            return 1;
        }
        const std::size_t numChunks = std::size_t(std::ceil(double(bufferedData.size()) / (MAX_CHARS_PER_ARRAY - 1)));
        fStr << "const char " << argv[3] << "[" << numChunks           << "]"    <<
                                            "[" << MAX_CHARS_PER_ARRAY << "]= {" << std::endl;
        std::size_t count = 0;
        fStr << std::hex;
        while (count < bufferedData.size())
        {
            std::size_t n = 0;
            fStr << "{";
            for (; n < MAX_CHARS_PER_ARRAY - 1 && count < bufferedData.size(); ++n)
            {
                fStr << "'\\x" << int(static_cast<unsigned char>(bufferedData[count++])) << "',";
            }
            // fill missing part to reach fixed chunk size with zero entries
            for (std::size_t j = 0; j < (MAX_CHARS_PER_ARRAY - 1) - n; ++j)
            {
                fStr << "'\\0',";
            }
            fStr << "'\\0'}";
            if (count < bufferedData.size())
            {
                fStr << ",\n";
            }
        }
        fStr << "};\n";
    }

    return 0;
}
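On the consuming side, the chunks have to be joined back into one buffer. A minimal C sketch, assuming the 2D layout shown above with MAX_CHARS_PER_ARRAY set to 4; join_chunks and CHUNK_WIDTH are hypothetical names, and since each chunk is cut at its first '\0', this only round-trips data without embedded NUL bytes:

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_WIDTH 4 /* must match MAX_CHARS_PER_ARRAY used by the encoder */

/* Example encoder output (normally: #include "main_js_file_data.h") */
const char main_js_file_data[8][CHUNK_WIDTH] = {
    {'\x69','\x73','\x20','\0'},
    {'\x69','\x73','\x20','\0'},
    {'\x61','\x20','\x74','\0'},
    {'\x65','\x73','\x74','\0'},
    {'\x20','\x66','\x6f','\0'},
    {'\x72','\x20','\x79','\0'},
    {'\x6f','\x75','\xd','\0'},
    {'\xa','\0','\0','\0'}};

/* Concatenates the data bytes of `rows` zero-padded chunks into `out`,
   which needs room for rows * (CHUNK_WIDTH - 1) + 1 bytes.
   Returns the length of the result, excluding the terminating NUL. */
size_t join_chunks(const char chunks[][CHUNK_WIDTH], size_t rows, char *out)
{
    size_t len = 0;
    for (size_t r = 0; r < rows; ++r) {
        /* The last slot of every chunk is '\0', so memchr always finds one */
        const char *end = memchr(chunks[r], '\0', CHUNK_WIDTH);
        size_t n = (size_t)(end - chunks[r]);
        memcpy(out + len, chunks[r], n);
        len += n;
    }
    out[len] = '\0';
    return len;
}
```

For the example data above this yields "is is a test for you\r\n" (22 bytes).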

Solution 17 - C

This problem was irritating me, and xxd didn't work for my use case: when I tried to script it in, it named the variable something like __home_myname_build_prog_cmakelists_src_autogen. So I made a utility to solve this exact problem:

https://github.com/Exaeta/brcc

It generates a source and header file and allows you to explicitly set the name of each variable so then you can use them via std::begin(arrayname) and std::end(arrayname).

I incorporated it into my cmake project like so:

add_custom_command(
  OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/binary_resources.hpp ${CMAKE_CURRENT_BINARY_DIR}/binary_resources.cpp
  COMMAND brcc ${CMAKE_CURRENT_BINARY_DIR}/binary_resources RGAME_BINARY_RESOURCES_HH txt_vertex_shader ${CMAKE_CURRENT_BINARY_DIR}/src/vertex_shader1.glsl
  DEPENDS src/vertex_shader1.glsl)

With small tweaks I suppose it could be made to work for C as well.

Solution 18 - C

If you are using CMake, you may be interested in writing a CMake preprocessing script like the following:

cmake/ConvertLayout.cmake

function(convert_layout file include_dir)
    get_filename_component(name ${file} NAME_WE)
    get_filename_component(directory ${file} DIRECTORY)
    get_filename_component(directory ${directory} NAME)
    string(TOUPPER ${name} NAME)
    string(TOUPPER ${directory} DIRECTORY)

    set(new_file ${include_dir}/${directory}/${name}.h)

    if (${file} IS_NEWER_THAN  ${new_file})
        file(READ ${file} content)

        string(REGEX REPLACE "\"" "\\\\\"" content "${content}")
        string(REGEX REPLACE "[\r\n]" "\\\\n\"\\\\\n\"" content "${content}")
        set(content "\"${content}\"")
        set(content "#ifndef ${DIRECTORY}_${NAME}\n#define ${DIRECTORY}_${NAME} ${content} \n#endif")
        message(STATUS "${content}")

        file(WRITE ${new_file} "${content}")

        message(STATUS "Generated layout include file ${new_file} from ${file}")
    endif()
endfunction()

function(convert_layout_directory layout_dir include_dir)
    file(GLOB layouts ${layout_dir}/*)
    foreach(layout ${layouts})
        convert_layout(${layout} ${include_dir})
    endforeach()
endfunction()

your CMakeLists.txt

include(cmake/ConvertLayout.cmake)
convert_layout_directory(layout ${CMAKE_BINARY_DIR}/include)
include_directories(${CMAKE_BINARY_DIR}/include)

Somewhere in C++:

#include "layout/menu.h"
Glib::ustring ui_info = LAYOUT_MENU;

Solution 19 - C

I like @Martin R.'s answer because, as it says, it doesn't touch the input file and automates the process. To improve on this, I added the capability to automatically split up large files that exceed compiler limits. The output file is written as an array of smaller strings, which can then be reassembled in code. The resulting script, based on @Martin R.'s version, and an example are included here:

https://github.com/skillcheck/cmaketools.git

The relevant CMake setup is:

make_includable( LargeFile.h
    ${CMAKE_CURRENT_BINARY_DIR}/generated/LargeFile.h
    "c++-include" "L" LINE_COUNT FILE_SIZE
)

The source code is then:

#include <numeric>
#include <string>
#include <vector>

static std::vector<std::wstring> const chunks = {
#include "generated/LargeFile.h"
};

std::wstring contents =
    std::accumulate( chunks.begin(), chunks.end(), std::wstring() );

Solution 20 - C

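You can use assembly for this. The version below assumes GCC or Clang with a GNU-compatible assembler on an ELF platform such as Linux (other platforms use different section directives and may prefix symbol names with an underscore):

#include <stddef.h>

__asm__(".section .rodata\n"
        ".global fileData\n"
        ".global fileDataEnd\n"
        "fileData:    .incbin \"filename.ext\"\n"
        "fileDataEnd: .byte 0\n"   /* terminating NUL */
        ".previous\n");

extern const char fileData[];
extern const char fileDataEnd[];
/* fileDataEnd labels the NUL, so this is the file size; computed at run time */
#define fileDataSize ((size_t)(fileDataEnd - fileData))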

Solution 21 - C

in x.h

"this is a "
"buncha text"

in main.c

#include <stdio.h>
int main(void)
{
    char *textFileContents =
#include "x.h"
    ;

    printf("%s\n", textFileContents);

    return 0;
}

ought to do the job.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Original authors of the content on Stack Overflow:

Question: ZeD
Solution 1: Hasturkun
Solution 2: kayahr
Solution 3: Martin R.
Solution 4: Johannes Schaub - litb
Solution 5: Ilya
Solution 6: John Zwinck
Solution 7: mattnewport
Solution 8: user735796
Solution 9: TechDragon
Solution 10: user2394284
Solution 11: ET3D
Solution 12: Daemin
Solution 13: Daniel Paull
Solution 14: starseeker
Solution 15: not-a-user
Solution 16: volzotan
Solution 17: rnpl
Solution 18: LotusBro98
Solution 19: Skillcheck
Solution 20: Somebody
Solution 21: EvilTeach