Fast way to write data from a std::vector to a text file

C++ Problem Overview

I currently write a set of doubles from a vector to a text file like this:

std::ofstream fout;
fout.open("vector.txt");

for (l = 0; l < vector.size(); l++)
	fout << std::setprecision(10) << vector.at(l) << std::endl;

fout.close();

But this is taking a lot of time to finish. Is there a faster or more efficient way to do this? I would love to see and learn it.

C++ Solutions

Solution 1 - C++

std::ofstream fout("vector.txt");
fout << std::setprecision(10);

for(auto const& x : vector)
    fout << x << '\n';

Everything I changed had theoretically worse performance in your version of the code, but the std::endl was the real killer. std::vector::at (with bounds checking, which you don't need) would be the second, then the fact that you did not use iterators.

Why default-construct a std::ofstream and then call open, when you can do it in one step? Why call close when RAII (the destructor) takes care of it for you? You can also call

fout << std::setprecision(10)

just once, before the loop.

As noted in the comment below, if your vector is of elements of fundamental type, you might get a better performance with for(auto x : vector). Measure the running time / inspect the assembly output.

Just to point out another thing that caught my eyes, this:

for(l = 0; l < vector.size(); l++)

What is this l? Why declare it outside the loop? It seems you don't need it in the outer scope, so don't. And also the post-increment.

The result:

for(size_t l = 0; l < vector.size(); ++l)

I'm sorry for making code review out of this post.

Solution 2 - C++

Your algorithm has two parts:

Serialize double numbers to a string or character buffer.
Write results to a file.

The first item can be improved (> 20%) by using sprintf or fmt. The second item can be sped up by caching results to a buffer or extending the output file stream buffer size before writing results to the output file. You should not use std::endl because it is much slower than using "\n". If you still want to make it faster then write your data in binary format. Below is my complete code sample which includes my proposed solutions and one from Edgar Rokyan. I also included Ben Voigt and Matthieu M suggestions in test code.

#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <vector>

// https://github.com/fmtlib/fmt
#include "fmt/format.h"

// http://uscilab.github.io/cereal/
#include "cereal/archives/binary.hpp"
#include "cereal/archives/json.hpp"
#include "cereal/archives/portable_binary.hpp"
#include "cereal/archives/xml.hpp"
#include "cereal/types/string.hpp"
#include "cereal/types/vector.hpp"

// https://github.com/DigitalInBlue/Celero
#include "celero/Celero.h"

template <typename T> const char* getFormattedString();
template<> const char* getFormattedString<double>(){return "%g\n";}
template<> const char* getFormattedString<float>(){return "%g\n";}
template<> const char* getFormattedString<int>(){return "%d\n";}
template<> const char* getFormattedString<size_t>(){return "%lu\n";}


namespace {
    constexpr size_t LEN = 32;

    template <typename T> std::vector<T> create_test_data(const size_t N) {
        std::vector<T> data(N);
        for (size_t idx = 0; idx < N; ++idx) {
            data[idx] = idx;
        }
        return data;
    }

    template <typename Iterator> auto toVectorOfChar(Iterator begin, Iterator end) {
        char aLine[LEN];
        std::vector<char> buffer;
        buffer.reserve(std::distance(begin, end) * LEN);
        const char* fmtStr = getFormattedString<typename std::iterator_traits<Iterator>::value_type>();
        std::for_each(begin, end, [&buffer, &aLine, &fmtStr](const auto value) {
            sprintf(aLine, fmtStr, value);
            for (size_t idx = 0; aLine[idx] != 0; ++idx) {
                buffer.push_back(aLine[idx]);
            }
        });
        return buffer;
    }

    template <typename Iterator>
    auto toStringStream(Iterator begin, Iterator end, std::stringstream &buffer) {
        char aLine[LEN];
        const char* fmtStr = getFormattedString<typename std::iterator_traits<Iterator>::value_type>();
        std::for_each(begin, end, [&buffer, &aLine, &fmtStr](const auto value) {            
            sprintf(aLine, fmtStr, value);
            buffer << aLine;
        });
    }

    template <typename Iterator> auto toMemoryWriter(Iterator begin, Iterator end) {
        fmt::MemoryWriter writer;
        std::for_each(begin, end, [&writer](const auto value) { writer << value << "\n"; });
        return writer;
    }

    // A modified version of the original approach.
    template <typename Container>
    void original_approach(const Container &data, const std::string &fileName) {
        std::ofstream fout(fileName);
        for (size_t l = 0; l < data.size(); l++) {
            fout << data[l] << std::endl;
        }
        fout.close();
    }

    // Replace std::endl by "\n"
    template <typename Iterator>
    void improved_original_approach(Iterator begin, Iterator end, const std::string &fileName) {
        std::ofstream fout(fileName);
        const size_t len = std::distance(begin, end) * LEN;
        std::vector<char> buffer(len);
        fout.rdbuf()->pubsetbuf(buffer.data(), len);
        for (Iterator it = begin; it != end; ++it) {
            fout << *it << "\n";
        }
        fout.close();
    }

    //
    template <typename Iterator>
    void edgar_rokyan_solution(Iterator begin, Iterator end, const std::string &fileName) {
        std::ofstream fout(fileName);
        std::copy(begin, end, std::ostream_iterator<double>(fout, "\n"));
    }
    
    // Cache to a string stream before writing to the output file
    template <typename Iterator>
    void stringstream_approach(Iterator begin, Iterator end, const std::string &fileName) {
        std::stringstream buffer;
        for (Iterator it = begin; it != end; ++it) {
            buffer << *it << "\n";
        }

        // Now write to the output file.
        std::ofstream fout(fileName);
        fout << buffer.str();
        fout.close();
    }

    // Use sprintf
    template <typename Iterator>
    void sprintf_approach(Iterator begin, Iterator end, const std::string &fileName) {
        std::stringstream buffer;
        toStringStream(begin, end, buffer);
        std::ofstream fout(fileName);
        fout << buffer.str();
        fout.close();
    }

    // Use fmt::MemoryWriter (https://github.com/fmtlib/fmt)
    template <typename Iterator>
    void fmt_approach(Iterator begin, Iterator end, const std::string &fileName) {
        auto writer = toMemoryWriter(begin, end);
        std::ofstream fout(fileName);
        fout << writer.str();
        fout.close();
    }

    // Use std::vector<char>
    template <typename Iterator>
    void vector_of_char_approach(Iterator begin, Iterator end, const std::string &fileName) {
        std::vector<char> buffer = toVectorOfChar(begin, end);
        std::ofstream fout(fileName);
        fout << buffer.data();
        fout.close();
    }

    // Use cereal (http://uscilab.github.io/cereal/).
    template <typename Container, typename OArchive = cereal::BinaryOutputArchive>
    void use_cereal(Container &&data, const std::string &fileName) {
        std::stringstream buffer;
        {
            OArchive oar(buffer);
            oar(data);
        }

        std::ofstream fout(fileName);
        fout << buffer.str();
        fout.close();
    }
}

// Performance test input data.
constexpr int NumberOfSamples = 5;
constexpr int NumberOfIterations = 2;
constexpr int N = 3000000;
const auto double_data = create_test_data<double>(N);
const auto float_data = create_test_data<float>(N);
const auto int_data = create_test_data<int>(N);
const auto size_t_data = create_test_data<size_t>(N);

CELERO_MAIN

BASELINE(DoubleVector, original_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("origsol.txt");
    original_approach(double_data, fileName);
}

BENCHMARK(DoubleVector, improved_original_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("improvedsol.txt");
    improved_original_approach(double_data.cbegin(), double_data.cend(), fileName);
}

BENCHMARK(DoubleVector, edgar_rokyan_solution, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("edgar_rokyan_solution.txt");
    edgar_rokyan_solution(double_data.cbegin(), double_data.end(), fileName);
}

BENCHMARK(DoubleVector, stringstream_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("stringstream.txt");
    stringstream_approach(double_data.cbegin(), double_data.cend(), fileName);
}

BENCHMARK(DoubleVector, sprintf_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("sprintf.txt");
    sprintf_approach(double_data.cbegin(), double_data.cend(), fileName);
}

BENCHMARK(DoubleVector, fmt_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("fmt.txt");
    fmt_approach(double_data.cbegin(), double_data.cend(), fileName);
}

BENCHMARK(DoubleVector, vector_of_char_approach, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("vector_of_char.txt");
    vector_of_char_approach(double_data.cbegin(), double_data.cend(), fileName);
}

BENCHMARK(DoubleVector, use_cereal, NumberOfSamples, NumberOfIterations) {
    const std::string fileName("cereal.bin");
    use_cereal(double_data, fileName);
}

// Benchmark double vector
BASELINE(DoubleVectorConversion, toStringStream, NumberOfSamples, NumberOfIterations) {
    std::stringstream output;
    toStringStream(double_data.cbegin(), double_data.cend(), output);
}

BENCHMARK(DoubleVectorConversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toMemoryWriter(double_data.cbegin(), double_data.cend()));
}

BENCHMARK(DoubleVectorConversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toVectorOfChar(double_data.cbegin(), double_data.cend()));
}

// Benchmark float vector
BASELINE(FloatVectorConversion, toStringStream, NumberOfSamples, NumberOfIterations) {
    std::stringstream output;
    toStringStream(float_data.cbegin(), float_data.cend(), output);
}

BENCHMARK(FloatVectorConversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toMemoryWriter(float_data.cbegin(), float_data.cend()));
}

BENCHMARK(FloatVectorConversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toVectorOfChar(float_data.cbegin(), float_data.cend()));
}

// Benchmark int vector
BASELINE(int_conversion, toStringStream, NumberOfSamples, NumberOfIterations) {
    std::stringstream output;
    toStringStream(int_data.cbegin(), int_data.cend(), output);
}

BENCHMARK(int_conversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toMemoryWriter(int_data.cbegin(), int_data.cend()));
}

BENCHMARK(int_conversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toVectorOfChar(int_data.cbegin(), int_data.cend()));
}

// Benchmark size_t vector
BASELINE(size_t_conversion, toStringStream, NumberOfSamples, NumberOfIterations) {
    std::stringstream output;
    toStringStream(size_t_data.cbegin(), size_t_data.cend(), output);
}

BENCHMARK(size_t_conversion, toMemoryWriter, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toMemoryWriter(size_t_data.cbegin(), size_t_data.cend()));
}

BENCHMARK(size_t_conversion, toVectorOfChar, NumberOfSamples, NumberOfIterations) {
    celero::DoNotOptimizeAway(toVectorOfChar(size_t_data.cbegin(), size_t_data.cend()));
}

Below are the performance results obtained in my Linux box using clang-3.9.1 and -O3 flag. I use Celero to collect all performance results.

Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
DoubleVector    | original_approa | Null            |              10 |               4 |         1.00000 |   3650309.00000 |            0.27 | 
DoubleVector    | improved_origin | Null            |              10 |               4 |         0.47828 |   1745855.00000 |            0.57 | 
DoubleVector    | edgar_rokyan_so | Null            |              10 |               4 |         0.45804 |   1672005.00000 |            0.60 | 
DoubleVector    | stringstream_ap | Null            |              10 |               4 |         0.41514 |   1515377.00000 |            0.66 | 
DoubleVector    | sprintf_approac | Null            |              10 |               4 |         0.35436 |   1293521.50000 |            0.77 | 
DoubleVector    | fmt_approach    | Null            |              10 |               4 |         0.34916 |   1274552.75000 |            0.78 | 
DoubleVector    | vector_of_char_ | Null            |              10 |               4 |         0.34366 |   1254462.00000 |            0.80 | 
DoubleVector    | use_cereal      | Null            |              10 |               4 |         0.04172 |    152291.25000 |            6.57 | 
Complete.

I also benchmark for numeric to string conversion algorithms to compare the performance of std::stringstream, fmt::MemoryWriter, and std::vector.

Timer resolution: 0.001000 us
-----------------------------------------------------------------------------------------------------------------------------------------------
     Group      |   Experiment    |   Prob. Space   |     Samples     |   Iterations    |    Baseline     |  us/Iteration   | Iterations/sec  | 
-----------------------------------------------------------------------------------------------------------------------------------------------
DoubleVectorCon | toStringStream  | Null            |              10 |               4 |         1.00000 |   1272667.00000 |            0.79 | 
FloatVectorConv | toStringStream  | Null            |              10 |               4 |         1.00000 |   1272573.75000 |            0.79 | 
int_conversion  | toStringStream  | Null            |              10 |               4 |         1.00000 |    248709.00000 |            4.02 | 
size_t_conversi | toStringStream  | Null            |              10 |               4 |         1.00000 |    252063.00000 |            3.97 | 
DoubleVectorCon | toMemoryWriter  | Null            |              10 |               4 |         0.98468 |   1253165.50000 |            0.80 | 
DoubleVectorCon | toVectorOfChar  | Null            |              10 |               4 |         0.97146 |   1236340.50000 |            0.81 | 
FloatVectorConv | toMemoryWriter  | Null            |              10 |               4 |         0.98419 |   1252454.25000 |            0.80 | 
FloatVectorConv | toVectorOfChar  | Null            |              10 |               4 |         0.97369 |   1239093.25000 |            0.81 | 
int_conversion  | toMemoryWriter  | Null            |              10 |               4 |         0.11741 |     29200.50000 |           34.25 | 
int_conversion  | toVectorOfChar  | Null            |              10 |               4 |         0.87105 |    216637.00000 |            4.62 | 
size_t_conversi | toMemoryWriter  | Null            |              10 |               4 |         0.13746 |     34649.50000 |           28.86 | 
size_t_conversi | toVectorOfChar  | Null            |              10 |               4 |         0.85345 |    215123.00000 |            4.65 | 
Complete.

From the above tables we can see that:

Edgar Rokyan solution is 10% slower than the stringstream solution. The solution that use fmt library is the best for three studied data types which are double, int, and size_t. sprintf + std::vector solution is 1% faster than the fmt solution for double data type. However, I do not recommend solutions that use sprintf for production code because they are not elegant (still written in C style) and do not work out of the box for different data types such as int or size_t.
The benchmark results also show that fmt is the superrior integral data type serialization since it is at least 7x faster than other approaches.
We can speed up this algorithm 10x if we use the binary format. This approach is significantly faster than writing to a formatted text file because we only do raw copy from the memory to the output. If you want to have more flexible and portable solutions then try cereal or boost::serialization or protocol-buffer. According to this performance study cereal seem to be the fastest.

Solution 3 - C++

You can also use a rather neat form of outputting contents of any vector into the file, with a help of iterators and copy function.

std::ofstream fout("vector.txt");
fout.precision(10);

std::copy(numbers.begin(), numbers.end(),
    std::ostream_iterator<double>(fout, "\n"));

This solutions is practically the same with LogicStuff's solution in terms of execution time. But it also illustrates how to print the contents just with a single copy function which, as I suppose, looks pretty well.

Solution 4 - C++

OK, I'm sad that there are three solutions that attempt to give you a fish, but no solution that attempts to teach you how to fish.

When you have a performance problem, the solution is to use a profiler, and fix whatever the problem the profiler shows.

Converting double-to-string for 300,000 doubles will not take 3 minutes on any computer that has shipped in the last 10 years.

Writing 3 MB of data to disk (an average size of 300,000 doubles) will not take 3 minutes on any computer that has shipped in the last 10 years.

If you profile this, my guess is that you'll find that fout gets flushed 300,000 times, and that flushing is slow, because it may involve blocking, or semi-blocking, I/O. Thus, you need to avoid the blocking I/O. The typical way of doing that is to prepare all your I/O to a single buffer (create a stringstream, write to that) and then write that buffer to a physical file in one go. This is the solution hungptit describes, except I think that what's missing is explaining WHY that solution is a good solution.

Or, to put it another way: What the profiler will tell you is that calling write() (on Linux) or WriteFile() (on Windows) is much slower than just copying a few bytes into a memory buffer, because it's a user/kernel level transition. If std::endl causes this to happen for each double, you're going to have a bad (slow) time. Replace it with something that just stays in user space and puts data in RAM!

If that's still not fast enough, it may be that the specific-precision version of operator<<() on strings is slow or involves unnecessary overhead. If so, you may be able to further speed up the code by using sprintf() or some other potentially faster function to generate data into the in-memory buffer, before you finally write the entire buffer to a file in one go.

Solution 5 - C++

You have two main bottlenecks in your program: output and formatting text.

To increase performance, you will want to increase the amount of data output per call. For example, 1 output transfer of 500 characters is faster than 500 transfers of 1 character.

My recommendation is you format the data to a big buffer, then block write the buffer.

Here's an example:

char buffer[1024 * 1024];
unsigned int buffer_index = 0;
const unsigned int size = my_vector.size();
for (unsigned int i = 0; i < size; ++i)
{
  signed int characters_formatted = snprintf(&buffer[buffer_index],
                                             (1024 * 1024) - buffer_index,
                                             "%.10f", my_vector[i]);
  if (characters_formatted > 0)
  {
      buffer_index += (unsigned int) characters_formatted;
  }
}
cout.write(&buffer[0], buffer_index);

You should first try changing optimization settings in your compiler before messing with the code.

Solution 6 - C++

Here is a slightly different solution: save your doubles in binary form.

int fd = ::open("/path/to/the/file", O_WRONLY /* whatever permission */);
::write(fd, &vector[0], vector.size() * sizeof(vector[0]));

Since you mentioned that you have 300k doubles, which equals to 300k * 8 bytes = 2.4M, you can save all of them to local disk file in less than 0.1 second. The only drawback of this method is saved file is not as readable as string representation, but a HexEditor can solve that problem.

If you prefer more robust way, there are plenty of serialization libraries/tools available on line. They provide more benefits, such as language-neutral, machine-independent, flexible compression algorithm, etc. Those are the two I usually use:

Content Type	Original Author	Original Content on Stackoverflow
Question	Diego Fernando Pava	View Question on Stackoverflow
Solution 1 - C++	LogicStuff	View Answer on Stackoverflow
Solution 2 - C++	hungptit	View Answer on Stackoverflow
Solution 3 - C++	Edgar Rokjān	View Answer on Stackoverflow
Solution 4 - C++	Jon Watte	View Answer on Stackoverflow
Solution 5 - C++	Thomas Matthews	View Answer on Stackoverflow
Solution 6 - C++	Jason L.	View Answer on Stackoverflow