How can I read and parse CSV files in C++?

C++ Problem Overview

I need to load and use CSV file data in C++. At this point it can really just be a comma-delimited parser (ie don't worry about escaping new lines and commas). The main need is a line-by-line parser that will return a vector for the next line each time the method is called.

I found this article which looks quite promising: http://www.boost.org/doc/libs/1_35_0/libs/spirit/example/fundamental/list_parser.cpp

I've never used Boost's Spirit, but am willing to try it. But only if there isn't a more straightforward solution I'm overlooking.

C++ Solutions

Solution 1 - C++

If you don't care about escaping comma and newline,
AND you can't embed comma and newline in quotes (If you can't escape then...)
then its only about three lines of code (OK 14 ->But its only 15 to read the whole file).

std::vector<std::string> getNextLineAndSplitIntoTokens(std::istream& str)
{
    std::vector<std::string>   result;
    std::string                line;
    std::getline(str,line);

    std::stringstream          lineStream(line);
    std::string                cell;

    while(std::getline(lineStream,cell, ','))
    {
        result.push_back(cell);
    }
    // This checks for a trailing comma with no data after it.
    if (!lineStream && cell.empty())
    {
        // If there was a trailing comma then add an empty element.
        result.push_back("");
    }
    return result;
}

I would just create a class representing a row.
Then stream into that object:

#include <iterator>
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <string>

class CSVRow
{
    public:
        std::string_view operator[](std::size_t index) const
        {
            return std::string_view(&m_line[m_data[index] + 1], m_data[index + 1] -  (m_data[index] + 1));
        }
        std::size_t size() const
        {
            return m_data.size() - 1;
        }
        void readNextRow(std::istream& str)
        {
            std::getline(str, m_line);

            m_data.clear();
            m_data.emplace_back(-1);
            std::string::size_type pos = 0;
            while((pos = m_line.find(',', pos)) != std::string::npos)
            {
                m_data.emplace_back(pos);
                ++pos;
            }
            // This checks for a trailing comma with no data after it.
            pos   = m_line.size();
            m_data.emplace_back(pos);
        }
    private:
        std::string         m_line;
        std::vector<int>    m_data;
};

std::istream& operator>>(std::istream& str, CSVRow& data)
{
    data.readNextRow(str);
    return str;
}   
int main()
{
    std::ifstream       file("plop.csv");

    CSVRow              row;
    while(file >> row)
    {
        std::cout << "4th Element(" << row[3] << ")\n";
    }
}

But with a little work we could technically create an iterator:

class CSVIterator
{   
    public:
        typedef std::input_iterator_tag     iterator_category;
        typedef CSVRow                      value_type;
        typedef std::size_t                 difference_type;
        typedef CSVRow*                     pointer;
        typedef CSVRow&                     reference;

        CSVIterator(std::istream& str)  :m_str(str.good()?&str:NULL) { ++(*this); }
        CSVIterator()                   :m_str(NULL) {}

        // Pre Increment
        CSVIterator& operator++()               {if (m_str) { if (!((*m_str) >> m_row)){m_str = NULL;}}return *this;}
        // Post increment
        CSVIterator operator++(int)             {CSVIterator    tmp(*this);++(*this);return tmp;}
        CSVRow const& operator*()   const       {return m_row;}
        CSVRow const* operator->()  const       {return &m_row;}

        bool operator==(CSVIterator const& rhs) {return ((this == &rhs) || ((this->m_str == NULL) && (rhs.m_str == NULL)));}
        bool operator!=(CSVIterator const& rhs) {return !((*this) == rhs);}
    private:
        std::istream*       m_str;
        CSVRow              m_row;
};


int main()
{
    std::ifstream       file("plop.csv");

    for(CSVIterator loop(file); loop != CSVIterator(); ++loop)
    {
        std::cout << "4th Element(" << (*loop)[3] << ")\n";
    }
}

Now that we are in 2020 lets add a CSVRange object:

class CSVRange
{
    std::istream&   stream;
    public:
        CSVRange(std::istream& str)
            : stream(str)
        {}
        CSVIterator begin() const {return CSVIterator{stream};}
        CSVIterator end()   const {return CSVIterator{};}
};

int main()
{
    std::ifstream       file("plop.csv");

    for(auto& row: CSVRange(file))
    {
        std::cout << "4th Element(" << row[3] << ")\n";
    }
}

Solution 2 - C++

My version is not using anything but the standard C++11 library. It copes well with Excel CSV quotation:

spam eggs,"foo,bar","""fizz buzz"""
1.23,4.567,-8.00E+09

The code is written as a finite-state machine and is consuming one character at a time. I think it's easier to reason about.

#include <istream>
#include <string>
#include <vector>

enum class CSVState {
    UnquotedField,
    QuotedField,
    QuotedQuote
};

std::vector<std::string> readCSVRow(const std::string &row) {
    CSVState state = CSVState::UnquotedField;
    std::vector<std::string> fields {""};
    size_t i = 0; // index of the current field
    for (char c : row) {
        switch (state) {
            case CSVState::UnquotedField:
                switch (c) {
                    case ',': // end of field
                              fields.push_back(""); i++;
                              break;
                    case '"': state = CSVState::QuotedField;
                              break;
                    default:  fields[i].push_back(c);
                              break; }
                break;
            case CSVState::QuotedField:
                switch (c) {
                    case '"': state = CSVState::QuotedQuote;
                              break;
                    default:  fields[i].push_back(c);
                              break; }
                break;
            case CSVState::QuotedQuote:
                switch (c) {
                    case ',': // , after closing quote
                              fields.push_back(""); i++;
                              state = CSVState::UnquotedField;
                              break;
                    case '"': // "" -> "
                              fields[i].push_back('"');
                              state = CSVState::QuotedField;
                              break;
                    default:  // end of quote
                              state = CSVState::UnquotedField;
                              break; }
                break;
        }
    }
    return fields;
}

/// Read CSV file, Excel dialect. Accept "quoted fields ""with quotes"""
std::vector<std::vector<std::string>> readCSV(std::istream &in) {
    std::vector<std::vector<std::string>> table;
    std::string row;
    while (!in.eof()) {
        std::getline(in, row);
        if (in.bad() || in.fail()) {
            break;
        }
        auto fields = readCSVRow(row);
        table.push_back(fields);
    }
    return table;
}

Solution 3 - C++

Solution using Boost Tokenizer:

std::vector<std::string> vec;
using namespace boost;
tokenizer<escaped_list_separator<char> > tk(
   line, escaped_list_separator<char>('\\', ',', '\"'));
for (tokenizer<escaped_list_separator<char> >::iterator i(tk.begin());
   i!=tk.end();++i) 
{
   vec.push_back(*i);
}

Solution 4 - C++

The C++ String Toolkit Library (StrTk) has a token grid class that allows you to load data either from text files, strings or char buffers, and to parse/process them in a row-column fashion.

You can specify the row delimiters and column delimiters or just use the defaults.

void foo()
{
   std::string data = "1,2,3,4,5\n"
                      "0,2,4,6,8\n"
                      "1,3,5,7,9\n";

   strtk::token_grid grid(data,data.size(),",");

   for(std::size_t i = 0; i < grid.row_count(); ++i)
   {
      strtk::token_grid::row_type r = grid.row(i);
      for(std::size_t j = 0; j < r.size(); ++j)
      {
         std::cout << r.get<int>(j) << "\t";
      }
      std::cout << std::endl;
   }
   std::cout << std::endl;
}

More examples can be found Here

Solution 5 - C++

You can use Boost Tokenizer with escaped_list_separator.

> escaped_list_separator parses a superset of the csv. Boost::tokenizer

This only uses Boost tokenizer header files, no linking to boost libraries required.

Here is an example, (see Parse CSV File With Boost Tokenizer In C++ for details or Boost::tokenizer ):

#include <iostream>     // cout, endl
#include <fstream>      // fstream
#include <vector>
#include <string>
#include <algorithm>    // copy
#include <iterator>     // ostream_operator
#include <boost/tokenizer.hpp>

int main()
{
    using namespace std;
    using namespace boost;
    string data("data.csv");

    ifstream in(data.c_str());
    if (!in.is_open()) return 1;

    typedef tokenizer< escaped_list_separator<char> > Tokenizer;
    vector< string > vec;
    string line;

    while (getline(in,line))
    {
        Tokenizer tok(line);
        vec.assign(tok.begin(),tok.end());

        // vector now contains strings from one row, output to cout here
        copy(vec.begin(), vec.end(), ostream_iterator<string>(cout, "|"));

        cout << "\n----------------------" << endl;
    }
}

Solution 6 - C++

It is not overkill to use Spirit for parsing CSVs. Spirit is well suited for micro-parsing tasks. For instance, with Spirit 2.1, it is as easy as:

bool r = phrase_parse(first, last,

    //  Begin grammar
    (
        double_ % ','
    )
    ,
    //  End grammar

    space, v);

The vector, v, gets stuffed with the values. There is a series of tutorials touching on this in the new Spirit 2.1 docs that's just been released with Boost 1.41.

The tutorial progresses from simple to complex. The CSV parsers are presented somewhere in the middle and touches on various techniques in using Spirit. The generated code is as tight as hand written code. Check out the assembler generated!

Solution 7 - C++

If you DO care about parsing CSV correctly, this will do it...relatively slowly as it works one char at a time.

 void ParseCSV(const string& csvSource, vector<vector<string> >& lines)
    {
       bool inQuote(false);
       bool newLine(false);
       string field;
       lines.clear();
       vector<string> line;
    
       string::const_iterator aChar = csvSource.begin();
       while (aChar != csvSource.end())
       {
          switch (*aChar)
          {
          case '"':
             newLine = false;
             inQuote = !inQuote;
             break;
          
          case ',':
             newLine = false;
             if (inQuote == true)
             {
                field += *aChar;
             }
             else
             {
                line.push_back(field);
                field.clear();
             }
             break;
    
          case '\n':
          case '\r':
             if (inQuote == true)
             {
                field += *aChar;
             }
             else
             {
                if (newLine == false)
                {
                   line.push_back(field);
                   lines.push_back(line);
                   field.clear();
                   line.clear();
                   newLine = true;
                }
             }
             break;
    
          default:
             newLine = false;
             field.push_back(*aChar);
             break;
          }
    
          aChar++;
       }
    
       if (field.size())
          line.push_back(field);
    
       if (line.size())
          lines.push_back(line);
    }

Solution 8 - C++

When using the Boost Tokenizer escaped_list_separator for CSV files, then one should be aware of the following:

It requires an escape-character (default back-slash - \)
It requires a splitter/seperator-character (default comma - ,)
It requires an quote-character (default quote - ")

The CSV format specified by wiki states that data fields can contain separators in quotes (supported):

> 1997,Ford,E350,"Super, luxurious truck"

The CSV format specified by wiki states that single quotes should be handled with double-quotes (escaped_list_separator will strip away all quote characters):

> 1997,Ford,E350,"Super ""luxurious"" truck"

The CSV format doesn't specify that any back-slash characters should be stripped away (escaped_list_separator will strip away all escape characters).

A possible work-around to fix the default behavior of the boost escaped_list_separator:

First replace all back-slash characters (\) with two back-slash characters (\\) so they are not stripped away.
Secondly replace all double-quotes ("") with a single back-slash character and a quote (\")

This work-around has the side-effect that empty data-fields that are represented by a double-quote, will be transformed into a single-quote-token. When iterating through the tokens, then one must check if the token is a single-quote, and treat it like an empty string.

Not pretty but it works, as long there are not newlines within the quotes.

Solution 9 - C++

As all the CSV questions seem to get redirected here, I thought I'd post my answer here. This answer does not directly address the asker's question. I wanted to be able to read in a stream that is known to be in CSV format, and also the types of each field was already known. Of course, the method below could be used to treat every field to be a string type.

As an example of how I wanted to be able to use a CSV input stream, consider the following input (taken from wikipedia's page on CSV):

const char input[] =
"Year,Make,Model,Description,Price\n"
"1997,Ford,E350,\"ac, abs, moon\",3000.00\n"
"1999,Chevy,\"Venture \"\"Extended Edition\"\"\",\"\",4900.00\n"
"1999,Chevy,\"Venture \"\"Extended Edition, Very Large\"\"\",\"\",5000.00\n"
"1996,Jeep,Grand Cherokee,\"MUST SELL!\n\
air, moon roof, loaded\",4799.00\n"
;

Then, I wanted to be able to read in the data like this:

std::istringstream ss(input);
std::string title[5];
int year;
std::string make, model, desc;
float price;
csv_istream(ss)
    >> title[0] >> title[1] >> title[2] >> title[3] >> title[4];
while (csv_istream(ss)
       >> year >> make >> model >> desc >> price) {
    //...do something with the record...
}

This was the solution I ended up with.

struct csv_istream {
    std::istream &is_;
    csv_istream (std::istream &is) : is_(is) {}
    void scan_ws () const {
        while (is_.good()) {
            int c = is_.peek();
            if (c != ' ' && c != '\t') break;
            is_.get();
        }
    }
    void scan (std::string *s = 0) const {
        std::string ws;
        int c = is_.get();
        if (is_.good()) {
            do {
                if (c == ',' || c == '\n') break;
                if (s) {
                    ws += c;
                    if (c != ' ' && c != '\t') {
                        *s += ws;
                        ws.clear();
                    }
                }
                c = is_.get();
            } while (is_.good());
            if (is_.eof()) is_.clear();
        }
    }
    template <typename T, bool> struct set_value {
        void operator () (std::string in, T &v) const {
            std::istringstream(in) >> v;
        }
    };
    template <typename T> struct set_value<T, true> {
        template <bool SIGNED> void convert (std::string in, T &v) const {
            if (SIGNED) v = ::strtoll(in.c_str(), 0, 0);
            else v = ::strtoull(in.c_str(), 0, 0);
        }
        void operator () (std::string in, T &v) const {
            convert<is_signed_int<T>::val>(in, v);
        }
    };
    template <typename T> const csv_istream & operator >> (T &v) const {
        std::string tmp;
        scan(&tmp);
        set_value<T, is_int<T>::val>()(tmp, v);
        return *this;
    }
    const csv_istream & operator >> (std::string &v) const {
        v.clear();
        scan_ws();
        if (is_.peek() != '"') scan(&v);
        else {
            std::string tmp;
            is_.get();
            std::getline(is_, tmp, '"');
            while (is_.peek() == '"') {
                v += tmp;
                v += is_.get();
                std::getline(is_, tmp, '"');
            }
            v += tmp;
            scan();
        }
        return *this;
    }
    template <typename T>
    const csv_istream & operator >> (T &(*manip)(T &)) const {
        is_ >> manip;
        return *this;
    }
    operator bool () const { return !is_.fail(); }
};

With the following helpers that may be simplified by the new integral traits templates in C++11:

template <typename T> struct is_signed_int { enum { val = false }; };
template <> struct is_signed_int<short> { enum { val = true}; };
template <> struct is_signed_int<int> { enum { val = true}; };
template <> struct is_signed_int<long> { enum { val = true}; };
template <> struct is_signed_int<long long> { enum { val = true}; };

template <typename T> struct is_unsigned_int { enum { val = false }; };
template <> struct is_unsigned_int<unsigned short> { enum { val = true}; };
template <> struct is_unsigned_int<unsigned int> { enum { val = true}; };
template <> struct is_unsigned_int<unsigned long> { enum { val = true}; };
template <> struct is_unsigned_int<unsigned long long> { enum { val = true}; };

template <typename T> struct is_int {
    enum { val = (is_signed_int<T>::val || is_unsigned_int<T>::val) };
};

[Try it online!](https://tio.run/##rVZtT@NGEP7uXzG4UrDpFuXK0VNev/ToqRVUlaAnVQRFxt6EFc6u5d2YIi6/nc6sHbKOHTig5iWb3Zlnnnn1xln20zyOH38QMk6XCYehUNrkPFqMvc2eXm95hi@yNDK4Z@4zLqMFh4sx4PEyNiD0VIu55MlUSAMPwOVygR9FlMIIZlGqOawG@OugtOsO9Y3KzXgbAiX5dwLg33vUUyXn79WHF0BejOVSviearvZw/QXeFth2rLfEuB3pTeHeDfXuyNuAe4BPDSGoJ/li3O/TwbdvDTrVUUjWVoPHCjnWxVSUrVTBa5P0@@utDqIM7LYrGGzLhNAnewGtHlZWvlACUxtHcnqnIQghVlKvPaDn7kak3NI/nCuVBGHoHNJDDsfoIUlknN8G4aB@PoMghr0R7ONPpwPlemL2Q7hGXrdb0mSHGxek5LnFtvINXROYtAONBLpN8q7Mnd5Aupy3rBHd3c4mamtj4@EIvWL7lNFyPZHtHq41dNgCZSOu4ccRxIPWw2ei2Y5Gz4FFdAPQtHkYpzzKt5NXz8HzOzsCaoVbyqhZJXTI1YyiTssGIbcQ2tqQwbVS6VMzam6m2EhL7kTG1o/KeB4ZlVO518pISAYX0CmahVRrOBQtWyoQMoTxGIoGx8FOkk16QyROA2bs2Ntokktw/vuXP08@j0v6yK3guXkNdYpuiRFCgWmyakalKXpwGE/xSxAy6OLvVlo4vSocjeXzKqv/JdCVg8PWkTkOSLkIXxXy0oo7GjsbcpjAoJ2MS9ksso1JmkBBB7ccGrV8li@CDecwQOE67ZybZS7hwNyIqjFLR17k6rJqY100O6ea721zrhzZdor4OERKz1yetgRaOqEtLjsm@JMO7qdCcjLMwEaEjNYFnUlRcRtV3JoDrqC51mDwdLKDyqvo1Odcu0EbtNY22JXk1jL9vvRjqQYHi0iKLKR12CwAdIckrczL9fYEbifN5gawVtijKM4ikaKHqIJXEnp7LiIhSfjBqwjfRDm2dbY0l1cw8vx/sALZWXTL2ZlKeMo@cx3nIjNCSfZXLmI@kb7nf@j1PrHfVJ6wk6PjLpv4UcwgutYMFkrJic@Out3uYbe7Fu6xX294cY@CX7lEdhwm/sQ/@ddwmeAF7iQRZID2UNf@@9h7AwCDrzy/h9Mon3MX7LjO5hf2B@cZ@5JHMgHEzdUt5yh59vf5BZyfnJ7uTeTEi0ReugO5UjOGF8wITRG1T71ehYb3yubbBbQObESxsmodJ0zKL4@vBh4l4h4DXT9fUNTRIoUdEoz7wJuhUQMZxX3gObUVaB3aIsByKWG7V5v1B2f9s7M@ctYfkUbVsW24JTRxLCvylttP4kYLYkefllnoXmxjtTQwHILfJ@XRpU9fLA5tXvlbDW0lCb6StJaekSQCa1FLZrcscaxELd3dktaLStSuS1lMMN7kPW/1@B8 "C++ (gcc) – Try It Online")

Solution 10 - C++

I wrote a header-only, C++11 CSV parser. It's well tested, fast, supports the entire CSV spec (quoted fields, delimiter/terminator in quotes, quote escaping, etc.), and is configurable to account for the CSVs that don't adhere to the specification.

Configuration is done through a fluent interface:

// constructor accepts any input stream
CsvParser parser = CsvParser(std::cin)
  .delimiter(';')    // delimited by ; instead of ,
  .quote('\'')       // quoted fields use ' instead of "
  .terminator('\0'); // terminated by \0 instead of by \r\n, \n, or \r

Parsing is just a range based for loop:

#include <iostream>
#include "../parser.hpp"

using namespace aria::csv;

int main() {
  std::ifstream f("some_file.csv");
  CsvParser parser(f);

  for (auto& row : parser) {
    for (auto& field : row) {
      std::cout << field << " | ";
    }
    std::cout << std::endl;
  }
}

Solution 11 - C++

You might want to look at my FOSS project CSVfix (updated link), which is a CSV stream editor written in C++. The CSV parser is no prize, but does the job and the whole package may do what you need without you writing any code.

See alib/src/a_csv.cpp for the CSV parser, and csvlib/src/csved_ioman.cpp (IOManager::ReadCSV) for a usage example.

Solution 12 - C++

Another CSV I/O library can be found here:

http://code.google.com/p/fast-cpp-csv-parser/

#include "csv.h"

int main(){
  io::CSVReader<3> in("ram.csv");
  in.read_header(io::ignore_extra_column, "vendor", "size", "speed");
  std::string vendor; int size; double speed;
  while(in.read_row(vendor, size, speed)){
    // do stuff with the data
  }
}

Solution 13 - C++

Another solution similar to Loki Astari's answer, in C++11. Rows here are std::tuples of a given type. The code scans one line, then scans until each delimiter, and then converts and dumps the value directly into the tuple (with a bit of template code).

for (auto row : csv<std::string, int, float>(file, ',')) {
    std::cout << "first col: " << std::get<0>(row) << std::endl;
}

Advanges:

quite clean and simple to use, only C++11.
automatic type conversion into std::tuple<t1, ...> via operator>>.

What's missing:

escaping and quoting
no error handling in case of malformed CSV.

The main code:

#include <iterator>
#include <sstream>
#include <string>

namespace csvtools {
    /// Read the last element of the tuple without calling recursively
    template <std::size_t idx, class... fields>
    typename std::enable_if<idx >= std::tuple_size<std::tuple<fields...>>::value - 1>::type
    read_tuple(std::istream &in, std::tuple<fields...> &out, const char delimiter) {
        std::string cell;
        std::getline(in, cell, delimiter);
        std::stringstream cell_stream(cell);
        cell_stream >> std::get<idx>(out);
    }

    /// Read the @p idx-th element of the tuple and then calls itself with @p idx + 1 to
    /// read the next element of the tuple. Automatically falls in the previous case when
    /// reaches the last element of the tuple thanks to enable_if
    template <std::size_t idx, class... fields>
    typename std::enable_if<idx < std::tuple_size<std::tuple<fields...>>::value - 1>::type
    read_tuple(std::istream &in, std::tuple<fields...> &out, const char delimiter) {
        std::string cell;
        std::getline(in, cell, delimiter);
        std::stringstream cell_stream(cell);
        cell_stream >> std::get<idx>(out);
        read_tuple<idx + 1, fields...>(in, out, delimiter);
    }
}

/// Iterable csv wrapper around a stream. @p fields the list of types that form up a row.
template <class... fields>
class csv {
    std::istream &_in;
    const char _delim;
public:
    typedef std::tuple<fields...> value_type;
    class iterator;

    /// Construct from a stream.
    inline csv(std::istream &in, const char delim) : _in(in), _delim(delim) {}

    /// Status of the underlying stream
    /// @{
    inline bool good() const {
        return _in.good();
    }
    inline const std::istream &underlying_stream() const {
        return _in;
    }
    /// @}

    inline iterator begin();
    inline iterator end();
private:

    /// Reads a line into a stringstream, and then reads the line into a tuple, that is returned
    inline value_type read_row() {
        std::string line;
        std::getline(_in, line);
        std::stringstream line_stream(line);
        std::tuple<fields...> retval;
        csvtools::read_tuple<0, fields...>(line_stream, retval, _delim);
        return retval;
    }
};

/// Iterator; just calls recursively @ref csv::read_row and stores the result.
template <class... fields>
class csv<fields...>::iterator {
    csv::value_type _row;
    csv *_parent;
public:
    typedef std::input_iterator_tag iterator_category;
    typedef csv::value_type         value_type;
    typedef std::size_t             difference_type;
    typedef csv::value_type *       pointer;
    typedef csv::value_type &       reference;

    /// Construct an empty/end iterator
    inline iterator() : _parent(nullptr) {}
    /// Construct an iterator at the beginning of the @p parent csv object.
    inline iterator(csv &parent) : _parent(parent.good() ? &parent : nullptr) {
        ++(*this);
    }

    /// Read one row, if possible. Set to end if parent is not good anymore.
    inline iterator &operator++() {
        if (_parent != nullptr) {
            _row = _parent->read_row();
            if (!_parent->good()) {
                _parent = nullptr;
            }
        }
        return *this;
    }

    inline iterator operator++(int) {
        iterator copy = *this;
        ++(*this);
        return copy;
    }

    inline csv::value_type const &operator*() const {
        return _row;
    }

    inline csv::value_type const *operator->() const {
        return &_row;
    }

    bool operator==(iterator const &other) {
        return (this == &other) or (_parent == nullptr and other._parent == nullptr);
    }
    bool operator!=(iterator const &other) {
        return not (*this == other);
    }
};

template <class... fields>
typename csv<fields...>::iterator csv<fields...>::begin() {
    return iterator(*this);
}

template <class... fields>
typename csv<fields...>::iterator csv<fields...>::end() {
    return iterator();
}

I put a tiny working example on GitHub; I've been using it for parsing some numerical data and it served its purpose.

Solution 14 - C++

Here is another implementation of a Unicode CSV parser (works with wchar_t). I wrote part of it, while Jonathan Leffler wrote the rest.

Note: This parser is aimed at replicating Excel's behavior as closely as possible, specifically when importing broken or malformed CSV files.

This is the original question - https://stackoverflow.com/questions/15520113/parsing-csv-file-with-multiline-fields-and-escaped-double-quotes

This is the code as a SSCCE (Short, Self-Contained, Correct Example).

#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

extern const wchar_t *nextCsvField(const wchar_t *p, wchar_t sep, bool *newline);

// Returns a pointer to the start of the next field,
// or zero if this is the last field in the CSV
// p is the start position of the field
// sep is the separator used, i.e. comma or semicolon
// newline says whether the field ends with a newline or with a comma
const wchar_t *nextCsvField(const wchar_t *p, wchar_t sep, bool *newline)
{
    // Parse quoted sequences
    if ('"' == p[0]) {
        p++;
        while (1) {
            // Find next double-quote
            p = wcschr(p, L'"');
            // If we don't find it or it's the last symbol
            // then this is the last field
            if (!p || !p[1])
                return 0;
            // Check for "", it is an escaped double-quote
            if (p[1] != '"')
                break;
            // Skip the escaped double-quote
            p += 2;
        }
    }

    // Find next newline or comma.
    wchar_t newline_or_sep[4] = L"\n\r ";
    newline_or_sep[2] = sep;
    p = wcspbrk(p, newline_or_sep);

    // If no newline or separator, this is the last field.
    if (!p)
        return 0;

    // Check if we had newline.
    *newline = (p[0] == '\r' || p[0] == '\n');

    // Handle "\r\n", otherwise just increment
    if (p[0] == '\r' && p[1] == '\n')
        p += 2;
    else
        p++;

    return p;
}

static wchar_t *csvFieldData(const wchar_t *fld_s, const wchar_t *fld_e, wchar_t *buffer, size_t buflen)
{
    wchar_t *dst = buffer;
    wchar_t *end = buffer + buflen - 1;
    const wchar_t *src = fld_s;

    if (*src == L'"')
    {
        const wchar_t *p = src + 1;
        while (p < fld_e && dst < end)
        {
            if (p[0] == L'"' && p+1 < fld_s && p[1] == L'"')
            {
                *dst++ = p[0];
                p += 2;
            }
            else if (p[0] == L'"')
            {
                p++;
                break;
            }
            else
                *dst++ = *p++;
        }
        src = p;
    }
    while (src < fld_e && dst < end)
        *dst++ = *src++;
    if (dst >= end)
        return 0;
    *dst = L'\0';
    return(buffer);
}

static void dissect(const wchar_t *line)
{
    const wchar_t *start = line;
    const wchar_t *next;
    bool     eol;
    wprintf(L"Input %3zd: [%.*ls]\n", wcslen(line), wcslen(line)-1, line);
    while ((next = nextCsvField(start, L',', &eol)) != 0)
    {
        wchar_t buffer[1024];
        wprintf(L"Raw Field: [%.*ls] (eol = %d)\n", (next - start - eol), start, eol);
        if (csvFieldData(start, next-1, buffer, sizeof(buffer)/sizeof(buffer[0])) != 0)
            wprintf(L"Field %3zd: [%ls]\n", wcslen(buffer), buffer);
        start = next;
    }
}

static const wchar_t multiline[] =
   L"First field of first row,\"This field is multiline\n"
    "\n"
    "but that's OK because it's enclosed in double quotes, and this\n"
    "is an escaped \"\" double quote\" but this one \"\" is not\n"
    "   \"This is second field of second row, but it is not multiline\n"
    "   because it doesn't start \n"
    "   with an immediate double quote\"\n"
    ;

int main(void)
{
    wchar_t line[1024];

    while (fgetws(line, sizeof(line)/sizeof(line[0]), stdin))
        dissect(line);
    dissect(multiline);

    return 0;
}

Solution 15 - C++

This is an old thread but its still at the top of search results, so I'm adding my solution using std::stringstream and a simple string replace method by Yves Baumes I found here.

The following example will read a file line by line, ignore comment lines starting with // and parse the other lines into a combination of strings, ints and doubles. Stringstream does the parsing, but expects fields to be delimited by whitespace, so I use stringreplace to turn commas into spaces first. It handles tabs ok, but doesn't deal with quoted strings.

Bad or missing input is simply ignored, which may or may not be good, depending on your circumstance.

#include <string>
#include <sstream>
#include <fstream>

void StringReplace(std::string& str, const std::string& oldStr, const std::string& newStr)
// code by  Yves Baumes
// http://stackoverflow.com/questions/1494399/how-do-i-search-find-and-replace-in-a-standard-string
{
  size_t pos = 0;
  while((pos = str.find(oldStr, pos)) != std::string::npos)
  {
     str.replace(pos, oldStr.length(), newStr);
     pos += newStr.length();
  }
}

void LoadCSV(std::string &filename) {
   std::ifstream stream(filename);
   std::string in_line;
   std::string Field;
   std::string Chan;
   int ChanType;
   double Scale;
   int Import;
   while (std::getline(stream, in_line)) {
      StringReplace(in_line, ",", " ");
      std::stringstream line(in_line);
      line >> Field >> Chan >> ChanType >> Scale >> Import;
      if (Field.substr(0,2)!="//") {
         // do your stuff 
         // this is CBuilder code for demonstration, sorry
         ShowMessage((String)Field.c_str() + "\n" + Chan.c_str() + "\n" + IntToStr(ChanType) + "\n" +FloatToStr(Scale) + "\n" +IntToStr(Import));
      }
   }
}

Solution 16 - C++

I needed an easy-to-use C++ library for parsing CSV files but couldn't find any available, so I ended up building one. Rapidcsv is a C++11 header-only library which gives direct access to parsed columns (or rows) as vectors, in datatype of choice. For example:

#include <iostream>
#include <vector>
#include <rapidcsv.h>

int main()
{
  rapidcsv::Document doc("../tests/msft.csv");

  std::vector<float> close = doc.GetColumn<float>("Close");
  std::cout << "Read " << close.size() << " values." << std::endl;
}

Solution 17 - C++

You can use the header-only Csv::Parser library.

It fully supports RFC 4180, including quoted values, escaped quotes, and newlines in field values.
It requires only standard C++ (C++17).
It supports reading CSV data from std::string_view at compile-time.
It's extensively tested using Catch2.

Solution 18 - C++

Here is code for reading a matrix, note you also have a csvwrite function in matlab

void loadFromCSV( const std::string& filename )
{
    std::ifstream       file( filename.c_str() );
    std::vector< std::vector<std::string> >   matrix;
    std::vector<std::string>   row;
    std::string                line;
    std::string                cell;

    while( file )
    {
        std::getline(file,line);
        std::stringstream lineStream(line);
        row.clear();

        while( std::getline( lineStream, cell, ',' ) )
            row.push_back( cell );

        if( !row.empty() )
            matrix.push_back( row );
    }

    for( int i=0; i<int(matrix.size()); i++ )
    {
        for( int j=0; j<int(matrix[i].size()); j++ )
            std::cout << matrix[i][j] << " ";

        std::cout << std::endl;
    }
}

Solution 19 - C++

Excuse me, but this all seems like a great deal of elaborate syntax to hide a few lines of code.

Why not this:

/**

  Read line from a CSV file

  @param[in] fp file pointer to open file
  @param[in] vls reference to vector of strings to hold next line

  */
void readCSV( FILE *fp, std::vector<std::string>& vls )
{
	vls.clear();
	if( ! fp )
		return;
	char buf[10000];
	if( ! fgets( buf,999,fp) )
		return;
	std::string s = buf;
	int p,q;
	q = -1;
	// loop over columns
	while( 1 ) {
		p = q;
		q = s.find_first_of(",\n",p+1);
		if( q == -1 ) 
			break;
		vls.push_back( s.substr(p+1,q-p-1) );
	}
}

int _tmain(int argc, _TCHAR* argv[])
{
	std::vector<std::string> vls;
	FILE * fp = fopen( argv[1], "r" );
	if( ! fp )
		return 1;
	readCSV( fp, vls );
	readCSV( fp, vls );
	readCSV( fp, vls );
	std::cout << "row 3, col 4 is " << vls[3].c_str() << "\n";

	return 0;
}

Solution 20 - C++

You can open and read .csv file using fopen ,fscanf functions ,but the important thing is to parse the data.Simplest way to parse the data using delimiter.In case of .csv , delimiter is ','.

Suppose your data1.csv file is as follows :

A,45,76,01
B,77,67,02
C,63,76,03
D,65,44,04

you can tokenize data and store in char array and later use atoi() etc function for appropriate conversions

FILE *fp;
char str1[10], str2[10], str3[10], str4[10];

fp = fopen("G:\\data1.csv", "r");
if(NULL == fp)
{
	printf("\nError in opening file.");
	return 0;
}
while(EOF != fscanf(fp, " %[^,], %[^,], %[^,], %s, %s, %s, %s ", str1, str2, str3, str4))
{
	printf("\n%s %s %s %s", str1, str2, str3, str4);
}
fclose(fp);

[^,], ^ -it inverts logic , means match any string that does not contain comma then last , says to match comma that terminated previous string.

Solution 21 - C++

The first thing you need to do is make sure the file exists. To accomplish this you just need to try and open the file stream at the path. After you have opened the file stream use stream.fail() to see if it worked as expected, or not.

bool fileExists(string fileName)
{

ifstream test;

test.open(fileName.c_str());

if (test.fail())
{
	test.close();
	return false;
}
else
{
	test.close();
	return true;
}
}

You must also verify that the file provided is the correct type of file. To accomplish this you need to look through the file path provided until you find the file extension. Once you have the file extension make sure that it is a .csv file.

bool verifyExtension(string filename)
{
int period = 0;

for (unsigned int i = 0; i < filename.length(); i++)
{
	if (filename[i] == '.')
		period = i;
}

string extension;

for (unsigned int i = period; i < filename.length(); i++)
	extension += filename[i];

if (extension == ".csv")
	return true;
else
	return false;
}

This function will return the file extension which is used later in an error message.

string getExtension(string filename)
{
int period = 0;

for (unsigned int i = 0; i < filename.length(); i++)
{
	if (filename[i] == '.')
		period = i;
}

string extension;

if (period != 0)
{
	for (unsigned int i = period; i < filename.length(); i++)
		extension += filename[i];
}
else
	extension = "NO FILE";

return extension;
}

This function will actually call the error checks created above and then parse through the file.

void parseFile(string fileName)
{
	if (fileExists(fileName) && verifyExtension(fileName))
	{
		ifstream fs;
		fs.open(fileName.c_str());
		string fileCommand;

		while (fs.good())
		{
			string temp;
			
			getline(fs, fileCommand, '\n');

			for (unsigned int i = 0; i < fileCommand.length(); i++)
			{
				if (fileCommand[i] != ',')
					temp += fileCommand[i];
				else
					temp += " ";
			}

			if (temp != "\0")
			{
				// Place your code here to run the file.
			}
		}
		fs.close();
	}
	else if (!fileExists(fileName))
	{
		cout << "Error: The provided file does not exist: " << fileName << endl;
		 
		if (!verifyExtension(fileName))
		{
			if (getExtension(fileName) != "NO FILE")
				cout << "\tCheck the file extension." << endl;
			else
				cout << "\tThere is no file in the provided path." << endl;
		}
	}
	else if (!verifyExtension(fileName)) 
	{
		if (getExtension(fileName) != "NO FILE")
			cout << "Incorrect file extension provided: " << getExtension(fileName) << endl;
		else
			cout << "There is no file in the following path: " << fileName << endl;
	}
}

Solution 22 - C++

Since i'm not used to boost right now, I will suggest a more simple solution. Lets suppose that your .csv file has 100 lines with 10 numbers in each line separated by a ','. You could load this data in the form of an array with the following code:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
	int A[100][10];
	ifstream ifs;
	ifs.open("name_of_file.csv");
	string s1;
	char c;
	for(int k=0; k<100; k++)
	{
		getline(ifs,s1);
		stringstream stream(s1);
		int j=0;
		while(1)
		{
			stream >>A[k][j];
			stream >> c;
			j++;
			if(!stream) {break;}
		}
	}
	
	
}

Solution 23 - C++

You can use this library: https://github.com/vadamsky/csvworker

Code for example:

#include <iostream>
#include "csvworker.h"

using namespace std;

int main()
{
    //
    CsvWorker csv;
    csv.loadFromFile("example.csv");
    cout << csv.getRowsNumber() << "  " << csv.getColumnsNumber() << endl;

    csv.getFieldRef(0, 2) = "0";
    csv.getFieldRef(1, 1) = "0";
    csv.getFieldRef(1, 3) = "0";
    csv.getFieldRef(2, 0) = "0";
    csv.getFieldRef(2, 4) = "0";
    csv.getFieldRef(3, 1) = "0";
    csv.getFieldRef(3, 3) = "0";
    csv.getFieldRef(4, 2) = "0";

    for(unsigned int i=0;i<csv.getRowsNumber();++i)
    {
        //cout << csv.getRow(i) << endl;
        for(unsigned int j=0;j<csv.getColumnsNumber();++j)
        {
            cout << csv.getField(i, j) << ".";
        }
        cout << endl;
    }

    csv.saveToFile("test.csv");

    //
    CsvWorker csv2(4,4);

    csv2.getFieldRef(0, 0) = "a";
    csv2.getFieldRef(0, 1) = "b";
    csv2.getFieldRef(0, 2) = "r";
    csv2.getFieldRef(0, 3) = "a";
    csv2.getFieldRef(1, 0) = "c";
    csv2.getFieldRef(1, 1) = "a";
    csv2.getFieldRef(1, 2) = "d";
    csv2.getFieldRef(2, 0) = "a";
    csv2.getFieldRef(2, 1) = "b";
    csv2.getFieldRef(2, 2) = "r";
    csv2.getFieldRef(2, 3) = "a";

    csv2.saveToFile("test2.csv");

    return 0;
}

Solution 24 - C++

You gotta feel proud when you use something so beautiful as boost::spirit

Here my attempt of a parser (almost) complying with the CSV specifications on this link CSV specs (I didn't need line breaks within fields. Also the spaces around the commas are dismissed).

After you overcome the shocking experience of waiting 10 seconds for compiling this code :), you can sit back and enjoy.

// csvparser.cpp
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>

#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;
namespace bascii = boost::spirit::ascii;

template <typename Iterator>
struct csv_parser : qi::grammar<Iterator, std::vector<std::string>(), 
    bascii::space_type>
{
    qi::rule<Iterator, char()                                           > COMMA;
    qi::rule<Iterator, char()                                           > DDQUOTE;
    qi::rule<Iterator, std::string(),               bascii::space_type  > non_escaped;
    qi::rule<Iterator, std::string(),               bascii::space_type  > escaped;
    qi::rule<Iterator, std::string(),               bascii::space_type  > field;
    qi::rule<Iterator, std::vector<std::string>(),  bascii::space_type  > start;

    csv_parser() : csv_parser::base_type(start)
    {
        using namespace qi;
        using qi::lit;
        using qi::lexeme;
        using bascii::char_;

        start       = field % ',';
        field       = escaped | non_escaped;
        escaped     = lexeme['"' >> *( char_ -(char_('"') | ',') | COMMA | DDQUOTE)  >> '"'];
        non_escaped = lexeme[       *( char_ -(char_('"') | ',')                  )        ];
        DDQUOTE     = lit("\"\"")       [_val = '"'];
        COMMA       = lit(",")          [_val = ','];
    }

};

int main()
{
    std::cout << "Enter CSV lines [empty] to quit\n";

    using bascii::space;
    typedef std::string::const_iterator iterator_type;
    typedef csv_parser<iterator_type> csv_parser;

    csv_parser grammar;
    std::string str;
    int fid;
    while (getline(std::cin, str))
    {
        fid = 0;

        if (str.empty())
            break;

        std::vector<std::string> csv;
        std::string::const_iterator it_beg = str.begin();
        std::string::const_iterator it_end = str.end();
        bool r = phrase_parse(it_beg, it_end, grammar, space, csv);

        if (r && it_beg == it_end)
        {
            std::cout << "Parsing succeeded\n";
            for (auto& field: csv)
            {
                std::cout << "field " << ++fid << ": " << field << std::endl;
            }
        }
        else
        {
            std::cout << "Parsing failed\n";
        }
    }

    return 0;
}

Compile:

make csvparser

Test (example stolen from Wikipedia):

./csvparser
Enter CSV lines [empty] to quit

1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
Parsing succeeded
field 1: 1999
field 2: Chevy
field 3: Venture "Extended Edition, Very Large"
field 4: 
field 5: 5000.00

1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00"
Parsing failed

Solution 25 - C++

This solution detects these 4 cases

complete class is at

https://github.com/pedro-vicente/csv-parser

1,field 2,field 3,
1,field 2,"field 3 quoted, with separator",
1,field 2,"field 3
with newline",
1,field 2,"field 3
with newline and separator,",

It reads the file character by character, and reads 1 row at a time to a vector (of strings), therefore suitable for very large files.

Usage is

Iterate until an empty row is returned (end of file). A row is a vector where each entry is a CSV column.

read_csv_t csv;
csv.open("../test.csv");
std::vector<std::string> row;
while (true)
{
  row = csv.read_row();
  if (row.size() == 0)
  {
    break;
  }
}

the class declaration

class read_csv_t
{
public:
  read_csv_t();
  int open(const std::string &file_name);
  std::vector<std::string> read_row();
private:
  std::ifstream m_ifs;
};

the implementation

std::vector<std::string> read_csv_t::read_row()
{
  bool quote_mode = false;
  std::vector<std::string> row;
  std::string column;
  char c;
  while (m_ifs.get(c))
  {
    switch (c)
    {
      /////////////////////////////////////////////////////////////////////////////////////////////////////
      //separator ',' detected. 
      //in quote mode add character to column
      //push column if not in quote mode
      /////////////////////////////////////////////////////////////////////////////////////////////////////

    case ',':
      if (quote_mode == true)
      {
        column += c;
      }
      else
      {
        row.push_back(column);
        column.clear();
      }
      break;

      /////////////////////////////////////////////////////////////////////////////////////////////////////
      //quote '"' detected. 
      //toggle quote mode
      /////////////////////////////////////////////////////////////////////////////////////////////////////

    case '"':
      quote_mode = !quote_mode;
      break;

      /////////////////////////////////////////////////////////////////////////////////////////////////////
      //line end detected
      //in quote mode add character to column
      //return row if not in quote mode
      /////////////////////////////////////////////////////////////////////////////////////////////////////

    case '\n':
    case '\r':
      if (quote_mode == true)
      {
        column += c;
      }
      else
      {
        return row;
      }
      break;

      /////////////////////////////////////////////////////////////////////////////////////////////////////
      //default, add character to column
      /////////////////////////////////////////////////////////////////////////////////////////////////////

    default:
      column += c;
      break;
    }
  }

  //return empty vector if end of file detected 
  m_ifs.close();
  std::vector<std::string> v;
  return v;
}

Solution 26 - C++

Parsing CSV file lines with Stream

I wrote a small example of parsing CSV file lines, it can be developed with for and while loops if desired:

#include <iostream>
#include <fstream>
#include <string.h>

using namespace std;

int main() {


ifstream fin("Infile.csv");
ofstream fout("OutFile.csv");
string strline, strremain, strCol1 , strout;

string delimeter =";";

int d1;

to continue until the end of the file:

while (!fin.eof()){

get first line from InFile :

	getline(fin,strline,'\n');

find delimeter position in line:

	d1 = strline.find(';');

and parse first column:

	strCol1 = strline.substr(0,d1); // parse first Column
	d1++;
	strremain = strline.substr(d1); // remaining line

create output line in CSV format:

	strout.append(strCol1);
	strout.append(delimeter);

write line to Out File:

	fout << strout << endl; //out file line
	
} 

fin.close();
fout.close();

return(0);
}

This code is compiled and running. Good luck!

Solution 27 - C++

You could also take a look at capabilities of Qt library.

It has regular expressions support and QString class has nice methods, e.g. split() returning QStringList, list of strings obtained by splitting the original string with a provided delimiter. Should suffice for csv file..

To get a column with a given header name I use following: https://stackoverflow.com/questions/970330/c-inheritance-qt-problem-qstring/1011601#1011601

Solution 28 - C++

If you don't want to deal with including boost in your project (it is considerably large if all you are going to use it for is CSV parsing...)

I have had luck with the CSV parsing here:

http://www.zedwood.com/article/112/cpp-csv-parser

It handles quoted fields - but does not handle inline \n characters (which is probably fine for most uses).

Solution 29 - C++

For what it is worth, here is my implementation. It deals with wstring input, but could be adjusted to string easily. It does not handle newline in fields (as my application does not either, but adding its support isn't too difficult) and it does not comply with "\r\n" end of line as per RFC (assuming you use std::getline), but it does handle whitespace trimming and double-quotes correctly (hopefully).

using namespace std;

// trim whitespaces around field or double-quotes, remove double-quotes and replace escaped double-quotes (double double-quotes)
wstring trimquote(const wstring& str, const wstring& whitespace, const wchar_t quotChar)
{
	wstring ws;
    wstring::size_type strBegin = str.find_first_not_of(whitespace);
    if (strBegin == wstring::npos)
        return L"";

    wstring::size_type strEnd = str.find_last_not_of(whitespace);
    wstring::size_type strRange = strEnd - strBegin + 1;

	if((str[strBegin] == quotChar) && (str[strEnd] == quotChar))
	{
		ws = str.substr(strBegin+1, strRange-2);
		strBegin = 0;
		while((strEnd = ws.find(quotChar, strBegin)) != wstring::npos)
		{
			ws.erase(strEnd, 1);
			strBegin = strEnd+1;
		}

	}
	else
		ws = str.substr(strBegin, strRange);
    return ws;
}

pair<unsigned, unsigned> nextCSVQuotePair(const wstring& line, const wchar_t quotChar, unsigned ofs = 0)
{
	pair<unsigned, unsigned> r;
	r.first = line.find(quotChar, ofs);
	r.second = wstring::npos;
	if(r.first != wstring::npos)
	{
		r.second = r.first;
		while(((r.second = line.find(quotChar, r.second+1)) != wstring::npos)
			&& (line[r.second+1] == quotChar)) // WARNING: assumes null-terminated string such that line[r.second+1] always exist
			r.second++;

	}
	return r;
}

unsigned parseLine(vector<wstring>& fields, const wstring& line)
{
	unsigned ofs, ofs0, np;
	const wchar_t delim = L',';
	const wstring whitespace = L" \t\xa0\x3000\x2000\x2001\x2002\x2003\x2004\x2005\x2006\x2007\x2008\x2009\x200a\x202f\x205f";
	const wchar_t quotChar = L'\"';
	pair<unsigned, unsigned> quot;

	fields.clear();

	ofs = ofs0 = 0;
	quot = nextCSVQuotePair(line, quotChar);
	while((np = line.find(delim, ofs)) != wstring::npos)
	{
		if((np > quot.first) && (np < quot.second))
		{ // skip delimiter inside quoted field
			ofs = quot.second+1;
			quot = nextCSVQuotePair(line, quotChar, ofs);
			continue;
		}
		fields.push_back( trimquote(line.substr(ofs0, np-ofs0), whitespace, quotChar) );
		ofs = ofs0 = np+1;
	}
	fields.push_back( trimquote(line.substr(ofs0), whitespace, quotChar) );

	return fields.size();
}

Solution 30 - C++

Here is a ready-to use function if all you need is to load a data file of doubles (no integers, no text).

#include <sstream>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
#include <algorithm>

using namespace std;

/**
 * Parse a CSV data file and fill the 2d STL vector "data".
 * Limits: only "pure datas" of doubles, not encapsulated by " and without \n inside.
 * Further no formatting in the data (e.g. scientific notation)
 * It however handles both dots and commas as decimal separators and removes thousand separator.
 * 
 * returnCodes[0]: file access 0-> ok 1-> not able to read; 2-> decimal separator equal to comma separator
 * returnCodes[1]: number of records
 * returnCodes[2]: number of fields. -1 If rows have different field size
 * 
 */
vector<int>
readCsvData (vector <vector <double>>& data, const string& filename, const string& delimiter, const string& decseparator){
 
 int vv[3] = { 0,0,0 };
 vector<int> returnCodes(&vv[0], &vv[0]+3);

 string rowstring, stringtoken;
 double doubletoken;
 int rowcount=0;
 int fieldcount=0;
 data.clear();
 
 ifstream iFile(filename, ios_base::in);
 if (!iFile.is_open()){
   returnCodes[0] = 1;
   return returnCodes;
 }
 while (getline(iFile, rowstring)) {
    if (rowstring=="") continue; // empty line
    rowcount ++; //let's start with 1
    if(delimiter == decseparator){
      returnCodes[0] = 2;
      return returnCodes;
    }
    if(decseparator != "."){
     // remove dots (used as thousand separators)
     string::iterator end_pos = remove(rowstring.begin(), rowstring.end(), '.');
     rowstring.erase(end_pos, rowstring.end());
     // replace decimal separator with dots.
     replace(rowstring.begin(), rowstring.end(),decseparator.c_str()[0], '.'); 
    } else {
     // remove commas (used as thousand separators)
     string::iterator end_pos = remove(rowstring.begin(), rowstring.end(), ',');
     rowstring.erase(end_pos, rowstring.end());
    }
    // tokenize..
    vector<double> tokens;
    // Skip delimiters at beginning.
    string::size_type lastPos = rowstring.find_first_not_of(delimiter, 0);
    // Find first "non-delimiter".
    string::size_type pos     = rowstring.find_first_of(delimiter, lastPos);
    while (string::npos != pos || string::npos != lastPos){
        // Found a token, convert it to double add it to the vector.
        stringtoken = rowstring.substr(lastPos, pos - lastPos);
        if (stringtoken == "") {
	  tokens.push_back(0.0);
	} else {
          istringstream totalSString(stringtoken);
	  totalSString >> doubletoken;
	  tokens.push_back(doubletoken);
	}     
        // Skip delimiters.  Note the "not_of"
        lastPos = rowstring.find_first_not_of(delimiter, pos);
        // Find next "non-delimiter"
        pos = rowstring.find_first_of(delimiter, lastPos);
    }
    if(rowcount == 1){
      fieldcount = tokens.size();
      returnCodes[2] = tokens.size();
    } else {
      if ( tokens.size() != fieldcount){
	returnCodes[2] = -1;
      }
    }
    data.push_back(tokens);
 }
 iFile.close();
 returnCodes[1] = rowcount;
 return returnCodes;
}

Solution 31 - C++

Another quick and easy way is to use Boost.Fusion I/O:

#include <iostream>
#include <sstream>

#include <boost/fusion/adapted/boost_tuple.hpp>
#include <boost/fusion/sequence/io.hpp>

namespace fusion = boost::fusion;

struct CsvString
{
    std::string value;

    // Stop reading a string once a CSV delimeter is encountered.
    friend std::istream& operator>>(std::istream& s, CsvString& v) {
        v.value.clear();
        for(;;) {
            auto c = s.peek();
            if(std::istream::traits_type::eof() == c || ',' == c || '\n' == c)
                break;
            v.value.push_back(c);
            s.get();
        }
        return s;
    }

    friend std::ostream& operator<<(std::ostream& s, CsvString const& v) {
        return s << v.value;
    }
};

int main() {
    std::stringstream input("abc,123,true,3.14\n"
                            "def,456,false,2.718\n");

    typedef boost::tuple<CsvString, int, bool, double> CsvRow;

    using fusion::operator<<;
    std::cout << std::boolalpha;

    using fusion::operator>>;
    input >> std::boolalpha;
    input >> fusion::tuple_open("") >> fusion::tuple_close("\n") >> fusion::tuple_delimiter(',');

    for(CsvRow row; input >> row;)
        std::cout << row << '\n';
}

Outputs:

(abc 123 true 3.14)
(def 456 false 2.718)

Solution 32 - C++

I wrote a nice way of parsing CSV files and I thought I should add it as an answer:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <stdlib.h>
#include <stdio.h>

struct CSVDict
{
  std::vector< std::string > inputImages;
  std::vector< double > inputLabels;
};

/**
\brief Splits the string

\param str String to split
\param delim Delimiter on the basis of which splitting is to be done
\return results Output in the form of vector of strings
*/
std::vector<std::string> stringSplit( const std::string &str, const std::string &delim )
{
  std::vector<std::string> results;

  for (size_t i = 0; i < str.length(); i++)
  {
    std::string tempString = "";
    while ((str[i] != *delim.c_str()) && (i < str.length()))
    {
      tempString += str[i];
      i++;
    }
    results.push_back(tempString);
  }

  return results;
}

/**
\brief Parse the supplied CSV File and obtain Row and Column information. 

Assumptions:
1. Header information is in first row
2. Delimiters are only used to differentiate cell members

\param csvFileName The full path of the file to parse
\param inputColumns The string of input columns which contain the data to be used for further processing
\param inputLabels The string of input labels based on which further processing is to be done
\param delim The delimiters used in inputColumns and inputLabels
\return Vector of Vector of strings: Collection of rows and columns
*/
std::vector< CSVDict > parseCSVFile( const std::string &csvFileName, const std::string &inputColumns, const std::string &inputLabels, const std::string &delim )
{
  std::vector< CSVDict > return_CSVDict;
  std::vector< std::string > inputColumnsVec = stringSplit(inputColumns, delim), inputLabelsVec = stringSplit(inputLabels, delim);
  std::vector< std::vector< std::string > > returnVector;
  std::ifstream inFile(csvFileName.c_str());
  int row = 0;
  std::vector< size_t > inputColumnIndeces, inputLabelIndeces;
  for (std::string line; std::getline(inFile, line, '\n');)
  {
    CSVDict tempDict;
    std::vector< std::string > rowVec;
    line.erase(std::remove(line.begin(), line.end(), '"'), line.end());
    rowVec = stringSplit(line, delim);
    
    // for the first row, record the indeces of the inputColumns and inputLabels
    if (row == 0)
    {
      for (size_t i = 0; i < rowVec.size(); i++)
      {
        for (size_t j = 0; j < inputColumnsVec.size(); j++)
        {
          if (rowVec[i] == inputColumnsVec[j])
          {
            inputColumnIndeces.push_back(i);
          }
        }
        for (size_t j = 0; j < inputLabelsVec.size(); j++)
        {
          if (rowVec[i] == inputLabelsVec[j])
          {
            inputLabelIndeces.push_back(i);
          }
        }
      }
    }
    else
    {
      for (size_t i = 0; i < inputColumnIndeces.size(); i++)
      {
        tempDict.inputImages.push_back(rowVec[inputColumnIndeces[i]]);
      }
      for (size_t i = 0; i < inputLabelIndeces.size(); i++)
      {
        double test = std::atof(rowVec[inputLabelIndeces[i]].c_str());
        tempDict.inputLabels.push_back(std::atof(rowVec[inputLabelIndeces[i]].c_str()));
      }
      return_CSVDict.push_back(tempDict);
    }
    row++;
  }

  return return_CSVDict;
}

Solution 33 - C++

It is possible to use std::regex .

Depending on the size of your file and the memory available to you , it is possible read it either line by line or entirely in an std::string.

To read the file one can use :

std::ifstream t("file.txt");
std::string sin((std::istreambuf_iterator<char>(t)),
                 std::istreambuf_iterator<char>());

then you can match with this which is actually customizable to your needs.

std::regex word_regex(",\\s]+");
auto what = 
    std::sregex_iterator(sin.begin(), sin.end(), word_regex);
auto wend = std::sregex_iterator();

std::vector<std::string> v;
for (;what!=wend ; wend) {
    std::smatch match = *what;
    v.push_back(match.str());
}

Solution 34 - C++

If you're using Visual Studio / MFC, the following solution may make your life easier. It supports both Unicode and MBCS, has comments, doesn't have dependencies other than CString, and works well enough for me. It doesn't support line breaks embedded within a quoted string, but I don't care so long as it doesn't crash in that case, which it doesn't.

The overall strategy is, handle quoted and empty strings as special cases, and use Tokenize for the rest. For quoted strings, the strategy is, find the real closing quote, keeping track of whether pairs of consecutive quotes were encountered. If they were, use Replace to convert the pairs to singles. No doubt there are more efficient methods but performance wasn't sufficiently critical in my case to justify further optimization.

class CParseCSV {
public:
// Construction
	CParseCSV(const CString& sLine);

// Attributes
	bool	GetString(CString& sDest);

protected:
	CString	m_sLine;	// line to extract tokens from
	int		m_nLen;		// line length in characters
	int		m_iPos;		// index of current position
};

CParseCSV::CParseCSV(const CString& sLine) : m_sLine(sLine)
{
	m_nLen = m_sLine.GetLength();
	m_iPos = 0;
}

bool CParseCSV::GetString(CString& sDest)
{
	if (m_iPos < 0 || m_iPos > m_nLen)	// if position out of range
		return false;
	if (m_iPos == m_nLen) {	// if at end of string
		sDest.Empty();	// return empty token
		m_iPos = -1;	// really done now
		return true;
	}
	if (m_sLine[m_iPos] == '\"') {	// if current char is double quote
		m_iPos++;	// advance to next char
		int	iTokenStart = m_iPos;
		bool	bHasEmbeddedQuotes = false;
		while (m_iPos < m_nLen) {	// while more chars to parse
			if (m_sLine[m_iPos] == '\"') {	// if current char is double quote
				// if next char exists and is also double quote
				if (m_iPos < m_nLen - 1 && m_sLine[m_iPos + 1] == '\"') {
					// found pair of consecutive double quotes
					bHasEmbeddedQuotes = true;	// request conversion
					m_iPos++;	// skip first quote in pair
				} else	// next char doesn't exist or is normal
					break;	// found closing quote; exit loop
			}
			m_iPos++;	// advance to next char
		}
		sDest = m_sLine.Mid(iTokenStart, m_iPos - iTokenStart);
		if (bHasEmbeddedQuotes)	// if string contains embedded quote pairs
			sDest.Replace(_T("\"\""), _T("\""));	// convert pairs to singles
		m_iPos += 2;	// skip closing quote and trailing delimiter if any
	} else if (m_sLine[m_iPos] == ',') {	// else if char is comma
		sDest.Empty();	// return empty token
		m_iPos++;	// advance to next char
	} else {	// else get next comma-delimited token
		sDest = m_sLine.Tokenize(_T(","), m_iPos);
	}
	return true;
}

// calling code should look something like this:

	CStdioFile	fIn(pszPath, CFile::modeRead);
	CString	sLine, sToken;
	while (fIn.ReadString(sLine)) {	// for each line of input file
		if (!sLine.IsEmpty()) {	// ignore blank lines
			CParseCSV	csv(sLine);
			while (csv.GetString(sToken)) {
				// do something with sToken here
			}
		}
	}

Solution 35 - C++

I've got a way quicker solution, was originally intended for this question:

https://stackoverflow.com/questions/53858950/how-to-pull-specific-part-of-different-strings

But it was closed obviously. I'm not about to throw this away though:

#include <iostream>
#include <string>
#include <regex>

std::string text = "\"4,\"\"3\"\",\"\"Mon May 11 03:17:40 UTC 2009\"\",\"\"kindle2\"\",\"\"tpryan\"\",\"\"TEXT HERE\"\"\";;;;";

int main()
{
    std::regex r("(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")(\".*\")");
    std::smatch m;
    std::regex_search(text, m, r);
    std::cout<<"FOUND: "<<m[9]<<std::endl;

    return 0;
}

Just pick out whichever match you want from the smatch collection by index. Regex is bliss.

Solution 36 - C++

Like everyone puts his solution, here is mine using template, lambda and tuple.

It can convert any CSV with wanted columns to a C++ vector of tuple.

It works by defining each CSV line element type in a tuple.

You also need to define the std::string to type conversion Formatter lambda for each element (using std::atod for example).

Then you got a vector of this struct corresponding to your CSV data.

You can reuse this easily to match any CSV structure.

StringsHelpers.hpp

#include <string>
#include <fstream>
#include <vector>
#include <functional>

namespace StringHelpers
{
    template<typename Tuple>
    using Formatter = std::function<Tuple(const std::vector<std::string> &)>;

    std::vector<std::string> split(const std::string &string, const std::string &delimiter);

    template<typename Tuple>
    std::vector<Tuple> readCsv(const std::string &path, const std::string &delimiter, Formatter<Tuple> formatter);
};

StringsHelpers.cpp

#include "StringHelpers.hpp"

namespace StringHelpers
{
    /**
     * Split a string with the given delimiter into several strings
     *
     * @param string - The string to extract the substrings from
     * @param delimiter - The substrings delimiter
     *
     * @return The substrings
     */
    std::vector<std::string> split(const std::string &string, const std::string &delimiter)
    {
        std::vector<std::string> result;
        size_t                   last = 0,
                                 next = 0;

        while ((next = string.find(delimiter, last)) != std::string::npos) {
            result.emplace_back(string.substr(last, next - last));
            last = next + 1;
        }

        result.emplace_back(string.substr(last));

        return result;
    }

    /**
     * Read a CSV file and store its values into the given structure (Tuple with Formatter constructor)
     *
     * @tparam Tuple - The CSV line structure format
     *
     * @param path - The CSV file path
     * @param delimiter - The CSV values delimiter
     * @param formatter - The CSV values formatter that take a vector of strings in input and return a Tuple
     *
     * @return The CSV as vector of Tuple
     */
    template<typename Tuple>
    std::vector<Tuple> readCsv(const std::string &path, const std::string &delimiter, Formatter<Tuple> formatter)
    {
        std::ifstream      file(path, std::ifstream::in);
        std::string        line;
        std::vector<Tuple> result;

        if (file.fail()) {
            throw std::runtime_error("The file " + path + " could not be opened");
        }

        while (std::getline(file, line)) {
            result.emplace_back(formatter(split(line, delimiter)));
        }

        file.close();

        return result;
    }

    // Forward template declarations

    template std::vector<std::tuple<double, double, double>> readCsv<std::tuple<double, double, double>>(const std::string &, const std::string &, Formatter<std::tuple<double, double, double>>);
} // End of StringHelpers namespace

main.cpp (some usage)

#include "StringHelpers.hpp"

/**
 * Example of use with a CSV file which have (number,Red,Green,Blue) as line values. We do not want to use the 1st value
 * of the line.
 */
int main(int argc, char **argv)
{
    // Declare CSV line type, formatter and template type
    typedef std::tuple<double, double, double>                          CSV_format;
    typedef std::function<CSV_format(const std::vector<std::string> &)> formatterT;

    enum RGB { Red = 1, Green, Blue };

    const std::string COLOR_MAP_PATH = "/some/absolute/path";

    // Load the color map
    auto colorMap = StringHelpers::readCsv<CSV_format>(COLOR_MAP_PATH, ",", [](const std::vector<std::string> &values) {
        return CSV_format {
                // Here is the formatter lambda that convert each value from string to what you want
                std::strtod(values[Red].c_str(), nullptr),
                std::strtod(values[Green].c_str(), nullptr),
                std::strtod(values[Blue].c_str(), nullptr)
        };
    });

    // Use your colorMap as you  wish...
}

Solution 37 - C++

A minor edition to @sastanin's solution, so that it can deal with newlines within quotes.

std::vector<std::vector<std::string>> readCSV(std::istream &in) {
    std::vector<std::vector<std::string>> table;
    
    while (!in.eof()) {
        CSVState state = CSVState::UnquotedField;
        std::vector<std::string> fields {""};
        size_t i = 0; // index of the current field
        for (char c : row) {
            switch (state) {
                case CSVState::UnquotedField:
                    switch (c) {
                        case ',': // end of field
                                  fields.push_back(""); i++;
                                  break;
                        case '"': state = CSVState::QuotedField;
                                  break;
                        default:  fields[i].push_back(c);
                                  break; }
                    break;
                case CSVState::QuotedField:
                    switch (c) {
                        case '"': state = CSVState::QuotedQuote;
                                  break;
                        default:  fields[i].push_back(c);
                                  break; }
                    break;
                case CSVState::QuotedQuote:
                    switch (c) {
                        case ',': // , after closing quote
                                  fields.push_back(""); i++;
                                  state = CSVState::UnquotedField;
                                  break;
                        case '"': // "" -> "
                                  fields[i].push_back('"');
                                  state = CSVState::QuotedField;
                                  break;
                        case '\n': // newline
                                  table.push_back(fields);
                                  state = CSVState::UnquotedField;
                                  fields = vector<string>{""};
                                  i = 0;
                        default:  // end of quote
                                  state = CSVState::UnquotedField;
                                  break; }
                    break;
            }
        }
    }
    return table;
}

Solution 38 - C++

CSV file are text files consist of lines, each line is consist of tokens sererated by comma. while there are something you should know when parsing:

(0) The file is encoded with "CP_ACP" code page. you should use the same encoding page to decode the file contents.

(1) the CSV lost "composite cells" information (likes rowspan > 1) , so when it is read back to excel, the composite cell information is lost.

(2) the cell text can be quoted by """ at head and tail, and literal quote char will become double quotes. so the closing matching quote char must be a quote char not followed by another quote char. for eg, if a cell has a comma, it must be quoted in csv, because comma make sense in csv.

(3) when the cell content have multiple lines, it will be quoted in CSV, in this case, you parser must keep reading next sereral lines in CSV file, until you got a closing quote char matching the first quote char, make sure the current logical line is read complete before parsing the line's tokens.

for eg: in csv file, the following 3 physical lines is one logical line consist of 3 tokens:

    --+----------
    1 |a,"b-first part
    2 |b-second part
    3 |b-third part",c
    --+----------

Content Type	Original Author	Original Content on Stackoverflow
Question	User1	View Question on Stackoverflow
Solution 1 - C++	Martin York	View Answer on Stackoverflow
Solution 2 - C++	sastanin	View Answer on Stackoverflow
Solution 3 - C++	dtw	View Answer on Stackoverflow
Solution 4 - C++	Matthieu N.	View Answer on Stackoverflow
Solution 5 - C++	stefanB	View Answer on Stackoverflow
Solution 6 - C++	Joel de Guzman	View Answer on Stackoverflow
Solution 7 - C++	Michael	View Answer on Stackoverflow
Solution 8 - C++	Rolf Kristensen	View Answer on Stackoverflow
Solution 9 - C++	jxh	View Answer on Stackoverflow
Solution 10 - C++	m0meni	View Answer on Stackoverflow
Solution 11 - C++	anon	View Answer on Stackoverflow
Solution 12 - C++	Heygard Flisch	View Answer on Stackoverflow
Solution 13 - C++	Pietro Saccardi	View Answer on Stackoverflow
Solution 14 - C++	sashoalm	View Answer on Stackoverflow
Solution 15 - C++	marcp	View Answer on Stackoverflow
Solution 16 - C++	d99kris	View Answer on Stackoverflow
Solution 17 - C++	Alexander	View Answer on Stackoverflow
Solution 18 - C++	Jim M.	View Answer on Stackoverflow
Solution 19 - C++	ravenspoint	View Answer on Stackoverflow
Solution 20 - C++	Amruta Ghodke	View Answer on Stackoverflow
Solution 21 - C++	Elizabeth Card	View Answer on Stackoverflow
Solution 22 - C++	nikos_k	View Answer on Stackoverflow
Solution 23 - C++	vadamsky	View Answer on Stackoverflow
Solution 24 - C++	jav	View Answer on Stackoverflow
Solution 25 - C++	Pedro Vicente	View Answer on Stackoverflow
Solution 26 - C++	Olikay Gokce	View Answer on Stackoverflow
Solution 27 - C++	MadH	View Answer on Stackoverflow
Solution 28 - C++	NPike	View Answer on Stackoverflow
Solution 29 - C++	Fabien	View Answer on Stackoverflow
Solution 30 - C++	Antonello	View Answer on Stackoverflow
Solution 31 - C++	Maxim Egorushkin	View Answer on Stackoverflow
Solution 32 - C++	scap3y	View Answer on Stackoverflow
Solution 33 - C++	g24l	View Answer on Stackoverflow
Solution 34 - C++	victimofleisure	View Answer on Stackoverflow
Solution 35 - C++	Jack Of Blades	View Answer on Stackoverflow
Solution 36 - C++	Romain Laneuville	View Answer on Stackoverflow
Solution 37 - C++	user2891006	View Answer on Stackoverflow
Solution 38 - C++	Exlife	View Answer on Stackoverflow