How to safely read a line from an std::istream?

C++

C++ Problem Overview


I want to safely read a line from an std::istream. The stream could be anything, e.g., a connection on a Web server or something processing files submitted by unknown sources. There are many answers starting to do the moral equivalent of this code:

void read(std::istream& in) {
    std::string line;
    if (std::getline(in, line)) {
        // process the line
    }
}

Given the possibly dubious source of in, using the above code would lead to a vulnerability: a malicious agent may mount a denial of service attack against this code using a huge line. Thus, I would like to limit the line length to some rather high value, say 4 millions chars. While a few large lines may be encountered, it isn't viable to allocate a buffer for each file and use std::istream::getline().

How can the maximum size of the line be limited, ideally without distorting the code too badly and without allocating large chunks of memory up front?

C++ Solutions


Solution 1 - C++

You could write your own version of std::getline with a maximum number of characters read parameter, something called getline_n or something.

#include <string>
#include <iostream>

template<typename CharT, typename Traits, typename Alloc>
auto getline_n(std::basic_istream<CharT, Traits>& in, std::basic_string<CharT, Traits, Alloc>& str, std::streamsize n) -> decltype(in) {
    std::ios_base::iostate state = std::ios_base::goodbit;
    bool extracted = false;
    const typename std::basic_istream<CharT, Traits>::sentry s(in, true);
    if(s) {
        try {
            str.erase();
            typename Traits::int_type ch = in.rdbuf()->sgetc();
            for(; ; ch = in.rdbuf()->snextc()) {
                if(Traits::eq_int_type(ch, Traits::eof())) {
                    // eof spotted, quit
                    state |= std::ios_base::eofbit;
                    break;
                }
                else if(str.size() == n) {
                    // maximum number of characters met, quit
                    extracted = true;
                    in.rdbuf()->sbumpc();
                    break;
                }
                else if(str.max_size() <= str.size()) {
                    // string too big
                    state |= std::ios_base::failbit;
                    break;
                }
                else {
                    // character valid
                    str += Traits::to_char_type(ch);
                    extracted = true;
                }
            }
        }
        catch(...) {
            in.setstate(std::ios_base::badbit);
        }
    }

    if(!extracted) {
        state |= std::ios_base::failbit;
    }
    
    in.setstate(state);
    return in;
}

int main() {
    std::string s;
    getline_n(std::cin, s, 10); // maximum of 10 characters
    std::cout << s << '\n';
}

Might be overkill though.

Solution 2 - C++

There is already such a getline function as a member function of istream, you just need to wrap it for buffer management.

#include <assert.h>
#include <istream>
#include <stddef.h>         // ptrdiff_t
#include <string>           // std::string, std::char_traits

typedef ptrdiff_t Size;

namespace my {
    using std::istream;
    using std::string;
    using std::char_traits;

    istream& getline(
        istream& stream, string& s, Size const buf_size, char const delimiter = '\n'
        )
    {
        s.resize( buf_size );  assert( s.size() > 1 );
        stream.getline( &s[0], buf_size, delimiter );
        if( !stream.fail() )
        {
            Size const n = char_traits<char>::length( &s[0] );
            s.resize( n );      // Downsizing.
        }
        return stream;
    }
}  // namespace my

Solution 3 - C++

Replace std::getline by creating a wrapper around std::istream::getline:

std::istream& my::getline( std::istream& is, std::streamsize n, std::string& str, char delim )
    {
    try
       {
       str.resize(n);
       is.getline(&str[0],n,delim);
       str.resize(is.gcount());
       return is;
       }
    catch(...) { str.resize(0); throw; }
    }

If you want to avoid excessive temporary memory allocations, you could use a loop which grows the allocation as needed (probably doubling in size on each pass). Don't forget that exceptions might or might not be enabled on the istream object.

Here's a version with the more efficient allocation strategy:

std::istream& my::getline( std::istream& is, std::streamsize n, std::string& str, char delim )
    {
    std::streamsize base=0;
    do {
       try
          {
          is.clear();
          std::streamsize chunk=std::min(n-base,std::max(static_cast<std::streamsize>(2),base));
          if ( chunk == 0 ) break;
          str.resize(base+chunk);
          is.getline(&str[base],chunk,delim);
          }
       catch( std::ios_base::failure ) { if ( !is.gcount () ) str.resize(0), throw; }
       base += is.gcount();
       } while ( is.fail() && is.gcount() );
    str.resize(base);
    return is;
    }

Solution 4 - C++

Based on the comments and answers, there seem to be three approaches:

  1. Write a custom version of getline() possibly using the std::istream::getline() member internally to get the actual characters.
  2. Use a filtering stream buffer to limit the amount of data potentially received.
  3. Instead of reading a std::string, use a string instantiation with a custom allocator limiting the amount of memory stored in the string.

Not all of the suggestions came with code. This answer provides code for all approaches and a bit of discussion of all three approaches. Before going into implementation details it is first worth pointing out that there are multiple choices of what should happen if an excessively long input is received:

  1. Reading an overlong line could result in a successful read of a partial line, i.e., the resulting string contains the read content and the stream doesn't have any error flags set. Doing so means, however, that it isn't possible to distinguish between a line hitting exactly the limit or being too long. Since the limit is somewhat arbitrary anyway it probably doesn't really matter, though.
  2. Reading an overlong line could be considered a failure (i.e., setting std::ios_base::failbit and/or std::ios_base::bad_bit) and, since reading failed, yield an empty string. Yielding an empty string, obviously, prevents potentially looking at the string read so far to possibly see what's going on.
  3. Reading an overlong line could provide the partial line read and also set error flags on the stream. This seems reasonable behavior both detecting that there is something up and also providing the input for potential inspection.

Although there are multiple code examples implementing a limited version of getline() already, here is another one! I think it is simpler (albeit possibly slower; performance can be dealt with when necessary) which also retains's std::getline()s interface: it use the stream's width() to communicate a limit (maybe taking width() into account is a reasonable extension to std::getline()):

template <typename cT, typename Traits, typename Alloc>
std::basic_istream<cT, Traits>&
safe_getline(std::basic_istream<cT, Traits>& in,
             std::basic_string<cT, Traits, Alloc>& value,
             cT delim)
{
    typedef std::basic_string<cT, Traits, Alloc> string_type;
    typedef typename string_type::size_type size_type;

    typename std::basic_istream<cT, Traits>::sentry cerberos(in);
    if (cerberos) {
        value.clear();
        size_type width(in.width(0));
        if (width == 0) {
            width = std::numeric_limits<size_type>::max();
        }
        std::istreambuf_iterator<char> it(in), end;
        for (; value.size() != width && it != end; ++it) {
            if (!Traits::eq(delim, *it)) {
                value.push_back(*it);
            }
            else {
                ++it;
                break;
            }
        }
        if (value.size() == width) {
            in.setstate(std::ios_base::failbit);
        }
    }
    return in;
}

This version of getline() is used just like std::getline() but when it seems reasonable to limit the amount of data read, the width() is set, e.g.:

std::string line;
if (safe_getline(in >> std::setw(max_characters), line)) {
    // do something with the input
}

Another approach is to just use a filtering stream buffer to limit the amount of input: the filter would just count the number of characters processed and limit the amount to a suitable number of characters. This approach is actually easier applied to an entire stream than an individual line: when processing just one line, the filter can't just obtain buffers full of characters from the underlying stream because there is no reliable way to put the characters back. Implementing an unbuffered version is still simple but probably not particularly efficient:

template <typename cT, typename Traits = std::char_traits<char> >
class basic_limitbuf
    : std::basic_streambuf <cT, Traits> {
public:
    typedef Traits                    traits_type;
    typedef typename Traits::int_type int_type;

private:
    std::streamsize                   size;
    std::streamsize                   max;
    std::basic_istream<cT, Traits>*   stream;
    std::basic_streambuf<cT, Traits>* sbuf;

    int_type underflow() {
        if (this->size < this->max) {
            return this->sbuf->sgetc();
        }
        else {
            this->stream->setstate(std::ios_base::failbit);
            return traits_type::eof();
        }
    }
    int_type uflow()     {
        if (this->size < this->max) {
            ++this->size;
            return this->sbuf->sbumpc();
        }
        else {
            this->stream->setstate(std::ios_base::failbit);
            return traits_type::eof();
        }
    }
public:
    basic_limitbuf(std::streamsize max,
                   std::basic_istream<cT, Traits>& stream)
        : size()
        , max(max)
        , stream(&stream)
        , sbuf(this->stream->rdbuf(this)) {
    }
    ~basic_limitbuf() {
        std::ios_base::iostate state = this->stream->rdstate();
        this->stream->rdbuf(this->sbuf);
        this->stream->setstate(state);
    }
};

This stream buffer is already set up to insert itself upon construction and remove itself upon destruction. That is, it can be used simply like this:

std::string line;
basic_limitbuf<char> sbuf(max_characters, in);
if (std::getline(in, line)) {
    // do something with the input
}

It would be easy to add a manipulator setting up the limit, too. One advantage of this approach is that none of the reading code needs to be touched if the total size of the stream could be limited: the filter could be set up right after creating the stream. When there is no need to back out the filter, the filter could also use a buffer which would greatly improve the performance.

The third approach suggested is to use a std::basic_string with a custom allocator. There are two aspects which are a bit awkward about the allocator approach:

  1. The string being read actually has a type which isn't immediately convertible to std::string (although it also isn't hard to do the conversion).
  2. The maximum array size can be easily limited but the string will have some more or less random size smaller than that: when the stream fails allocating an exception is thrown and there is no attempt to grow the string by a smaller size.

Here is the necessary code for an allocator limiting the allocated size:

template <typename T>
struct limit_alloc
{
private:
    std::size_t max_;
public:
    typedef T value_type;
    limit_alloc(std::size_t max): max_(max) {}
    template <typename S>
    limit_alloc(limit_alloc<S> const& other): max_(other.max()) {}
    std::size_t max() const { return this->max_; }
    T* allocate(std::size_t size) {
        return size <= max_
            ? static_cast<T*>(operator new[](size))
            : throw std::bad_alloc();
    }
    void  deallocate(void* ptr, std::size_t) {
        return operator delete[](ptr);
    }
};

template <typename T0, typename T1>
bool operator== (limit_alloc<T0> const& a0, limit_alloc<T1> const& a1) {
    return a0.max() == a1.max();
}
template <typename T0, typename T1>
bool operator!= (limit_alloc<T0> const& a0, limit_alloc<T1> const& a1) {
    return !(a0 == a1);
}

The allocator would be used something like this (the code compiles OK with a recent version of clang but not with gcc):

std::basic_string<char, std::char_traits<char>, limit_alloc<char> >
    tmp(limit_alloc<char>(max_chars));
if (std::getline(in, tmp)) {
    std::string(tmp.begin(), tmp.end());
    // do something with the input
}

In summary, there are multiple approach each with its own small drawback but each reasonably viable for the stated goal of limiting denial of service attacks based on overlong lines:

  1. Using a custom version of getline() means the reading code needs to be changed.
  2. Using a custom stream buffer is slow unless the entire stream's size can be limited.
  3. Using a custom allocator gives less control and requires some changes to reading code.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDietmar K&#252;hlView Question on Stackoverflow
Solution 1 - C++RapptzView Answer on Stackoverflow
Solution 2 - C++Cheers and hth. - AlfView Answer on Stackoverflow
Solution 3 - C++Brent BradburnView Answer on Stackoverflow
Solution 4 - C++Dietmar KühlView Answer on Stackoverflow