Splitting a string by a character

Tags: C++, String, Algorithm

C++ Problem Overview


I know this is quite an easy problem, but I just want to solve it for myself once and for all.

I would simply like to split a string into an array using a character as the split delimiter (much like C#'s famous .Split() function). I can of course apply the brute-force approach, but I wonder whether there is anything better than that.

So far I've searched, and probably the closest approach is strtok(); however, due to its inconvenience (converting your string to a char array, etc.), I don't like using it. Is there any easier way to implement this?

Note: I wanted to emphasize this because people might ask "How come brute-force doesn't work?". My brute-force solution was to create a loop and use the substr() function inside. However, since it requires the starting point and the length, it fails when I want to split a date, because the user might enter it as 7/12/2012 or 07/3/2011, where I can't really tell the length before calculating the next location of the '/' delimiter.

C++ Solutions


Solution 1 - C++

Using vectors, strings and stringstream. A tad cumbersome but it does the trick.

#include <string>
#include <vector>
#include <sstream>

int main()
{
    std::stringstream test("this_is_a_test_string");
    std::string segment;
    std::vector<std::string> seglist;

    // getline reads up to the next '_' and discards the delimiter itself
    while(std::getline(test, segment, '_'))
    {
       seglist.push_back(segment);
    }
}

Which results in a vector with the same contents as

std::vector<std::string> seglist{ "this", "is", "a", "test", "string" };
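The same loop can be wrapped into a reusable function. This is a minimal sketch assuming a single-character delimiter; the name split_getline is hypothetical and not part of the original answer. Because getline stops at each delimiter, the length of each field never has to be known in advance, which covers both 7/12/2012 and 07/3/2011 from the question.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `input` on `delimiter` using std::getline.
std::vector<std::string> split_getline(const std::string& input, char delimiter)
{
    std::istringstream stream(input);
    std::string segment;
    std::vector<std::string> segments;

    // Each call reads everything up to the next delimiter (or the end of
    // the input) and consumes the delimiter without storing it.
    while (std::getline(stream, segment, delimiter))
    {
        segments.push_back(segment);
    }
    return segments;
}
```

With this approach an empty field in the middle is kept ("a,,b" gives "a", "", "b"), but a trailing delimiter does not produce a trailing empty field ("a," gives just "a"), and an empty input gives an empty vector.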

Solution 2 - C++

Boost has the split() you are seeking in algorithm/string.hpp:

#include <boost/algorithm/string.hpp>
#include <string>
#include <vector>

std::string sample = "07/3/2011";
std::vector<std::string> strs;
boost::split(strs, sample, boost::is_any_of("/"));

Solution 3 - C++

Another way (C++11/Boost) for people who like RegEx. Personally I'm a big fan of RegEx for this kind of data. IMO it's far more powerful than simply splitting strings on a delimiter, since you can choose to be a lot smarter about what constitutes "valid" data if you wish.

#include <string>
#include <algorithm>    // copy
#include <iterator>     // back_inserter
#include <regex>        // regex, sregex_token_iterator
#include <vector>
 
int main()
{
    std::string str = "08/04/2012";
    std::vector<std::string> tokens;
    std::regex re("\\d+");
 
    //start/end points of tokens in str
    std::sregex_token_iterator
        begin(str.begin(), str.end(), re),
        end;
 
    std::copy(begin, end, std::back_inserter(tokens));
}
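To illustrate the point about being smarter about "valid" data, here is a sketch that splits and validates a date in one step using std::regex_match with capture groups. The pattern (one or two digits, '/', one or two digits, '/', a four-digit year) and the helper name split_date are assumptions made for illustration. The question doesn't say whether the first field is the day or the month, so the fields are returned as strings in input order.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the three date fields, or an empty vector
// if `text` as a whole does not match the assumed date format.
std::vector<std::string> split_date(const std::string& text)
{
    static const std::regex pattern(R"((\d{1,2})/(\d{1,2})/(\d{4}))");
    std::smatch match;
    std::vector<std::string> fields;

    // regex_match (unlike regex_search) requires the entire string to match.
    if (std::regex_match(text, match, pattern))
    {
        // match[0] is the whole string; match[1..3] are the capture groups.
        for (std::size_t i = 1; i < match.size(); ++i)
        {
            fields.push_back(match[i].str());
        }
    }
    return fields;
}
```

Malformed input such as "7//2012" or a two-digit year is rejected outright instead of silently producing odd tokens.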

Solution 4 - C++

Since nobody has posted this yet: the C++20 solution is very simple using ranges. You can use std::ranges::views::split to break up the input, and then transform the pieces into std::string or std::string_view elements.

#include <ranges>
#include <string>


...

// The input to transform
const auto str = std::string{"Hello World"};

// Function to transform a range into a std::string.
// To get views instead, change both the return type and the constructed
// type to 'std::string_view'.
auto to_string = [](auto&& r) -> std::string {
    // Building from the iterator pair is also safe for empty pieces
    // (consecutive or trailing delimiters), where '&*r.begin()' would
    // dereference an end iterator.
    return std::string(std::ranges::begin(r), std::ranges::end(r));
};

// Not 'const': std::ranges::views::split caches its first piece, so the
// resulting view can only be iterated when it is non-const.
auto range = str | 
             std::ranges::views::split(' ') | 
             std::ranges::views::transform(to_string);

for (auto&& token : range) {
    // each 'token' is the split string
}

This approach can realistically compose into just about anything, even a simple split function that returns a std::vector<std::string>:

auto split(const std::string& str, char delimiter) -> std::vector<std::string>
{
    // Non-const for the same reason as above: split_view is only iterable
    // when it is not const.
    auto range = str | 
                 std::ranges::views::split(delimiter) | 
                 std::ranges::views::transform(to_string);

    return {std::ranges::begin(range), std::ranges::end(range)};
}


Solution 5 - C++

Another possibility is to imbue a stream with a locale that uses a special ctype facet. A stream uses the ctype facet to determine what's "whitespace", which it treats as separators. With a ctype facet that classifies your separator character as whitespace, the reading can be pretty trivial. Here's one way to implement the facet:

struct field_reader: std::ctype<char> {

    field_reader(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> 
            rc(table_size, std::ctype_base::mask());

        // we'll assume dates are either a/b/c or a-b-c:
        rc['/'] = std::ctype_base::space;
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

We use that by using imbue to tell a stream to use a locale that includes it, then read the data from that stream:

std::istringstream in("07/3/2011");
in.imbue(std::locale(std::locale(), new field_reader));

With that in place, the splitting becomes almost trivial -- just initialize a vector using a couple of istream_iterators to read the pieces from the string (that's embedded in the istringstream):

std::vector<std::string> fields((std::istream_iterator<std::string>(in)),
                                std::istream_iterator<std::string>());

Obviously this tends toward overkill if you only use it in one place. If you use it a lot, however, it can go a long way toward keeping the rest of the code quite clean.
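Putting the two fragments together, a self-contained sketch could look like the following. The facet is repeated from the answer so the block compiles on its own; read_fields is a hypothetical helper name. Note that with this table only '/' and '-' are classified as whitespace, so an ordinary space no longer separates fields.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Facet from the answer: '/' and '-' count as whitespace, nothing else does.
struct field_reader : std::ctype<char>
{
    field_reader() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask>
            rc(table_size, std::ctype_base::mask());
        rc['/'] = std::ctype_base::space;
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Hypothetical helper: read whitespace-separated fields under that facet.
std::vector<std::string> read_fields(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet (refs == 0) and deletes it.
    in.imbue(std::locale(std::locale(), new field_reader));
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}
```

Because operator>> skips runs of "whitespace", consecutive separators collapse: "7//2012" yields "7" and "2012" with no empty field in between.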

Solution 6 - C++

I inherently dislike stringstream, although I'm not sure why. Today, I wrote this function to allow splitting a std::string by any arbitrary character or string into a vector. I know this question is old, but I wanted to share an alternative way of splitting std::string.

This code omits the part of the string you split by from the results altogether, although it could be easily modified to include them.

#include <string>
#include <vector>

void split(std::string str, std::string splitBy, std::vector<std::string>& tokens)
{
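	/* Store the original string in the vector, so we can loop the rest
	 * of the algorithm. */
	tokens.push_back(str);
	// An empty delimiter would be found at index 0 forever, so stop here.
	if(splitBy.empty())
	{
		return;
	}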

	// Store the split index in a 'size_t' (unsigned integer) type.
	size_t splitAt;
	// Store the size of what we're splicing out.
	size_t splitLen = splitBy.size();
	// Create a string for temporarily storing the fragment we're processing.
	std::string frag;
	// Loop infinitely - break is internal.
	while(true)
	{
		/* Store the last string in the vector, which is the only logical
		 * candidate for processing. */
		frag = tokens.back();
		/* The index where the split is. */
		splitAt = frag.find(splitBy);
		// If we didn't find a new split point...
		if(splitAt == std::string::npos)
		{
			// Break the loop and (implicitly) return.
			break;
		}
		/* Put everything from the left side of the split where the string
		 * being processed used to be. */
		tokens.back() = frag.substr(0, splitAt);
		/* Push everything from the right side of the split to the next empty
		 * index in the vector. */
		tokens.push_back(frag.substr(splitAt+splitLen, frag.size()-(splitAt+splitLen)));
	}
}

To use, just call like so...

std::string foo = "This is some string I want to split by spaces.";
std::vector<std::string> results;
split(foo, " ", results);

You can now access all the results in the vector at will. Simple as that - no stringstream, no third-party libraries, no dropping back to C!

Solution 7 - C++

Take a look at boost::tokenizer.

If you'd like to roll your own method, you can use std::string::find() to determine the splitting points.
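A minimal roll-your-own sketch using std::string::find(), assuming a single-character delimiter (split_find is a hypothetical name). Each field's length is computed from the positions of two consecutive delimiters, which is exactly the information the question's substr() loop was missing.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `s` on `delimiter` with find() and substr().
std::vector<std::string> split_find(const std::string& s, char delimiter)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true)
    {
        // Search for the next delimiter starting at the current field.
        std::string::size_type end = s.find(delimiter, start);
        if (end == std::string::npos)
        {
            fields.push_back(s.substr(start)); // last field runs to the end
            break;
        }
        fields.push_back(s.substr(start, end - start));
        start = end + 1; // continue just past the delimiter
    }
    return fields;
}
```

Unlike the std::getline loop, this version keeps a trailing empty field ("a," yields "a" and ""); whether that is desirable depends on the input format.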

Solution 8 - C++

Is there a reason you don't want to convert a string to a character array (char*)? It's rather easy to call .c_str(). You can also use a loop and the .find() function.

References: the string class, string::find(), and string::c_str().
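One caveat when going the strtok() route: c_str() returns a const char*, while strtok() takes a char* and overwrites each delimiter with '\0', so the contents have to be copied into a writable buffer first. A sketch under that assumption (split_strtok is a hypothetical name):

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenize a copy of `s` with strtok().
std::vector<std::string> split_strtok(const std::string& s, const char* delimiters)
{
    // strtok modifies its input, so work on a writable, NUL-terminated copy.
    std::vector<char> buffer(s.begin(), s.end());
    buffer.push_back('\0');

    std::vector<std::string> fields;
    for (char* token = std::strtok(buffer.data(), delimiters);
         token != nullptr;
         token = std::strtok(nullptr, delimiters))
    {
        fields.emplace_back(token);
    }
    return fields;
}
```

strtok() treats a run of delimiters as a single separator, so empty fields are dropped ("a,,b," gives "a" and "b"). It also keeps hidden static state between calls, which makes it unsafe to interleave two tokenizations or to use it from several threads at once.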

Solution 9 - C++

For those who don't have (want, need) C++20, this C++11 solution might be an option.

It is templated on an output iterator so you can supply your own destination where the split items should be appended to and provides a choice of how to handle multiple consecutive separation characters.

Yes, it uses std::regex, but if you're already in C++11 happy land, why not use it?

#include <algorithm>
#include <iterator>
#include <regex>
#include <string>

////////////////////////////////////////////////////////////////////////////
//
// Split string "s" into substrings delimited by the character "sep"
// skip_empty indicates what to do with multiple consecutive separation
// characters:
//
// Given s="aap,,noot,,,mies"
//       sep=','
//
// then output gets the following written into it:
//      skip_empty=true  => "aap" "noot" "mies"
//      skip_empty=false => "aap" "" "noot" "" "" "mies"
//
////////////////////////////////////////////////////////////////////////////
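template <typename OutputIterator>
void string_split(std::string const& s, char sep, OutputIterator output, bool skip_empty=true) {
    // Caveat: sep is escaped with '\', so a letter such as 'd', 's', 'w' or
    // 'b' turns into a regex escape (\d, \s, \w, \b), not a literal character.
    std::regex  rxSplit( std::string("\\")+sep+(skip_empty ? "+" : "") );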

    std::copy(std::sregex_token_iterator(std::begin(s), std::end(s), rxSplit, -1),
              std::sregex_token_iterator(), output);
}
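A usage sketch that collects the pieces into a std::vector through std::back_inserter. The function is repeated so the block compiles on its own; split_to_vector is a hypothetical convenience wrapper, not part of the original answer.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// string_split from the answer above, repeated for self-containment.
template <typename OutputIterator>
void string_split(std::string const& s, char sep, OutputIterator output, bool skip_empty = true) {
    std::regex rxSplit(std::string("\\") + sep + (skip_empty ? "+" : ""));
    // Submatch index -1 yields the text *between* matches, i.e. the fields.
    std::copy(std::sregex_token_iterator(std::begin(s), std::end(s), rxSplit, -1),
              std::sregex_token_iterator(), output);
}

// Hypothetical wrapper: gather the fields into a vector.
std::vector<std::string> split_to_vector(const std::string& s, char sep, bool skip_empty = true)
{
    std::vector<std::string> parts;
    string_split(s, sep, std::back_inserter(parts), skip_empty);
    return parts;
}
```

Any output iterator works the same way; for example, std::ostream_iterator<std::string>(std::cout, "\n") would print each piece on its own line.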

Solution 10 - C++

I know this solution is not a sensible one, but it works. I'm including it only as an alternative way to solve the problem. Note that the global visited vector holds maximumSize (40) entries, so the input string must not be longer than that.

#include <iostream>
#include <vector>
#include <string>
using namespace std;
const int maximumSize=40;
vector<int> visited(maximumSize, 0);
string word;
void showContentVectorString(vector<string>& input)
{
    for(int i=0; i<input.size(); ++i)
    {
	    cout<<input[i]<<", ";
    }
    return;
}
void dfs(int current, int previous, string& input, vector<string>& output, char symbol)
{
    if(visited[current]==1)
    {
	    return;
    }
    visited[current]=1;
    string stringSymbol;
    stringSymbol.push_back(symbol);
    if(input[current]!=stringSymbol[0])
    {
	    word.push_back(input[current]);
    }
    else
    {
	    output.push_back(word);
	    word.clear();
    }
    if(current==(input.size()-1))
    {
	    output.push_back(word);
	    word.clear();
    }
    for(int next=(current+1); next<input.size(); ++next)
    {
	    if(next==previous)
	    {
		    continue;
	    }
	    dfs(next, current, input, output, symbol);
    }
    return;
}
void solve()
{
    string testString="this_is_a_test_string";
    vector<string> vectorOfStrings;
    dfs(0, -1, testString, vectorOfStrings, '_');
    cout<<"vectorOfStrings <- ";
    showContentVectorString(vectorOfStrings);
    return;
}
int main()
{
    solve();
    return 0;
}

Here is the result:

vectorOfStrings <- this, is, a, test, string,

Solution 11 - C++

One solution I have been using for quite a while is a split that can be used with vectors and lists alike:

#include <vector>
#include <string>
#include <list>

template< template<typename,typename> class Container, typename Separator >
Container<std::string,std::allocator<std::string> > split( const std::string& line, Separator sep ) {
    std::size_t pos = 0;
    std::size_t next = 0;
    Container<std::string,std::allocator<std::string> > fields;
    while ( next != std::string::npos ) {
        next = line.find_first_of( sep, pos );
        std::string field = next == std::string::npos ? line.substr(pos) : line.substr(pos,next-pos);
        fields.push_back(  field );
        pos = next + 1;
    }
    return fields;
}

int main() {
    auto res1 = split<std::vector>( "abc,def", ",:" );
    auto res2 = split<std::list>( "abc,def", ',' );
}

Solution 12 - C++

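What about the erase() function? If you know the exact positions in the string where to split, you can "extract" the fields with erase(). This only works for fixed-width input such as the zero-padded date below, not for input like 7/12/2012 versus 07/3/2011.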

std::string date("01/02/2019");
std::string day(date);
std::string month(date);
std::string year(date);

day.erase(2, std::string::npos); // "01"
month.erase(0, 3).erase(2); // "02"
year.erase(0,6); // "2019"

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author
Question | Ali
Solution 1 - C++ | thelazydeveloper
Solution 2 - C++ | chrisaycock
Solution 3 - C++ | Ben Cottrell
Solution 4 - C++ | Human-Compiler
Solution 5 - C++ | Jerry Coffin
Solution 6 - C++ | CodeMouse92
Solution 7 - C++ | Rafał Rawicki
Solution 8 - C++ | xikkub
Solution 9 - C++ | emvee
Solution 10 - C++ | Vadim Chernetsov
Solution 11 - C++ | user8143588
Solution 12 - C++ | Mubin Icyer