How to efficiently get a `string_view` for a substring of `std::string`


C++ Problem Overview


Using http://en.cppreference.com/w/cpp/string/basic_string_view as a reference, I see no way to do this more elegantly:

std::string s = "hello world!";
std::string_view v = s;
v = v.substr(6, 5); // "world"

Worse, the naive approach is a pitfall and leaves v a dangling reference to a temporary:

std::string s = "hello world!";
std::string_view v(s.substr(6, 5)); // OOPS!

I seem to remember a proposed addition to the standard library that would return a substring as a view:

auto v(s.substr_view(6, 5));

I can think of the following workarounds:

std::string_view(s).substr(6, 5);
std::string_view(s.data()+6, 5);
// or even "worse" (remove_prefix and remove_suffix return void, so they cannot be chained):
std::string_view w(s); w.remove_prefix(6); w.remove_suffix(1);

Frankly, I don't think any of these are very nice. Right now the best thing I can think of is using aliases to simply make things less verbose.

using sv = std::string_view;
sv(s).substr(6, 5);

C++ Solutions


Solution 1 - C++

There's the free-function route, but unless you also provide overloads for std::string it's a snake-pit.

#include <string>
#include <string_view>

std::string_view sub_string(
  std::string_view s, 
  std::size_t p, 
  std::size_t n = std::string_view::npos)
{
  return s.substr(p, n);
}

int main()
{
  using namespace std::literals;

  auto source = "foobar"s;

  // this is fine and elegant...
  auto bar = sub_string(source, 3);

  // but uh-oh...
  bar = sub_string("foobar"s, 3);
}

IMHO the whole design of string_view is a horror show which will take us back to a world of segfaults and angry customers.

Update:

Even adding overloads for std::string is a horror show. See if you can spot the subtle segfault timebomb...

#include <string>
#include <string_view>

std::string_view sub_string(std::string_view s, 
  std::size_t p, 
  std::size_t n = std::string_view::npos)
{
  return s.substr(p, n);
}

std::string sub_string(std::string&& s, 
  std::size_t p, 
  std::size_t n = std::string::npos)
{
  return s.substr(p, n);
}

std::string sub_string(std::string const& s, 
  std::size_t p, 
  std::size_t n = std::string::npos)
{
  return s.substr(p, n);
}

int main()
{
  using namespace std::literals;

  auto source = "foobar"s;
  auto bar = sub_string(std::string_view(source), 3);

  // but uh-oh...
  bar = sub_string("foobar"s, 3);
}

The compiler found nothing to warn about here. I am certain that a code review would not either.

I've said it before and I'll say it again, in case anyone on the C++ committee is watching: allowing implicit conversions from std::string to std::string_view is a terrible error which will only serve to bring C++ into disrepute.
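
One mitigation that is sometimes suggested, and that is not part of the original answer, is to delete the overload that would bind to a temporary std::string, so that the dangerous call fails to compile instead of dangling. A sketch under that assumption; the `can_sub_string` trait is a hypothetical helper used only to show the effect at compile time:

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// The view-returning function from the answer above.
inline std::string_view sub_string(std::string_view s, std::size_t p,
                                   std::size_t n = std::string_view::npos) {
    return s.substr(p, n);
}

// Assumption: rejecting rvalue strings outright is acceptable for callers.
// An rvalue std::string binds to this overload in preference to the implicit
// conversion to std::string_view, and selecting a deleted function is an error.
std::string_view sub_string(const std::string&&, std::size_t,
                            std::size_t = std::string_view::npos) = delete;

// Hypothetical detection trait: true when sub_string(T, size_t) is callable.
template <class T, class = void>
struct can_sub_string : std::false_type {};

template <class T>
struct can_sub_string<
    T, std::void_t<decltype(sub_string(std::declval<T>(), std::size_t{}))>>
    : std::true_type {};

static_assert(can_sub_string<std::string&>::value, "lvalue strings are accepted");
static_assert(can_sub_string<std::string_view>::value, "views are accepted");
static_assert(!can_sub_string<std::string>::value, "temporary strings are rejected");
```

This only catches the temporary-argument case. A view taken from an lvalue string that is later destroyed or modified still dangles. I would also expect a plain string literal argument to become ambiguous between these two overloads, so a further `const char*` overload would probably be needed in practice.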

Update

Having raised this (to me) rather alarming property of string_view on the cpporg message board, my concerns have been met with indifference.

The consensus of advice from this group is that std::string_view must never be returned from a function, which means that my first offering above is bad form.

There is of course no compiler help to catch times when this happens by accident (for example through template expansion).

As a result, std::string_view should be used with the utmost care, because from a memory management point of view it is equivalent to a copyable pointer pointing into the state of another object, which may no longer exist. However, it looks and behaves in all other respects like a value type.

Thus code like this:

auto s = get_something().get_suffix();

is safe when get_suffix() returns a std::string (either by value or by reference), but is UB if get_suffix() is ever refactored to return a std::string_view.

In my humble view, this means that any user code that stores returned strings using auto will break if the libraries it calls are ever refactored to return std::string_view in place of std::string const&.

So from now on, at least for me, "almost always auto" will have to become, "almost always auto, except when it's strings".
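
To make the "except when it's strings" rule concrete, here is a minimal sketch of spelling out the owning type at the call site. The names `get_something` and `get_suffix` come from the example above; the `Something` struct, its contents, and `stored_suffix` are assumptions made up for illustration:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical type standing in for whatever get_something() returns.
struct Something {
    std::string text;

    // The "refactored" accessor: returns a view into this object's storage.
    std::string_view get_suffix() const {
        const std::size_t n = text.size() < 6 ? text.size() : 6;
        return std::string_view(text).substr(text.size() - n);
    }
};

inline Something get_something() { return Something{"prefix-suffix"}; }

inline std::string stored_suffix() {
    // `auto s = get_something().get_suffix();` would deduce std::string_view
    // and leave s pointing into a temporary that is already destroyed.
    // Naming std::string copies the characters while the temporary is alive.
    std::string s(get_something().get_suffix());
    return s;
}
```

Direct-initialization is used because the std::string constructor taking a string_view is explicit. The copy-initialization form `std::string s = get_something().get_suffix();` would stop compiling after such a refactoring, which at least surfaces the change.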

Solution 2 - C++

You can use the conversion operator from std::string to std::string_view:

std::string s = "hello world!";
std::string_view v = std::string_view(s).substr(6, 5);

Solution 3 - C++

This is how you can efficiently create a sub-string string_view.

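#include <algorithm>    // std::min
#include <limits>       // std::numeric_limits
#include <string>
#include <string_view>

inline std::string_view substr_view(const std::string& source, std::size_t offset = 0,
                std::string_view::size_type count = 
                std::numeric_limits<std::string_view::size_type>::max()) {
    // An offset at or past the end yields an empty view instead of throwing.
    if (offset < source.size()) 
        return std::string_view(source.data() + offset, 
                        std::min(source.size() - offset, count));
    return {};
}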

#include <iostream>
int main() {
  using namespace std::literals; // enables the "..."s std::string literal used below

  std::cout << substr_view("abcd",3,11) << "\n";

  std::string s {"0123456789"};
  std::cout << substr_view(s,3,2) << "\n";

  // be cautious about lifetime, as illustrated at https://en.cppreference.com/w/cpp/string/basic_string_view
  std::string_view bad = substr_view("0123456789"s, 3, 2); // "bad" holds a dangling pointer
  std::cout << bad << "\n"; // possible access violation

  return 0;
}

Solution 4 - C++

I realize that the question is about C++17, but it's worth noting that C++20 introduced a string_view constructor that accepts two iterators to char (or whatever the base type is) which allows writing

std::string_view v{ s.begin() + 6, s.begin() + 6 + 5 };

Not sure if there is a cleaner syntax, but it's not difficult to define a macro

#define RANGE(_container,_start,_length) (_container).begin() + (_start), (_container).begin() + (_start) + (_length)

for a final

std::string_view v{ RANGE(s,6,5) };

PS: I called RANGE's first parameter _container instead of _string for a reason: the macro can be used with any Container (or class supporting at least begin() and end()), even as part of a function call like

auto pisPosition= std::find( RANGE(myDoubleVector,11,23), std::numbers::pi );

PPS: When possible, prefer C++20's actual ranges library to this poor person's solution.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | sehe | View Question on Stackoverflow
Solution 1 - C++ | Richard Hodges | View Answer on Stackoverflow
Solution 2 - C++ | CAF | View Answer on Stackoverflow
Solution 3 - C++ | Alexander | View Answer on Stackoverflow
Solution 4 - C++ | Mario Rossi | View Answer on Stackoverflow