Last Update:
Speeding Up string_view String Split Implementation
Table of Contents
Thank you for all the comments about the string_view
performance! Last
week I got a lot of feedback on how to improve the initial string split
code.
Have a look at how can we update the code and get some better performance.
Intro
Last week I showed a few examples of
string_view
.
Obviously, in most of the cases string_view
was much faster than the
standard string
. A view is a non-owning reference, so there’s no need
to copy the data - only [ptr, len]
is needed to mark the referenced
range. Moreover, string_view
was added into the Standard Library
because of the performance.
Maybe my string_view
vs string
tests were not needed because the
results were too obvious?
As always it’s not that easy. Running benchmarks is hard, and sometimes results might be entirely unexpected.
For example, the last time one string
implementation was faster than
the string_view
counterpart…
Here’s the simple benchmark of string split algorithm, results from GCC 8.1
As you can see, the string_view
version is slower!
Let’s try to understand why.
The Series
This article is part of my series about C++17 Library Utilities. Here’s the list of the other topics that I’ll cover:
- Refactoring with
std::optional
- Using
std::optional
- Error handling and
std::optional
- About
std::variant
- About
std::any
- In place construction for
std::optional
,std::variant
andstd::any
std::string_view
Performance- Follow up Post (this post)
- C++17 string searchers & conversion utilities
- Working with
std::filesystem
- Something more?
Resources about C++17 STL:
- C++17 In Detail by Bartek!
- C++17 - The Complete Guide by Nicolai Josuttis
- C++ Fundamentals Including C++ 17 by Kate Gregory
- Practical C++14 and C++17 Features - by Giovanni Dicanio
- C++17 STL Cookbook by Jacek Galowicz
The Case
The algorithm that I tested last week was a string split implementation.
As you saw in the image above, the performance of string_view
was not
perfect.
Here’s the code:
std::vector<std::string>
split(const std::string& str, const std::string& delims = " ")
{
std::vector<std::string> output;
auto first = std::cbegin(str);
while (first != std::cend(str))
{
const auto second = std::find_first_of(first, std::cend(str),
std::cbegin(delims), std::cend(delims));
if (first != second)
output.emplace_back(first, second);
if (second == std::cend(str))
break;
first = std::next(second);
}
return output;
}
Now the string_view
version:
std::vector<std::string_view>
splitSV(std::string_view strv, std::string_view delims = " ")
{
std::vector<std::string_view> output;
size_t first = 0;
while (first < strv.size())
{
const auto second = strv.find_first_of(delims, first);
if (first != second)
output.emplace_back(strv.substr(first, second-first));
if (second == std::string_view::npos)
break;
first = second + 1;
}
return output;
}
The readers pointed out that the initial implementations used different combinations of features:
string
implementation used iterators andstd::find_first_of
string_view
usedstd::string_view::find_first_of
- a member function.
When you change the string_view
view version, so that it uses the
std::find_first_of
then the performance is much better!
For example:
See the benchmark; @QuickBench
One possible reason why the member function is slower than the
std::find_first_of
is that the member method uses memchr
. See this
comment by “en-em”.
The generic std::find_first_of
can be fully inlined by the compiler,
while the member function is not. It would be an interesting experiment
to figure out exactly why the generic std::
function is faster than a
member method. Is memchr
that slow (At least in GCC implementation)?
The second improvement comes from JFT who also implemented the algorithms using pointers and not iterators. That also gave a lot of speed increase.
Another idea was that we might preallocate some space at the beginning -
so that we have fewer vector reallocations. For example, we can assume
each word is 5…6 words and then use .reserve()
. While it works nicely,
we might end up with a bit larger vector - and later you’d probably want
to shrink_to_fit()
. And in total, I’ve noticed that it doesn’t bring
much performance gain. Some more tests would be needed here.
Final Benchmark
Here are the results from running 6 versions of the benchmark:
StringSplit
-string
withstd::string::find_first_of
- member functionStringSplitStd
-string
withstd::find_first_of
with iteratorsStringSplitPtr
-string
withstd::find_first_of
with pointersStringViewSplit
-string_view
withstd::string_view::find_first_of
- member functionStringViewSplitStd
-string_view
withstd::find_first_of
with iteratorsStringViewSplitPtr
-string_view
withstd::find_first_of
with pointers
GCC 8.1:
See at Quick Bench
And Clang 6.0 version:
The benchmark uses a static string, so there’s a chance that the compiler could optimise its usage somehow.
And here are the results from MSVC 2017.7. I’ve used a large string - 547412 characters, loaded from a file.
string length: 547412
test iterations: 100
string split: 731.008 ms
string split std: 586.843 ms
string split ptr: 562.683 ms
string_view split: 406.436 ms
string_view split std: 223.27 ms
string_view split ptr: 208.758 ms
In both experiments, we can see that the version of string_view, with
std::find_first_of
and pointer implementation is the fastest.
Summary
Once again thanks for all the comments under the last article. I hope I gathered all the essential details from the feedback :)
Here’s the GitHub to the MSVC tests:
github/StringViewTests
The results of those quick benchmarks have to taken with care. It’s always best to measure the final scenario, rather than sometimes artificial examples. Such benchmarks might give you a general direction towards the final solution (see Don’t trust quick-bench results you see on the internet).
I've prepared a valuable bonus for you!
Learn all major features of recent C++ Standards on my Reference Cards!
Check it out here: