How to Measure String SSO Length with constinit and constexpr
In this text, you’ll learn about a few techniques and experiments with the constexpr and constinit keywords. By exploring the string implementation, you’ll also see why constinit is so powerful.
What is SSO
Just briefly, SSO stands for Short String Optimization. It’s usually implemented as a small buffer (an array or something similar) occurring in the same storage as the string object. When the string is short, this buffer is used instead of a separate dynamic memory allocation.
See a simplified diagram below:
The diagram illustrates two strings and where they “land” in the string object. If the string is long (longer than N characters), it needs a costly dynamic memory allocation, and the address of that new buffer is stored in ptr. On the other hand, if the string is short, it can be stored directly inside the object, in buf[N]. Usually, buf and ptr are implemented as a union to save space, as only one of them is used at a time, never both.
Let’s start with a basic test and check the size of the std::string object itself using sizeof():
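#include <string>
int main() {
    // Size of the string object itself (any heap buffer is not included)
    return static_cast<int>(sizeof(std::string));
}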
On 64-bit targets, GCC (libstdc++) and MSVC show 32, while the libc++ implementation used by Clang returns 24!
Now it’s time to check the maximum length of a short string. How can we do that? We have several options:
- at runtime,
- with constexpr (since C++20),
- with constinit (since C++20),
- by checking std::string{}.capacity(),
- and we can always look into the real implementation and check the code :)
Let’s start with the most obvious option:
Checking length by using capacity()
As pointed out in the comments on Reddit (thanks, VinnieFalco), you can check the size of the SSO buffer via capacity() of an empty string:
#include <string>
int main() {
constexpr auto ssoLen = std::string{}.capacity();
static_assert(ssoLen >= 15);
return static_cast<int>(ssoLen);
}
- GCC and MSVC show:
Program returned: 15
- Clang with libc++ shows:
Program returned: 22
Note that capacity() doesn’t include the terminating null character.
Let’s have a look at some other experiments.
Checking length at runtime
To check the length of the small buffer at runtime, we can replace the global operator new and simply watch when it’s called while creating a string object:
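#include <cstdlib>
#include <iostream>
#include <new>
#include <string>
// Replace the global operator new to log every dynamic allocation
void* operator new(std::size_t size) {
    auto ptr = std::malloc(size);
    if (!ptr)
        throw std::bad_alloc{};
    std::cout << "new: " << size << ", ptr: " << ptr << '\n';
    return ptr;
}
// Matching operator delete: the memory comes from malloc, so release it with free
void operator delete(void* ptr) noexcept { std::free(ptr); }
void operator delete(void* ptr, std::size_t) noexcept { std::free(ptr); }
int main() {
    std::string x { "123456789012345"}; // 15 characters + null
    std::cout << x << '\n';
}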
When you run the application, you’ll see that only the string is printed to the output.
But if you change the string to:
std::string x { "1234567890123456"}; // 16 characters + null
GCC reports:
new: 17, ptr: 0x8b82b0
1234567890123456
Similarly, MSVC reports the following (run as a local Release build, because it doesn’t work under Compiler Explorer):
new: 32, ptr: 000001CD37720B00
1234567890123456
Clang is still “silent”… but let’s change the string to:
std::string x { "12345678901234567890123"}; // 23 characters + null
Now the libc++ implementation requests some dynamic memory. (Here’s a good overview of how it’s achieved: libc++’s implementation of std::string | Joel Laity.)
In summary:
- GCC and MSVC can hold 15 characters (assuming the char type, not wchar_t),
- the Clang implementation (-stdlib=libc++) can store 22 characters (a 23-character string already needs an allocation)! It’s very impressive, as the whole string object is only 24 bytes!
That was a simple and “classic” experiment… but in C++20, we can also check it at compile time!
constexpr strings
Let’s start with constexpr. In C++20, both std::string and std::vector are constexpr-ready. What’s more, C++20 even allows dynamic memory allocation in constexpr code.
Dynamic allocation at compile time can occur only during the evaluation of a constant expression, and the allocated memory buffer cannot “move” to runtime. In other words, the allocation must be “transient”: memory allocated during constant evaluation has to be freed before that evaluation ends. I wrote about it in a separate blog post: constexpr Dynamic Memory Allocation, C++20 - C++ Stories
In short, we can try the following code:
#include <string>
#include <iostream>
constexpr std::string str15 {"123456789012345"};
//constexpr std::string str16 {"1234567890123456"}; // doesn't compile
int main() {
std::cout << str15 << '\n';
}
The above code creates a constexpr string with 15 characters, and since it fits into the SSO buffer, it doesn’t violate any constexpr requirements. On the other hand, str16 would need a dynamic memory allocation that outlives the constant evaluation, and thus the compiler (GCC in this case) reports:
/opt/compiler-explorer/gcc-trunk-20221121/include/c++/13.0.0/bits/allocator.h:195:52: error: 'std::__cxx11::basic_string<char>(((const char*)"1234567890123456"), std::allocator<char>())' is not a constant expression because it refers to a result of 'operator new'
195 | return static_cast<_Tp*>(::operator new(__n));
| ~~~~~~~~~~~~~~^~~~~
At the time of writing (Nov 2022), this code doesn’t compile with the libc++ implementation, so libc++ might still have some C++20 issues.
But that’s not all, as C++20 lets us do even more:
Constant initialization
C++20 also introduces a new keyword, constinit, which forces constant initialization of variables with static or thread storage duration. In short, our object will be initialized at compile time, but we can later change it like a regular global variable.
We can rewrite our previous example to:
#include <string>
#include <iostream>
constinit std::string global {"123456789012345"};
int main() {
std::cout << global << '\n';
// ...but it can still be changed later
global = "abc";
std::cout << global;
}
If you extend the string and add one more letter:
constinit std::string global {"1234567890123456"};
You’ll get the following error:
error: 'constinit' variable 'global' does not have a constant initializer
Summary
It was a fun experiment! In C++20, you can rely on constant initialization and constexpr strings to check the SSO length.
I’m not advocating global objects, but if you need them, constexpr (or constinit) might be a good choice. As you can see, short strings can be safely initialized at compile time.
As pointed out by 2kaud in the comments, to store constexpr string literals, you can also use std::string_view, which can refer to a string literal of any length:
constexpr std::string_view resName { "a very important resource long name..." };
As a side note: another name for this kind of optimization is SBO (Small Buffer Optimization). It can be applied not only to strings but also, for example, to objects like std::any or even containers (std::vector by design doesn’t offer this optimization, but we can imagine a similar non-standard container with a small buffer).
References