Solving Undefined Behavior in Factories with constinit from C++20
A few years ago (see here), I showed an interesting implementation for self-registering classes in factories. It works, but one step might be at the edge of undefined behavior. Fortunately, with C++20 and its new constinit keyword, we can update the code and ensure it’s super safe.
Intro
Let’s bring back the topic:
Here’s a typical factory function. It creates a unique_ptr with ZipCompression or BZCompression based on the passed name/filename:
static unique_ptr<ICompressionMethod> Create(const string& fileName) {
    auto extension = GetExtension(fileName);
    if (extension == "zip")
        return make_unique<ZipCompression>();
    else if (extension == "bz")
        return make_unique<BZCompression>();
    return nullptr;
}
Here are some issues with this approach:
- Each time you write a new class and want to include it in the factory, you have to add another if in the Create() method. Easy to forget in a complex system.
- All the types must be known to the factory.
- In Create(), we arbitrarily used strings to represent types. Such representation is only visible in that single method. What if you’d like to use it somewhere else? Strings might be easily misspelled, especially if you have several places where they are compared.
All in all, we get a strong dependency between the factory and the classes.
But what if classes could register themselves? Would that help?
- The factory would do its job: create new objects based on some matching.
- If you write a new class, there’s no need to change parts of the factory class. Such a class would register automatically.
Google Test
To give you more motivation, I’d like to show one real-life example. When you use Google Test library, and you write:
TEST(MyModule, InitTest) {
// impl...
}
Behind this single TEST macro, a lot of things happen! For starters, your test is expanded into a separate class, so each test is a new class. But then, there’s a problem: you have all the tests, so how does the test runner know about them? It’s the same problem we’re trying to solve here. The classes need to be auto-registered.
Have a look at this code from googletest/…/gtest-internal.h:
// (some parts of the code cut out)
#define GTEST_TEST_(test_case_name, test_name, parent_class, parent_id)\
class GTEST_TEST_CLASS_NAME_(test_case_name, test_name) \
: public parent_class { \
virtual void TestBody();\
static ::testing::TestInfo* const test_info_ GTEST_ATTRIBUTE_UNUSED_;\
};\
\
::testing::TestInfo* const GTEST_TEST_CLASS_NAME_(test_case_name, test_name)\
::test_info_ =\
::testing::internal::MakeAndRegisterTestInfo(\
#test_case_name, #test_name, NULL, NULL, \
new ::testing::internal::TestFactoryImpl<\
GTEST_TEST_CLASS_NAME_(test_case_name, test_name)>);\
void GTEST_TEST_CLASS_NAME_(test_case_name, test_name)::TestBody()
I cut some parts of the code to make it shorter, but basically, GTEST_TEST_ is used in the TEST macro, and this will expand to a new class. In the lower section, you might see the name MakeAndRegisterTestInfo. So here’s the place where the class registers!
After the registration, the runner knows all the existing tests and can invoke them.
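To make the idea concrete, here’s a minimal sketch of the same trick outside of Google Test. Everything in it (MINI_TEST, MiniTestInfo, RegisterMiniTest, RunAllMiniTests) is a hypothetical name invented for illustration, not Google Test’s API, and it’s far simpler than the real macro: each test is a plain function rather than a class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified registry; not Google Test's real API.
struct MiniTestInfo {
    std::string name;
    void (*body)();
};

// Function-local static: constructed on first use, so it is ready
// even when called during static initialization.
std::vector<MiniTestInfo>& MiniTestRegistry() {
    static std::vector<MiniTestInfo> tests;
    return tests;
}

bool RegisterMiniTest(const char* name, void (*body)()) {
    assert(body != nullptr);
    MiniTestRegistry().push_back({name, body});
    return true;
}

// Declares the test function, registers it through a static bool,
// then opens the function body that the user supplies.
#define MINI_TEST(test_name)                                      \
    static void test_name();                                      \
    [[maybe_unused]] static const bool test_name##_registered =   \
        RegisterMiniTest(#test_name, &test_name);                 \
    static void test_name()

int g_runs = 0;  // counts executed test bodies

MINI_TEST(InitTest) { ++g_runs; }
MINI_TEST(ShutdownTest) { ++g_runs; }

// The "runner": it never names InitTest or ShutdownTest directly.
int RunAllMiniTests() {
    for (const auto& t : MiniTestRegistry()) t.body();
    return static_cast<int>(MiniTestRegistry().size());
}

// Usage sketch: both tests were registered before main() started.
void DemoMiniTests() {
    assert(MiniTestRegistry().size() == 2);
    const int ran = RunAllMiniTests();
    assert(ran == 2);
    assert(g_runs == 2);
}
```

The runner only walks the registry; adding a third MINI_TEST requires no change to it, which is exactly the property we want from the factory.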
Implementing the factory
Here are the steps to implement a similar system:
- Some Interface - we’d like to create classes derived from one interface. It’s the same requirement as for a “normal” factory method.
- Factory class that also holds a map of available types.
- A proxy that will be used to create a given class. The factory doesn’t know how to create a given type now, so we have to provide some proxy classes.
For the interface, we can use ICompressionMethod:
class ICompressionMethod {
public:
    ICompressionMethod() = default;
    virtual ~ICompressionMethod() = default;
    virtual void Compress() = 0;
};
And then the factory:
class CompressionMethodFactory {
public:
    using TCreateMethod = unique_ptr<ICompressionMethod>(*)();
public:
    CompressionMethodFactory() = delete;
    static bool Register(const string& name, TCreateMethod funcCreate);
    static unique_ptr<ICompressionMethod> Create(const string& name);
private:
    static Map<string, TCreateMethod> s_methods;
};
The factory holds the map of registered types. The main point is that the factory now uses some method (TCreateMethod) to create the desired type (our proxy). The name of a type and that creation method must be initialized in a different place.
The implementation of the factory:
class CompressionMethodFactory {
public:
    using TCreateMethod = unique_ptr<ICompressionMethod>(*)();
public:
    CompressionMethodFactory() = delete;
    static constexpr bool Register(string_view name,
                                   TCreateMethod createFunc) {
        if (auto val = s_methods.at(name, nullptr); val == nullptr) {
            if (s_methods.insert(name, createFunc)) {
                std::cout << name << " registered\n";
                return true;
            }
        }
        return false;
    }
    static std::unique_ptr<ICompressionMethod> Create(string_view name) {
        if (auto val = s_methods.at(name, nullptr); val != nullptr) {
            std::cout << "calling " << name << "\n";
            return val();
        }
        return nullptr;
    }
private:
    static inline constinit Map<string_view, TCreateMethod, 4> s_methods;
};
Now we can implement a derived class from ICompressionMethod that will register in the factory:
class ZipCompression : public ICompressionMethod {
public:
    virtual void Compress() override;
    static unique_ptr<ICompressionMethod> CreateMethod() {
        return std::make_unique<ZipCompression>();
    }
    static string_view GetFactoryName() { return "ZIP"; }
private:
    static inline bool s_registered =
        CompressionMethodFactory::Register(ZipCompression::GetFactoryName(),
                                           CreateMethod);
};
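The downside of self-registration is that there’s a bit more work for a class. As you can see, we must have a static CreateMethod defined.
To register such a class, all we have to do is define s_registered. The inline static member above does this directly in the class. Without inline variables (before C++17), the member is only declared in the class and defined in a source file instead:
bool ZipCompression::s_registered =
    CompressionMethodFactory::Register(ZipCompression::GetFactoryName(),
                                       ZipCompression::CreateMethod);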
The basic idea for this mechanism is that we rely on static variables. They will be initialized through dynamic initialization before main() is called.
Because the order of initialization of static variables in different compilation units is unspecified, we might end up with a different order of elements in the factory container. Each name/type is not dependent on other already registered types in our example, so we’re safe here.
Tricky case with the map
But what about the first insertion? Can we be sure that the Map is created and ready for use?
I asked this question at SO: C++ static initialization order: adding into a map - Stack Overflow. Here’s the rough summary:
The behavior in this scenario is technically undefined, according to the C++ standard, because the initialization order of static variables across different translation units (i.e., .cpp files) is not specified. The map Factory::s_map and the static variable registered are in different translation units in the example. As a result, it’s possible for the registered variable to be initialized (and thus Factory::Register to be called) before Factory::s_map is initialized. If this happens, the program will attempt to insert an element into an uninitialized map, leading to undefined behavior.
But in C++20, we can do better.
What if we could force the compiler to use constant initialization for the map? That way, no matter the order of compilation units, the value will be already present and ready to use.
This can be achieved through the constinit keyword. I described it in my other post, but for a quick summary:
This new keyword for C++20 forces constant initialization. It will ensure that the value will already be present and initialized no matter the compilation order. What’s more, as opposed to constexpr, we only force initialization, and the variable itself is not constant. So you can change it later.
I implemented a special version of Map, which has an (implicit) constexpr constructor and, thanks to constinit, is guaranteed to be initialized before any s_registered variable, including the very first one to register.
My current implementation uses std::array, which can be used in constant expressions. We could potentially use std::map, but it would be at the edge of undefined behavior, so it’s not guaranteed to work. In the final code, you can also experiment with std::vector, which got constexpr support in C++20.
template <typename Key, typename Value, size_t Size>
struct Map {
    std::array<std::pair<Key, Value>, Size> data;
    size_t slot_ { 0 };
    constexpr bool insert(const Key &key, const Value& val) {
        if (slot_ < Size) {
            data[slot_] = std::make_pair(key, val);
            ++slot_;
            return true;
        }
        return false;
    }
    [[nodiscard]] constexpr Value at(const Key &key, const Value& none) const {
        const auto itr =
            std::find_if(begin(data), end(data),
                         [&key](const auto &v) { return v.first == key; });
        if (itr != end(data)) {
            return itr->second;
        } else {
            return none;
        }
    }
};
And the factory:
class CompressionMethodFactory {
public:
    using TCreateMethod = std::unique_ptr<ICompressionMethod>(*)();
public:
    CompressionMethodFactory() = delete;
    static constexpr bool Register(std::string_view name, TCreateMethod createFunc) {
        if (auto val = s_methods.at(name, nullptr); val == nullptr) {
            if (s_methods.insert(name, createFunc)) {
                std::cout << name << " registered\n";
                return true;
            }
        }
        return false;
    }
    static std::unique_ptr<ICompressionMethod> Create(std::string_view name) {
        if (auto val = s_methods.at(name, nullptr); val != nullptr) {
            std::cout << "calling " << name << "\n";
            return val();
        }
        return nullptr;
    }
private:
    static inline constinit Map<std::string_view, TCreateMethod, 4> s_methods;
};
Optimizing s_registered?
We should also ask one question: Can the compiler eliminate s_registered? Fortunately, we’re also on the safe side. From the latest draft of C++: [basic.stc.static#2]:
If a variable with static storage duration has initialization or a destructor with side effects, it shall not be eliminated even if it appears to be unused, except that a class object or its copy/move may be eliminated as specified in class.copy.elision.
Since s_registered has an initialization with side effects (calling Register()), the compiler cannot optimize it away.
(See how it works in a library: Static Variables Initialization in a Static Library, Example - C++ Stories)
Final Demo
See the full example @Wandbox
#include "ICompressionMethod.h"
#include "ZipCompression.h"
#include <iostream>
int main() {
    std::cout << "main starts...\n";
    if (auto pMethod = CompressionMethodFactory::Create("ZIP"); pMethod)
        pMethod->Compress();
    else
        std::cout << "Cannot find ZIP...\n";
    if (auto pMethod = CompressionMethodFactory::Create("BZ"); pMethod)
        pMethod->Compress();
    else
        std::cout << "Cannot find BZ...\n";
    if (auto pMethod = CompressionMethodFactory::Create("7Z"); pMethod)
        pMethod->Compress();
    else
        std::cout << "Cannot find 7Z...\n";
}
Here’s the sequence diagram for this demo:
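The expected output (the comparing, inserting, and address lines come from extra logging that is not part of the listings above):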
comparing: ZIP to
comparing: ZIP to
comparing: ZIP to
comparing: ZIP to
inserting at 0
ZIP|0x5586bceb06f0
ZIP registered
comparing: BZ to ZIP
comparing: BZ to
comparing: BZ to
comparing: BZ to
inserting at 1
BZ|0x5586bceb21f0
BZ registered
main starts...
comparing: ZIP to ZIP
calling ZIP
Zip compression...
comparing: BZ to ZIP
comparing: BZ to BZ
calling BZ
BZ compression...
comparing: 7Z to ZIP
comparing: 7Z to BZ
comparing: 7Z to
comparing: 7Z to
Cannot find 7Z...
Other ideas & notes
The solution with constinit is not the only way to solve the issue with dependencies between static variables. Here are some other options and suggestions. Thanks for the comments!
- We can use “Meyers’ Singleton” (in short, create a function with a static variable inside). This allows lazy initialization of a static variable, also in a thread-safe manner. That way, when the first s_registered variable is initialized, it can call this function and be sure the map is ready to use.
This would look something like:
std::map<...>& GetRegistryMap() {
    static std::map<...> mp;
    return mp;
}
- We could try using CRTP and move s_registered into a separate base class. But this solution might allow the compiler to optimize away the variable and might not register a class. See the complete discussion here, in my initial post.
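Both ideas can be combined in one rough sketch. It assumes a std::map<std::string, TCreateMethod> registry behind a function-local static and a CRTP helper that I’ve called RegisteredInFactory. That name, the free functions Register and Create, and the exact shape of the code are illustrative assumptions, not the implementation from the earlier post.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class ICompressionMethod {
public:
    virtual ~ICompressionMethod() = default;
    virtual void Compress() = 0;
};

using TCreateMethod = std::unique_ptr<ICompressionMethod> (*)();

// Meyers' Singleton: the map is constructed on the first call, so it is
// ready even when that call happens during static initialization.
std::map<std::string, TCreateMethod>& GetRegistryMap() {
    static std::map<std::string, TCreateMethod> mp;
    return mp;
}

bool Register(const std::string& name, TCreateMethod fn) {
    return GetRegistryMap().emplace(name, fn).second;  // false on duplicates
}

std::unique_ptr<ICompressionMethod> Create(const std::string& name) {
    const auto& mp = GetRegistryMap();
    if (const auto it = mp.find(name); it != mp.end()) return it->second();
    return nullptr;
}

// CRTP helper (hypothetical name): holds s_registered for each T.
template <typename T>
class RegisteredInFactory {
protected:
    static bool s_registered;
    // Referencing s_registered makes the compiler instantiate it; without
    // a use like this, the static member of the template is never created
    // and the class silently fails to register.
    RegisteredInFactory() { (void)s_registered; }
};

template <typename T>
bool RegisteredInFactory<T>::s_registered =
    Register(T::GetFactoryName(), T::CreateMethod);

class ZipCompression : public ICompressionMethod,
                       public RegisteredInFactory<ZipCompression> {
public:
    void Compress() override {}
    static std::unique_ptr<ICompressionMethod> CreateMethod() {
        return std::make_unique<ZipCompression>();
    }
    static std::string GetFactoryName() { return "ZIP"; }
};

void DemoRegistry() {
    assert(Create("ZIP") != nullptr);
    assert(Create("BZ") == nullptr);                          // never registered
    assert(!Register("ZIP", &ZipCompression::CreateMethod));  // duplicate name
}
```

The derived class no longer declares s_registered itself, but the registration now depends on the base-class constructor being instantiated, which is the fragile part mentioned above.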
Summary
In this blog post, we delved into the intricate details of a technique for creating self-registering classes in C++20, enabling the creation of more flexible and scalable factory patterns. A central component of this solution is the constinit keyword, a feature introduced in C++20 that guarantees the initialization of a variable at compile-time.
Our approach starts with the traditional factory function, which was shown to have some inherent shortcomings, namely: the need to update the factory with each new class, the factory’s requirement to be aware of all types, and the dependency between the factory and the classes.
To overcome these issues, we built a self-registration system inspired by how Google Test handles its test cases. The system consists of an interface from which the classes are derived, a factory class that holds a map of available types, and a proxy used to create a given class. The mechanism relies on static variables, which are initialized before the main() function is called.
However, the order of initialization of static variables across different compilation units is unspecified, which could lead to issues. The central trick to avoiding these problems lies in the constinit keyword, ensuring that the Map we use to hold the registered classes is initialized at compile-time, before any static class registration occurs.
We demonstrated how this mechanism works through the example of a CompressionMethodFactory and the ZipCompression class. The system works in a way that each class registers itself in the factory, thus negating the need for the factory to have prior knowledge about it.
By using constinit, we ensure that our system will behave consistently, regardless of the order in which static variables are initialized. This feature, coupled with a simple yet effective map implementation that uses a constexpr constructor, ensures our approach is safe and well within the realm of defined behavior.
In conclusion, the use of constinit in C++20 offers a powerful means to manage static initialization order, creating opportunities for more robust and maintainable code structures. It provides a strong foundation for self-registering classes and factories, simplifying the process and reducing dependency issues. The technique explored in this post signifies the strength of modern C++ features and their capability to solve complex problems in an elegant and efficient manner.
Back to you
- Have you tried self-registering classes?
- Do you know any other patterns helpful with factories?
Share your feedback below in the comments.