
In this blog post, we’ll show how to implement a custom pipe operator and apply it to a data processing example. Thanks to C++23 and std::expected, we can write a rather efficient framework that easily handles unexpected outcomes.

This is a collaborative guest post by prof. Bogusław Cyganek:


Prof. Cyganek is a researcher and lecturer at the Department of Electronics, AGH University of Science and Technology in Cracow, Poland. He has worked as a software engineer for a number of companies such as Nisus Writer USA, Compression Techniques USA, Manta Corp. USA, Visual Atoms UK, Wroclaw University in Poland, and Diagnostyka Inc. Poland. His research interests include computer vision and pattern recognition, as well as the development of embedded systems. See his recent book at Amazon and his home page. Prof. Cyganek also provides commercial training for Modern C++, Standard Library, and more.

If A is a range adaptor closure object (RACO) and R is a range, then the expression

A(R)

is equivalent to

R | A

Range adaptor closure objects can be chained with operator |. If A and B are RACOs, then

A | B

is another RACO C that fulfills the following condition:

  • C stores copies of A and B, each directly initialized from std::forward<decltype((T))>(T), for T being A or B, respectively.
  • If a and b are those stored copies of A and B, respectively, and R is a range object, then the following expressions are equivalent:
b(a(R))
R | a | b
C(R)
R | C
R | (A | B)

A Basic Example  

Below, there’s an example of an overloaded operator |. The example is inspired by the CppCon talk "Functional Composable Operations with Unix-Style Pipes in C++" by Ankur Satle (CppCon 2022). Its left operand is s, passed by rvalue reference std::string &&, and its right operand is a function f. The | operator simply calls f, providing it with s, as done on line [4]. Note that s needs to be std::move’d: being a named object, it is an lvalue inside the function. Hence, the callable f must be able to accept std::string && as its parameter and return std::string. For simplicity, exactly this is defined on line [1] as the alias Function:

#include <functional>
#include <iostream>
#include <string>

using Function = std::function<std::string(std::string &&)>;     // [1]

auto operator | (std::string &&s, Function f) -> std::string {
    return f(std::move(s));                                      // [4]
}

To see the pipeline in action, let’s define a number of functions, starting on line [8], each processing std::string. That is, each of them extends the input string s, prints a diagnostic message, and finally returns the modified string, as follows:

std::string StringProc_1(std::string &&s) {                      // [8]
    s += " proc by 1,";
    std::cout << "I'm in StringProc_1, s = " << s << "\n";
    return s;
}

std::string StringProc_2(std::string &&s) {
    s += " proc by 2,";
    std::cout << "I'm in StringProc_2, s = " << s << "\n";
    return s;
}

std::string StringProc_3(std::string &&s) {
    s += " proc by 3,";
    std::cout << "I'm in StringProc_3, s = " << s << "\n";
    return s;
}

The entire pipeline is called in the test function SimplePipeTest, defined on lines [28-32].

void SimplePipeTest() {                                          // [28]
    std::string start_str("Start string ");
    std::cout << (std::move(start_str) | 
                  StringProc_1 | StringProc_2 | StringProc_3);
}                                                                // [32]

The pipe operator is called in a series, starting with the initial string start_str. In other words, start_str is passed on to StringProc_1, its result is passed on to StringProc_2, and then to StringProc_3; then, finally, its result is streamed to std::cout.

The output is as follows:

I'm in StringProc_1, s = Start string  proc by 1,
I'm in StringProc_2, s = Start string  proc by 1, proc by 2,
I'm in StringProc_3, s = Start string  proc by 1, proc by 2, proc by 3,
Start string  proc by 1, proc by 2, proc by 3,
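
Because operator | associates left to right, such a chain is just another way of writing nested function calls. The sketch below illustrates this with two hypothetical stages, add_a and add_b, which skip the diagnostic printing; the functions piped, nested, and check_equivalence are likewise names assumed only for this example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

using Function = std::function<std::string(std::string&&)>;

std::string operator|(std::string&& s, Function f) {
    return f(std::move(s));
}

// Hypothetical stages: each appends one character to the string.
std::string add_a(std::string&& s) { s += "a"; return std::move(s); }
std::string add_b(std::string&& s) { s += "b"; return std::move(s); }

// Parsed as (std::string{"x"} | add_a) | add_b.
std::string piped() { return std::string{"x"} | add_a | add_b; }

// The same computation written as nested calls.
std::string nested() { return add_b(add_a(std::string{"x"})); }

// Both spellings must produce the same string.
void check_equivalence() { assert(piped() == nested()); }
```

The pipe form lists the stages in the order they run, while the nested form has to be read inside out.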


Making it more general  

This is an easy way to organize a pipeline operation in C++. However, to make our operator | more generic, it can be rewritten as follows:

#include <concepts>
#include <functional>
#include <type_traits>
#include <utility>

template <typename T, typename Function>
requires (std::invocable<Function, T>)
constexpr auto operator | (T &&t, Function &&f) -> std::invoke_result_t<Function, T> {
    return std::invoke(std::forward<Function>(f), std::forward<T>(t));
}

The improvements we have added are as follows:

  • A constraint has been added: std::invocable requires that Function can be invoked with an argument of type T.
  • constexpr has been added, so the operator | can be evaluated at compile time if its arguments are also available at that time.
  • The return type is defined with the helper std::invoke_result_t<Function, T>, which deduces the return type at compile time.
  • The call goes through std::invoke, which invokes the callable object f with the argument t. The benefit of using std::invoke instead of a direct call f(t) is that the former works with any callable, such as a function pointer, a reference to a function, a lambda function, a member function pointer, a function object (i.e., one with operator() on board), or a pointer to member data. In other words, f has to satisfy the Callable named requirement.

Here’s an updated example that illustrates the benefits of our updated operator |:

void SimplePipeTest() {
    std::string start_str("Start string ");
    std::cout << (std::move(start_str) | 
                  StringProc_1 | StringProc_2 | [](std::string&& s) {
                      s += " proc by 3,";
                      std::cout << "I'm in StringProc_3, s = " << s << "\n";
                      return s;
                  });
}


And the output:

I'm in StringProc_1, s = Start string  proc by 1,
I'm in StringProc_2, s = Start string  proc by 1, proc by 2,
I'm in StringProc_3, s = Start string  proc by 1, proc by 2, proc by 3,
Start string  proc by 1, proc by 2, proc by 3,

Handling errors  

Everything would be fine, but what should we do if one link in the above pipeline cannot complete its operation and pass on its result because an error occurred? Of course, it may throw an exception and interrupt the entire operation. But there is another option.

We can use std::optional to express whether the operation was successful and we have the result, or whether the calculations failed for some reason and we simply cannot provide any result. But if you have a C++23 compiler, an even better option is std::expected. Unlike std::optional, which has been available since C++17, std::expected lets you pass an error code in the event of a failure rather than merely signal it. We have implemented this idea in the new version of the pipe operator |, see below:

using namespace std;
template <typename T, typename E, typename Function>
    requires invocable<Function, T> &&                                   // [3]
    is_expected<invoke_result_t<Function, T>>                            // [4]
constexpr auto operator | (std::expected<T, E> &&ex, Function &&f)       // [5]
    -> invoke_result_t<Function, T> 
{
    return ex ?                                                          // [8]
           invoke(forward<Function>(f), *forward<expected<T, E>>(ex)) :  // [9]
           ex;                                                           // [10]
}

The key part is the new input parameter on line [5]: it is no longer a T object but a std::expected<T, E>, where T stands for the expected value, while E denotes an unexpected value that represents those cases where an expected value cannot be computed.

To verify that f fits this scheme, the requires clause on lines [3-4] has two parts. The first, as before, checks that Function can be invoked with an argument of type T. The second, the new is_expected concept, verifies that the result of that invocation is actually a std::expected. Its entire definition will be analyzed later.

The ex parameter is passed by rvalue reference (it is not a forwarding reference, since std::expected<T, E> && is not a bare template parameter). On line [8], it is checked whether ex holds a valid object. If so, then on line [9], as before, we call the action f, this time passing it the value extracted from ex as an rvalue. Otherwise, we simply return ex on line [10]; in this case, it only carries the error code. Subsequent calls in the chain behave the same way. This means that if an error occurs at some stage of the pipeline, it is propagated to the end of the chain, and no further ‘worker’ function f is called.


More advanced example  

To see this pipeline in action, let’s build a more complex example:

// Some error types just for the example
enum class OpErrorType : unsigned char { 
    kInvalidInput, kOverflow, kUnderflow 
};

struct Payload {
    std::string fStr{};
    int fVal{};
};

// For the pipeline operation - the expected type is Payload,
// while the 'unexpected' is OpErrorType
using PayloadOrError = std::expected<Payload, OpErrorType>;

PayloadOrError is simply std::expected, which has Payload as the expected type, and reports any errors in the form of OpErrorType error codes.

The Payload structure has two members: fStr of type std::string and fVal of type int. In practice, of course, it can be any object that we want to process in a pipeline.

The elements of the processing chain are the functions Payload_Proc_1 and subsequent functions. A characteristic feature of each of them is the initial condition in which we check whether the object s, passed by rvalue reference, contains a valid object. If not, the function immediately returns s, which, in this case, carries the error code. When a function is reached through the operator | shown earlier, it always receives a valid value, so this check matters mainly for direct calls.

PayloadOrError Payload_Proc_1(PayloadOrError &&s) {
    if (!s)
        return s;
    ++s->fVal;
    s->fStr += " proc by 1,";
    std::cout << "I'm in Payload_Proc_1, s = " << s->fStr << "\n";
    return s;
}

However, if we have a valid Payload object, we can freely process it. Finally, this processed object s is returned so that it can be processed by another function in the pipeline, and so on.

We introduced a slight variation only to the Payload_Proc_2 function. This time we simulate an error – if the randomly drawn value is even, std::unexpected with a randomly drawn error code will be returned. This means that the calculations have failed and, as a result, the pipeline has been interrupted.

PayloadOrError Payload_Proc_2(PayloadOrError &&s) {
    if (!s)
        return s;
    ++s->fVal;
    s->fStr += " proc by 2,";
    std::cout << "I'm in Payload_Proc_2, s = " << s->fStr << "\n";
    // Emulate the error, at least once in a while ...
    std::mt19937 rand_gen( std::random_device {} () );
    return ( rand_gen() % 2 ) ? s : 
             std::unexpected { rand_gen() % 2 ? 
               OpErrorType::kOverflow : OpErrorType::kUnderflow };
}

And the last one, Payload_Proc_3:

PayloadOrError Payload_Proc_3(PayloadOrError &&s) {
    if (!s)
        return s;
    ++s->fVal;
    s->fStr += " proc by 3,";
    std::cout << "I'm in Payload_Proc_3, s = " << s->fStr << "\n";
    return s;
}

The entire pipeline component with std::expected is launched and tested in function Payload_PipeTest. If the pipeline operation was successful, then the resulting string and integer are printed. Otherwise, one of the error messages is displayed in one of the branches of the switch statement.

void Payload_PipeTest() {
    auto res = PayloadOrError{Payload{"Start string ", 42}} |
      Payload_Proc_1 | Payload_Proc_2 | Payload_Proc_3;
    if (res)
      print_nl("Success! Result of the pipe: ", res->fStr, ", ", res->fVal);
    else
      switch (res.error()) {
        case OpErrorType::kInvalidInput:
          print_nl("Error: OpErrorType::kInvalidInput");
          break;
        case OpErrorType::kOverflow:
          print_nl("Error: OpErrorType::kOverflow");
          break;
        case OpErrorType::kUnderflow:
          print_nl("Error: OpErrorType::kUnderflow");
          break;
        default:
          print_nl("That's really an unexpected error ...");
          break;
      }
}
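
The helper print_nl used above is not defined in this text. A minimal sketch of what it could look like, assuming it simply streams all of its arguments to std::cout followed by a newline, is given below; print_nl_to and print_nl_self_check are hypothetical additions that make the formatting testable.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: streams all arguments to `out`, then a newline.
template <typename... Args>
void print_nl_to(std::ostream& out, Args&&... args) {
    (out << ... << std::forward<Args>(args)) << '\n';
}

// Assumed shape of print_nl: the same, but fixed to std::cout.
template <typename... Args>
void print_nl(Args&&... args) {
    print_nl_to(std::cout, std::forward<Args>(args)...);
}

// Checks the formatting against a string stream instead of std::cout.
void print_nl_self_check() {
    std::ostringstream oss;
    print_nl_to(oss, "Result: ", std::string{"abc"}, ", ", 45);
    assert(oss.str() == "Result: abc, 45\n");
}
```

The fold expression expands to a chain of << calls, so any type that can be streamed to std::ostream can be passed.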

The last thing to explain is the is_expected concept. First, the parameter t of type T is introduced. Then, it is checked that T defines the nested types value_type and error_type. After that come three nested requirements.

template <typename T>
concept is_expected = requires(T t) {
    typename T::value_type;
    typename T::error_type;
    requires std::is_constructible_v<bool, T>;
    requires std::same_as<std::remove_cvref_t<decltype(*t)>, typename T::value_type>;
    requires std::constructible_from<T, std::unexpected<typename T::error_type>>;
};

What characterizes the three nested requirements is their leading requires keyword. Inserting requires forces the compiler to check the actual value of the expression: the requirement is fulfilled only if that value is true, as in the following one:

requires std::is_constructible_v<bool, T>;

However, the same expression without the requires keyword at the beginning only checks whether the expression compiles or not, without evaluating its logical value. Of course, the first approach is ‘stronger’.

std::is_constructible_v<bool, T>;

The std::is_constructible_v<bool, T> condition ensures that the type T can be explicitly converted to bool. However, to check this, why not simply write:

requires std::is_convertible_v<T, bool>;

or

requires std::convertible_to<T, bool>;

The thing is that the former is valid only if T is implicitly convertible to bool. On the other hand, the latter is valid only if T is implicitly and explicitly convertible to bool. However, std::expected defines only the explicit conversion to bool, that is:

constexpr explicit operator bool() const noexcept;

Therefore neither of the two above will work in our case. Hence, a workaround is to use std::is_constructible_v<bool, T>, which is valid if an object of the bool type can be constructed out of T. In our case, this means that the following initialization:

bool test_b{(bool)PayloadOrError()};

is possible. On the other hand, as explained earlier, if written without the requires keyword in front, both the std::is_convertible and std::convertible_to variants compile. In that case, however, only the syntax is verified, not the value.

The other two requirements are a bit simpler. We require that the type of the dereferenced *t, stripped of references and cv-qualifiers, is the same as T::value_type. Finally, it is verified that T can be constructed from a std::unexpected object. All of these are fulfilled if T is compatible with the std::expected type.

The whole concept can be easily verified using static_assert, like below:

static_assert(is_expected<PayloadOrError>); // a short-cut to verify the concept

The last thing is to observe and analyze the results of this code. After execution, we sometimes get the following output:

I'm in Payload_Proc_1, s = Start string  proc by 1,
I'm in Payload_Proc_2, s = Start string  proc by 1, proc by 2,
I'm in Payload_Proc_3, s = Start string  proc by 1, proc by 2, proc by 3,
Success! Result of the pipe: Start string  proc by 1, proc by 2, proc by 3,, 45

but sometimes it happens that we get the below error message:

I'm in Payload_Proc_1, s = Start string  proc by 1,
I'm in Payload_Proc_2, s = Start string  proc by 1, proc by 2,
Error: OpErrorType::kOverflow

or the other one:

I'm in Payload_Proc_1, s = Start string  proc by 1,
I'm in Payload_Proc_2, s = Start string  proc by 1, proc by 2,
Error: OpErrorType::kUnderflow

All this is as intended. The pipeline-building techniques shown above are a powerful programming tool that we can successfully use in many projects. They are also the basis for functional programming with the ranges library.

Let us observe that the presented pipeline framework is a kind of alternative to monadic processing in std::expected.

Finally, let us note that the ranges library also gives users a way to create their own adaptor closure objects: C++23 adds std::ranges::range_adaptor_closure (proposal P2387), so users can implement custom range adaptors that compose with operator | like the standard ones.


Summary  

What a ride! The article started with a simple notion of a pipe operator, and then we extended it with a generic calling code and std::expected. As you can see, thanks to std::expected, we can efficiently handle cases where something goes on the “else” path.

Back to you

  • Do you use the pipe operator for functional composition?
  • Do you compose ranges with the pipe operator, or do you prefer regular invocation?
