A monad is just a monoid in the category of endofunctors, what’s the problem?


DISCLAIMER: What lies ahead is a large load of noob C programming and likely contains code that can and will launch the missiles.


Perhaps it was a mistake to learn Rust, because now that I’ve experienced errors-as-values, it feels like I will never be able to live without it, and every collaborative codebase I will ever work on from now on will be try-catch and null riddled agony. Actually, it’s not even just at work. Even in my personal projects (which are rarely written in Rust nowadays), I sometimes find myself spending more time than I should implementing Result and a bunch of unnecessary composition methods to go along with it.

I don’t think I’m alone in this, right? I’m willing to bet there’s a lot of recovering Rust programmers who have had their fair share of trying to turn other languages into Rust.

On the other hand, I’ve been writing a lot of C recently. I think there’s something nice about using a language that doesn’t have any big ideas. It lets you focus more on actually getting things done instead of contemplating the spider-web of abstractions. Regardless, it doesn’t keep me away from the urge to make it just a little bit more Rusty.

And the traditional ways of handling and categorizing errors in C basically amount to returning NULL, false, or an enum status code, and/or using out-parameters in place of return values, which leaves quite a bit to be desired in my opinion. So I like to believe my urge is a bit justified.
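For reference, here is a minimal sketch of that traditional style: a success flag as the return value and the real result in an out-parameter. The name add_out is made up for this illustration.

```c
#include <limits.h>
#include <stdbool.h>

// Traditional C style: the return value is only a success flag,
// and the actual sum comes back through an out-parameter.
bool add_out(int a, int b, int *out) {
    if (b > 0 && a > INT_MAX - b) return false; // would overflow
    if (b < 0 && a < INT_MIN - b) return false; // would underflow
    *out = a + b;
    return true;
}
```

Nothing forces the caller to look at the flag before reading the out-parameter, and the flag alone doesn’t say which failure happened.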

The Beginning

Long story short, I want Result monads in C. The concept is pretty simple: it’s just a struct that holds two potential values. So let’s try that out on a sum function; for now, ignoring any error situations:

typedef struct {
    int value;
    char *error;
} Result;

Result add(int a, int b) {
    return (Result) { .value = a + b };
}

This works, but the obvious problem is that this struct is not reusable at all, unless everything we use it in also returns Result<int, char *>. So my next idea was to leave the struct anonymous and define it directly in the function declaration without any typedef:


struct { int value; char *error; } add(int a, int b) {
    // Hmmm
}

Believe it or not, this is a valid way to use a struct. But now the problem is that the struct isn’t named, so now we can’t even use it inside the function. But what if we just use a compound literal of a struct defined the same way? It’ll be ugly, for sure, but it makes sense, right?

struct { int value; char *error; } add(int a, int b) {
    return (struct { int value; char *error; }) { .value = a + b };
}

Yeah, it’s pretty messy, but we can fix it in post (macros), so let’s see–

<source>:6:12: error: returning 'struct (unnamed struct at <source>:7:9)' from a function with incompatible result type 'struct (unnamed struct at <source>:1:1)'

Oh, never mind. It turns out that two identically defined anonymous structs will never be compatible with each other, because the type checker doesn’t actually care whether they look identical, only whether they are (or aren’t) the exact same type. This usually isn’t an issue because structs are almost always given a name, and every use of that name refers to the same type. A cast doesn’t help here either, since C only allows casts between scalar types, not between structs. So at this point it seems pretty hopeless to create a reusable Result monad in C.
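For comparison, the usual workaround before C23 is to give every type combination its own name. This is only a sketch; DEFINE_RESULT, IntResult and add_named are names invented for this illustration.

```c
#include <stddef.h>

// Hypothetical helper: declares a named result struct for one (T, E) pair.
#define DEFINE_RESULT(Name, T, E) typedef struct { T value; E error; } Name

DEFINE_RESULT(IntResult, int, const char *);

IntResult add_named(int a, int b) {
    // The compound literal and the return type now refer to the same
    // named type, so the compatibility error goes away.
    return (IntResult) { .value = a + b };
}
```

It works, but it costs one extra declaration per combination of types, which is exactly the boilerplate I was hoping to avoid.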

Enter C23

C23 was officially released recently. It has a few really exciting features, like constexpr, #embed, auto, and typeof; check out this wonderful post from ThePhD about some of the coolest things that come with C23. The ones that stuck out to me the most were auto and typeof. Type inference with auto comes from C++, and typeof has been a GNU extension for a long time (its closest C++ relative is decltype). Both are, as expected, a little bit controversial amongst the simplicity purists.

Starting with typeof, it is usually used to declare variables based on the type of an expression. The expression itself is not evaluated; only its type is used. For example:

// int x = 1;
typeof(1) x = 1;

Interestingly, this also works in compound literals:

struct { int a; } x;
x = (typeof(x)) { .a = 1 };

And, it can even get the return type of a function:

char *hello() {
    return "Hello, Sailor!";
}

// char *x = hello();
typeof(hello()) x = hello();

This is exciting because it immediately solves the issue with two identical anonymous structs not being the same type. With typeof, we can get the actual anonymous struct instead of a different one that looks just like it. Now we can fix the code from before:

struct { int value; char *error; } add(int a, int b) {
    return (typeof(add(0, 0))) { .value = a + b };
}

This actually compiles! And if we were to use this function, the auto keyword is there for the convenience of inferring the type of the result struct:

auto result = add(1, 2);

if (result.error) {
    printf("Error: %s\n", result.error);
} else {
    printf("Result: %d\n", result.value);
}

Now we can finally slap some macros over it and call it a day.

The Macros

RESULT(T, E), OK(F, V), ERR(F, E)

To start cleaning up and making the types more reusable, we can use a macro to define the return type of a function:

#define RESULT(T, E) struct { union { T value; E error; }; bool is_ok; }

Notice the addition of the is_ok field. This is so we can tag the result and indicate which field is valid. Also, error and value are now in a union, so they share the same space in memory, because only one of them will be used at a time.
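One consequence is worth spelling out: because value and error now overlap, the earlier trick of testing result.error for NULL no longer works, and the tag is the only reliable way to tell which member is valid. A small sketch, where struct int_result is a hand-written equivalent of RESULT(int, const char *) and value_or is a made-up helper:

```c
#include <stdbool.h>

// Written-out equivalent of RESULT(int, const char *).
struct int_result {
    union { int value; const char *error; };
    bool is_ok;
};

// Hypothetical helper: reads only the member that the tag says is valid.
int value_or(struct int_result r, int fallback) {
    return r.is_ok ? r.value : fallback;
}
```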

Now, to clean up the return types, we can make macros for OK and ERR:

#define OK(F, V) (typeof(F)) { .value = V, .is_ok = true }
#define ERR(F, E) (typeof(F)) { .error = E, .is_ok = false }

And now, we can implement a checked_add function that uses these macros, this time with checks for overflow and underflow:

RESULT(int, char *) checked_add(int a, int b) {
    if ((b > 0) && (a > INT_MAX - b)) return ERR(checked_add(0, 0), "Overflow");
    if ((b < 0) && (a < INT_MIN - b)) return ERR(checked_add(0, 0), "Underflow");
    return OK(checked_add(0, 0), a + b);
}

That looks pretty nice in my opinion. The one minor downside is that a dummy function call needs to be passed to ERR and OK so they can acquire the return type of the function and cast the compound literal. It could be shortened by making a temporary macro, but I’m not sure how worthwhile it is. For example:

#define T checked_add(0, 0)
RESULT(int, char *) checked_add(int a, int b) {
    if ((b > 0) && (a > INT_MAX - b)) return ERR(T, "Overflow");
    if ((b < 0) && (a < INT_MIN - b)) return ERR(T, "Underflow");
    return OK(T, a + b);
}
#undef T

And at the call site, the result type can be inferred with auto, and errors are handled by checking the is_ok field of the result:

int main() {
    auto result = checked_add(1, 2);

    if (!result.is_ok) {
        fprintf(stderr, "Error: %s\n", result.error);
        return 1;
    }

    fprintf(stderr, "Result: %d\n", result.value);
}

It does seem a bit misleading to have T be defined as a function call, because it looks like it could cause extra side effects. But the operand of typeof is not evaluated; the compiler only works out its type, so the call never actually occurs.

UNWRAP(R)

Obviously, the preferable (lazy) way to do this is with unwrap, which extracts the good value from the result and panics if the result wasn’t ok. In this macro, I used a ternary so that the macro can be treated as an expression rather than a statement:

void panic(const char *msg, const char *file, int line) {
    fprintf(stderr, "%s:%d Error: %s\n", file, line, msg);
    exit(EXIT_FAILURE);
}
#define UNWRAP(R) ((R).is_ok ? (R).value : (panic((R).error, __FILE__, __LINE__), (R).value))

Originally, this is where the post ended, but Cunningham’s Law got the better of me. You probably already see what’s wrong with the code above, because it’s like mistake #1 when it comes to using ternaries: if R is a function call, UNWRAP(R) will always evaluate it twice, once for checking is_ok, and once again for whichever branch it takes to get the error/value out of it.
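The double evaluation is easy to observe with a function that counts its own calls. In this sketch, IntResult is a plain named struct standing in for RESULT(int, const char *), and next_id and UNWRAP_TWICE are names made up for the demonstration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { union { int value; const char *error; }; bool is_ok; } IntResult;

static int calls = 0;

// Hypothetical function that counts how often it runs.
IntResult next_id(void) {
    calls++;
    return (IntResult) { .value = calls, .is_ok = true };
}

void panic(const char *msg, const char *file, int line) {
    fprintf(stderr, "%s:%d Error: %s\n", file, line, msg);
    exit(EXIT_FAILURE);
}

// The ternary version: R is pasted in, and evaluated, more than once.
#define UNWRAP_TWICE(R) ((R).is_ok ? (R).value : (panic((R).error, __FILE__, __LINE__), (R).value))
```

After a single `int id = UNWRAP_TWICE(next_id());`, calls is 2 and id is 2: the first call only answered the is_ok check, and the value came from a second call.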

The solution to this is to use a temporary variable to store the result of the function, which can probably be done with the usual macro tricks, with some help from our new friend, auto:

#define UNWRAP(R) { \
    auto _r = (R); \
    _r.is_ok ? _r.value : (panic(_r.error, __FILE__, __LINE__), _r.value); \
}

Oh, but how do we retrieve the value from the ternary? The macro is a statement now, and a statement can’t produce a value. As far as I know, there isn’t a way to write a macro in standard C that is both an expression and has temporary variables. However, there is a GNU extension called Statement Expressions that can help with this; all we need to do is wrap the block in parentheses:

#define UNWRAP(R) ({ \
    auto _r = (R); \
    _r.is_ok ? _r.value : (panic(_r.error, __FILE__, __LINE__), _r.value); \
})

This lets us use the macro as an expression, and now it should be okay to use with functions that have side effects (Cunningham will be the judge of that). It is a bit of a bummer that it requires a GNU extension to use the unwrap macro, but I use clang on Windows anyway, and you can always skip using unwrap and check the results manually, closer to how you’d do it in Go.
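If giving up genericity is acceptable, one portable alternative is an ordinary function per result type: function arguments are evaluated exactly once, and a call is already an expression. This is only a sketch under that assumption; IntResult, unwrap_int, UNWRAP_INT and next_id are names invented here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { union { int value; const char *error; }; bool is_ok; } IntResult;

// Hypothetical per-type helper: r is evaluated once, at the call site.
static inline int unwrap_int(IntResult r, const char *file, int line) {
    if (!r.is_ok) {
        fprintf(stderr, "%s:%d Error: %s\n", file, line, r.error);
        exit(EXIT_FAILURE);
    }
    return r.value;
}

// The macro only exists to capture the caller's file and line.
#define UNWRAP_INT(R) unwrap_int((R), __FILE__, __LINE__)

static int calls = 0;

// Hypothetical function that counts how often it runs.
IntResult next_id(void) {
    calls++;
    return (IntResult) { .value = calls, .is_ok = true };
}
```

The price is one helper per result type, so it doesn’t compose with the anonymous RESULT(T, E) structs the way the statement-expression version does.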

TRY(T, R)

The original post also didn’t do this part because, at the time, I didn’t think it was possible to make this macro. In Rust, the ? (try) operator is used to return early from a function if the result contains an error, and evaluate to the ok value otherwise. Now that I’m aware of GNU statement expressions, this is actually not super difficult:

#define TRY(T, R) ({ \
    auto _r = (R); \
    if (!_r.is_ok) return ERR(T, _r.error); \
    _r.value; \
})

This macro is useful for chaining a bunch of functions that return results together. If anything in the chain returns an error, the error will propagate out with an implicit early return.
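To make the implicit early return visible, here is roughly what each TRY stands for when written out by hand. The sketch uses a plain named struct (IntResult) instead of RESULT so that it stays standard C, and checked_add3 is a made-up function.

```c
#include <limits.h>
#include <stdbool.h>

typedef struct { union { int value; const char *error; }; bool is_ok; } IntResult;

IntResult checked_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return (IntResult) { .error = "Overflow", .is_ok = false };
    if (b < 0 && a < INT_MIN - b) return (IntResult) { .error = "Underflow", .is_ok = false };
    return (IntResult) { .value = a + b, .is_ok = true };
}

// Adds three numbers; each step is a hand-written equivalent of one TRY.
IntResult checked_add3(int a, int b, int c) {
    IntResult ab = checked_add(a, b);
    if (!ab.is_ok) return (IntResult) { .error = ab.error, .is_ok = false };

    IntResult abc = checked_add(ab.value, c);
    if (!abc.is_ok) return (IntResult) { .error = abc.error, .is_ok = false };

    return (IntResult) { .value = abc.value, .is_ok = true };
}
```

Every fallible step costs a temporary, a check and a return, which is the repetition TRY folds into a single expression.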

Just to demonstrate, if we were to write a checked_multiply function that uses a series of checked_add calls, we could do it like this:

#define T checked_multiply(0, 0)
RESULT(int, char *) checked_multiply(int a, int b) {
    // Repeated addition: only meaningful for b >= 1 (b <= 0 just returns a).
    int c = a;
    for (int i = 1; i < b; i++) c = TRY(T, checked_add(c, a));
    return OK(T, c);
}
#undef T

And when we enter something that would cause an overflow, checked_multiply will return the same error that checked_add failed with:

int result = UNWRAP(checked_multiply(2, INT_MAX));

$ ./a.out
main.c:48 Error: Overflow

Conclusion

As much as I love the idea of using these macros in my own projects, it seems like more work than it’s worth to actually write libraries that use them. You only really get the benefit of something like this when it’s already used everywhere, like it is in the Rust or Go standard libraries, and if it isn’t, you’ll spend a lot of time writing unnecessary wrappers for existing functions.
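To give an idea of what such a wrapper looks like, here is a sketch around the standard strtol. LongResult and parse_long are names made up for this example, and the error strings are arbitrary.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct { union { long value; const char *error; }; bool is_ok; } LongResult;

// Hypothetical wrapper: turns strtol's errno/endptr conventions into a result.
LongResult parse_long(const char *text) {
    char *end = NULL;
    errno = 0;
    long parsed = strtol(text, &end, 10);

    if (end == text)     return (LongResult) { .error = "Not a number", .is_ok = false };
    if (*end != '\0')    return (LongResult) { .error = "Trailing characters", .is_ok = false };
    if (errno == ERANGE) return (LongResult) { .error = "Out of range", .is_ok = false };
    return (LongResult) { .value = parsed, .is_ok = true };
}
```

That is a dozen lines for one function, and every other libc call with its own error convention would need its own variation.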

Even that aside, regularly using this many macros for control flow is a little bit evil, and sounds like it could cause some debugging nightmares. I hope it goes without saying that this was all for fun and exploration, and you shouldn’t use these for anything serious (please don’t).

Full Code

Here’s a full version of the code from this post. The only notable change is the existence of UNWRAP_ERR, which is implemented the same way as UNWRAP, but has the inverse effect.

You can also see the code at https://github.com/ethanavatar/c23-results

#include <limits.h> // INT_MAX, INT_MIN
#include <stdio.h>
#include <stdlib.h>

// -------- Macros --------

#define RESULT(T, E) struct { union { T value; E error; }; bool is_ok; }
#define OK(F, V) (typeof(F)) { .is_ok = true, .value = V }
#define ERR(F, E) (typeof(F)) { .is_ok = false, .error = E }

void panic(const char *msg, const char *file, int line) {
    fprintf(stderr, "%s:%d Error: %s\n", file, line, msg);
    exit(EXIT_FAILURE);
}

#define UNWRAP(R) ({ \
    auto _r = (R); \
    _r.is_ok ? _r.value : (panic(_r.error, __FILE__, __LINE__), _r.value); \
})

#define UNWRAP_ERR(R) ({ \
    auto _r = (R); \
    (!_r.is_ok) ? _r.error : (panic("unwrap_err got an ok value", __FILE__, __LINE__), _r.error); \
})

#define TRY(T, R) ({ \
    auto _r = (R); \
    if (!_r.is_ok) return ERR(T, _r.error); \
    _r.value; \
})

// -------- Usage --------

#define T checked_add(0, 0)
RESULT(int, char *) checked_add(int a, int b) {
    if (a > 0 && b > INT_MAX - a) return ERR(T, "Int overflow");
    if (a < 0 && b < INT_MIN - a) return ERR(T, "Int underflow");
    return OK(T, a + b);
}
#undef T

#define T checked_multiply(0, 0)
RESULT(int, char *) checked_multiply(int a, int b) {
    int c = a;
    for (int i = 1; i < b; i++) c = TRY(T, checked_add(c, a));
    return OK(T, c);
}
#undef T

int main(void) {
    int a = 34, b = 35;
    int c = UNWRAP(checked_add(a, b));
    fprintf(stderr, "%d + %d = %d\n", a, b, c);

    a = 42, b = 10;
    c = UNWRAP(checked_multiply(a, b));
    fprintf(stderr, "%d * %d = %d\n", a, b, c);

    a = 42, b = INT_MAX;
    auto err = UNWRAP_ERR(checked_add(a, b));
    fprintf(stderr, "Error: %s\n", err);

    a = 42, b = INT_MAX;
    err = UNWRAP_ERR(checked_multiply(a, b));
    fprintf(stderr, "Error: %s\n", err);

    return 0;
}