Almost lambdas in C99 – faking local variable capture

Function objects, and therefore lambdas, are not possible in C. However, if we think about one of the main benefits of lambdas – local variable capture – we can write both functions that can “capture” local variables, and functions that use those functions. Say for example you need to read from a socket. You can write the same socket reading routine each time, and handle all the errors and signals each time, handle the buffer management each time, or you can write a “generic” function that hides all the details:

int socket_read(int sock, bool (*func)(char*, ssize_t, va_list), ...)
{
    ssize_t rc = 1024;
    char buffer[rc];
    va_list args;
    va_start(args, func);
    while(!func(buffer, rc, args))
    {
        va_end(args);
        va_start(args, func);
        errno = 0;
        rc = recv(sock, buffer, sizeof(buffer), MSG_PEEK);
        if(rc > 0)
        {
            rc = recv(sock, buffer, rc, 0);
        }
        else if(rc == 0)
        {
            break;
        }
        else if(rc == -1 && errno != EAGAIN && errno != EWOULDBLOCK)
        {
            break;
        }
    }
    va_end(args);
    return rc;
}

Note that this function also takes care of the va_args initialization and cleanup, making client code safer.

An example usage would be to have separate functions that reads, for example, HTTP headers, and one that reads HTTP payload:

bool read_header(char* buf, ssize_t len, va_list args)
{
    ssize_t* hblen = va_arg(args, ssize_t*);
    ssize_t* hlen = va_arg(args, ssize_t*);
    char** headers = va_arg(args, char**);
    if(*hlen + len > *hblen)
    {
        if(!(*headers = (char*) realloc(*headers, *hlen + len)))  // Replace naive memory management algorithm with more efficient one
        {
            return true;
        }
        *hblen = *hlen + len;
    }
    memcpy(*headers + *hlen, buf, len);
    *hlen += len;
    return strstr(*headers, "rnrn") != NULL;
}

bool read_payload(char* buf, ssize_t len, va_list args)
{
    ssize_t* bytes_left = va_arg(args, ssize_t*);
    char** data = va_arg(args, char**);
    memcpy(*data, buf, len);
    *data += len;
    *bytes_left -= len;
    return bytes_left;
}

Note that the read_header function has its own buffer management due to the variable nature of HTTP headers, but the read_payload function doesn’t need to because it would be used to read fixed size data which would be provided by outside code.

Note also how each of the functions can virtually communicate with itself through the pointers passed into it via the va_list, even though it has no direct control of the socket_read loop.

Note that you should also be mindful of the order of the varargs. The rule of thumb is the put the most important and/or independent variables first. eg, sizes should always come before the buffer they limit.

Now you can read from a socket much more elegantly:

int main()
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    // Set up socket
    // ...

    ssize_t hblen = 0;
    ssize_t hlen = 0;
    char* headers = NULL;  // For the purposes of this example, rely on realloc(NULL,size) behaviour. In real use, preallocate, and set hblen to size
    socket_read(sock, read_header, &hblen, &hlen, &headers);

    // Note that read_header may read more than just the header
    // so find rnrn, and copy the rest of the valid contents
    // into the following payload buffer
    // and adjust begin accordingly

    ssize_t clen /* = Find and interpret Content-Length header */;
    char payload[clen];
    char* begin = payload;
    socket_read(sock, read_payload, &clen, &begin);
    return 0;
}

Note how you’ve effectively locally captured variables from the calling function to be passed on to the delegate functions. Note that I also don’t clutter the calling function with memory management.

  • DISCLAIMER: I can vouch that the technique works, but since I rewrote code from scratch for this example and have not tested it due to time constraints, I cannot guarantee that the code works as is. It will need tweaking to get rid of bugs.
Advertisement

Responses to the Invalid value concept, and responses to those

Link 1
Link 2

You could argue that an end iterator is the archetypal example of an “invalid value”. And end iterator does not represent any valid part of a vector. It’s one-past-the-end, which serves as a marker. And not even that, in set and map, where the one-past-the-end end iterator doesn’t even point to the physical end of the container’s range.

Or what if you’re writing a test generation library?1 You need to test invalid values, and you need a way to signal “I want you to generate an invalid value of a class to test this function that uses it”. For example, a function may only take a certain integer range. Any value outside of that range is invalid, and you should test for it.

If you should never create an invalid value, you may as well say you should never test how invalid values are handled.

I did consider calling it a null value, but a null value can be a valid representation of a concept. For a contrived example, what if you’re writing a simple wrapper around BSD sockets? If a socket cannot be created, you get a -1. That is an invalid value. If the divine decree is that a constructor should never create an invalid value, then such a socket wrapper class can never be a legal construct.

You could argue that you should throw an exception if an invalid value is about to be created, but exceptions aren’t always the desired behaviour.


  1. Which is how I found a potential need for it. 

The Invalid Value constructor – a proposal

C++ has the concept of the default constructor. One is provided for you by the compiler under most common circumstances. What isn’t provided is the concept of a default constructor for an invalid value of the class. I propose the following convention for an invalid value constructor:

struct Value
{
    Value(std::nullptr_t) {std::cout << "Null constructor" << std::endl;}
    Value(void* v) {std::cout << "Null pointer constructor" << std::endl;}
};

int main()
{
    Value v{nullptr};
    return 0;
}

The output of this program is:

Null constructor

This works because nullptr on its own has the type of std::nullptr, and so overload resolution will choose the narrowest possible match, thus the constructor that takes the single argument of nullptr.

The Internals Pattern, or emulating Ada’s discriminated variant records

In C, unions are not typesafe and requires that the programmer be completely responsible with how the unions are used. Ada’s discriminated variant records are a typesafe way of doing unions and using the Internals pattern1 we can get the same functionality in C++. This is all the code that’s needed:

template<typename...>
struct Internals
{
    // Empty struct for the default case
};

Due to C++’s template instantiation rules, you can have different members of a class for any specialization of the template. In essence:

template<>
struct Internals<int>
{
    int int_data;  // Note that there is no requirement that the members are the same
};

template<>
struct Internals<std::string>
{
    std::string string_data;
};

template<>
struct Internals<int, int>
{
    std::tuple<int, int> int_tuple_data;
};

All of these structs are completely different types, even though they all come from the Internals templated type. They bear no relation to each other in the way inheritance would have done, and this is the main benefit. You can have compile-time polymorphism that is checked before you even run the code. This is now possible:

template<typename T, typename U, typename... Vs>
struct Scaffold
{
    Internals<Scaffold> internals;
    void print()
    {
        std::cout << "Default dummy data" << std::endl;
    }
};

template<>
struct Internals<Scaffold<int, float, char, double>>
{
    std::string data{"Specialized dummy data"};
};

template<>
void Scaffold<int, float, char, double>::print()
{
    std::cout << this->internals.data << std::endl;
}

int main()
{
    Scaffold<long, long>{}.print();
    Scaffold<int, float, char, double>{}.print();

    return 0;
}

The output of this program is:

Default dummy data
Specialized dummy data

This is a useful behaviour when you want something akin to aspects, or crosscutting concerns. Clients of your framework can specialize the internals without having to derive from any framework class, decoupling the framework scaffolding from the client.


  1. I have not found this pattern discussed anywhere and so do not know the correct name even if there is one 

DRAKON

DRAKON is a really nifty language and this tool is really easy to learn and use. It is a very expressive graphical notation, and much like C++, it doesn’t actually force you into a paradigm. The constraint rules of DRAKON make for really easy to read diagrams, but it also makes it quite enjoyable to use because you’re not focused on layout out things to avoid line collisions. They also force you to make your designs more consistent with good principles and each other.

A general algorithm for modern C++11 design

C++11’s main domain is systems programming. A system that does anything interesting is made up of components and its complexity emerges from the interaction of those components. Software engineering is about managing complexity, and C++11 helps you manage that complexity by offering you multiple programming styles1. As programmers, we choose which style best reduces the complexity of expressing the system’s constraints. One size doesn’t fit all.

This is my ad-hoc process whenever I’m in my design mode:

  1. Imagine a new programming language that does what you need. Don’t limit the syntax to C++11 – consider constructs from other languages or that don’t even exist.
  2. Imagine short sequences of actions written in that language that achieves the main goals of the system.
  3. Break down those actions into primitives – verbs and nouns that can be composed in any manner of sentences.
  4. Read up on C++11 and keep an eye out for language features that look like those primitives – eg, operator overloading, lambdas, return type deduction, variadics.
  5. Consider any similarities between primitives and work out if the differences between the similar primitives can be figured out during compile time by the compiler, or can only be figured out at runtime.
  6. If it can be figured out during compile time, consider using templates.
  7. If using templates, consider using preprocessor macros that improve readability and usability of using templates.

  1. Not paradigm. I view it as a stylistic choice. There is some aesthetic aspect to looking at code, but it’s a stretch to call syntactic sugar a paradigm. 

While I’m at it…

I guess my first piece of advice to coding C++11 is to always read the Standard Library documentation. Especially the Algorithms of the STL. Many complex algorithms, especially those that require loops, can be reduced to just a few lines of non-loop code. But an even more basic benefit is to never rely on remembering what anything is supposed to do. Just browsing the algorithms and rereading the specifications of those algorithms can often lead to ingenious ways to combine algorithms.