More on brevity and clarity

Continuing from the previous post, there was another side challenge: implement the POSIX utility wc. Someone claimed that C++ makes things unnecessarily hard, and the challenge was supposed to prove it. Well, it was simple, and I threw in a simple (incomplete) SLOC counter as well. The challenger couldn’t argue that C++ made wc hard to implement, and so resorted to nitpicking small things unrelated to the challenge at hand, mostly coding style preferences that have nothing to do with how easy the core functionality of wc is to implement.

I’m by no means the best C++ coder in terms of complexity or style. Judge for yourself whether or not this was impossible to do cleanly in C++:

#include <iostream>
#include <fstream>
#include <sstream>
#include <algorithm>
#include <string>

enum class char_opts
{
  BYTES, 
  CHARS, 
  NUM_OPTS
};

void count(std::istream& _in, unsigned& sloc_count, unsigned &line_count, unsigned &word_count, unsigned &char_count, unsigned &byte_count, unsigned &max_line_length, unsigned &find_count, const std::string &str)
{
  std::string line;
  std::getline(_in,  line);
  bool in_block_comment = false;
  for (unsigned lc = 0; _in; std::getline(_in, line), ++lc)
  {
    byte_count += line.length();
    char_count += line.length();
    if (!_in.eof())
    {
      ++line_count;
      ++byte_count;
      ++char_count;
    }
    max_line_length = std::max<unsigned>(max_line_length, line.length());
    if (!str.empty()) for (auto s = line.find(str); s != std::string::npos; s = line.find(str, s+1), ++find_count);

    std::istringstream line_str{line};
    std::skipws(line_str);
    std::string word;
    line_str >> word;
    for (; line_str; line_str >> word) ++word_count;

    auto trimmed = line;
    trimmed.erase(0, trimmed.find_first_not_of(" \t"));
    auto trailing = trimmed.find_last_not_of(" \t");
    if (trailing != std::string::npos) trimmed.erase(trailing + 1);  // keep the last non-blank character
    if (!trimmed.empty() && trimmed != "{" && trimmed != "}" && trimmed.find("//") != 0) ++sloc_count;
  }
}

int main(int _c, char** _v)
{
  char_opts copts = char_opts::NUM_OPTS;
  bool sloc = false;
  bool lines = false;
  bool words = false;
  bool line_length = false;
  std::string str;

  bool opts_supplied = false;

  auto args = _v + 1;
  const auto end = _v + _c;
  for (; args < end; ++args)
  {
    std::string arg{*args};
    if (arg == "-" || arg[0] != '-') break;

    if (arg == "-c" || arg == "-bytes") copts = char_opts::BYTES;
    else if (arg == "-m" || arg == "-chars") copts = char_opts::CHARS;
    else if (arg == "-L" || arg == "-max-line-length") line_length = true;
    else if (arg == "-sloc") sloc = true;
    else if (arg == "-l" || arg == "-lines") lines = true;
    else if (arg == "-w" || arg == "-words") words = true;
    else if (arg == "-o") str = *++args;
    else
    {
      std::cerr << "Invalid argument '" <<  arg << '\'' << std::endl;
      return -1;
    }

    opts_supplied = true;
  }

  if (!opts_supplied)
  {
    copts = char_opts::BYTES;
    lines = true;
    words = true;
    line_length = true;
  }

  unsigned file_count = 0;
  unsigned total_sloc_count = 0;
  unsigned total_line_count = 0;
  unsigned total_word_count = 0;
  unsigned total_char_count = 0;
  unsigned total_byte_count = 0;
  unsigned total_max_line_length = 0;
  unsigned total_find_count = 0;
  for (bool no_file = args == end; no_file || args < end; ++args, ++file_count, no_file = false)
  {
    std::string filename{no_file ? "" : *args};
    unsigned sloc_count = 0;
    unsigned line_count = 0;
    unsigned word_count = 0;
    unsigned char_count = 0;
    unsigned byte_count = 0;
    unsigned max_line_length = 0;
    unsigned find_count = 0;
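    const bool use_stdin = no_file || filename == "-";
    std::ifstream file;
    if (use_stdin) std::cin.clear(); else file.open(filename);
    // Both operands are std::istream lvalues, so count() can bind its reference parameter
    count(use_stdin ? std::cin : static_cast<std::istream&>(file), sloc_count, line_count, word_count, char_count, byte_count, max_line_length, find_count, str);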
    std::cout << (sloc ? std::to_string(sloc_count) + " " : "")
          << (lines ? std::to_string(line_count) + " " : "")
          << (words ? std::to_string(word_count) + " " : "")
          << (copts != char_opts::NUM_OPTS ? std::to_string(copts == char_opts::BYTES ? byte_count : char_count) + " " : "")
          << (line_length ? std::to_string(max_line_length) + " " : "")
          << (!str.empty() ? std::to_string(find_count) + " " : "")
          << filename <<  std::endl;

    total_sloc_count += sloc_count;
    total_line_count += line_count;
    total_word_count += word_count;
    total_char_count += char_count;
    total_byte_count += byte_count;
    total_find_count += find_count;
    total_max_line_length = std::max(total_max_line_length, max_line_length);
  }

  if (file_count > 1) std::cout << (sloc ? std::to_string(total_sloc_count) + " " : "")
        << (lines ? std::to_string(total_line_count) + " " : "")
        << (words ? std::to_string(total_word_count) + " " : "")
        << (copts != char_opts::NUM_OPTS ? std::to_string(copts == char_opts::BYTES ? total_byte_count : total_char_count) + " " : "")
        << (line_length ? std::to_string(total_max_line_length) + " " : "")
        << (!str.empty() ? std::to_string(total_find_count) + " " : "")
        << "total" <<  std::endl;

  return 0;
}

Interesting exercise in brevity and clarity

I recently had a discussion, and a challenge, comparing two languages: C++ and Python. I think modern C++ holds up really well against so-called scripting languages for quick and dirty utility programs. This is a reasonably short implementation of a prime number finder:

#include <cstdio>
inline bool prime(const auto _candidate, const auto *_first, const auto *_last) {
  for (auto p = _first; p != _last && *p * *p <= _candidate; ++p)
    if (_candidate % *p == 0)
      return false;
  return true;
}
int main(int _c, char** _v) {
  const unsigned num_primes = 10000;
  static unsigned primes[num_primes] = {2, 3};
  for (unsigned i = 2; i < num_primes; ++i)
    for (primes[i] = primes[i-1] + 2; !prime(primes[i], primes + 1, primes + i); primes[i] += 2);
  printf("The %uth prime is: %u.\n", num_primes, primes[num_primes - 1]);
  return 0;
}

Several things. Mostly I’ve learned to re-embrace the spirit of C/C++ for brevity, such as single-statement if and for blocks. But you really have to think about readability when you code in that style. The brief C style is only bad if it’s done without consideration for code aesthetics. The brief style shouldn’t be about reducing line count, but about increasing readability. It is a bit counter-intuitive coming from a university education that told you to put every if block in braces over multiple lines.

When coded in such a manner, modern C++ can approach the ease of writing that languages like Python enjoy.

Template meta-programming rule of thumb

“Use template meta-programming to express design, not to express computation.”

Various explanations of template meta-programming use examples like a compile-time Fibonacci sequence. What those tutorials should focus on instead is how to use template meta-programming to hide the incidental requirements of interfaces.

The ultimate goal of template meta-programming is to enable code like this:

int main()
{
    do_what_i_expect(/* args */);
    return 0;
}

Making templates easier with named-arguments

Say you have a template that takes a large number of arguments, i.e. more than three. Such an interface is neither clear nor concise, and it is not self-documenting. Anything with a large number of arguments suffers from the same problem. You can give the arguments defaults, but if you want to provide just one argument that comes after one or more defaulted ones, you have to provide those as well, and you have to know their defaults to keep the default behaviour for everything but the one you want to modify. My solution is to provide named template arguments like this:

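#include <limits>

// TestList<T, values...> is a compile-time list of values; its definition is not shown here.
template<typename NumType,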
         NumType TLow = std::numeric_limits<NumType>::lowest(),
         NumType TMin = std::numeric_limits<NumType>::min(),
         NumType TMax = std::numeric_limits<NumType>::max(),
         NumType TDef = 0,
         NumType TInv = -1,
         typename Specials = TestList<NumType>,
         typename Excludes = TestList<NumType> >
struct NumWrapper
{
    static constexpr NumType low = TLow;
    static constexpr NumType min = TMin;
    static constexpr NumType max = TMax;
    static constexpr NumType def = TDef;
    static constexpr NumType inv = TInv;
    typedef Specials inc_type;
    typedef Excludes exc_type;

    template<NumType Special>
    struct Min
    {
        typedef NumWrapper<NumType, low, Special, max, def, inv, Specials, Excludes> type;
    };

    template<NumType Special>
    struct Max
    {
        typedef NumWrapper<NumType, low, min, Special, def, inv, Specials, Excludes> type;
    };

    template<NumType Special>
    struct Low
    {
        typedef NumWrapper<NumType, Special, min, max, def, inv, Specials, Excludes> type;
    };

    template<NumType Special>
    struct Def
    {
        typedef NumWrapper<NumType, low, min, max, Special, inv, Specials, Excludes> type;
    };

    template<NumType Special>
    struct Inv
    {
        typedef NumWrapper<NumType, low, min, max, def, Special, Specials, Excludes> type;
    };

    template<NumType... List>
    struct Inc
    {
        typedef NumWrapper<NumType, low, min, max, def, inv, TestList<NumType, List...>, exc_type> type;
    };

    template<NumType... List>
    struct Exc
    {
        typedef NumWrapper<NumType, low, min, max, def, inv, inc_type, TestList<NumType, List...> > type;
    };

    NumType val = def;
    NumWrapper() = default;
    NumWrapper(NumType _v) : val(_v) {}

    operator NumType () const
    {
        return val;
    }
};

This class is something I needed in order to create data ranges quickly and easily when generating values for testing. I may want a minimum that differs from that of the underlying type while using std::numeric_limits for the other values, or I may want to provide extra values that have a special meaning within the context of its use.

The named-argument effect is achieved by declaring nested class templates in NumWrapper, each with an internal typedef that creates a new NumWrapper type from the enclosing template instantiation. Each internal typedef replaces only the template argument it “names” and takes the rest of the values from the enclosing instantiation. Because the main definition has default template arguments, and those arguments are carried along as you continue the typedef chain, the user does not have to provide values they don’t care about.

Take special note that, as the library developer, you will of course need to know the order of the template arguments. You just have to make it so that the user of your library does not have to know the order.

Declaring a new integer type becomes as simple as this:

typedef NumWrapper<short>::Max<9999>::type::Def<1>::type::Min<-10>::type::Inc<2,3,5,7,11,13,17,19>::type MyIntegralType;

It also makes it easy to figure out what the expected range and values of this integral type should be. You can even automate its specialization for std::numeric_limits:

namespace std
{
    template<typename NumType, NumType TLow, NumType TMin, NumType TMax, NumType TDef, NumType TInv, typename Specials, typename Excludes>
    struct numeric_limits<NumWrapper<NumType, TLow, TMin, TMax, TDef, TInv, Specials, Excludes> > : numeric_limits<NumType>
    {
        static constexpr bool is_specialized = true;
        static constexpr NumType min()
        {
            return NumWrapper<NumType, TLow, TMin, TMax, TDef, TInv, Specials, Excludes>::min;
        }

        static constexpr NumType max()
        {
            return NumWrapper<NumType, TLow, TMin, TMax, TDef, TInv, Specials, Excludes>::max;
        }

        static constexpr NumType lowest()
        {
            return NumWrapper<NumType, TLow, TMin, TMax, TDef, TInv, Specials, Excludes>::low;
        }
    };

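    // Caution: unlike numeric_limits, the standard does not allow programs to
    // specialize type traits such as is_arithmetic, so this part is non-portable.
    template<typename NumType, NumType TLow, NumType TMin, NumType TMax, NumType TDef, NumType TInv, typename Specials, typename Excludes>
    struct is_arithmetic<NumWrapper<NumType, TLow, TMin, TMax, TDef, TInv, Specials, Excludes>> : is_arithmetic<NumType> {};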
}

This way, the user will never have to specialize std::numeric_limits ever again.

One final note: you can use preprocessor macros to make this even easier to write:

#define NUMTYPE(type) NumWrapper<type>
#define TLOW(low) ::Low<low>::type
#define TMIN(min) ::Min<min>::type
#define TMAX(max) ::Max<max>::type
#define TDEF(def) ::Def<def>::type
#define TINV(inv) ::Inv<inv>::type
#define TINC(...) ::Inc<__VA_ARGS__>::type
#define TEXC(...) ::Exc<__VA_ARGS__>::type

typedef NUMTYPE(short)TMAX(9999)TDEF(1)TMIN(-10)TINC(2,3,5,7,11,13,17,19) MyIntegralType;

Almost lambdas in C99 – faking local variable capture

Function objects, and therefore lambdas, are not possible in C. However, if we think about one of the main benefits of lambdas – local variable capture – we can write both functions that can “capture” local variables, and functions that use those functions. Say for example you need to read from a socket. You can write the same socket reading routine each time, and handle all the errors and signals each time, handle the buffer management each time, or you can write a “generic” function that hides all the details:

#include <errno.h>
#include <stdarg.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

int socket_read(int sock, bool (*func)(char*, ssize_t, va_list), ...)
{
    char buffer[1024];
    ssize_t rc = 0;
    bool done = false;
    while(!done)
    {
        errno = 0;
        rc = recv(sock, buffer, sizeof(buffer), 0);
        if(rc > 0)
        {
            // Restart the argument list for every call, because func consumes it
            va_list args;
            va_start(args, func);
            done = func(buffer, rc, args);
            va_end(args);
        }
        else if(rc == 0)
        {
            break;  // Peer closed the connection
        }
        else if(errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
        {
            break;  // Unrecoverable error
        }
    }
    return (int) rc;
}

Note that this function also takes care of the va_list initialization and cleanup, making client code safer.

An example usage would be to have separate functions, for example one that reads HTTP headers and one that reads the HTTP payload:

bool read_header(char* buf, ssize_t len, va_list args)
{
    ssize_t* hblen = va_arg(args, ssize_t*);
    ssize_t* hlen = va_arg(args, ssize_t*);
    char** headers = va_arg(args, char**);
    if(*hlen + len + 1 > *hblen)  // +1 leaves room for the terminating NUL that strstr needs
    {
        char* grown = (char*) realloc(*headers, *hlen + len + 1);  // Replace naive memory management algorithm with more efficient one
        if(!grown)
        {
            return true;  // Out of memory: stop reading, *headers is left untouched
        }
        *headers = grown;
        *hblen = *hlen + len + 1;
    }
    memcpy(*headers + *hlen, buf, len);
    *hlen += len;
    (*headers)[*hlen] = '\0';
    return strstr(*headers, "\r\n\r\n") != NULL;
}

bool read_payload(char* buf, ssize_t len, va_list args)
{
    ssize_t* bytes_left = va_arg(args, ssize_t*);
    char** data = va_arg(args, char**);
    memcpy(*data, buf, len);
    *data += len;
    *bytes_left -= len;
    return *bytes_left <= 0;  // Done once the expected number of bytes has arrived
}

Note that the read_header function has its own buffer management due to the variable nature of HTTP headers, but the read_payload function doesn’t need to because it would be used to read fixed size data which would be provided by outside code.

Note also how each of the functions can virtually communicate with itself through the pointers passed into it via the va_list, even though it has no direct control of the socket_read loop.

Note that you should also be mindful of the order of the varargs. The rule of thumb is to put the most important and/or independent variables first, e.g. sizes should always come before the buffer they limit.

Now you can read from a socket much more elegantly:

int main()
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    // Set up socket
    // ...

    ssize_t hblen = 0;
    ssize_t hlen = 0;
    char* headers = NULL;  // For the purposes of this example, rely on realloc(NULL,size) behaviour. In real use, preallocate, and set hblen to size
    socket_read(sock, read_header, &hblen, &hlen, &headers);

    // Note that read_header may read more than just the header
    // so find \r\n\r\n, and copy the rest of the valid contents
    // into the following payload buffer
    // and adjust begin accordingly

    ssize_t clen /* = Find and interpret Content-Length header */;
    char payload[clen];
    char* begin = payload;
    socket_read(sock, read_payload, &clen, &begin);
    return 0;
}

Note how you’ve effectively locally captured variables from the calling function to be passed on to the delegate functions. Note that I also don’t clutter the calling function with memory management.

  • DISCLAIMER: I can vouch that the technique works, but since I rewrote code from scratch for this example and have not tested it due to time constraints, I cannot guarantee that the code works as is. It will need tweaking to get rid of bugs.