Combining container hashes with C++14 metaprogramming – Cure for insomnia #1729

2015/02/08 King-Yan Kwan software engineering Advice, C++11, C++14, Pattern

The most missed feature in standard C++ for thirteen years, more than lambdas or variadic templates, are the hash container equivalents of std::map and std::set. The “normal” map and set containers are implemented as binary trees, and any class of objects that one may want to store in a map or set must implement strict weak ordering – meaning that either the less-than operator must be defined for that class, or the container template is instantiated with a custom compare functor.

All of the fundamental types and most of the standard library types for which a strict weak ordering makes sense has the less-than operator defined. User defined types which have “key” members that are fundamental types can simply define their own strict weak ordering based on the lexicographical ordering of the tuple of those key members. This makes it somewhat easy to recursively apply strict weak order to user defined types composed of, or derived from, other user defined types.

Similarly, in modern C++, the hash containers require that the class of objects stored in them must have a hash operation defined. As with the normal map and set, the standard library also provides default hashing functions for fundamental types and standard library types for which a hash makes sense.

Unlike the normal map and set, it is less trivial to provide hash functions for user defined types. Naïve combinations of hashes resulting from hash functions may lead to more collisions than desired, for while the standard library may provide a hash function that is generally reasonable for the fundamental and standardized types, it doesn’t necessarily apply for user defined hashes.

What is needed is a straightforward way to hash multiple objects as one entity and without having to do tricky bit-twiddling ourselves. Fortunately, the standard library, and the updated C++ language, provides all that is needed. There are three elements that are required to make this work:
1. It must look like std::hash
2. The standard library provides std::bitset for arbitrarily sized (nb: but computable at compile time) strings of bits, and it also provides a std::hash specialization for std::bitsets
3. It should not take more than one or two logical lines of code for clients to specialize

These three elements form what I call the “vice approach” to design. It is both top-down – by specifying what the design looks like; and bottom-up – by specifying the components of the implementation. In the vice is the programmer’s head, and the process of resolving the top design and the bottom design is tightening the vice around the programmer’s head until it either explodes1 or the carbon fuses into diamond.

So, to wit, the top surface of the design, which otherwise could be viewed as “how would a programmer write” should look like this:

struct Department {std::bitset<7> dep_id;};
struct Employee {short id; std::u16string name; Department dep;};
// Such that we can write something resembling (not C++ syntax):
hash(Department d) → hash(d.dep_id);
//and
hash(Employee e) → hash(e.id, e.name, e.dep);

The bottom surface of the design should result in this:

std::bitset</* Size of combined hashes */> result = /* Bit shifting */;
return std::hash</* type of result */>{}(result);

We start tightening the vice by figuring out how to calculate the size of the final bitset. Without this, we’d have to resort to using vector and the runtime calculation of the size and degraded performance when used with hash containers.

We somehow have to calculate the total number of bits, which I’ll call area2, of the hashes of each key member of an arbitrary structure like Department or Employee in compile-time. This means the use of template metaprogramming; namely, using variadic templates and recursive list processing.

// Definition
template<typename Arg, typename... Args>
struct area_of
{
  static constexpr area_of_t<Arg> value =
    area_of<Arg>::value + area_of<Args...>::value;
};
// Partial specialization
template<typename Arg>
struct area_of<Arg>
{
  static constexpr area_of_t<Arg> value =
    sizeof(area_of_t<Arg>) * CHAR_BIT;
};

For those unfamiliar with template metaprogramming, what this code does is similar to what LISP or Prolog does. Say you have a list {1, 2, 3, 4, 5, 6, 7 …}. To go through each of the list of the element, you extract the first element – the head – and pass the tail list to the next operation, which is often the recursively to the same function. A programmer would write something like this:

area_of<bitset<7>> /* and */ area_of<short, u16string, Department>

So when we pass a list of types to area_of, it extracts the head Arg, and the tail is kept in the parameter pack Args. It calculates the area of the head by calling area_of::value, and then combines it to the area of the tail. Since the tail is itself still a list, the call to area_of::valuewill recursively calculate the area of its head and its tail etc etc. Args… is called template parameter unpacking which is almost like a macro that places the rest tail into the code. We have a partial specialization of area_of which takes one argument, and this is the exit condition that is required of any recursive process, whether it’s a runtime operation or a compile time operation. In that exit condition, it calculates the area of each individual part of the hash by multiplying its size in bytes by the number of bits per byte. CHAR_BIT is a standard library defined macro and should be used instead of assuming all compilers use octets for bytes.

The member value of area_of is declared as static constexpr. This is pretty much necessary of all template metaprogramming. constexpr tells the compiler to evaluate the expression at compile time if possible, and therefore usable as a template argument for something else.
The definition of area_of_t will be explained later, but it would suffice to say that it is the type of the integer that is returned by a hash function. At this point, we only have to consider that we only need to know the sizeof that integer type. It’s often defined as std::size_t but why assume when the compiler can work it out for you.

With the definitions of area_of, we now can figure out the bit size of any combination of hashes:

std::bitset<area_of<bitset<7>>> result;
std::bitset<area_of<short, u16string, Department>> result;

Now we get on to the task of tightening the other side of the vice. We’ll call it multihash and we define it thus:

template<typename...> struct multihash;
template<typename T> struct multihash<T> : std::hash<T>{};

We declare a struct template multihash with variadic template arguments. We don’t give it a definition, as there is no sensible default that covers all bases. We can, however, partially specialize it for one template argument and by default, we will make it equivalent to std::hash. This makes it easier later on where we want to calculate the hash of any single value and we don’t want to have to call the correct hash depending on whether it is a fundamental or library provided type versus a user defined type.

To put it another way, we would be able to call multihash or multihash and let the compiler figure out whether it calls the std::hash or the user defined version. Now that we have the easy stuff defined, we can now define a version of multihash that can do the work of hashing an arbitrary number of arguments.

template<typename Arg, typename... Args>
struct multihash<Arg, Args...>
{
  auto operator()(const Arg& arg, const Args&... args) const noexcept
  {
    std::bitset<area_of<Arg, Args...>::value> concat{
      multihash<Arg>{}(arg)
    };
    concat <<= (concat.size() - area_of<Arg>::value);
    concat |= multihash<Args...>{}(args...);
    return std::hash<decltype(concat)>{}(concat);
  }
};

multihash follows in the footsteps of std::hash in that is is a functor – an object that can be used like a function (has the operator() defined). One, is that one of its specializations inherits from std::hash, but more importantly, it needs to be a functor to be used in the hash containers. The operator() is declared as const noexcept to allow for optimizations, and uses return type deduction so that, again, the compiler can work out the integer type it needs to return rather than having us figure it out and maybe getting it wrong.

Similarly to area_of, this uses the list/head/tail recursive structure to move through the variadic template arguments one by one. Note that const Args&… is not the same kind of variadic functions as the printf clan. This is completely safe as, unlike printf, these parameter packs preserves both the number of arguments and the type of their arguments. It is as though the compiler generates a completely new function for every permutation of arguments that occurs in a program’s source code, but without the nastiness of variadic functions or C preprocessor macros.

The actual function itself basically calculates the hash of each individual argument, and concatenates them into the bitset concat, the area of which we have already calculated during compilation, and return the hash of that uber-bitset as the combined hash. Then we have completed the circle – tightened the vice.

What’s left is actually explaining where area_of_t comes from. It is defined like this:

template<typename T>
using area_of_t = decltype(multihash<T>{}(std::declval<T>()));

C++ has now introduced an alternative to typedef called aliases. We define the integer type of the area_of calculation to be the type returned when we use multihash an any single type. The use of declval and decltype allows us to get the compiler to evaluate the expression to figure out its type without actually having to run the program. Thus, there is already a definition for all of the fundamental types and library defined types, and also for any user defined types that have specialized multihash to provide their own hash definitions. So now we actually get to see how a programmer would use this library:

template
struct multihash
{
  auto operator()(const Department& dep) const noexcept
  {
    return multihash{}(dep.dep_id);
  }
};

template
struct multihash
{
  auto operator()(const Employee& emp) const noexcept
  {
    return multihash{}
             (emp.id, emp.name, emp.dep);
  }
};

This is not really that far off from our original goal of being able to define these in just one line. These are explicit instantions of multihash for the types Department and Employee. The multihash of Department must be defined before Employee’s because the multihash of Employee needs to “see” the definition for Department. Otherwise it would use the default, which is std::hash, for which we have not instantiated.

Note the judicious use of decltype in order to supply the template arguments. This just makes it much more easier if we, say, define Employee::name to be std::wstring instead and so we don’t have to change anything as we’ve told the compiler to figure it out. Finally, for completeness, we show the normal usage for this:

multihash emp_hash;
multihash dep_hash;

Department dep1{0b0'00'00'01};
Department dep2{0b0'00'00'10};
Department dep3{0b0'00'00'11};

Employee emp1{1, u"one", dep1};
Employee emp2{2, u"two", dep2};
Employee emp3{3, u"three", dep3};

std::cout << dep_hash(dep1) << std::endl;
std::cout << dep_hash(dep2) << std::endl;
std::cout << dep_hash(dep3) << std::endl;

std::cout << emp_hash(emp1) << std::endl;
std::cout << emp_hash(emp2) << std::endl;
std::cout << emp_hash(emp3) << std::endl;

std::unordered_set<Employee, multihash> empset{
  emp1, emp2, emp3};

Just because we can, we demonstrate the other new C++ features, such as binary literals (0b11010101), number literal separators (31’4159’26’535897’9), unicode string literals (u”I’m a string”), and braced initialization ({1, u”two”, dep3} or {emp3, emp2, emp1}).

Cyclic Tower of Hanoi – analysis and implementation

2014/12/26 King-Yan Kwan software engineering C++, C++14

The Tower of Hanoi problem consists of moving a size-ordered stack of n discs from one tower to another tower, out of three towers {A, B, C}, one disc at a time without putting a larger disc on top of a smaller one. The cyclic version of this problem adds the further constraint that a disc can only move through the towers in cycles, eg B -> C -> A.

Regardless of the details of the different versions, the problem has one useful invariant: the nth disc can only ever move to an empty tower because there are no discs larger than it. So wherever the nth disc moves, the top n-1 discs can move to any tower regardless of where the nth disc is without breaking the size-order rule. So this implies a recursive process in which we get the nth disc to it’s final position, but also apply that process to the top n-1 discs BEFORE applying it to the nth disc. But due to the cyclic constraint, we’ll need two versions of each process – one that moves m discs to an adjacent tower, and one that moves m discs to the post-adjacent tower. For moving one disc, there’s no difference than with applying the same function twice, but for n > 1 discs, moving to the post-adjacent tower can be done more efficiently than moving all the discs to the adjacent tower twice.

So first, we’ll represent our three towers using:

constexpr int A = 0, B = 1, C = 2, TOWERS = 3;

You could use enum classes, but ints are easier for now. Since we are doing this in cyclic order, we may as well define a function to get the next tower:

constexpr auto next_tower(auto _t)
{
  return (_t + 1) % TOWERS;
}

constexpr ensures compilation time evaluation to speed things up a bit but also is a requirement if we want to use template metaprogramming. The modulo wraps the tower addition around back to A(0). Next is to represent our towers and discs. The discs are represented by the natural numbers from 1 to n, thereby also representing their relative sizes and thus ordering constraint. An empty position is represented with a 0-disc. While we’re at it, we’ll also print out the towers for demonstration purposes.

int tower[3][12] = {{1,2,3,4,5,6,7,8,9,10,11,12}, {0}, {0}};
constexpr int DISCS = std::extent<decltype(tower), 1>::value;

const auto disc_width = std::to_string(DISCS).length();
std::string padding(disc_width + 1, '-');

/*
The towers are printed horizontally to avoid having to align columns.
--------------------------------2--7-11-14-15-16|A
--------------------------------1--3--5--8-10-13|B
--------------------------------------4--6--9-12|C
*/
void print_tower()
{
  for(auto t = 0; t < TOWERS; ++t)
  {
    for(auto d = 0; d < DISCS; ++d)
    {
      if(tower[t][d]) std::cout << "-" << std::setw(disc_width) << std::setfill('-') << tower[t][d];
      else std::cout << padding;
    }
    std::cout << "|" << static_cast<char>('A' + t);
    std::cout << std::endl;
  }
  std::cout << std::endl;
}

For verification purposes, you can use the algorithm std::is_sorted(…) on each tower to assert that the towers are do not violate size-ordering. For the disc transfer functions, I use template metaprogramming to illustrate compile-time generation of the move sequence. We must define the template definition before we define the specializations. The definition of the general function is to move n discs from one tower to another tower (doesn’t matter which). We’ll also include a conditional parameter for choosing moving to either an adjacent or post-adjacent tower.

auto rfind_zero(auto _first, auto _last)
{
  typedef std::reverse_iterator<decltype(_first)> riter;
  return std::find(riter(_last), riter(_first), 0).base();
}

template<int Discs, int Src, int Dst, bool direct = next_tower(Src) == Dst>
struct transfer;

template<int Src, int Dst>
struct transfer<1, Src, Dst, true>
{
  static inline void disc()
  {
    auto top_src = rfind_zero(tower[Src], tower[Src] + DISCS);
    auto top_dst = rfind_zero(tower[Dst], tower[Dst] + DISCS) - 1;
    std::iter_swap(top_src, top_dst);
    print_tower();
  }
};

The templated structure definition does not actually define a complete structure. This is okay as we should expect that we will cover all cases. Leaving it undefined like this means that if we made a mistake in our implementation, the compiler will tell us with a compiler error saying that there is no definition. The direct boolean is a default parameter calculated from the source and destination tower arguments. This means that in the specialization, the compiler will calculate it for us and we would not need to specify whether it is direct or not. This shows how the power of the strong static typing in C++ allows us to check algorithm validity without having to run it.

To move a disc from one tower to another, we find the top disc of the source tower, remove it, and place it on top of the destination tower. The top disc of the first tower is the first non-zero element in our source tower array. The top of the destination tower which we place our disc is the element before the first non-zero element in the destination tower array. I use a reverse linear search (std::find with reverse iterators) because the array is small and the first non-zero disc will regularly be closer to the bottom than the top (beginning of the array), but using binary search (std::upper_bound) will work equally well. I use std::iter_swap to do the actual transfer of the disc from one tower to the next since, by definition, it will always swap with a 0-disc, so it will look like we actually moved something.

Next we’ll need to specialize for moving one disc to a post-adjacent tower – the indirect specialization:

template<int Src, int Dst>
struct transfer<1, Src, Dst, false>
{
  static inline void disc()
  {
    transfer<1, Src, next_tower(Src)>::disc();
    transfer<1, next_tower(Src), Dst>::disc();
  }
};

This is the only case where moving to a post-adjacent tower is equal to moving to the adjacent tower twice. We can cheat a bit here where we can simulate two transfers by calling the direct specialization with the Src and Dst towers, as long as any metadata function (such as move counts, or move listing) is also simulated in the output.

These two specializations are the base cases and as you can see they will never fall down further to the undefined structure definition above. There is also another opportunity for optimization by keeping track of the top disc for each tower so that we can avoid the search. Then we’ll need to generalize these two functions for cases with more than one disc.

template<int Discs, int Src, int Dst>
struct transfer<Discs, Src, Dst, true>
{
  static inline void disc()
  {
    transfer<Discs-1, Src, next_tower(Dst)>::disc();
    transfer<1, Src, Dst>::disc();
    transfer<Discs-1, next_tower(Dst), Dst>::disc();
  }
};

To move an n-tower of discs to an adjacent tower, if we remember our invariant, we first have to make room so that we can move the nth disc into the final position. So first, we’ll need to move n-1 discs from the source tower to the tower after next. Then we move the nth disc to the adjacent tower. Now the destination tower is the post-adjacent tower to where our n-1 discs are temporarily stacked, so we’ll do another indirect transfer as before. Now you may wonder why we can call the indirect transfer for n > 1 discs before we specialize it. The answer is that we’ve only defined the templates, but we haven’t used them yet – we haven’t yet referred to a class where we’ve nailed down all of the template parameters, so they still only exist “in the ether”. But we have to specialize the remaining function before we make the actual call:

template<int Discs, int Src, int Dst>
struct transfer<Discs, Src, Dst, false>
{
  static inline void disc()
  {
    transfer<Discs-1, Src, Dst>::disc();
    transfer<1, Src, next_tower(Src)>::disc();
    transfer<Discs-1, Dst, Src>::disc();
    transfer<1, next_tower(Src), Dst>::disc();
    transfer<Discs-1, Src, Dst>::disc();
  }
};

To move an n-tower of discs to the post-adjacent tower, we must do a little bit more. In essence, we’ll have to make space twice to allow the nth disc to move to its final position. So we make space the first time by transferring the top n-1 discs to the post-adjacent tower. Then the nth disc moves to the adjacent tower. Then we move n-1 discs to its adjacent tower. The nth disc is moved again to its adjacent tower and has reached its destination tower. So all that’s left is the move the n-1 discs to the destination tower, which is an indirect transfer. Finally, we define our main function:

int main()
{
  print_tower();
  transfer<DISCS, A, B>()();

  return 0;
}

You can add a counter and check that the number of single disc transfers equals the numbers given by the following table:

http://www.wolframalpha.com/input/?i=%28%281+%2B+sqrt%283%29%29%5E%28n%2B1%29+-+%281+-+sqrt%283%29%29%5E%28n%2B1%29%29%2F%282+*+sqrt%283%29%29+-+1%2C+n+%3D+1%2C+16

Easy adapter pattern with object map

2014/11/232014/11/23 King-Yan Kwan software engineering C++11, C++14, Pattern

Say you want to have an a class that allows dynamic adaptation. You can’t create it as a template class to provide “policies” in policy based design because different policies will create distinct classes. This will prevent different adapter classes from attaching to the same object.

You also wouldn’t want to require adapter classes to have to inherit from some interface and introduce an unneeded dependency through virtual inheritance.

Ideally, you’d want to have the flexibility of templates but in a dynamic context. This is the kind of container you’d need to achieve this:

#include <unordered_map>
#include <typeindex>
#include <memory>
.
.
.
std::unordered_map<std::type_index, std::shared_ptr<void>> object_map;

type_index is a C++11 feature that allows types to have hash and weak ordering semantics. To get rid of the boilerplate code to use this container, you’d have interface functions like these:

template<typename T, typename... Args>
void add_object(Args... args)
{
  object_map[std::type_index{typeid(T)}] = std::static_pointer_cast<void>(std::make_shared<T>(args...));
}

template<typename T>
T& get_object()
{
  return *static_cast<T*>(object_map[std::type_index{typeid(T)}].get());
}

shared_ptr is used because it is the only smart pointer that supports a void type as well as having nice helper functions to cast to shared_ptrs of other types. It also uses variadic templates to allow construction via make_shared, which is the exception safe way to create a shared_ptr. What’s also nice about it is that the casting to a void shared_ptr does not throw away custom deleters, as proven by this code:

struct A
{
  static int gen;

  A()
  {
    ++gen;
  }

  void operator()()
  {
    std::clog <<  "Run A: " << gen <<  std::endl;
  }

  ~A()
  {
    std::cerr <<  "Destructing A" <<  std::endl;
  }
};

int A::gen = 0;

int main()
{
  auto aptr = std::shared_ptr<A>(new A(), [](A* _p)
    {
      std::cerr << "Custom delete A" <<  std::endl;
      delete _p;
    });
  auto voidptr = std::static_pointer_cast(aptr);
  aptr.reset();
  (*std::static_pointer_cast<A>(voidptr).get())();
  std::clog << "Number of owners: " <<  voidptr.use_count() << " " << aptr.use_count() << std::endl;

  return 0;
}

A has destructor to signal that it is in fact being deleted. aptr has a custom deleter. After being cast to a shared_ptr<void>, aptr is reset to show that it isn't the custom deleter in aptr that is destroying the A object but the one that is casted into voidptr.

The output of that test is:

Run A: 1
Number of owners: 1 0
Custom delete A
Destructing A

So back to the object map. This simple test demonstrates it working:

std::clog << "Object map test" << std::endl;
add_object<A>();
add_object<int>(42);
add_object<double>(2.71828);
add_object<std::string>("Hello world!");

std::cout << get_object<std::string>() << " " << get_object<int>() << " " << get_object<double>() << std::endl;
get_object<A>()();

The output of this test is:

Object map test
Hello world! 42 2.71828
Run A: 2
Destructing A

Radix sort – another language comparison exercise

2014/10/21 King-Yan Kwan software engineering C++, C++14

I was reading the Wikipedia page for Radix sort[1] and I noticed a pretty terrible example for C++, so I set about writing my own, which I’ve included in the wiki as a modernized C++14 version.

#include <iostream>
#include <algorithm>
#include <array>
#include <vector>
#include <random>

int main()
{
  using INT = std::mt19937_64::result_type;
  std::array<INT, 200000> randoms;
  std::generate(randoms.begin(), randoms.end(), [engine = std::mt19937_64{}]() mutable {return engine();});
  std::array<std::vector<INT>, 256> buckets;

  for (auto r : randoms) std::cout << r << std::endl;
  std::cout << std::endl;

  for (int i = 0; i < sizeof(INT); ++i)
  {
    for (auto r : randoms) buckets[(r >> (i << 3)) & 0xFF].push_back(r);

    auto j = randoms.begin();
    for (auto& bucket : buckets)
    {
      for (auto b : bucket) *(j++) = b;
      bucket.clear();
    }
  }

  for (auto r : randoms) std::cout << r << std::endl;

  return 0;
}

1: http://en.wikipedia.org/wiki/Radix_sort

Interesting exercise in brevity and clarity

2014/10/062014/10/06 King-Yan Kwan software engineering Advice, C++, C++14

Recently had a discussion and challenge in comparing two languages, C++ and Python. I think modern C++ is holds up really well to so-called scripting languages to do quick and dirty utility programs. This is a reasonably short implementation of a prime number finder:

#include <cstdio>
inline bool prime(const auto _candidate, const auto *_first, const auto *_last) {
  for (auto p = _first; p != _last && *p * *p <= _candidate; ++p)
    if (_candidate % *p == 0)
      return false;
  return true;
}
int main(int _c, char** _v) {
  const unsigned num_primes = 10000;
  static unsigned primes[num_primes] = {2, 3};
  for (unsigned i = 2; i < num_primes; ++i)
    for (primes[i] = primes[i-1] + 2; !prime(primes[i], primes + 1, primes + i); primes[i] += 2);
  printf("The %uth prime is: %u.n", num_primes, primes[num_primes - 1]);
  return 0;
}

Several things. Mostly I’ve learned to re-embrace the spirit of C/C++ for brevity, such as single statement if and for blocks. But you really have to think about readability when you code in that style. The brief C style is only bad if it’s done without consideration about code aesthetics. The brief style shouldn’t be about reducing line count, but about increasing readability. It is a bit counter-intuitive coming from a university education that told you to put every if block in braces over multiple lines.

When coded in such a manner, modern C++ can approach the ease of writing that languages like Python enjoy.

World v1.0 beta

Re-engineering the world a line at a time

C++14