When people say a program is fast, they usually mean it takes a short time to execute; but when computer scientists say an algorithm is efficient, they mean the growth rate of the run-time with respect to the input size is slow. It is often the case that these two notions approximately mean the same thing, that a more efficient algorithm yields a faster program, even though it is not always true. I recently learned about two types of hash tables that show this difference: Cuckoo hashing and Robin Hood hashing. Even though in theory Cuckoo is more efficient, Robin Hood is in practice a lot faster.
Vanilla Hash Tables
Before going into details, let’s revisit how a hash table works. Simple thing, you have a hash function that takes a key and yields a random (but deterministic) number as the “hash”, and you use this hash to figure out where to put this item in the table. If you want to take something out, just compute the hash again and you can find exactly where the item is stored.
That’s not the whole story though. What if you want to insert item A into spot 1, but item B is already there? You can’t just throw away item B, what are you, a savage? There are multiple ways to deal with this problem called collisions. The easiest way is chaining, which is to make each slot into a list of items, and inserting is just appending to the corresponding list. Sure, that would work, but now your code is a lot slower. This is because each operation takes a look up to get the list, and another look up to get the contents of the list. This is really bad because the table and the lists will not align nicely in cache, and cache misses are BAD BAD things.
To improve cache performance, people started to throw out bad ideas. One bad idea that kind of worked was, if this slot is occupied, just take the next one! If the next one is taken, just take the next next one! This is called linear probing, and is pretty commonly used. Now we solved the caching problem because we’re only accessing nearby memory slots, our programs now run a lot faster. Why is this bad? Well, it turns out as the table gets filled up, the probe count can get really high. Probe count is how many spots need to be checked before an an item is found, or how far away the item is from where it should be. Imagine out of 5 spots, the first 3 are occupied. Then, the next insertion has a 60% chance of collision. If it does collide, the probe count is on average 2 (probe count = 3 if the hashed index is the first slot, 2 if second, and 1 if third). And after a collision, the size of the occupied segment increases by one, making the next insertion even worse.
Robin Hood to the Rescue
This brings us to Robin Hood hashing. It is actually very similar to linear probing, but with one simple trick. Say in the example above, we have item A sitting at slot 1, B at 2, C at 3 like this: [A B C _ _] and we have another collision trying to insert D at slot 1. Instead of placing D like [A B C D _], we arrange them as [A D B C _], then the probe count would be [0 1 1 1 _] instead of [0 0 0 3 _]. Now there are two things to understand: (1) What exactly happened? (2) Why is this better?
To answer (1), imagine we’re inserting D to slot 1. We see that A is there, then just like linear probing, we move on. Now B is there with probe count 0, while D already has probe count 1. That’s not fair! How dare you have a lower probe count than I do! So D kicks out B, and now B has to find a place to live. It looks at C and yells and kick C out, now C has to find a place, which happens to be slot 4. In other words, during probing, if the current item has a lower probe count, it is swapped out and probing continues.
But (2) why is this better? If you think about it, the total probe count (which is 3) is unchanged, so the average probe count is the same as just linear probing. It should have exactly the same performance! This approach is superior because the highest probe count is much lower. In this case, the highest probe count is reduced from 3 to 1. If we can keep the max probe count small, then all probing can be done with almost no cache misses. This makes Robin Hood hash tables really fast. Having a low maximum probe count also solves a really painful problem in linear probing, which is when you remove an item in the hash table and leave a gap. When we look for items, we can’t just declare missing when we first probe an empty spot, because maybe our item is stored further ahead, and this spot was just removed later on. With the maximum probe count really low in Robin Hood hashing, we can then say “I’m just going to look at these n spots, if you can’t find it it ain’t there.” Which is nice and simple. (It also enables shift back deletion instead of tombstone, which is another boost to the performance, but I don’t want to get too deep.)
A Theoretically Better Solution
Even though the highest probe count is small, it still grows as we have more items in the hash table (probe count goes up when load factor goes up). What if I tell you I don’t want it to grow? Meet Cuckoo hashing, which has a maximum probe count of 1.
The algorithm is slightly more complicated. Instead of one table, we have two tables, and one hash function each, chosen randomly among the space of all hash functions (not practical, but close enough). When we look up an item, we just need to compute the hash for the first and the second table, and look at those two spots. Done!
I mean, sure, but that’s easy – how do you insert? What if both spots are full? That’s where the name cuckoo comes from. You pick one of the spots, kick that guy out and put your item there. Now with the new guy, you find its other available spot, kick that guy out, so on and so forth. If you end up in a loop (or spend too long and decide to give up), you pick another two hash functions, and rebuild the hash table. As far as at least half of the slots are empty, it is highly likely that the whole operation takes constant time in expectation. I don’t have any simple intuitions about why this is true. If you do, please let me know.
Theoretically, Cuckoo hashing beats Robin Hood because the worst case look up time is constant, unlike in Robin Hood. Why is it still slower than Robin Hood in practice? As it turns out, the constant is both a blessing and a curse. When looking up missing items, there has to be two look ups; even when looking up existing items, on average there still has to be 1.5 look ups. To make matters worse, since the two spots are spatially uncorrelated, they usually cause two cache misses. Once again, cache performance makes a big difference in the overall speed.
Having said all that, performance in real life is a very complicated matter. Cache performance is just one of the unpredictable components. There are also various compiler optimizations, chip architectures, threading/synchronization issues or language designs that can affect performance, even given then same algorithm. Writing a fast program is all about profiling, fine tuning, and finding the balance*.
*My coworker once said, “at the end of almost any meeting, you can say ‘it’s all about the balance’, and everyone would agree with you.