Building an AVL Tree From a Sorted Sequence in One Pass

Recently I came across the function Map.of_increasing_sequence in the base library of OCaml. It might sound like a very simple and common function, but the implementation is actually quite cool. Let’s dive in. (Spoiler: it’s related to a weird number system.)

First Impressions

A sequence is like an iterator – you can either get the next value or reach the end. A map is an immutable AVL tree of key value pairs. An AVL tree is a binary tree with two properties – each node is larger than all nodes in its left subtree and smaller than all nodes in its right, and both subtrees can only have heights differing by at most 1. From an algorithmic standpoint, making a BST from a sorted array is trivial. You can achieve O(n) time complexity, and O(log(n)) space excluding the return value, just by simple recursion. But you can’t do that to a sequence, since sequences don’t permit random access. Now, we can always turn the sequence into an array first, but that would require O(n) extra space. Can we do better?

It turns out that library function is implemented with only one pass through the sequence, using only log(n) extra space. The code had almost no documentation, and I couldn’t find any description online (although I didn’t try very hard), so here I’ll attempt to motivate and derive the algorithm.

First Attempt: Try to Build a Tree

Imagine yourself with the task of building a balanced tree and are handed one number at a time, and you need to incrementally build a BST as quickly as possible. That might look like this.

I get a 1 – that’s easy. I’ll have a tree with one node.

2 – OK, that’s bigger than 1, I can make that the new root, and make 1 the left child.

3 – Let’s put that as the right child. So far so good.

4 – Hmm, maybe we could make that the new root, and have the tree rooted at 2 as the left child?

5, 6, 7 – That looks like 1, 2, 3 all over again, we can put those in the right subtree. Now we have a complete BST, looking good!

8 – That’s awkward again, let’s just say it’s the new root again.

9-15 – Looking like 1-7 again…

Maybe you can see the recursive pattern here. This looks like a procedure that produces reasonably balanced trees, and the height is always bounded by O(log(n)). That seems good, right?

It would be acceptable, but only if your map library only has two functions – build the tree, then look up values and never change it again. The problem is, the BST also has to support adds and deletes as well. The most common BST types, like red black trees and AVL trees, all have their own invariant, and unfortunately we’re not meeting those standards with our almost-balanced trees. For example, At step 8, we have a root (8), a left subtree of size 7, and an empty right subtree. That’s not a valid red black tree or AVL tree, and you can’t just return that, since that would break the rest of your library. How can we fix this?

Second Attempt: Build Branches Instead

One might think that we could make this work somehow with some clever ordering of insertions to the tree. But the fundamental issue here is that with only one tree, we can never change the root – at the moment we make the newest element the root, the tree must become heavily imbalanced. So perhaps we can instead maintain a bag of branches, and quickly assemble them into a tree when we hit the end of the sequence.

Here, the defining characteristic of a branch would be its composability. Let’s define a branch (called a fragment in the source code) like the trees we had in step 4 and 8 in the last attempt. In other words, a branch would be a tree with a complete left subtree and an empty right subtree. To merge two branches into a tree, you could put one branch as another branch’s right child. A branch of height n has 2^n nodes.

You could also merge two branches of the same size into a new branch. Here’s one way to do it: to merge X with Y, take the left subtree of branch Y and move it to the right subtree of branch X. Now X becomes a complete binary tree, and you can set it to be the left subtree of Y. This fits our branch definition, while also preserving order in the tree.

Merge branches as tree:
    X       Y        X
   /   +   /   =    / \
  A       B        A   Y

Merge branches as branch:
    X       Y        Y
   /   +   /   =    /
  A       B        X
                  / \
                 A   B

Now let’s try again:

1 – One node by itself is a branch.

2 – We could make 2 a branch and merge with 1.

3 – We could have 3 be its own branch. Now we have branches of size 2 and 1.

4 – Let’s merge 4 with 3, and then we have 2 branches of size 2. We could merge those two again into one branch.

5 – That’s a new branch.

6 – Add that to 5’s branch…

That starts to look recursive again. Now we always have a bunch of branches. And when we need to generate the final tree, we could just iteratively merge them together, from small to large. Is that good enough?

We Are Building a Binary Number

It’s not. To see this, we can frame this algorithm a bit more abstractly.

Consider the heights of our branches. For each integer n, each time we can only have either 0 or 1 branch of height n, because once we have 2, we merge them together. We can visualize the branch building in this table.

# branches of size 4# branches of size 2# branches of size 1# Nodes
Our branches sizes correspond to the binary representation of total tree size.

From here, we can see that we’re abstractly incrementing a binary counter. Now the problem is we can have a lot of gaps, or 0s, in the binary number. For example, for 17 nodes, we’ll have a large branch of size 16 and a small branch of 1 node. Now if the sequence terminates, we’ll have to merge those branches into a tree – but that again will be a heavily one-sided tree.

Third and Final Attempt: Keep 2 Branches at Each Level

Since gaps are causing us problems, maybe we could just, like, not have them. And in fact we could. This is hinted at in the code – “using skew binary encoding”. (Although from Wikipedia, skew binary system actually refers to a slightly different definition.)

In this new “binary” encoding, we could use the digits 0, 1 and 2, as opposed to just 0 and 1. Each position in the number would still have the same weight. So for example, 212 = 2*4 + 1*2 + 2 = 12. Here’s how to count in this new number system.

Counting in the new system

Basically, to add one, we flip 0 to 1 and 1 to 2, but we flip 2 back to 1 and carry forward. There are never any 0s in any number.

Translating this back to our branch building, that means we don’t merge when we have two branches. We merge when we have three – and we merge the older two branches, “carrying” it forward to the next level, and always keep one branch for each height.

Let’s convince ourselves that at every point in the process, we can merge all branches and end up with a valid AVL tree (i.e. the algorithm is correct).

Say we are n steps into the branch building process, and we have to make a tree. We can convert n into a string of 1 and 2 in this number system. Starting from the least significant digit, we either have 1 or 2. Together, that gives us either a tree of height 1 or 2. Moving onto the next digit, we again have either 1 or 2 branches of size 2. At the max, we have 3 branches, each of size 2. If we first merge the left two branches into a branch, then merge that with the right branch to a tree, that leaves us with a maximum tree height of 3 (while minimum is 2). At the next level, we can at max have 3 branches of height 3. Similarly, we end up with a tree with height between 3 and 4.

Illustrating the 222 case, with 14 nodes.
Level 1:
13 + 14 

Level 2:
 10  +  12  +  14
09     11     13
  10      14
09  11  13

Level 3:
      04          08          12
  02      +   06      +   10      14
01  03      05  07      09  11  13
      04              12
  02      06      10      14
01  03  05  07  09  11  13  

In this process, we always create trees that preserve order. And after level n, the tree that we end up with always have height n or n+1. That satisfies the AVL tree invariant that two subtrees should have heights differ by at most 1.

That’s It!

Now we should be reasonably convinced that this algorithm produces valid BST. But there were a lot of details that were glossed over. To be completely rigorous, we would need to formalize the observations into claims, and prove them by induction. But I believe this process captures the key ideas already.

We also skipped the time/space complexity discussion. There are two slightly nontrivial details here. First, each insertion could lead to a cascade of branch merges (or carries), so we need to argue that insertion has an amortized cost of O(1). Then, we need to realize that the final tree merging takes O(log(n)) branches, and each tree merge is O(1) as well. As an aside, this number system has a unique representation for each number, which is perhaps not totally obvious.

I am not able to identify the inventor of this algorithm. It doesn’t seem particularly likely that the author of this code was also the inventor.

There are still some aspects of that source file that I don’t quite understand, but is perhaps not closely related. In particular, the invariant is that subtrees have heights differing by at most 2, not 1, like normal AVL trees. Maybe I’ll find out why another day.

What I Learned in Two Years’ Tech Work in Finance

It’s been two years since I started working full time as a software engineer. As I accumulated experience, I became a lot more hesitant to write, because I start to feel that I can’t contribute anything new on top of what everyone else already knows. And I even felt bad for having written some of the old posts, since they now seem quite silly and naive.

In some sense those feelings must reflect a lot of truth, but that still shouldn’t stop me from writing. Perfect is the enemy of good, and if I wait until I know everything, I’ll never write again; hence this post. Random thoughts will be laid out in no particular order.


One mistake that I’ve repeated is to optimize prematurely. As a recent college grad having done a lot of brain teasers, it’s really tempting to work clever algorithms into the job. But a lot of the times, it’s just unnecessary. In coding competitions, we are only rewarded for writing correct, fast and short programs. Nothing else mattered. In professional work, we need to add a few more terms to the equation – cost of human effort (to write, test and review the code), flexibility for future modifications, and simplicity of the solution. A simple solution that gets 90% of the cases right is perhaps even better than a complicated solution that gets 99.99%, if humans can much more easily understand failure modes and be able to manually fix things in the former case. After all, the alternative is to spend a lot of time debugging when the one edge case happens, and breaks the system.

I think this is important enough to deserve an emphasis – simplicity is valuable. A prediction system that gives you a slightly inaccurate number in a predictable way is much better than one that gives you a more accurate magic number that no one understands. In financial markets, complexity creeps in wherever competition is fierce, but the simplicity of many models would probably still be surprising to outsiders (hint: it’s not all machine learning).

There’s another form of short-sighted cleverness, which is to tweak the system very slightly to achieve what I want without learning about the whole system or understanding the full consequences of those tweaks. The smallest diff isn’t necessarily the best fix. Adding a patch that isn’t well thought out and that doesn’t fit in with the rest of the code is just incurring tech debt. Perhaps we could call this under-engineering.

Don’t Pretend You Understand

One thing that is common, perhaps more so among newer folks, is to pretend to understand, whether listening to a conversation or getting answers from teammates. Having been on both sides, I believe that this is a really bad habit. Of course it is only human nature to hide your inexperience, and I’ve also heard criticism about people asking for help before spending a lot of time on the issues. But no, I think a lot of the times, asking questions eagerly is way more productive over all, especially for new teammates.

I say this multiple times to my interns: if a problem can possibly take you half an hour to figure out, and I already know the answer, then the decision here is between one minute of my time versus thirty minutes of yours. Is my time 30x more valuable? I wish! If it’s a tough question and the mentor doesn’t already know the answer, then it seems even sillier for the intern to struggle alone for a long time.

There are times when I’m answering questions from newer folks, and I know that there’s no way they understood some statements I made. However they were still nodding and reacting as if they did. Invariably they will return in a few days with the same questions. This is counter productive.

This problem is less common, although perhaps more serious for more experienced people, since the pride has built up. I’ve told myself “this is something I should know by now” multiple times and stopped myself from asking my coworkers. But the truth is no one knows every corner of the system, and everyone knows that.

There is no shame in not knowing things. People expect that. Just ask.

Aligned Incentives

One idiosyncrasy about the finance industry is that a lot of compensation comes in the form of variable and unpredictable year end bonuses, as compared to stock offerings in tech companies. Of course people like certainty. But I think there’s a case for an opaque process of reward in the form of bonuses.

From first principle, employees have different incentives, and they usually aren’t the same as the company’s goals. Whenever incentives diverge or even conflict, we can get serious issues.

There are countless examples in real life. One example in finance is the reward curve for some hedge funds. Hedge funds are roughly companies that take investor money and help them pick investments. Some hedge funds collect a fixed fee, plus a significant fraction in additional return. That means when the investments increase in value, they make more money. That’s good incentive, right? The problem is when the investments lose money, they aren’t affected – they still take the fixed fee, regardless of how much was lost. (This is not entirely accurate, since they will lose clients if they keep losing money.) Therefore the funds will tend to make riskier investments. If your investment is going to make on average 10% a year, you’d rather make 100% this year and lose ~80% next year, as you can collect a much larger fee. This is worse for the clients.

Now let’s look at employee compensation. We want to reward employees in a way that encourages them to help the company make more money (assuming making money is the goal of the company). One thing we can do is to measure the amount of contributions for everyone, and reward accordingly. For example, we could measure hours spent in the office, or number of lines of code written, or survey people for their estimations of their teammates’ contributions, etc. The problem is these are only proxy measurements, and once you start measuring them, people will optimize for the proxies instead of the actual goal. If you measure lines of code written, you’ll encourage verbose and redundant code; if you measure hours in the office, people will stay longer but not necessarily work at the same speed, and so on.

But the problem is fundamental – you can’t measure the actual contribution and hard work, and by measuring proxies you’ll encourage cynical behavior. A fix, if not a complete solution, is through obfuscating the reward function. If I tell you that I’ll give you an unknown amount of money by the end of the year based on How Well You DidTM, and let you fill in the rest, then you won’t be encouraged to write bad code, or only focus on projects that had Impact, or other things bad for the firm.

I feel that this has worked quite well in my company. But there are a lot of assumptions for this to work. One is that employees have to be OK with not knowing how much they’ll make. There also needs to be a lot of trust between employees and managers, so that employees can trust that they’ll be evaluated fairly by the end of the year.

And More…

This post is getting long and messy, so maybe I’ll call it a day for now. There are a lot of smaller lessons that come from trading and recruiting. Trading is arguably the best arena to hone one’s rational decision making skills, and interesting stories come up every now and then. Maybe I’ll write a follow up some day.