Bloomberg CodeCon 2017

Went to Bloomberg’s headquarters in New York for a coding contest a few days ago. It looks more like a recruiting event than an actual contest, as the problems were fairly not difficult comparing to ACM contests, and it was only 2 hours.

Getting into the contest

There are like around 2^7 contestants in this contest, 3 from each invited university. Couple months before the actual contest on January 27, a local contest was held within USC to select representatives. 8 problems in 2 hours, each problem holds a different weighting, correlated with the difficulty.

After qualifying, Bloomberg schedules the flights, hotels and transport, so we the contestants just have to ditch the Friday classes and head to the airport. Of course, they pay for everything: round trip flights, stay at Fitzpatrick Grand Central over the weekend, cars between hotel and airport, etc. For free stuff, there’s a Bloomberg hoodie, beanie, $100 gift card, 2 $5.5 Metro cards, and a random nanoblock thing.

The horseshoe shaped building, from the inside.

When we got there, they had a little tour around their pretty nice building. They didn’t walk us through a lot of it though; we went to the top floor to see the Manhattan view, grab free food, look at the UX lab and some random fish tanks and that’s it. The other contestants didn’t seem to give a * about the tour, I bet most of them came for the free stuff.

Rules

Basically the CodeCon was the same as local contest: 8 problems in 2 hours, open Internet open book and everything. Technically we could’ve joined the contest without all the flying since it was online anyways. I think there was also a synchronous event in London, and colleges like ICL and Oxford were there too.

Top three guys can actually win a prize. They gave out a laptop, a PSVR and something else I forgot. Given the slim chances I didn’t try very hard, since the marginal benefit is basically 0.

The contest started to everybody’s surprise since they “instead of pushing back the contest by 10 minutes, accidentally pushed it early by 10 minutes.” Well… alright, let’s get into coding then.

Problems

Honestly I don’t like this problem set too much. Even the local USC contest was more interesting; it had a little math thing like GCD, BFS maze search and DP. A few problems in the finals, however, are purely brute-force problems. The input sizes would be like around 2^4, and you just have to enumerate all the possibilities and do some stuff. Here are a few examples.

The sorting problem

Given an integer array of size at most 5 with unique elements, print the minimum number of pair swaps to make it sorted. The correct/most efficient way was to pick the minimum and move it in the front, something like this:

int sorting(vector<int>& v) {
    int ans = 0;
    for (int i = 0; i < v.size(); i++) {
        int mn = i;
        for (int j = i+1; j < v.size(); j++)
            if (v[j] < v[mn])
                mn = j;
        if (mn != i) {
            swap(v[mn], v[i]);
            ans++;
        }
    }
    return ans;
}

But here’s what I did:

bool check(vector<int>& v) {
    for (int i = 1; i < v.size(); i++)
        if (v[i-1] > v[i])
            return false;
    return true;
}
int sorting(vector<int>& v) {
    vector<pair<int, vector<int> > > bfs;
    bfs.insert(make_pair(0, v));
    for (int i = 0; i < bfs.size(); i++) {
        if (check(bfs[i].second))
            return bfs[i].first;
        vector<int> copy = bfs[i].second;
        for (int j = 0; j < v.size(); j++)
            for (int k = j+1; k < v.size(); k++) {
                swap(copy[j], copy[k]);
                bfs.push_back(make_pair(bfs[i].first+1, copy));
                swap(copy[j], copy[k]);
            }
    }
    return 0; // not reached
}

I was a little rushed and didn’t want to think through the correctness of picking the smallest every time, so I just enumerated all the possible swapping without any optimization. Each level multiplies the number of elements in the BFS vector by 10, and the number of levels is bounded by 4, so it was definitely fast enough.

The Pokemon problem

Two trainers are battling N Pokemons of their own. Each trainer has an order of battling the Pokemons, and they battle 1vs1 until their health goes to zero, then the next Pokemon in the queue take up the space. Damage is a function of parameters of the attacking and defending Pokemons. The problem is to maximize the number of Pokemons standing from trainer A at the end of the war using any permutations of the queue, given the order of trainer B’s Pokemons.

The solution is also quite trivial, simply implement the battles and use next_permutation() to enumerate through all possible permutations. However I got stuck here and couldn’t debug because I was using the names of the Pokemons in the comparator of my Pokemon struct, but those names are not unique, so I’m missing some permutations. It was such a stupid bug.

And so on…

And then there are 2 more problems of exactly the same type: literally enumerate all possible configurations. I used the same recursive call structure, so the two solutions looked alike. After solving 4 and getting stuck on Pokemons, there weren’t too much time left, so I kind of just gave up at that point.

Results

Team USC

4 problems and ranked 48th; my ICPC teammate Yuehui Wang also from USC got 7th. As you can tell, he carried my ICPC contests. It was more or less a speed contest, since I think only the last problem actually requires some algorithms (it was like a 2D matrix connectivity thing). I like the ones with a little more difficulty instead of “enumerate and take max”. Anyway, I need to get better and faster for next year’s ICPC SoCal regional.

The rest of the trip

Before heading to Bloomberg, I paid a short visit to Jane Street, since I haven’t seen the new office yet. Don’t think I’m allowed to take pictures inside, but here’s a T shirt I got from them. It has quite a number of references to JS’s favorite functional programming language OCaml.

Since it was the day before Chinese New Year, a big dinner is somehow compulsory by tradition. Sakagura is a legit Japanese place hidden in a basement somewhere.

Then I headed to Yale to visit a friend. First time at Yale, looks very much like Harvard to me; perhaps it’s an East coast thing.

The Harkness Tower and me

Last but not least, my favorite thing in New York: Luke’s Lobster.

Euler Path: 8 Lines Solution

The problem of Euler path marked a very fundamental moment in algorithm studies. When Euler posed the 7-bridge problem, there was no mathematical tool to solve it, hence he created the tool – graph theory. It sounds like how Newton invented calculus to solve his gravity problems and how Bernoulli invented calculus of variations to solve his brachistochrone problem. As I quote Sutherland,

“Well, I didn’t know it was hard.”

Problem statement

Say you have arrived in a new country with a bunch of islands and one way bridges between some pairs of islands. As a tourist you would like to visit all the bridges once and only once. Abstractly, an Euler path is a path that traverses all the edges once and only once. Is it always possible? If possible, how do we find such a path?

Existence of path

Unlike many other problems, we can determine the existence of the path without actually finding the path. There are 2 conditions for it. The first has to do with in and out degrees, and the second is about the connectivity of the graph.

For a directed graph, the in degree is the number of incoming edges to a vertex, and the out  degree is the number of outgoing edges to a vertex. Easily, if we have an Euler path, then there will be a start and end vertex. The start vertex will have an out degree – in degree of 1, the end vertex will have in degree – out degree of 1. Every other vertex will have an equal number of out degree and in degree, because if you have a different number, you obviously cannot go through all of those edges in one path. The only exception is when the start and end vertices are the same, in which case all the vertices have the same number of in and out degrees.

The second condition is that all vertices have to be weakly connected, meaning that by treating all edges as undirected edges, there exists a path between every pair of vertices. The only exception is for the vertices that have no edges – those can just be removed from the graph, and the Euler path trivially does not include them.

Given the above two properties, you can prove there is an Euler path by the following steps. First it is easy to see that if you start walking from the start vertex (out – in = 1) and removing edges as you walk through them, you can only end up at the end vertex (in – out = 1). Then, the remaining graph is full of cycles that can be visited through some vertex in the path we already have, and you just have to merge the cycles and the path to get the Euler path.

It is easier to see the other direction of the proof: if there is an Euler path, both conditions have to be met.

Implementing existence of path

As mentioned above, we code up the two conditions and check whether we can find a valid starting position, otherwise return -1. Vertices are numbered 0 to n-1. The graph is stored as an adjacency list, meaning that adj[i] has all the neighbors (edges go from i to those). So the out degree of i is adj[i].size().

int start(vector<vector<int> >& adj) {
    // condition 1: in and out degrees
    vector<int> deg(adj.size());
    for (int i = 0; i < adj.size(); i++) {
        deg[i] += adj[i].size();
        for (int x : adj[i])
            deg[x]--;
    }
    int ans = -1;
    for (int i = 0; i < deg.size(); i++)
        if ((ans != -1 && deg[i] != 0 && deg[i] != -1)
            || deg[i] > 1)
            return -1;
        else if (deg[i] == 1)
            ans = i;
    if (ans == -1)  // start and end vertices are the same 
        for (int i = 0; i < adj.size() && ans == -1; i++)
            if (!adj[i].empty())
                ans = i;
    if (ans == -1)  // there is no edge at all
        return -1;
    // condition 2: connectivity
    vector<bool> vis(adj.size());
    vector<int> bfs{ans};
    vis[ans] = true;
    for (int i = 0; i < bfs.size(); i++)
        for (int x : adj[bfs[i]])
            if (!vis[x]) {
                bfs.push_back(x);
                vis[x] = true;
            }
    for (int i = 0; i < adj.size(); i++)
        if (!vis[i] && !adj[i].empty())
            return -1;
    return ans;
}

That is slightly more clumsy that I would like it to be, but it should be clear. Basically just counting in and out degrees and running a BFS on the starting vertex.

Actually finding the path

As mentioned above in the sketch of proof, finding a path consists of the 3 steps:

  1. Walk from the start vertex, removing edges as we use them, until there is nowhere to go. Then we have a path from start to end.
  2. For the remaining edges, start at a vertex on the path and randomly walk until we go back to the same vertex. Then we have a cycle. Merge the cycle on the path. For example, if we had a path s->a->b->c->…->t and a cycle c->alpha->beta->c, we merge them to become s->a->b->c->alpha->beta->c->…->t.
  3. Repeat step 2 until there is no remaining edge.

Well, that does not sound very easy to write nor very efficient to run, if we literally implemented the above. How many lines of code would that be?

The 8 lines solution

The answer is 8. Here’s the code:

void euler(vector<vector<int> >& adj, vector<int>& ans, int pos) {
    while (!adj[pos].empty()) {
        int next = adj[pos].back();
        adj[pos].pop_back();
        euler(adj, ans, next);
    }
    ans.push_back(pos);
}

This is a very beautiful solution using recursion, in my opinion. I was quite surprised when I first saw this. Where is everything? Where is getting the main path? Where is getting the cycles? Where are we merging them?

The gist of this recursion is a post-order DFS, meaning that we visit the end of the graph first, and then backtrack. This comes from a very crucial observation: we can always be sure what could be at the end of the path, but not the front. It is important to know that the answer array is in reverse order of visit, i.e. we need to reverse it to get the Euler path. Let’s go through the stages of how the program works.

  1. Path from start to end vertex. The first time we push back is when we run out of outgoing edges, which can only be the case of the end vertex, with in-out = 1. In all other vertices, the numbers are the same so if you can go in, you can definitely go out of that vertex. Hence the first push back occurs with the end vertex, and at that point the program execution stack has the entire path.
  2. As we return from a recursive call of the function, we are essentially going back from the end to the start. If a vertex on the main path does not have any outgoing edge, we know we will visit it next, so we push it to the ans vector and return from the function.
  3. But if a vertex on the path does have an outgoing edge, that means there is at least one cycle including this vertex. Then in the next iteration of the while loop, we will visit one of the outgoing edges and start another round of recursion. Again, this recursion must end on the same vertex because it is the only one with in-out = 1. This recursion gets you a cycle and by the time it returns, the cycle would have been pushed to the ans array already, finishing the merge operation.

By studying this code, there is one interesting point to note. That is, the while loop will only be executed 0 to 2 times in any recursive call. It will only be 0 at the end vertex, and on the main path with no branches, it will be 1. On the main path with branches, it will only be 2 but not more regardless of the out degree, because surely the recursive call for cycle needs to end on that vertex but it will not end until all outgoing edges are used up. Therefore to the program, multiple cycles on one vertex is just one big cycle. On the other vertices on cycles, it works the same way whether they have branches or not.

In case you want to see how to run it, here is the main function I wrote to test it:

int main() {
    int n, q;
    cin >> n >> q;
    vector<vector<int> > adj(n);
    for (int i = 0; i < q; i++) {
        int u, v;
        cin >> u >> v;
        adj[u].push_back(v);
    }
    int s = start(adj);
    if (s == -1) {
        cout << "no path" << endl;
        return 0;
    }
    vector<int> ans;
    euler(adj, ans, s);
    reverse(ans.begin(), ans.end());
    for (int x : ans)
    cout << x << endl;
    return 0;
}

That’s it – a simple problem with a simple solution. Leetcode has one Euler path problem, and the algorithm in this blog post comes from the discussion of that problem. Everything is pretty much the same for undirected graphs – you just have to use different data structures to store the edges. The proof is mostly the same with the first condition now about odd/even number of edges at each vertex, as there is no distinction between in and out degree.

TIW: Subset Sum

Subset sum, aka the coin change problem, is a very specific problem that for some reason gets mentioned a lot. The problem is very simple: given some coins of different values, what values of items can we buy with a subset of the given coins? Sometimes the problem statement asks, can we buy an item of a certain price with our coins without change? If not, what is the minimum amount of change needed? What if we have a number of the same coins, or even an infinite number? Many of these variants can be solved with some tweaks to the same code. So let’s start with the most basic formulation: given a vector of positive integers, return a vector of all integers that are subset sums (i.e. there exists a subset in the input vector with a sum equal to that number).

There are many good ways to accomplish this task.

First attempt: iterative, boolean array

Let’s jump into the code directly:

vector<int> solution1(vector<int>& coins) {
    int total = 0;
    for (int x : coins)
        total += x;
    vector<bool> pos(total+1);
    pos[0] = true;
    for (int x : coins)
        for (int i = total; i >= 0; i--)
            if (pos[i])
                pos[i+x] = true;
    vector<int> ans;
    for (int i = 0; i <= total; i++)
        if (pos[i])
            ans.push_back(i);
    return ans; 
}

OK, so first we calculated the total values of the coins, then we know all possible subset sums are in the range [0, total]. Then we constructed a boolean vector with indices in the same range, to indicate what values we can form. The algorithm proceeds by adding a new coin every iteration. Say we used to be able to pay {0, 1, 2, 3, 4, 5, 6}, and we have a new coin of value 10. Then by adding the new coin in, we can pay {10, 11, 12, 13, 14, 15, 16}. But we can also not use the new coin and pay [0, 6], so we need to keep that in the original array. Then we combine the two.

The most important part here is the loop for (int i = total; i >= 0; i–). It counts down, because counting up will lead to a logic error. Think of it this way: you could already pay 0, and you add a new coin of value 2 in, so you set pos[2] = true. And then you count up and see pos[2] is true – you will then set pos[4] to be true, although you already used the new coin in that combination! Counting down avoids this problem because every new element you set to be true will not be visited again in this iteration.

Second attempt: iterative, set of integers

vector<int> solution2(vector<int>& coins) {
    set<int> pos;
    pos.insert(0);
    for (int x : coins)
        for (auto it = pos.rbegin(); it != pos.rend(); it++)
            pos.insert(x+*it);
    return vector<int>(pos.begin(), pos.end());
}

The basic idea is the same, but instead of indicating val as possible by setting pos[val] = true, we insert it to the set. The code is much more concise.

Third attempt: recursive, set of integers

void soln3helper(vector<int>& coins, int idx, int val, set<int>& pos) {
    if (idx == coins.size()) {
        pos.insert(val);
        return;
    }
    soln3helper(coins, idx+1, val, pos);
    soln3helper(coins, idx+1, val+coins[idx], pos);
}

vector<int> solution3(vector<int>& coins) {
    set<int> pos;
    soln3helper(coins, 0, 0, pos);
    return vector<int>(pos.begin(), pos.end());
}

This is basically DFS, if you treat whether using a coin or not as an edge and a combination of coins as a vertex in the graph, you will get a tree. At the leaf nodes of the tree, insert the sum of the combination into the set.

Why the three approaches?

Each approach has its pros and cons. The second one has the least amount of code, and is easy to understand. The first one might be marginally more efficient for small test cases, because it only uses vectors but not any sets. Note however that the boolean vector could be huge if the coins are of really big values, in which case it would be very inefficient. The third one is intriguing because there is not any branch cutting. It incidentally takes care of negative coin values, which the other solutions do not. That means our runtime will always be the worst case runtime, 2^n where n is the number of coins. But is it really that bad? Let’s compare the runtimes:

First attempt: O(n*sum)

This is obvious from the nested loops. How big is sum though? The lower bound is n, when all the numbers are 1. In that case we get O(n^2) overall runtime. There is however not an upper bound – any coin can be arbitrarily big, and in that case we are kind of screwed.

Second attempt: O(n*2^n)

In the ith loop, we have at most 2^(i+1) items in the set. Looping through them and all insertions cost O(2^(i+1)*log(2^(i+1))) = O(i*2^i). Summing from i = 0 to i = n-1, we can see that each later term is larger than the previous term by 2 times at least, so we can drop all the front terms and only take the last, and we get the answer.

Third attempt: O(2^n)

Very simple: on each level i, we have 2^i edges coming out, i.e. O(2^i) work. Summing them gives O(2^n) by the same argument.

From the runtime analyses, you can see either the first or the third algorithms could be optimal for different inputs. The second, however, is never asymptotically optimal. Even in the case of all 1s, it runs in O(n^2*log(n)). Hence it is most useful when you don’t really need that much performance but prefer pretty code instead (say, during an interview).

Fourth attempt: iterative, integer array

vector<int> solution4(vector<int>& coins) {
    vector<int> pos{0};
    for (int x : coins) {
        vector<int> newpos, next;
        for (int p : pos)
            newpos.push_back(p+x);
        int l = 0, r = 0;
        while (l < pos.size() || r < newpos.size()) {
            if (l == pos.size() ||
                (r != newpos.size() && pos[l] >= newpos[r])) {
                if (next.empty() || next.back() != newpos[r])
                    next.push_back(newpos[r]);
                r++;
            } else {
                if (next.empty() || next.back() != pos[l])
                    next.push_back(pos[l]);
                l++;
            }
        }
        pos.swap(next);
    }
    return pos;       
}

This is clearly a more elaborate approach to the problem. What we’re trying to achieve here is an O(2^n) algorithm that has a best case scenario of O(n^2), in the case of all 1s as input array. This is built mostly upon the second attempt, and we’re trying to eliminate the use of a set, since it inevitably gives us an extra time complexity factor. The way to accomplish this is to store the new values in a new vector, and merge the two vectors like in merge sort but removing all duplicates. In the ith iteration, we must only have O(2^i) amount of work, and summing through all i’s gives O(2^n).

Multiple coins

That’s about it, but one last discussion about multiple coins or infinite coins. Say we have the coins {3, 4, 7, 19} and we want to pay for an item of price 25, can we do it? Yes, 3+3+19 = 25. How do we do this? There are a few approaches, but here is one of them. I will build it on attempt 2:

bool multipleCoins(vector<int>& coins, int val) {
    set<int> pos;
    pos.insert(0);
    for (int x : coins)
        for (auto it = pos.rbegin(); it != pos.rend(); it++)
            for (int v = x+*it; v <= val; v += x)
                pos.insert(v);
    return pos.count(val);
}

Instead of pushing in only one value, we keep pushing in more values with more coins of the same type until we go over the desired value. Of course we can immediately return true if we find val, which could potentially save a lot of work. But this is good enough as an example to show how to modify the above versions to account for multiple coins.

Brief Intro to Segment Tree

Something I just learned – segment tree, is a data structure more advanced and generalized than binary indexed tree. Even though I just learned it and might not be qualified to discuss it yet, I’m pretty excited so who cares. So here’s an intro to segment tree from a noob point of view. Most of the content comes from this Codeforces blog by Al.Cash, but that blog assumes more prior knowledge and I will attempt to explain it from scratch.

Motivation

Member binary indexed tree? I member. For range sum query, BIT supports logarithmic time complexity for updating an element and querying the sum of any range. The way querying from l to r was done was a 2-phase process: first find the prefix sum of l-1 and r, then subtract the two. Pretty straight forward.

But what if we want to take minimum or maximum of the range, instead of just taking the sum? With BIT, we can only query the range from the beginning to a certain point. It is trivial really, refer to the code below:

// binary indexed tree for taking maximum of range [0, k)
void update(int v, int k, vector<int>& bit) {
    for (k++; k < bit.size(); k += k&(-k))
        bit[k] = max(bit[k], v);
}
int query(int k, vector<int>& bit) {
    int ans = 0;
    for (k++; k > 0; k -= k&(-k))
        ans = max(ans, bit[k]);
    return ans;
}

There are a few limitations with this approach. First, you can only update an element to a bigger integer, because there is no reverse operation for maximum like subtraction for addition. Second, you can only take maximum from the beginning to a certain element but not an arbitrary range. With summation that’s not a problem, as we can subtract sums of different ranges to get partial sums. But with taking minimum, you can’t just subtract minima can you?

Therefore, we need a data structure that is like BIT, but supports true range query, instead of prefix range query.

Basic specs

  1. Segment tree is like a binary indexed tree on steroids, all problems solved by BIT can be solved by segment tree.
  2. It takes 2 times the space of the original array (double of BIT).
  3. You can update any element of the original array in O(log(n)) time (same as BIT).
  4. You can look up any element of the original array in constant time (BIT takes log(n)).
  5. The original array can be of any data type (same as BIT).
  6. You can perform a range query of a certain associative function in O(log(n)) time (BIT can only make prefix range queries).

The last point is important. As far as a function is associative, you can use it with ST. It does not have to be commutative or reversible. A few examples:

  1. Addition. Just like BIT, given {1, 2, 3, 4, 5}, you can sum {2, 3, 4} in log(n) time, for example.
  2. Maximum. Given {1, 2, 3, 4, 5}, you can take maximum of {2, 3, 4} in log(n) time, for example.
  3. Matrix multiplication. Given {A1, A2, A3, A4, A5} of five matrices, you can get the product {A2, A3, A4} in log(n) time.

Associativity: say you have 3 variables a, b and c, and a function f that operates on two parameters. Then f(f(a, b), c) = f(a, f(b, c)) iff f is associative. Note that commutativity does not follow, f(a, b) does not necessarily equal f(b, a).

Associativity is important because for a range {a, b, c, d, e}, you can precompute {a, b} and {c, d, e}, then combine the two results.

Basic idea

Let’s borrow the sample from BIT: {1, 2, 3, 4, 5, 6, 7, 8}. Here’s a graph of what segment tree stores:

screen-shot-2017-01-01-at-2-36-30-pm

For comparison, here’s a graph of what binary indexed tree stores:

screen-shot-2017-01-01-at-2-39-26-pm

A few observations:

  1. ST is simply BIT with the whole table filled in, without any blanks.
  2. st[8..15] is our original array. In the second half of the ST array, we always store the original array.
  3. st[i] = st[i*2] + st[i*2+1]. Node i/2 is node i’s parent.
  4. To update any element changes the same number of nodes in the tree.
  5. We can sum up ranges directly. For example {3, 4, 5, 6, 7} = st[5]+st[6]+st[14].

It is easy to see that ST is an extension of BIT and supports direct range queries while BIT doesn’t. The only thing left now is how to actually implement it.

Code

In general, the function does not have to be addition of integers, so I will abstract it to any function that takes 2 structs and returns a struct.

struct data {
    int num;
    data() {
        num = 0;
    }
    data(int n) {
        num = n;
    }
};
data merge(data a, data b) {
    return data(a.num+b.num);
}

For simplicity I still made it a wrapper of addition of numbers. You can make it any function with any data, either minimum of long longs, multiplication of matrices etc.

Then, given an array, we need to build the segment tree.

vector<data> build(vector<data>& v) {
    int n = v.size();
    vector<data> st(n*2);
    for (int i = 0; i < v.size(); i++) st[i+n] = v[i]; for (int i = n-1; i > 0; i--)
        st[i] = merge(st[i*2], st[i*2+1]);
    return st;
}

Technically we do not need a build function if we have an update function, just like with BIT. We can just update the entries one by one. But that would take n*log(n) time, while this function is O(n), so this is not entirely useless.

The build function takes in the original array and makes an array double its size. Then, we copy the array to the second half of the tree. Constructing the tree is a bottom-up procedure, each time calculating the new sum from the two lower nodes (from observation 3). That’s it.

The update function is even simpler.

void update(data v, int k, vector<data>& st) {
    int n = st.size()/2;
    st[k+n] = v;
    for (int i = (k+n)/2; i > 0; i /= 2)
        st[i] = merge(st[i*2], st[i*2+1]);
}

It is a 4-line function. Technically you can make it 1 line, but why?? Here, we first update the entry in the second half of the tree, then we go to its parent iteratively by dividing by 2, until you hit 1, the root.

Now the actual hard part: queries. Given the range [l, r) from l to r-1, query the “sum” (could be product or any arbitrary function) of the range.

data query(int l, int r, vector<data>& st) {
    int n = st.size()/2;
    data ansl, ansr;
    for (l += n, r += n; l < r; l /= 2, r /= 2) {
        if (l%2 == 1) {
            ansl = merge(ansl, st[l]);
            l++;
        }
        if (r%2 == 1) {
            r--;
            ansr = merge(st[r], ansr);
        }
    }
    return merge(ansl, ansr);
}

I will only try my best to explain, but I will not go through everything in fine detail because it is too tedious. Let’s say we want to sum {2, 3, 4, 5, 6}, i.e. l = 1, r = 6 (because v[1] = 2, v[6-1] = 6). First we add n to both l and r, so we have l = 9, r = 14. Refer to the above graph, our goal is to sum st[9], st[5] and st[6]. In fact, in the first loop we will pick up st[9], because l is an odd number. After we add it to the left sum, we add 1 to l to denote that we have already added numbers in this subtree, so we have a smaller range of numbers to add. In the next loop, l and r are divided by 2 to go up a level in the tree, and become 5 and 7. Now both numbers are odd, and we will merge st[5] and st[6] to the left sum and right sum respectively. In the next iteration, l and r are both 3, meaning that the range to sum is empty, and the loop breaks.

To be honest, I don’t 100% understand why the conditions are l and r are odd. The idea could be that if l is even, that means both l and l+1 are in the range, and we would rather add the number at l/2 since it includes both, therefore we do not do anything when l is even. The only exception when not both l and l+1 are in the range is when r = l+1, but that means r is odd and we will add st[r-1], which is st[l]. Otherwise if l is odd, we might as well eliminate this subtree by adding st[l] and moving l to l+1. Everything should be mirrored and r should be checked to be even, but r is not included in the range [l, r), so the actual end point is r-1, so we check whether it is odd. Just like with low bits in BIT, you don’t actually need to understand it to use it; I bet you can’t implement a red black tree either but you still use set<> like an algorithm master anyway.

Important point: There are two ans variables, ansl and ansr, and they take sums from both parts respectively. This is to maintain the order of computation, in case the merge function is not commutative. In this case however it does not make any difference.

Array of arbitrary size

The above is explained with an array size of 8, which was sort of cheating, because you will most likely want an array of arbitrary size. Of course, you can add padding zeros at the end to make it a power of two.

n = v.size();
while (n != lowbit(n))  // n&(-n)
    n += lowbit(n);
v.resize(n);

That would do. This is because when n is a power of two, its low bit is itself. However this is not even needed; the original code, although designed for powers of two, will work for any vector size n. There is a short explanation on the original blog, but the full proof should be too complicated and not useful to know. We only need to know that it automagically works for any n, and happily copy paste code.

Sample: matrix multiplication

Build, update and query are copy pasted, so they are omitted in the sample.

struct data {
    int m, n;
    vector<vector<int> > A;
    data() {
        m = 0;
        n = 0;
    }
    data(const vector<vector<int> >& B) {
        A = B;
        m = A.size();
        n = a[0].size();
    }
    void print() {
        for (auto r : A) {
            for (auto c : r)
                cout << c << " ";
            cout << endl;
        }
    }
};
data merge(data a, data b) {
    if (a.m*a.n == 0)
        return b;
    if (b.m*b.n == 0)
        return b;
    int m = a.m, n = b.n, l = a.n;
    vector<vector<int> > C(m, vector<int>(n));
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < l; k++)
                C[i][j] += a.A[i][k]*b.A[k][j];
    return data(C);
}
vector<data> build(vector<data>& v) {...}
void update(data v, int k, vector<data>& st) {...}
data query(int l, int r, vector<data>& st) {...}
int main() {
    vector<data> v;
    v.push_back(data({{2, 0}, {0, 2}}));
    v.push_back(data({{1, 1, 4}, {4, 2, 2}}));
    v.push_back(data({{-2, 0}, {1, 4}, {1, 2}}));
    v.push_back(data({{0, 1}, {1, 0}}));
    vector<data> st = build(v);
    int l, r;
    while (cin >> l >> r)
        query(l, r, st).print();
    return 0;
}

You can mostly just copy paste the three functions and modify the data and merge definitions to fit your applications. I was motivated to study segment trees because of this problem on Codeforces. I was not able to do this problem during the contest, but neither could tourist, so whatever.

New Year and Old Subsequence

This is a slightly more advanced application of segment tree. The problem: given a string of digits, return the minimum number of digits that need to be removed such that there is a subsequence of “2017” but not “2016”. If 2017 is not a subsequence, print -1. More precisely, after a string of length up to 200,000 characters is given, there are up to 200,000 queries of the range l to r, and we need to answer what is the minimum number of digits to remove such that the sequence [l, r] has a subsequence of “2017” but not “2016”. The algorithm is described here. The gist is that for an interval of a string, all that we need to know is given we already have a certain prefix of 2017, how many digits do we need to erase in this interval so that we will have a longer prefix of 2017. For example, if the current digit is 6, then given “201” or “2017”, we will have to erase one digit (the 6) to ensure we have a prefix of “201” or “2017” without any “2016”. Otherwise the 6 doesn’t matter. Here’s my code.

struct data {
    unsigned int dp[5][5];
    data() {
        for (int j = 0; j < 5; j++)
            for (int i = 0; i <= j; i++)
                dp[j][i] = INT_MAX;
    }
    void clear() {
        for (int i = 0; i < 5; i++)
        dp[i][i] = 0;
    }
};

data merge(const data& a, const data& b) {
    data temp;
    for (int j = 0; j < 5; j++)
        for (int i = 0; i <= j; i++)
            for (int k = i; k <= j; k++)
                temp.dp[j][i] = min(temp.dp[j][i], a.dp[k][i]+b.dp[j][k]);
    return temp;
}

int main() {
    int n, q;
    string s;
    cin >> n >> q >> s;
    vector<data> st(2*n);
    for (int i = 0; i < n; i++) {
        st[i+n].clear();
        if (s[i] == '2') {
            st[i+n].dp[0][0] = 1;
            st[i+n].dp[1][0] = 0;
        } else if (s[i] == '0') {
            st[i+n].dp[1][1] = 1;
            st[i+n].dp[2][1] = 0;
        } else if (s[i] == '1') {
            st[i+n].dp[2][2] = 1;
            st[i+n].dp[3][2] = 0;
        } else if (s[i] == '7') {
            st[i+n].dp[3][3] = 1;
            st[i+n].dp[4][3] = 0;
        } else if (s[i] == '6') {
            st[i+n].dp[3][3] = st[i+n].dp[4][4] = 1;
        }
    }
    // build segment tree
    for (int i = n-1; i; i--)
        st[i] = merge(st[i<<1], st[i<<1|1]);
    while (q--) {
        int l, r;
        cin >> l >> r;
        // query from l-1 to r
        data ansl, ansr;
        ansl.clear();
        ansr.clear();
        for (l += n-1, r += n; l < r; l >>= 1, r >>= 1) {
            if (l&1) {
                ansl = merge(ansl, st[l]);
                l++;
            }
            if (r&1) {
                r--;
                ansr = merge(st[r], ansr);
            }
        }
        int ans = merge(ansl, ansr).dp[4][0];
        cout << (ans == INT_MAX ? -1 : ans) << endl;
    }
    return 0;
}

If you paid attention to the code, you will see I embedded the build and query functions in the main function. Also you will notice I used integer array in C style instead of vectors in C++ style. There are also changes in details such as replacing *2 by <<1 (left shift) and +1 by |1 (bitwise or). I hate to do this, but the judge on Codeforces is very demanding and my normal coding style got TLE.