TIW: Range Sum

2016-10-25

The prefix sum array is a useful tool that shows up often. It achieves the following:

For a given array of n integers that will not change over the course of program execution, precomputation runs in O(n).
After the precomputation step, querying the range sum (summation of v[l..r]) is O(1).

Here’s how you do it in code:

vector<int> v{1, 2, 3, 4, 5};
vector<int> s{0};
for (int x : v)
    s.push_back(s.back()+x);
int i = 2, j = 4;
cout << "3+4+5 = " << s[j+1]-s[i] << endl;

There are two things to note: first, the size of s, our prefix sum array, is one larger than our original array, and the indices are shifted by one; second, in all summation problems, always think about overflow and whether you need to use long long instead of int.

For those who haven’t seen this before, s[i] is the sum of integers from v[0] to v[i-1]. We’re essentially generating the sums of {}, {1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4} and {1, 2, 3, 4, 5}. To get the sum of {3, 4, 5}, you subtract the sum of {1, 2} from the sum of {1, 2, 3, 4, 5}.
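
To make the overflow point concrete, here is a minimal sketch of the same idea with the sums stored as long long. The function names prefixSums and rangeSum are my own, made up for illustration, and the assert is only a sanity check on the indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the shifted prefix sum array: s[i] = v[0] + ... + v[i-1], with s[0] = 0.
// The sums are stored as long long so that large inputs do not overflow int.
std::vector<long long> prefixSums(const std::vector<int>& v) {
    std::vector<long long> s{0};
    s.reserve(v.size() + 1);
    for (int x : v)
        s.push_back(s.back() + x);
    return s;
}

// Sum of v[l..r], both ends inclusive, in O(1) using the precomputed array s.
long long rangeSum(const std::vector<long long>& s, std::size_t l, std::size_t r) {
    assert(l <= r && r + 1 < s.size());
    return s[r + 1] - s[l];
}
```

For v = {1, 2, 3, 4, 5}, prefixSums gives {0, 1, 3, 6, 10, 15}, and rangeSum(s, 2, 4) is 15 - 3 = 12.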

The trick here that I want to bring out is the initial value in the array, 0. Without the 0, the code will look like this:

vector<int> v{1, 2, 3, 4, 5};
vector<int> s{v[0]};
for (int i = 1; i < v.size(); i++)
    s.push_back(s.back()+v[i]);
int i = 2, j = 4;
cout << "3+4+5 = " << s[j]-(i > 0 ? s[i-1] : 0) << endl;

OMG, no. Please don’t do this. The first version, with the leading 0, is actually all it takes to solve Range Sum Query - Immutable:

class NumArray {
private:
    vector<int> s;
public:
    NumArray(vector<int> &nums) {
        s.push_back(0);
        for (int x : nums)
            s.push_back(s.back()+x);
    }
 
    int sumRange(int i, int j) {
        return s[j+1]-s[i];
    }
};

Marking the member as private is just my personal preference. Everything else is the same as above, nothing to see here.

A slightly harder version is doing this in 2 dimensions, but it’s merely putting the above code in some loops and doing it over and over again. Anyway, let’s look at this problem first:

Range Sum Query 2D - Immutable

Given a 2D array, compute summation of a rectangle of the array in O(1) time after precomputation.

So there are two parts to explain: what our “2D prefix sum array” looks like, and how to compute it in linear time.

In 1D, we have one index, i, for each number, and we sum everything from index 0 to index i-1. Similarly in 2D, we have two indices, i and j, and we sum every entry v[i’][j’] with 0 <= i’ < i and 0 <= j’ < j. With the same shift-by-one convention, our notation will be: s[i][j] = sum {v[0..i-1][0..j-1]}. To calculate the sum of numbers in the rectangle v[r1..r2][c1..c2], the expression is: s[r2+1][c2+1] - s[r1][c2+1] - s[r2+1][c1] + s[r1][c1]. To understand why, split the matrix into four blocks, where D is the rectangle we want:

[ A ] [ B ]
[ C ] [ D ]

Then the sum of D is {D} = {A+B+C+D} - {A+B} - {A+C} + {A}. Subtracting {A+B} and {A+C} removes A twice, so we add it back once. That is why there are 4 terms in the expression.

Now down to the second part: how to calculate it in time linear in the number of entries. It’s the same relation, but reversed: {A+B+C+D} = {D} + {A+B} + {A+C} - {A}, where D is now a single cell. In other words, with the shift-by-one indexing, s[i][j] = v[i-1][j-1] + s[i-1][j] + s[i][j-1] - s[i-1][j-1].

vector<vector<int> > v{ {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
int r = v.size(), c = v[0].size();
vector<vector<int> > s(r+1, vector<int>(c+1));
for (int i = 1; i <= r; i++)
    for (int j = 1; j <= c; j++)
        s[i][j] = v[i-1][j-1]+s[i-1][j]+s[i][j-1]-s[i-1][j-1];
int r1 = 1, r2 = 2, c1 = 0, c2 = 2;
cout << "sum { {4, 5, 6}, {7, 8, 9} } = "
     << s[r2+1][c2+1]-s[r1][c2+1]-s[r2+1][c1]+s[r1][c1] << endl;
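
If you want to convince yourself that the four-term expression is right, one option is to compare it against a naive loop over every rectangle. This is only a sketch: the Matrix alias and the names buildPrefix2D, fastSum, slowSum and agreeEverywhere are made up for illustration, and I use long long to stay clear of overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Padded 2D prefix sums: s[i][j] = sum of v[0..i-1][0..j-1].
Matrix buildPrefix2D(const Matrix& v) {
    std::size_t r = v.size(), c = r ? v[0].size() : 0;
    Matrix s(r + 1, std::vector<long long>(c + 1, 0));
    for (std::size_t i = 1; i <= r; i++)
        for (std::size_t j = 1; j <= c; j++)
            s[i][j] = v[i-1][j-1] + s[i-1][j] + s[i][j-1] - s[i-1][j-1];
    return s;
}

// O(1) query on the padded array, using the four-term expression.
long long fastSum(const Matrix& s, std::size_t r1, std::size_t c1,
                  std::size_t r2, std::size_t c2) {
    assert(r1 <= r2 && c1 <= c2 && r2 + 1 < s.size() && c2 + 1 < s[0].size());
    return s[r2+1][c2+1] - s[r1][c2+1] - s[r2+1][c1] + s[r1][c1];
}

// Naive query on the original matrix: add up every cell in the rectangle.
long long slowSum(const Matrix& v, std::size_t r1, std::size_t c1,
                  std::size_t r2, std::size_t c2) {
    long long total = 0;
    for (std::size_t i = r1; i <= r2; i++)
        for (std::size_t j = c1; j <= c2; j++)
            total += v[i][j];
    return total;
}

// Returns true if both methods agree on every rectangle of v.
bool agreeEverywhere(const Matrix& v) {
    Matrix s = buildPrefix2D(v);
    std::size_t r = v.size(), c = r ? v[0].size() : 0;
    for (std::size_t r1 = 0; r1 < r; r1++)
        for (std::size_t r2 = r1; r2 < r; r2++)
            for (std::size_t c1 = 0; c1 < c; c1++)
                for (std::size_t c2 = c1; c2 < c; c2++)
                    if (fastSum(s, r1, c1, r2, c2) != slowSum(v, r1, c1, r2, c2))
                        return false;
    return true;
}
```

On the 3x3 matrix above, both fastSum and slowSum give 39 for rows 1..2 and columns 0..2, and agreeEverywhere returns true.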

As a remark, with some more caution you can do this without any extra space: just store everything in the original array. However, you will then need to handle the first row and the first column specially, since you won’t have the zeros padding your matrix to keep the i-1 and j-1 lookups from reading out of bounds (and likely causing a segmentation fault).
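
Here is a rough sketch of what that in-place variant could look like. Note that the indices are no longer shifted, so m[i][j] ends up holding the sum of the original m[0..i][0..j], and every i-1 or j-1 lookup is guarded. The names prefixInPlace and regionSum are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Overwrite m so that m[i][j] holds the sum of the original m[0..i][0..j]
// (inclusive, no padding). Cells are visited in row-major order, so the
// neighbours above and to the left already hold prefix sums when we read them.
void prefixInPlace(std::vector<std::vector<long long>>& m) {
    for (std::size_t i = 0; i < m.size(); i++)
        for (std::size_t j = 0; j < m[i].size(); j++) {
            if (i > 0) m[i][j] += m[i-1][j];
            if (j > 0) m[i][j] += m[i][j-1];
            if (i > 0 && j > 0) m[i][j] -= m[i-1][j-1];
        }
}

// Sum of the original rectangle [r1..r2] x [c1..c2], read from the in-place
// prefix sums p. Each term that would need index -1 is simply skipped.
long long regionSum(const std::vector<std::vector<long long>>& p,
                    std::size_t r1, std::size_t c1,
                    std::size_t r2, std::size_t c2) {
    assert(r1 <= r2 && c1 <= c2 && r2 < p.size() && c2 < p[r2].size());
    long long total = p[r2][c2];
    if (r1 > 0) total -= p[r1-1][c2];
    if (c1 > 0) total -= p[r2][c1-1];
    if (r1 > 0 && c1 > 0) total += p[r1-1][c1-1];
    return total;
}
```

The price for saving the extra array is the pile of if statements, which is exactly the kind of special-casing the zero padding lets you avoid.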

The idea of padding a matrix with dummy items is also useful in 2D BFS sometimes, but we can talk about that later.

With some adaptation, this is the code that gets accepted:

class NumMatrix {
private:
    vector<vector<int> > s;
public:
    NumMatrix(vector<vector<int> > &matrix) {
        int r = matrix.size();
        if (r == 0)
            return;
        int c = matrix[0].size();
        s = vector<vector<int> >(r+1, vector<int>(c+1));
        for (int i = 0; i < r; i++)
            for (int j = 0; j < c; j++)
                s[i+1][j+1] = matrix[i][j] + s[i][j+1] + s[i+1][j] - s[i][j];
    }
 
    int sumRegion(int row1, int col1, int row2, int col2) {
        return s[row2+1][col2+1]-s[row1][col2+1]-s[row2+1][col1]+s[row1][col1];
    }
};

Few problems let you apply this approach so directly, but it is often useful as a subroutine in other problems to reduce the run time complexity; Count of Range Sum is one example. This is because querying the sum naively is linear in the number of items in the summation, but with this precomputed array each query takes constant time.
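
As a small illustration of the subroutine idea, here is a sketch that counts the ranges whose sum falls within [lower, upper], which is my reading of what Count of Range Sum asks for. To be clear, this is not an efficient solution to that problem: it still tries every pair of endpoints, so it is O(n^2) overall. The point is only that each range sum costs O(1) instead of O(n). The name countRangeSums is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count pairs (i, j) with i <= j such that lower <= sum of v[i..j] <= upper.
// Every range sum is looked up in O(1) from the prefix sum array s.
long long countRangeSums(const std::vector<int>& v, long long lower, long long upper) {
    assert(lower <= upper);
    std::vector<long long> s{0};
    for (int x : v)
        s.push_back(s.back() + x);
    long long count = 0;
    for (std::size_t i = 0; i < v.size(); i++)
        for (std::size_t j = i; j < v.size(); j++) {
            long long sum = s[j + 1] - s[i];
            if (lower <= sum && sum <= upper)
                count++;
        }
    return count;
}
```

For v = {-2, 5, -1} with lower = -2 and upper = 2, the qualifying ranges are {-2}, {-1} and {-2, 5, -1}, so the result is 3.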

That’s basically it; you can call this a data structure with O(n) update and O(1) query, where n is the number of entries, update means changing an entry in the original array or matrix (the prefix sums that include it have to be recomputed), and query means calculating the sum over a range or rectangle. O(1) update and O(n) query is trivial: just use the original array, and run for loops to calculate the sum every time you want it. But if you want something in between, there’s actually a magical thing called the binary indexed tree, which is quite advanced and which I will cover in a (much) later post. It achieves O(log(n)) update and query, in somewhere around 10 lines of code.
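
For the curious, here is a minimal 1D sketch of a binary indexed tree (also known as a Fenwick tree) in its commonly seen form, without the explanation it deserves. The class name and method names are my own choices for illustration, and the asserts are just index sanity checks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class FenwickTree {
private:
    std::vector<long long> tree;  // 1-based internally; tree[0] is unused

    // Sum of the first i elements (0-based indices 0..i-1). O(log n).
    long long prefix(std::size_t i) const {
        assert(i < tree.size());
        long long total = 0;
        // k & (~k + 1) isolates the lowest set bit of k.
        for (std::size_t k = i; k > 0; k -= k & (~k + 1))
            total += tree[k];
        return total;
    }

public:
    // Tree over n elements, all initially zero.
    explicit FenwickTree(std::size_t n) : tree(n + 1, 0) {}

    // Add delta to the element at 0-based index i. O(log n).
    void add(std::size_t i, long long delta) {
        assert(i + 1 < tree.size());
        for (std::size_t k = i + 1; k < tree.size(); k += k & (~k + 1))
            tree[k] += delta;
    }

    // Sum of elements l..r, both ends inclusive. O(log n).
    long long rangeSum(std::size_t l, std::size_t r) const {
        assert(l <= r);
        return prefix(r + 1) - prefix(l);
    }
};
```

Loading 1, 2, 3, 4, 5 with add and then calling rangeSum(2, 4) gives 12, the same answer as before, and a later add(2, 10) changes it to 22 without rebuilding anything.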