TIW: Monotonic Stack

I actually made this name up; I don’t know what it is called. This is how the Chinese call it apparently, 單調棧. It sometimes comes handy when you need O(n) performance in a problem that seemingly needs more than that and which requires finding a number.

Basic facts about monotonic stacks:

  1. It is a kind of precomputation, just like precomputing the prefix sum array.
  2. It is O(n), so it almost never hurts your runtime, just like prefix sum array.
  3. As the name suggests, it is a method that uses a stack (implemented using a vector) that has all elements sorted.
  4. It is used to find the next (or previous) element in the array that is larger (or smaller) than the current element.

Let’s look at the code:

vector<int> lastBigger(vector<int> v) {
    vector<int> ans(v.size()), mono;
    for (int i = 0; i < v.size(); i++) {
        while (!mono.empty() && v[mono.back()] <= v[i])
        ans[i] = mono.empty() ? -1 : mono.back();
    return ans;

This piece of code finds the index of the closest previous element that is larger than the current one. For example, for an input vector {1, 5, 4, 2, 3}, it will return {-1, -1, 1, 2, 2}. The index is stored instead of the actual element, because it is more informative and we could look up the actual values from the indices.

The algorithm is based on the idea that to find the last bigger value, we do not need to look at all the previous values. If we go farther to the left, it is because we want a larger value. For example, on the above array, at the last element, we only need to look at the values {5, 4, 2} but not the value 1, because taking 5 is strictly better than taking 1. And after the element 3, 2 will be useless to all future numbers, if there is any. Therefore we keep popping the stack until we find something that is potentially useful for later.

To find a smaller instead of bigger value, simply change <= to >=. To find the next instead of the previous, just start looping from the end to the front.

I have seen this technique come up from time to time from seemingly unrelated problems. I will go through 3 problems.

Sliding Window Maximum

The first problem: given an integer array v of size n and an integer k, return an integer array w of size n-k+1 where w[i] = max(v[i], v[i+1], …, v[i+k-1]).

The most naive thing to do is to run a loop of k times for each element in w, taking the maximum for each value. An obvious observation is that each time we are only moving the window by one index, so the maximum often remains unchanged. So a slight speedup would be to check if the element we are discarding by moving the window is the previous maximum, if not, then we can take the max of the previous maximum and the current value to update the current maximum.

But we are still at worst case O(nk), because if we always discard the maximum, then we need to update the maximum again by looping. Here’s how monotonic stack will compress the runtime to O(n). For each element in w:

  1. Maintain a variable, idx, that stores the index of the maximum element in the previous window.
  2. Move the window to the right by one.
  3. If idx stays in the window, and the new value is larger than the one pointed by idx, update idx to the new index.
  4. If idx stays in the window, and the new value is not larger, then idx remains unchanged.
  5. If idx falls out of the window, first move idx to the right by one. Now it might not be the largest anymore, so we look for the next bigger element. This is O(1), as precomputed by monotonic stack. While the next bigger element is in the window, update idx to point to that element.

The runtime of this algorithm is O(n) because idx and the window only move to the right, and we never go backwards.

vector<int> maxSlidingWindow(vector<int>& nums, int k) {
    if (nums.empty())
        return {};
    vector<int> nextBigger(nums.size(), INT_MAX), mono;
    for (int i = nums.size()-1; i >= 0; i--) {
        while (!mono.empty() && nums[mono.back()] <= nums[i])
        if (!mono.empty())
            nextBigger[i] = mono.back();
    int pt = 0;
    vector<int> ans;
    for (int i = k-1; i < nums.size(); i++) {
        if (pt == i-k)
        while (nextBigger[pt] <= i)
            pt = nextBigger[pt];
    return ans;

There is a 2D version of this problem: given a 2D array v of size n by n and an integer k, return a 2D array w of size n-k+1 by n-k+1 where w[i][j] = max(v[i][j], v[i][j+1], …, v[i][j+k-1], v[i+1][j], …, v[i+k-1][j+k-1]). It is in fact very easy, just run this algorithm twice, first horizontally then vertically. It will be O(n^2), which is optimal as input itself would be O(n^2).

Largest Rectangle in Histogram

This one, although categorized as hard, is fairly straightforward: enumerate elements and use each one as the height, then look for the previous one and the next one that are smaller. Therefore we need to apply monotonic stack twice.

int largestRectangleArea(vector<int>& heights) {
    if (heights.empty())
        return 0;
    vector<int> lastSmaller(heights.size()), nextSmaller(heights.size()), mono;
    for (int i = 0; i < heights.size(); i++) {
        while (!mono.empty() && heights[mono.back()] >= heights[i])
        lastSmaller[i] = mono.empty() ? -1 : mono.back();
    for (int i = heights.size()-1; i >= 0; i--) {
        while (!mono.empty() && heights[mono.back()] >= heights[i])
        nextSmaller[i] = mono.empty() ? heights.size() : mono.back();
    int ans = 0;
    for (int i = 0; i < heights.size(); i++)
        ans = max(ans, heights[i]*(nextSmaller[i]-lastSmaller[i]-1));
    return ans;

On an irrelevant note, some people like to condense code that should be multiple lines into one line, like squeezing return 0, i++ etc into the previous line. There is no way to be consistent about this, because sometimes the lines get too long and they need to split into two lines. Inconsistent coding style is a sin. It leads to bugs and confusion. Code should be made as short as possible, but no shorter.

132 Pattern

The last problem is fairly new. For these problems, usually we need to enumerate something. In this case, we enumerate the “2”. The algorithm is simple: for each “2”, find “3” by greedy algorithm, and see whether we can find a “1”. The greedy way for “3” is to find the number larger than “2” that is closest to the left of it, which is essentially the last bigger element. Checking whether we can find a “1” is simply finding the range minimum from the left up to “3” and checking whether it is smaller than “2”.

bool find132pattern(vector<int>& nums) {
    if (nums.size() < 3)
        return false;
    vector<int> mins{nums[0]};
    for (int i = 1; i < nums.size(); i++)
        mins.push_back(min(nums[i], mins.back()));
    vector<int> mono, lastbigger(nums.size());
    for (int i = 0; i < nums.size(); i++) {
        while (!mono.empty() && nums[mono.back()] <= nums[i])
        lastbigger[i] = mono.empty() ? -1 : mono.back();
    for (int i = nums.size()-1; i > 1; i--)
        if (lastbigger[i] > 0 && mins[lastbigger[i]] < nums[i])
            return true;
    return false;

Now that you have seen the above, you should be able to solve Maximal Rectangle with minimum effort.


Breadth First Search and Depth First Search are both essential skills to passing medium problems. If you’ve been following along, I’ve already written BFS once in STL#2. Depending on the problem, there are usually two ways I would go about writing BFS, and also two ways for DFS. But before I go on with solving problems, let me point out a few basic facts:

  1. BFS is good for finding shortest paths on a graph with no edge weight, or just traversing the whole graph in general.
  2. DFS is good for traversing the whole graph too, especially trees.
  3. DFS could be have time complexity issues depending on the graph you are walking, in which case it might be replaceable by BFS/Dijkstra.
  4. DFS with recursion uses the program stack and might overflow much easier than dynamically allocating memory (using vector, for example).
  5. There are probably many more use cases of BFS and DFS than you think, on things you would not think of as a graph.

I have a feeling this is going to be long. So here are the key points I will demonstrate:

  1. BFS on 2D array
  2. BFS layer by layer
  3. DFS on tree, iterative and recursive
  4. DFS as brute force search

BFS on 2D array

Let’s start with a standard problem for BFS: Number of Islands. Given a 2D matrix with ‘1’s and ‘0’s, count the connecting pieces of ‘1’s. The idea: go through each point, if it is land and we haven’t seen it, walk the whole island and mark each land as ‘0’ we see.

bool inboard(int x, int y, int n, int m) { // whether a point (x, y) is in the board with size n times m
    return x >= 0 && x < n && y >= 0 && y < m;

int numIslands(vector<vector<char>>& grid) {
    if (grid.empty())
        return 0;
    int m = grid.size(), n = grid[0].size();
    int ans = 0;
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            if (grid[i][j] == '1') {
                queue<pair<int, int> > bfs;
                bfs.push(make_pair(i, j));
                grid[i][j] = '0';
                while (!bfs.empty()) {
                    pair<int, int> pos = bfs.front();
                    for (auto& v : vector<vector<int> >{{1, 0}, {0, 1}, {-1, 0}, {0, -1}}) {
                        int next_i = pos.first+v[0], next_j = pos.second+v[1];
                        if (inboard(next_i, next_j, m, n) && grid[next_i][next_j] == '1') {
                            grid[next_i][next_j] = '0';
                            bfs.push(make_pair(next_i, next_j));
    return ans;

The inboard() function is often useful in 2D array graph problems. Notice the line:

for (auto& v : vector<vector<int> >{{1, 0}, {0, 1}, {-1, 0}, {0, -1}})

This for loop will give you four directions along which to go, and you just need to add v[0] to your current position’s row index and v[1] to column index to get to the next position. Stop copy pasting code 4 times, please.

A queue<> has 3 functions we need: pop, push and front. If you need a queue that you can access from both sides, you should consider using deque<>, which has [push/pop]_[front/back], and also front() and back(). Everything is of course O(1), so you don’t have to worry.

It is important to note that in the above problem, there is no need to walk the graph in any particular order as far as you get the job done, so I did not mark the distance to each point. A DFS approach would also have worked and might even be preferred. In a DFS recursion approach, the extra memory will be allocated on the program stack instead of the heap, which is quite advantageous in terms of efficiency.

BFS layer by layer

Sometimes doing things layer by layer will make life easier. See Maximum Depth of Binary Tree:

int maxDepth(TreeNode* root) {
    if (!root)
        return 0;
    vector<TreeNode*> bfs{root};
    int ans;
    for (ans = 0; !bfs.empty(); ans++) {
        vector<TreeNode*> next;
        for (TreeNode* node : bfs)
            for (TreeNode* c : {node->left, node->right})
                if (c)
        bfs = move(next);
    return ans;

This is essentially the same approach I used for the problem in STL#2. DFS would also work for this problem, and might perform better in terms of space for some test cases, since space for a recursive approach would be O(height) instead of O(width). Anyway I still included this problem here for the sake of completeness. Noteworthy: (1) I made a for loop since I want to get the height, otherwise we can write a while (!bfs.empty()) loop; (2) there is no need to keep track of the visited nodes because we know for sure we will not visit the same node twice.

DFS on tree, iterative and recursive

While BFS is always written using a queue/vector, DFS uses a stack, so we could use either vector or recursion to accomplish the same goal.

I will redo the Maximum Depth problem twice to illustrate the idea:

// Iterative DFS
int maxDepth(TreeNode* root) {
    if (!root)
        return 0;
    vector<pair<TreeNode*, int> > dfs{make_pair(root, 1)};
    int ans = 0;
    while (!dfs.empty()) {
        pair<TreeNode*, int> back = dfs.back();
        ans = max(ans, back.second);
        for (TreeNode* c : {back.first->left, back.first->right})
            if (c)
                dfs.push_back(make_pair(c, back.second+1));
    return ans;
// Recursive DFS
int maxDepth(TreeNode* root) {
    return root ? 1+max(maxDepth(root->left), maxDepth(root->right)) : 0;

As you can see, recursion on trees is black magic. I mean, what’s shorter than a one liner? Whenever applicable, it often shortens the code by a great margin, and present the logic in a very clear and concise way.

DFS with recursion is great for tree problems that are self-similar. For example, the height of this tree is the height of the left subtree + 1, or the right subtree + 1, whichever is larger.

  1. Use recursion if you are going to DFS. Except for some Leetcode problems that test on memory limits, I have not found any reason why iterative DFS is superior. Recursion runs on stack while vector dynamically allocate memory in the heap, so in that particular case vector has less limitations on the space constraint.
  2. In recursion, first handle all base cases, then implement the recurrence relation (well… what else are you supposed to do anyways?)
  3. The essential problem here is to determine in order to solve the original problem, what information do you need for a sub-problem. Sometimes you need extra information from your parent or children to solve a task.

To illustrate the third point above, I’m going to do a slightly more complicated one (not a one-liner): given a TreeNode*, determine whether the tree is a binary search tree. This is a medium problem on LeetCode.

pair<int, int> helper(TreeNode* root, bool& ans) {
    int l = root->val, r = root->val;
    if (root->left) {
        pair<int, int> left = helper(root->left, ans);
        if (left.second >= root->val)
            ans = false;
        l = left.first;
    if (root->right) {
        pair<int, int> right = helper(root->right, ans);
        if (right.first <= root->val)
            ans = false;
        r = right.second;
    return make_pair(l, r);
bool isValidBST(TreeNode* root) {
    if (!root)
        return true;
    bool ans = true;
    helper(root, ans);
    return ans;

The recurrence relation here is that the largest element in the left subtree must be smaller than the current node and the smallest element in the right subtree must be larger than the current node.

DFS as brute force search

Sometimes some problems require brute force enumeration of all possibilities. Then, recursive DFS would be a good approach. Let’s look at Decode String: given a string in the recursive format: #[x] where # is a number and x is an string, print x multiple times as denoted by #. For example “2[abc]3[cd]ef” becomes “abcabccdcdcdef”.

There is not much to optimize in this problem, since the runtime is lower bounded by the length of the output. The algorithm is basically running each recursive level of helper() as the same bracket level of the original string, passing by reference the current index in the string as we go along.

string helper(string& s, int& i) {
    string ans;
    while (i < s.length() && s[i] != ']') {
        if (s[i] >= 'a' && s[i] <= 'z') {
            ans += s[i];
        } else {
            int k = 0;
            for (; s[i] != '['; i++)
                k = k*10+s[i]-'0';
            string ret = helper(s, i);
            for (int j = 0; j < k; j++)
                ans += ret;
    return ans;
string decodeString(string s) {
    int i = 0;
    return helper(s, i);

There are many applications of DFS brute force, but they could be very different from each other and hard for me to generalize. So this is only a short description of what you can do with DFS.


Use BFS or recursive DFS for 2D board traversal; use vectors or queues for BFS; use recursion for trees almost always; use recursive DFS to brute force.

EE354 Project: Minesweeper in Verilog

Finished a project a few days ago for EE354 at USC: Introduction to Digital Circuits. The final project for the class requires us to write a program in verilog to implement whatever we want, as far as it’s not too trivial. So I took a few days and finished this minesweeper game. The source code, with the compiled programming file, can be downloaded here. It’s not too difficult; in case you are going to work on one as well, here are a few remarks.

Hardware/Software Specifications

This project is created for the Nexys 3 board using Xilinx ISE 14.7, and programming is done using the software Diligent Adept. If you want to try it out, you can program the bit file: ms_top.bit onto your board. To play, connect the board to a VGA monitor. Control is done with the 5 buttons on the board and a switch, and obviously you look at the monitor to play. Additional debugging output on the LEDs.

How To Play

Up, down, left, right buttons to move your current selected grid, and press the center button to click on the grid. There is no support for flagging a mine, because there are only 5 buttons and I don’t wanna complicate things further. Keep the switch 0 turned off. On the left is a game board with 16 x 16 grids and 40 hidden mines, on the right you can see a smiley face to indicate the status of the game: playing, lost or won, just like the game on a Windows machine. There is also a timer that counts in seconds.


Compiling the project takes about ten minutes each time, so during that time I usually play the old version “to find bugs”.

To display the digits, I created a 30 x 30 bitmap for each digit, as well as the explosive. That takes a while.

The smiley face, however, is done by equations (or inequalities) and not bitmaps, since the bitmaps would be huge otherwise.

The VGA display code is modified from a code template. The timing generator should be the same across all Verilog VGA projects for this class of FPGA boards.

Once I encountered an error saying “your design does not fit in this device”, so I pulled a combinational block of code out of the display module and made a separate small module for it, and instantiated it in the display module. That worked, I have no idea why.

Randomization is done by first creating a pseudorandom sequence. Here I did a x1 = a * x0 + b approach to generate this sequence. Then each time I initialize 40 mines on the top left corner, and randomly swap the states of 2 grids 1000 times. That gives a satisfactory result.

Last but not least, the effect that opens up an entire connected empty area is done naively. In a normal CS approach, I would have created a BFS or DFS walk to flood the neighbors, but such a thing is more complicated in an EE approach, since instead of data structures like vectors or recursive functions, we only have wires, registers and modules. The way I did it was that during each clock, I would go through all the grids, and check whether the grid should be cleared based on its neighbors. The trick here is that this looping over all grids is done in parallel, so doing it once takes the same amount of time as doing 256 times.

Some other groups in the past have done some pretty awesome work, like one group did a maze that presented the walls and the ground in 3D perspectives, and another did a 2 player bomberman. If you try hard enough, this final project could be a lot of fun.