TIW: STL #3
I will dedicate this post to two functions in particular: upper_bound and lower_bound. They make the impossible become possible. Basic facts about them:
- they act on sorted data structures (sorted vector, set, map, etc);
- they perform binary search in O(log(n)) time;
- they return iterators so you can move forward and backward.
I will then go over 2 problems to demonstrate the power of these functions.
Let’s say you have 5 integers in a data structure, {1, 3, 5, 7, 9}, either in a vector or a set. Say you would like to find the smallest number that is larger than k, then you simply do:
vector<int> v{1, 3, 5, 7, 9};
set<int> s(v.begin(), v.end());
int k = 6;
// Binary search in a vector
auto it = upper_bound(v.begin(), v.end(), k);
if (it != v.end())
cout << *it << endl;
// Binary search in a set
it = s.upper_bound(k);
if (it != s.end())
cout << *it << endl;
Lower bound is not the opposite of upper bound, surprisingly. It finds the smallest number that is larger than or equal to k, allowing the equal case (which is excluded in upper_bound).
The functions are simple, but let’s dive into practice. There are quite a few problems on Leetcode for which I used upper_bound, but many are quite hard (see this one: Perfect Rectangle). This series is supposed to be easy and about syntax rather than algorithm, so I decided to make up one problem.
Problem statement
Given a vector of integers v and an integer k, find the smallest sum from all pairs of elements that is larger than k. For example for v = {6, 0, 5, 1, 9} and k = 12, possible sums would be {1, 5, 6, 7, 9, 10, 11, 14, 15}, and the smallest that is larger than 12 would be 14.
One way to do it is to sort it and use 2 pointers, which is O(nlog(n)). I will talk about that approach in a later post. Here, a way is to use set. The algorithm essentially picks an element x, then looks at all the numbers in front of it, and pick the smallest one y that is larger than k-x, since then we know x+y > k.
int closestTwoSum(vector<int>& v, int k) {
set<int> s;
int ans = INT_MAX;
for (auto x : v) {
auto y = s.upper_bound(k-x);
if (y != s.end())
ans = min(ans, x+*y);
s.insert(x);
}
return ans;
}
To get the largest sum that is smaller than k, there are two things you can do: negate all numbers and apply the same code (essentially reversing order of integers), or move the iterator given by upper_bound. Just to demonstrate, the second approach looks like this:
int closestTwoSum(vector<int> v, int k) {
set<int> s;
int ans = INT_MIN;
for (auto x : v) {
auto y = s.lower_bound(k-x); // *y larger than or equal to k-x
if (y != s.begin()) { // if *y is not the smallest number we have
y--; // go back to the smaller one
ans = max(ans, x+*y);
}
s.insert(x);
}
return ans;
}
That’s a teaser problem. Now let’s get to the real business, longest increasing subsequence. The problem statement is:
Given an array of integers, find the longest increasing subsequence and return its length. For example in {1, 4, 5, 7, 2, 9, 3}, the longest increasing subsequence is {1, 4, 5, 7, 9}, so we return 5.
It’s also on Leetcode: Longest Increasing Subsequence
This is a well known dynamic programming problem, depending on how you design your dp algorithm, you can get either O(n^2) or O(nlog(n)) solutions. The O(nlog(n)) algorithm is well known, so I will just briefly describe it.
Our basic idea is to keep appending elements to increasing subsequences. Key observation: if using the first x numbers, you can create two increasing subsequences of length 5: one ending with number 7 and another ending with number 9, then we can discard the latter. This is because if we could append a later element k to 9, we can definitely append k to 5 to make a sequence of length 6. Therefore in our dp array, for each dp[i], we store the smallest element that an increasing subsequence of length i can end with. With a new element k, we search for the largest dp[i] that is smaller than k, then assign k to dp[i+1] in case k is smaller than dp[i+1]. Then the length of the longest increasing subsequence would be the largest i such that dp[i] exists.
Since this post is not meant to teach you introductions to algorithms, you should google it if you don’t understand. I’m just trying to how to implement it without making a mess. In the algorithm, there’s a part of binary search, so that could be replaced by a vector upper_bound. It turns out that instead of appending to the subsequences, it’s easier to work in reverse and prepend to them. If we are searching for a larger element, that means we are finding an increasing subsequence that starts with a number larger than the current one, and we are trying to prepend to it. Let’s look at the code and analyze it:
int lengthOfLIS(vector<int>& nums) {
vector<int> s;
for (int i = nums.size()-1; i >= 0; i--) {
auto it = upper_bound(s.rbegin(), s.rend(), nums[i]);
if (it == s.rbegin())
s.push_back(nums[i]);
else if (*(--it) < nums[i])
*it = nums[i];
}
return s.size();
}
Here, s[i] denotes the largest number that a subsequence with length i can start with. As the length increases, intuitively the starting number has to be smaller, so this vector is strictly decreasing. To use upper_bound to binary search for the slightly larger element to update, we need to pass in the reverse iterators rbegin() and rend(). If the returned iterator is the last element in the vector, that means we have found a new longest increasing subsequence. Otherwise, it means we have found a better increasing subsequence of the same length with a larger (better) starting value.
Final remark on longest increasing subsequence: to get the longest nondecreasing subsequence, we can preprocess the array to use almost the exact same code above. Here’s how:
int lengthOfLNDS(vector<int> nums) {
vector<pair<int, int> > v;
for (int i = 0; i < nums.size(); i++)
v.push_back(make_pair(nums[i], i));
vector<pair<int, int> > s;
for (int i = v.size()-1; i >= 0; i--) {
auto it = upper_bound(s.rbegin(), s.rend(), v[i]);
if (it == s.rbegin())
s.push_back(v[i]);
else if (*(--it) < v[i])
*it = v[i];
}
return s.size();
}
Pretty simple, I just used a pair<int, int> to replace int. The first item is the original data, the second item is the array index, used to tie break equal values.