PROBLEM LINK:
Author: Sergey Kulik
Testers: Vasya Antoniuk and Kevin Atienza
Translators: Vasya Antoniuk (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza
DIFFICULTY:
Medium
PREREQUISITES:
Activity selection, greedy, binary search, jump pointers
PROBLEM:
There are N courses in a university. The i-th course starts on day S_i and ends on day E_i.
Chef plans to enroll, but he isn't sure about the exact time yet. Instead, he has Q plans, where the i-th plan starts on day PS_i and ends on day ES_i.
For each plan, what is the maximum number of courses Chef can attend? Chef cannot attend more than one course at the same time.
QUICK EXPLANATION:
A single query is an instance of the well-known activity selection problem, whose answer can be obtained by a standard greedy algorithm.
Now, sort the courses by increasing E_i, breaking ties by decreasing S_i. After this, E_i is non-decreasing.
Then, remove courses which contain other courses. This can be done in O(N) with a single pass: remove the i-th course if its start time is not greater than the start time of the last course kept so far. After this, both S_i and E_i are strictly increasing.
For each course, compute its parent course, which we define as the first course whose start time is greater than the end time of the course. If no such course exists, we set its parent to a sentinel course with start time \infty. The parent of the sentinel course is itself. This forms a rooted tree.
Jump pointers: For each course, compute its 2^j-th ancestor for every 2^j \le N.
Finally, to answer a query (PS_i, ES_i):
- Compute the first course with start time \ge PS_i using binary search (possible because S_i is increasing).
- If even this course ends after ES_i, the answer is 0. Otherwise, use the jump pointers to walk up the tree as far as possible while the end time stays \le ES_i.
- The answer is the number of courses visited during this walk, i.e., one plus the number of parent steps taken.
EXPLANATION:
A single query is an instance of the well-known activity selection problem, where the answer can be obtained by the following greedy algorithm:
- Sort the courses by end time.
- For each course in that order, take that course if it doesn't conflict with the last course taken.
An explanation of why this works can be found here.
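To make this concrete, here is a minimal C++ sketch of this greedy for a single plan [PS, PE]; the function name solveSingle and the pair-of-ints representation are just illustrative, and this and the later sketches assume the usual #include <bits/stdc++.h> and using namespace std;.

#include <bits/stdc++.h>
using namespace std;

// Greedy activity selection restricted to a single plan [PS, PE]:
// go through the courses in order of increasing end time and take a
// course whenever it lies inside the plan and starts after the
// previously taken course ends.
int solveSingle(vector<pair<int,int>> courses, int PS, int PE) {
    sort(courses.begin(), courses.end(),
         [](const pair<int,int>& a, const pair<int,int>& b) {
             return a.second < b.second;   // sort by end time
         });
    int taken = 0;
    int lastEnd = INT_MIN;                 // end day of the last course taken
    for (const auto& c : courses) {
        int s = c.first, e = c.second;
        if (s < PS || e > PE) continue;    // course not inside the plan
        if (s > lastEnd) {                 // no overlap with the last taken course
            taken++;
            lastEnd = e;
        }
    }
    return taken;
}

Running this independently for every plan is what the rest of the editorial speeds up.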
The problem is an extension of that, where we answer multiple queries. Each query only considers activities in a particular interval [PS_i,ES_i].
Based on the greedy algorithm, a good preprocessing step would be to sort the courses by end time beforehand. For now, we break ties arbitrarily.
Unfortunately, even after sorting, we can’t simply run the greedy algorithm for every query as is, because it’s too slow! Specifically, there are two parts of it that make it slow:
- Finding the first activity whose start time is \ge PS_i.
- Finding all activities after that, greedily, but only including activities with end time \le ES_i.
We can probably simplify the problem by removing some unneeded courses. Consider, for example, a course whose start and end times are [6, 22], and another one with [11, 16]. These two courses overlap, so we can't take both of them. But looking closer, the first one completely contains the second, which means that if we were to pick one of these courses, it's better to pick the second! If you pick the first one, you can always replace it with the second and end up with more free time.
This suggests removing courses which completely contain other courses. To do this efficiently, we first modify the tie-breaking method of our initial sort. If the end times are the same, we break ties by sorting them in decreasing order of start time. This way, if we have a series of consecutive courses with the same end time, the first one is always the shortest, so the others can be removed! The whole procedure can be done in O(N) time (after sorting) in a single pass, as illustrated by the following pseudocode:
sort the courses by increasing E_i, then by decreasing S_i
rel_courses = [] // relevant courses
add the first course (s,e) to rel_courses
for each subsequent course (s,e):
    let (S,E) be the last course in rel_courses
    if S < s and E < e: // (s,e) doesn't contain (S,E)
        add (s,e) to rel_courses
After this, we can ignore the original list of courses and work only on rel_courses!
And this pseudocode reveals another property of the remaining courses: their S_i values are also increasing! This means that, for a given query, the first part (finding the first course whose start time is \ge PS_i) can be done with a simple binary search!
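Here is a sketch of this preprocessing in C++, continuing from the earlier snippet; rel plays the role of rel_courses, and firstCourse is an illustrative helper performing the binary search just described:

// Sort by (end ascending, start descending) and keep only courses that do
// not contain the previously kept course.  Afterwards both the start and
// the end times in rel are strictly increasing.
vector<pair<int,int>> buildRelevant(vector<pair<int,int>> courses) {
    sort(courses.begin(), courses.end(),
         [](const pair<int,int>& a, const pair<int,int>& b) {
             if (a.second != b.second) return a.second < b.second; // E ascending
             return a.first > b.first;                             // S descending
         });
    vector<pair<int,int>> rel;
    for (const auto& c : courses) {
        // keep (s, e) only if it does not contain the last kept course (S, E),
        // i.e. only if S < s and E < e
        if (rel.empty() || (rel.back().first < c.first && rel.back().second < c.second))
            rel.push_back(c);
    }
    return rel;
}

// Index of the first relevant course whose start time is >= PS.
// Returns rel.size() when no such course exists.
int firstCourse(const vector<pair<int,int>>& rel, int PS) {
    int lo = 0, hi = (int)rel.size();
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (rel[mid].first >= PS) hi = mid;
        else lo = mid + 1;
    }
    return lo;
}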
Now all that remains is to actually compute the answer. Even after finding the first course, running the greedy algorithm above is still slow: it takes O(N) time in the worst case.
We can somewhat improve this by introducing the concept of a successor course. For each course, define its successor course to be the first course whose start time is greater than the end time of the course. To ensure such a course always exists, we add a dummy course whose start and end times are really large, say \infty (the end time must also be huge so that the dummy is never counted as attendable).
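A sketch of how the successors could be computed, building on the rel array from the previous sketch; index n stands for the dummy course:

// For each relevant course i, succ[i] is the first course whose start time
// is strictly greater than rel[i]'s end time.  Index n (one past the last
// course) stands for the dummy course, which is its own successor.
vector<int> buildSuccessors(const vector<pair<int,int>>& rel) {
    int n = (int)rel.size();
    vector<int> succ(n + 1, n);
    for (int i = 0; i < n; i++) {
        // start times are increasing, so binary search for the first
        // index j > i with rel[j].first > rel[i].second
        int lo = i + 1, hi = n;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (rel[mid].first > rel[i].second) hi = mid;
            else lo = mid + 1;
        }
        succ[i] = lo;   // may equal n, i.e. the dummy course
    }
    succ[n] = n;        // the dummy course points to itself
    return succ;
}

A two-pointer sweep would also compute succ in O(N), but the binary search keeps the sketch short.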
The idea of successor courses is useful because the greedy algorithm is basically equivalent to selecting successor courses! The following pseudocode illustrates this:
def answer_query(PS, PE):
    find the index i of the first course with S_i >= PS, with binary search
    ans = 0
    while E_i <= PE:
        ans++
        i = successor[i]
    return ans
This is basically the same as the greedy algorithm, but potentially faster, because we skip all the activities that would not have been selected anyway. Unfortunately, the worst-case running time is still O(N), because there can be up to N non-overlapping courses.
But looking closer, the idea of successor courses actually organizes our courses into a tree! If we consider the successor of a course as its "parent", then the courses form a tree rooted at our dummy course.
With this interpretation, a query reduces to finding the ancestor of a node that is farthest from it (closest to the root) while still satisfying a certain condition: E_i \le PE. Since E_i only increases as we walk towards the root, this can be done in O(\log N) using a form of binary search built on jump pointers!
In more detail, let's define \operatorname{jump}(i,j) as the 2^j-th ancestor of node i. These jump pointers can be computed recursively using the recurrence

\operatorname{jump}(i,j) = \operatorname{jump}(\operatorname{jump}(i,j-1), j-1).
For the base case, we set \operatorname{jump}(i,0) to be the parent of node i. We also define the parent of the root as the root itself. We compute jump pointers for every j such that 2^j \le N, so the size of the \operatorname{jump} table is \Theta(N \log N).
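A sketch of building the jump table from the succ array of the previous sketch; jump[j][i] stores the 2^j-th ancestor of node i:

// jump[j][i] = 2^j-th ancestor of node i in the successor tree.
// The dummy course (last index) is the root and its own parent, so all of
// its ancestors are itself.
vector<vector<int>> buildJumps(const vector<int>& succ) {
    int n = (int)succ.size();          // number of nodes, including the dummy
    int LOG = 1;
    while ((1 << LOG) < n) LOG++;      // enough levels so that 2^LOG >= n
    vector<vector<int>> jump(LOG + 1, vector<int>(n));
    jump[0] = succ;                    // base case: the 2^0-th ancestor is the parent
    for (int j = 1; j <= LOG; j++)
        for (int i = 0; i < n; i++)
            jump[j][i] = jump[j - 1][jump[j - 1][i]];  // jump(i, j) = jump(jump(i, j-1), j-1)
    return jump;
}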
With jump pointers, we can answer a query more quickly by skipping many nodes at a time. See the following pseudocode:
def answer_query(PS, PE):
    find the index i of the first course with S_i >= PS, with binary search
    if E_i > PE: return 0 // even the first candidate course ends too late
    ans = 1
    for j = floor(log_2 N)..0 by -1:
        if E_(jump[i][j]) <= PE:
            ans += 1 << j // "2 raised to j"
            i = jump[i][j]
    return ans
This works because the number of courses the plain greedy would pick is at most N, and any such count can be written in binary with at most \lfloor\log_2 N\rfloor + 1 bits, so each power of two needs to be tried only once, from the largest down to the smallest. Note also that whenever a jump of 2^j steps fits (its endpoint's end time is \le PE), all the intermediate courses fit as well, because end times only increase along the chain; this is what makes counting in blocks of 2^j valid. A query therefore runs in O(\log N) time.
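Putting it together in C++, continuing the earlier sketches (firstCourse, rel, and the jump table); the dummy course is treated as having end time \infty so it is never counted:

// End time of node i, with the dummy course (index rel.size()) ending at "infinity".
long long endOf(const vector<pair<int,int>>& rel, int i) {
    return i == (int)rel.size() ? LLONG_MAX : rel[i].second;
}

// Number of courses the greedy picks inside the plan [PS, PE].
int answerQuery(const vector<pair<int,int>>& rel,
                const vector<vector<int>>& jump,
                int PS, int PE) {
    int i = firstCourse(rel, PS);          // binary search from the earlier sketch
    if (endOf(rel, i) > PE) return 0;      // even the first candidate ends too late
    int ans = 1;                           // the first candidate is taken
    for (int j = (int)jump.size() - 1; j >= 0; j--) {
        int a = jump[j][i];
        if (endOf(rel, a) <= PE) {         // all 2^j courses on this block of jumps fit
            ans += 1 << j;
            i = a;
        }
    }
    return ans;
}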
Time Complexity:
O((N+Q) \log N)
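For reference, a hypothetical driver combining all of the sketches above; the exact input format shown here is an assumption and should be adapted to the actual statement:

// Hypothetical driver: reads N courses, preprocesses them, then answers
// Q independent plans.  The input format is assumed, not taken from the statement.
int main() {
    int N;
    scanf("%d", &N);
    vector<pair<int,int>> courses(N);
    for (auto& c : courses) scanf("%d %d", &c.first, &c.second);

    vector<pair<int,int>> rel = buildRelevant(courses);
    vector<int> succ = buildSuccessors(rel);
    vector<vector<int>> jump = buildJumps(succ);

    int Q;
    scanf("%d", &Q);
    while (Q--) {
        int PS, PE;
        scanf("%d %d", &PS, &PE);
        printf("%d\n", answerQuery(rel, jump, PS, PE));
    }
    return 0;
}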