PROBLEM LINK:
Author: Praveen Dhinwa
Tester: Suhash Venkatesh and Kevin Atienza
Editorialist: Kevin Atienza
PREREQUISITES:
Greedy algorithms, rooted trees, dynamic programming
PROBLEM:
(Note: this is a translation of the story in terms of rooted trees)
You are given a rooted tree of N nodes with the root node being 1. You are given an array A having K distinct integers, each denoting a level of the tree. For every element A_i of the array, you must delete a node from level A_i. However, when you delete a node, all nodes in its subtree are automatically removed as well. You may choose which order to perform the deletion. What is the minimum number of nodes that must get deleted?
QUICK EXPLANATION:
It is clear that we should delete nodes in decreasing level. Now, let F(n) be the answer to the problem considering only the values A_i having A_i \ge n. Then F(n) can be computed recursively as follows:
- If n is greater than the height of the tree, then the array is empty and there are no more deletions to consider, so we have F(n) = 0.
- If n < A_i for all i, then there’s no need to delete a node from the current level, so we can simply take F(n) = F(n+1).
- If n = A_i for some i, then we must delete a node in level n. Let d be some node at level n. If we decide to delete this node, then all nodes in the subtree are automatically deleted. But if the node d has height H_d, then all $A_j$s with A_j \le n + H_d can be ignored, because we can simply take them from this subtree (they will be deleted anyway). We can try this for all possible nodes d at level n, so we have:
where L_d is the level of d and S_d is the size of the subtree rooted at d.
This only takes O(N) time overall because each node appears in only one level and thus checked only once during the computation of the $F(n)$s.
EXPLANATION:
The first thing that may come to mind is to take the smallest-value in the array A, and take the longest path that reaches all levels whose numbers are in A, in the hopes of minimizing the number of subtrees cut, and therefore possibly the number of nodes cut. However, this greedy selection doesn’t always work, as shown by the sample input!
Other greedy solutions that comes no mind are: take the smallest/largest sized subtree, or take the smallest/largest-height subtree, or something else. But one can probably find some counterexamples to those strategies. We need to find a solution that is convincingly correct.
Order of deletion
One of the first things we can think about is the order in which we process the levels A_i. Intuitively, we should process them in decreasing order, because if we try to delete from a lower level first, we might have to remove another tree from a higher level when we don’t have to. An example is the following tree:
A
/ \
B C
| |
D E
with A = [2,3]. Let’s say we delete a node from level 2 first, and without loss of generality we delete B. Now we have to delete some node from level 3, and our only remaining choice is E. So we have removed three nodes, but we could have removed only two by deleting, say D, first, and then B!
Okay, so our intuition tells us that we should delete nodes in decreasing order of levels. Let’s try to prove this a bit more rigorously. Suppose we have an optimal solution for the problem whose deletions are not in decreasing order of level. Let’s say the order of levels is A_{i_1}, A_{i_2}, \ldots, A_{i_K}. Since this is not a decreasing sequence, there thus exists a position such that A_{i_j} < A_{i_{j+1}} for some j. Let X and Y be the nodes deleted at A_{i_j} and A_{i_{j+1}}, respectively. If Y is in the subtree rooted at X, then it would have been removed when X was deleted, so we could not have been able to choose Y in the first step. Therefore, Y is not in the subtree rooted at X. Also, obviously X is not in the subtree rooted at Y because X has a lower level than Y. This means that the order of deleting X and Y doesn’t matter, and we can swap the order in which X and Y are deleted while still maintaining optimality. We can do this for all other positions A_{i_j} < A_{i_{j+1}} until there are none. Since at each step we are increasing the inversion number of the array, we are sure that this process terminates in a finite number of steps (because the inversion count of the array is bounded by K(K-1)/2), and the end result is an optimal solution with the nodes deleted in decreasing order of level, which is what we wanted to show!
Which nodes to delete
Now that we know that we can delete the nodes in decreasing order of level, we can now find the correct choice of a node to delete from each level in A. Now, we assume that A is reverse sorted, so A_1 > A_2 > A_3 > \ldots > A_K. Let f(k) be the optimal solution if we consider only the first k elements of A (for 0 \le k \le K). Obviously the answer to the problem is f(K).
f(0) is easy to compute: it is simply 0, because we don’t have to delete any node! So let’s try to solve something else. Suppose we want to compute f(k) for k > 0. As we have seen earlier, it seems not wise to make a greedy choice, so we’ll check all subtrees at level A_k instead! We’ll look at a particular node X (at level A_k) and see what will happen if we delete that node.
If we delete X, the next choice we make is the node rooted at A_{k-1} (remember that this node is deleted before X). Suppose the highest level of any node under X is L. Now, if A_{k-1} \le L, then we can just select any node under X of level A_{k-1}, because X will be deleted anyway! This gives us a choice of node at the level A_{k-1} for free, and in fact for all A_{k'} such that A_{k'} \le L. Therefore, the next level we have to make a selection for is the smallest A_{k'} such that A_{k'} > L. But in this case, the answer is simply f(k') by definition! Therefore, we have shown that if we delete X, then the number of nodes we will delete in total is \text{size}(X) + f(k'), where \text{size}(X) is the number of nodes under X (including X itself), and k' is maximal such that A_{k'} > L (or k' = 0 if such a k' doesn’t exist). This gives us a recurrence for f(k):
(where L(X) is the highest level of any node under X, which is simply \text{level}(X) + \text{height}(X))
Using dynamic programming to store the values of f(k) and computing it using this recurrence, we now have an O(N + K \log K) = O(N \log N) solution for this problem! (the \log N part comes from binary search to compute k')
Removing the binary search
We can actually remove the binary search problem above by altering the solution by a little bit. Let us define F(n) as the optimal solution if we only consider the elements of A such that A_i \ge n. Then we have the following observations:
- F(L+1) = 0, where L is the level of the tree.
- If there exists a k such that A_k = n, then F(n) is just f(k) by definition. Thus one can view F as a “generalization” of f.
- If there doesn’t exist such a k, then F(n) = F(n+1).
- More importantly, we have the following recurrence for F(n), assuming there exists a k such that n = A_k:
This equation does not contain anything that requires binary search!
Therefore, we can construct the array F instead, from F(L+1) to F(1), we can compute the answer as F(1), in O(N) time
Time Complexity:
O(N)
AUTHOR’S AND TESTER’S SOLUTIONS:
Will be posted soon