PROBLEM LINK:
Author: Ramazan Rakhmatullin
Testers: Lewin Gan, Marek Sommer
Editorialist: Pawel Kacprzak
DIFFICULTY:
Medium
PREREQUISITES:
Trees, dynamic programming, DFS
PROBLEM:
The problem can be reformulated as follows:
There is given a tree T with N nodes numbered from 0 to N-1, where the i-th node has a color color(i) and level level(i). The goal is to find for each node v the number of non-empty sequences of nodes: v_0, v_1, \ldots, v_k such that:
- v_0 = v, i.e. the sequence starts in v
- All v_i in the sequence have the same color
- For i < k, v_i is ancestor of v_{i+1} in the tree
- level(v_k) = 0
- For each i < k, level(v_i) = level(v_{i+1}) + 1
We call each such sequence a good sequence.
EXPLANATION:
First of all, in the input, there is given an upper limit on maximum level, but we’re going to ignore completely this variable.
Subtask 1
In the first subtask, the sum of N over all test cases is at most 5000. Also, there is an upper limit of 20000 for the sum of numbers in the output, but we won’t use it in the below solution.
The general idea is to use dynamic programming on the tree. Just for a minute let’s forget about efficiency and just defined the most straightforward recursive equation for the tree.
Let dp[v][c][k] be the number of good sequences of length k, color c in v's subtree. Let’s fix node v, and let result[v] be the number of good sequences starting in v, i.e. the result for v, and count(c) be the number of nodes with color c… Then both result[v] and dp[v][c][k] for each c and k can be computed in the following steps:
- For any c, dp[v][c][0] = 0 just by the definition of a good sequence.
- For every color c and any k, dp[v][c][k] can be initiallized as \sum\limits_{u \in children(v)} dp[u][c][k].
- If level(v) = 0, then result[v] = 1, because the only good path starting in v also has to ends in v.
- If level(v) \geq count(color(v)) then result[v] = 0 because there are not enough nodes of color color(v) to form a good sequence starting in v.
- Otherwise, level(v) > 0, so result[v] = \sum\limits_{u \in children(v)} dp[u][color(v)][level(v)]. because any good path of color color(v) of length level(v) can be extended by v to a good path of length level(v)+1.
- Finally, we add result[v] to dp[v][color(v)][level(v)+1]
Ok, but what are the time and memory complexity of this method. Well, there are N nodes, at most N colors and the maximum length of a sequence can be at most N, so it looks like the dp table has to have N^3 entries. However, for a fixed color c, the maximum length of a sequence of color c is at most count(c). This is true because each two consecutive elements of a sequence have to have consecutive levels and the last node has to have level 0. Thus, the dp array, in fact, has at most O(N^2) entries. If this approach is implemented with a DFS, then each entry dp[v][c][k] is passed only a constant number of times to its parent, and because there are O(N^2) such entries, the total time complexity is O(N^2).
Subtask 2
The approach for the first subtask given in this editorial should also pass for the second subtask if only rather than passing everywhere dp[v][c][k] for every color c, we do it only for colors occurring in the input. Constraints for this subtasks are pretty much similar, so perhaps there exist other slight improvements over the method from the first subtask.
Subtask 3
In the third subtask, N is at most 10^5 and sum over N in all test cases is at most 5 \cdot 10^5. These constraints are large enough to prevent any quadratic time approaches. One idea for a solution passing them, which should be too slow to pass the final subtask, is to create for each color c a compressed tree containing only nodes of color c. A compressed tree here means that if there are count(c) nodes of color c, then such a tree should contain O(count(c)) nodes and for any two nodes u and v of colors c if u is an ancestor of v in the original tree then it’s also an ancestor of v in the compressed tree. All such compressed trees can be created using LCA algorithm in O(N \cdot \log(N)) time. After that, dynamic programming approaches can be used to find the results separately for each compressed tree.
Subtask 4
In the last subtask, N can be up to 5 \cdot 10^5 and sum of N over all test cases is at most 10^6, which suggests that this subtask requires a really fast solution, especially considering that the time limit for this problem is 1 second. Fortunately, a really beautiful and easy to implement solution exists.
The general idea is to improve the complexity of the method described in the solution for the first subtask (which I suggest read first).
Let count(c) be the number of nodes of color c in the tree and result[v] be the number of good sequences starting in v, i.e. the final result for v.
Now, let’s assume that we run DFS over the tree staring in the root and that dp[c][k] is the number of good sequences of color c starting in nodes with level k found so far in the tree, and we’ve just entered node v. Let’s save the current value of dp[color(v)][level(v)-1] into variable s or assign s := 0 if level(v) = 0. Notice that s now represents the value of dp[color(v)][level(v)-1] before entering the v's subtree. Next, let’s run DFS for all children of v recursively. After all these recursive calls complete, results for all nodes in v's subtree except v are already computed. The only thing left to do for this subtree is to compute the result for v. There are a few cases to consider:
- If level(v) \geq count(color(v)) then we know that result[v] = 0, because there are not enought nodes in the tree to for any good sequence starting in v.
- Else if level(v) = 0, then result[v] = 1, because the only good sequence starting in v ends also in v.
- Otherwise, 1 \leq level(v) < count(color(v)). Notice that after all recursive calls for children of v finished, then dp[c][k] counts the number of good sequences of nodes with color c starting in nodes with level k counted before DFS entered v's subtree and also these ones counted while DFS was run for all children of v. Remembering what we saved to variable s before running DFS for children of v, result[v] can be computed as the current value of dp[color(v)][level(v)-1] minus s. After that, we just add the value result[v] to dp[color(v)][level(v)].
The total time complexity of this method is O(N) because dp table contains O(N) entries and while DFS visits a single node it performs a constant number of operations not counting iterating over children of the current node and recursive calls.
For implementation details of this exact approach please refer to Marek’s solution and Pawel’s (mine) solution using the same variable names as the editorial.
AUTHOR’S AND TESTER’S SOLUTIONS:
Setter’s solution can be found here.
First tester’s solution can be found here.
Second tester’s solution can be found here.
Editorialist’s solution can be found here.