COMPCLUB - Editorial

pkacprzak · June 24, 2017, 10:58am

PROBLEM LINK:

Practice
Contest

Author: Ramazan Rakhmatullin
Testers: Lewin Gan, Marek Sommer
Editorialist: Pawel Kacprzak

DIFFICULTY:

Medium

PREREQUISITES:

Trees, dynamic programming, DFS

PROBLEM:

The problem can be reformulated as follows:

There is given a tree T with N nodes numbered from 0 to N-1, where the i-th node has a color color(i) and level level(i). The goal is to find for each node v the number of non-empty sequences of nodes: v_0, v_1, \ldots, v_k such that:

v_0 = v, i.e. the sequence starts in v
All v_i in the sequence have the same color
For i < k, v_i is ancestor of v_{i+1} in the tree
level(v_k) = 0
For each i < k, level(v_i) = level(v_{i+1}) + 1

We call each such sequence a good sequence.

EXPLANATION:

First of all, in the input, there is given an upper limit on maximum level, but we’re going to ignore completely this variable.

Subtask 1

In the first subtask, the sum of N over all test cases is at most 5000. Also, there is an upper limit of 20000 for the sum of numbers in the output, but we won’t use it in the below solution.

The general idea is to use dynamic programming on the tree. Just for a minute let’s forget about efficiency and just defined the most straightforward recursive equation for the tree.

Let dp[v][c][k] be the number of good sequences of length k, color c in v's subtree. Let’s fix node v, and let result[v] be the number of good sequences starting in v, i.e. the result for v, and count(c) be the number of nodes with color c… Then both result[v] and dp[v][c][k] for each c and k can be computed in the following steps:

For any c, dp[v][c][0] = 0 just by the definition of a good sequence.
For every color c and any k, dp[v][c][k] can be initiallized as \sum\limits_{u \in children(v)} dp[u][c][k].
If level(v) = 0, then result[v] = 1, because the only good path starting in v also has to ends in v.
If level(v) \geq count(color(v)) then result[v] = 0 because there are not enough nodes of color color(v) to form a good sequence starting in v.
Otherwise, level(v) > 0, so result[v] = \sum\limits_{u \in children(v)} dp[u][color(v)][level(v)]. because any good path of color color(v) of length level(v) can be extended by v to a good path of length level(v)+1.
Finally, we add result[v] to dp[v][color(v)][level(v)+1]

Ok, but what are the time and memory complexity of this method. Well, there are N nodes, at most N colors and the maximum length of a sequence can be at most N, so it looks like the dp table has to have N^3 entries. However, for a fixed color c, the maximum length of a sequence of color c is at most count(c). This is true because each two consecutive elements of a sequence have to have consecutive levels and the last node has to have level 0. Thus, the dp array, in fact, has at most O(N^2) entries. If this approach is implemented with a DFS, then each entry dp[v][c][k] is passed only a constant number of times to its parent, and because there are O(N^2) such entries, the total time complexity is O(N^2).

Subtask 2

The approach for the first subtask given in this editorial should also pass for the second subtask if only rather than passing everywhere dp[v][c][k] for every color c, we do it only for colors occurring in the input. Constraints for this subtasks are pretty much similar, so perhaps there exist other slight improvements over the method from the first subtask.

Subtask 3

In the third subtask, N is at most 10^5 and sum over N in all test cases is at most 5 \cdot 10^5. These constraints are large enough to prevent any quadratic time approaches. One idea for a solution passing them, which should be too slow to pass the final subtask, is to create for each color c a compressed tree containing only nodes of color c. A compressed tree here means that if there are count(c) nodes of color c, then such a tree should contain O(count(c)) nodes and for any two nodes u and v of colors c if u is an ancestor of v in the original tree then it’s also an ancestor of v in the compressed tree. All such compressed trees can be created using LCA algorithm in O(N \cdot \log(N)) time. After that, dynamic programming approaches can be used to find the results separately for each compressed tree.

Subtask 4

In the last subtask, N can be up to 5 \cdot 10^5 and sum of N over all test cases is at most 10^6, which suggests that this subtask requires a really fast solution, especially considering that the time limit for this problem is 1 second. Fortunately, a really beautiful and easy to implement solution exists.

The general idea is to improve the complexity of the method described in the solution for the first subtask (which I suggest read first).

Let count(c) be the number of nodes of color c in the tree and result[v] be the number of good sequences starting in v, i.e. the final result for v.

Now, let’s assume that we run DFS over the tree staring in the root and that dp[c][k] is the number of good sequences of color c starting in nodes with level k found so far in the tree, and we’ve just entered node v. Let’s save the current value of dp[color(v)][level(v)-1] into variable s or assign s := 0 if level(v) = 0. Notice that s now represents the value of dp[color(v)][level(v)-1] before entering the v's subtree. Next, let’s run DFS for all children of v recursively. After all these recursive calls complete, results for all nodes in v's subtree except v are already computed. The only thing left to do for this subtree is to compute the result for v. There are a few cases to consider:

If level(v) \geq count(color(v)) then we know that result[v] = 0, because there are not enought nodes in the tree to for any good sequence starting in v.
Else if level(v) = 0, then result[v] = 1, because the only good sequence starting in v ends also in v.
Otherwise, 1 \leq level(v) < count(color(v)). Notice that after all recursive calls for children of v finished, then dp[c][k] counts the number of good sequences of nodes with color c starting in nodes with level k counted before DFS entered v's subtree and also these ones counted while DFS was run for all children of v. Remembering what we saved to variable s before running DFS for children of v, result[v] can be computed as the current value of dp[color(v)][level(v)-1] minus s. After that, we just add the value result[v] to dp[color(v)][level(v)].

The total time complexity of this method is O(N) because dp table contains O(N) entries and while DFS visits a single node it performs a constant number of operations not counting iterating over children of the current node and recursive calls.

For implementation details of this exact approach please refer to Marek’s solution and Pawel’s (mine) solution using the same variable names as the editorial.

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter’s solution can be found here.
First tester’s solution can be found here.
Second tester’s solution can be found here.
Editorialist’s solution can be found here.

vivek_reddy · June 24, 2017, 10:48pm

Thank You

Curious to know the solution

hrushikesh_t · June 25, 2017, 6:39am

Can someone please tell me why my solution is giving TLE?

I wrote the exact same code as @pkacprzak.

Solution Link

pkacprzak · June 25, 2017, 6:54am

In fact, the real reason is that you used std::endl to output new lines. Here is a submission with using explicitly “\n” new line character: https://www.codechef.com/submit/complete/14338613 You can read about the difference between these methods here: https://stackoverflow.com/a/213977/948918 In this problem the output is quite large and time limit very strict, so it’s not a good idea to flush the buffer frequently.

hrushikesh_t · June 25, 2017, 7:10am

Thank you for pointing out. I will keep it in mind.

chandyshot · June 25, 2017, 10:41am

Someone please explain the use of variable s, from pkacprzak’s solution, I didn’t get that.

gasser · June 25, 2017, 11:55am

Can any one help me to find why i get TLE

my code is here: http://ideone.com/s9OWbY

my approach: dfs with inTime, outTime for every node then sort queries by (club, level) then get the sum using segment tree

ashishanand26c · June 26, 2017, 2:20pm

@chandyshot
variable s is used to substract the the number of sequences having color c and level k which have not started with v

chawla_harsh · June 26, 2017, 8:55pm

can you please explain the role of variable s? Why we have minus it from the dp array to get the result?

inseder · June 27, 2017, 3:46am

Even though I wasn’t able to solve this problem during the contest - I was really confident that the idea from subtask 3 could solve this problem. With a lot of submissions, small optimizations, I’ve passed the tests.
I think that some tests could TLE with my approach and I challenge anyone with time to find a good testcase :D.

What I did: my_solution

build a tree with all nodes O(n) complexity
dfs iterative (for optimization) - in this dfs I build a “compressed” tree for every color O(N) - (I don’t know why the editorial state that this should take O(N * log N)
for every tree make a dfs that returns for the subtree a map of {length - count} for the entire subtree
merge the maps returned by every dfs for a child and obtain the full map for current node
compute the solution with the idea from subtask 1

I have to admit that the “good” solution is amazing and this post is just to confirm that the idea from subtask 3 could pass the current tests.

Can anyone provide a test case that will TLE?
Thanks in advance.

hikarico · June 27, 2017, 4:30pm

Same here, also did merging of the DP in O(n log n) which luckily passed the last case (link). Interested to find an input that hacks this for a TLE as well.

tapanr97 · June 27, 2017, 11:06pm

I couldn’t understand the use of variable s. Why are we subtracting s from result[v]? Can anyone give proper explaination?

inseder · June 28, 2017, 1:41am

@tapanr97 I will try to explain in my words. First of all everything starts from the explanation of the first subtask. The result is obtained by filling up dp[v][c][k] with the appropriate values. This logic is really good but you need as explained in the editorial O(N^2).

Now lets think of a node. While you run dfs when you get to a node you need the values for the children of level - 1. After you completed a node you can easily see that the values in the dp table for children is not need anymore.

Now getting back to your question the solution relies on a way of getting rid of this extra memory. Let’s assume that you get to a node where you have that dp[c][k] computed.

before running the dfs for the children s = dp[c][k] = “the number of good sequences of color c starting in nodes with level k found so far in the tree”
after you run dfs for a child u: dp[c][k] = s + x -> where x is the amount of good sequences of color c starting in nodes with level k in the subtree rooted in u
going this for all children will result in: dp[c][k] = s + sum(x_u)
for the result of the node v you need the sum of good sequences of color(v) starting with level(v) - 1 in the subtree of v - this is the dp’[c][l-1] - dp[c][l-1]. I noted dp’ as the value of table after the dfs was run over the children of the node.

This is what I understood. Please correct me if I am wrong.

tapanr97 · June 28, 2017, 11:42am

Thank you…Now I got it @inseder

sinha01 · June 28, 2017, 1:26pm

If level(v)≥count(color(v))level(v)≥count(color(v)) then result[v]=0result[v]=0 because there are not enough nodes of color color(v)color(v) to form a good sequence starting in v…can anyone please explain this statement.

sinha01 · June 28, 2017, 1:27pm

If level(v) ≥ count(color(v)) then result[v]=0 because there are not enough nodes of color color(v) to form a good sequence starting in v. can anyone please explain thsi statement?

sinha01 · June 28, 2017, 8:54pm

what is the significance of variable s?

vivek_reddy · June 28, 2017, 10:29pm

Editorialist’s solution is simple and understandable