TUPLES2 - Editorial

PROBLEM LINK:

Practice
Contest

Author: Hanlin Ren
Testers: Sergey Kulik, Kamil Dębowski
Editorialist: Pawel Kacprzak

DIFFICULTY:

Hard

PREREQUISITES:

Trees, dynamic programming, centroid decomposition

PROBLEM:

For a given tree T with N nodes, let’s consider paths in T with at least two nodes. We say that a triple of such paths is a nice triple if and only if either: all paths in the triple are pairwise disjoint or for each pair of paths in the triple these two paths share at least one common node. The goal is to find the number of nice triples of paths in T calculated modulo 10^9+7.

EXPLANATION:

There exist at least two completely different approaches to this problem. Setter of the problem used O(N \cdot \log N) method based on counting degree of a modified graph and speed it up with centroid decomposition. The first tester (Sergey) used also this solution, while the second tester (Kamil) used a completely different O(N) approach based on dynamic programming on the input tree. Subtasks in this problem are created mainly for slow implementation of setter’s idea, but possibly other approaches are also fine for such less restrictive constraints.

Setter’s idea

First of all, let’s sketch the setter’s idea:

Consider creating a graph of M=\binom{N}{2} vertices, where every vertex represents a path in the input tree and two vertices are connected by an edge if and only if paths corresponding to them intersect. We will call this graph G. Now, the problem is reduced to two problems: counting the number of cliques of size 3 in G and counting the number of independent sets of size 3 in G.

Next, let d_v be the degree of vertex v in G and F_i be the number of induced subgraphs of size 3 of G with exactly i edges. Clearly, F_i is only defined for i = 0, 1, 2, 3.

Now the crucial observation, since there are \binom{M}{3} induced subgraphs of size 3 of G, then:

F_0 + F_1 + F_2 + F_3 = \binom{M}{3}

and since we are interested in F_0 + F_3, which is the number of independent sets of size 3 in G plus the number of cliques of size 3 in G, when we can as well compute F_1 + F_2 and then get the final result by subtracting this value from \binom{M}{3}. Thus, we can use the following formula in order to reduce computing of F_1 + F_2 even further:

2 \cdot (F_1 + F_2) = \sum\limits_{i=1}^{M} d_i \cdot (M-d_i-1) = (M-1) \cdot \sum\limits_{i=1}^{M} d_i - \sum\limits_{i=1}^{M} d_i^2

and the problem is now reduced to computing the sum of all degrees in G and the sum of squares of all degrees in G. Both setter and the first tester (Sergey) solved this problem using traversing the input tree and speeding up multiple calculations with centroid decomposition.

The total time complexity of this approach is O(N \cdot \log N) when centroid decomposition is used. For implementation details please refer to their solutions linked below.

Subtasks 1 and 2

Solutions to both easiest subtasks are intended for slow implementation of above idea. A quadratic time solution, by just counting the degrees of paths starting in node v of the input tree performed for every v, should get accepted in the first subtask, while an inefficient implementation of the intended idea should pass tests related to the second subtask.



Second tester’s (Kamil) idea

The second idea is not only faster but perhaps also easier to implement. However, as it happens often with harder problems, coming up with the idea is crucial.

The approach can be divided into two problems:

  1. Computing the number of triples of paths, where all paths are pairwise intersecting, let’s call them intersecting triples
  2. Computing the number of triples of paths, where no two paths intersect, let’s call them disjoint triples

Let’s start with the first problem

Computing the number of intersecting triples

Main observation here is the following:

3 paths are pairwise intersecting if and only if the intersection of all 3 of them is a non-empty path, containing at least 1 node.

For a triple of paths X, let P(X) be such a path which is the intersection of all of them.

Then, if we root T in an arbitrary node, then for every X, let high(X) be the closest node from the root in P(X).

Now, for every node v in T, we want to compute the number of triples X, for each high(X) = v.

Let’s fix node v and assume that there are exactly A paths crossing v, which means that there are \binom{A}{3} triples of paths crossing v. However, we are interested only in those of them which don’t intersect as a triple higher in the tree than v (because we said that v is the highest vertex of the intersection of paths), so we want to subtract these unwanted triples, let their number be B, from \binom{A}{3}. Computing A is pretty straightforward if we know sizes of subtrees of children of v. Similarly, counting B can be done by first counting C as the number of paths going through both v and the parent of v, and then we can calculate B as B = \binom{C}{3}. Notice that counting C is pretty easy because it’s just the number of nodes in v's subtree multiplied by the number of nodes, which are not in v's subtree.

It follows that this part can be solved in O(N) time, and the implementation shouldn’t be complicated - please take a look at Kamil’s solution linked below.

Computing the number of disjoint triples

The only remaining thing is to compute the number of disjoint triples. The idea here is to use dynamic programming to get this value. We are going to do this by performing a DFS. While visiting node v, first we visit all its children, and for each child v_i, we assume that the following values are computed for k = 0, 1, 2, 3:

dp_{empty}[v_i][k] := the number of ways to choose exactly k disjoint path in v_i's subtree, which won’t be ever extended (i.e. none of them goes up to the parent of v_i)

dp_{single}[v_i][k] := the number of ways to choose exactly k disjoint path that won’t be ever extended, and the (k+1)-th path (also disjoint with others) that will go to v_i's parent and perhaps will be extended even further

dp_{doubled}[v_i][k] := the number of ways to choose exactly k disjoint path that won’t be ever extended, and the (k+1)-th path (also disjoint with others) that goes through v_i and through two its children — such a path is the result of merging two paths that went from subtrees of v_i's children to v_i. It means that v_i is the highest (closest to the root) vertex of the path and clearly the path can’t be extended further up

It turns out that knowing values of above 3 functions for all children of v and k is sufficient to compute the values of these functions for v in the time proportional to the number of v's children.

Clearly the final result is dp_{empty}[root][3] and the time complexity of this approach is O(N). For implementation details, and exact equations of updating the above values please refer to second tester’s (Kamil) solution linked below.

Summary

Since both problems are solved in linear time, the total time complexity is also linear and runs under a second faster than the setter’s approach described at the beginning. The implementation linked below is well documented, so we encourage you to check it out.



AUTHOR’S AND TESTER’S SOLUTIONS:

Setter’s solution can be found here.
First tester’s solution can be found here.
Second tester’s solution can be found here (linear approach). This code was re-uploaded a few hours after the editorial was posted. Please hard-refresh it (ctrl+F5) if you’ve alraedy opened it before. The new version should have functions dfs_dp and dfs_intersect.

3 Likes

May you please explain more how the formula 2. (F1 + F2) = SUM_all_i (di . (M - di - 1) is coming from?

1 Like
//