CUTTREE - Editorial

adkroxx · February 28, 2018, 2:03pm

PROBLEM LINK:

Contest

Practice

Author: Jatin Yadav

Tester: Triveni Mahatha

Editorialist: Adarsh Kumar

DIFFICULTY:

Hard

PREREQUISITES:

Fast Fourier Transform, Centroid decomposition, Properties of Expectation

PROBLEM:

You are given a tree with N nodes. On each of the next N-1 days, you remove one of the remaining edges to create a forest. The strength of a forest is defined to be the sum of squares of the sizes of its connected components.

You need to compute the expected value of the strength of the resulting forest after day i for all 0 \le i < n, modulo 1000000007.

EXPLANATION:

Lets try to find the expression for E[x^2,i]. Here E[x^2,i] represent the expected value of the strength of the resulting forest after day i.

Say, after i days, there is a component of size x consisting of vertices (u_1,u_2,...,u_x). We know that, it will contribute x^2 to the answer with some probability. Now, what can be the other way to look at it?

We can decompose x^2 in the following fashion, where each f(u_k)=1:

$E[x^2,i] = E[(\sum_{k=1}^x f(u_k))^2,i]$

\Rightarrow E[x^2,i] = E[\sum_{k=1}^x f(u_k)^2,i] + 2E[\sum_{a=1}^x\sum_{b=a+1}^xf(u_a)f(u_b),i]

\Rightarrow E[x^2,i] = E[x,i] + 2E[\sum_{a=1}^x\sum_{b=a+1}^xf(u_a)f(u_b),i]

\Rightarrow E[x^2,i] = n + 2E[u-v,i]

Now, the main problem reduces to finding the expected number of ordered pairs (u,v) which are connected after i days, indicated by E[u-v, i]. This can be solved using linearity of expectation. Basically, we need to find what is the probability that u-v are connected after i days? If we sum this probability for all the ordered pairs, we are done.

What is the probability that u,v are in the same connected component after i days?

For u,v to be in same component all the edges on the shortest path from u to v must not be removed in i days. Let the number of edges between u and v on their shortest path be k (i.e. distance between u and v is k). Then the probability P(u-v,i) can be computed as:

$P(u-v,i) = \frac{\binom{n-1-i}{k}(k!)((n-1-k)!)}{(n-1)!}$

Notice that, above expression for probability only depends on the distance between u and v. If we can somehow compute number of ordered pairs at distance k as D[k] for all 1 \le k < n. We will take D[0]=0. Then, we can write the formula for expectation as:

$E[u-v,i] = \sum_{k=0}^{n-1-i}D[k]\frac{\binom{n-1-i}{k}(k!)((n-1-k)!)}{(n-1)!}$

Counting the number of ordered pairs at distance k for all 1 \le k < n.

This is a standard problem and can be done in O(nlog^2n) with the help of centroid decomposition and FFT. Following are the step for doing this:

Build the centroid tree of given tree.
Traverse each node of centroid tree. When on a particular node (say P):
- You need to count distance between two nodes present in different subtrees. For doing the same:
  1. Build a polynomial for each of the subtree. Polynomial should contain $\sum_{\text{d $\in$ subtree}} x^{\text{distance between P and d}}$.
  2. Add all the polynomial created for all the subtree and square it. This will overcount those pairs which are present in same subtrees.
  3. Subtract the square of each of the subtree polynomial to remove overcounting.
  4. Still we have every pair counted two times. So divide by two, all the coefficient of polynomial.
  5. We now have a polynomial in which coefficient of $x^k$ represents the number of paths of length $k$ passing through this node and present in different subtrees.
- You need to count distance between P and all other nodes.
  1. Traverse all the subtree and update the count naively.

You can also seek editorial of PRIMEDST for more explanation.

Computing the value of E[u-v,i] modulo 1000000007 for all 0 \le i < n.

Naive computation of E[u-v,i] for all 0 \le i < n will take O(n^2) time, which is not feasible here. Lets take a closer look at the formula for E[u-v,i] now:

$E[u-v,i] = \sum_{k=0}^{n-1-i}D[k]\frac{\binom{n-1-i}{k}(k!)((n-1-k)!)}{(n-1)!}$

\Rightarrow E[u-v,i] = \frac{(n-1-i)!}{(n-1)!} \sum_{k=0}^{n-1-i}D[k]\frac{(n-1-k)!}{(n-1-i-k)!}

\Rightarrow E[u-v,i] = \frac{(n-1-i)!}{(n-1)!} H(i)

where,

$H(i)=\sum_{k=0}^{n-1-i}D[k]\frac{(n-1-k)!}{(n-1-i-k)!}$

\Rightarrow H(i)=\sum_{k=0}^{n-1-i}(D[k] (n-1-k)! )\left ( \frac{1}{(n-1-i-k)!} \right )

Observe that H(i) for all 0 \le i < n can now be computed from convolution of polynomial A(x) and B(x) where:

$A(x) = (D[0] (n-1)!) x^0 + (D[1] (n-2)!) x^1 + ... (D[r] (n-1-r)!) x^r + ... + (D[n-1] 0!) x^{n-1}$, and

B(x) = \left(\frac{1}{0!}\right) x^0 + \left(\frac{1}{1!}\right) x^1 + ... \left(\frac{1}{r!}\right) x^r + ... + \left(\frac{1}{(n-1)!}\right) x^{n-1}

Value of H(i) is the coefficient of x^{n-1-i} in A(x)*B(x) modulo 1000000007.

Multiplication of two polynomials with arbitrary modulo.

We have to multiply two polynomials and then output result coefficients modulo M which is not necessary of the kind c · 2^k + 1. Coefficients in multiplication can be up to O(M^2n) which usually can’t be handled precisely enough by floating point types. You can have a look at this pdf, in order to solve this problem efficiently in O(nlogn).

Time Complexity:

O(nlog^2n)

AUTHOR’S AND TESTER’S SOLUTIONS

Setter’s solution

Tester’s solution

Editorialist’s solution

nik_84 · March 12, 2018, 7:20pm

We can decompose x2 in the following fashion, where each f(uk)=1:… what is f(uk) ?

adkroxx · March 12, 2018, 11:30pm

u_k represents a vertex. f(u_k) just represents that we are contributing because of vertex u_k in the form of f(u_k).

john_smith_3 · March 13, 2018, 4:53am

Let me try to write an explation for the expected strength of a forest.

We start with a tree with n nodes, and randomly decompose it into a forest of smaller trees.

Let T be the set of possible trees that can be in any random forest.

Use t to denote a tree in the set T.

Use u and v to denote nodes in a tree.

Write S_t for the strength of tree t. From the question, S_t is the square of the number of nodes in t.

Write p(t) for the probability that tree t is present in the random forest.

And let E be the expected strength of the random forest.

We should start by summing the product of the probability of each random forest and the strengths of the trees. Then gather together the probability for each possible tree, and we will have
\begin{equation}
E = \sum_{t \in T} S_t \times p(t)
\end{equation}
The difficulty is that calculating p(t) would be tricky.

We will now go through a complicated looking process so that we can split
S_t into a sum of smaller pieces. Then we will be able to swap the order of
summations, and perform a sum over t \in T to simplify the expression for E.

Define an indicator function I(t,u) which has value 1 if node u is in tree t, and 0 otherwise.
Then, for a fixed tree t with r nodes, and summing over all nodes in the forest
\begin{equation}
\sum_u I(t,u) = r
\end{equation}
This last expression is a complicated way to say that there are just r possible nodes u where I(t,u)=1.

Similarly, define another indicator function I(t,u,v) to have value 1 if both
nodes u and v are in tree t, and 0 otherwise. Then, for fixed tree t with r nodes,
and summing over all ordered pairs of distinct nodes in the forest,
\begin{equation}
\sum_{u,v, u \ne v} I(t,u,v) = r(r-1)
\end{equation}
Again, this expression is just saying that there are r(r-1) pairs of nodes u,v in tree t.

We can now rewrite S_t as two sums:
\begin{equation}
S_t = r^2 = r + r(r-1) = \sum_u I(t,u) + \sum_{u,v, u \ne v} I(t,u,v)
\end{equation}

And substituting into the expression for E,
\begin{equation}
E = \sum_{t \in T} \left( \sum_{u} I(t,u) + \sum_{u,v, u \ne v} I(t,u,v) \right) \times p(t)
\end{equation}
We remove the brackets to get
\begin{equation}
E = \sum_{t \in T} \sum_{u} I(t,u) p(t) + \sum_{t \in T} \sum_{u,v, u \ne v} I(t,u,v) p(t)
\end{equation}

The sums are all finite, so the order of the $\sum$s can be swapped. Thus
\begin{equation}
E = \sum_{u} \sum_{t \in T} I(t,u) p(t) + \sum_{u,v, u \ne v} \sum_{t \in T} I(t,u,v) p(t)
\end{equation}

In any of the possible random forests, a fixed node u is always in exactly one tree. So we have
\begin{equation}
\sum_{t \in T} I(t,u) p(t) = 1
\end{equation}
There are n nodes in the forest, so summing over all possible nodes gives
\begin{equation}
\sum_{u} \sum_{t \in T} I(t,u) p(t) = \sum_{u} 1 = n
\end{equation}

Similar argument (which I am struggling to express) gives
\begin{equation}
\sum_{t \in T} I(t,u,v) \times p(t) = {\rm probability}~u~{\rm and}~v~{\rm are~in~same~connected component}
\end{equation}

And so we have the required result
\begin{equation}
E = n + \sum_{u,v, u \ne v} {\rm probability}~u~{\rm and}~v~{\rm are~in~the~same~connected~ component}
\end{equation}

The sum is over ordered pairs (u,v) with u \ne v. If considering un-ordered pairs, there is an extra factor of 2.

gorre_morre · March 13, 2018, 5:54am

I did the convolution in pypy 2 in a bit unusual way. Python can handle multiplication of arbitrary sized integers, for example numbers like 64 \cdot 10^{10^5}. Convolution can be expressed as integer multiplication, so why not just use python’s built in integer multiplication. From this idea I made my own fast convolution using big integers, which got accepted. How I love python =). Unfortunately this convolution was only able to get me 60 points on the CHEFKNN problem. Had to use C++ to fully solve that one.

Link to accepted pypy2 solution to CUTTREE

Link to 60points pypy2 solution to CHEFKNN

Here is the code, doing fast convolution with big integers:

Click to view

# Write A in base 2^p, basically do A[0] + A[1]*2^p + A[2]*(2^p)^2 + ...
def numb(A,p):
    if len(A)==1:
        return A[0]
    mid = len(A)//2
    return numb(A[:mid],p)+(numb(A[mid:],p)<<(mid*p))

# Read n digits of the number XA in base 2^p 
def splitter(XA,p,n):
    A = []
    while n>1:
        mid = n//2
        Y = XA>>(p*mid)
        A+=splitter(XA^(Y<<(p*mid)),p,mid)
        n-=mid
        XA=Y
    if n==1:
        A.append(XA)
    return A

def quick_conv(A,B): 
    n = len(A)
    m = len(B)
    maxA = max(A)
    maxB = max(B)
 
    base = min(sum(A)*maxB,sum(B)*maxA)+1
    
    M=1
    p = 0
    while M < base:
        M *= 2
        p += 1
    XA = numb(A,p)
    XB = numb(B,p)
    ZC = XA*XB
    return splitter(ZC,p,n+m-1)

vivek_1998299 · March 13, 2018, 8:38am

“In any of the possible random forests, a fixed node uu is always in exactly one tree. So we have
∑t∈TI(t,u)p(t)=1”

Could u please explain this line?

vivek_1998299 · March 13, 2018, 8:43am

And just for clearing my doubt:-

If tree is

Then T={{1,2},{1,3},{2,3},{1},{2},{3},{1,2,3}} right?

wittyceaser · March 13, 2018, 12:18pm

Wow! I could never imagine an FFT problem pass in Python. Great stuff.
Especially for this problem, getting it to pass in C++ was not straight-forward, at least for me.

john_smith_3 · March 13, 2018, 1:02pm

\{2,3\} do not form a tree so should not be in T.

vivek_1998299 · March 13, 2018, 4:18pm

Yeah sorry,can u answer my first question plz?

mathmaniac · March 13, 2018, 6:26pm

Same approach except for the fact that I used karatsuba for all the polynomial multiplications, barely passes all the test cases with all the optimizations.

john_smith_3 · March 14, 2018, 2:47am

Part 1:

This reply to Vivek is coming in three parts.

We started with a single tree, and randomly decomposed it into a forest of trees.

Let f_1, f_2, \ldots , f_k be the possible resulting forests.

Suppose the corresponding probabilities of reaching each forest are p_1, p_2, \ldots, p_k.

First key observation is that
\begin{equation}
p_1 + p_2 + \cdots p_k = 1
\end{equation}

Second observation is that if we fix a node u, then in each forest there is just one tree containing node u.

john_smith_3 · March 14, 2018, 2:48am

Part 2:

Let t be a tree. Then p(t), the probability of t being in the resulting random forest, is given by the sum of the p_i
for all the forests f_i containing t.

In symbols, we write this as:
\begin{equation}
p(t) = \sum_{i~{\rm with}~t~{\rm in}~f_i} p_i
\end{equation}

Now fix a node u and sum over different trees containing u
\begin{equation}
\sum_{t \in T~{\rm where}~t~{\rm contains}~u} p(t) =
\sum_{t \in T~{\rm where}~t~{\rm contains}~u~~} \sum_{i~{\rm with}~t~{\rm in}~f_i} p_i
\end{equation}

john_smith_3 · March 14, 2018, 2:48am

Because I(t,u) = 1 when t~{\rm contains}~u, and 0 otherwise, the left hand side is the same as
\begin{equation}
\sum_{t \in T} I(t,u) p(t)
\end{equation}

For the right hand side, we ask ourselves how many times does each p_i appear in the combined sum? Answer,
once for every tree in f_i containing node u.

How many trees are there in forest f_i containing node u? Exactly 1.

So the right hand side simplifies to
\begin{equation}
\sum_{i=1 \ldots k} p_i = 1
\end{equation}

which is the result we require.

vivek_1998299 · March 14, 2018, 7:52am

Thank you so much!! Wished i could upvote you 1 more time.

abhianshu99 · October 31, 2018, 4:47am

Can someone explain how did he calculate P(u-v,i) ??