### PROBLEM LINK:

**Author:** Istvan Nagy

**Tester:** Kevin Atienza

**Translators:** Sergey Kulik (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)

**Editorialist:** Kevin Atienza

### DIFFICULTY:

Easy-Medium

### PREREQUISITES:

Minimum spanning tree, Kruskal’s algorithm, union-find

### PROBLEM:

You are given a weighted undirected tree. Two nodes are randomly chosen to be connected with an edge of 0 cost. Find the expected cost of the MST of the resulting graph.

### QUICK EXPLANATION:

We want the value C_{MST} - \frac{S}{T} where

- C_{MST} is the MST cost of the original graph (i.e. just the sum of the costs of all edges).
- S is the sum of the largest edge weight across all paths.
- T is the number of paths (i.e. N(N-1)/2).

S is the only nontrivial thing to compute. Perform Kruskal’s algorithm, but for each connected component, also keep track of its size. Then, S is the sum of \mathrm{size}(a)\cdot \mathrm{size}(b)\cdot c for every step (a,b,c) of the Kruskal algorithm.

### EXPLANATION:

# Updating the MST

Suppose we want to connect nodes i and j with an edge of 0 cost. How will the MST be updated?

Here’s one equivalent characterization of a (finite) tree: *A tree is an acyclic graph with exactly N-1 edges.* This gives us some clear requirements on how to turn the graph back into a tree again. Specifically, upon adding the edge (i,j,0):

- We are creating exactly one cycle, namely the cycle i \rightarrow \ldots \rightarrow j \rightarrow i, where the \ldots represents the original path from i to j. Thus, we need to remove at least one edge from this cycle.
- The number of edges is now N, which means we must remove
*exactly*one edge.

Together, this means that we must remove one edge from that cycle. Which edge? Well, we want the resulting graph to have the smallest possible cost, so naturally we want to remove the edge with the *largest* cost!

To summarize, by adding the edge (i,j,0), the cost of the MST reduces by exactly the largest edge weight in the path from i to j!

# Expected value of the new MST

To answer the question, we want to find the expected cost of the new MST. Let C_{MST} be the cost of the original tree. Then from the above, and from the fact that (i,j) is uniformly chosen, we see that this expected value is simply C_{MST} - \frac{S}{T}, where

- S is the sum of the largest edge weight across all paths.
- T is the number of paths (i.e. \binom{N}{2} = N(N-1)/2).

T is pretty easy to compute, (just be careful with overflow!) so all that remains is to compute S. Unfortunately, naïve ways of computing this would be too slow, because there are O(N^2) paths! So instead of computing the sum across paths, let’s try to compute the sum across all edges, and just figure out how many paths have that edge as the maximum-cost edge. In other words,

where F(a,b,c) is the number of paths whose largest-cost edge is (a,b,c).

How do we compute F(a,b,c)? Let’s consider some path x \leftrightarrow y. This path has (a,b,c) as its largest-cost edge if and only if the following two conditions are satisfied:

- x is connected to a with edges of cost < c.
- y is connected to b with edges of cost < c.

(Note that it might be the other way around, i.e. x is connected to b and y is connected to a, but then again, paths are symmetric, so x \leftrightarrow y should be considered the same as y \leftrightarrow x.)

So we find that F(a,b,c) is simply R(a,c)\cdot R(b,c), where R(x,c) is the number of nodes reachable from x with edges of cost < c. But how do we compute R(x,c)? Amazingly, **Kruskal’s algorithm** can help us here. Remember that Kruskal’s algorithm considers the edges in increasing order and *unites* the nodes into components in that order. This means that, during the step where we process the edge (a,b,c), R(a,c) and R(b,c) are simply the sizes of the components currently containing a and b!

This gives us the following solution: Perform Kruskal’s algorithm, but for each connected component, also keep track of its size. Then, S is the sum of \mathrm{size}(a)\cdot \mathrm{size}(b)\cdot c for every step (a,b,c) of the Kruskal algorithm!

In pseudocode:

```
size[1..N] = [1,1,...,1]
parent[1..N] = [1,2,...,N]
# 'find' operation in 'union-find'
def find(n):
return parent[n] == n ? n : parent[n] = find(parent[n])
S = T = C = 0
for all edges (a,b,c) sorted according to c:
# find
a = find(a)
b = find(b)
# update values
S += size[a]*size[b]*c
T += size[a]*size[b]
C += c
# union
if size[a] < size[b]:
parent[a] = b
size[b] += size[a]
else:
parent[b] = a
size[a] += size[b]
print C - S/T # use exact division!
```

### Time Complexity:

O(N \log N)