### PROBLEM LINK:

**Author:** Misha Chorniy

**Tester:** Kevin Atienza

**Translators:** Sergey Kulik (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)

**Editorialist:** Kevin Atienza

### DIFFICULTY:

Medium-Hard

### PREREQUISITES:

Dynamic programming, matrix exponentiation, Kirchhoff’s matrix-tree theorem, Chinese remainder theorem, Garner’s algorithm, big integers

### PROBLEM:

There are N stars, and M bidirectional space roads in space that connect pairs of stars together. Travelling through one space road takes 1 day.

Chef has spaceships. Each spaceship is designated two stars, and a fixed route between these stars. A **route** is a path that is traversable within at most K days. Roads may be traversed any number times. Also, there is a spaceship for every possible route between stars.

The Empire deployed a weapon capable of destroying **exactly** X spaceships that travel between the same pair of stars. The Empire used the weapon as many times as possible, so after some time, there aren’t enough groups of spaceships any more to destroy.

Chef decided to retire most of the surviving spaceships. Chef wants to retire as many spaceships as possible, but Chef wants to do this such that there is still a way to get from any star to another using spaceships. How many ways are there for Chef to retire spaceships so that this is satisfied?

### QUICK EXPLANATION:

The answer is the number of spanning trees of a related (multi)graph, obtained by the following construction:

- Construct a graph with the same set of N nodes.
- For each pair of nodes (i,j), add P(i,j) \bmod X edges, where P(i,j) is the number of paths of length at most K between i and j.

To compute P(i,j): Let G be the adjacency matrix of the input graph. Compute the matrix M = G^K + G^{K-1} + \ldots + G^2 + G^1 + G^0. Then P(i,j) = M_{i,j}. Note that M can be computed in O(\log K) time using a version of “fast exponentiation”.

Finally, to compute the number of spanning trees of a graph, you can use Kirchhoff’s matrix-tree theorem. It says that the number of spanning trees can be obtained as the determinant of a specific matrix related to the graph.

Unfortunately, the number of spanning trees can be very large, and may not fit a 64-bit variable. One idea is to use big ints and fractions to compute the determinant, but this is slow. Instead, a better solution is to compute the determinant modulo ~150 primes around 10^9, and reconstructing the full answer using the Chinese remainder theorem. (Garner’s algorithm can be used for this.)

### EXPLANATION:

# Number of spaceships

The first step for our solution is to compute the number of spaceships between each pair of stars (i,j). This is equal to **the number of paths between i and j with length at most K.** Let’s denote this number by P(i,j). We also denote by P_k(i,j) the number of paths between i and j with length **exactly k**. Thus, we have the relationship:

Now, what is P_k(i,j)? We can use **dynamic programming** to figure this one. Consider a path of length k from i to j. Let t be the second-to-last node visited. Then the path consists of two parts:

- The path of length k-1 from i to t.
- The path of length 1 from t to j. In other words, t and j must be adjacent.

Thus, the number of paths of length k from i to j *whose second-to-last node is t* is equal to P_{k-1}(i,t)! Summing across all possible values of t, we get:

For the base case, we can have k = 0. In this case, the path is of length 0, so P_0(i,j) = 1 if i = j, and P_0(i,j) = 0 if i \not= j.

Thus, we now have a dynamic programming way to compute P_k(i,j) for every k! Unfortunately, K is very large, so this solution isn’t feasible.

We can restate the recurrence above in a different way. Let G be the N\times N adjacency matrix of the graph, i.e., G(i,j) = 1 if i and j are adjacent, and G(i,j) = 0 otherwise. Then the recurrence above is equivalent to:

However, if we view each P_k as an N\times N matrix itself, then this is just the equation of **matrix multiplication**! In other words, the above is equivalent to

where \cdot denotes matrix multiplication. Also, based on the above, P_0 must be the N\times N identity matrix (denoted I_N). Thus, we have:

- P_0 = I_N
- P_1 = P_0 \cdot G = G
- P_2 = P_1 \cdot G = G\cdot G = G^2
- P_3 = P_2 \cdot G = G^2\cdot G = G^3
- P_4 = P_3 \cdot G = G^3\cdot G = G^4
- P_5 = P_4 \cdot G = G^4\cdot G = G^5
- etc.

More generally, we have P_k = G^k, where G^k denotes the matrix G raised to the power k! This is certainly an elegant formulation of the above, but more importantly, it gives us a faster way of computing P(i,j). Notice that if we also view P as an N\times N matrix, then the relationship above is equivalent to

This sum looks very much like a **geometric series**, so it seems possible that a fast formula for it exists. Unfortunately, the standard formula 1 + x + x^2 + \ldots + x^k = \frac{x^{k+1} - 1}{x - 1} doesn’t work, because we will have to define how to divide matrices, and even if it is possible, G - I_N might not be invertible. This sucks because with such a formula, we could possibly compute the answer using only O(\log k) multiplications with fast exponentiation. Thankfully, there is another way to compute this series using O(\log k) multiplications *without doing any division*.

Let’s define F(k) = G^0 + G^1 + G^2 + \ldots + G^{k-1}. Thus, P = F(K+1), so we can focus on computing F(k). Recursively, this can be computed as:

But the more important insight is the following relationship between F(2l) and F(l):

This relation gives us the following “fast exponentiation” analog algorithm to compute F(k):

- If k = 0, return O_N. (the zero matrix)
- If k is odd, compute F(k) = I_N + G\cdot F(k-1).
- If k is even, compute F(k) = (I_N + G^{k/2})\cdot F(k/2).

Unfortunately, this doesn’t quite run in O(\log k) multiplications because we will need to compute G^{k/2} every time. Computing G^{k/2} using normal fast exponentiation every time will give us O(\log^2 k) multiplications. But it can actually be reduced to just O(\log k) by **computing G^k alongside F(k)**. More specifically, instead of computing just F(k), we compute **F(k) and G^k**:

- If k = 0, then F(k) = O_N and G^k = I_N.
- If k is odd, first compute F(k-1) and G^{k-1}. Then F(k) = I_N + G\cdot F(k-1) and G^k = G\cdot G^{k-1}.
- If k is even, first compute F(k/2) and G^{k/2}. Then F(k) = (I_N + G^{k/2})\cdot F(k/2) and G^k = G^{k/2}\cdot G^{k/2}.

This now uses just O(\log k) multiplications!

Since naïve matrix multiplication runs in O(N^3) time, the overall running time of computing P := F(K+1) is then O(N^3 \log K).

# Number of ways to retire spaceships

After building the spaceships, the Empire’s weapon now destroys most of them. Specifically, every group of X spaceships travelling between the same pair of stars will be destroyed. This simply means you should reduce each P(i,j) **modulo X**!

At this point, Chef now wants to retire as many spaceships as possible such that it is still possible to get from any star to any other star.

Given a multigraph, what is the maximum number of edges you can remove so that it is still connected? We can rephrase this as: Given a multigraph, what is the minimum number of edges you can keep so that the graph is connected? The latter version is much easier to answer: The answer is simply N-1, and the remaining edges must form a **tree**, actually a **spanning tree**. Thus, the answer to this problems is the number of spanning trees of the multigraph where there are P(i,j) \bmod X edges between every pair of nodes (i,j)!

Counting spanning trees is multigraphs is a standard problem which can be solved by the well-known **Kirchhoff’s matrix-tree theorem**. It says that the number of spanning trees can be obtained as a determinant of a specific matrix:

- Construct a new N\times N matrix L, called the
**Laplacian matrix**, as follows: L_{i,i} is the degree of node i, and if there are k edges between i and j, then L_{i,j} = -k. - Remove the last row and column of the matrix, leaving an (N-1)\times (N-1) matrix.
- The number of spanning trees is the determinant of this matrix.

This is quite elegant in my opinion, and its proof is actually accessible/understandablem. A proof outline is given in its Wikipedia page.

Converting this into an algorithm is quite straightforward. L itself is quite easy to construct. The real challenge lies in computing the determinant. One can usually do this in O(N^3) with **Gaussian elimination**. However, we soon encounter a few problems when doing Gaussian elimination:

- The answer can be very large. Indeed, It can be proven that the maximum possible answer is (X-1)^{N-1}\cdot N^{N-2}. (Note that N^{N-2} is the number of spanning trees in a complete graph of N nodes.) It means we will need to represent the coefficients using
**big ints**. - What’s more, the coefficients are not guaranteed to be integers! This means we may have to represent the coefficients as fractions of big ints. Normal Gaussian elimination avoids this by representing the coefficients as floating-point numbers, but in this problem, we need the exact answer, so we can’t do that.

This means that the running time is not just O(N^3), but O(N^5) (assuming N and X fits in a machine word), because our intermediate numbers will have O(N) digits, and we will need to multiply them in O(N^2) time. This might be very slow for the final subtask. I tried implementing it but I didn’t get it accepted no matter how hard I tried. Maybe you were able to get this method accepted, but instead we will show you a faster way.

# Computing the big answer

Note that the problem will be much easier if the problem only asked for the answer **modulo some prime p**, say the usual 10^9 + 7. We can perform Gaussian elimination in the world "modulo p" just fine, and what we get will be the determinant of the matrix modulo p. This works because division by a nonzero number modulo p is possible and well-defined. This is not true when the modulus is not prime. Another advantage of only asking the answer modulo 10^9 + 7, is that we don’t need bigints to compute the answer!

Now, the problem asks for the full answer. But suppose we know that the answer is less than 10^9. Note that Gaussian multiplication doesn’t guarantee that intermediate values are smaller than the final answer, so you’ll probably still need big ints and fractions to do it. But instead, we can take advantage of the fact that if x < 10^9, then x \bmod (10^9 + 7) = x, so we can work modulo 10^9 + 7 instead! This saves us the need to use big ints and fractions.

Note that we can adapt this method as long as we can put an upper bound to the answer. Specifically, suppose we know that the answer is less than some upper bound, U. Then we can select any modulus M > U (not necessarily prime) and just work modulo M, and the answer will still be correct! Unfortunately, in the current problem, the upper bound can be very, very large. Our best upper bound so far is 7777776^{99}\cdot 100^{98}, and it has 879 digits. Sure, we can find a prime number with more than 879 digits and work modulo that prime, but this is still slow, because multiplying numbers that large is slow! (And actually, the running time is still O(N^5).)

The solution is to instead work modulo *many small primes p_1, p_2, \ldots, p_m* and then combine the results using **Chinese remainder theorem**. Specifically,

- Select a bunch of primes p_1, p_2, p_3, \ldots, p_m such that the product p_1\cdot p_2\cdot p_3\cdots p_m has more than 879 digits. Ideally, each p_i must be around 10^9 so we can work modulo p_i without needing big ints.
- Compute the determinant of the matrix modulo each p_i.
- Use the
**Chinese remainder theorem**to compute the determinant modulo p_1\cdot p_2\cdot p_3\cdots p_m. Since this modulus is greater than our upper bound, the result is now the full determinant!

Well, this works, but we still need to outline how to do that last step, which will inevitably require bigints. We will describe one way to do it in O(m^2) time later on.

Now, how fast does this run? Note that we’re not multiplying big numbers anymore, except in the last step, so each determinant is computed in actual O(N^3) time. Thus, the running time is O(mN^3 + m^2). But since the final result has O(N) digits, we can select the primes such that m = O(N). Thus, the whole running time is O(N^4), which is at least an order of magnitude faster than O(N^5)!

All that remains is to describe that last step.

# Chinese remainder theorem

We have m congruence relations, where the i th relation is

where the $p_i$s are distinct primes. What we want is to compute x modulo p_1\cdot p_2\cdot p_3\cdots p_m. The Chinese remainder theorem guarantees the uniqueness of this value.

The basic version of the Chinese remainder theorem involves only two congruences. Suppose we want to find x modulo m_1 and m_2, given that

and

and m_1 and m_2 are coprime. Then the theorem says that:

where n_2 is the inverse of m_2 modulo m_1, and n_1 is the inverse of m_1 modulo m_2. For more than two congruences, repeatedly apply this version.

But since we’re talking about lots of primes p_i, this means that if we apply the theorem naïvely we will need to compute inverses of numbers modulo some big integers, and that sounds quite difficult. Fortunately, there’s another way to combine congruences without this inverse computation. This is called **Garner’s algorithm**, and we’ll show how it works for the case m = 4, i.e. there are four congruence relations, and we want to compute x modulo p_1p_2p_3p_4.

Notice that we can write x modulo p_1p_2p_3p_4 uniquely in the form

where 0 \le x_i < p_i for every i. Thus, our goal now is to compute these $x_i$s. We can compute them one by one:

- Reducing the congruence modulo p_1, we find that r_1 \equiv x_1 \pmod{p_1}, so we now have x_1.
- Reducing the congruence modulo p_2, we find that r_2 \equiv x_1 + p_1x_2 \pmod{p_2}, or x_2 \equiv (r_2 - x_1)p_1^{-1} \pmod{p_2}, so we now have x_2.
- Reducing the congruence modulo p_3, we find that r_3 \equiv x_1 + p_1x_2 + p_1p_2x_3 \pmod{p_3}, or x_3 \equiv (r_3 - x_1 - p_1x_2)(p_1p_2)^{-1} \pmod{p_3}, so we now have x_3.
- Reducing the congruence modulo p_4, we find that r_4 \equiv x_1 + p_1x_2 + p_1p_2x_3 + p_1p_2p_3x_4 \pmod{p_4}, or x_4 \equiv (r_4 - x_1 - p_1x_2 - p_1p_2x_3)(p_1p_2p_3)^{-1} \pmod{p_4}, so we now have x_4.

Notice that nowhere in this solution did we need to use big ints whatsoever! We will only need big ints to compute x itself from

but here, the most complex operation we need is just multiplication! We don’t even need to implement full-fledged big-int multiplication, because each multiplicand is around 10^9, so we just need to implement multiplication by a “small” number! This is the main advantage of Garner’s algorithm.

The running time of this algorithm is O(m^2), as long as we write the Garner relation as

### Time Complexity:

O(N^3 \log K + N^4)