# PROBLEM LINKS

## DIFFICULTY

HARD

## PREREQUISITES

Tree Data Structure, Suffix Array

## PROBLEM

There is a tree of N nodes numbered 1 through N. Let deg(i) be the number of nodes connected to node i. Count the number of **different** pairs (A, B) such that:

- 1 ≤ A, B ≤ N,
- Node B will be visited during the shortest path traversal from node A to node 1.

Let (P, Q) and (R, S) be two pairs. Suppose that the shortest path traversal from node P to node Q visits g cities u_{1}, u_{2}, …, u_{g} (u_{1} = P; u_{g} = Q) and the shortest path traversal from node R to node S visits h cities v_{1}, v_{2}, …, v_{h} (v_{1} = R; v_{h} = S). Then, (P, Q) and (R, S) will be considered **different** if and only if g ≠ h or there is at least one index i such that deg(u_{i}) ≠ deg(v_{i}).

## QUICK EXPLANATION

For each node, consider the sequence of degress of the nodes that are visited on the shortest path traversal to node 1. Consider the sequences as strings over alphabet {1, 2, …, N-1}. Then we will have N strings. The problem can be reduced to finding the number of distinct prefixes from these N string. This can be solved efficiently using suffix array data structure.

## EXPLANATION

Consider each node. For each node i, let L_{i} be the number of nodes visited on the shortest path traversal from node i to node 1. Let u_{1}, u_{2}, …, u_{Li} be the sequence of the visited nodes, where u_{1} = i and u_{Li} = 1. Let S_{i} = deg(u_{1}), deg(u_{2}), …, deg(u_{Li}). Consider each S_{i} as a string over alphabet {1, 2, …, N-1}. Therefore, we will have N strings S_{1}, S_{2}, …, S_{N}.

### Counting Prefixes

We can reduce the problem into finding the number of distinct non-empty prefixes from the N strings. This can be solved efficiently by first sorting the strings lexicographically. For example, suppose the input tree is a straight line of N=5 nodes, where node i and node i+1 are connected. Suppose the degrees of the nodes are deg(1) = 2, deg(2) = 5, deg(3) = 2, deg(4) = 5, deg(5) = 1. We will represent each string as sequence of digit characters without any spaces for convenience. Therefore,

S_{1} = 2

S_{2} = 52

S_{3} = 252

S_{4} = 5252

S_{5} = 15252

Note that the above example is actually **invalid** as we cannot have a tree with such configuration. It is manually created so that the algorithm is easier to explain.

To solve the reduced problem, sort the strings lexicographically:

S_{5} = 15252

S_{1} = 2

S_{3} = 252

S_{2} = 52

S_{4} = 5252

Now, we will start counting the number of different non-empty prefixes. We start with S_{5}. It has 5 prefixes: 1, 15, 152, 1525, and 15252.

We continue to S_{1}. It has only 1 prefix: 2.

We continue again to S_{3}. It has 3 prefixes. However, prefix 2 has been counted before. Therefore, we only find 2 new prefixes: 25 and 252.

We continue once more to S_{2}. It has 2 prefixes: 5 and 52. None of them has been found before.

Finally, we consider S_{4}. It has 4 prefixes. However, prefixes 5 and 52 have been counted before. Therefore, we only find 2 new prefixes: 525 and 5252.

Therefore, the number of different prefixes of this configuration is 5 + 1 + 3 + 2 + 4= 15.

We can conclude that the algorithm is as follows.

// S[] = {S_{1}, S_{2}, ..., S_{N}} sort(S) res = S[1].length() for i = 2; i ≤ N; i++: res += S[i].length() res -= lcp(S[i], S[i-1])

where lcp(A, B) returns the length longest common prefix of A and B, i.e. the largest i such that the first i characters of A are equal to the first i characters of B.

### Sorting Strings

The unexplained part is how to sort the strings. Each string can have at most O(N) characters, so a naive sorting would require O(N^{2} lg N) time. We have to sort the strings with a method that is similar to the method used in suffix array. This method will use the fact that the nodes are arranged in a tree.

First, we sort the strings based on their first 2^{0} = 1 character:

1: S_{5} = **1**5252

2: S_{1} = **2**

2: S_{3} = **2**52

3: S_{4} = **5**252

3: S_{2} = **5**2

Note that equal strings must have equal ranks; for example, since S_{1} and S_{3} have the same first character, their ranks are equal.

Then, we will sort the strings based on their first (at most) 2^{1} = 2 characters. We can use the previous result to do this. Note that each string is actually a **suffix** of the string 15252. Let T_{i} be the suffix that starts from the i-th character. To compare two strings based on their first 2 characters, we can first compare the ranks of their first characters. If they are different, we have their comparison, else we compare the ranks of their second characters.

The rank of the second character of a string can be retrieved in constant time. Suppose we want to know the rank of the second character of S_{4}. Since S_{4} = T_{2}, its second character is actually the first character of T_{5} = S_{3}. Therefore, the rank of the second character of S_{4} is the rank of S_{3} in the previous sorting.

So, instead of sorting the strings we sort N pairs of two values: (the rank of the first character, the rank of the second character). If there is no second character, we can set the value to 0 so that it ranks higher before all characters. For the above strings, the pairs are:

((1, 3), 5)

((2, 0), 1)

((2, 3), 3)

((3, 2), 4)

((3, 2), 2)

Note that we augment the pair to store the index of each string as well. So actually we sort N pairs of (pair(first character’s rank, second character’s rank), index). After the second sorting, the order of the strings becomes:

1: S_{5} = **15**252

2: S_{1} = **2**

3: S_{3} = **25**2

4: S_{2} = **52**

4: S_{4} = **52**52

Next, we will sort the strings based on their first (at most) 2^{2} = 4 characters. For each string, we store a pair of (the rank of the first 2 characters, the rank of the next 2 characters) in a similar way. For example, the rank of the next 2 characters of S_{5} is actually the rank of S_{3} in the previous sorting.

1: S_{5} = **1525**2

2: S_{1} = **2**

3: S_{3} = **252**

4: S_{2} = **52**

5: S_{4} = **5252**

Finally, sort the strings based on their first (at most) 2^{3} = 8 characters.

1: S_{5} = **15252**

2: S_{1} = **2**

3: S_{3} = **252**

4: S_{2} = **52**

5: S_{4} = **5252**

The complexity of one sorting reduces to only O(N lg N) since we need only O(1) time to compare two strings. There are O(lg N) steps, so this sorting part needs O(N lg^{2} N) time.

### Storing Strings

Another unexplained part is how to store the strings in the memory. Since there are at most O(N^{2}) characters, we cannot store them in the memory. Fortunately, we don’t have to store all characters to implement the sorting method explained above. For each string, we only need to store the indices of the strings that are the suffixes starting with the 2^{p}-th character for all 1 ≤ 2^{p} < N. Since the nodes are arranged in a tree, the j-th character of S_{i} is the degree of the (j-1)-th parent of i.

We can use this recurrence to precompute the j-th parent of all nodes, where f[i][j] = the 2^{p}-th parent of node i. In short, the 2^{p}-th parent of node i is the 2^{p-1}-th parent of (the 2^{p-1} parent of node i). We have to make the tree rooted at the node 1 and store the direct parent of each node in parent[] array.

reset all values in f to 0 for i = 1; i ≤ N; i++: f[i][0] = parent[i] for j = 1; 2^{j}< N; j++: for i = 1; i ≤ N; i++: if f[i][j-1] ≠ 0: f[i][j] = f[f[i][j-1]][j-1]

Since there are only O(lg N) possible values for j, we only need O(N lg N) time and space for storing the strings.

### Code

Here is a pseudocode of the sorting method as described above.

for i = 1; i ≤ N; i++: // we can safely set the ranks of the first character of each string // to be the degree of the corresponding node. rank[i][0] = deg(i) for j = 1; j < N; j++: for i = 1; i ≤ N; i++: a = rank[i][j-1] if f[i][j-1] ≠ 0: b = rank[f[i][j-1]][j-1] else: b = 0 data[i] = ((a, b), i) sort(data[1], data[2], ..., data[N]) rank[data[1].second] = 1 for i = 2; i ≤ N; i++: // if the current pair is equal to the previous pair, // the rank should be equal if i > 1 and data[i].first == data[i-1].first: rank[data[i].second] = rank[data[i-1].second] else: rank[data[i].second] = rank[data[i-1].second] + 1 for i = 1; i ≤ N; i++: pos[rank[i][lg N]] = i

The final order of the strings is stored in pos[] array.

### Longest Common Prefix

Finally, the last unexplained part is how to compute the LCP of two strings in an efficient way. We can utilize the rank[][] table produced in the previous sorting algorithm. Suppose we want to compute the length of the LCP of S_{x} and S_{y}. We compute the largest p such that rank[x][p] == rank[y][p], i.e. the largest k such that the first 2^{p} characters of S_{x} are equal to the first 2^{p} characters of S_{y}. That means we have found a common prefix of length 2^{p}. The problem is then reduced to finding the length of the LCP of S_{f[x][p]}, S_{f[y][p]} and can be computed in a similar way.

function lcp(S_{x}, S_{y}): res = 0 for k = lg N; k ≥ 0; k--: if rank[x][k] == rank[y][k]: res += k x = f[x][k] y = f[y][k] return res

### Complexity

The time complexity of the whole solution is O(N lg^{2}N). The space complexity is O(N lg N).

## SETTER’S SOLUTION

Can be found here.

## TESTER’S SOLUTION

Can be found here.