PROBLEM LINKS
DIFFICULTY
HARD
PREREQUISITES
Tree Data Structure, Suffix Array
PROBLEM
There is a tree of N nodes numbered 1 through N. Let deg(i) be the number of nodes connected to node i. Count the number of different pairs (A, B) such that:
- 1 ≤ A, B ≤ N,
- Node B will be visited during the shortest path traversal from node A to node 1.
Let (P, Q) and (R, S) be two pairs. Suppose that the shortest path traversal from node P to node Q visits g cities u1, u2, …, ug (u1 = P; ug = Q) and the shortest path traversal from node R to node S visits h cities v1, v2, …, vh (v1 = R; vh = S). Then, (P, Q) and (R, S) will be considered different if and only if g ≠ h or there is at least one index i such that deg(ui) ≠ deg(vi).
QUICK EXPLANATION
For each node, consider the sequence of degress of the nodes that are visited on the shortest path traversal to node 1. Consider the sequences as strings over alphabet {1, 2, …, N-1}. Then we will have N strings. The problem can be reduced to finding the number of distinct prefixes from these N string. This can be solved efficiently using suffix array data structure.
EXPLANATION
Consider each node. For each node i, let Li be the number of nodes visited on the shortest path traversal from node i to node 1. Let u1, u2, …, uLi be the sequence of the visited nodes, where u1 = i and uLi = 1. Let Si = deg(u1), deg(u2), …, deg(uLi). Consider each Si as a string over alphabet {1, 2, …, N-1}. Therefore, we will have N strings S1, S2, …, SN.
Counting Prefixes
We can reduce the problem into finding the number of distinct non-empty prefixes from the N strings. This can be solved efficiently by first sorting the strings lexicographically. For example, suppose the input tree is a straight line of N=5 nodes, where node i and node i+1 are connected. Suppose the degrees of the nodes are deg(1) = 2, deg(2) = 5, deg(3) = 2, deg(4) = 5, deg(5) = 1. We will represent each string as sequence of digit characters without any spaces for convenience. Therefore,
S1 = 2
S2 = 52
S3 = 252
S4 = 5252
S5 = 15252
Note that the above example is actually invalid as we cannot have a tree with such configuration. It is manually created so that the algorithm is easier to explain.
To solve the reduced problem, sort the strings lexicographically:
S5 = 15252
S1 = 2
S3 = 252
S2 = 52
S4 = 5252
Now, we will start counting the number of different non-empty prefixes. We start with S5. It has 5 prefixes: 1, 15, 152, 1525, and 15252.
We continue to S1. It has only 1 prefix: 2.
We continue again to S3. It has 3 prefixes. However, prefix 2 has been counted before. Therefore, we only find 2 new prefixes: 25 and 252.
We continue once more to S2. It has 2 prefixes: 5 and 52. None of them has been found before.
Finally, we consider S4. It has 4 prefixes. However, prefixes 5 and 52 have been counted before. Therefore, we only find 2 new prefixes: 525 and 5252.
Therefore, the number of different prefixes of this configuration is 5 + 1 + 3 + 2 + 4= 15.
We can conclude that the algorithm is as follows.
// S[] = {S1, S2, ..., SN} sort(S) res = S[1].length() for i = 2; i ≤ N; i++: res += S[i].length() res -= lcp(S[i], S[i-1])
where lcp(A, B) returns the length longest common prefix of A and B, i.e. the largest i such that the first i characters of A are equal to the first i characters of B.
Sorting Strings
The unexplained part is how to sort the strings. Each string can have at most O(N) characters, so a naive sorting would require O(N2 lg N) time. We have to sort the strings with a method that is similar to the method used in suffix array. This method will use the fact that the nodes are arranged in a tree.
First, we sort the strings based on their first 20 = 1 character:
1: S5 = 15252
2: S1 = 2
2: S3 = 252
3: S4 = 5252
3: S2 = 52
Note that equal strings must have equal ranks; for example, since S1 and S3 have the same first character, their ranks are equal.
Then, we will sort the strings based on their first (at most) 21 = 2 characters. We can use the previous result to do this. Note that each string is actually a suffix of the string 15252. Let Ti be the suffix that starts from the i-th character. To compare two strings based on their first 2 characters, we can first compare the ranks of their first characters. If they are different, we have their comparison, else we compare the ranks of their second characters.
The rank of the second character of a string can be retrieved in constant time. Suppose we want to know the rank of the second character of S4. Since S4 = T2, its second character is actually the first character of T5 = S3. Therefore, the rank of the second character of S4 is the rank of S3 in the previous sorting.
So, instead of sorting the strings we sort N pairs of two values: (the rank of the first character, the rank of the second character). If there is no second character, we can set the value to 0 so that it ranks higher before all characters. For the above strings, the pairs are:
((1, 3), 5)
((2, 0), 1)
((2, 3), 3)
((3, 2), 4)
((3, 2), 2)
Note that we augment the pair to store the index of each string as well. So actually we sort N pairs of (pair(first character’s rank, second character’s rank), index). After the second sorting, the order of the strings becomes:
1: S5 = 15252
2: S1 = 2
3: S3 = 252
4: S2 = 52
4: S4 = 5252
Next, we will sort the strings based on their first (at most) 22 = 4 characters. For each string, we store a pair of (the rank of the first 2 characters, the rank of the next 2 characters) in a similar way. For example, the rank of the next 2 characters of S5 is actually the rank of S3 in the previous sorting.
1: S5 = 15252
2: S1 = 2
3: S3 = 252
4: S2 = 52
5: S4 = 5252
Finally, sort the strings based on their first (at most) 23 = 8 characters.
1: S5 = 15252
2: S1 = 2
3: S3 = 252
4: S2 = 52
5: S4 = 5252
The complexity of one sorting reduces to only O(N lg N) since we need only O(1) time to compare two strings. There are O(lg N) steps, so this sorting part needs O(N lg2 N) time.
Storing Strings
Another unexplained part is how to store the strings in the memory. Since there are at most O(N2) characters, we cannot store them in the memory. Fortunately, we don’t have to store all characters to implement the sorting method explained above. For each string, we only need to store the indices of the strings that are the suffixes starting with the 2p-th character for all 1 ≤ 2p < N. Since the nodes are arranged in a tree, the j-th character of Si is the degree of the (j-1)-th parent of i.
We can use this recurrence to precompute the j-th parent of all nodes, where f[i][j] = the 2p-th parent of node i. In short, the 2p-th parent of node i is the 2p-1-th parent of (the 2p-1 parent of node i). We have to make the tree rooted at the node 1 and store the direct parent of each node in parent[] array.
reset all values in f to 0 for i = 1; i ≤ N; i++: f[i][0] = parent[i] for j = 1; 2j < N; j++: for i = 1; i ≤ N; i++: if f[i][j-1] ≠ 0: f[i][j] = f[f[i][j-1]][j-1]
Since there are only O(lg N) possible values for j, we only need O(N lg N) time and space for storing the strings.
Code
Here is a pseudocode of the sorting method as described above.
for i = 1; i ≤ N; i++: // we can safely set the ranks of the first character of each string // to be the degree of the corresponding node. rank[i][0] = deg(i) for j = 1; j < N; j++: for i = 1; i ≤ N; i++: a = rank[i][j-1] if f[i][j-1] ≠ 0: b = rank[f[i][j-1]][j-1] else: b = 0 data[i] = ((a, b), i) sort(data[1], data[2], ..., data[N]) rank[data[1].second] = 1 for i = 2; i ≤ N; i++: // if the current pair is equal to the previous pair, // the rank should be equal if i > 1 and data[i].first == data[i-1].first: rank[data[i].second] = rank[data[i-1].second] else: rank[data[i].second] = rank[data[i-1].second] + 1 for i = 1; i ≤ N; i++: pos[rank[i][lg N]] = i
The final order of the strings is stored in pos[] array.
Longest Common Prefix
Finally, the last unexplained part is how to compute the LCP of two strings in an efficient way. We can utilize the rank[][] table produced in the previous sorting algorithm. Suppose we want to compute the length of the LCP of Sx and Sy. We compute the largest p such that rank[x][p] == rank[y][p], i.e. the largest k such that the first 2p characters of Sx are equal to the first 2p characters of Sy. That means we have found a common prefix of length 2p. The problem is then reduced to finding the length of the LCP of Sf[x][p], Sf[y][p] and can be computed in a similar way.
function lcp(Sx, Sy): res = 0 for k = lg N; k ≥ 0; k--: if rank[x][k] == rank[y][k]: res += k x = f[x][k] y = f[y][k] return res
Complexity
The time complexity of the whole solution is O(N lg2N). The space complexity is O(N lg N).
SETTER’S SOLUTION
Can be found here.
TESTER’S SOLUTION
Can be found here.