DIFTRIP - Editorial

fushar · December 11, 2012, 6:21pm

PROBLEM LINKS

Practice

Contest

DIFFICULTY

HARD

PREREQUISITES

Tree Data Structure, Suffix Array

PROBLEM

There is a tree of N nodes numbered 1 through N. Let deg(i) be the number of nodes connected to node i. Count the number of different pairs (A, B) such that:

1 ≤ A, B ≤ N,
Node B will be visited during the shortest path traversal from node A to node 1.

Let (P, Q) and (R, S) be two pairs. Suppose that the shortest path traversal from node P to node Q visits g cities u₁, u₂, …, u_g (u₁ = P; u_g = Q) and the shortest path traversal from node R to node S visits h cities v₁, v₂, …, v_h (v₁ = R; v_h = S). Then, (P, Q) and (R, S) will be considered different if and only if g ≠ h or there is at least one index i such that deg(u_i) ≠ deg(v_i).

QUICK EXPLANATION

For each node, consider the sequence of degress of the nodes that are visited on the shortest path traversal to node 1. Consider the sequences as strings over alphabet {1, 2, …, N-1}. Then we will have N strings. The problem can be reduced to finding the number of distinct prefixes from these N string. This can be solved efficiently using suffix array data structure.

EXPLANATION

Consider each node. For each node i, let L_i be the number of nodes visited on the shortest path traversal from node i to node 1. Let u₁, u₂, …, u_{L_i} be the sequence of the visited nodes, where u₁ = i and u_{L_i} = 1. Let S_i = deg(u₁), deg(u₂), …, deg(u_{L_i}). Consider each S_i as a string over alphabet {1, 2, …, N-1}. Therefore, we will have N strings S₁, S₂, …, S_N.

Counting Prefixes

We can reduce the problem into finding the number of distinct non-empty prefixes from the N strings. This can be solved efficiently by first sorting the strings lexicographically. For example, suppose the input tree is a straight line of N=5 nodes, where node i and node i+1 are connected. Suppose the degrees of the nodes are deg(1) = 2, deg(2) = 5, deg(3) = 2, deg(4) = 5, deg(5) = 1. We will represent each string as sequence of digit characters without any spaces for convenience. Therefore,

S₁ = 2

S₂ = 52

S₃ = 252

S₄ = 5252

S₅ = 15252

Note that the above example is actually invalid as we cannot have a tree with such configuration. It is manually created so that the algorithm is easier to explain.

To solve the reduced problem, sort the strings lexicographically:

S₅ = 15252

S₁ = 2

S₃ = 252

S₂ = 52

S₄ = 5252

Now, we will start counting the number of different non-empty prefixes. We start with S₅. It has 5 prefixes: 1, 15, 152, 1525, and 15252.

We continue to S₁. It has only 1 prefix: 2.

We continue again to S₃. It has 3 prefixes. However, prefix 2 has been counted before. Therefore, we only find 2 new prefixes: 25 and 252.

We continue once more to S₂. It has 2 prefixes: 5 and 52. None of them has been found before.

Finally, we consider S₄. It has 4 prefixes. However, prefixes 5 and 52 have been counted before. Therefore, we only find 2 new prefixes: 525 and 5252.

Therefore, the number of different prefixes of this configuration is 5 + 1 + 3 + 2 + 4= 15.

We can conclude that the algorithm is as follows.

// S[] = {S₁, S₂, ..., S_N}
sort(S)
res = S[1].length()
for i = 2; i ≤ N; i++:
	res += S[i].length()
	res -= lcp(S[i], S[i-1])

where lcp(A, B) returns the length longest common prefix of A and B, i.e. the largest i such that the first i characters of A are equal to the first i characters of B.

Sorting Strings

The unexplained part is how to sort the strings. Each string can have at most O(N) characters, so a naive sorting would require O(N² lg N) time. We have to sort the strings with a method that is similar to the method used in suffix array. This method will use the fact that the nodes are arranged in a tree.

First, we sort the strings based on their first 2⁰ = 1 character:

1: S₅ = 15252

2: S₁ = 2

2: S₃ = 252

3: S₄ = 5252

3: S₂ = 52

Note that equal strings must have equal ranks; for example, since S₁ and S₃ have the same first character, their ranks are equal.

Then, we will sort the strings based on their first (at most) 2¹ = 2 characters. We can use the previous result to do this. Note that each string is actually a suffix of the string 15252. Let T_i be the suffix that starts from the i-th character. To compare two strings based on their first 2 characters, we can first compare the ranks of their first characters. If they are different, we have their comparison, else we compare the ranks of their second characters.

The rank of the second character of a string can be retrieved in constant time. Suppose we want to know the rank of the second character of S₄. Since S₄ = T₂, its second character is actually the first character of T₅ = S₃. Therefore, the rank of the second character of S₄ is the rank of S₃ in the previous sorting.

So, instead of sorting the strings we sort N pairs of two values: (the rank of the first character, the rank of the second character). If there is no second character, we can set the value to 0 so that it ranks higher before all characters. For the above strings, the pairs are:

((1, 3), 5)

((2, 0), 1)

((2, 3), 3)

((3, 2), 4)

((3, 2), 2)

Note that we augment the pair to store the index of each string as well. So actually we sort N pairs of (pair(first character’s rank, second character’s rank), index). After the second sorting, the order of the strings becomes:

1: S₅ = 15252

2: S₁ = 2

3: S₃ = 252

4: S₂ = 52

4: S₄ = 5252

Next, we will sort the strings based on their first (at most) 2² = 4 characters. For each string, we store a pair of (the rank of the first 2 characters, the rank of the next 2 characters) in a similar way. For example, the rank of the next 2 characters of S₅ is actually the rank of S₃ in the previous sorting.

1: S₅ = 15252

2: S₁ = 2

3: S₃ = 252

4: S₂ = 52

5: S₄ = 5252

Finally, sort the strings based on their first (at most) 2³ = 8 characters.

1: S₅ = 15252

2: S₁ = 2

3: S₃ = 252

4: S₂ = 52

5: S₄ = 5252

The complexity of one sorting reduces to only O(N lg N) since we need only O(1) time to compare two strings. There are O(lg N) steps, so this sorting part needs O(N lg² N) time.

Storing Strings

Another unexplained part is how to store the strings in the memory. Since there are at most O(N²) characters, we cannot store them in the memory. Fortunately, we don’t have to store all characters to implement the sorting method explained above. For each string, we only need to store the indices of the strings that are the suffixes starting with the 2^p-th character for all 1 ≤ 2^p < N. Since the nodes are arranged in a tree, the j-th character of S_i is the degree of the (j-1)-th parent of i.

We can use this recurrence to precompute the j-th parent of all nodes, where f[i][j] = the 2^p-th parent of node i. In short, the 2^p-th parent of node i is the 2^p-1-th parent of (the 2^p-1 parent of node i). We have to make the tree rooted at the node 1 and store the direct parent of each node in parent[] array.

reset all values in f to 0
for i = 1; i ≤ N; i++:
    f[i][0] = parent[i]
for j = 1; 2^j < N; j++:
    for i = 1; i ≤ N; i++:
        if f[i][j-1] ≠ 0:
            f[i][j] = f[f[i][j-1]][j-1]

Since there are only O(lg N) possible values for j, we only need O(N lg N) time and space for storing the strings.

Code

Here is a pseudocode of the sorting method as described above.

for i = 1; i ≤ N; i++:
    // we can safely set the ranks of the first character of each string
    // to be the degree of the corresponding node.
    rank[i][0] = deg(i)
    
for j = 1; j < N; j++:
    for i = 1; i ≤ N; i++:
        a = rank[i][j-1]
        if f[i][j-1] ≠ 0:
            b = rank[f[i][j-1]][j-1]
        else:
            b = 0    
        data[i] = ((a, b), i)
    
    sort(data[1], data[2], ..., data[N])
    
    rank[data[1].second] = 1
    for i = 2; i ≤ N; i++:
        // if the current pair is equal to the previous pair,
        // the rank should be equal
        if i > 1 and data[i].first == data[i-1].first:
            rank[data[i].second] = rank[data[i-1].second]
        else:
            rank[data[i].second] = rank[data[i-1].second] + 1
    
    for i = 1; i ≤ N; i++:
        pos[rank[i][lg N]] = i

The final order of the strings is stored in pos[] array.

Longest Common Prefix

Finally, the last unexplained part is how to compute the LCP of two strings in an efficient way. We can utilize the rank[][] table produced in the previous sorting algorithm. Suppose we want to compute the length of the LCP of S_x and S_y. We compute the largest p such that rank[x][p] == rank[y][p], i.e. the largest k such that the first 2^p characters of S_x are equal to the first 2^p characters of S_y. That means we have found a common prefix of length 2^p. The problem is then reduced to finding the length of the LCP of S_f[x][p], S_f[y][p] and can be computed in a similar way.

function lcp(S_x, S_y):
    res = 0
    for k = lg N; k ≥ 0; k--:
        if rank[x][k] == rank[y][k]:
            res += k
            x = f[x][k]
            y = f[y][k]
    return res

Complexity

The time complexity of the whole solution is O(N lg²N). The space complexity is O(N lg N).

SETTER’S SOLUTION

Can be found here.

TESTER’S SOLUTION

Can be found here.

bcurcio · December 11, 2012, 6:39pm

I tried using Ukkonen’s suffix tree, but it required to much memory, so I took a slightly different approach that tries to factorize prefixes. The idea comes from making an indeterministic automaton to a deterministic one.

First I convert the tree to a DAG and I create a fake ‘root’ node and connect it to all the nodes in the tree (this way, I can create substrings starting from any character).

Next I travel the DAG and for each node, if more than one child has the same letter, I create a new node with that letter and connect it to all the childs of the nodes with the same letter.

Once this is done, I have a DAG that each node, no two childs have the same letter. Since I don’t have common prefixes I can run a DP on DAG to get the number of distinct paths.

triplem · December 12, 2012, 1:54am

Virtually the same code from any accepted solution of http://www.codechef.com/APRIL12/problems/TSUBSTR will work here; that problem is identical (other than an extra part on top).

xcwgf666 · December 12, 2012, 2:40am

Yes, you are right. I have noticed that during the contest, I have pointed that out in my AC code. Till the end of the contest I’ve thought that there is some linear (or fast nlogn) solution that uses the fact that the “letter” written in the vertices corresponds to it’s number of neighbors.

mugurelionut · March 16, 2013, 4:27am

I know this is an old discussion, but I’ve just noticed now that the time complexity of the official solution is O(N * log^2(N)). That’s only because at each step of the suffix array construction, an N*log(N) sorting algorithm is used. Instead, an O(N) sorting algorithm can be used there (just like in the standard suffix array construction algorithm), to get the overall time complexity to O(N * log(N)) (actually, that’s what I did in my solution, but I guess the running time doesn’t really show it).

selfcompiler · June 28, 2013, 9:18am

rank[data[1].second] = 1 this part is not clear bcz rank is a 2D array how it can access the rank with only 1 index and other part is

enter code here
for j = 1; j < N; j++:
for i = 1; i ≤ N; i++:
    a = rank[i][j-1]
    if f[i][j-1] ≠ 0:
        b = rank[f[i][j-1]][j-1]
    else:
        b = 0    
    data[i] = ((a, b), i)

from the above pseudo code i think the rank array size is NN but it can be reduced with just only two columns to (N2) if i did this the how can i compute the LCP from the above pseudo code where the requirement of rank is N*[log2(N)+1]; can any explain it @admin and any other user who is intrested what is size of rank array i used and how i used

africamonkey · February 27, 2017, 8:12pm

Tester’s program got Time Limit Exceed in the following case:

100000
1 2
2 3
3 4
...
99998 99999
99999 100000