Centroid decomposition of a tree
Let’s reformulate the problem a little bit. For a given tree of N nodes, where each node v is colored to color c[v], the task is to answer Q queries. A single query consists of two integers v and c and asks for the distance from node v to the closest node with color c from v. Notice that colors corresponds to counties in the original statement of the problem.
Process queries off-line. Perform centroid decomposition. When a centroid is found for some part of the tree, update the answers for all queries from each node v in the current part of decomposition with the distance to the closest node with asked color lying in any of others subtrees of the current centroid.
In the first subtask both N and Q are not greater than 100, so one can try to solve each query in O(N^2) method, like for example iterating over all nodes with the color asked in the query and for each such node compute the distance from that node to the node given in the query by performing DFS on the whole tree.
In the second subtask both N and Q are not greater than 4000, so handling a single query in O(N) does the job. For a single query asking for the closest node of color c to node v, one can perform DFS from v and during traversing the tree keep track and update the distance to the closest node with color c.
In the third subtask both N and Q are up to 10^5, so any methods working in O(Q \cdot \sqrt N) should work fine as well as slower implementation of the intended approach for the fourth subtask described below. Using Sqrt-optimisation should work perfectly fine there as well as maybe methods based on Heavy-light decomposition. The problem with these constraints is very similar to problem Xenia and Tree from Codeforces (in our problem we don’t have to handle update operations). The discussion as well as the short editorial to the mentioned problem is also available on Codefores here.
In the last subtask both N and Q can be as big as 5 \cdot 10^5 and the intention was to not allow any methods suggested for the third subtask to pass. We are going to process queries offline and general idea here is very simple and based on the following fact:
Let’s take a look at a single query asking for the closest node of color c to the node v and let u be this closest node. Let also r be a root of the tree (for now chosen arbitrary). Moreover let’s also assume that we computed the closest node of each color to r using DFS and distances to all nodes from r. Then there are two cases to consider: either v and u are in the same subtree of r or not. If they are in different subtrees then the distance between them can be computed as the distance from r to v plus the distance to the closest node of color c from r (notice that u has to be that node). Otherwise, if v and u are in the same subtree of r, then the answer for this query can be computed recursively by using the same approach to this subtree isolated from the rest of the tree.
However, this approach has a running time of O(N^2), because we can have a bad luck when r is chosen not carefully, which may results in reducing the size of subproblems to solve just by a constant. If only r can be chosen in such a way that none of its subtrees has more than half of the nodes that the original tree has, then each subtree will generate a subproblem of size at most half the size of the original problem, so at each level of recursion we will be reducing the size of the initial problems by at least factor of 2, which leads to O(N \cdot \log N) time complexity. The only remaining thing to do is to assure that r is chosen in such a good way and this can be done in linear time in terms of the number of nodes in the tree using technique called Centroid decomposition of a tree. The method is well described there in this presentation as well as in this GeeksforGeeks article based on the presentation.