It seems to be a standard problem but I am not able to think of any optimize solution neither could I find using search.

The problem is such that there are N vertices of a tree given all isolated in the beginning. An edge can be introduced any time between any two vertices. Gradually many individual trees or forest is created. Any time between the introduction of edges the distance between any two nodes is asked. How to solve each distance query optimally if the number of distance queries and edge introduction queries are large.

Given a tree of N vertices you can obtain the distance between 2 nodes in \mathcal{O}(\log N) by computing their LCA using preprocessing of their $2^i$th ancestors and depth.

Let each vertex store the root of its own tree. When an edge is added between two trees, traverse the smaller tree from its root and for each vertex in this tree set its root to the larger tree’s root and also calculate its new $2^i$th ancestors and depth. By using the small-to-large trick you can be sure that no vertex will face this recalculation more than \mathcal{O}(\log N) times.

Assuming both queries are in the order of \mathcal{O}(N), overall complexity is \mathcal{O}(N\log^2 N).

Thanks meooow. I thought of similar lines but missing that small-to-large trick. I’ll try to implement this.

I would like to attempt it too, can you share the problem link?