I think, here the only possible solution can be using Heavy Light Decomposition and segment trees.

Is there any other way to do it?

Yes, Mo’s algorithm on Trees + MO’s with Updates.

A recent problem in hackerearth came on this trick. You can see my solution here, which has the link to the problem and the comments in the code as well.

We can’t use a divide and conquer technique to solve the problem more efficiently than the naive method due to the nature of queries(number of distinct numbers) hence segment trees and HLD can’t be used . As @likecs said we need to solve this problem in offline using tricks such as Mo’s algorithm . But in order to make Mo’s algorithm work for a tree we need a way to represent the tree as an array . This is where Euler tour for trees comes to help . You can read more about how euler tour of trees works with Mo’s algorithm [here][1] . But in the blog the author discusses Mo’s algorithm without updates . But there exists a way to use Mo’s algorithm for updates also . You can read more about it [here][2] .

My Solution :

```
[3]
[1]: http://codeforces.com/blog/entry/43230
[2]: http://codeforces.com/blog/entry/44711
[3]: https://www.codechef.com/viewsolution/12879967
```

Converting tree to array is not called Divide and Conquer. It is just called linearising the tree. Divide and Conquer on tree refers to Centroid Decomposition.

Any good references/blogs to learn MO’s algorithm on trees + MO’s with Updates?

I didn’t say that converting a tree to an array is divide and conquer. By Divide and conquer , I meant trying to solve the problem using segment tree or HLD

I did solve it by HLD and segment tree and storing bitset on each node of the segment tree. However , i don’t think this was the intended solution as it was really hard to fit in the given TL. Complexity is roughly O(Q* N / 64).

More Explaination :

First of all there will be O(N) distinct numbers at max in array+queries. Let there be X distinct numbers total.We will compress all numbers and reduce to range 1-X. We will store on a node of the segment tree(of a chain) a bitset, i.e a compressed bool array of length X where i th bit is 1 if the number i is present in the set of vertices represented by this node of the segment tree. A bitset of length X will take only X/64 space. The operation set xth bit, reset xth bit , get xth bit,etc take O(1) time while count(bitset) and operations like (bitset1 OR bitset2) will take O(X/64) time.

Updates are straightforward , we need to update the segment tree of the corresponding chain of the vertex, at max logn nodes would be changed, and we will just need to change 2 bits at max.(one bit representing the old value on this vertex and one representing the new). So O(log n).

For queries it is not so simple. Idea is to get a bitset from all the required chains and take OR of all these bitsets. First of all we will have to visit O(logn) chains, then query for the required bitset, so logn nodes on top, and now we need to do OR operations which require O(X/64) time. So basically, O(log^2(n) * X/64) . Now , this is useless unless we cut down the log^2(n) factor drastically. Now asymptotically it is hard to see why it works, but what i do is, use this solution for only part of the query, and use naive brute force for the remaining part. If the query is L,R. Then i take ~2000 nodes on the path from L to R and naively get the bitset of this in O(2000). For rest of the path do our normal segment tree queries. This will cut down the number of chains visited in the worst case so reduce that log factor.

The 2000 nodes i pick are near to lca(L,R). This helps because, near to lca there might be a small part of a big chain, which i would have had to query otherwise. And querying a small part of a big chain would lead to a big log factor because of the number of nodes visited in that query. So in a way i avoid deep queries by brute forcing for these edge points.

Also if you analyse the space complexity this way it will be O(N * N/64) since we are storing bitset on each node. This won’t work so to cut this down i divide X into some ~20 parts and repeat the whole solution 20 times. Also , the count() function of STL’s bitset was too slow for some reason, so i had to implement it on my own with array of long longs.

Can you please elaborate your approach

I thought this problem could only be done by using Mo’s Algorithm (http://codeforces.com/blog/entry/43230 and http://codeforces.com/blog/entry/44711 ), as I couldn’t think of how to combine solutions from different chains.(This was the first time I actually made use of Mo, and read about HLD). Thanks for sharing this approach!