Cumulative distribution approach for range operations

Well, firstly, this is not a standard algorithm or method; I would rather call it a technique for solving range operations very efficiently, beating some standard, well-known techniques like segment trees, sqrt decomposition and Mo's query optimization. This technique may not be applicable to all types of range queries, and its formulation might be difficult, but when we can find a cumulative formulation for a problem it can work wonders.

Now let's analyze range query problems and some standard techniques, and try to form a basis for the cumulative approach. This method is quite handy when we have to perform many operations and only report the final result at the end.

So, firstly, let's talk about sqrt decomposition. In this method we split the array into chunks of size sqrt(n), pre-compute a value for each bucket, and answer queries by combining the pre-computed bucket values with the few leftover elements at the ends of the range. That gives O(sqrt(n)) per query in the worst case, which is decent.
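For concreteness, here is a minimal sqrt decomposition sketch. It assumes range-sum queries with point updates (the query type was not fixed above), so treat it as an illustration of the bucket idea rather than a solution to the problem discussed later.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Minimal sqrt decomposition sketch: range-sum queries with point updates.
struct SqrtDecomp {
    int n, block;                 // array size and bucket size ~ sqrt(n)
    vector<long long> a, bucket;  // raw values and per-bucket sums

    SqrtDecomp(int n) : n(n), block(max(1, (int)sqrt(n))),
                        a(n, 0), bucket((n + block - 1) / block + 1, 0) {}

    void pointAdd(int i, long long x) {   // O(1)
        a[i] += x;
        bucket[i / block] += x;
    }

    long long rangeSum(int l, int r) {    // O(sqrt(n)) worst case
        long long s = 0;
        while (l <= r && l % block != 0) s += a[l++];                      // left partial block
        while (l + block - 1 <= r) { s += bucket[l / block]; l += block; } // full buckets
        while (l <= r) s += a[l++];                                        // right partial block
        return s;
    }
};
```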

Segment trees do a bit better, with O(log n) complexity per query.
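To show where the O(log n) comes from, here is a minimal segment tree sketch supporting point updates and range-minimum queries. Note that the range-add problem below would additionally need lazy propagation, which I have left out to keep the sketch short.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Minimal segment tree sketch: point updates and range-minimum queries,
// each O(log n). The range-add version needs lazy propagation on top.
struct SegTree {
    int n;
    vector<long long> t;   // t[1] is the root, children of v are 2v and 2v+1

    SegTree(int n) : n(n), t(4 * n, LLONG_MAX) {}

    void update(int v, int lo, int hi, int pos, long long val) {
        if (lo == hi) { t[v] = val; return; }
        int mid = (lo + hi) / 2;
        if (pos <= mid) update(2 * v, lo, mid, pos, val);
        else            update(2 * v + 1, mid + 1, hi, pos, val);
        t[v] = min(t[2 * v], t[2 * v + 1]);
    }
    long long query(int v, int lo, int hi, int l, int r) {
        if (r < lo || hi < l) return LLONG_MAX;          // no overlap
        if (l <= lo && hi <= r) return t[v];             // full overlap
        int mid = (lo + hi) / 2;
        return min(query(2 * v, lo, mid, l, r),
                   query(2 * v + 1, mid + 1, hi, l, r)); // partial overlap
    }
    // convenience wrappers over the index range [1, n]
    void set(int pos, long long val) { update(1, 1, n, pos, val); }
    long long rangeMin(int l, int r) { return query(1, 1, n, l, r); }
};
```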

Now the question arises: can we do something better? The answer is yes, obviously. We can handle each operation in O(1), with a single O(n) pass at the end. Let's see this method. I will give an example to give a slight idea of it.

The question is simple: we are given the size n of an array and q operations. Initially all elements of the array are 0, and in every operation we are given two indices l, r and a value x, and we add x to every element of the sub-array [l, r]. We need to find the minimum/maximum value in the array after the q operations.

A naive approach will not work if n and q are large, of the order of 10^5 to 10^6. Thus, we need to do something better. A segment tree or sqrt decomposition can be applied, but we can do better asymptotically, and with about 6-7 lines of code.

Firstly, we are not concerned with the array itself while the operations are running; since x is added to all elements between l and r, we just need to somehow remember that x was added on [l, r], because a naive implementation makes each operation O(n). For this we will use a cumulative array.

So let F be a cumulative array of size n+1 (indices 1 to n+1) with all elements initialized to 0. For every operation (l, r, x) we do F[l] += x and F[r+1] -= x, and recover the array as

a[i] = F[1] + F[2] + ... + F[i]   (the prefix sum of F up to index i)

Thus, if we need to add x to [l, r], we add x to F[l]; through the prefix sum this increment propagates to every element from index l onwards. But we want it to apply only up to r, so we cancel it beyond r by doing F[r+1] -= x. This way every operation is performed in O(1).

Finally, we take the cumulative (prefix) sum to recover the original a[i] and find the maximum/minimum over all n prefix sums, which is O(n). We then output the result.
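Putting the steps together, here is a short sketch of the whole solution. The input format (n and q on the first line, then q lines of l r x with 1-based indices) and the choice of reporting the maximum are assumptions made for illustration.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sketch of the cumulative (difference) array solution described above.
// Assumed input format: n q, followed by q lines "l r x" with 1-based l, r.
int main() {
    int n, q;
    cin >> n >> q;
    vector<long long> F(n + 2, 0);        // F[1..n+1], extra slot for F[r+1]
    while (q--) {
        int l, r; long long x;
        cin >> l >> r >> x;
        F[l] += x;                        // start adding x from index l
        F[r + 1] -= x;                    // cancel the addition after index r
    }
    long long cur = 0, best = LLONG_MIN;  // running prefix sum = a[i]
    for (int i = 1; i <= n; ++i) {
        cur += F[i];
        best = max(best, cur);            // swap max for min if required
    }
    cout << best << "\n";
    return 0;
}
```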

Thus, we tackled this problem with O(1) per operation and an O(n) computation at the end, which is both faster and simpler than the other standard techniques. This is the beauty of this technique: ease and efficiency are the perks of the cumulative approach. The intuition for cumulative distributions and their importance will be familiar to anyone who has some idea of probability and statistics (the CDFs of the chi-squared distribution, normal distribution, etc.).

4 Likes

This is a very clever approach; however, it should be noted that it can be applied only in very limited scenarios. It works only when the range updates are some type of invertible operation (such as addition).

1 Like

While it is good that you have devised a method to solve a specific type of problem, and I appreciate that, the issue here is that the method you suggest has very limited applications.

There is only one query operation, and it comes after all Q updates have been performed. There is a very low chance that you will encounter a question of this kind, and even if you do, it will be nothing more than a cakewalk, since it is a lot easier to handle Q updates followed by a single query than an arbitrary mix of updates and queries. It feels more like an ad-hoc approach to me. Nevertheless, thanks for sharing it. It would be great if more people came forward and shared their ideas; it really helps everyone around you learn new stuff.

This update idea of yours can have a more generalized application. Read this problem - https://www.codechef.com/problems/SPREAD

The update is much like what you do, except that it is done on a BIT and thus costs O(log n). However, it allows you to query at any moment, not just at the end. Further, the query is at a single index only. One could even think of solving the problem you presented above using this method, but I have not put much thought into that, so I could be wrong here.
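For reference, here is a rough sketch of that BIT variant: the difference array F lives inside a Fenwick tree, so a range update and a point query each cost O(log n) and can be freely interleaved. The structure name and 1-based indexing are my own choices for illustration, not taken from the linked problem.

```cpp
#include <bits/stdc++.h>
using namespace std;

// BIT (Fenwick tree) over the difference array F:
// range update a[l..r] += x and point query a[i], both O(log n).
struct RangeUpdateBIT {
    int n;
    vector<long long> bit;               // 1-based Fenwick tree storing F

    RangeUpdateBIT(int n) : n(n), bit(n + 2, 0) {}

    void add(int i, long long x) {       // F[i] += x
        for (; i <= n + 1; i += i & -i) bit[i] += x;
    }
    void rangeAdd(int l, int r, long long x) {   // a[l..r] += x
        add(l, x);
        add(r + 1, -x);
    }
    long long pointQuery(int i) {        // a[i] = F[1] + ... + F[i]
        long long s = 0;
        for (; i > 0; i -= i & -i) s += bit[i];
        return s;
    }
};
```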

Cheers!

1 Like

Thanks mate. I agree with you. The problem with the cumulative distribution approach is that if you have repeated, interleaved queries then each one takes O(n), which is worse than what we would ever want.

I am thinking of optimizing it to get better query performance, maybe aiming for something sub-linear or logarithmic per query. We already have various techniques with O(log n) queries, but I am trying to devise a new one.

1 Like