STDPRGS - Editorial



Author: Pavel Sheftelevich
Tester: Sergey Kulik
Editorialist: Pawel Kacprzak




MO’s algorithm


Given an array A of N non-negative integers, you have to handle M queries. For a single query [L, R], your task is to find the value of (S[2] - S[1])^2 + (S[3] - S[2])^2 + \ldots + (S[R - L + 1] - S[R - L])^2 where S is the sorted segment A[L, R].


Use MO’s algorithm to handle queries offline.



In the first subtask, both N and M are quite small, up to 200, so the most straightforward solution will work here. For each query [L, R], make a copy of a segment A[L, R], sort it, and compute the result using the equation given in the statement. If you use a quadratic time sorting algorithm, the solution has running time O(M \cdot N^2) which is enough here.


In the second subtask, N and M can be up to 10^4. We can solve the problem with these constraints, if we replace a quadratic time sorting algorithm with O(N \cdot \log N) method, and implement the solution efficiently, using static allocated memory. The total time complexity of this method is O(M \cdot N \cdot \log N).


Here we have to deal with the original limits for N and M, i.e. N, M \leq 5 \cdot 10^4, but elements in the array A are not greater than 100. We can exploit that last constraint and develop the following solution.

Let pos[k] be the sorted vector of indices at which element with value k occurs in A. There are at most 100 such vectors, and we can compute all of them in O(N) time. Now, in order to handle a single query [L, R], we iterate over all possible values from 0 to 100, and check if a particular value v occurs at positions from L to R at least once. This can be done by a binary search on pos[v]. Then, in order to compute the result, we can just compute it using only single occurrences of values of elements in A[L, R], which we just have computed. This is sufficient, because if a certain value v occurs more than once in A[L, R], any of these addition occurrences do not have any impact on the result, because they are consecutive elements in a sorted segment A[L, R], so they add nothing to the sum, because (v - v)^2 = 0. The total time complexity of this method is O(N + M \cdot 100 \cdot \log N)


Now we are about to solve a more serious problem. We have N, M \leq 5 \cdot 10^4 and there are no extra restrictions on the values in A different than the original constraints.

At the first sight, the problem look really complicated. It is not clear how to figure out the solution for one query knowing the solution for an other one, due to computing their sums in a sorted order. However, we do not have to answer these queries one by one, i.e online. Instead, we can compute and store the answers to all of them in any order and at the end return the stored results. This approach is called an offline processing of queries. More specifically, we are going to use MO’s algorithm. The general idea is of this method is also explained here.

We are going to divide array A into bins of size B. We will pick the best suitable B at the end of the analysis. Next, we are going to distribute queries to these bins in the following way: the first bin will contain queries with left end point in a range [0, B - 1], the second one, queries with left end point in a range [B, 2 \cdot B - 1] and so on. We are going to compute answer to queries in each bin independently of another bins, so for now, let’s focus on answering queries from one bin, and remember that we will have to do it for all \lceil N / B \rceil bins.

Now we are about to answer queries from one bin. The first observation we can make, is that all these queries have left point not further away than B to other queries from the bin. This fact was the motivation of distributing queries to bins and we will use it later.

First, we are going to sort queries in the bin by their right end point in ascending order, and process them one by one in this sorted order.

Next, we will need a data structure which allows us to perform fast the following operations: insert, delete, find elements as long as finding predecessors and successors of elements. Any balanced binary search tree is perfect here, and if you are using C++, then std::set is a good choice. Let’s call our data structure S.

At the beginning, we are going to insert all elements from the range of the first query to S and we will compute the answer for this query using the equation given in the statement. This takes O(N \cdot \log N) time, because the range can have up to N elements and inserting each one takes O(N \cdot \log N), if we picked an appropriate implementation of S. The invariant here is that, S contains all elements from the last processed query and that a variable ans contains the answer to the last processed query.

We are going to update S, as long as ans, for the next query, based on the values from the previous one. How to do that? Well, let’s consider any two adjacent queries in the sorted order [L_i, R_i], [L_j, R_j]. We know that R_j \geq R_i and that the absolute difference between L_i and L_j is not greater than B. Based on these two facts, we are going to insert to S all the elements from a range [R_{i + 1}, R_j] keeping the ans up to date after each insertion. Similarly, if L_j < L_i, we will insert to S all elements from a range [L_j, L_{i - 1}]. Otherwise, we will delete elements from a range [L_i, L_{j - 1}]. Notice that we can perform both insertion and deletion keeping ans up to date. This is easy to do, because after an insertion, we place a new element in between two elements, so we need to subtract from the ans their square difference. In addition, since we inserted the new element, we need to add to ans the squared difference between its value and the value of its predecesor. We have to do the same for its successor too. Deletion is analogous. After all these updates, we have computed the answer to the current query, which is very good.

The only remaining question is what is the time complexity of all these operations. Notice that, since all queries are ordered by their right end points in ascending order, we will perform at most N insertions from the right side of queries for the whole bin! This is very nice, because is follows that the total cost of all insertions to right sides of queries for a single bin takes O(N \cdot \log N) time. What about insertions and deletions from the left side of queries? Well, do you remember when we noticed that all these endpoint do not differ more that by B? Using this fact, we know that insertions and deletions from the left side of a query to update ans to the answer for the next query take O(B \cdot \log N).

In order to get the total time complexity, we just need to sum up achieved results for all bins. Insertions to all right sides in all bins takes then O(N / B \cdot N \cdot \log N) and insertions/deletions to all left sides to all queries takes O(M \cdot B \cdot \log N). It is easy to notice that picking B = \sqrt N is the best choice here. Doing this, the total time complexity is O(\sqrt N \cdot N \cdot \log N + \sqrt N \cdot M \cdot \log N).


Author’s solution can be found here.
Tester’s solution can be found here.

My solution is exactly the same, but it got TLE for the 4th sub task. Were the time limits too strict? And if yes, where could the code have been optimized ?

In both add and remove you’re searching the map more than once. Search for the right element once (using upper_bound and a hint for insertion) you can reduce it to searching once for both operations.

I guess when using std::set as suggested in the editorial you have to insert pairs (A[i],i) or something similar. Using multiset or map (much faster in the 3rd task) makes somewhat more sense.

TIL: copying into a array using loop and using std::sort is faster than std::partial_sort_copy …

if we replace a quadratic time
sorting algorithm with O(N⋅logN)
method, and implement the solution
efficiently, using static allocated
memory and without unnecessary

We need to copy elements to preserve original order. So what unnecessary copying is this referring to?

The copying you mentioned is necessary, I meant to avoid any unnecessary copying, like for example the one which is implicitly used when using dynamic arrays. Probably this note is misleading as you pointed, so I’m erasing it. Thanks.

When I was solving the problem before the contest, it turned out that using a set of pairs was slightly faster than using a multiset, which makes a perfect sense if you know how multiset is implemented.

the bound n,m<10000 was somewhat strange anyway. I don’t think any solution using simply sorting an adding up the differences could pass strong testcases.

my solution complexity is O(N.LogN) but still it is giving TLE for subtask 2 … As mentioned above NlogN complexity will pass … please help me !!

Link to the solutions are saying ‘Access Denied’.

Correct me if I’m wrong, but with a segment tree and using merge sort, this can be done in O(nlogn +mlogn) time and O(n) space.

Instead of multiset or pairs in set, we can just use a set, and an array to keep count of the element, and this count can be updated in O(1), and operations on set are logN.
It passed for me.

I also feel the time limits are strict.
My code gave TLE in C++4.3.2 but the same code got AC in C++4.9.2 and C++14.


Why link to the solutions are saying ‘Access Denied’?

when I used

1)all data typed as Long long and faster inp method(getchar_unlocked) got TLE

2)all data typed as int and no faster inp method(getchar_unlocked) got TLE

3)all data typed as int and faster inp method(getchar_unlocked) got AC.

When B= sqrt(N) then I am getting TLE in Java whereas for B = 2*sqrt(N) its getting passed in 8.99 seconds.
Here is my submission list,ankurverma1994
Could anyone suggest me further optimization to my code?

I am unable to view author and tester’s solutions… getting XML error. Can anyone please provide the correct links/fix the existing links?