CHEFDOMA - Editorial

Problem Link:

Practice
Contest

Author: Misha Chorniy
Tester: Pushkar Mishra
Editorialist: Praveen Dhinwa

Difficulty:

Medium-Hard

Pre-Requisites:

sqrt decomposition

Problem Statement

An array is called dominated if there exists an element in it whose frequency in the array is strictly greater than half the size of the array.
You are given an array A of size n. You have to find the number of subarrays of A which are dominated.

Explanation

An array is called dominated if there exists an element in it whose frequency in the array is strictly greater than half the size of the array. Note that by half the size of the array, we mean \lfloor \frac{\text{size of array}}{2} \rfloor. According to this definition, the array [1, 1, 2, 3] is not dominated, as the maximum frequency of an element in it is 2, which is equal to \lfloor \frac{4}{2} \rfloor.
Also note that this definition guarantees that the dominating element of an array, if it exists, is unique. We will call that element the dominator of the array. So, instead of counting the number of dominated subarrays directly, we can equivalently count the number of subarrays dominated by some dominator element.
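To make the definition concrete, here is a small brute-force sketch (my own illustration, not one of the reference solutions) that counts dominated subarrays in O(n^2); it is only practical for small n, but it is handy for checking faster solutions:

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Brute force: a subarray is dominated iff some element occurs more than
// floor(len / 2) times in it, i.e. 2 * maxFreq > len.
long long countDominatedBrute(const std::vector<int>& a) {
    int n = (int)a.size();
    long long ans = 0;
    for (int i = 0; i < n; ++i) {
        std::unordered_map<int, int> freq;  // frequencies inside [i, j]
        int maxFreq = 0;
        for (int j = i; j < n; ++j) {
            maxFreq = std::max(maxFreq, ++freq[a[j]]);
            if (2 * maxFreq > j - i + 1) ++ans;  // subarray [i, j] is dominated
        }
    }
    return ans;
}
```

On [1, 1, 2, 3] this returns 6: the four length-1 subarrays, [1, 1], and [1, 1, 2]; the full array is not counted since its maximum frequency 2 equals \lfloor 4/2 \rfloor.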

Sqrt decomposition based solution

Let us first try to find the subarrays dominated by A_i.
Let j \leq i be the smallest index such that A_i dominates the subarray [A_j, A_i].
Similarly, let k \geq i be the largest index such that A_i dominates the subarray [A_i, A_k].

Let freq(A_i) denote the frequency of A_i in the whole array A. Note that A_i cannot dominate a subarray of size greater than 2 * freq(A_i): in a dominated subarray of size s, more than \lfloor \frac{s}{2} \rfloor of the elements must equal A_i.
So the subarrays [j, i] and [i, k] both have size at most 2 * freq(A_i).

Intuitively, the number of subarrays dominated by an element depends on its frequency, and the block of the array surrounding it ([j, k] as described above) has size proportional to its frequency.

This suggests thinking of solutions based on the frequency of an element. We will employ a sqrt decomposition solution which deals with low and high frequency elements differently. To categorize an element A_i as high or low frequency, we define a boundary M (whose value will be decided at the end; consider it a variable for now): elements with frequency < M are called low frequency elements, the others are called high frequency elements.

Dealing with low frequency elements
We iterate over i from 1 to n and want to find the number of indices j such that j \geq i and the subarray [A_i, A_j] is dominated by some low frequency element.

Notice that we don’t need to extend j all the way to N: j will be at most i + 2M. This is because the frequency of a low frequency element is at most M, so the maximum size of a subarray dominated by such an element is at most 2M.

So, we iterate over j from i to min(N, i + 2M), maintain the most frequent element of the subarray [A_i, A_j], and check whether the subarray is dominated by it and whether that element is a low frequency element. If both conditions hold, this subarray is dominated by some low frequency element, so we add it to our answer.


for i = 1 to N
	map<int, int> countMap; // key is an element, value is its count in [A_i, A_j]
	int which = 0
	int maxCount = 0
	for j = i to min(N, i + 2 * M)
		countMap[A[j]]++
		// Update if A[j] has become the maximum frequency element.
		if (countMap[A[j]] > maxCount)
			maxCount = countMap[A[j]]
			which = A[j]
		// Count the subarray if it is dominated (2 * maxCount > length)
		// and its dominator is a low frequency element.
		if (2 * maxCount > j - i + 1 and freq[which] < M)
			ans++

Overall, this process takes O(N \, M \, \log M) time, where the \log M factor comes from using maps to maintain the frequencies of the elements.
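For concreteness, here is a hedged C++ sketch of this low frequency pass (my own code, not the setter's; the global frequency table is computed inside the function for self-containment). If M is chosen larger than every frequency, all elements count as low frequency and the function returns the total number of dominated subarrays, which makes it easy to test:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Counts subarrays dominated by an element whose global frequency is < M.
long long countLowFreqDominated(const std::vector<int>& a, int M) {
    int n = (int)a.size();
    std::map<int, int> freq;  // global frequencies
    for (int v : a) ++freq[v];
    long long ans = 0;
    for (int i = 0; i < n; ++i) {
        std::map<int, int> countMap;  // frequencies inside [i, j]
        int which = 0, maxCount = 0;
        // A low frequency dominator limits the subarray length to at most 2 * M.
        for (int j = i; j < std::min(n, i + 2 * M); ++j) {
            if (++countMap[a[j]] > maxCount) {
                maxCount = countMap[a[j]];
                which = a[j];
            }
            // Dominated, and the dominator is a low frequency element.
            // (When the dominance check passes, the maximum is unique,
            // so `which` is guaranteed to be the dominator.)
            if (2 * maxCount > j - i + 1 && freq[which] < M) ++ans;
        }
    }
    return ans;
}
```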

Dealing with high frequency elements
Now we will see how to find the number of subarrays dominated by some high frequency element. We take up each high frequency element of the array one by one and find the number of subarrays dominated by it. Let x be such an element.

Let us take an example array [1 2 3 1 4 1 1 1 5 1]. Let us say that we want to compute the dominating subarrays that have 1 as the dominating number.
So we can ignore the other numbers and view our array as [1 X X 1 X 1 1 1 X 1] where X is some element other than 1.

How do we check whether the subarray [A_j, A_i] is dominated by x?
Let us write this condition mathematically. Let S be an array such that S[i] denotes the frequency of x in the subarray A_1 to A_i. In competitive programming lingo, such an array S is called the prefix sum array. So, the frequency of x in the subarray [A_j, A_i] is S[i] - S[j - 1].
The subarray [A_j, A_i] would be dominated by x if
S_i - S_{j - 1} > \frac{i - \, (j - 1)}{2}
\implies 2 * (S_i - S_{j - 1}) > i - (j - 1)
\implies 2 * S_{j - 1} - (j - 1) < 2 * S_i - i.

Convert to some already known problem
Let us create an array B such that B_i = 2 * S_i - i for 0 \leq i \leq n (with S_0 = 0, so B_0 = 0). Now, the problem of finding the number of subarrays of A dominated by element x is the same as finding the number of pairs of indices (i, j), 0 \leq i < j \leq n, in array B such that B_i < B_j.
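A small worked example may help. Take x = 1 in the array [1, 2, 3, 1]:

```latex
% Array A = [1, 2, 3, 1], x = 1.
% Prefix frequencies of x:  S = (S_0, \dots, S_4) = (0, 1, 1, 1, 2)
% B_i = 2 S_i - i:          B = (0, 1, 0, -1, 0)
%
% Pairs (i, j) with 0 \le i < j \le 4 and B_i < B_j:
%   (0, 1): 0 < 1   \to subarray A[1..1] = [1]
%   (3, 4): -1 < 0  \to subarray A[4..4] = [1]
%
% So x = 1 dominates exactly the two single-element subarrays, which matches
% a direct check: in [1,2], [3,1], [2,3,1] and [1,2,3,1], the element 1
% never occurs strictly more than half the time.
```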

What if we can find the number of inversion pairs in the array B? A pair of indices (i, j) is called an inversion pair if i < j and B[i] > B[j]. Note that pairs with B[i] = B[j] correspond neither to inversions nor to dominated subarrays, so the answer to our subproblem is the total number of pairs, minus the number of inversion pairs, minus the number of pairs with equal values.

We can solve the above inversion pair problem in O(N \, log N) where N denotes the number of elements in the array B. Note that the time complexity does not depend on range of elements of the array. You can submit your solutions for this problem at http://www.spoj.com/problems/INVCNT/. There are many possible approaches for this problem. The ones that I know are using merge sort, binary indexed tree (BIT) and segment tree. Please use the following resources to learn about these techniques.
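As an illustration of one of these techniques, here is a hedged merge sort sketch for counting inversion pairs (pairs i < j with b[i] > b[j]); the function name and interface are my own:

```cpp
#include <cassert>
#include <vector>

// Counts pairs (i, j) with i < j and b[i] > b[j] using merge sort.
// Sorts b[lo..hi-1] as a side effect; runs in O(n log n).
long long countInversions(std::vector<long long>& b, int lo, int hi) {
    if (hi - lo <= 1) return 0;
    int mid = (lo + hi) / 2;
    long long inv = countInversions(b, lo, mid) + countInversions(b, mid, hi);
    std::vector<long long> merged;
    int i = lo, j = mid;
    while (i < mid || j < hi) {
        if (j == hi || (i < mid && b[i] <= b[j])) {
            merged.push_back(b[i++]);
        } else {
            inv += mid - i;  // all remaining left elements b[i..mid-1] exceed b[j]
            merged.push_back(b[j++]);
        }
    }
    for (int k = lo; k < hi; ++k) b[k] = merged[k - lo];
    return inv;
}
```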

Binary indexed tree based approach
Now, I will mention the approach by binary indexed tree for finding number of subarrays dominated by high frequency elements.
So, we iterate over array A from left to right. We need to maintain enough information to find the number of j's such that 2 * S_j - j < 2 * S_i - i. We maintain an array cnt where cnt[x] denotes the number of j's seen so far such that 2 * S_j - j = x. As 2 * S_j - j lies in the range [-N, N], the cnt array has at most 2N + 1 entries. Now, to find the number of j's, for a given i, such that 2 * S_j - j < 2 * S_i - i, we find the sum of the elements of cnt over the range [-N, 2 * S_i - i - 1] (the inequality is strict).

So, we need a data structure in which we can support the following operations

  • Increment cnt[x] by 1.
  • Find the sum cnt[-N] + \dots + cnt[L] for a given L (after shifting indices so that they are non-negative).

These operations can be efficiently performed using binary indexed tree (BIT), or by using segment tree.
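A minimal Fenwick tree sketch supporting exactly these two operations; the offset shifts the possible values 2 * S_j - j \in [-N, N] into positive indices (the struct name and layout are my own choices, not the setter's code):

```cpp
#include <cassert>
#include <vector>

// Fenwick tree (BIT) over values in [-n, n], shifted by an offset so that
// every internal index is >= 1.
struct Fenwick {
    int n, offset;
    std::vector<long long> t;
    explicit Fenwick(int n) : n(n), offset(n + 1), t(2 * n + 2, 0) {}

    // cnt[x] += delta, for x in [-n, n]
    void add(int x, long long delta) {
        for (int i = x + offset; i < (int)t.size(); i += i & -i) t[i] += delta;
    }
    // sum of cnt[-n..x]; returns 0 if x < -n
    long long query(int x) const {
        long long s = 0;
        for (int i = x + offset; i > 0; i -= i & -i) s += t[i];
        return s;
    }
};
```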


long long solve(int x) {
  long long ans = 0
  S[0] = 0
  add(2 * S[0] - 0, +1)  // account for the empty prefix, B_0 = 0
  for i = 1..n
     S[i] = S[i - 1] + ((a[i] == x) ? 1 : 0)  // parenthesize: ?: binds weaker than +
     ans += query(2 * S[i] - i - 1)  // earlier j's with B_j < B_i (strict)
     add(2 * S[i] - i, +1)
  return ans  // remember to reset the cnt structure before the next x
}

So, for a high frequency element x, we can find the number of subarrays dominated by it in O(N \, \log N) time. Note that the number of high frequency elements is bounded by \frac{N}{M}, so the total time complexity is O(N \, \log N \cdot \frac{N}{M}).

Choosing appropriate value of M
By combining the above two approaches for low and high frequency elements, we get a total time of O(N\, M\, \log M + N \, \log N \cdot \frac{N}{M}). As M \leq N, \log M \leq \log N, so we can write the expression as O(N \, (M + \frac{N}{M}) \, \log N). We now choose M so that the overall value of the expression is minimized. As the factor N \, \log N is fixed for a given N, we need to minimize M + \frac{N}{M}, which attains its minimum value at M = \sqrt{N}.

So, the overall time complexity is O(N \, \sqrt{N} \, \log N). Do you see now why we call this a sqrt decomposition method? :wink:

Subtask 4

This solution takes the approach used above for high frequency elements and smartly generalizes it to all elements in O(N \log N) total time. As in that approach, we process each distinct element x one by one.

Let (i_1, i_2, i_3, \dots, i_k) denote the positions in array A (in increasing order) where the number x occurs, i.e. A[i_1] = A[i_2] = \dots = A[i_k] = x.

Now, let us look at two neighbouring occurrences of x at positions i < j, so that A[i] = A[j] = x and x does not occur strictly between them.
While moving from position i to position j, we perform the following operations at each step:
add(2 * S_i - i, +1)
query(2 * S_i - i)

add(2 * S_{i + 1} - (i + 1), +1)
query(2 * S_{i + 1} - (i + 1))

add(2 * S_{i + 2} - (i + 2), +1)
query(2 * S_{i + 2} - (i + 2))

\dots

add(2 * S_{j - 1} - (j - 1), +1)
query(2 * S_{j - 1} - (j - 1))

add(2 * S_j - j, +1)
query(2 * S_j - j)


Now, we can rewrite this using the observation that S_k = S_i for every k from i to j - 1 (there is no occurrence of x strictly between i and j), whereas S_j = S_i + 1:
add(2 * S_i - i, +1)
query(2 * S_i - i)

add(2 * S_i - (i + 1), +1)
query(2 * S_i - (i + 1))

add(2 * S_i - (i + 2), +1)
query(2 * S_i - (i + 2))

\dots

add(2 * S_i - (j - 1), +1)
query(2 * S_i - (j - 1))

add(2 * (S_i + 1) - j, +1)
query(2 * (S_i + 1) - j)

So, from index i to j - 1, the index into the cnt array at which we add decreases by exactly 1 at each step, so these add operations cover a contiguous range of indices. Similarly, the query operations are also performed over a contiguous, decreasing range of indices.

Note that the add operation performed at the i-th position does not affect any of the subsequent query operations up to the j-th position. This is because the query index strictly decreases in the same way as the add index: each value is added at an index strictly larger than every later query bound in the run. Please convince yourself by working through an example; this is a crucial point to understand.

So, we can simulate the effect of all these operations in a single step: find the sum of the elements of cnt over the range [2 * S_{j - 1} - (j - 1), \, 2 * S_i - i], add it to our answer, and then increase the cnt array over the range [2 * S_{j - 1} - (j - 1), \, 2 * S_i - i] by 1.

These range update and range query operations can be performed with a segment tree with lazy propagation, or with a binary indexed tree supporting range updates.
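One standard way to get both range updates and range sum queries is the two-Fenwick-tree trick. The sketch below is my own hedged illustration of that technique (not the setter's code); it supports add(l, r, c) and sum(l, r) over 1-based indices in O(\log N):

```cpp
#include <cassert>
#include <vector>

// Two Fenwick trees giving range add + range sum in O(log n).
// The underlying array a[1..n] starts as all zeros.
struct RangeBIT {
    int n;
    std::vector<long long> b1, b2;
    explicit RangeBIT(int n) : n(n), b1(n + 2, 0), b2(n + 2, 0) {}

    void addTree(std::vector<long long>& b, int i, long long v) {
        for (; i <= n + 1; i += i & -i) b[i] += v;
    }
    long long queryTree(const std::vector<long long>& b, int i) const {
        long long s = 0;
        for (; i > 0; i -= i & -i) s += b[i];
        return s;
    }
    // a[l..r] += c
    void add(int l, int r, long long c) {
        addTree(b1, l, c);
        addTree(b1, r + 1, -c);
        addTree(b2, l, c * (l - 1));
        addTree(b2, r + 1, -c * r);
    }
    // sum of a[1..i]
    long long prefixSum(int i) const {
        return queryTree(b1, i) * i - queryTree(b2, i);
    }
    // sum of a[l..r]
    long long sum(int l, int r) const { return prefixSum(r) - prefixSum(l - 1); }
};
```

Undoing a batch of updates, as required when moving to the next element x, is just a second call to add with -c over the same range.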

Finally, at the end of processing the number x, we have to undo the updates we made during its processing, so that the next element can be processed correctly.
For that, we store all the ranges on which we performed update operations while processing x; we then go over those ranges again and undo their effect by adding -1 instead of the +1 added earlier.

So, processing a distinct number x with frequency freq_x takes O(freq_x \, \log N) time. Summing over all distinct elements, the whole array A is processed in O(N \, \log N) time.

Solution:

Setter’s solution can be found here
Tester’s solution can be found here

Please feel free to post comments if anything is not clear.


Setter’s solution - http://ideone.com/sssrlP

Tester’s solution - http://ideone.com/zyPAFG



Updated code for tester’s solution.

Can someone explain the approach using an example?

Would be of great help.

Thanks in advance.

Sure, I will explain it clearly, please do visit tomorrow morning for update.


The O(n^2) approach passes for the second subtask, which is surprising because for the second subtask n \leq 10^4.

O(n^2) approach

Thanks, for the nice detailed explanation. Liked the approach.

Alternate solution for elements with high frequency:
You can assign that element the value +1 and any other element -1; the problem then reduces to finding the number of subarrays with a strictly positive sum, which can easily be done with a BIT.


https://www.codechef.com/viewsolution/9502740

@rajat1603: I have just updated it in detail :slight_smile:

Not surprising as time limit was 3 seconds. For online servers mostly 1 sec means 10^8 operations.


Whether elements have high or low frequency, we can handle each of them in time linear in its number of occurrences, so this problem can be solved in O(n).


http://www.codechef.com/viewsolution/9502072

Hi @dreamoon4, can you please explain the idea behind your code?
Also, I saw the following lines in your code:


REP(i, n) {
    a[i] = lower_bound(d, d + dn, a[i]) - d;
    p[a[i]].PB(i);
}

Won’t using lower_bound make this O(n \log n)?

This part can use hashing to make it O(n). I will post more details later.

Why undo the changes for each new x? Can’t we just clear the segment tree/BIT?

//