CHEFLKJ - Editorial

pushkarmishra · July 23, 2016, 7:45pm

PROBLEM LINK:

Practice
Contest

Author: Misha Chorniy
Tester: Karan Aggarwal
Editorialist: Pushkar Mishra

DIFFICULTY:

Medium

PREREQUISITES:

Segment Trees

PROBLEM:

Given is an array A of length N. There are two types of operations that we need to handle on this array:

1 x y: change the value of A[x] to y, i.e., A[x] = y.
2 l r: Answer in a “Yes” or a “No” whether the subarray A[l..r] is dominating or not.

Here dominating means that there is a number in the subarray that appears in the subarray at least as many times as one more than half of the length of the subarray.

EXPLANATION:

The nature of the problem clearly hints towards segment trees. We need to efficiently perform two kinds of operations: update a value at an index, and answer for a subarray whether there exists a number in it which appears at least (one more than half its length) times in it.

Clearly, There isn’t a notion of an elegant lazy propagation here since the updates we perform are to single elements.

Let us start by making some observations about dominant arrays and subarrays. Let us consider the entire array A[1..N]. The first important observation is that it will only be dominating if either subarray A[1..\frac{N}{2}] is dominating or subarray A[\frac{N}{2}+1..N] is dominating. This is simple to prove. We use proof by contradiction. Assume that neither of the subarrays are dominatinng. That means all the numbers in either of the subarrays appear at maximum \frac{N}{4} times only. That means, if we combine the two arrays, there is no number which appears more than \frac{N}{2} times. This is a contradiction. Therefore, A is not dominating.

We can further extend the observation to say that if A[1..N] is dominating, then the dominating number must either be the dominating number of A[1..\frac{N}{2}] or the dominating number of A[\frac{N}{2}+1..N]. This follows from the same arguments that we made above.

This leads us to our method of building our segment tree. Each node of the segment tree stores 2 fields: an integer ‘Dominant’ which stores the dominant number in the subarray covered by the node, and a map (it can be a hash map for better bounds) ‘CountOccurrences’ which counts the number of times each number appears in the subarray covered by the node. The memory used in this is \mathcal{O}(N\log N) because each number is stored in \log N nodes only. So memory isn’t an issue in our structure at all.

The pseudocode for the build operation is pretty straightforward now:

void buildNode (node r):

		if(r == leaf) {
			//let us say that this leaf corresponds to A[index] of the given array.

			SegTree[r].CountOccurrences.clear(); // clear map of any garbage

			// Increment the occurrence of the number at this leaf
			SegTree[r].CountOccurrences[a[x]] += 1;
			SegTree[r].Dominant = a[x]; // a number dominates its own cell

			return;
		}

		if(r != leaf) {
			buildNode(r->left);
			buildNode(r->right);

			length = r->length; // length of the subarray at this node.

			// We merge the maps of children. Merging simply means adding the
			// number of occurrences of each number in either of the children maps
			// to the parent map.
			SegTree[r].CountOccurrences = merge(SegTree[r->left].CountOccurrences,
												SegTree[r->right].CountOccurrences);

			if(SegTree[r].CountOccurrences[SegTree[r->left].Dominant] > length/2) {
				SegTree[r].Dominant = SegTree[r->left].Dominant;

			} else if (SegTree[r].CountOccurrences[SegTree[r->right].Dominant] > length/2) {
				SegTree[r].Dominant = SegTree[r->right].Dominant;

			} else {
				// indicates that subarray at this node is non-dominating.
				SegTree[r].Dominant = -1; 
			}
		}
	}

The update routine is very similar. We just subtract 1 from the number of occurrences of the old value and add 1 to the number of occurrences of the new value at the correspoing leaf plus all its ancestors. Recalculating the ‘Dominant’ variable is done in the same way as the build function.

With this data structure, how can we query whether a subarray is dominant or not. We use the same logic as before. To query a subarray, we will be visiting at most \mathcal{O}(\log N) nodes in the segment tree. Each of these nodes will have its dominant number (or -1 in case it is not dominating). If the subarray we are querying about is dominating then the dominating number must be one of the dominating numbers of the nodes we visit in the query operation. Therefore, the first thing to find out is the nodes that the query function is going to visit in the segment tree. Then for the dominating number of each of them, we just count how many times they appear over all the nodes that the query function visited. If any of the dominating numbers appeared (more than half the length of the subarray) times, then the subarray is dominating; otherwise no.

What is the complexity of this approach? Build is same as the standard segment tree function which takes \mathcal{O}(N\log N). This is when we are using hash maps in our segment tree nodes. If we use normal maps, then the complexity will be \mathcal{O}(N\log^2 N). Similarly, update takes \mathcal{O}(\log N) with hash maps and \mathcal{O}(\log^2 N) with normal maps. The query function on the segment tree itself is \mathcal{O}(\log N) but we need to count the occurrences of each of \mathcal{O}(\log N) dominating numbers in the \mathcal{O}(\log N) nodes queried. Therefore, the total complexity per query is \mathcal{O}(\log^2 N) with hash maps and \mathcal{O}(\log^3 N) with normal maps.

ALITER

There is one more way to solve this problem that uses randomisation. For a subarray A[l..r], choose any number x randomly; the probability that this number is the dominating number is \frac{1}{2}. We can check by counting the number of its occurrences using the CountOccurrences structure we build before. If it is not the dominating number, we can choose some other number and try. So, in i such iterations, the probability that we don’t find the dominant number is \frac{1}{2^i}. As we can see, this decreases exponentially. Thus, after 20 or so iterations, if we find that there is a dominant number, then we output that number, otherwise, we can be pretty sure that there isn’t a dominant number, i.e., the subarray isn’t dominating.

Please see editorialist’s/setter’s program for implementation details.

COMPLEXITY:

\mathcal{O}(N\log^2 N)

SAMPLE SOLUTIONS:

Author
Tester
Editorialist
Admin

vsp4 · July 25, 2016, 12:11am

Was (O(log n)^3) supposed to pass with normal maps? I tried this by maintaing a sorted subarray in every subsegment and then counting kth element in given subsegment which runs in ((logn^2 * log(10^9)) per query, but it gave tle

dpraveen · July 25, 2016, 12:24am

No, it was not supposed to pass.

rajat1603 · July 25, 2016, 12:29am

it can be solved in O((N+Q)logN) as well

Suppose we have an array with N elements , if you pick up any 2 different elements with unequal value and remove them from the array and keep doing this until only 1 distinct value is left , that value is the only possible candidate for answer

Now To make this faster , we store this information in segtree , segtree node denotes that if we do this stuff on segment [l , r] , what will be the value left and its frequency.
Now we have to check the frequency of an element from l to r , we can either do it online with a bst or do it offline with a simple bit in nlogn time.


[1]


  [1]: https://www.codechef.com/viewsolution/10897065

anudeep2011 · July 25, 2016, 12:32am

My solution uses square root decomposition. For segments of length < 2000, just loop over and calculate frequency of elements. Report answer.

For segments of length > 2000, the dominant element should have frequency > 1000. There will be at most 200 such elements. So I maintain one BIT (binary index tree, or seg tree) for each of those elements, loop over all 200, find the number of elements in range l…r using BIT.

Update is 2 fold, update the base array. Also update BIT.

Update is O(log N), you update at most one BIT.
Query is O(2000) or O(200 * log N).

lohit_97 · July 25, 2016, 12:35am

Seems, like, you can use Square Root Decomposition on each and every question

dpraveen · July 25, 2016, 12:36am

Won’t there be total 500 such elements with frequency > 1000?

anudeep2011 · July 25, 2016, 12:42am

Initially we have 10^5 elements. Even if you consider the 10^5 query operations as “append” operation instead of replace, at the end you would have 2 * 10^5 elements => at most 200 with >= 1000 frequency

dpraveen · July 25, 2016, 12:47am

@anudeep2011, yes, you are right.

likecs · July 25, 2016, 2:24am

Even Solution using maps (Complexity O(log^3n) per dominant query) pass within time limit. Here is


[1].


  [1]: https://www.codechef.com/viewsolution/10901084

likecs · July 25, 2016, 2:27am

@vsp4, I think your solution gives tle because of the ordered statistic set used in your solution. Though, the complexity of ordered statistic set is as desired, but constants are too large. It happened with me too when orderedd statistic set gave tle, but bit with binary search passed easily.

morass · July 25, 2016, 3:11am

I have also passed with (somehow) “not good” complexity.

Worst case [H - means complexity of hash-map]:

Update query: O(H^2log(10^9))

Get query: O(sqrt(N)*H+sqrt(N)*Hlog(10^9)*log(N))


[1] (well just to see, it is barely readable :/ )

Thought: I had (for each distinct numbers) a Fenwick-Tree [BIT Tree] in which I held positions. I also held a "priority queue" in which I stored numbers sorted by total frequency.

Get query was then to choose between two:

A) If the range was lesser than sqrt(N), I brute-forced it.

B) If the range was higher than sqrt(N), I looked upon all distinct numbers which had frequency greater than "RANGE/2" and checked (by Fenwick) how many are there on my range. (PS: there could be at most sqrt(N) of these numbers)

Anyway I'm wondering how it managed to pass TLE :)

Have nice day ^_^

  [1]: https://www.codechef.com/viewsolution/10897137

rishivikram · July 25, 2016, 5:09am

@vsp4 Can you tell me what is the idea of your solution?

vsp4 · July 25, 2016, 12:26pm

@rishivikram Find the middle element of the sorted segment [l, r]. If there is a dominator this middle element has to be that dominator. Finding kth element could be done in (logn)^3 time in subarray with updates. Then count how many times this element appears in the array [l,r] This could be done in (logn)^2

noob333 · July 25, 2016, 2:30pm

I feel the explanation for dominant sub-arrays in the editorial is insufficient since it deals with only sub-arrays of size N/2. This is not always the case - especially in case of queries where the query range can span multiple segments(sub-arrays). I think this would be better way to put it,

Over a range(i,j) of length N, let there be k sub-arrays {1,2,3…,k} each of length {n1,n2,…nk} respectively. If a majority element ‘e’ exists in range(i,j) then one of the k sub-arrays must have e as the majority element.
Proof is by contradiction (along the same lines as in the editorial),

n1 + n2 + … + nk = N

Assume e exists which means freq(e) in (i,j) > N/2

Now lets say none of the k sub-arrays have e as the majority element. Then,

freq(e) in sub-array {1] <= n1/2

freq(e) in sub-array {2} <= n2/2

…

freq(e) in sub-array{k} <= nk/2

Therefore, freq(e) in (i,j) <= n1/2 + n2/2 + … + nk/2 <= N/2 which contradicts our assumption that e is a majority element.
So at least one of the k sub-arrays must have e as the majority element.

Thanks!

rishivikram · July 25, 2016, 4:12pm

@vsp4 Kth element in sorted segment can be found in O((logN)^2) by merge sort tree. I don’t know how to handle updates though, how do you find it in O((logN)^3) when updates are there as well?

xellos0 · July 25, 2016, 4:29pm

This was my solution too.

amit_369 · July 25, 2016, 4:38pm

Can anybody tell me how to define structure of segment treee such that i have an array of occurences of variable length in c.
Also plz tell me how to define map in c.

vsp4 · July 25, 2016, 4:41pm

@rishivikram
For updates, erase the old element from each of the logn segments where the given index occurs and add the new element. It takes (logn)^2 time with a bst at each node of segment tree.

I also don’t know how to do updates with merge sort tree online. However i think a offline approach is possible by creating the tree with all possible values initially and erasing/inserting positions at old/new values respectively.

vsp4 · July 25, 2016, 7:19pm

I tried with approach similar to merge sort tree: https://www.codechef.com/viewsolution/10904469 But again got TLE with both operations on (log (n + q))^2. The order statistics bst seems to have very high constant!!!

It passes only if i distinct b/w heavy and light elements: https://www.codechef.com/viewsolution/10904880