SMARKET - Editorial

dpraveen · April 15, 2017, 10:08pm

PROBLEM LINK

DIFFICULTY

medium

PREREQUISITES

segment tree, sqrt decomposition, offline processing of queries, binary search

PROBLEM

You are given an array A of integers of size N. A block is defined as maximal consecutive subarray with equal values. For example, if A = [20, 10, 10, 7, 7, 7, 10], then there are four blocks in the array, i.e. [20], [10, 10], [7, 7, 7] and [10].

You are given Q queries, each consisting of L, R, K, you have to find number of blocks in the subarray [L, R] that are of length at least K.

Solving first subtask

For solving first subtask, you can find the length blocks in a given subarray in the range L to R in a single iteration over the subarray [L, R]. Thus you can also find the count of blocks having length at least K in \mathcal{O}(|R - L + 1|) time, which can be \mathcal{O}(n) in the worst case. So, the worst case complexity of answering all the queries will be \mathcal{O}(n * Q), which will take around 3000^2 = 10^7 operations, sufficient to run under a second which is good enough for the first subtask.

You can not afford to iterate over all the elements in the range of the query for answering it. You have to either somehow do some preprocessing or build some sort of data structure that can efficiently answer the query in time asymptotically better than \mathcal{O} (n).

A smart solution of \mathcal{O}(\log n) complexity per query

Let us take the example given in the statement, [20, 10, 10, 7, 7, 7, 10]. We will create another array B as follows. B_i will denote the number of suffixes of A[1..i] that have equal elements. Intuitively, it denotes the length of the overlap of block consisting A_i with A[1..i]. For our above example, the B array will be [1, 1, 2, 1, 2, 3, 1]. If we would have to find the number of blocks with length at least K in the entire array, we can notice that this will be precisely equal to number of B_i's equal to K. This is true because if there is a block is of length X, then we will have written numbers 1, 2, 3, \dots, X written as B values at the corresponding indices of block. So, this block will be considered for each of the K's of the query where K is in the range 1 to X. In our example, you can see that there are two blocks of length at least 2 (i.e. the block [10, 10] and the block [7, 7, 7]), which is same as number of 2’s in the array B. The number of blocks of length 1 are 3, which is also same as number of 1’s in the array B.

Now, what if our query does not span the whole array? Let the query be generic range [L, R]. There can be three type of blocks that we will might have to consider.

- The blocks that lie completely inside.
- The left side block lying partially in range
- The right side block lying partially in the range.

For example, for the arary [20, \textbf{20, 10, 10, 7, 7}, 7, 10] and the range [2, 6], the block [10, 10] lies completely inside the range, and the left side block [20, \textbf{20}] lies partially inside the range. Similarly the right stable block [\textbf{7, 7}, 7] also lies partially inside the range.

For easiness in the explanation, let us assume that there are blocks of type 1 present in the query. Later we will learn how to deal with the other case too.

You can see that for the blocks that are completely inside the range, the number of B_i equal to K will give the correct number of blocks of length K in the range [L, R]. What’s the case with left and right side partial blocks? The contribution of right side block will also be counted correctly in this. This is because, all the values in the range 1, 2, \dots B_R are present in the range [L, R]. But for the left block, the B_i's don’t accurately represent the B_i values if we would have considered the only the subrray [L, R] in the consideration. In the later case, we should have started the B values from 1, 2 onwards. For example, B_2 = 2, but that doesn’t mean that block [20] should be counted as of length 2. This block is of length 1. So, it could happen that the left side block is of not length at least K. We handle this case separately by checking whether the left side block is indeed counted in the answer when we calculate the number of K's in the range [L, R], and whether it should be really counted or not. Identifying which is the case is a matter of easy bookkeeping observations that can be made.

Now, remains the case when there is a no block that lies completely inside the range [L, R]. This special case is very easy to deal as one just have to check the lengths of left and right blocks.

For a given fixed array, the queries of finding the number of elements in a range L, R that are equal to K can be done in \mathcal{O}(log n) time after preprocessing of the array. For each number x in the array, you can create a sorted list of indices where x appears. After that finding number of elements in the range [L, R], will require you to do two binary searches to find the smallest index \geq L and largest index \leq R.

Thus, each query will have \mathcal{O}(\log n) time complexity. Overall time complexity of the solution will be \mathcal{O}(Q \log n).

Alternate view of the problem.

Let us create another array B that denotes the lengths of the blocks of the array A. Number of elements in B will be equal to the number of blocks of the array A. For A = [20, 10, 10, 7, 7, 7, 10], the array B will be [1, 3, 2, 1]. From the previous discussion, you have seen that finding the answer for the blocks that completely lie inside the range is difficult. Processing the two left and right blocks is just a matter of small bookkeeping. So, we will see how to find the number of blocks of length at least K that lie completely inside the range [L, R].

Using two binary searches, we can identify the range l, r such that B[l..r] denote the length of blocks that lie inside the range [L, R]. We have to find number of elements in the range [l, r] that are at least K.

So, we have to solve the following problem.

Given an array a. You have to queries [l, r], find the number of elements in the range that are greater than or equal to K.

Solution using segment tree

The above problem can be solved by many standard data structures. You can create a segment tree whose each node will store the sorted list of the values in the range. Space and time complexity of creating this structure will be \mathcal{O}(n \log n), where n is number of elements in array a. Answering a query K can be done by going through the nodes in the desired range and doing a binary search in each node of the segment tree to find the number of values in the node that are at least K. This algorithm provides you a time complexity of \mathcal{O}({\log n}^2) per query.

Solution using sqrt decomposition.

You can also apply a sqrt decomposition algorithm for it. Divide the array a into \sqrt(n) buckets. For each bucket, also maintain a sorted list of values in the bucket. When you have to answer a query in the range [l, r], there are three kind of scenarios that can exist.
- The buckets that lie completely inside the range.
+ You can go through each of the buckets and for each bucket do a binary search to find the number of values in the bucket that are at least K. Number of suck buckets will be \sqrt{n}.
- The buckets lie partially in the range.
+ There can be at most two such buckets, the left and right buckets that intersect with the range [l, r]. You can simply iterate over the values of these buckets that lie in the range and find the count of values that are at least K. As each bucket is of length \sqrt(n). Its time complexity will still be \mathcal{O}(\sqrt{n}).

Time complexity of answering each query is \mathcal{O}(\sqrt{n} \, \log{n}).

This can be optimized to \mathcal{O}(\sqrt{n}) by making the following observation. As we have to answer queries offline, we are allowed to shuffle or sort them in any way. We sort them w.r.t K values in the increasing order. For each bucket, we maintain a pointer that points to the current value \leq k in the sorted list of values of the bucket. You can notice while processing a query this pointer will always move in the forward direction for a particular bucket. Hence, in the worst case, the above algorithm will give a amortized \mathcal{O}(\sqrt{n}) time complexity per query.

SETTER’S SOLUTION

Can be found here.

TESTER’S SOLUTION

Can be found here.

mathecodician · April 17, 2017, 3:46pm

We can also use the offline method with segment tree.

ashwanigautam · April 17, 2017, 6:12pm

This problem can also be solved using MO’s algorithm.
see here for solution.

raja379 · April 17, 2017, 6:13pm

I solved using Mo’s algorithm and deque data structure… It was very direct that way.

ashwanigautam · April 17, 2017, 6:13pm

Using MO’s algorithm and segment tree in O((N+Q)* ROOT(N) * LOG(N)) time ?

mathecodician · April 17, 2017, 6:20pm

No, not using the MO’s algo, only segment tree with storing the nodes and the queries together.

narendra1897 · April 17, 2017, 6:37pm

can some one tell what B array the author is talking about and how to get it?
I didn’t understood at all.

sagarpreet07 · April 17, 2017, 9:17pm

can someone tell me why my solution is showing TLE for second Subtask .
(I did using Segment trees) …Time per Query is O(logn)
https://www.codechef.com/viewsolution/13332019

dybbuk · April 17, 2017, 10:48pm

I solved this using only Mo’s Algorithm -> solution

Mo’s Algorithm tutorial by Anudeep Nekkanti -> tutorial

deep1996 · April 18, 2017, 9:48am

Square Root Decomposition + Binary Indexed Tree based Solution.

ashwanigautam · April 18, 2017, 1:42pm

I don’t know why you haven’t written about it yet

mathecodician · April 18, 2017, 2:58pm

See my solution - https://www.codechef.com/viewsolution/13288076

ambujverma8 · April 18, 2017, 8:14pm

I tried doing this using the segment tree, stored the sorted values of the range at each node and then tried searching, still getting TLE in second subtask. Can anyone please tell the reason,i couldnt find the mistake…
https://www.codechef.com/viewsolution/13307321

forhadsidhu · April 19, 2017, 9:00am

@narendra1897 in log(n) approach like in first solution array B is nothing but comulative frequency of numbers suppose A[]={1,2,3} so B={1,1,1} A[]={1,1,1}so B[]=(1,2,3} in next case B[]={length of block}

yaswant · April 19, 2017, 11:50am

I used Segment tree but got TLE in second Subtask. Please,someone tell why is it happening.
Here, is link to my code https://www.codechef.com/viewsolution/13312106

fane_faiz · April 19, 2017, 12:33pm

i am trying to traverse the array using MO’s and keeping track of frequency using segment tree.I don’t know what’s wrong with my code b/c i tried and run my code on many testcases but thr result are perfectly fine.Can anyone help me find bug in my code
link to solution->link text

studyonline · April 20, 2017, 4:25pm

The solution is accepted using Segment tree where each node is storing the sorted list of elements in the range (this is basically a merge sort tree ). Make sure to use vectors while creating segment tree nodes and don’t pass vectors in function but instead pass their references. I got many TLE’s because of it.
My solution using Segment trees - Solution

prokopton · April 21, 2017, 8:40pm

This part “For each number x in the array, you can create a sorted list of indices where x appears. After that finding number of elements in the range [L,R], will require you to do two binary searches to find the smallest index ≥L and largest index ≤R”. What does it mean? Can someone please explain? I have created a segment tree where each node contains sorted scores from left to right of that segment. How do I compute the the correct number of elements between L and R? Thanks.

yvetal_1 · April 21, 2017, 9:26pm

I did everything in O(n+q) by pushing queries into a vector at location l and r and processing with line sweep

Credits to bluescorp for sharing a unique observation, we both had a O(Q*logN) algorithm ready before we communicated.
My solution(AC 0.23s): https://www.codechef.com/viewsolution/13370003

Do i get stars now?

yvetal_1 · April 21, 2017, 10:34pm

i cant delete my comment so im doing this instead ;_;
get my solution on the first page plx XD