Problem Link
Author: Ivan Fekete
Tester: Misha Chorniy
Editorialist: Bhuvnesh Jain
Difficulty
MEDIUM-HARD
Prerequisites
Sqrt Decomposition, Mex
Problem
You are given an array A of size N. You are also given Q queries. For each query, given L and R, find the mex (greater than 0) of the number of occurrences of all the numbers in the given range.
Explanation
The first thing to notice is that the size of frequency list of the subarrays can be the order of length of the subarray (as the numbers can be unique in the subarray). Thus, even building the frequency list even in order length of subarray would be costly per query. This will be sufficient for subtask 1 but not for the full problem.
The next thing to note that while finding the mex of an array, if the mex is x, it means all the numbers from 1 to (x-1) are present in the array. For our problem, this means there exists a number whose frequency in the given range lies between 1 and (x-1). This means the number of numbers in this range is at least 1 + 2 + \cdots + (x-1) = (x-1) * x/2. Since the maximum size of a range can be N. Using this 2 equations, we get that maximum mex for frequency of numbers can be atmost 2 * sqrt(N).
The last idea tells us that maintaining frequency greater than 2 * sqrt(N) is not useful and it will never contribute to our answer. This fact will be useful for our precomputation step and also help to reduce the amount of memory required. More of this will be clear towards the end of the editorial.
To efficiently calculate the above function, we will rely on the technique of sqrt decomposition. The basic idea of sqrt decomposition of arrays is to split the array into sqrt blocks, calculate the function of each sqrt block efficiently and iterate over the edges points (if they exist) in each query. But here the function we want to evaluate deals on the frequency of numbers which is difficult to update with each passing block as in normal sqrt decomposition.
The algorithm for the sqrt decomposition part will be as follows:
- Split the array into blocks of size K.
- Calculate the frequency of each number inside each block.
- For every block, we also maintain a table how many times does the frequency of an element occur i.e. frequency fo frequencies.
- Here we also precompute the frequency of frequencies across 2 blocks as well for only the required elements (i.e. frequency up to 2 * sqrt(N). This means we have the frequency of frequency available from block i to j.
- For dealing with frequency, we store the total frequency of number starting from block “x” to some index in block “y”, both in left to right and right to left order which would help us deal with queries later.
- For every query, we apply brute force approach if the blocks are same or adjacent. Otherwise, we have the frequency of frequency table for complete blocks available. Only the corner elements in half blocks i.e. containing L and R is left. For this, we simply traverse the elements and update the frequency as well as the frequency of frequency count.
- Now finally, for calculating the mex in the query we simply iterate from 1 to … till we find the frequency of frequency for some “x” as 0. This step will take atmost O(2 * sqrt(N)) operations as the answer is bounded by this.
Once, you are clear with the above idea, you can see the author implementation below for help. I have added comments to make the solution more readable and understandable on lines of the editorial. I request you to go through the solution and if anything is not clear and requires more understanding, please comment below.
The time complexity of the above approach is O(N * (N/K) + Q * K). The first part is for pre-computation which is done for all blocks (N/K) in number and for each, it takes O(N) time. The next part is for answering the queries. Since we do not traverse the complete blocks and only iterate over elements in sub-complete blocks, it takes atmost O(K), i.e. size of the block, for each query. This is true both for brute force and optimised query operation.
For, the optimal value of K.
N * (N/K) == Q * K
meaning K = N / sqrt(Q). Though you can hard code it to some nearby value as well. The author chose the value as 600 in his code.
The space complexity as seen from author’s solution is O((N/K) * (N/K) * sqrt(N)). For optimal value of K, this is O(Q * sqrt(N)).
Feel free to share your approach, if it was somewhat different.
Time Complexity
O(N * sqrt(Q))
Space Complexity
O(Q * sqrt(N))
Problem with similar ideas and tricks
AUTHOR’S SOLUTION:
Author’s solution can be found here.