CHSQARR - Editorial

Author: Vasya Antoniuk
Testers: Istvan Nagy
Editorialist: Praveen Dhinwa

medium

PREREQUISITES:

partial sums, dequeue, dynamic programming

PROBLEM:

You are given 2-D matrix A[n][m]. The matrix will be called good if there exists a sub-matrix (continuous rectangular block of matrix) of dimensions a \times b whose all of its elements are equal. In a single operation, you can increase any element of the matrix by 1. Find out minimum number of operations required to make the matrix good.

QUICK EXPLANATION:

For finding sum of a sub-matrix, you can use partial sums.
For finding maximum element in a sub-matrix, your can use doubly ended queue or deque.

EXPLANATION:

Let us consider some a \times b dimension sub-matrix. Let x be the largest element in the sub-matrix. Let S be the total sum of elements in the sub-matrix. For making all the elements equal in minimum number of operations, we should aim to make all the elements equal to x. So, the minimum number of operations required will be x * n * m - S.

So, we have to find the sum and maximum element in all the possible sub-matrices of dimension a \times b.

For finding sum of all sub-matrices of sizes a \times b, we can use maintain partial sum sub-matrix. For that, we maintain another array, sum[n][m], where sum[i][j] will denote the sum of elements of the sub-matrix with left top end coordinate being (1, 1) and right bottom coordinate being (i, j). After computing sum matrix, we can find the sum of any sub-matrix.

Now let us learn about how to find maximum element in all sub-matrices of sizes a \times b.
Let maxCol[i][j] be the maximum element of the sub-matrix in the column starting at (i, j - b + 1) and ending at (i, j) (i.e. of column of length b).

Assume that we have computed maxCol array efficiently, let us find how we can use this to calculate maximum of a \times b sub-matrix.

Let max[i][j] denote the maximum value of sub-matrix of A ending at (i, j) (bottom right point), and of dimensions a \times b. Note that max[i][j] is maximum of maxCol[i - a + 1][j], maxCol[i - a + 2][j], \dots, maxCol[i][j].
Note that for calculating max[i][j + 1] from max[i][j], we have to add a new element maxCol[i][j + 1] and remove the element maxCol[i - a + 1][j].

That means, that we have to maintain maximum of a constant size subarray of an array, i.e. at each step, maintain maximum with inserting and deleting one element at each step. Note that we can compute maxCol in similar way.

This can be done either maintaining a multi-set (balanced binary search tree) of K elements, in which insertion/deletion and finding maximum can be done in \mathcal{O}(log(n)) time. Sadly this method is slower for passing largest subtask.

You can find a \mathcal{O}(n) solution for it by using doubly ended queue (aka deque). Please see this link for its very detailed explanation.

Time Complexity:

\mathcal{O}(n m) for finding both maxCol and max matrices.

AUTHOR’S AND TESTER’S SOLUTIONS:

4 Likes

The method described above to find the maximum value in all possible subarrays of K is also called Sliding Window Maximum. Though this method pretty much passes all the subtask within time, I am not able to understand why is the combination of Sparse Table + Sliding Window Maximum failing to pass the Subtask # 3.

I precomputed the Sparse Table for each row, that is O(nmlogm), where n is the number of rows and m is the size of each row. So, now I can get minimum for each subarray(of any size, let say a) of each row in O(1), and then I apply Sliding window there to find minimum for each row, in O(b), where b is the number of rows in target submatrix.

So total time complexity becomes O(nmlogm + 4QMN) where as, if use two sliding windows in each query it becomes O(5QMN), how does the double sliding window pass and sparse+sliding window isn’t passing?

2MN is required to calculate sum matrix, MN is required for finding answer, and 2MN for two sliding windows for each submatrix.

This is only possible when logM>Q, but since Q can be 50 but logM is maximum 10, why isn’t the Sparse Table solution passing. Please correct me if I am wrong somewhere.

This is my Sparse Table Solution.

the complexity formula is just a generalization. there are often many more constants to it that could have made the complexity more “complex”.

You can also use a 2-d version of the spare table to compute the maximum. With MNlogMlogN computation you can answer a max query on a any given sub-matrix in O(1).

Basically apply RMQ on every row of the matrix, then along the columns of the output of the first RMQ.

5 Likes

I think that for each matrix of dimension a*b, the query complexity is O(log(a)+log(b)) and not O(1) , though I would agree that according to the given constraints O(log(a)+log(b) ) is small enough to be considered as O(1).

I have used 2-d version of sparse table for maximum element in sub matrix and pre-compute sum in o(2*n) which is O(n) time. but i cant pass third sub task can anyone tell me where i go wrong or how can i do better than this.
https://www.codechef.com/viewsolution/10499476

Its not log(a) + log(b) but its not just 1 operation either. Technically you take max of 4 values. Which is 3 comparison operations. And yes, it is small enough that it can be considered O(1).

Hey,
When you are using __builtin_clz(b) or __builtin_clz(a) isn’t it taking O(log(b)) and O(log(a)) to compute the leading zeros in the binary representation?

you should find sum in o(1) using summed area table… and optimize 2-d version of your sparse matrix.

i too did using multiple sliding windows ,
actually log(n) sliding windows , 1x1 , 2x2, 4x4 , 8x8 ,16x16 till 512x512.

I used multiple sliding window for getting the maximum element in sub-matrix.

I maximum element in all the sub-matrix of sizes 1x1 , 2x2 ,4x4,16x16 till 512x512 and for every query used appropriate window to find the maximum element in that sub-matrix.

Time complexity for pre-computation : 10xnxm Overall Time compleity = qnm*logn

Here is a link to my Solution using 2-D sparse tables.

For those who got TLE in some test cases/full subtask-3 using this, here are a few optimisations:

1. Multi-dimensional arrays take a lot of time for array access. To increase their caching and access speed, it is advised to keep the indices from smaller to larger i.e. declaring rmq array like rmq[11][11][1002][1002]. (Try this problem for more details)
2. You should pre-compute the log values, although internal log implementation is O(1), the constant factor is large. So computing log the number of times in query increases the time of your program.
3. Also, a not so needed optimisation is to pre-compute the powers of 2 as well and declare array size as much as required only.

I hope it helps.

3 Likes

Not able to view solutions.

Even I used the same approach but my 2d sparse table caused TLE in subtask 3 . I think O(MNlog(M)log(N)) build time is insufficient

Submission : https://www.codechef.com/viewsolution/10438764

I created 2 D table to store maximum element inside each submatrix. Using DP, I am filling this spare table in O(n*m). I am not getting TLE but I [got] WA in many cases. Please help me figure out what am I doing wrong here. Link to my solution : https://www.codechef.com/viewsolution/10506486

A lot of people have implemented 2d sparse table like this int sparse [m] [n] [logm] [logn] and they got an tle in subtask#3
Now the catch is if u would implemented like this int sparse [logm] [logn] [m] [n] then it would have passed all casses with flying colors.
Link to my submission by using 2d sparse table: https://www.codechef.com/viewsolution/10464323

Unable to view setter’s or tester’s solution. And also in the Explanation it should be: “the minimum number of operations required will be x∗a∗b−S.” instead of: “the minimum number of operations required will be x∗n∗m−S.”

@farzi, I think if the CPU supports it, those are run in a single instruction and therefore technically O(1) but I am not 100% sure about that.

I used 2D Sparse Table. I got TLE at first when using the log2() function, then I got AC when I used __builtin_clz().
https://www.codechef.com/viewsolution/10296148

Really liked your technique of calculating log2.

But do you think A[2][100] can be accessed faster than A[100][2]?How?

//