HBIRD - Editorial

kevinsogo · March 13, 2016, 2:24pm

PROBLEM LINK:

Author: Sunny Aggarwal
Tester: Roman Furko
Translators: Vasya Antoniuk (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza

DIFFICULTY:

Easy

PREREQUISITES:

Dynamic programming, binary search

PROBLEM:

Given an N\times M grid of numbers \{H_{i,j} \}, answer Q queries of the following kind:

Given G, find the size of the largest set of “triangles” each of whose sum is positive and the sum of their sums is \le G.

There are eight kinds of triangles, depending on orientation.

QUICK EXPLANATION:

Compute all sums of all triangles in O(N^2 M) time using dynamic programming. (Handle each of the eight kinds separately.)
Sort these sums in increasing order. (Counting sort is useful because the sums are at most H_{\max} NM.)
Compute the cumulative array of sums.
To answer each query, perform binary search in the cumulative array.

EXPLANATION:

How many triangles?

The problem suggests that we need to find the list of sums of all possible triangles.

However, that looks infeasible because there might be too many triangles to begin with, which seems to suggest we shouldn’t enumerate them all. But with a little counting, we can actually determine the number of triangles.

Let’s consider the following type of triangle first:

. . . . . . . . . . .
. . . . . # . . . . .
. . . . # # # . . . .
. . . # # # # # . . .
. . # # # # # # # . .
. . . . . . . . . . .

The way to count triangles like this is to count bounding boxes instead. (We define the bounding box as the smallest rectangle that contains the figure.) This works because any given bounding box (of the right height-width ratio) determines exactly one triangle of this type. In the example above, the bounding box has dimensions 3\times 6. In general, bounding boxes of this type of triangle all have a height-width ratio of 1:2. Thus, the number of triangles of this type is the number of 1:2 rectangles contained in a rectangle of dimensions (N-1)\times (M-1). (Remember that N\times M is the size of the number grid, but the rectangle itself has dimensions (N-1)\times (M-1).) This can be computed as the following sum:

\sum_{\substack{1 \le k \le N-1 \\\ 1 \le 2k \le M-1}} (N-k)(M-2k)

We can find a very simple upper bound for this:

\begin{aligned} & \sum_{\substack{1 \le k \le N-1 \\\ 1 \le 2k \le M-1}} (N-k)(M-2k) \\\ &\le \sum_{\substack{1 \le k \le N-1 \\\ 1 \le 2k \le M-1}} (N-k)M \\\ &\le \sum_{1 \le k \le N-1} (N-k)M \\\ &\le \sum_{1 \le k \le N-1} kM \\\ &\le \frac{1}{2} N^2 M \end{aligned}

Thus, we have shown that there are at most \frac{1}{2} N^2 M triangles of this type. For the remaining seven types, we can perform a similar calculation and find out that there are also at most \frac{1}{2} N^2 M triangles of every other type. Thus, the total number of triangles overall is at most 4N^2 M, which in our current problem (based on the constraints) is small enough to be manageable. Thus, we should feel no guilt in trying to enumerate all triangles!

Computing the sum of all triangles

We have just established that the number of triangles isn’t too large. However, we can’t enumerate their sums naïvely because it takes a while to get the sum of each triangle. In other words, while the number of triangles is small, the sum of their areas is not. Thus, what we need is an efficient way of computing the sum of all triangles.

The key here is to notice that many triangles share a lot of entries, and some even differ only in very specific ways. For example,

. . . . . . . . . . .   . . . . . . . . . . .   . . . . . . . . . . .   
. . . . . # . . . . .   . . . . . . . . . . .   . . . . . # . . . . .   
. . . . # # # . . . .   . . . . . . . . . . .   . . . . # # # . . . .   
. . . # # # # # . . . + . . . . . . . . . . . = . . . # # # # # . . .   
. . # # # # # # # . .   . . . . . . . . . . .   . . # # # # # # # . .   
. . . . . . . . . . .   . # # # # # # # # # .   . # # # # # # # # # .   
. . . . . . . . . . .   . . . . . . . . . . .   . . . . . . . . . . .

The left triangle and the right triangle differ in only a single horizontal line segment. In other words, to get the sum of the right triangle, we only need to get the sum of the left triangle and add to it the sum of the horizontal line segment!

Similar relationships can be found in other triangle types:

. . . . . . . . . . .   . . . . . . . . . . .   . . . . . . . . . . .   
. . . . . . . . . . .   . # # # # # # # # # .   . # # # # # # # # # .   
. . # # # # # # # . .   . . . . . . . . . . .   . . # # # # # # # . .   
. . . # # # # # . . . + . . . . . . . . . . . = . . . # # # # # . . .   
. . . . # # # . . . .   . . . . . . . . . . .   . . . . # # # . . . .   
. . . . . # . . . . .   . . . . . . . . . . .   . . . . . # . . . . .   
. . . . . . . . . . .   . . . . . . . . . . .   . . . . . . . . . . .   


. . . . . . .   . . . . . . .   . . . . . . .
. . . . . # .   . . . . . . .   . . . . . # .
. . . . # # .   . . . . . . .   . . . . # # .
. . . # # # . + . . . . . . . = . . . # # # .
. . # # # # .   . . . . . . .   . . # # # # .
. . . . . . .   . # # # # # .   . # # # # # .
. . . . . . .   . . . . . . .   . . . . . . .


. . . . . . .   . . . . . . .   . . . . . . .
. . . . . . .   . # # # # # .   . # # # # # .
. . # # # # .   . . . . . . .   . . # # # # .
. . . # # # . + . . . . . . . = . . . # # # .
. . . . # # .   . . . . . . .   . . . . # # .
. . . . . # .   . . . . . . .   . . . . . # .
. . . . . . .   . . . . . . .   . . . . . . .


. . . . . . .   . . . . . . .   . . . . . . .
. # . . . . .   . . . . . . .   . # . . . . .
. # # . . . .   . . . . . . .   . # # . . . .
. # # # . . . + . . . . . . . = . # # # . . .
. # # # # . .   . . . . . . .   . # # # # . .
. . . . . . .   . # # # # # .   . # # # # # .
. . . . . . .   . . . . . . .   . . . . . . .


. . . . . . .   . . . . . . .   . . . . . . .
. . . . . . .   . # # # # # .   . # # # # # .
. # # # # . .   . . . . . . .   . # # # # . .
. # # # . . . + . . . . . . . = . # # # . . .
. # # . . . .   . . . . . . .   . # # . . . .
. # . . . . .   . . . . . . .   . # . . . . .
. . . . . . .   . . . . . . .   . . . . . . .


. . . . . .   . . . . . .   . . . . . .
. . . . . .   . . . . # .   . . . . # .
. . . # . .   . . . . # .   . . . # # .
. . # # . .   . . . . # .   . . # # # .
. # # # . . + . . . . # . = . # # # # .
. . # # . .   . . . . # .   . . # # # .
. . . # . .   . . . . # .   . . . # # .
. . . . . .   . . . . # .   . . . . # .
. . . . . .   . . . . . .   . . . . . .


. . . . . .   . . . . . .   . . . . . .
. . . . . .   . # . . . .   . # . . . .
. . # . . .   . # . . . .   . # # . . .
. . # # . .   . # . . . .   . # # # . .
. . # # # . + . # . . . . = . # # # # .
. . # # . .   . # . . . .   . # # # . .
. . # . . .   . # . . . .   . # # . . .
. . . . . .   . # . . . .   . # . . . .
. . . . . .   . . . . . .   . . . . . .

Thus, to enumerate the sums of all triangles efficiently, we only need to compute the sums of horizontal and vertical line segments quickly! Since the horizontal and vertical cases are very similar, we’ll only focus on the horizontal case. Specifically, given i and j_1 \le j_2, we want to find out the sum H_{i,j_1} + H_{i,j_1+1} + \ldots + H_{i,j_2}.

To do it quickly, we can simply compute the cumulative sums of each row! In more detail, for 1 \le i \le N and 0 \le j \le M, let R_{i,j} be the sum of the first j entries in the i th column, i.e. the sum H_{i,1} + H_{i,2} + \ldots + H_{i,j}. The R_{i,j} values can be efficiently computed with dynamic programming using the recurrence R_{i,j} = H_{i,j} + R_{i,j-1} and base case R_{i,0} = 0. Most importantly, the R_{i,j} values can be used to compute the sum H_{i,j_1} + H_{i,j_1+1} + \ldots + H_{i,j_2} efficiently as simply R_{i,j_2} - R_{i,j_1-1}!

(Similarly, we compute cumulative sums of each column to compute vertical line sums quickly.)

With these tools and methods, we now have an efficient way of enumerating all triangle sums!

The following pseudocode illustrates one way to do it:

// compute cumulative sums
for i = 0..n-1:
    for j = 0..m-1:
        D[i+1][j] = H[i][j] + D[i][j]
        R[i][j+1] = H[i][j] + R[i][j]

// compute all triangle sums
for i = 0..n-1:
    for j = 0..m-1:
        for (k = 1, s = H[i][j]; i-k >= 0 and j+k <  m; k++)
            yield (s += R[i-k][j+k+1] - R[i-k][j])

        for (k = 1, s = H[i][j]; i+k <  n and j+k <  m; k++)
            yield (s += R[i+k][j+k+1] - R[i+k][j])

        for (k = 1, s = H[i][j]; i-k >= 0 and j-k >= 0; k++)
            yield (s += R[i-k][j+1] - R[i-k][j-k])

        for (k = 1, s = H[i][j]; i+k <  n and j-k >= 0; k++)
            yield (s += R[i+k][j+1] - R[i+k][j-k])

        for (k = 1, s = H[i][j]; i-k >= 0 and j-k >= 0 and j+k < m; k++)
            yield (s += R[i-k][j+k+1] - R[i-k][j-k])

        for (k = 1, s = H[i][j]; i+k <  n and j-k >= 0 and j+k < m; k++)
            yield (s += R[i+k][j+k+1] - R[i+k][j-k])

        for (k = 1, s = H[i][j]; j-k >= 0 and i-k >= 0 and i+k < n; k++)
            yield (s += D[i+k+1][j-k] - D[i-k][j-k])

        for (k = 1, s = H[i][j]; j+k <  m and i-k >= 0 and i+k < n; k++)
            yield (s += D[i+k+1][j+k] - D[i-k][j+k])

(Note that this pseudocode is zero-indexed, but the idea is the same.)

This code yields every sum of some triangle exactly once.

Preparing the triangle sums for the queries

Now that we have a list of triangle sums, it’s time to answer queries. For a given query G, we need to find the size of the largest set of triangles each of whose sum is positive and the sum of their sums is \le G. Since we only care about positive triangle sums, we can simply ignore all triangle sums equal to zero.

Let’s make this more specific. Let’s say the nonzero triangle sums are [s_1, s_2, s_3, \ldots, s_k]. What is the size of the largest subset of these sums whose sum is \le G? Clearly, a greedy approach will work here, and we need to get the triangles with the smallest sums first. (Why?) Thus, our first step is to sort the list [s_1, \ldots s_k].

Any fast sorting algorithm will do, because the list is somewhat manageable. However, counting sort is particularly useful here, because 0 \le H_{i,j} \le 50, which means that the largest sum of any triangle is 50NM, which is less than 6 million.

Next, now that [s_1, \ldots, s_k] is sorted, the smallest sums are now in front of the array. Thus, in order to maximize the number of sums we get, we will need to take the leftmost element one by one until the sum exceeds G. In other words, the answer for the query G is the length of the longest prefix of [s_1, \ldots, s_k] whose sum is \le G.

It looks like we’ll be needing prefix sums again, so let’s try calculating all prefix sums beforehand. Let p_i be the sum of the first i elements of s, i.e. p_i = s_1 + s_2 + \ldots + s_i. As before, the $p_i$s can be computed with dynamic programming with the recurrence p_i = s_i + p_{i-1} and base case p_0 = 0. Thus, we can now rephrase the answer to a query: the answer to the query G is the largest i such that p_i \le G. But since p_i is increasing, this index can be computed using binary search!

The following pseudocode illustrates it: (Here k is the length of [s_1, \ldots, s_k])

def answer_query(G):
    L = 0
    R = k+1
    while R - L > 1:
        // we will maintain the invariant p[L] <= G < p[R]
        M = (L + R) / 2
        if p[M] <= G:
            L = M
        else:
            R = M
    return L

Binary search is very fast (O(\log k)), so this gives us a very fast way of answering queries. In our case, k \le 4N^2 M, so processing all Q queries takes O(Q \log (N^2 M)). Overall, the time complexity is O(N^2 M + H_{\max} NM + Q \log (N^2 M)):

O(N^2 M) for computing triangle sums,
O(H_{\max} NM) for sorting the sums,
O(N^2 M) for computing prefix sums, and
O(Q \log (N^2 M)) to answer queries with binary search.

Time Complexity:

O(N^2 M + H_{\max} NM + Q \log (N^2 M))

AUTHOR’S AND TESTER’S SOLUTIONS:

setter
tester
editorialist

grebnesieh · March 15, 2016, 4:07pm

Any idea why this fails for the small cases?

I am storing for each i, birds[i] = number of birds with hunger = i. Then a linear iteration over the sorted queries to get the answer.

likecs · March 15, 2016, 4:12pm

Exactly the same solution as the editorial. You may refer to the solution here . You may remove the comments to check the execution at various steps.

Happy coding

aky1994 · March 15, 2016, 7:19pm

N^2*M for the case N=M=335 will come out something around 10^8. How is this computable under given time constraints?

code_hard123 · March 16, 2016, 10:13am

@aky1994 The total number of birds wont exceed 8e7 which is easily computable within 2 seconds in O(n * m * min ( n , m) ) time complexity.

kevinsogo · March 16, 2016, 10:35am

Enumerating even up to 2e8 triangles is doable within 2 seconds as long as you enumerate efficiently.

yougaindra · March 16, 2016, 11:41pm

But How can we store 2e8 triangles?

sharad07 · March 17, 2016, 12:03am

@yougaindra… u dont’t have to store 2e8 triangles…the largest possible triangle is 335 units high,with 335 units base width & has 335 units of diagonal length. It contains about (1+2+3…+335) cells which is about 56280 cells in total.Since the largest value any cell can hold is 50…that gives the largest sum to be about 50 * 56280 = 2814000 (less than 3 *10^6) and the smallest sum would be 0. Creating an array of that length is feasible also doing Binary search over this array can be done under given time constraint.

(If u find it helpful do upvote)

kevinsogo · March 17, 2016, 12:07am

You don’t need to store the triangles. You only need to compute the frequency distribution of the sums of the triangles, which can be stored in an array of size H_maxNM < 6 million.

likecs · March 17, 2016, 11:48am

Actually the test cases were weak according to me. Initially I was getting SIGABRT error due to using large memory. (https://www.codechef.com/viewsolution/9561215). But changing some parameters on the variable “limit” passed the solution easily for 100 points. Here is the code (https://www.codechef.com/viewsolution/9561479).

PS: Though I later managed to get the actual solution as explained in the editorial.

yougaindra · March 17, 2016, 1:29pm

But if you compute frequency distribution in an array why do you need to sort them as mentioned in editorial because this array would already be sorted?

kevinsogo · March 17, 2016, 1:31pm

Yes, the array would already be sorted so no need for a separate sorting step. “sorting” is just mentioned in the editorial to emphasize that we need the sums sorted.

yougaindra · March 17, 2016, 1:37pm

Also in editorialist’s solution prefix vector ‘ans’ contains sum(ct) elements and from calculations above sum(ct) is 2e18.

yougaindra · March 17, 2016, 1:41pm

in editorialist’s solution prefix vector ‘ans’ contains sum(ct) elements and from calculations above sum(ct) is 2e18, where ct is array containing frequency of different triangles.

prudvinit · March 17, 2016, 7:54pm

Why should we perform binary search in the end? We could check for the largest index where pi <= G in a single iteration.

vsp4 · March 17, 2016, 8:14pm

No of queries is large - 10**5. That would tle

sharad07 · March 17, 2016, 11:03pm

@yougaindra Well by largest sum I meant the sum of values of all the cells in the largest possible triangle and not the cumulative sum itself.

also the array bound that I had used in my code was 3 *10^6 which worked well.Had this not been the upper limit then I would have got SIGSEGV error.

kevinsogo · March 17, 2016, 11:06pm

So apparently you need to store 2e8 values anyway because of ‘ans’. But since CodeChef accepts it, it means we don’t need to worry any more. Also, note that 2e8 is a very loose bound, and most likely you’ll actually need much less.

kevinsogo · March 18, 2016, 12:19am

By the way, there’s still a way to not store 2e8 sums but at the cost of making “binary search” more complicated.

nil96 · March 19, 2016, 12:24pm

grid is NM so we can say there are n horizontal line and m vertical line so number of rectangle is NC2MC2 which n^2m^2 and each rectangle will produce 4 distinct right angle triangle where is flaw in my logic as I get order O(N^2M^2)
@aky1994 @likecs