GOODPROB - Editorial

xellos0 · September 20, 2016, 2:47am

PROBLEM LINK:

Practice
Contest

Author: Harshil Shah
Tester: Sergey Kulik
Editorialist: Xellos

DIFFICULTY:

MEDIUM-HARD

PREREQUISITES:

submask sums, heavy-light decomposition

PROBLEM:

Given an array A, find the sum of \mathrm{max}(A_{i..j}) over all i < j such that A_i \mathrm{\;AND\;} A_j is equal to either A_i or A_j.

QUICK EXPLANATION:

Look at a largest element, solve the problem for subarrays to its left and right recursively. For each element in the smaller subarray, count the ones in the larger subarray that satisfy the AND-condition using meet-in-the-middle on the first 7 and last 7 bits - count for all x e.g. the elements that are submasks of x with the last 7 bits equal to those of x. To speed up, count them by adding to their numbers computed for the larger subarray.

EXPLANATION:

Let’s view each number in binary representation as a bitmask. The bitwise x \mathrm{\;AND\;} y = y is only possible if y is a submask of x; it’s clear that it means x \ge y.

We’ll build on the following divide and conquer algorithm for computing the answer for an array A:

find a largest element of A (doesn’t matter which one if it’s not unique); this element A_c splits the array into a left subarray A_l and right subarray A_r (plus itself)
solve recursively for A_l and A_r
count the pairs A_i \mathrm{\;AND\;} A_j \in \left\lbrace {A_i,A_j} \right\rbrace (let’s call them submask-pairs) for A_i \in A_l or i=c and A_j \in A_r or j=c (but not i=j=c), add that number multiplied by A_c to the answer

Instead of summing over i,j first, we’re first summing over the element that gives the maximum and then over ranges [i,j] that contain it.

Note that while the maximum element in a subarray can be found using various direct algorithms like RMQ with the RMQ table, there’s another approach that’s common in this type of maximum-counting problems that in essence determines it as well: if we find the nearest larger element A_{il} to the left and nearest larger or equal element A_{ir} to the right of each A_i, then A_i will be the maximum we’d find in the D&C algorithm for the subarray A_{il..ir} - specifically, the leftmost maximum element. The nearest larger / not smaller element can be found using a well-known stack algorithm in O(N).

Counting submask-pairs

We’re facing the following problem: given l,r,c, count submask-pairs A_i,A_j such that i \in [l,c] and j \in [c,r). Let’s just focus on pairs where A_i \mathrm{\;AND\;} A_j = A_i, since =A_j is done the same way and the total number of submask-pairs is their sum minus the number of pairs A_i=A_j, which are counted twice minus 1 (for i=j=c).

The key is using meet-in-the-middle on the B=14 bits of A_i,A_j. For any such pair, we have exactly one way to change some of the B/2 first bits of A_i from 1 to 0 and some of the last B/2 bits of A_j from 0 to 1 so that the changed numbers are equal. Therefore, if we generate all those half-submasks of A_i, half-supermasks of A_j, then the number of submask-pairs we’re looking for is the number of equal pairs half-submask/half-supermask.

We can’t actually generate them all independently for each l,r,c, but we’re getting pretty close. For each half-submask x, we could count the elements A_j which have x as a half-supermask. And if we’re careful, then we could generate all those half-submasks.

HLD tricks

Suppose that we know the number of elements A_j that have x as a half-supermask for the left and right subarray in the recursion, call it S_l[x] and S_r[x]. We need to update it to get the same number for the current step (l,r) of the recursion; above, we pretty much described how to use it to count submask-pairs. Doing it in a straightforward way is too slow, though.

The key trick is that we take S[] for the larger subarray (let’s say S_l[] if r-c-1 < c-l), look at half-submasks x of elements in the smaller subarray and sum up S_l[x] over them; that lets us count submask-pairs. There aren’t many half-submasks in total generated this way: since we’re doing it for the smaller subarray, we can only do it O(\log{N}) times for each element - this is the same principle as why there are at most O(\log{N}) paths to the root from any vertex in HLD. And with at most 2^{B/2} half-submasks of any number, that makes O(N\log{N} 2^{B/2}).

So how do we update S[] for the whole subarray [l,r)? We need to continue with the “smaller subarray” here: take S[] for the larger subarray again and add all half-supermasks of elements in the smaller subarray to it this time. For the same reasons as with the counting part above, adding them has also O(N\log{N} 2^{B/2}) net complexity, but we can’t simply copy (e.g.) S_l[] to S[], since that’s an O(2^B) operation. We don’t have to copy arrays at all, though - just say that S_l[] became S[] for [l,r), for example by keeping one array for each path in the HLD (it really is the HLD of the recursion tree).

There’s another problem here: deleting arrays is also costly. To get around it, we can recycle them - when exiting a path, clear the S[] array corresponding to it and mark it as free to use again; when creating a new path (which is done in each leaf of the recursion), we can simply take one of the clear arrays. Since we only have O(\log{N}) paths open at any step of the recursion, it suffices to allocate O(\log{N}) clear arrays of size 2^B initially.

Small note: we need to add half-supermasks of A_c first, then count the submask-pairs and then add the half-supermasks of A_{c+1..r-1} (or A_{l..c-1}).

The time complexity is O(N\log{N}2^{B/2}), memory complexity O((N+2^B)\log{N}).

Alternative approaches

Instead of using meet-in-the-middle on bits, we can split the array into buckets of size \sqrt{N} and use them to count full submasks/supermasks of all elements of the smaller subarray, bruteforcing small subarrays in the recursion. This approach is simpler, but slower - I easily made a program that runs in 0.4 seconds and after precomputing submasks in 0.12 seconds, and the asymptotics is probably something like O(NB\sqrt{N}).

AUTHOR’S AND TESTER’S SOLUTIONS:

The author’s solution can be found here.
The tester’s solution can be found here.
The editorialist’s solution can be found here.