PROBLEM LINK:
Author: Harshil Shah
Tester: Sergey Kulik
Editorialist: Xellos
DIFFICULTY:
MEDIUMHARD
PREREQUISITES:
submask sums, heavylight decomposition
PROBLEM:
Given an array A, find the sum of \mathrm{max}(A_{i..j}) over all i < j such that A_i \mathrm{\;AND\;} A_j is equal to either A_i or A_j.
QUICK EXPLANATION:
Look at a largest element, solve the problem for subarrays to its left and right recursively. For each element in the smaller subarray, count the ones in the larger subarray that satisfy the ANDcondition using meetinthemiddle on the first 7 and last 7 bits  count for all x e.g. the elements that are submasks of x with the last 7 bits equal to those of x. To speed up, count them by adding to their numbers computed for the larger subarray.
EXPLANATION:
Let’s view each number in binary representation as a bitmask. The bitwise x \mathrm{\;AND\;} y = y is only possible if y is a submask of x; it’s clear that it means x \ge y.
We’ll build on the following divide and conquer algorithm for computing the answer for an array A:

find a largest element of A (doesn’t matter which one if it’s not unique); this element A_c splits the array into a left subarray A_l and right subarray A_r (plus itself)

solve recursively for A_l and A_r

count the pairs A_i \mathrm{\;AND\;} A_j \in \left\lbrace {A_i,A_j} \right\rbrace (let’s call them submaskpairs) for A_i \in A_l or i=c and A_j \in A_r or j=c (but not i=j=c), add that number multiplied by A_c to the answer
Instead of summing over i,j first, we’re first summing over the element that gives the maximum and then over ranges [i,j] that contain it.
Note that while the maximum element in a subarray can be found using various direct algorithms like RMQ with the RMQ table, there’s another approach that’s common in this type of maximumcounting problems that in essence determines it as well: if we find the nearest larger element A_{il} to the left and nearest larger or equal element A_{ir} to the right of each A_i, then A_i will be the maximum we’d find in the D&C algorithm for the subarray A_{il..ir}  specifically, the leftmost maximum element. The nearest larger / not smaller element can be found using a wellknown stack algorithm in O(N).
Counting submaskpairs
We’re facing the following problem: given l,r,c, count submaskpairs A_i,A_j such that i \in [l,c] and j \in [c,r). Let’s just focus on pairs where A_i \mathrm{\;AND\;} A_j = A_i, since =A_j is done the same way and the total number of submaskpairs is their sum minus the number of pairs A_i=A_j, which are counted twice minus 1 (for i=j=c).
The key is using meetinthemiddle on the B=14 bits of A_i,A_j. For any such pair, we have exactly one way to change some of the B/2 first bits of A_i from 1 to 0 and some of the last B/2 bits of A_j from 0 to 1 so that the changed numbers are equal. Therefore, if we generate all those halfsubmasks of A_i, halfsupermasks of A_j, then the number of submaskpairs we’re looking for is the number of equal pairs halfsubmask/halfsupermask.
We can’t actually generate them all independently for each l,r,c, but we’re getting pretty close. For each halfsubmask x, we could count the elements A_j which have x as a halfsupermask. And if we’re careful, then we could generate all those halfsubmasks.
HLD tricks
Suppose that we know the number of elements A_j that have x as a halfsupermask for the left and right subarray in the recursion, call it S_l[x] and S_r[x]. We need to update it to get the same number for the current step (l,r) of the recursion; above, we pretty much described how to use it to count submaskpairs. Doing it in a straightforward way is too slow, though.
The key trick is that we take S[] for the larger subarray (let’s say S_l[] if rc1 < cl), look at halfsubmasks x of elements in the smaller subarray and sum up S_l[x] over them; that lets us count submaskpairs. There aren’t many halfsubmasks in total generated this way: since we’re doing it for the smaller subarray, we can only do it O(\log{N}) times for each element  this is the same principle as why there are at most O(\log{N}) paths to the root from any vertex in HLD. And with at most 2^{B/2} halfsubmasks of any number, that makes O(N\log{N} 2^{B/2}).
So how do we update S[] for the whole subarray [l,r)? We need to continue with the “smaller subarray” here: take S[] for the larger subarray again and add all halfsupermasks of elements in the smaller subarray to it this time. For the same reasons as with the counting part above, adding them has also O(N\log{N} 2^{B/2}) net complexity, but we can’t simply copy (e.g.) S_l[] to S[], since that’s an O(2^B) operation. We don’t have to copy arrays at all, though  just say that S_l[] became S[] for [l,r), for example by keeping one array for each path in the HLD (it really is the HLD of the recursion tree).
There’s another problem here: deleting arrays is also costly. To get around it, we can recycle them  when exiting a path, clear the S[] array corresponding to it and mark it as free to use again; when creating a new path (which is done in each leaf of the recursion), we can simply take one of the clear arrays. Since we only have O(\log{N}) paths open at any step of the recursion, it suffices to allocate O(\log{N}) clear arrays of size 2^B initially.
Small note: we need to add halfsupermasks of A_c first, then count the submaskpairs and then add the halfsupermasks of A_{c+1..r1} (or A_{l..c1}).
The time complexity is O(N\log{N}2^{B/2}), memory complexity O((N+2^B)\log{N}).
Alternative approaches
Instead of using meetinthemiddle on bits, we can split the array into buckets of size \sqrt{N} and use them to count full submasks/supermasks of all elements of the smaller subarray, bruteforcing small subarrays in the recursion. This approach is simpler, but slower  I easily made a program that runs in 0.4 seconds and after precomputing submasks in 0.12 seconds, and the asymptotics is probably something like O(NB\sqrt{N}).
AUTHOR’S AND TESTER’S SOLUTIONS:
The author’s solution can be found here.
The tester’s solution can be found here.
The editorialist’s solution can be found here.