### PROBLEM LINK:

**Author:** Arjun Arul

**Tester:** Kevin Atienza

**Editorialist:** Kevin Atienza

### DIFFICULTY:

Easy-medium

### PREREQUISITES:

Dynamic programming

### PROBLEM:

Given an array A_1, \ldots A_N. This array will be reordered to a new array P_1, \ldots P_N according to the following method:

- P is initially empty.
- At each step, either the current leftmost or rightmost value of A is removed and added to the right or left of P.
- This proceeds until A is empty.

What is the minimum number of inversions in P?

### QUICK EXPLANATION:

Let’s define these functions:

- Let L(v,i,j) be the number of values A_k such that A_k < v, for all k outside of the range (i,j].
- Let G(v,i,j) be the number of values A_k such that A_k > v, for all k outside of the range (i,j].
- Let M(v,i,j) be \min(L(v,i,j),G(v,i,j)).

Each can be computed in O(N) time.

For 0 \le i \le j \le N, let f(i,j) be the minimum number of additional inversions assuming the elements A_1, \ldots, A_i and A_{j+1}, \ldots, A_N have already been placed in P. Note that the answer is f(0,N), and we have the base case f(i,i) = 0 for 0 \le i \le N. We also have the following recursive relationship:

Thus, by tabulating f (i.e. dynamic programming), we can solve the problem in O(N^3) time. More sophisticated ways of computing M(v,i,j) can result in a faster running time of O(N^2).

### EXPLANATION:

A brute-force solution for this problem (trying out all possibilities) is very slow, because for each of the N steps, there are up to four possible choices. In fact, we can show that the maximum number of possible arrays is O((2+\sqrt{2})^N) = O(3.415^N), which is obviously exponential. The exact figures can be found in OEIS. If you want the closed form, here it is (for N > 1):

# A dynamic programming solution

Let’s assume that we’re in the middle of the process: Say there are K values remaining in A (and thus N-K values in P). Since N-K values have already been placed, we have no more way to modify the number of inversions currently in P. However, we can still control the number of *additional* inversions we incur by selecting our next moves optimally (though we still don’t know how). There are a few insights that will help us get a faster solution:

- The first is that the number of additional inversions are
*independent of the order of the values currently in P*. This is because we can only add values at the beginning and end of P, and the relative order of the current values in P only affect the inversions between elements already in P. - At any point during the process, the remaining values in A form a contiguous subarray in the original A (because we only take values from the beginning or end of A).

Given these insights, we can now define a function f(i,j). For 0 \le i \le j \le N, let f(i,j) be the minimum number of additional inversions assuming the only remaining elements in A are A_{i+1}, A_{i+2}, \ldots, A_j (and the rest have been placed in P). Note that the answer is simply f(0,N), and we have the base case f(i,i) = 0 for all 0 \le i \le N (because there are no more elements to place).

Let’s now try to express f(i,j) recursively. Here are a few observations:

- The elements currently in P are [A_1, A_2, \ldots A_i, A_{j+1}, A_{j+2}, \ldots, A_N] (though not necessarily in that order).
- The next element we place in P is either A_{i+1} or A_j, and we only place it either at the beginning or at the end.

Let’s assume that we will place A_j next. The following two possibilities describe the number of additional inversions we get by adding A_j to P.

- If we place it at the beginning of P, the number of additional inversions is equal to the number of values currently in P that are
*less than*A_j. Let’s call this value L. - If we place it at the end of P, the number of additional inversions is equal to the number of values currently in P that are
*greater than*A_j. Let’s call this value G.

Also, in both cases, we still need to place the remaining elements: A_{i+1}, \ldots A_{j-1}. The minimum we can get is f(i,j-1) inversions, by definition. Thus, the minimum we can get is:

Similarly, if we will instead place A_{i+1} next, then the minimum we can get is:

where L and G are redefined so that A_j is replaced with A_{i+1}.

Since these are all the possibilities, we now have a recurrence for f(i,j). First, let’s define these functions:

- Let L(v,i,j) be the number of values A_k such that A_k < v, for all k in ranges [1,i] and (j,N].
- Let G(v,i,j) be the number of values A_k such that A_k > v, for all k in ranges [1,i] and (j,N].
- Let M(v,i,j) be \min(L(v,i,j),G(v,i,j)).

Each of these can be computed in O(N) time. Then we have the following relationship:

This means that we can build a 2D table for values of f incrementally by computing f(i,j) one by one in a certain order: in increasing order of j-i, such as in the following pseudocode:

```
f = Array[N+1][N+1]
for diff in 0...N:
for i in 0...N-diff:
j = i + diff
// compute f[i][j]
if i == j:
f[i][j] = 0
else:
f[i][j] = min(f[i][j-1] + M(A[j],i,j), f[i+1][j] + M(A[i+1],i,j))
```

Other orders are possible, such as in the following:

```
for i in 0...N:
for j in i...0 by -1:
// compute f[i][j]
// ...
```

After filling up the table `f`

, the answer can now be found in `f[0][N]`

! Now, let’s see how fast this runs. There are N(N+1)/2 = O(N^2) relevant entries in `f`

, and in each entry, we call M twice. Each call to M runs in O(N) in the worst case. Therefore, the running time is O(N)\cdot O(N^2) = O(N^3), which passes the time limit of the problem!

# Faster algorithms

Although the algorithm above is already good enough for the problem, it’s helpful to know faster techniques to solve the problem. Our techniques will revolve in calculating M(v,i,j) quickly, which implies we want to calculate L(v,i,j) and G(v,i,j) quickly. Since these two functions are symmetric, we can just focus on one function, say L(v,i,j).

Remember that L(v,i,j) is the number of values A_k such that A_k < v, for all k in ranges [1,i] and (j,N]. Suppose we plot N points in the 2D plane, (x_i,y_i) = (i,A_i) for all 1 \le i \le N. In other words, we plot the points (1,A_1), (2,A_2), \ldots, (N,A_N). Then L(v,i,j) is simply the number of points in two rectangles: [1,i]\times [0,v-1] and [j+1,N]\times [0,v-1]. Thus, we have reduced computing L(v,i,j) into two 2D range queries! Such queries can be answered quickly, in O(\log^2 N) time each, with the use of the standard range tree structure, with O(N \log N) preprocessing. Similarly, G(v,i,j) can also be computed in O(\log^2 N) time (and thus M(v,i,j) too). Therefore, our running time is now the faster O(N^2 \log^2 N). Using fractional cascading reduces this further to O(N^2 \log N).

In fact, we can go even faster, with the use of summed area tables! A *summed area table* is a structure on a 2D grid of numbers, which can compute the sum of any rectangular subgrid in O(1) time. The idea is a natural extension of idea of a *cumulative sums*. Suppose we want to process the 2D grid G[1\ldots R][1\ldots C]. The summed area table is simply another 2D grid S[0\ldots R][0\ldots C] such that S[i,j] is the sum of the subgrid with corners G[1,1] and G[i,j]. (In particular, S[R,C] is the sum of the whole grid.) Constructing S is easy with the following recurrence: (verify)

with base case S[i,j] = 0 when i = 0 or j = 0.

With S, we can now get the sum of any subgrid of G in O(1) time. The sum of the subgrid determined by [a,b]\times [c,d] is:

Now, back to our problem about 2D range queries. Notice that we can view a part of the 2D plane as a 2D grid. Since in all our points (x,y) it holds that 1 \le x \le N and 0 \le y \le 1000, we can easily create the relevant grid G, so that G[x,y] = 1 for every point (x,y) and 0 everywhere else. But what if y can be larger, say 10^9? In this case, we can no longer build G and S because they are too large. But we can fix this by using *coordinate compression*. This compression can be done in O(N \log N) time, and we can ensure that the resulting coordinates fall in the subgrid [1,N]\times[1,N]. Thus, we can take G to be simply an N\times N grid.

Once we have G, we can now compute S in O(N^2) time, and once we have S, we can now answer 2D range queries in O(1) time. Thus, the overall running time we have is O(N^2)!

### Time Complexity:

O(N^3) or O(N^2)