### PROBLEM LINK:

**Author:** Dmytro Berezin

**Tester:** Mugurel Ionut Andreica

**Editorialist:** Praveen Dhinwa

### DIFFICULTY:

Medium

### PREREQUISITES:

dp, disjoint set union

### PROBLEM:

You are given an array a of size n. You have to find the number of pairs of non-intersecting subarrays of this array. Two subarrays [i, j] and [k, l] (where i \leq j < k \leq l) are said to be non-intersecting if there is no value x that occurs in both subarrays.

### EXPLANATION:

The simplest solution is to iterate over the boundaries of both subarrays, i.e. over i, j, k, l, and check whether the subarrays [a_i, a_j] and [a_k, a_l] intersect. To check whether two subarrays intersect, you can iterate over each element of one subarray and test whether it is present in the other. Each such test takes \mathcal{O}(\log n) time if you maintain a set (balanced binary search tree) of the elements of the second subarray, or \mathcal{O}(1) time if you maintain a hash set of them. One useful observation is that the order of the numbers does not matter in this problem, so you can remap them to smaller numbers starting from 1. For example, if a = [5, 4, 10^9, 10^8], you can remap this array to a = [2, 1, 4, 3]. If you extend the first subarray one element at a time, each extension requires only a single membership query, so the overall complexity of this solution is \mathcal{O}(n^4), which is too slow.
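The remapping step is standard coordinate compression; a minimal sketch (the helper name `compress` is ours, not from the problem):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Coordinate compression: remap values to 1..m, preserving relative order.
vector<int> compress(const vector<long long>& a) {
    // Sorted list of the distinct values.
    vector<long long> sorted_vals(a);
    sort(sorted_vals.begin(), sorted_vals.end());
    sorted_vals.erase(unique(sorted_vals.begin(), sorted_vals.end()), sorted_vals.end());
    // Each value maps to 1 + its rank among the distinct values.
    vector<int> res(a.size());
    for (size_t i = 0; i < a.size(); ++i)
        res[i] = lower_bound(sorted_vals.begin(), sorted_vals.end(), a[i])
                 - sorted_vals.begin() + 1;
    return res;
}
```

On the example from the text, `compress({5, 4, 1000000000, 100000000})` yields `{2, 1, 4, 3}`.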

Let us iterate over the second subarray; instead of iterating over the entire first subarray as well, can we somehow count the valid first subarrays faster? Note that there is nothing special about the second subarray: you could just as well iterate over the first subarray and count the second subarrays faster. We iterate over the second subarray because it lies entirely to the right of the first one: if the second subarray is a_k, \dots, a_l, then the first subarray must lie within positions 1 \dots k - 1.

Again, let us store a hash set of all the elements of the second subarray. Now, for each i from 1 to k-1, we check whether a_i is present in the second subarray. We create a boolean array b of size k-1, where b_i denotes whether a_i is present in the second subarray. The number of valid first subarrays in the range 1 to k-1 then equals the number of all-zero subarrays of b, which can be found easily by breaking b into maximal runs of equal elements. For example, if b = [1, 0, 0, 0, 1, 0, 0], we split it into [1], [0, 0, 0], [1], [0, 0]. The number of subarrays of [0, 0, 0] is \frac{3 \cdot (3 + 1)}{2} = 6, and that of [0, 0] is \frac{2 \cdot (2 + 1)}{2} = 3, for 9 all-zero subarrays in total. This algorithm is faster than the previous one and has a time complexity of \mathcal{O}(n^3). Can we do better?
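The run-counting step above can be sketched as follows (the helper name `countZeroSubarrays` is ours):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Count the subarrays of b consisting entirely of zeros: each maximal run
// of `len` zeros contributes len * (len + 1) / 2 subarrays.
long long countZeroSubarrays(const vector<int>& b) {
    long long total = 0, run = 0;
    for (int bit : b) {
        if (bit == 0) {
            ++run;                          // extend the current zero run
        } else {
            total += run * (run + 1) / 2;   // close the run
            run = 0;
        }
    }
    total += run * (run + 1) / 2;           // last run, if b ends in zeros
    return total;
}
```

For the example above, `countZeroSubarrays({1, 0, 0, 0, 1, 0, 0})` returns 9.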

Notice that for a fixed left end point k of the second subarray, we try every right end point l \geq k, and for each of them we iterate over the initial part of the array again to count the first subarrays. Instead of re-scanning the entire prefix every time we add an element to the right of the second subarray, can we update the count faster?

So, we iterate over each left end point k of the second subarray and add elements to its right one by one, maintaining the array b over the first k-1 positions and the number of all-zero subarrays of b. As discussed above, think of b as a sequence of maximal runs of zeros and ones. When we extend the subarray from [k, l-1] to [k, l], some positions among the first k-1 may hold values equal to a_l; for each such position we must set the corresponding b_i to one. For example, let b = [0, 0, 0, 0, 1, 0, 0] before adding a_l, which we can write as [4, 1, 2], the lengths of the consecutive runs of equal elements in order. If a_l = a_3, the updated b is [0, 0, 1, 0, 1, 0, 0], i.e. [2, 1, 1, 1, 2]. Note that the only run that changed is the one containing position 3: it split into three smaller runs. If instead a_l = a_5, nothing would change at all, since b_5 was already one. We can maintain these runs of b as a set of segments, each segment storing the range of the corresponding run and whether it consists of zeros or ones. For a fixed k, each of the k-1 positions can be split at most once in the worst case, and if you store the segments in a C++ std::set (a red-black tree), each deletion and insertion of a segment takes \mathcal{O}(\log n). Hence the overall time complexity is \mathcal{O}(n^2 \log n). Can we get something faster than this?
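A minimal sketch of this set-of-segments bookkeeping, under the b-array convention above (zero = position still usable for the first subarray). The `Segments` structure and its member names are ours; for simplicity, adjacent same-bit segments are left unmerged, which does not affect the zero-run count since blocking never creates new zeros:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Maximal runs of equal bits of b, stored by left endpoint.
// Setting a position to 1 splits the zero run containing it.
struct Segments {
    map<int, pair<int, int>> seg;   // start -> (end, bit), inclusive ranges
    long long zeroSubarrays = 0;    // sum of len*(len+1)/2 over zero runs

    static long long f(long long len) { return len * (len + 1) / 2; }

    void insertSeg(int s, int e, int bit) {
        seg[s] = {e, bit};
        if (bit == 0) zeroSubarrays += f(e - s + 1);
    }

    void block(int p) {                        // set b[p] = 1
        auto it = prev(seg.upper_bound(p));    // segment containing p
        int s = it->first, e = it->second.first, bit = it->second.second;
        if (bit == 1) return;                  // already a one: runs unchanged
        zeroSubarrays -= f(e - s + 1);         // the old run disappears
        seg.erase(it);
        if (s < p) insertSeg(s, p - 1, 0);     // left piece stays zero
        insertSeg(p, p, 1);                    // the newly blocked position
        if (p < e) insertSeg(p + 1, e, 0);     // right piece stays zero
    }
};
```

On the example b = [0, 0, 0, 0, 1, 0, 0] (0-indexed), blocking position 2 takes the zero count from 13 down to 7, and blocking position 4 (already a one) changes nothing.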

Let us view the same solution in a slightly different way. For a fixed k (the left end point of the second subarray), we enumerate l from n down to k (now in reverse order). This time, when we shrink the subarray from [k, l] to [k, l-1], positions holding the value a_l may become free again (when a_l no longer occurs in [k, l-1]), and we have to merge the adjacent zero runs around them. This merging step can be implemented efficiently with a disjoint set union (DSU) data structure. In a DSU, we keep track of the set each element belongs to by storing a parent pointer for each set; in our case, a set additionally stores its length and whether it consists of zeros or ones. In the merge step, we simply update the parents accordingly. The overall time complexity of this algorithm is \mathcal{O}(n^2 \, \alpha(n)), where \alpha(n) is the inverse Ackermann function, an extremely slowly growing function that stays below 5 for all practical values of n (as A(4, 4) is of the order of {2^{2^{2^{2^{16}}}}}). So it can be treated as a constant, and you can say the complexity is \mathcal{O}(n^2).
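Putting the pieces together, a sketch of the full \mathcal{O}(n^2 \, \alpha(n)) solution might look like the following. This is one possible implementation of the approach described above, not the author's reference solution; names such as `countPairs` are ours (0-indexed positions):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Disjoint set union with sizes, used to maintain maximal zero runs.
struct DSU {
    vector<int> parent, sz;
    DSU(int n) : parent(n), sz(n, 1) { iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int x, int y) {
        x = find(x), y = find(y);
        if (x == y) return;
        if (sz[x] < sz[y]) swap(x, y);
        parent[y] = x;
        sz[x] += sz[y];
    }
};

// Count pairs of non-intersecting subarrays ([i,j], [k,l]) with j < k.
long long countPairs(vector<int> a) {
    int n = a.size();
    // Remap values to 0..m-1 (their order is irrelevant).
    vector<int> vals(a);
    sort(vals.begin(), vals.end());
    vals.erase(unique(vals.begin(), vals.end()), vals.end());
    for (int& x : a) x = lower_bound(vals.begin(), vals.end(), x) - vals.begin();
    int m = vals.size();
    vector<vector<int>> pos(m);                 // positions of each value
    for (int i = 0; i < n; ++i) pos[a[i]].push_back(i);

    auto runSubarrays = [](long long len) { return len * (len + 1) / 2; };
    long long answer = 0;

    for (int k = 1; k < n; ++k) {               // left end of the second subarray
        vector<int> cnt(m, 0);                  // occurrences of each value in a[k..l]
        for (int i = k; i < n; ++i) cnt[a[i]]++;
        vector<char> blocked(k);                // b_i: does a_i occur in a[k..l]?
        for (int i = 0; i < k; ++i) blocked[i] = cnt[a[i]] > 0;

        DSU dsu(k);
        long long total = 0;                    // all-zero subarrays of b
        for (int i = 0; i < k; ) {              // build the initial runs for l = n-1
            if (blocked[i]) { ++i; continue; }
            int j = i;
            while (j + 1 < k && !blocked[j + 1]) { dsu.unite(j, j + 1); ++j; }
            total += runSubarrays(j - i + 1);
            i = j + 1;
        }

        // Merge zero runs around p when its value leaves the second subarray.
        auto unblock = [&](int p) {
            long long left  = (p > 0 && !blocked[p - 1]) ? dsu.sz[dsu.find(p - 1)] : 0;
            long long right = (p + 1 < k && !blocked[p + 1]) ? dsu.sz[dsu.find(p + 1)] : 0;
            total += runSubarrays(left + 1 + right) - runSubarrays(left) - runSubarrays(right);
            blocked[p] = 0;
            if (left)  dsu.unite(p, p - 1);
            if (right) dsu.unite(p, p + 1);
        };

        for (int l = n - 1; l >= k; --l) {      // shrink the second subarray
            answer += total;                    // valid first subarrays for [k, l]
            if (--cnt[a[l]] == 0)               // a[l] left the second subarray
                for (int p : pos[a[l]])
                    if (p < k) unblock(p);
        }
    }
    return answer;
}
```

Since every prefix position is unblocked at most once per value of k, the inner work per k is \mathcal{O}(n \, \alpha(n)), matching the stated bound.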

The big takeaway from this problem: start with the simplest approach and gradually refine it to achieve a better complexity.