Gmedian - editorial

taran_1407 · November 12, 2018, 4:14pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2

Setter: Teja Vardhan Reddy
Tester: Zhong Ziqian
Editorialist: Taranpreet Singh

DIFFICULTY:

Easy-Medium

PREREQUISITES:

Combinatorics and Factorials. Modular Arithmetic.

PROBLEM:

Given an array A of size N, Count the number of ways we can select a subset from A such that the median of selected subset is also present as an element in the subset.

SUPER QUICK EXPLANATION

Every subset of odd size has its median present in the subset, so, we can directly add 2^{N-1} to answer.
For even size subset, The subset is good, if and only if middle two elements are equal.
We can fix the left middle element, and for every possible even size of the subset, say 2*X, try to

EXPLANATION

This problem is probably the best example of how we can use combinatorics to optimize our solution from O(N^3) to O(N) (except sorting).

First of all, let us formulate the definition of median.

If the size of the selected subset is odd, the median is just the middle element of subset after sorting. Since the middle element is present in the subset, all subsets of odd size are valid. It can be easily proven that there are 2^{N-1} such subsets. So, we can directly count these and move toward even size subsets.

If the size of the selected subset is even, the median is defined as the average of two middlemost elements after sorting. Now, say we have two middle elements x and y, with condition x \leq y. Let z = (x+y)/2 be the median of sequence. If we write y = x+d, d \geq 0, we can see, z = x+d/2 and also, z = y-d/2. This way, we can see, that the median of a sequence can never be smaller than x and greater than y. So, For z to be present in subset, we need either z = x or z = y. But, this would imply d = 0, Hence leading to the conclusion that

For an even size subset to be valid, the two middlemost elements should be equal. This forms the crux of our solution, and now, we need to count the number of even sized subsets with equal middle elements.

After all this, there are a number of approaches to solving this problem, all of which required us to sort the array A first.

Let us consider a O(N^3) solution first.

Iterate over every pair of equal elements (i, j) such that A_i = A_j and iterate over the size 2*X of subset from X = 1 to N. The number of ways to make the subset of size X with two fixed middle elements is just the product of the number of ways we can select X-1 elements from [1, i-1] and X-1 elements from [j+1, N].

This solution requires to iterate over every pair (i,j) which takes O(N^2) time and O(N) time per pair, leading to Overall time complexity O(N^3) which shall pass only the first subtask.

For a faster implementation, We will see two implementations, one from setter, and another interesting solution which we can further optimize to O(N) except for sorting.

Setter’s Implementation

First of all, see what we were doing in the previous solution.

We were fixing two equal elements and tried to count the number of ways we can make subsets of all sizes. Now, We shall fix only the Left Middle Element (Or Right one, whichever implementation you prefer).

Suppose we fixed the ith element as the left middle element. Now, We will iterate over all sizes 2*X and try to include X-1 elements from [1, i-1] and X elements from [i+1, N]. We need the right middle element to be same as the left middle element. So, When choosing X elements from [i+1, N], we need at least one occurrence of A[i]. This is same as subtracting all the ways to select X elements in the range [i+1, N] which do not have A[i] at all. Suppose Number of occurrence of A[i] in range [i+1, N] is f, then we can count the number of ways to select X elements from range [i+1, N] such that it contains at least one occurrence of A[i] as

T = Number of ways to select X elements out of N-i elements less Number of ways to select X elements out of N-i-f elements.

We can select X-1 elements from [1, i-1] in suppose U ways. Then, Number of ways we can have good subsets with A[i] as the left middle element is U*T. Summation of this product for all sizes for all elements gives us the number of good subsets of even size. We can add to it, the number of good odd sized subsets and print the answer.

Alternative Implementation.

For this solution, We will not count good subsets, but subtract bad subsets from total subsets. The total number of subsets is 2^N out of which one is empty. Empty subset doesn’t contain its median, hence a bad subset. So left with 2^N-1 subsets.

In this solution, we will count the number of subsets which have the different value of middle elements. For this solution, we iterate over all distinct elements and try to the number of ways to select subsets of all even sizes.

In this solution too, we fix the left middle element and try to count the number of bad subsets.

Suppose we have cnt elements smaller than A[i] and There are total f occurrences of A[i] and we have to make a subset of size 2*X, Then, we need to select X elements from cnt+f such that it contains atleast 1 occurrence of A[i] and selecting X elements from Elements strictly greater than A[i], THere are N-cnt-f such elements. The required number of ways is the product of both. We can find the summation of this product over all even sizes of subsets for all distinct values.

This solution also has time complexity O(N^2), thus fits the time limit easily.

Hint to O(N) Solution. Note: This optimization is not exactly necessary.

We rely on the same idea as the alternative implementation. We still iterate over all distinct values, but now, we will find a closed form to count bad subsets of all sizes. But the thing is, we try to find a closed form for all sizes.

We see, the summation is like ^nC_x*^mC_x for all sizes x. Can you find the closed form? Seems like many users actually did, and submitted a Linear time solution during the contest itself.

For those who did not reduce, this is left as an exercise for them.

Looking for something helpful? See Vandermonde’s identity here.

Computing Number of ways to select X elements out of N elements

The thing is, we need to calculate N select X efficiently. We know from the definition of Combinations that we can represent them in form of factorials. We can precompute Factorials and their inverses for the division. Then, ^NC_R = N!*inv(R!)*inv((N-R)!).

For details, refer this excellent post here.

Time Complexity

Time complexity is O(N^2) per test case for the intended solution.

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter’s solution
Tester’s solution
Editorialist’s solution

Feel free to Share your approach, If it differs. Suggestions are always welcomed.

eugalt · November 15, 2018, 4:34am

My python solution.

eugalt · November 17, 2018, 12:33pm

What’s the point of the constraint A_i \leq 2N?

islingr · November 18, 2018, 9:04pm

Here’s my O(n) time complexity solution.

Let \epsilon = \{a | a \in A \} (i.e., the set of distinct elements of A), let k = |\epsilon| and n_i refer to the number of occurrences of \epsilon_i in A, assume that \epsilon is sorted.
Obviously \sum_{i=1}^{k} n_i = n.

Also, we use the notation \binom{n}{r} to denote the number of ways of selecting r elements out of n objects.

\binom{n}{r} = \binom{n}{n - r} = \begin{cases} 1 & \text{if } n = r = 0 \\ \frac{n!}{r!(n-r)!} & \text{if } r \in \mathbb{N}_0 \wedge r \leq n\\ 0 & \text{in any other case} \end{cases}

Now, clearly a subsequence B \subseteq A is good iff |B| is odd or the middle two elements of B are equal.

We could add 2^{n-1} to the answer because that’s the number of subsequences of odd length, however this would cause us a lot of problems later on.

Now, for each \epsilon_i, we count all the subsequences such that \epsilon_i is the median of those subsequences. Clearly summing this up over all \epsilon_i's gives us this answer.

Now comes the matter of actually counting this.

We do this in the following manner:

First take one \epsilon_i, now, the only way \epsilon_i can be the median is if the array is of odd length and the number of elements preceding and succeeding it are equal. Thus, there are

\binom{n_i}{1}\sum_{p - q = 0} \binom{\sum_{j=1}^{i-1} n_{j} }{p} \binom{\sum_{j=i + 1}^{k} n_{j} }{q}

of these sequences.
2. Next choose two \epsilon_i. \epsilon_i can be the median if the number of elements preceding and succeeding it are equal or if the number of elements preceding and succeeding it differ by 1. Thus, there are

\binom{n_i}{2} \sum_{-1 \le r \le 1} \sum_{p - q = r} \binom{\sum_{j=1}^{i-1} n_{j} }{p} \binom{\sum_{j=i + 1}^{k} n_{j} }{q}

and so on…

In general, if we choose m \epsilon_i's, we have

\binom{n_i}{m}\sum_{r = 1 - m}^{m - 1} \sum_{p - q = r} \binom{\sum_{j=1}^{i-1} n_{j} }{p} \binom{\sum_{j=i + 1}^{k} n_{j} }{q}

that is the number of elements preceding and succeeding \epsilon_i differ by atmost m - 1.

Thus, our final answer is

\sum_{i=1}^{k}\sum_{m=1}^{n_i}\binom{n_i}{m}\sum_{r = 1 - m}^{m - 1} \sum_{p - q = r} \binom{\sum_{j=1}^{i-1} n_{j} }{p} \binom{\sum_{j=i + 1}^{k} n_{j} }{q}.

Now, you maybe wondering in how in the world I could claim this solution to have O(n) time complexity with 6 of those summations, not mentioning the time required to compute a binomial coefficient.

Vandermonde’s Identity to the rescue! (I can not stress enough as to how useful this identity has been in a lot of questions for me)
Here’s the identity:

\sum_{x+y=k} \binom{m}{x}\binom{n}{y} = \binom{m+n}{k}.

It might not be directly obvious how to use this identity in our question, given that we we have p - q = r and not p + q = r, however this is rather easy to work around using the identity \binom{n}{r} = \binom{n}{n-r}. Thus, we have

\begin{aligned} \sum_{p - q = r} \binom{\sum_{j=1}^{i-1} n_{j} }{p} \binom{\sum_{j=i + 1}^{k} n_{j} }{q} &=\sum_{p +(\sum_{j=i + 1}^{k} n_{j} - q) = \sum_{j=i + 1}^{k} n_{j} + r} \binom{\sum_{j=1}^{i-1} n_{j} }{p} \binom{\sum_{j=i + 1}^{k} n_{j} }{\sum_{j=i + 1}^{k} n_{j}-q}\\ &= \binom{\sum_{j=1 \wedge j \neq i}^{k} n_{j}}{\sum_{j=i + 1}^{k} n_{j}+r} = \binom{\sum_{j=1 \wedge j \neq i}^{k} n_{j}}{\sum_{j=1}^{i-1} n_{j}-r}\\ &= \binom{n-n_i}{p_i-r} \end{aligned}

It looks so much nicer already! Here p_i = \sum_{j=1}^{i-1} n_i, clearly p_{i+1} = p_i + n_i.

Here’s our answer now,

\sum_{i=1}^{k}\sum_{m=1}^{n_i}\binom{n_i}{m}\sum_{r = 1 - m}^{m - 1} \binom{n-n_i}{p_i-r}.

We are pretty much done here, it does not look like an O(n) time algorithm but it indeed is, using a few recurrence relations. If we define

t_{m, i} := \sum_{r = 1 - m}^{m - 1} \binom{n-n_i}{p_i-r}

then we have the recurrence, t_{m + 1, i} = t_{m, i} + \binom{n-n_i}{p_i-m} + \binom{n-n_i}{p_i+m} and t_{1, i} = \binom{n-n_i}{p_i}.

Our final answer becomes

\sum_{i=1}^{k}\sum_{m=1}^{n_i}\binom{n_i}{m}t_{m,i}.

The time complexity of this is O(\sum_{i=1}^{k} n_i T) = O(nT) where T is the time complexity for calculating \binom{n}{r}. We can easily have T = O(1) by simple computing all n! and (n!)^{-1} \mod 10^9 + 7 beforehand since n \leq 1000.

Also, I calculated the modular inverse by using this (the last method).

Here’s my code.

Remarks: This post is pretty long already
Here’s some remarks which don’t really add on much to the solution, read them if you feel like it.

In my implementation, rather than actually precomputing all values of n! and (n!)^{-1} beforehand I computed them till the maximum value they were required to whenever the factorial function was called and a static variable to keep track of the largest factorial computed up until to that point, same goes for modular inverse.

Also, in my original code, the one I submitted during the competition I did not precompute factorials though I did use caching for modular inverses. The solution was still well within the time limit.
Here’s that code.

For the set \epsilon, I used a map structure and incremented map[\epsilon_i] by 1 each time it encountered \epsilon_i in the input. You can instead also use a bookeeping array of length 2n + 1 and increment that instead whenever you encounter \epsilon_i instead, just because A_i \leq 2n.

Also, wow this post was pretty unecessarily long I could have made it shorter, but oh well. ¯_(ツ)_/¯

islingr · November 18, 2018, 9:06pm

Umm, why is the \LaTeX rendering so weird, it looks fine in the preview.

vijju123 · November 18, 2018, 9:09pm

Let me fix that.

islingr · November 18, 2018, 9:11pm

It works now, I edited it.
For some reason using square brackets instead of doubble dollar signs cause problem on codechef.

vijju123 · November 18, 2018, 9:13pm

Oh no…I overwrote those changes because I didnt saw this message. Any chance you can do it again please?

islingr · November 18, 2018, 9:15pm

Well it renders fine now, so I don’t think that’s necessary. Thanks!

vijju123 · November 18, 2018, 9:17pm

Cool!

You’re welcome

utkarsh911 · November 18, 2018, 10:40pm

This solution gives WA for one of the test cases :https://www.codechef.com/viewsolution/21514246
Strangely, removing the header bits/stdc++.h and putting stdio.h in place of it gives Correct Answer for that test case itself : https://www.codechef.com/viewsolution/21514268
Can anyone provide a test case where the first solution fails?
I don’t know what’s wrong and I don’t know which header file to use in future

vjvjain0 · February 2, 2019, 7:54pm

I took help from this series for calculating n \choose x*m \choose x for 0<=x<=min(n,m)

https://oeis.org/A046899
My code