### PROBLEM LINK:

**Author:** Sunny Aggarwal

**Tester:** Vasya Antoniuk

**Editorialist:** Pushkar Mishra

### DIFFICULTY:

Easy

### PREREQUISITES:

Greedy

### PROBLEM:

Given is an array A of N integers. We want to destroy the array by performing two types of operations. The first operation lets us pick two numbers x and y in the array such that x \neq y and discard both of them. The second operation lets us pick a single number in the array and discard it. Thus, the former operation reduces the array size by 2 and the latter by 1. What is the minimum number of steps in which we can reduce the array to size 0?

### EXPLANATION:

What is the simplest algorithm that we can think of? We can simply consider the adjacent elements in the array. If they are unequal, we discard both of them. If they are equal, we discard the first one and repeat the process until we exhaust the array. We have some observations to make here:

- When would we be the luckiest? If every pair of elements we picked turned out to be unequal, since that would let us discard two elements in a single move.
- When would we be the unluckiest? Let us assume element x appears the highest number of times. What if we were left with only copies of x in the end? We would have to remove them one by one.

We need to design an algorithm which reduces the chance of being unlucky and increases the chance of being lucky. If both these can be accomplished, we would have a strong algorithm which might be optimal. Our first aim is to try to pick up pairs of unequal elements. To do this, let us group equal elements together in the form (x_i, f_i), where x_i is the number and f_i its frequency (the number of times it appears in array A). Let us say there are k distinct elements in the array. Thus,

\sum_{i = 1}^{k} f_i = N

Once we have the k pairs of distinct elements with their frequencies, i.e., (x_i, f_i), we can proceed with thinking of a way to choose pairs of elements to discard. The simplest way is to choose any two x_i and x_j which have non-zero frequency and reduce their corresponding f_i and f_j by 1. This goes on till only one x_l is left with non-zero frequency f_l. Now comes the second observation. It is best for us to minimize f_l so as to avoid being unlucky. How can we do this?

Let us rely on our intuition to come up with an algorithm. At each step, we will choose two numbers x_p, x_q to discard such that x_p \neq x_q and f_p and f_q are the highest frequencies amongst the elements left with non-zero frequency. This is the optimal approach. It is greedy. But what is the guarantee that the algorithm is correct? Let's try to prove it using an exchange argument. This is a standard approach for proving the optimality of greedy algorithms. You can read more about greedy algorithms and their proofs here.

Let's say that at a particular step in the algorithm, we choose some x_u and x_v such that f_u and f_v are not the two highest frequencies amongst the elements with non-zero frequencies. As per the exchange argument, we have to show that choosing x_p and x_q instead would yield a solution that is at least as good.

What advantage do we get if we choose (x_p, x_q) instead of (x_u, x_v)? By reducing f_p and f_q by 1 each, we have possibly reduced by 1 the maximum value of f_l that can be left over in the end (since f_p and f_q are the highest frequencies). If we choose x_u and x_v instead, this possibility is gone. Therefore, at each step, choosing two numbers such that their frequencies are the highest two amongst those with non-zero frequency is always at least as good. In other words, we can exchange the move (x_u, x_v) for the move (x_p, x_q) without making the solution worse. This proves optimality.

How do we implement this? To get the highest two frequencies at each step, we can use a max priority queue. Such a data structure is built into most popular programming languages. One can also implement it in the form of a max-heap. Please see the editorialist's solution for implementation details.

### COMPLEXITY:

\mathcal{O}(N\log N) per test case.