### PROBLEM LINK:

**Author:** Kevin Atienza

**Testers:** Sergey Kulik and Vasya Antoniuk

**Translators:** Vasya Antoniuk (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)

**Editorialist:** Kevin Atienza

### DIFFICULTY:

Easy

### PREREQUISITES:

Greedy

### PROBLEM:

Find a matching between two arrays S and T that minimizes the number of pairs that are equal.

### QUICK EXPLANATION:

For a value v, let C_v be the number of times v appears in S and T combined.

Let v' be the value with the maximum such count, i.e. C_{v'} \ge C_v for all v. Then the minimum number of matches is \max(C_{v'} - N, 0).

To obtain such a matching, first match v' with itself \max(C_{v'} - N, 0) times. Then, select the value that appears the most number of times among all unmatched values, then match this with a different value in the other array. It is guaranteed that this value exists. So repeat this until all values are matched.

### EXPLANATION:

As shown in the sample input, there are some cases where we can’t avoid having equal pairs. One clear case is when *the most-occurring value appears more than N times.* In this case, we can show that this value will be paired with itself at least once: There are N pairs, each pair having two values, so by the pigeonhole principle, this value appears twice in some pair. In fact, we can extend this argument to show that if the most-occurring value v appears x > N times, then at least x - N pairs match v with itself. This gives us the lower bound of \max(x - N, 0) on the minimum number of equal pairs.

But is this bound tight? Well, let’s see. if x > N, then certainly this can be achieved. Suppose v appears s times in S and t times in T. Note that s + t = x. Then:

- Match v with itself x - N times. Thus, on S there remain s - (x - N) unmatched v s, and on T, there remain t - (x - N).
- Match the non-v values in T with occurrences of v in S. There are N - t such values. Note that N - t = s - (x - N), so there are enough v s in S to pair them.
- Match the non-v values in S with occurrences of v in T. There are N - s such values. Similarly, note that N - s = t - (x - N).

Thus, we have found a matching where there are exactly \max(x - N, 0) = x - N equal pairs. So this takes care of the case x > N. But what if x \le N? Can we achieve \max(x - N, 0) = 0 equal pairs?

One idea might be to match the most-occurring values first. However, depending on how you do it, it might not work. For example, consider the example S = [1, 1, 1, 2] and T = [3, 3, 3, 2]. 1 and 3 are the most-occurring, so we match them first. Unfortunately, if you do it this way, you’ll be left with just 2 s for the final pair, giving you an equal pair!

Fortunately, we can fix this algorithm. The mistake above is assuming that we can match all 1 s with 3 s. The problem with that is, after matching 1 and 3 twice, the most-occurring value becomes 2, so we need to switch to matching 2 instead. The adjusted algorithm becomes:

- Find the value v that occurs the most number of times in both arrays combined.
- Find an occurrence of it in either S or T.
- If you found an occurrence in S, find a value w \not= v in T, match it with v, and remove the occurrence of v from S and of w from T.
- If you found an occurrence in T, find a value w \not= v in S, match it with v, and remove the occurrence of v from T and of w from S.
- Repeat until S and T become empty.

Clearly, this produces a matching with no equal pairs, but this only works if in step 3 or 4, the value w exists. During the first pass, it’s clear that w exists, but in later passes this isn’t so clear. We can prove it by showing the following *invariant*:

**At any point during the algorithm, the most-occurring value in S and T combined appears at most |S| times.**

If this is true, then during step 3, T cannot consist purely of v s because that would mean v appears at least |T|+1 > |S| times. Thus, there must be some value w \not= v in T. A similar statement is true for step 4.

We can show it with induction and case-by-case analysis. For the base case, this is true by assumption. For the inductive case, assume it holds true at some point during the algorithm. We want to show that it still holds after performing either step 3 or step 4. Specifically, if the most-occurring value appears at most |S| times, then after doing step 3 or 4, the new most-occurring value appears at most |S| - 1 times.

- If the most-occurring value v appears less than |S| times, then after performing step 3 or 4, every value still appears at most |S| - 1 times, so the invariant holds.
- If the most-occurring value v appears exactly |S| times, and the second most-occurring value v' appears less than |S| times, then after performing step 3 or 4, v will have exactly |S| - 1 occurrence, while every other value appears at most |S| - 1 times (because the second most-occurring value appears at most |S|-1 times), so the invariant holds.
- Finally, suppose the most-occurring value v appears exactly |S| times, and the second most-occurring value v' also appears exactly |S| times. Notice there are exactly |S| + |T| = 2|S| positions in both arrays. Thus, all positions in S and T are already occupied by v and v', and S and T must consist entirely of v s and $v’$s! Therefore, the w that we select in step 3 or 4 can only be v', and after the step, they each will appear exactly |S|-1 times each.

This means that the algorithm above is correct!

### Time Complexity:

O(N^2)