FAIRPAIR - Editorial

PROBLEM LINK:

Contest
Practice

Author: Kevin Atienza
Testers: Sergey Kulik and Vasya Antoniuk
Translators: Vasya Antoniuk (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza

DIFFICULTY:

Easy

PREREQUISITES:

Greedy

PROBLEM:

Find a matching between two arrays S and T that minimizes the number of pairs that are equal.

QUICK EXPLANATION:

For a value v, let C_v be the number of times v appears in S and T combined.

Let v' be the value with the maximum such count, i.e. C_{v'} \ge C_v for all v. Then the minimum number of matches is \max(C_{v'} - N, 0).

To obtain such a matching, first match v' with itself \max(C_{v'} - N, 0) times. Then, select the value that appears the most number of times among all unmatched values, then match this with a different value in the other array. It is guaranteed that this value exists. So repeat this until all values are matched.

EXPLANATION:

As shown in the sample input, there are some cases where we can’t avoid having equal pairs. One clear case is when the most-occurring value appears more than N times. In this case, we can show that this value will be paired with itself at least once: There are N pairs, each pair having two values, so by the pigeonhole principle, this value appears twice in some pair. In fact, we can extend this argument to show that if the most-occurring value v appears x > N times, then at least x - N pairs match v with itself. This gives us the lower bound of \max(x - N, 0) on the minimum number of equal pairs.

But is this bound tight? Well, let’s see. if x > N, then certainly this can be achieved. Suppose v appears s times in S and t times in T. Note that s + t = x. Then:

  • Match v with itself x - N times. Thus, on S there remain s - (x - N) unmatched v s, and on T, there remain t - (x - N).
  • Match the non-v values in T with occurrences of v in S. There are N - t such values. Note that N - t = s - (x - N), so there are enough v s in S to pair them.
  • Match the non-v values in S with occurrences of v in T. There are N - s such values. Similarly, note that N - s = t - (x - N).

Thus, we have found a matching where there are exactly \max(x - N, 0) = x - N equal pairs. So this takes care of the case x > N. But what if x \le N? Can we achieve \max(x - N, 0) = 0 equal pairs?

One idea might be to match the most-occurring values first. However, depending on how you do it, it might not work. For example, consider the example S = [1, 1, 1, 2] and T = [3, 3, 3, 2]. 1 and 3 are the most-occurring, so we match them first. Unfortunately, if you do it this way, you’ll be left with just 2 s for the final pair, giving you an equal pair!

Fortunately, we can fix this algorithm. The mistake above is assuming that we can match all 1 s with 3 s. The problem with that is, after matching 1 and 3 twice, the most-occurring value becomes 2, so we need to switch to matching 2 instead. The adjusted algorithm becomes:

  1. Find the value v that occurs the most number of times in both arrays combined.
  2. Find an occurrence of it in either S or T.
  3. If you found an occurrence in S, find a value w \not= v in T, match it with v, and remove the occurrence of v from S and of w from T.
  4. If you found an occurrence in T, find a value w \not= v in S, match it with v, and remove the occurrence of v from T and of w from S.
  5. Repeat until S and T become empty.

Clearly, this produces a matching with no equal pairs, but this only works if in step 3 or 4, the value w exists. During the first pass, it’s clear that w exists, but in later passes this isn’t so clear. We can prove it by showing the following invariant:

At any point during the algorithm, the most-occurring value in S and T combined appears at most |S| times.

If this is true, then during step 3, T cannot consist purely of v s because that would mean v appears at least |T|+1 > |S| times. Thus, there must be some value w \not= v in T. A similar statement is true for step 4.

We can show it with induction and case-by-case analysis. For the base case, this is true by assumption. For the inductive case, assume it holds true at some point during the algorithm. We want to show that it still holds after performing either step 3 or step 4. Specifically, if the most-occurring value appears at most |S| times, then after doing step 3 or 4, the new most-occurring value appears at most |S| - 1 times.

  • If the most-occurring value v appears less than |S| times, then after performing step 3 or 4, every value still appears at most |S| - 1 times, so the invariant holds.
  • If the most-occurring value v appears exactly |S| times, and the second most-occurring value v' appears less than |S| times, then after performing step 3 or 4, v will have exactly |S| - 1 occurrence, while every other value appears at most |S| - 1 times (because the second most-occurring value appears at most |S|-1 times), so the invariant holds.
  • Finally, suppose the most-occurring value v appears exactly |S| times, and the second most-occurring value v' also appears exactly |S| times. Notice there are exactly |S| + |T| = 2|S| positions in both arrays. Thus, all positions in S and T are already occupied by v and v', and S and T must consist entirely of v s and $v’$s! Therefore, the w that we select in step 3 or 4 can only be v', and after the step, they each will appear exactly |S|-1 times each.

This means that the algorithm above is correct!

Time Complexity:

O(N^2)

AUTHOR’S AND TESTER’S SOLUTIONS:

setter
tester

1 Like

I was getting wrong answer,although i believe the solution is correct(i have put an assertion for the same in my code too)…Can anybody get a case where it isn’t working ?

Solution Id:
https://www.codechef.com/viewsolution/10486331

Thanks in Advance
And one more thing,if the complexity of the solutoin provided in this editorial is O(N^2) then how is it supposed to work ?
This is frustrating!!!

O(N^2) is supposed to work, Check the constraints again, “The sum of the Ns in a single test file is ≤ 3×105”;

I have same problem as everyone asked. How does O(N^2) solution works. Yes I have noted that sum of N is at max 3 x 10^5 . And N at max is 1000. So let us suppose there are 300 test case each with n 1000. Sum is still 3 x 10^5. So it is valid according to constraint. Now As it is O(T*(N^2)) where T is number of test cases. So in our case it will be O(300 x 10^6) that is O(3 x 10^8). How can we expect this to run in just 2 seconds? The reference for ACM-ICPC Competitive Programming Book also says in 3 seconds around O(100M) solution works. The above solution works only if CPU is faster. But we can’t assume that. Am I wrong somewhere ?

I found an algorithm that’s way too simple to implement as well as understand the logic. See my code: https://www.codechef.com/viewsolution/10533045 Its simply scan the loop if student[i] matches with task[i] then first try to find suitable value on the left with which you can swap. Otherwise take the first different value on right side in task array and swap with it. If then also not found then keep it as it is and increment count of matches. All cases will be handled automatically. No edge case remains. Solution Accepted. Complexity O(N^2)

How can O(N^2) get accepted ?? Say we have T = 1, and N = (3*(10^5)),no of computations will be roughly (9*(10^10)) which clearly can’t get AC in 1s time limit…I guess there might be some problem with the test cases :(…Anyways did anyone found anysort of bug in my code ?

Link of the code: https://www.codechef.com/viewsolution/10486331