PROBLEM LINK:
Author: Kevin Atienza
Testers: Sergey Kulik and Vasya Antoniuk
Translators: Sergey Kulik (Russian), Team VNOI (Vietnamese) and Hu Zecong (Mandarin)
Editorialist: Kevin Atienza
DIFFICULTY:
Easy
PREREQUISITES:
Strings, Greedy algorithms
PROBLEM:
A string is K-good if for every two letters in the string, if the first appears x times and the second y times, then |x - y| \le K. Given K and a string w, what is the minimum number of letters to remove to make w K-good?
QUICK EXPLANATION:
For every m, define F[m] as follows:
- Remove every letter that appears < m times.
- For every letter that appears > m+K times, remove just enough letters so that it appears exactly m+K times.
- Let the number of removed letters be F[m].
The answer is then the smallest among \{F[1], F[2], \ldots, F[N]\}. To compute this quickly, simply precompute the letter frequencies in w beforehand, and operate on that array when computing F[m].
EXPLANATION:
Let's use the notation w_{(c)} to denote the number of appearances of the letter c in the string w.
K-goodness
By definition, a string is K-good if for every two characters c and d, \left|w_{(c)} - w_{(d)}\right| \le K. But for a given string w, the value of \left|w_{(c)} - w_{(d)}\right| is maximized if c and d are the most and least occurring characters in w (or vice versa). Thus, we have a potentially simpler criterion for telling whether a string is K-good:
If c and d are the most- and least-occurring characters in w, then w is K-good if and only if w_{(c)} - w_{(d)} \le K.
Notice that we have removed the absolute value because w_{(c)} \ge w_{(d)}.
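This criterion is easy to sketch in code. Here's an illustrative Python check (the name is_k_good is ours, and we assume w is non-empty):

```python
from collections import Counter

def is_k_good(w, k):
    # By the criterion above, it suffices to compare the frequency
    # counts of the most- and least-occurring characters in w.
    counts = Counter(w)
    return max(counts.values()) - min(counts.values()) <= k
```

For example, is_k_good("bo", 0) is true, while is_k_good("boooooooooo", 0) is not.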
Removing letters
This immediately gives us a greedy algorithm for turning w into a K-good string in a few letter removals:
- Compute the frequency counts for each of the 26 letters.
- Find the least-occurring character d. (Note that we only consider characters that actually appear in w, that is, whose frequency counts are positive.)
- For every character c appearing more than w_{(d)} + K times, remove w_{(c)} - (w_{(d)} + K) instances of c.
Clearly this algorithm yields a K-good string, by our criterion above. Unfortunately, it doesn't yield the minimum number of removals! For example, consider the input boooooooooo 0. The algorithm above will have us remove 9 instances of the letter 'o', yielding the string "bo", which is 0-good. But it's possible to do it with just 1 removal: just remove the 'b'!
So how do we fix this? Clearly, we can't assume that the least-occurring character in w will appear in the final string. In fact, it isn't obvious what the minimum frequency count must be in the final string. So what do we do? We just try them all!
Our adjusted algorithm would be:
- Decide on the minimum frequency count in the final string, and call it m.
- Remove every letter that appears less than m times.
- For every letter c that appears more than m + K times, remove w_{(c)} - (m + K) instances of c.
The answer is then the minimum number of removals over all values m in the range m \in [1,N]. Now, clearly this is correct, but is it fast enough? Well, it depends on your implementation. Obviously, if you process the string for every m, you'll get an O(N^2) running time, which is pretty bad. But thankfully, we can reduce it by noticing that every iteration of the above procedure only requires the frequency counts of the letters! By computing the frequency counts just once and then using that array instead, we get a running time of O(N)!
Here's the pseudocode:
// compute counts
counts['a'..'z'] // initialized to zero
for i=0..|w|-1:
    counts[w[i]]++

// compute answer
answer = N
for m=1..N:
    current = 0
    for c='a'..'z':
        if counts[c] < m:
            current += counts[c]
        else if counts[c] > m + K:
            current += counts[c] - (m + K)
    if answer > current:
        answer = current
print answer
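The pseudocode above translates directly into a short Python program (a sketch; the name min_removals is ours, and we use a dictionary of counts rather than a fixed 26-entry array):

```python
from collections import Counter

def min_removals(w, k):
    # For each candidate minimum frequency m: remove letters appearing
    # < m times entirely, and trim letters appearing > m + k times
    # down to exactly m + k. Keep the cheapest result.
    counts = Counter(w).values()
    n = len(w)
    answer = n
    for m in range(1, n + 1):
        current = 0
        for c in counts:
            if c < m:
                current += c              # dropped entirely
            elif c > m + k:
                current += c - (m + k)    # trimmed down to m + k
        answer = min(answer, current)
    return answer
```

On the earlier example, min_removals("boooooooooo", 0) returns 1: keep all ten 'o's and drop the 'b'.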
Optimization
We can actually still optimize the algorithm above, by noticing that we don't have to try all values m in the range [1,N]. Instead, we only try the m values that are equal to some w_{(c)}!
Why is that? Suppose we chose a value m that isn't equal to any w_{(c)}. Then no letter appears exactly m times, so if we instead choose m+1, there aren't any additional letters whose frequency counts are < m+1, and we don't have to remove any additional letters. However, we potentially save a few removals among letters whose frequency counts are > m+K, because the cap rises to m+1+K. Thus, the result of the procedure for m cannot be strictly less than its result for m+1, so we can keep increasing m until it equals some w_{(c)} (or exceeds every count) without ever making the answer worse. It follows that we don't have to check any m that isn't equal to some w_{(c)}!
With this optimization, the number of times we perform the procedure drops from N to at most 26. To formalize the optimization, notice that our original "O(N)" algorithm actually runs in O(N\alpha) time, where \alpha is the size of the alphabet. This optimization reduces that to O(N + \alpha^2), which is usually better since typically N \gg \alpha.
Here's the pseudocode:
// compute counts
counts['a'..'z'] // initialized to zero
for i=0..|w|-1:
    counts[w[i]]++

// compute answer
answer = N
for d='a'..'z':
    m = counts[d] // consider only the relevant m's
    current = 0
    for c='a'..'z':
        if counts[c] < m:
            current += counts[c]
        else if counts[c] > m + K:
            current += counts[c] - (m + K)
    if answer > current:
        answer = current
print answer
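The optimized version differs only in which values of m it tries; a Python sketch (again, min_removals_fast is our name):

```python
from collections import Counter

def min_removals_fast(w, k):
    # Same procedure as before, but only candidate minimums m that are
    # actual frequency counts of some letter need to be tried.
    counts = list(Counter(w).values())
    answer = len(w)
    for m in counts:
        current = 0
        for c in counts:
            if c < m:
                current += c
            elif c > m + k:
                current += c - (m + k)
        answer = min(answer, current)
    return answer
```

It returns the same answers as the straightforward version, e.g. min_removals_fast("boooooooooo", 0) is still 1.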
Time Complexity:
O(N), treating the alphabet size \alpha = 26 as a constant.