PROBLEM LINK:
Setter- Praveen Dhinwa
Tester- Misha Chorniy
Editorialist- Abhishek Pandey
DIFFICULTY:
EASY
PRE-REQUISITES:
Strings, Basic Math, Simulation.
PROBLEM:
Given a string S , with only characters being 'a' and 'b ', find number of good prefixes in the string T which is made by concatenating N copies of S, i.e.-
T = \underbrace{S + S + \dots + S}_{N\text{ times}}
A prefix is a good prefix if and only if number of occurrences of 'a' is strictly greater than the number of occurrences of 'b'.
QUICK EXPLANATION:
We make cases for this problem. If frequency of 'a' is equal to frequency of 'b' in string, then calculating answer is simple- for we can simply use Ans=G(S)*N where G(S) denotes number of Good Prefixes in S.
If frequency of 'a' is more than to frequency of 'b' , we notice that after at most 1000 appendations, each new appendation will contribute maximum possible value only, which is K (K=Length of string) .
Similarly we can see that if frequency of 'a' is less than to frequency of 'b', after 1000 appendations, each new appendation will contribute 0 to answer.
So if frequency of character 'a' is not equal to that of 'b', we make a string T by appending S to it 1000 times, and then use mathematical formulas to find contribution of every other appendation.
EXPLANATION:
As usual, we will be discussing Editorialistâs and testerâs approach. Editorialistâs approach is based on same lines as that of setter- and hence, will form main part of editorial. We will discuss better optimized solutions (testerâs approach) and common mistakes of contestants at the bonus part of editorial
Before that, please first get acquainted with the notations I will be using.
- G(S)= Number of Good Prefix in string S
- freq(a,S)= Frequency of character âaâ in string S.
- freq(b,S)= Frequency of character âbâ in string S.
- {S}^{p}= String formed by concatenation S p times.
1. Editorialistâs/Setterâs Solution-
If you want, you can scroll down to refer to my solution side by side while reading the editorial.
The first thing to note are the constraints. 1\le N \le {10}^{9} while 1\le |S| \le {10}^{3}.
We start by calculating number of Good Prefixes in S , i.e. G(S). Along with that, we also find freq(a,S) and freq(b,S). Now we make cases-
a. When freq(a,S) == freq(b,S)-
In this case, there is no affect of previous appendations of S on incoming/new appendations of S. In other words, when freq(a,S) \neq freq(b,S), then there is some change in value of (freq(a,S)-freq(b,S)) after each appendation of S to T, which affect the contribution of any future appendation of S towards G(T). In most basic words, it means that since freq(a,S) \neq freq(b,S), then number of 'b' in T will not increase at same rate as number of 'a' at each appendation, which will affect contribution of next appendation to the final answer.
This is not the case when freq(a,S) == freq(b,S). Hence, each appendation of S will contribute G(S) towards the final answer. There are N appendations of S and we already found G(S) by simple looping earlier. Hence, for this case, Answer=G(S)*N
b. When freq(a,S) < freq(b,S)-
In this case, because freq(a,S) < freq(b,S), each new appendation of S to T will decrease (freq(a,S)-freq(b,S)). In other words, the number of 'b' in T will increase more quickly than number of 'a', which will reduce the contribution of every future appendation of S towards final answer.
We see that, after every appendation, the contribution of next appendation must reduce by at least 1 , (\because freq(a,S) < freq(b,S)). Remember that length of S is only upto 1000. This simply means, that after at most 1000 appendations, the contribution of incoming appendation reduces to 0. (In fact, 1000 is a loose upper bound, we can go closer to the real value, but for sake of explaining lets take it 1000 right now )
So, we form the string T by appending string S min(N,1000) times as-
T = \underbrace{S + S + \dots + S}_{min(N,1000)\text{ times}}
and our answer is equal to G(T) as any future appendation has 0 contribution towards final answer.
Answer=G(T)
b. When freq(a,S) > freq(b,S)-
Clearly, each new appendation of S to T will increase (freq(a,S)-freq(b,S)). Thus, the number of 'a' in T will increase more quickly than number of 'b', which will increase the contribution of every future appendation of S towards final answer.
The maximum possible contribution of an appendation to our answer can be |S|, i.e. the length of string S. Again, as seen above, after at most 1000 appendations, the next appendation will reach its maximum contribution. Hence, we do same as above, we form T by appending string S min(N,1000) times as-
T = \underbrace{S + S + \dots + S}_{min(N,1000)\text{ times}}
Now each future appendation will have maximum contribution, equal to |S|. Hence, our answer will be-
Answer=G(T)+|S|*max(0,N-1000)
Now, what is the time complexity? How many of you would agree if I said-
Time Complexity= O(min(N,1000)*|S|) \equiv O(1000*|S|) \equiv O(|S|)
Is it correct?
.
.
.
(PS: The above time complexity is a lie )
SOLUTION:
Setter
Tester - O({|S|}^{2})
Tester - O(|S|)
Editorialist - O(???)
CHEF VIJJUâS CORNER:
**1.**Most of the contestants did a good job in framing the simulation part correctly, checking for overflows, and coming at right expressions. But they still got a TLE. And let me reveal the culprit here! It was-
T=S+T
Yes, thats right! This expression should had been T+=S. The + operator takes time comparable to O(|S|+|T|) while the += symbol takes time comparable to O(|S|). Treat it like âCreating a new array with characters of T and S v/s Adding |S| characters to back of char-array T .â For reference, have a look at this solution. Replace the + operator with += and see the time diffference!
2. Testerâs O(|S|log(|S|)) solution is discussed here. When freq(a,S)> freq(b,S), what he does is, keep a track of freq(a,S)- freq(b,S) at various points of the string, and then sort them. Now, we keep a variable curr which stores the count excessive 'a ' we get after each appendation. He uses this to check at most how many prefixes more he can get by the condition have[ptr - 1] + cur > 0. Depending on that, he adds ans += (int)have.size() - ptr; to the answer. In other words, tries to go checks the least value in of freq(a,S)- freq(b,S) which curr, number of excessive 'a' s can afford, and adds to answer accordingly. Same approach used for freq(a,S)< freq(b,S) case, which is left for the readers to work out :).
3. Testers O(|S|) solution is a bit similar. Firstly, Misha eliminated sorting, and replaced it with a cumulative summationâd prefix array. Meaning, he stores the frequency of each value of freq(a,S)- freq(b,S) and does a cumulative sum of that array. Again, curr holds freq(a,S)- freq(b,S). Now, his expression is ans += cnt[2 * SHIFT - 1] - cnt[SHIFT - cur];, meaning âMax possible contribution - Contributions which cannot be afforded due to low currâ. The solution is same in essence to the first one, but he cleverly got rid of sorting, leading to a much simpler O(|S|) approach.
4. Now you have seen two optimized approaches, I will have a question for you. Have a detailed look at the time complexity I mentioned. No doubt, my solution does ,plain and simple, have a complexity of at least O(min(N,1000)*|S|) , but dont you get something fishy? There is a, very common, and intentional error involved here :). Note that, there is no sense in the "1000" I kept in O(min(N,1000)*|S|). Ask yourself, what is this 1000? It is nothing but to denote |S|!! Hence, the real complexity would be O(min(N,|S|)*|S|) \equiv O({|S|}^{2}). Dont be like this editorialist when computing time complexity xD.
5. The string in this question was made up of only 2 characters. Replace 'a' and 'b' with 1 and 0 and you have a binary string. Some questions on similar topics-