PROBLEM LINK:
Setter- Hasan Jaddouh
Tester- Jakub Safin
Editorialist- Abhishek Pandey
DIFFICULTY:
SIMPLE
PRE-REQUISITES:
Frequency Array concept’s application on string, looping, logical reasoning
PROBLEM:
A string is said interesting if, it has at least K distinct characters and frequency of all those characters is same. Given a string S, find number of interesting sub-strings in it.
QUICK EXPLANATION:
If frequency of all characters is same, it goes without saying that the value of maximum frequency is also same as value of any other frequency. Hence, we can simply generate all sub-strings in O({N}^{2}) and check the conditions in O(1) by maintaining a cnt variable to check number of distinct characters in the sub-string, along with using MaxFreq*cnt==length of string to check if frequencies of all characters are same. (Note that cnt is storing the number of distinct characters in the string).
EXPLANATION:
This is a fairly straightforward and simple problem. The editorial will have a single section, which will describe the two approaches used by setter and tester.
1. Full Solution-
By the constraints, it is clear that an O({S}^{2}) can pass, provided we dont indulge in very expensive operations.
Let me first put up the main code for reference and then refer to it for explanation. Also, let me use S(i,j) to denote "Substring of S starting at i and ending at j"
for(int i = 0; i < N; i++) {
int occ[26] = {};
int mxocc = 0, cnt = 0;
for(int j = i; j >= 0; j--) {
if(occ[S[j]-'a'] == 0) cnt++;
mxocc = max(mxocc, ++occ[S[j]-'a']);
if(cnt >= K && mxocc*cnt == i-j+1) ans++;
}
}
What we are doing is, we are fixing the value of i, i.e. end point of the sub-string. From there, we traverse backwards, cumulatively storing information and checking conditions for the S(i,j) and updating the answer.
To those who were not able to solve the question, a word on cumulativeness is given below. (Please note, we will be discussing a general example, which has no relation to code above)
Click to view
Say you checked sub-string S(i,j) for now, and the next sub-string to be checked is S(i,j+1). Currently, our variables are holding data obtained after processing S(i,j). Let me denote D( S(i,j)) to denote "Data obtained on processing sub-string starting at i and ending at index j"
Now, we know that,
D(S(i,j+1))=\underbrace{D(S(i,j))} _ {Already \space present \space in \space our \space variables}+ \underbrace{D(S(j,j))} _ {Need \space to \space process}
Simple brute force will, re-calculate all data for S(i,j+1), while cumulative solution will say "I already have data for S(i,j) , then only thing to be done is process character at j+1 and to get answer for S(i,j+1)
Bonus question for you guys- Can we call it Dynamic Programming? Cumulativeness is a basic principle of things like prefix-sum etc. Is it dynamic programming as well?
The above solution works in O({N}^{2}). There is another alternative solution, which will work in O(26*{N}^{2}). However, the constant factor of 26 is a bit high for constraints of |S|=7000, hence certain constant optimizations will be needed.
Let me denote "Frequency of characters in S(i,j)" as Freq(S(i,j)) .This approach will use the fact that,-
Freq(S(i,j+1))=Freq(S(i,j))+Freq(S(j,j))
We already have Freq(S(i,j)). Again , instead of re-computing things for entire sub-string again, use the we simply update the previous values of frequency array. Then we check if there are at least K distinct characters in string or not. IF AND ONLY IF there are, then we check the frequencies of characters using frequency array. Note that skipping this optimization can lead to TLE, as we would waste 26 iterations checking the next condition for nothing.
We will loop through all characters of alphabet, and check for their frequencies. The allowed frequencies are 0 i.e. character not resent in string (recall we have already checked that at least K distinct characters are present), or if the character is present should be some value P which should also be the frequency of all other characters in the string.
This solution actually runs in O(A*{N}^{2}) where A is size of your alphabet/number of characters allowed in alphabet. As given in question, size of alphabet can be up to 26, hence this approach works.
As an homework, try to-
- The first approach (whose code I posted) traverses substring from S(j,i) to S(j-1,i)…S(0,i). Make a solution traversing it from S(i,j) to S(i,j+1) . In other words, the code traverses substring in a right to left fashion, you should do in opposite direction.
- Write/Implement the code for Second approach. Chef Vijju’s corner has the solution if you’re stuck.
SOLUTION:
For immediate availability of setter and tester’s solution, they are also pasted in the tabs below. This is for your reference, and you can copy code from there to wherever you are comfortable reading them. You now dont have to wait for @admin to link solutions
Click to view
#include <iostream>
#include <algorithm>
#include <string>
#include <assert.h>
using namespace std;
long long readInt(long long l,long long r,char endd){
long long x=0;
int cnt=0;
int fi=-1;
bool is_neg=false;
while(true){
char g=getchar();
if(g=='-'){
assert(fi==-1);
is_neg=true;
continue;
}
if('0'<=g && g<='9'){
x*=10;
x+=g-'0';
if(cnt==0){
fi=g-'0';
}
cnt++;
assert(fi!=0 || cnt==1);
assert(fi!=0 || is_neg==false);
assert(!(cnt>19 || ( cnt==19 && fi>1) ));
} else if(g==endd){
assert(cnt>0);
if(is_neg){
x= -x;
}
assert(l<=x && x<=r);
return x;
} else {
assert(false);
}
}
}
string readString(int l,int r,char endd){
string ret="";
int cnt=0;
while(true){
char g=getchar();
assert(g!=-1);
if(g==endd){
break;
}
cnt++;
ret+=g;
}
assert(l<=cnt && cnt<=r);
return ret;
}
long long readIntSp(long long l,long long r){
return readInt(l,r,' ');
}
long long readIntLn(long long l,long long r){
return readInt(l,r,'\n');
}
string readStringLn(int l,int r){
return readString(l,r,'\n');
}
string readStringSp(int l,int r){
return readString(l,r,' ');
}
int T;
string s;
int n,k;
int sm_n=0;
int rep[300],cnt,cnt_mx,mx;
int main(){
//freopen("02.txt","r",stdin);
//freopen("02o.txt","w",stdout);
T=readIntLn(1,10);
while(T--){
s=readStringSp(1,7000);
n=s.length();
sm_n+=n;
assert(sm_n<=14000);
k=readIntLn(0,26);
int sol=0;
for(int i=0;i<n;i++){
for(int i=0;i<300;i++){
rep[i]=0;
}
cnt=0;
mx=0;
cnt_mx=0;
for(int j=i;j<n;j++){
if(rep[s[j]] == 0){
cnt++;
}
if(rep[s[j]] == mx-1){
cnt_mx++;
}
if(rep[s[j]] == mx){
cnt_mx=1;
mx = rep[s[j]]+1;
}
rep[s[j]]++;
if(cnt_mx == cnt && cnt >= k){
sol ++ ;
}
}
}
cout<<sol<<endl;
}
assert(getchar()==-1);
}
Click to view
#include <bits/stdc++.h>
// iostream is too mainstream
#include <cstdio>
// bitch please
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <vector>
#include <set>
#include <map>
#include <queue>
#include <stack>
#include <list>
#include <cmath>
#include <iomanip>
#include <time.h>
#define dibs reserve
#define OVER9000 1234567890
#define ALL_THE(CAKE,LIE) for(auto LIE =CAKE.begin(); LIE != CAKE.end(); LIE++)
#define tisic 47
#define soclose 1e-8
#define chocolate win
// so much chocolate
#define patkan 9
#define ff first
#define ss second
#define abs(x) (((x) < 0)?-(x):(x))
#define uint unsigned int
#define dbl long double
#define pi 3.14159265358979323846
using namespace std;
// mylittledoge
using cat = long long;
#ifdef DONLINE_JUDGE
// palindromic tree is better than splay tree!
#define lld I64d
#endif
int main() {
cin.sync_with_stdio(0);
cin.tie(0);
cout << fixed << setprecision(10);
int T;
cin >> T;
for(int t = 0; t < T; t++) {
string S;
int K;
cin >> S >> K;
int N = S.length();
vector<int> lastocc(26, -1);
int ans = 0;
for(int i = 0; i < N; i++) {
int occ[26] = {};
int mxocc = 0, cnt = 0;
for(int j = i; j >= 0; j--) {
if(occ[S[j]-'a'] == 0) cnt++;
mxocc = max(mxocc, ++occ[S[j]-'a']);
if(cnt >= K && mxocc*cnt == i-j+1) ans++;
}
}
cout << ans << "\n";
}
return 0;}
// look at my code
// my code is amazing
Editorialist’s Solution will be put up on demand.
Time Complexity= O({N}^{2}) or O(\alpha* {N}^{2}) where \alpha is number of characters allowed in alphabet/language.
CHEF VIJJU’S CORNER
1. Solution for 2nd Approach-
Click to view
#include <bits/stdc++.h>
// iostream is too mainstream
#include <cstdio>
// bitch please
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <vector>
#include <set>
#include <map>
#include <queue>
#include <stack>
#include <list>
#include <cmath>
#include <iomanip>
#include <time.h>
#define dibs reserve
#define OVER9000 1234567890
#define ALL_THE(CAKE,LIE) for(auto LIE =CAKE.begin(); LIE != CAKE.end(); LIE++)
#define tisic 47
#define soclose 1e-8
#define chocolate win
// so much chocolate
#define patkan 9
#define ff first
#define ss second
#define abs(x) (((x) < 0)?-(x):(x))
#define uint unsigned int
#define dbl long double
#define pi 3.14159265358979323846
using namespace std;
// mylittledoge
using cat = long long;
#ifdef DONLINE_JUDGE
// palindromic tree is better than splay tree!
#define lld I64d
#endif
int main() {
cin.sync_with_stdio(0);
cin.tie(0);
cout << fixed << setprecision(10);
int T;
cin >> T;
for(int t = 0; t < T; t++) {
string S;
int K;
cin >> S >> K;
int N = S.length();
vector<int> lastocc(26, -1);
cat ans = 0;
for(int i = 0; i < N; i++) {
int occ[26] = {};
int cnt = 0;
for(int j = i; j < N; j++) {
if(occ[S[j]-'a']++ == 0) cnt++;
if(cnt < K) continue;
int x = 0;
bool ok = true;
for(int k = 0; k < 26; k++) if(occ[k]) {
if(x == 0) x = occ[k];
else if(occ[k] != x) {
ok = false;
break;
}
}
if(ok) ans++;
}
}
cout << ans << "\n";
}
return 0;}
// look at my code
// my code is amazing
2. Can we not do better than O({N}^{2}) for this problem? Why/Why not?
3. Upto what value of S (length of string), will O({N}^{2}) pass? Is 2*{10}^{4} a good estimate? Try to explore the mathematics behind it.
4. Tester’s Notes-
Click to view
I have O(N^2+26N) by recalculating and checking just the max. frequency of any letter and the number of existing letters.
My solution is extremely simple code-wise and fast, it really does something like 10-20 elementary C++ operations per substring (and just a bit more assembler instructions). That means the constraints for SBSTR can be even higher.
5. A very good problem on topic of frequency array is this one. Also, its editorial (unfortunately for you, again by me xD) has more problems on same topic.