SBSTR - Editorial

PROBLEM LINK:

Div1
Div2
Contest

Setter- Hasan Jaddouh
Tester- Jakub Safin
Editorialist- Abhishek Pandey

DIFFICULTY:

SIMPLE

PRE-REQUISITES:

Frequency Array concept’s application on string, looping, logical reasoning

PROBLEM:

A string is said interesting if, it has at least K distinct characters and frequency of all those characters is same. Given a string S, find number of interesting sub-strings in it.

QUICK EXPLANATION:

If frequency of all characters is same, it goes without saying that the value of maximum frequency is also same as value of any other frequency. Hence, we can simply generate all sub-strings in O({N}^{2}) and check the conditions in O(1) by maintaining a cnt variable to check number of distinct characters in the sub-string, along with using MaxFreq*cnt==length of string to check if frequencies of all characters are same. (Note that cnt is storing the number of distinct characters in the string).

EXPLANATION:

This is a fairly straightforward and simple problem. The editorial will have a single section, which will describe the two approaches used by setter and tester.

1. Full Solution-

By the constraints, it is clear that an O({S}^{2}) can pass, provided we dont indulge in very expensive operations.

Let me first put up the main code for reference and then refer to it for explanation. Also, let me use S(i,j) to denote "Substring of S starting at i and ending at j"

for(int i = 0; i < N; i++) {
			int occ[26] = {};
			int mxocc = 0, cnt = 0;
			for(int j = i; j >= 0; j--) {
				if(occ[S[j]-'a'] == 0) cnt++;
				mxocc = max(mxocc, ++occ[S[j]-'a']);
				if(cnt >= K && mxocc*cnt == i-j+1) ans++;
			}
		}

What we are doing is, we are fixing the value of i, i.e. end point of the sub-string. From there, we traverse backwards, cumulatively storing information and checking conditions for the S(i,j) and updating the answer.

To those who were not able to solve the question, a word on cumulativeness is given below. (Please note, we will be discussing a general example, which has no relation to code above)

Click to view

Say you checked sub-string S(i,j) for now, and the next sub-string to be checked is S(i,j+1). Currently, our variables are holding data obtained after processing S(i,j). Let me denote D( S(i,j)) to denote "Data obtained on processing sub-string starting at i and ending at index j"

Now, we know that,

D(S(i,j+1))=\underbrace{D(S(i,j))} _ {Already \space present \space in \space our \space variables}+ \underbrace{D(S(j,j))} _ {Need \space to \space process}

Simple brute force will, re-calculate all data for S(i,j+1), while cumulative solution will say "I already have data for S(i,j) , then only thing to be done is process character at j+1 and to get answer for S(i,j+1)

Bonus question for you guys- Can we call it Dynamic Programming? Cumulativeness is a basic principle of things like prefix-sum etc. Is it dynamic programming as well?

The above solution works in O({N}^{2}). There is another alternative solution, which will work in O(26*{N}^{2}). However, the constant factor of 26 is a bit high for constraints of |S|=7000, hence certain constant optimizations will be needed.

Let me denote "Frequency of characters in S(i,j)" as Freq(S(i,j)) .This approach will use the fact that,-

Freq(S(i,j+1))=Freq(S(i,j))+Freq(S(j,j))

We already have Freq(S(i,j)). Again , instead of re-computing things for entire sub-string again, use the we simply update the previous values of frequency array. Then we check if there are at least K distinct characters in string or not. IF AND ONLY IF there are, then we check the frequencies of characters using frequency array. Note that skipping this optimization can lead to TLE, as we would waste 26 iterations checking the next condition for nothing.

We will loop through all characters of alphabet, and check for their frequencies. The allowed frequencies are 0 i.e. character not resent in string (recall we have already checked that at least K distinct characters are present), or if the character is present should be some value P which should also be the frequency of all other characters in the string.

This solution actually runs in O(A*{N}^{2}) where A is size of your alphabet/number of characters allowed in alphabet. As given in question, size of alphabet can be up to 26, hence this approach works.

As an homework, try to-

  • The first approach (whose code I posted) traverses substring from S(j,i) to S(j-1,i)S(0,i). Make a solution traversing it from S(i,j) to S(i,j+1) . In other words, the code traverses substring in a right to left fashion, you should do in opposite direction.
  • Write/Implement the code for Second approach. Chef Vijju’s corner has the solution if you’re stuck.

SOLUTION:

For immediate availability of setter and tester’s solution, they are also pasted in the tabs below. This is for your reference, and you can copy code from there to wherever you are comfortable reading them. You now dont have to wait for @admin to link solutions :slight_smile:

Setter

Click to view
#include <iostream>
#include <algorithm>
#include <string>
#include <assert.h>
using namespace std;
 
 
 
long long readInt(long long l,long long r,char endd){
	long long x=0;
	int cnt=0;
	int fi=-1;
	bool is_neg=false;
	while(true){
		char g=getchar();
		if(g=='-'){
			assert(fi==-1);
			is_neg=true;
			continue;
		}
		if('0'<=g && g<='9'){
			x*=10;
			x+=g-'0';
			if(cnt==0){
				fi=g-'0';
			}
			cnt++;
			assert(fi!=0 || cnt==1);
			assert(fi!=0 || is_neg==false);
			
			assert(!(cnt>19 || ( cnt==19 && fi>1) ));
		} else if(g==endd){
			assert(cnt>0);
			if(is_neg){
				x= -x;
			}
			assert(l<=x && x<=r);
			return x;
		} else {
			assert(false);
		}
	}
}
string readString(int l,int r,char endd){
	string ret="";
	int cnt=0;
	while(true){
		char g=getchar();
		assert(g!=-1);
		if(g==endd){
			break;
		}
		cnt++;
		ret+=g;
	}
	assert(l<=cnt && cnt<=r);
	return ret;
}
long long readIntSp(long long l,long long r){
	return readInt(l,r,' ');
}
long long readIntLn(long long l,long long r){
	return readInt(l,r,'\n');
}
string readStringLn(int l,int r){
	return readString(l,r,'\n');
}
string readStringSp(int l,int r){
	return readString(l,r,' ');
}
 
 
int T;
string s;
int n,k;
int sm_n=0;
int rep[300],cnt,cnt_mx,mx;
int main(){
	//freopen("02.txt","r",stdin);
	//freopen("02o.txt","w",stdout);
	T=readIntLn(1,10);
	while(T--){
		s=readStringSp(1,7000);
		n=s.length();
		sm_n+=n;
		assert(sm_n<=14000);
		k=readIntLn(0,26);
		int sol=0;
		for(int i=0;i<n;i++){
			for(int i=0;i<300;i++){
				rep[i]=0;
			}
			cnt=0;
			mx=0;
			cnt_mx=0;
			for(int j=i;j<n;j++){
				if(rep[s[j]] == 0){
					cnt++;
				}
				if(rep[s[j]] == mx-1){
					cnt_mx++;
				}
				if(rep[s[j]] == mx){
					cnt_mx=1;
					mx = rep[s[j]]+1;
				}
				
				rep[s[j]]++;
 
				if(cnt_mx == cnt && cnt >= k){
					sol ++ ;
				}
			}
		}
		cout<<sol<<endl;
	}
	assert(getchar()==-1);
} 

Tester

Click to view
#include <bits/stdc++.h>
// iostream is too mainstream
#include <cstdio>
// bitch please
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <vector>
#include <set>
#include <map>
#include <queue>
#include <stack>
#include <list>
#include <cmath>
#include <iomanip>
#include <time.h>
#define dibs reserve
#define OVER9000 1234567890
#define ALL_THE(CAKE,LIE) for(auto LIE =CAKE.begin(); LIE != CAKE.end(); LIE++)
#define tisic 47
#define soclose 1e-8
#define chocolate win
// so much chocolate
#define patkan 9
#define ff first
#define ss second
#define abs(x) (((x) < 0)?-(x):(x))
#define uint unsigned int
#define dbl long double
#define pi 3.14159265358979323846
using namespace std;
// mylittledoge
 
using cat = long long;
 
#ifdef DONLINE_JUDGE
	// palindromic tree is better than splay tree!
	#define lld I64d
#endif
 
int main() {
	cin.sync_with_stdio(0);
	cin.tie(0);
	cout << fixed << setprecision(10);
	int T;
	cin >> T;
	for(int t = 0; t < T; t++) {
		string S;
		int K;
		cin >> S >> K;
		int N = S.length();
		vector<int> lastocc(26, -1);
		int ans = 0;
		for(int i = 0; i < N; i++) {
			int occ[26] = {};
			int mxocc = 0, cnt = 0;
			for(int j = i; j >= 0; j--) {
				if(occ[S[j]-'a'] == 0) cnt++;
				mxocc = max(mxocc, ++occ[S[j]-'a']);
				if(cnt >= K && mxocc*cnt == i-j+1) ans++;
			}
		}
		cout << ans << "\n";
	}
	return 0;}
 
// look at my code
// my code is amazing

Editorialist’s Solution will be put up on demand.

Time Complexity= O({N}^{2}) or O(\alpha* {N}^{2}) where \alpha is number of characters allowed in alphabet/language.

CHEF VIJJU’S CORNER :smiley:

1. Solution for 2nd Approach-

Click to view
#include <bits/stdc++.h>
// iostream is too mainstream
#include <cstdio>
// bitch please
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <vector>
#include <set>
#include <map>
#include <queue>
#include <stack>
#include <list>
#include <cmath>
#include <iomanip>
#include <time.h>
#define dibs reserve
#define OVER9000 1234567890
#define ALL_THE(CAKE,LIE) for(auto LIE =CAKE.begin(); LIE != CAKE.end(); LIE++)
#define tisic 47
#define soclose 1e-8
#define chocolate win
// so much chocolate
#define patkan 9
#define ff first
#define ss second
#define abs(x) (((x) < 0)?-(x):(x))
#define uint unsigned int
#define dbl long double
#define pi 3.14159265358979323846
using namespace std;
// mylittledoge
 
using cat = long long;
 
#ifdef DONLINE_JUDGE
	// palindromic tree is better than splay tree!
	#define lld I64d
#endif
 
int main() {
	cin.sync_with_stdio(0);
	cin.tie(0);
	cout << fixed << setprecision(10);
	int T;
	cin >> T;
	for(int t = 0; t < T; t++) {
		string S;
		int K;
		cin >> S >> K;
		int N = S.length();
		vector<int> lastocc(26, -1);
		cat ans = 0;
		for(int i = 0; i < N; i++) {
			int occ[26] = {};
			int cnt = 0;
			for(int j = i; j < N; j++) {
				if(occ[S[j]-'a']++ == 0) cnt++;
				if(cnt < K) continue;
				int x = 0;
				bool ok = true;
				for(int k = 0; k < 26; k++) if(occ[k]) {
					if(x == 0) x = occ[k];
					else if(occ[k] != x) {
						ok = false;
						break;
					}
				}
				if(ok) ans++;
			}
		}
		cout << ans << "\n";
	}
	return 0;}
 
// look at my code
// my code is amazing

2. Can we not do better than O({N}^{2}) for this problem? Why/Why not?

3. Upto what value of S (length of string), will O({N}^{2}) pass? Is 2*{10}^{4} a good estimate? Try to explore the mathematics behind it.

4. Tester’s Notes-

Click to view

I have O(N^2+26N) by recalculating and checking just the max. frequency of any letter and the number of existing letters.

My solution is extremely simple code-wise and fast, it really does something like 10-20 elementary C++ operations per substring (and just a bit more assembler instructions). That means the constraints for SBSTR can be even higher.

5. A very good problem on topic of frequency array is this one. Also, its editorial (unfortunately for you, again by me xD) has more problems on same topic.

1 Like

In this problem some solution with complexity O(nn26) also passed.
link https://www.codechef.com/viewsolution/18669296 and this https://www.codechef.com/viewsolution/18660196.

Yes, I know. I described it in editorial, however, some other implementations (which used more expensive computations, or didnt optimize) failed as well.

1 Like

@vijju123 can u provide quick brief explanation for this:- http://www.spoj.com/problems/MMASS/

plzz

Well that should not have happened. Why there is always some problem in codechef contest with weak test cases?

Why is this code leading to tle…(my code is easy to be read)… https://ideone.com/e.js/DugpK4

I wil @code_man , but first check the answer by @l_returns on it. I think it satisfies what you want.

@akash_321 - I disagree there. I feel since setter and tester’s solution themselves are O({N}^{2}), they shouldnt have even tried to stop the O(26*{N}^{2}) ones. When 2 solutions are asymtotically similar, then even constant and random optimizations matter.

1 Like

Your solution is O({N}^{2} LogN) in my opinion, which is worse than O(26*{N}^{2}). Map data structure has a LogN factor involved.

@vijju123 then what is point of n<=7000 why not n<=1000 , just an extra break statement?

@vijju123 can you please tell why my O(N^2) getting TLE?
See this : https://www.codechef.com/viewsolution/18661895

i used unordered map…so it should be O(1) for insertion in my opinion…and for checking the second condition, I was iterating through this unordered map, which shall be less than 26 for many cases… @vijju123

The setter and tester did not intend that either, but they cannot indefinitely raise constraints or reduce TL, else some correct solutions will not pass. Ultimately they decided that, for second easiest problem of ltime, its ok if a few brute forces (optimized ones) pass. The tester himself made us aware by making such a solution. Its not that they were unaware of this, but its just that no correct solution should get TLE

link or didnt happen…xD

Any Python Solution for 100 Points , i apply above both method but no one gives 100 Points .
@vijju123 .

Time limit is very strict. Dont know this solution is giving TLE

CHEF VIJJU’S CORNER:-

ans:-2 There are O(n^2) substrings in a string and we have to check every substring. So I think we cannot do better than O(n^2)

ans:-3 A computer does 10^8 computation/second on an average. So 2*10^4 is seems to good estimation,
Because of n^2=10^8 ==> n=10^4

@vijju123 am I right?

1 Like

Hmm, true, but only provided that the test cases arent anti hash :stuck_out_tongue: . Still, I read somewhere that the constant for unordered map is a little high. The TL for problem is tight, so try to solve it without unordered map.

Ans2 - The reasoning is not correct. I can use the same reasoning to claim that “For sorting the numbers, we know that there are N! permutations. We need to try each and every one of them to see if that permutation is sorted or not. Hence, sorting takes O(N!) time.”

Ans 3- Yes, but 2*{10}^{4} isnt a very good estimate for TL of 1 second, as depending on implementations, it can take 0.4 - 4 seconds.

for(int i = 0; i < l; ++i) {
        for(int j = i + k - 1; j < l; ++j) {
            int count_t[26] = {0}, pr = 0;

O(26*{N}^{2}) with expensive operations. Remember that accessing elements of 2-D array is slower than that of 1-D array. Everything matters when constraints are too tight. Refer to code under point 1. in my corner.

1 Like

The constraints were too tight. It seems Python needed more than 3x multiplier. But in general, we expect user to know which language is best to tackle a particular problem.