Problem Link:
Difficulty:
EASY
Pre-requisites:
Sliding window, strings
Problem:
Given two strings A and B of equal length N, find how many substrings of equal length have a difference not greater than K (Substrings are denoted by L1, R1, L2, R2 : A[L1…R1], B[L2…R2], R1-L1 = R2-L2. Here, difference is the number of characters at that differ in the same corresponding positions. Also, substrings are different if their positions in the string are different.
Quick Explanation
For strings S and T of equal length, and characters c and d, diff(S+c, T+d) = diff(S, T) + 1 or 0 (depending on whether c = d or not).
Similarly, diff(S - {its first character}, T - {its first character}) = diff(S, T) - 1 or 0 (depending on whether their first characters are different or not).
Using the above, we can get diff(L1, R1, L2, R2) from diff(L1, R1-1, L2, R2-1) in O(1) time.
Also, if X is a substring of S and Y is a substring of T (at same positions), then diff(X, Y) <= diff(S, T). Thus, for a fixed L1, L2, once we find the minimal length for which diff(A[L1…R1], B[L2…R2]) > K, we know that for all larger lengths and those starting positions, the difference > K.
The above reasoning yields the following algorithm:
Initialize L1, L2 globally.
R1 = L1, R2 = L2, diff = A[L1] != B[L2]; //start with length 1
while(R1 < N && R2 < N)
if(diff == K+1)
ans += R1 - L1; // quadruplets (L1, L1, L2, L2)...(L1, R1-1, L2, R2-1) are valid;
diff -= (A[L1] != B[L2]);
L1++, L2++;
else
R1++, R2++;
if(R1 < N && R2 < N)
diff += A[R1] != B[R2];
//now we still have (L1, L1, L2, L2) ... (L1, R1-1, L2, R2-1) as well as (L1+1, L1+1, L2+1, L2+1) ... (L1+1, R1-1, L2+1, R2-1) ... etc. to be counted
len = min(N-L1, N-L2);
ans += len * (len+1)/2; //all substrings of this string get counted.
Finally, while the above procedure takes O(N) time, we need to ask, how many times do we need to Initialize L1, L2 and to what values should they be initialized. Note that once initialized, the offset of the starting positions between the substrings remains constant. Thus, we should initialize L1, L2 with all possible beginning offsets.
ans = 0
for(i = 0; i < N; i++)
run above procedure with L1=i, L2 = 0
for(i = 1; i < N; i++)
run above procedure with L1=0, L2 = i
Output ans
The time complexity of the whole procedure is thus O(N^2) per testcase.
Explanation:
Let us begin with as naive an approach as possible and try to improve from there. A brute force solution would vary over all possible L1, L2, length and check what is the value of the difference of strings. Pseudocode would be:
Diff(string S, string T)
val = 0;
for(int i = 0; i < S.length(); i++)
if(S[i] != T[i])
val++;
return val;
solve(string S, string T)
ans = 0;
for(int L1 = 0; L1 < N; L1++)
for(int L2 = 0; L2 < N; L2++)
for(int R1 = L1, R2 = L2; R1 < N && R2 < N; R1++, R2++)
if(Diff(S.substring(L1 to R1), T.substring(L2 to R2)) <= K)
ans++;
Output ans;
The above algorithm would take O(N) time for the Diff function, and would be called O(N^3) times, giving a complexity of O(N^4). This is unfortunately too slow.
We need to be more intelligent in calculating Diff or in calling the function Diff. We notice that A[L…R] is almost the same as A[L+1…R] or A[L…R+1], and similarly for B. Indeed, if we were to compare Diff values
A[L1…R1+1] = A[L1…R1]x
B[L2…R2+1] = B[L2…R2]y
and hence diff(A[L1…R1+1], B[L2…R2+1]) = diff(A[L1…R1], B[L2…R2]) + (x!=y).
Similarly, diff(A[L1+1…R1], B[L2+1…R2]) = diff(A[L1…R1], B[L2…R2]) - (A[L1] != B[L2]).
Well, that takes care of calculating difference efficiently, atleast somewhat. Now can we avoid calculating difference unnecessarily too? Notice from equation:
“diff(A[L1…R1+1], B[L2…R2+1]) = diff(A[L1…R1], B[L2…R2]) + (x!=y)” the fact that
diff(A[L1…R1+1], B[L2…R2+1]) >= diff(A[L1…R1], B[L2…R2]).
Thus, once we find minimal length l such that diff(A[L1…L1+l-1], B[L2…L2+l-1]) > K, then for any length >= l, we will get the difference > K.
Finding what is the maximal R1, R2 such that for fixed L1, L2 we have diff(A[L1…R1], B[L2…R2]) <= K, and then varying over possible L1, L2 yields the algorithm described in Quick Explanation, which is O(N^2).
Setter’s Solution:
Can be found here
Tester’s Solution:
Can be found here