Given any string S , we can obtain compressed string from it by repeatedly removing consecutive characters which are equal.(ex hhhaarssshiiiiil - harshil, aabbaanneel - abane , aaaaaaa - a ).Two strings S and T are called similar strings if their compressed strings are equal. We need to find total number of ordered pair of strings (S,T) having length atmost N which are similar.
This a simple math problem involving combinatorics.Suppose for any string S , the length of compressed string which we can obtain from it is L.First Lets count how many strings are possible having length of compressed string L.
The point of observation is that in compressed string no two adjacent characters are equal (if any two adjacent characters are equal, we can remove that pair of adjacent characters and obtain compressed string of length L-1).Thus total number of compressed strings of length L will be 26*25(L-1) (for first character , we have 26 choices (‘a’-‘z’) and for rest L-1 characters we have 25 choices as every such character shouldn’t be equal to its previous character).
Now we know total number of compressed strings of length L .Lets fix some compressed string C. Lets count total number of strings S of length atmost N whose compressed string is C.We can obtain string S from C by repeating occurrence of any character in C any number of times such that total length is atmost N.(for example,our compressed string C is abcd , then we can obtain string aabcccdddd by repeating occurrence of a - 2 times , b - 1 time , c- 3 times and d - 4 times).
Suppose we are repeating occurrence of first character a1 times , 2nd character a2 times … Lth character aL times , then total number of strings S will be total number of solutions of the equation a1 + a2 + a3 … + aL ≤ N , where each ai ≥ 1 .
Assume for now that total number of solutions of this equation is P( later we will see how to calculate P).
Then for a fixed compressed string C , total number of ordered pairs (S,T) having their compressed string C will be P*P.
Total number of compressed string of length L is 2625(L-1). Thus total number of pairs of similar strings (S,T) such that length of their compressed string is L is 2625^(L-1)PP.
Answer for this problem is Σ 26*25^(L-1)PP where L ≥ 1 and L ≤ N.
The only thing unknown in this formula is P.Lets see how to find it.
P is total number of solutions of equation a1 + a2 + a3 … + aL ≤ N , where each ai ≥ 1.
We can transform this equation to a1 + a2 + a3 … + aL ≤ N-L , where each ai ≥ 0.Solution of both the equations is same.
We can again transform the above equation to a1 + a2 + a3 … + aL + aL+1 = N-L , where each ai ≥ 0 .Note that we replaced ≤ sign with = sign in this equation and added one extra term.The total number of solutions of this equation is NCL where C is binomial coefficient.Hence the value of P is NCL.
Those who don’t know how to find total number of solutions of last equation , follow this link