Problem Code- CN04
Difficulty: Medium
Prerequisites:
Suffix Arrays
Explanation:
Learn all about suffix arrays construction here:
Finding Total Number of distinct Substrings.
Well how do we do this ?
We see that , any string , let us say
ABC will give following distinct substring ,
A
B
C
AB
BC
ABC
Which is equivalent to choosing an end point and a start point in a string, and what is left is a simple matter of finding the number of ways you can choose a start and end point . Which is nC2 (where n is the string length ) .
Problem happens when we have repeated characters . What will you do then ? that is when suffix arrays come into play .
Now if I am given a string , axyz , and let us say , I ask you how many strings can you make with this string such that they start from the first letter. Then the answer will be 4 ( a , ax, axy, axyz ) .
So , I have a string of length 7 , then I have 7 suffixes (excluding NULL , I am not counting it because it’s length is 0 , does not contribute towards any distinct substring ) . So , going by the above logic , I will 7+6+5+4+3+2+1 distinct strings , which is equivalent to nC2 which was also the result of above discussion .
Unfortunately for us , this result doesn’t hold for the strings with repeated characters.
Take this one
S= “aabaab”
Suffix Possible strings such that they start from first letter of suffix
b b
ab a,ab
aab a,aa,aab
baab b,ba,baa,baab
abaab a,ab,aba,abaa,abaab
aabaab a,aa,aab,aaba,aabaa,aabaab
Now as you see strings a,b,aab,etc are repeated but we also know that if we sort the above suffixes , those strings which are common , there length is given in Lowest Common Prefix .
So , the total number of distinct Substrings from a given string is given by
nC2 – Sum of All LCP
And that is the answer, as simple as that.