PROBLEM LINK:
Author and Editorialist: Hussain Kara Fallah
Tester: Michael Nematollahi
PREREQUISITES:
sorting
PROBLEM:
A string is called special if the set of all of its non-empty different subsequences IS equal to the set of all of its non-empty substrings. Refer to the problem statement for more detailed explanation.
Now given a string S with length up to 10^6. Count the number of its distinct special non-empty substrings.
EXPLANATION:
We should figure first what are the possible forms of a special string?
Clearly, a string consisting of 1 letter only is special (like âaaaaâŚâ). What about more than 1 letter?
If the string consists of more than 2 letters itâs not special. Consider our stringâs letters from left to right. Take the first appearing letter and the last appearing letter (if we consider the first appearance of each letter, the last letter that appears).
In case (âaabbbccaaâ) the first is âaâ and the last is 'câ
In case (âabcccddbbâ) the first is âaâ and the last is 'dâ
Now take the subsequence consisting of all occurrences of both letters. In the first case itâs âaaccâ, in the second itâs âaddâ. This subsequence doesnât appear as a substring, because thereâs at least one extra different letter that appeared before our âlast appearingâ letter. So as a result, the substring containing all occurrences of both letters will contain the extra letter.
(Note these are only detailed examples, but the observation can be applied to any string).
Now we are left with strings consisting of 2 letters. A string of 2 letters is special if it is only like (âaaaaâŚbbbbâŚâ) (the first letter repeated X times followed by the second letter repeated Y times.
Letâs assume that it doesnât follow the mentioned form. (Letâs suppose that our string consists of âaâ and âbâ only). According to our assumption, then there must be at least one letter âbâ with at least one âaâ to the left and at least one âaâ to the right. So if we take the subsequence of all occurrences of letter âaâ itâs not present as a substring.
Now how to answer our question?
First letâs break our string into the minimum number of disjoint parts, and each part consisting of consecutive identical letters.
For example âaaabbccddeffâ will be broken into {âaaaâ,âbbâ,âccâ,âddâ,âeâ,âffâ}.
The single letter case is easy, we maintain the largest group occurred for each letter and add up their sizes.
Now, the 2 letters case can be achieved from taking 2 adjacent groups (letâs say the first one size is x and the second is y so we can have x*y special substrings.
But we need to count only the distinct ones. Letâs maintain for each pair of letters (let_1,let_2) , a vector containing pairs (x,y) such that some point in our string there was a sequence of x letters of let1 followed by y letters of let2 (consecutively). For each pair, maintain all occurrences of matching consecutive groups. In the previous example, some possible groups are (âaaaâ,bb") , (âddâ,âeâ) , and so.
For each pair of letters (let1,let2) letâs count the number of distinct special substrings formed only by these letters (and let1 is the first letter). Letâs consider our lettersâ vector (the letters pair vector).
Letâs suppose that we have 2 pairs (x1,y1) , (x2,y2) such that (x1 >= x2 and y1 >= y2), then clearly the second pair is redundant, because all substrings formed from the groups of our second pair can be formed from the groups of the first. Letâs remove all redundant pairs (Hint: sort them according to x values and maintain decreasing y values).
Now for every 2 pairs in our vector (x1,y1) , (x2,y2) if (x1 > x2) then it implies that (y1 < y2). Now, all pairs are sorted according to the value of x. Letâs take two consecutive pairs as mentioned, what substrings should we add from the first pair that wonât occur from any other pair in the vector? Itâs clearly *(x1-x2)y1. (Assuming the first pair is (x1,y1) and the second is (x2,y2)).
We do the mentioed process for all pairs of letters.
Complexity O(|S| \, log \, |S|)