Difficulty : Medium
Pre-requisites : Suffix Automaton
Problem : Given a string S and N queries. Each query is defined by two integers - Li and Pi. Count the number of strings of the length Li that occur exactly Pi times (as the consecutive substrings) in the string S.
This problem is the hardest in the set. At least, according to my estimations. There are some simple solutions that can be coded without putting a lot of effort, though, so let’s consider them first.
How to get 13 points
The constraints are very low, so you can do a pure brute force. For example, you can store all the substrings in the STL map along with the number of their occurences and when you get a query, you can iterate through all the substrings in the map and for the each of them you can check that the length and the number of occurences is the same with what is asked in the corresponding query.
How to get 40 points
Now N is larger so we have to store the information about all the substrings in the more efficient way. For example, using a trie. It is possible to construct the trie, containing all the suffixes (=all the substrings) of the string S in O(|S|2) time. Moreover it’s possible to store the depth and the number of occurences of the associated substring for each trie’s node. Knowing that there will be no more than O(|S|2) such pairs, we can just count the amount of occurences of each pair and then answer queries in O(1) time.
How to get 100 points
Here we need some suffix structure in order to store the information about the substrings. Writer’s choice is the suffix automaton. The complete its’ description is beyond this editorial, because it requires a lot of space and theoretical results. So, from now on, we will consider that you’re familiar with the suffix automaton. If you’re not, here are some basic facts that will be crucial in out solution:
- The suffix automaton is a finite automaton that has letters associated with its’ edges.
- Any path from the root’s node corresponds to some substring of the string |S| (of course, assuming that we’ve built the suffix automaton for the string S). And vice versa, any substring of S can be obtained by going along some path from the root node of the suffix automaton. So we can form the following criteria: the string T is a substring of S <=> there’s a path from the root node of the suffix automaton for the string S that has the string T associated with it.
- The suffix automaton has O(|S|) nodes. In the general case, no more than 2|S| nodes.
- Each node has a set of substrings associated with it.
- All the substrings associated with a single node occur the same number of times.
- All the substrings associated with a single node are the suffixes of the largest substring, associated with this node.
Knowing this, we obtain the following solution:
- At first, we build the suffix automaton for the given string
- Then, for each its’ node we calculate the number of occurences of its’ associated substrings, as well as the lowest and the highest length of the string, associated with this node.
- Now we obtain a set of segments parallel to Y-axis, where X-axis corresponds to the number of occurences and the Y-axis corresponds to the length. There are as much these segments as the nodes in the suffix automaton.
- And now the problem is: given O(N) segments, parallel to Y-axis and queries of the form (X, Y). The answer for a query is the number of the segments that cover the point with the coordinates (X, Y).
This new problem can be solved different. One of the ways is to note that different X-coordinates’ segments are independent. So we can solve the 1D case of this problem. While solving the 1D case, it’s easy to see that the number of the segments that cover the point X equals to A-B, where A equals to the number of the segments that end later or equal than in X and B equals to the number of the segments that begin after X.
- A paper on suffix automaton
- The TSUBSTR problem from CodeChef April 2012 Long Challenge, that also requires you to use the suffix automaton, and its’ editorial