Start with an empty string, and do the following operations:
- Append a character to the end of string.
- Remove the first character of the string.
After each operation, report the number of distinct substrings in the string.
This was easily the hardest problem. The setters solution was based on Ukkonen’s Algorithm. The time complexity is O(Q).
Ukkonen’s algorithm has been described very neatly on stackoverflow. It has also been described in short in these slides. I would recommend reading the stackoverflow post(if possible, the slides as well) before we continue. Here is link to the published paper for the enthusiasts. It can also be found on e-maxx.ru.
The sum of length of all edges of the suffix tree is equal to the number of distinct substrings in the string. So, If we had to merely append characters to the end of string, it can be easily done with Ukkonen itself, because we could maintain sum of length of all edges. We also need to maintain the number of leaves, because the length of each leaf edge increases by one after each step. Length of remaining edges remain same, unless the edge has been split in previous step.
Ukkonen’s algorithm can be modified to handle deletions as well, and its not too hard to figure out. Here are the details.
The first step is to realize that each leaf corresponds to a unique suffix of the string, and we can do bookkeeping to maintain the leaves for each suffix. Only those suffixes whose length is equal to remainder(refer to stackoverflow article) or less have no leaves corresponding to them. The algorithm will automatically ensure that we remove only those suffixes for which a leaf exists.
Due to bookkeeping, we can get hold of the leaf corresponding to the current string. We only need to remove this leaf. We also might need to remove some internal edges that were being used only by the suffix we are removing. To see this, consider the suffix tree for string “aaac”.
Suppose we remove the first ‘a’ from string now. We would just need to remove corresponding leaf(i.e. node 3). Now suppose we were to remove the second ‘a’ as well. Then we would need to remove vertex 4, as well as vertex 2, because the 1-2 edge is being used only by this suffix. In general, when removing a leaf, keep removing its parent as well if
- The parent has out degree 0.
- And it is not the active node.
To see the reason for point 2 above, consider the string “aaacaac”. The active node after adding last ‘c’ will be node 2. Now lets say, you remove the first ‘a’ from the beginning. After that your suffix tree looks like:
Now you also want to delete the second ‘a’. However, you cannot remove node 2, because the node 2 may have outdegree 0, but it is being used by the active suffix. After removal of node 4, our suffix tree looks as follows.
This example also illustrates that after removing the front character, the active point itself can become a leaf. This can be identified as active edge child of active node becoming null. In this case, you will need to establish the current active point as a leaf, and move to suffix link of active node. This will ensure a condition we had in the beginning, that “The algorithm will automatically ensure that we remove only those suffixes for which a leaf exists”. This is all the high level details you need to know. Read the solutions below for details. Some of the top solutions are very neat as well and can be read.
Acrush and few others used suffix arrays and LCP array with high large amount of optimizations to pass the time limit.
Can be found here
Can be found here
Can be found here