Given a string S with length up to 10^5. You have to find a prefix of S that occurs the maximum number of times as a substring in S (among all prefixes). If there’s a tie, pick the longest prefix.
EXPLANATION:
Clearly, the prefix consisting of the first letter alone occurs the maximum number of times among all prefixes. Because whatever is the answer prefix, the first letter is always a prefix of it. However, it’s not necessarily the longest one.
To find the longest one, let’s maintain a list of all appearances of our string’s first letter. Let’s say the prefix consisting of the first 2 letters has maximum occurrences as well, what condition should hold?
Answer: Every appearance of the string’s first letter must be followed immediately by the string’s second letter. If one of these appearances was followed by a different letter, then clearly the prefix consisting of 2 letters occurs fewer times. So what are the condition for lengths > 2? (think about it).
If our answer’s length is K then all appearances of the first letter must be followed by exactly the same K-1 letters. So after we maintain the list of appearances indices, we keep moving them forward simultaneously one letter at each step (scanning substrings which present a prefix) until some mismatch happens or the last index reaches the end of the string.
This solution runs in O(|S|)
We can prove that according to the fact, that our segments built by moving our pointers cannot ever intersect. If 2 of them intersected, it means that at least one of our pointers points to some index with corresponding letter equal to the string’s first letter. BUT when we started a substring from the last appearance of our string’s first letter, there’s no other appearance in front of it, so it’s guaranteed that there will be a mismatch (or it reached the end of the string).
Am I the only one who observed the problem poorly and overkilled it by using binary search and string hashing ?
Here is the solution link : https://www.codechef.com/viewsolution/23134615
I used KMP search algorithm and binary search and hence the overall complexity was somewhat O((N+N+M)logN) or rather O((N+M)logN), multiplied the number of test cases. Can anyone say why is got itself TLEd? Or do I have no clue of the complexity?
“We can prove that according to the fact, that our segments built by moving our pointers cannot ever intersect. If 2 of them intersected, it means that at least one of our pointers points to some index with corresponding letter equal to the string’s first letter”
Can someone please explain this part a bit more? (with an example maybe)