CHSTR - Editorial

PROBLEM LINK:

Practice
Contest

Author: Vasia Antoniuk
Tester: Mahbubul Hasan and Sunny Aggarwal
Editorialist: Balajiganapathi Senthilnathan
Russian Translator: Sergey Kulik
Mandarian Translator: Minako Kojima

DIFFICULTY:

Medium

PREREQUISITES:

Z function, combinatorics

PROBLEM:

Given a string and a query k, output the number of ways we can choose k equal substrings from the string.

SHORT EXPLANATION

Let cnt[i] be the number of different substrings that occur in the string exactly i times. Then the answer for a particular k ( \le n) will be :
\sum_{i = k}^{n} cnt[i] * \binom{i}{k} we can calculate cnt[i] using suffix array/Z function.

EXPLANATION:

Let cnt[i] be the number of different substrings that occur in the string exactly i times.

For example, for the string “ababa”, the array will be:
cnt[1] = 4 -> {“ababa”, “abab”, “baba”, “bab”} occur exactly once
cnt[2] = 4 -> {“aba”, “ab”, “ba”, “b”} occur exactly twice
cnt[3] = 1 -> {“a”} occur exactly thrice
cnt[4] = 0
cnt[5] = 0

Now suppose we want the answer for k = 2, i.e. number of ways to choose 2 equal strings. How do we calculate this using the cnt array?

cnt[1] is of no use as we need 2 equal string.
cnt[2] -> There are 4 different substrings that occur 2 times. So, we can add 4 to the answer.
cnt[3] -> There is only 1 substring that occurs 3 times. But we need only 2 times, so we can choose any 2 of the 3. So, \binom{3}{2} = 3

Summing them, for k = 2 we get 7.

Now let us focus on calculating the cnt array.
To calculate this let us loop through all suffixes of S.

For the suffix P = S[i..N], let us calculate the array Z[i..N] where Z[i] is the maximal equal substring starting from i that matches a prefix of S[i..N].

For example, for the above string consider i = 1, the whole string is the suffix
“ababa”.
Here the Z array would be
Z[1] = 5 -> maximum prefix matching of “ababa” and “ababa”
Z[2] = 0 -> maximum prefix matching of “ababa” and “baba”
Z[3] = 3 -> maximum prefix matching of “ababa” and “aba”
Z[4] = 0 -> maximum prefix matching of “ababa” and “ba”
Z[5] = 1 -> maximum prefix matching of “ababa” and “a”

How is this array useful for calculating cnt?

From the above array we can conclude that we have 1 substring which can be choosen atleast 3 times. How? Because there are 3 entries in the above array that are greater than or equal to 1.
Similarly we can deduce for 2 (2 times), 3 (2 times), 4 (1 time ) and 5(1 time).
So we increment cnt[3] once, cnt[2] twice and cnt[1] twice.

We do this for all suffix of S. The only thing left is to observe that cnt[i] now actually has how many different strings can be choosen atleast i times. We want it to be exactly i times. This is simple: we just subtract from cnt[i] (cnt[i + 1] + cnt[i + 2] … + cnt[N]).

We can compute the Z function in O(n) time. See here for how to do that.

Time Complexity:

There are N suffixes and we take O(N) time for each suffix so total time is:
$O(N^2).

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution
Tester’s solution

3 Likes

Can we solve it without z algorithm?

I tried using a Trie data structure , and finding out cnt array in O(N^2). and the use DP to store the answers for 1 to n and the print appropriate answer or 0 if query is greater than N.
But i am getting TLE for three cases in Subtask 3?
is anymore optimizations required ???

1 Like

I am using a Trie data structure and using that finding out how the cnt array in O(N^2) .and using DP to answer the queries in O(1) time .I am still getting TLE for SUbtask 3 except two cases … i am using FAST IO is any further optimizations required??

1 Like

yes, i have solved with hashing using rabin karp algo
see my solution http://www.codechef.com/viewsolution/7225121
the code is not quite readable but you can understand it.

1 Like

Even I had initially used trie data structure to solve the problem and I am pretty sure my complexity was O(3*(n^2)). I got only one of the last subtask set accepted and rest all tle…The link to my solution is http://www.codechef.com/viewsolution/7156684 . Is it possible that the constant(3) multiplied to n^2 in the big Omega notation lead to TLE?? Does that happen often??
Plz someone answer.

Suffix Array + RMQ can also be used for solving the problem.

This question was very similar to http://discuss.codechef.com/questions/43060/anusar-editorial.

5 Likes

@raja44ever need a little bit explanation of your approach. How did you decided the hash function. Please help ?

I solved it using suffix trie and got AC 100 pts . Link : http://ideone.com/RKRiZX

1 Like

Even I was using Trie at first O(n^2), then realized that i’m using pointers to create trie thats why i was getting tle, then i went for suffix array (O(nlognlogn) creation time) pretty fast as compared to trie, O(1) time query, got accepted. Array for solving queries took O(n^2) creation time (same as in case of trie), and also removed unnecessary mod (caused me 2 TLEs), overall O(n^2) on trie was pretty much slower than the the O(n^2) on suffix array. code with Trie took 5 - 6s to solve 5000 length string with 10^5 queries (which i had randomly generated) on my pc, and same input took approx 1 second in case of suffix array.

2 Likes

I tried the same approach , but TLE.
http://www.codechef.com/viewsolution/7140090.
Not able to understand until now what was the reason ?

My solution passed subtask 1, and task no 4, 5 from subtask 2. For others it gave WA.
Can anyone suggest what can be problem with the solution. link

I’ve used lcp array to calculate number of occurance, of each distinct substring, in the string.
And used this for calculating answer.

The problem can be solved with a Trie in O(n^2) but you need to add at one step all the substrings that start at position i by changing all the nodes on the path in the trie and compute cnt[i] inside the insert function. I used the bits in a int variable to know which sons I already created for the current node because if you initialize all the nodes with NULL you’ll get TLE.

1 Like

I tried so solve using hash, hashing all substrings, firstly with length 1, then 2, and so on. The rest of algorithm was the same. But somehow, it was WA on the last test case only. I tried with different hash functions, but didn’t help. Anyone has idea why? Can I see test cases now after the competition ended?

i counted the no of equal strings for every length i form 1 to n using hashing(map is used to count equal strings) and save them in array a(same as cnt in editorial above)…then i used two optimizations
1.) for every k, i calcalated the answer by considering only non zero cnt[j] where k<=j<=n.
2.) second if for any length i there is no two equal strings than we stop counting cause there is also no equal strings for the length greater than i

Thanks for your great reply understood a lot. But how did you determined a hash function with no collisions.

Instead of using Z function ,I used a map instead and it caused me TLE in the third subtask well the complexity being N^2*logN + i think a big constant factor is hidden in implementation of maps!!

I tried 3 approaches.

  1. Trie : It would TLE all the time even after many optimizations in the last sub task with O(N^2).
  2. Hashing : I also tried un-ordered mapping of sub strings along with dp to get O(N^2) but this also TLEd on the last sub task.
  3. Suffix Array : Finally, I used SA with the complimentary LCP array to get 100 pts with O(N^2).

So what I could make out was that even though all of them were O(N^2), the constants and over heads in the first two approaches should have been high for the relatively strict time limit.

1 Like

just take a prime number big enough so that hash values do not collide after modulus operation cause the difference between the hash values would be that prime number… i used variable b for this purpose