wa in SSTORY suffix arrays

kuruma · March 18, 2014, 5:08pm

Hello @all,

As @vineetpaliwal pointed out (and as it was pointed out to me during the contest) the issue with Suffix Arrays is that only examining consecutive characters might yield WA on some test cases (I only knew how to apply Suffix Automaton to solve this, as I read it on CLRS book), which the Suffix Automaton didn’t detect (might be related with some properties of Suffix Arrays).

Note that I’ve only read about Suffix Arrays while setting this problem and I still need to learn it properly myself

Hopefully setting problems harder for me might push my own boundaries further

Thanks,

Bruno

rishul_nsit · March 18, 2014, 5:37pm

Hi everyone

I used suffix arrays to solve this question and my solution sustained all retests successfully.
The algorithm is:

Construct the suffix array sorted lexicographically.
Construct the lcp array from the suffix array’s consecutive entries.
Mark the entries in lcp array whose suffixes start from different strings.
Find the max length of possible lcs (longest common substring) from the lcp array (equal to the max value of marked entries in lcp array).
Traverse over lcp array again and for all marked entries equal to max value, construct the lcs from the suffix array entries. Insert this lcs into a set. This is a set of possible lcs.
Now traverse over the second string, generate all substrings of length equal to max possible lcs length ( This can be done in O(n) ), search them in the set of possible lcs and break as soon as you find a match. The first match is your answer. This is the lcs which occurs first in the second string.

Link to the solution: http://www.codechef.com/viewsolution/3549875

If number of possible lcs are ‘k’, then the complexity of my solution is O(nlgk). Basically any possible mistakes are avoided because all possible lcs are generated and the one which occurs first in second string is printed always.

Learnt a lot from this question, thanks to @kuruma

jangwa · March 18, 2014, 5:43pm

Thanks …

randomusername · March 18, 2014, 5:44pm

I must say I enjoyed this problem, despite it’s annoying regrades. I used SA+LCP, and then realized that finding the leftmost LCS in the second string isn’t that easy. So after a day or two, I found a solution. It might be an overkill, but I had to implement three more necessary structures:

L : L[i]=MAXj, such that j < i and LCP[j]<LCP[i]

R : R[i]=MINj, such that j > i and LCP[j]<LCP[i]

(basically we want to find first element smaller than LCP[i] on both sides of the array)

CS (a structure that is used to retrieve RMaxQ in O(1), segment trees could work too)

Then I noticed that for every pair of consecutive suffixes in SA we can find the “best” (first in the second string) appearance of their common prefix in the second string using these structures. We just do RMaxQ(L[i]+1,R[i]).

I can elaborate this even further and show an example, if anyone is interested, but please share some solutions that use SA+LCP but are different than mine, I’m curious.

Cheers

randomusername · March 18, 2014, 5:46pm

Wow, that actually works? I thought that approach would TLE, I overlooked the overall complexity. Nice work!

randomusername · March 18, 2014, 5:49pm

But still, the problem can be solved with Suffix Arrays, in many different ways. So apart from the late regrades, this problem turned out well, because it can have diverse solutions, totally different from the setter’s.

rishul_nsit · March 18, 2014, 6:00pm

Thanks!
Yes it will work because of the trade off between length of lcs and number of lcs. If there are more number of lcs (higher k) then length of lcs is less and construction takes less time. And if length of lcs is high then there are lesser lcs (lower k) and matching takes less time. So overall the solution passes the time limit.

deepsaggas · March 18, 2014, 6:41pm

I found the length of the LCS say ‘k’ using SA and LCP. Now the task was to find the minimum index ‘i’ in s2 s.t. s2[i…(i+k-1)] is also the longest common substring. So what I did was I again iterated over LCP array and found the minimum such index in all valid windows. I define a valid window as [i…j] s.t. k<=LCP[s] for s \in [i,j] and LCP[i-1]<k and LCP[j+1]<k and there is q1 in [i,j] s.t. SA[q1] starts in s1 and there is a q2 s.t. SA[q2] starts in s2. You can look at my function algo() in my [solution][1] .
[1]: http://www.codechef.com/viewsolution/3618336

tinchy_stryder · March 18, 2014, 6:55pm

Generating substrings can become O(N^2) in the worst case. How does this pass?

Isn’t the complexity for std::erase() O(N) where N = length of the string?

deepsaggas · March 18, 2014, 6:56pm

Try this test case: “fagehe”,“fhfgfg”. Your code gives WA for this test case

rishul_nsit · March 18, 2014, 11:06pm

@tinchy_stryder have a look at my solution where I generate all substrings in O(n)

shahid_khan · March 19, 2014, 1:32am

Hi, I was able to discover this trick during the contest. I modified my code to handle this trick as well but I still kept on getting WAs. Can you or someone else provide me a case where my code fails? Link to my code -> http://www.codechef.com/viewsolution/3597436

You may add printf(“Index: %d\n”,Ind-M-1); before if(Len>0) statement to verify correctness.

lerence1201 · March 20, 2014, 6:05pm

Thanks a lot. Really helped!