A tutorial on Suffix Arrays

I have been researching suffix array for last couple of days. This tutorial has certainly been of great help. Also I find the link given for skew algorithm to be much simpler at least in theoretical sense.

However I ran into a problem with the statement in the given link which says

Recursively handle suffixes from the i mod 3 = 1 and i mod 3 = 2 groups.

I understand the division of the string into 1,2 mod 3 group & 0 mod 3 group. But I do not get the point of recursively solving this step to get the suffix array.

@kuruma It would be of great help if you could clarify my doubt.

awesome explaination dude… seriously… finally i got the suffix array… :slight_smile:

“ssissippi” suffix must be after “ssippi” suffix after sort considering first four characters.

awesome (y)

1 Like

how is sissippi has lower sort index(before) than sippi after sorting for first four characters…??
suffix array is really troubling me.Please sm1 help!

very Good… :slight_smile:

1 Like

Now, we can call quick-sort and sort the suffixes according to their first 4 characters by using the 2-tuples we constructed above! The result would be

can anyone please explain the above line???qsort is called in which tuple…

1 Like

how tuples r used for sorting please explain???

1 Like

Really,easy to understand from this tutorial!!!:slight_smile:

3 Likes

Now, we can call quick-sort and sort the suffixes according to their first
4 characters by using the 2-tuples we
constructed above! The result would
be:

6 6: sissippi
7 3:
sippi
8 5: ssissippi
9 2: ssippi

Can anyone explain me the result of above step. How are the suffixes sorted according to first 4 characters after calling quick sort.
I mean, how is ‘sissippi’ is having a lower sort-index then ‘sippi’ and simmilarly how is ‘ssissippi’ having a lower sort-index than ‘ssippi’ ?
I think the correct order should be:

6 6: sippi
7 3:
sissippi
8 5: ssippi
9 2: ssissippi

1 Like

In the naive approach, shouldn’t the loop go from i=0 to s.size() rather than i=0 to s.size()-1 ?

1 Like

well explained, but I think there should be some correction in the table given below:
Sort-Index Suffix-Index

0       10: i
1        7: ippi
2        1: ississippi
2        4: issippi
3        0: mississippi
4        9: pi
5        8: ppi
6        3: sissippi
7        6: sippi
8        2: ssissippi
9        5: ssippi
till (compared)4th character,  8 sort_index should be "ssippi" and 9th should "ssissippi" same in 6 & 7 sort_index
1 Like

Shouldn’t it be
“We can use the SortIndex array we constructed above to find the Longest Common Prefix, between any two suffixes”
instead of
“We can use the SortIndex array we constructed above to find the Longest Common Prefix, between any two prefixes.”

1 Like

I am a beginner. Very well written sir. It was a deep and helpful insight.

1 Like

I think the final answer given is wrong. It should be

sort-Index Suffix-Index

0       10: i
1        7: ippi
2        1: ississippi
2        4: issippi
3        0: mississippi
4        9: pi
5        8: ppi
6        6: sippi
7        3: sissippi
8        5: ssippi
9        2: ssissippi

This was awesome!
Thanks Bruno.

while sorting the suffixes according to the first 4 characters, why is “3” not assigned in the second tuple ??

shouldn’t in the naive approach

for(int i = 0; i < s.size();i++)
{
    m[s.substr(i,s.size()-i)] = i;
    v.push_back(s.substr(i,s.size()-i));
}

should be:

for(int i = 0; i < s.size();i++)
{
    m[s.substr(i,s.size()-1)] = i;
    v.push_back(s.substr(i,s.size()-1));
}

???

notice the size() - 1 instead of size()-i

very good explanation…

Another nice tutorial :slight_smile: