I have an issue with precomputation algorithm in KMP algorithm… I was reading it from Geeksforgeeks
and I have a problem in understanding that else part Why do we do len=lps[len-1]?? When we do dont find the last character matching we shorten up the suffix too(I visualise the len as the length of longest suffix and longest prefix matching)!! And if we do len=lps[len-1] then i think we are simpply ignoring the second last character???
else // (pat[i] != pat[len])
if (len != 0)
// This is tricky. Consider the example AAACAAAA and i = 7.
len = lps[len-1];
// Also, note that we do not increment i here
else // if (len == 0)
lps[i] = 0;
We aren’t ignoring anything, because we have valid computed lps values for previous positions already. Try to look at it in following way - we have some string, we want to add new char and compute length of longest matching between prefix and suffix. You try largest possible matching from old string, and check that added char matches with corresponding char in prefix. If it does - OK; otherwise you have to take some shorter prefix-suffix match; your old string looks like AxA, where A and x are some strings (it is also possible that there are no x at all, and two entrances of A overlap, but it does not matter here). You want to try some smaller preffix-suffix match. But if you will take some shorter suffix - it will be also a suffix of A, not only a suffix of original AxA string. And of course all shorter prefixes of AxA are also prefixes of A. Therefore your moved from solving question for AxA to solving question for A now; and this move is done by line len=lps[len-1].
a) We calculate lps because in case of mismatch we didn’t have to start over again.
b) In case pat = AAACAAAA and i=7; len=3 and it mismatches [A!=C]; if we see the lps[len]=0; it will not be fruitful.
But lps[len-1] = lps= A ;
it makes lps=[0,1,2,0,1,2,3,3] .
It is due to lps[l-1] we can have two 3 at last.