Finding Hash of Substring [i, j] in O(1) using O(|S|) pre computation

Given a string S of length n characters, is it possible to calculate the Hash of its substring [i, j] (From index i to index j. Inclusive) in O(1) using some form of precomputation ? Maybe a modification of the Rolling Hash ?

Similar Problem

Problem Statement TACHEMIS

I have seen it being used in a similar problem where in a string was given in a compressed form. Meaning, e.g. if the string is "aaabccdeeee" then the compressed form is:

3 a
1 b
2 c
1 d
4 e

How data was stored

They are stored in an str[] array as :

str[] = [{'a','3'}, {'b','1'}, {'c','2'}....]

HASHING Concept that was used in the solutions

And programmers had used the following hash concept to find if the given substring is a Palindrome or not. Given a substring of string S as (i,j), they computed the hash of substring [i , (i+j)/2] and the reverse hash of substring [(i+j+2)/2, j] and checked if they were equal or not. So if they wanted to check if in string S = "daabac" whether substring [1, 5] is a a palindrome or not, they computed the following :

h1 = forward_hash("aa") 
h2 = reverse_hash("ba")
h1 == h2

Code for the Hashing Concept

The hash precomputation was done as follows :

/* Computing the Prefix Hash Table */
pre_hash[0] = 0;
for(int i=1;i<=len(str);i++)
{
    pre_hash[i] = pre_hash[i-1]*very_large_prime + str[i].first;
    pre_hash[i] = pre_hash[i]  *very_large_prime + str[i].second;
}

/* Computing the Suffix Hash Table */
suff_hash[0] = 0;
for(int i=1;i<=len(str);i++)
{
    suff_hash[i] = suff_hash[i-1]*very_large_prime + str[K-i+1].first;
    suff_hash[i] = suff_hash[i]  *very_large_prime + str[K-i+1].second;
}

And then the hash was computed using the following functions :

/* Calculates the Forward hash of substring [i,j] */
unsigned long long CalculateHash(int i, int j)
{
    if(i>j)
        return -1;
    unsigned long long ret = pre_hash[j] - POW(very_large_prime, [2*(j-i+1)])*pre_hash[i-1];
    return ret;
}
/* Calculates the reverse hash of substring [i,j] */
unsigned long long CalculateHash_Reverse(int i, int j)
{
    unsigned long long ret = suff_hash[j] - POW(very_large_prime,[2*(j-i+1)])*suff_hash[i-1];
    return ret;
}

What I am trying to do

I am looking for a general approach to the above concept. Given a Pattern P, I want to check if the pattern P is present in a string S. I know the index (i) to check where it may be present. And I also know the length of pattern P represented as |P|. In short I want to check if hash of S[i, i+|P|] and hash of P match or not in O(1) using some form of pre computation on S.

Ignoring the time taken to compute hash of P else it would be O(1+|P|)

This is precisely the technique employed by Rabin-Karp string matching algorithm. You can start off with the Wikipedia article on Rabin-Karp. Or consult CLRS; it has a detailed explanation.

1 Like