CHSTR - Using Perfect Hashing

Did someone get 100 using hashing in this problem?

I did ,see the solution http://www.codechef.com/viewsolution/7225121

Can you please explain your hash function? thanks!

I don’t see how this will run without collisions if the test data is strict.

Can Somebody help me understand why my O(n^2) algorithm using hashing gives TLE for the third subtask.
http://www.codechef.com/viewsolution/7211894

My Algo is that I hash all the substrings into a hashTable using polynomial hashing (mod size of the table). I keep an id for all strings which is its value in base 27 (considering ‘a’ as 1 … ‘z’ as 26). I maintain an array setsWithSize which contains the number of different strings which occur as substrings exactly i times at setsWithSize[i].
After hashing all the substrings, I compute the array answ[] which contains the answer for query k at answ[k].

@thezodiac1994 just take hashing base as a big prime no. so that collision can be prevented

@vikky_coder , I think the hash function is just a variant of Rolling hash

Rolling hash is basically the following algorithm

hash = (hash * base + s[i]) % MOD

The crucial point in using hashing is to perfectly choose base and MOD

You can use Mod wrt more than one prime for safety

@vikky_codder first calculate the hash value for first substring of length i and the remaining can be calculated using this statement
ht = int_mod(ht*b - s[j-i-1]*e + s[j],mod);
…where i+1<=j<=n and b is the base(which is a big prime no) and e=b^i

I used rolling hash with some combonations of base and MOD, but all I got was WA and in some cases TLE due to perhaps because of large value of MOD. The problem is “how to choose base and MOD for perfect hashing?”.

@raja44ever thanks :slight_smile:

@priyankpalod how is it O(n^2)? What about h.add() that is being called for all substrings?

@vikky_coder The best approach to do hashing is to use a prime mod say 10^9+3 and two bases say 29 and 59 . Then calculate two hash functions for each string one using base 29 and other using base 59. Two strings are then said to be equal if and only if both the hashes are same for the two strings . It is highly unlikely that this approach will give collisions (at least it never did for me). :slight_smile:

I did it without hash

http://www.codechef.com/viewsolution/7137470

1 Like

@vikky_codder h.add() just adds the substring in the hashtable and since the hashTable is very large we can safely assume it takes constant time (very less load factor)

That still doesn’t assure that all keys will be distinct.

Well lot of people did using rabin karp and polynomial hashing …

i dont understand how collisions could be avoided.

lets say a%m =x
implies (a + n*m)%m = x where n is any positive integer
the range values obtained will be from 0 to 26^(5000) as 5000 is the size of the string now even if you take a prime number as big as 10^18 (64 bit limit in c++) how can you gaurentee it wont cause collision in a unbiased test cases??

As the prime is increased, the probability of collisions decreases. I don’t think the standard RK algo is 100 % collision proof but when you use a large hash table, you can do away with “NOT SO STRICT” data. I think thats what has happened here.
PS : You cant use a prime as large as 10^18 because you need to perform multiplications in the hash funcn which will result in overflow.

Look, you can never guarantee the elimination of collisions. All you can do is insure that collisions are minimum. For that you may use two bases as vippu95 said, or you can use modulo two or more primes. The strings will be equal if the hash value for each modulo is same.