What is the best algorithm to find the most occurrence of a word in text?

Does anyone know what is the best algorithm to find the most occurrence of a word in text (very large), so hash map is not a solution I’m looking for. I’m thinking of either “Trie” or “Suffix Array” but I haven’t had a clear idea how to solve it. Any idea or suggestion would be greatly appreciated.

For example: This is is is an example.
Then the most occurrence is “is”.

What about this pseudo code:

c = first char
while c not text end
    c = get char
    if c terminates word
        get actual word from trie
        increment counter and check if it is max
    else
        move in trie pointer using c

so in your example

after this word

ROOT - t - h - i - s (1) // <- most occurences

after first is

ROOT + t - h - i - s (1) // <- most occurences
     - i - s (1)

after second is

ROOT + t - h - i - s (1)
     - i - s (2) // <- most occurences

at the EOF

ROOT + t - h - i - s (1)
     + i - s (3) // <- most occurences
     + a - n (1)
     - e - x - a - m - p - l - e (1)

edit: but this works only if you can split the text to words… If your question is about substrings instead of words, then the question is if you are looking something better than substrings of length 1 (just characters).

what could be the reason behind not using STL map? someonoe please give me an example where STL map will be inefficient for this problem. thnks.

you can use KMP or Aco-Corasick both used for pattern matching, if u have large number of pattern match with a a input text of large size u can use Aco-Corasick

the question was asked 1 year ago so i think the op would have got his answer by now. its showing up at the top of the feed because of my cmment “what could be the reason behind not using STL map? someonoe please give me an example where STL map will be inefficient for this problem. thnks.” can you please answer it. also please add some details in ur current answer. i really dont understand how KMP or aho corasick could be used here…

I think KMP is the best one with O(n+m) refer to Needle in a haystack problem on SPOJ

please give some more details.

//