How to find k most frequent words in a file?

arpit728 · July 3, 2016, 8:21am

what is the most optimum way to find k most occurring words in the file.

arunnsit · July 3, 2016, 9:53am

hello arpit , considering the input to be a string of length L which consists of space separated words . so basically there are 2-3 ways which i am able to think of that can be used to do so.

Trie : you can make a dynamic trie to store each word along with a counter on them , simply add them and increase the counter. O(L) algorithm , can be implemented carefully with o(L) space.
inbuilt hash (unordered_maps) : you can hash every word and store them in a key,value pair map . well, this will take O(kL) where k is actually a factor sometimes considered to nearly logn . O(L) memory.
explicitly hash : well this is same as above but you can select your own hash function depending upon the input and again to hash them you need a storage , which can be either maps (again logn factor) or trie .

These ways are bounded by my knowledge , would be more than happy to know if there exists a better one .

arpit728 · July 4, 2016, 11:54pm

@arunnsit

Can you provide links to some tutorials on trie.