I am reading about the Count-Min Sketch (CM) data structure, which gives probabilistic answers to point and range queries based on two parameters: an error probability and an error tolerance. For example, CM could answer the question “how many times did item x appear in the stream of data?” with an estimate that falls outside the tolerance with probability at most 10%.
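Here is a minimal Python sketch that shows what the two parameters do. It assumes the usual sizing from the original CM analysis: width w = ⌈e/ε⌉ and depth d = ⌈ln(1/δ)⌉. Here ε is the tolerance, expressed as a fraction of the total count N, and δ is the error probability. A point query returns the minimum counter across the d rows. It never underestimates the true count a_x, and it exceeds a_x + εN with probability at most δ, so the 10% in the example plays the role of δ. The class name, the `seed` parameter, and the use of salted built-in `hash()` calls in place of a pairwise-independent hash family are illustrative assumptions, not taken from any particular paper or library.

```python
import math
import random


class CountMinSketch:
    """Minimal Count-Min Sketch (illustrative, not a library API).

    epsilon: additive error tolerance, as a fraction of the total count N.
    delta:   probability that a point estimate exceeds that tolerance.
    """

    def __init__(self, epsilon: float, delta: float, seed: int = 0):
        self.width = math.ceil(math.e / epsilon)       # w = ceil(e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))  # d = ceil(ln(1 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # N, the number of items seen so far
        # One random salt per row. Salted hash() is an assumed stand-in for
        # the pairwise-independent hash family used in the analysis.
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(64) for _ in range(self.depth)]

    def _cells(self, item):
        # Yield (row, column) for each row's hash of the item.
        for row, salt in enumerate(self.salts):
            yield row, hash((salt, item)) % self.width

    def update(self, item, count: int = 1) -> None:
        self.total += count
        for row, col in self._cells(item):
            self.table[row][col] += count

    def query(self, item) -> int:
        # Point query: the minimum over rows, which is never an underestimate.
        return min(self.table[row][col] for row, col in self._cells(item))


cms = CountMinSketch(epsilon=0.01, delta=0.1)  # w = 272, d = 3
for item in ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(100)]:
    cms.update(item)
print(cms.query("a"))  # at least 50
```

The minimum across rows is the right estimate because collisions with other items can only add to a counter, never subtract from it. More rows lower the chance that every row is inflated at the same time.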
The associated heavy hitters (HH) problem has also come up. While implementing a min heap for the HH problem, I noticed that various research papers insert an item into the heap only if its minimum count in the sketch is greater than a threshold.
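The threshold rule described above can be sketched in Python as follows, under stated assumptions. N is the number of items seen so far, `phi` is a hypothetical heavy-hitter fraction, and an item enters the min-heap only when its sketch estimate reaches φN. The heap keeps the smallest candidate on top, so entries that fall below the rising threshold can be evicted cheaply. Outdated heap entries are skipped by lazy deletion, and a sequence counter breaks ties so items never need to be comparable. A compact copy of the sketch is included so the block runs on its own. `CMS`, `heavy_hitters`, and their parameters are illustrative names, not any paper's exact pseudocode.

```python
import heapq
import itertools
import math
import random


class CMS:
    """Compact Count-Min Sketch (hypothetical helper) so this block runs alone."""

    def __init__(self, epsilon, delta, seed=0):
        self.w = math.ceil(math.e / epsilon)
        rng = random.Random(seed)
        depth = math.ceil(math.log(1 / delta))
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.rows = [[0] * self.w for _ in self.salts]

    def add(self, item):
        for row, salt in zip(self.rows, self.salts):
            row[hash((salt, item)) % self.w] += 1

    def estimate(self, item):
        return min(row[hash((salt, item)) % self.w]
                   for row, salt in zip(self.rows, self.salts))


def heavy_hitters(stream, phi, epsilon=0.01, delta=0.1):
    """Return (item, estimate) pairs whose estimate is at least phi * N."""
    cms = CMS(epsilon, delta)
    heap = []                # min-heap of (estimate, seq, item); may hold stale entries
    current = {}             # item -> estimate of its live heap entry
    seq = itertools.count()  # tie-breaker so items themselves are never compared
    n = 0
    for item in stream:
        cms.add(item)
        n += 1
        est = cms.estimate(item)
        if est >= phi * n:  # the threshold test: only then touch the heap
            current[item] = est
            heapq.heappush(heap, (est, next(seq), item))
        # Evict everything on top of the min-heap that is below the threshold.
        while heap and heap[0][0] < phi * n:
            old_est, _, old = heapq.heappop(heap)
            if current.get(old) == old_est:  # live entry; stale ones are skipped
                del current[old]
    return sorted(current.items(), key=lambda kv: -kv[1])


stream = ["a"] * 400 + ["b"] * 300 + [f"x{i}" for i in range(300)]
print(heavy_hitters(stream, phi=0.2))  # includes "a" and "b"
```

Because CM estimates never undercount, any item whose true count is at least φN always passes the threshold test and stays in the heap. The randomness only affects false positives: for each reported item, with probability at least 1 − δ its true count is at least (φ − ε)N.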
My question is, does this mean we are probabilistically answering the heavy hitters problem? Would the corresponding question be “with probability of 10%, which item was the second most frequent in the stream of data?”