Nth largest/smallest value

Hello guys,

Is there any efficient algorithm to find the nth largest (or) smallest value from a given array without having to sort the array…??

Possibly in a single iteration … or using some DS with sufficient memory efficiency !

Kindly do let me know if it is possible :slight_smile:

Thanks,
Arun.

Yes there is some way. In C++ it has own implementation called nth_element and works in O(n) time complexity.

The approach is similar like in quicksort. Pick some element as pivot x and reorder array such there are elements smaller than x, equal to x and greater. Notice, that they don’t need to be sorted in these sections. Now you can easily discard one of these sections - if there are more than n elements smaller than x the answer must be here and you can throw away greater elements. And simililar if there are fewer elements smaller than x. Now you can continue on smaller array. This should be O(n) if in each iteration you discard half of actual array. And if you choose pivot randomly, there is a high probability of doing so.

There is also deterministic O(n) algorithm, but it’s more complicated. Try to chceck this or google for “median of five” algorithm.

2 Likes

@michal27 , thanks for your quick response ! i’ll take a look at those algos !

Another tricky case to be added to the question,
What if i don’t have all the data in hand (assuming i have a very large dataset) and i can only obtain the data by streaming like 10000 records at once and then i’ve to discard the data in hand to get the next set of data …
Eg : “A million records and i can only obtain 10000 at a time… totally streaming the data 100 times to process all the data (100 * 10000 = 1 million).”
in this case is it possible to obtain a linear time algorithm ? with sufficient memory efficiency !

And N is < 10000 too or it is not limited?

Well, I think this problem cannot be solved with just these constraints. Because you can search for far too big n-th largest number, which you can’t remember.

But nice problem with large data set is this: You have stream of N numbers and you want pick K largest one. But you can only remembered 3K numbers. What is the most efficient way? (Don’t get confused by 3K. It can be also 15K or 40K, but it must be constant)

I fully agree with michal27 as to find Kth largest one when K is greater than the capacity we can store its impossible.
@michal27 if I am not wrong we can use a min heap for that…We will keep adding to the heap till its size is less than K(as we want only K largest one) and now when its size is equal to K as soon as new data arrives we check it with the min element if its less then the minimum element in the heap we just ignore it and if its greater then we delete the minimum and insert it into the heap. After processing all the data we will be left with K largest one. :slight_smile:

yes that’s true @xpertcoder this is correct solution, but not the fastest one. Your solution has time complexity O(N log K), because at most each element must be added to heap. But there is soultion O(N), so try to figured out :slight_smile: (if you will be stuck, I can add hint :slight_smile: )

His solution is actually O(KlogN), if i understood it correctly. I don’t think there is a way to do it in O(N), since even the classical k-th element problem can’t be solve in O(N) IN PLACE. The best optimization i can think of for his solution is to decide what’s better to search for k-th max element or n-k+1-th min element. Which is just a constant improvement, ie O(0.5*NloglN). But feel free to prove me wrong.

heap size is K, so it is O(N log(K)). And Classical k-th element is solved in O(N) in place as described by @michal27. It uses idea from quicksort and quicksort is in place sorting algorithm…

1 Like

Yes you’re right, my mistake(s). Still don’t know if it’s possible to do it in O(N) (not the original problem, but the modified one).

@c0d3junki3 it’s not a coincidence, that I mentioned this problem along with finding k-th smallest element with O(n) algorithm :slight_smile:

let’s make the problem a bit more difficult by saying you want the k largest elements(not k th alone) of the array that can be achieved in O (k + n log k). All you have to do is maintain a min-heap of first k elements , and now traverse it k times array[k] to array[n-1] (0 based indexing). Each time you encounter an element larger than the top of min heap you swap those elements.
This gives us the largest k elements left in the heap at the end. Time complexity being O (k + (n-k) log k).
You may further sort them, to make your complexity O ( k + (n - k) log k + k log k) = O (k + n log k ). You may find this link useful “http://www.geeksforgeeks.org/k-largestor-smallest-elements-in-an-array/

1 Like

Implementation of the above approach: http://ideone.com/jp3uE8 , it is when k=3, can be gene.ralized

Implementation of basic quicksort method for the same: http://ideone.com/Fd4ZXh For more, see this: http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/