Problem Link:
Difficulty:
Easy-Medium
Pre-requisites:
Persistent Data Structures, Segment Tree, Merge Sort Tree
Problem:
Find the number of comparisons that will be made during the sorting of the given permutation of integers with the provided QuickSort code.
Explanation:
Solution to the sub task 1:
It’s enough just to translate the given code into your favorite programming language. Given implementation is naturally a QuickSort and it is widely known that it will not make more than N*N comparisons.
Solution to the sub task 2:
Again, it’s enough just to run the provided code. N is quite large here, but again, it’s widely known that QuickSort makes about O(N*log(N)) comparisons on randomly generated arrays, so that’s still enough.
Solution to all the sub tasks:
The solutions to sub tasks 3 and 4 are very similar in fact. Your score depends on how efficiently you implement the key idea.
Observation: every time the list A[] is given to the sorting function, A[] always consists of the permutation of the numbers in some range [L; R]. Initially, it’s [1; N], then it’s [1; pivot-1] and [pivot+1; N] and so on.
Another observation: the relative order of numbers in every possible list A[] given to the sorting will not change.
So, we just have to calculate the middle number by position (i.e. (R-L+2)/2-th number) among all the numbers with values in the range [L; R] in order to get pivot.
For this, we generally use the concept of the “rank-query”. The rank query basically asks, what is the k-th least element in a given set, for a queried k. If the set contained the positions of the elements in [L; R], then the (R-L+2)/2-th element is what we are looking for.
This is generally done by using a ‘1’ at the value of the element in the set, and then the k-th element is got by just asking what is the first position to have k 1’s to the left of it (inclusive). This data structure is generally implemented by a BIT or a segment tree, but in this problem we will see how using a segment tree is of use to us.
Lets define Pos(x) = position in original array where element x is,
Set(a, b) = the set {Pos(a), Pos(a+1), …, Pos(b)}.
We are generally interested in Set(L, R) and finding the (R-L+2)/2-th element, but notice:
Set(1, a-1) is a subset of Set(1, b) and
Set(a, b) = Set(1, b) \ Set(1, a-1)
Thus, if just define S(x) = Set(1, x), we would get that given (L, R), we need to find the (R-L+2)/2-th element in S(b) \ S(a-1).
Using our concept of ones to denote positions, set difference where one set is subset of the other is now just finding the first position x where number of 1’s to the left of x in S® - number of 1’s to the left of x in S(L-1) = (R-L+2)/2 !!
Notice now, that once we have N+1 segment trees (each segment tree corresponds to a set S(x)), we would be done, because traversal for a given query (L, R, k) is merely : go to left child with (L, R, k) if number of ones in left child® - number of ones in left child(L) <= k, else go to right child with query (L, R, k - #ones).
But, we cannot have N+1 segment trees!! Thats too much memory!! and time to create !! But notice how S(x+1) = S(x) U {Pos(x+1)}. There is just one element being added to the set! Can we save memory and time now? The answer, is yes. The concept used is Persistence. You can read up about persistent data structures online through various sources : Article on Codeproject, Problem QUERY Editorial (the part on “Introducing Persistence”). For wikipedia lovers, what we are aiming to do in our persistent segment tree looks a lot like in the Example on Trees on the wiki page, whose diagrams are self-explanatory). One of the key things to note in persistence, and in persistent segment trees, is that there is no child-parent reference. All references and traversals are one-way: down the tree. Also, node numbers are dynamic and not the standard left-child = 2i, right-child = 2i+1.
Putting it all together:
Root[0] = Empty segment Tree
for i in 1 to N
Root[i] = Root[i-1].insert(Pos[i])
Answer = f(1, N)
f(L, R)
if(R <= L) return 0
x = getKth(Root[L-1], Root[R], (R-L+2)/2)
pivot = A[x]
return (R-L+1) + f(L, pivot-1) + f(pivot+1, R)
getKth(Lnode, Rnode, k)
if(Lnode is a single position)
return Lnode's position
ones = Rnode.leftchild.ones - Lnode.leftchild.ones
if(ones <= k)
return getKth(Lnode.leftchild, Rnode.leftchild, k)
else
return getKth(Lnode.rightchild, Rnode.rightchild, k - ones)
Code for segment tree modification can be found for example in setter’s code.
Analysis of Time complexity and Memory Complexity:
Using persistence, each update operation take O(logN) time and memory.
We totally have an original segment tree taking about 2N memory, plus N more updates taking a total of 2N + NlogN memory.
Time complexity of an update is also O(logN).
Traversals (to get the K’th element) takes O(logN) time as well. And such queries come at most N times since there are at most N pivots.
Thus, overall time and memory complexity of the solutioN: O(N log N)
Note on getting full points:
Finally, the above implementation will get you only 86 points. It is not because of efficiency (Subtask 3 and Subtask 4 are meant to differentiate time complexity of O(logN) with O(log^2N) per query), but due to the fact that with N as large as 5 * 10^5, the recursion stack would be too large resulting in a Runtime Exception/Segmentation Fault. The best way to overcome this issue, is to convert the implementation to a non-recursive manner.
In fact, since the [L; R] intervals being queried are independent of each other (subproblems don’t even overlap), we need not worry about in which order we query the intervals, just that we need to query all the relevant ones. Thus, we can modify our DFS-like recursive implementation into a BFS-like version using queues. This will get you full 100.
There exists also an O(log^2)-per-query approach, however it’s more complicated from the setter’s point of view.
Problems to practice Persistence:
Codeforces: Noble Knight’s Path
Spoj: DQUERY (online version of this can be solved using persistence)
Common mistake
Bugs in subtask 1 and 2 can arise from incorrect implementation of Quicksort.
go(L, R):
if(R - L <= 0) return 0
p = A[(L+R)/2]
tmp1 = tmp2 = 0
for i from L to R:
if(A[i] < pivot) less[tmp1++] = A[i]
else if (A[i] > pivot) more[tmp2++] = A[i]
for i from 0 to tmp1-1
A[L+i] = less[i]
for i from 0 to tmp2-1
A[L+tmp1+i] = more[i]
return go(L, tmp1-1) + go(tmp1+1, R) + (R-L+1)
The bug is in the indices used in the last line of code.
In fact, we saw various versions of codes having bugs that were simply due to incorrect indices.
To avoid this kind of bug, it is best to have separate variables for different index-domains, and to completely avoid using “L+i” etc.
In the above example, you could have instead
int j1 = L;
for(int i = 0; i < tmp1; i++, j1++)
A[j1] = less[i]
int j2 = L + tmp1;
for(int i = 0; i < tmp2; i++, j2++)
A[j2] = more[i]
return go(L, j1-1) + go(j1+1, R) + (R-L+1);
Setter’s Solution:
Solution to subtasks 1 and 2 can be found here
Solution to all the subtasks with O(N log^2 N) complexity can be found here
Solution to all the subtasks with O(N log N) complexity can be found here
Tester’s Solution:
Can be found here