I will try to describe the main ideas I used in my solution. The “base idea” is actually very simple: at each move, try to find the longest possible valid AP (i.e. the move which deletes the largest number of elements) and use that move. This is, however, just a greedy approach. Moreover, finding the longest valid AP at each move is difficult to achieve given the tight time limit.
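To make this concrete, here is a minimal sketch of that base greedy. Assumptions are mine, not quoted from anywhere: I take a valid move to delete equal-valued elements whose positions form an arithmetic progression, and all names (findLongestAP, solveGreedy) are illustrative, with a deliberately naive unbounded search:

```cpp
#include <algorithm>
#include <vector>
using namespace std;

using Move = vector<int>;  // ascending positions removed in one step

// Naive search: try every (last position, step) pair and extend backwards
// while the values keep matching. Far too slow for the real time limit;
// the rest of the post is about bounding this search.
Move findLongestAP(const vector<int>& a) {
    int n = (int)a.size();
    Move best;
    for (int last = 0; last < n; ++last)
        for (int step = 1; step <= max(last, 1); ++step) {
            Move cur;
            for (int p = last; p >= 0 && a[p] == a[last]; p -= step)
                cur.push_back(p);
            if (cur.size() > best.size()) best = cur;
        }
    reverse(best.begin(), best.end());  // collected back-to-front
    return best;
}

// The base greedy: repeatedly take the longest AP found and delete it.
vector<Move> solveGreedy(vector<int> a) {
    vector<Move> moves;
    while (!a.empty()) {
        Move m = findLongestAP(a);
        moves.push_back(m);
        for (int i = (int)m.size() - 1; i >= 0; --i)  // delete back to front
            a.erase(a.begin() + m[i]);
    }
    return moves;
}
```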
So I had to come up with ways of quickly finding “good” (i.e. long-enough) valid APs. Before continuing I should say that somewhere among my first submissions I managed to identify that there were only 14 test cases, with 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 188, 196, 207 and >60000 distinct numbers (I did not identify the values of N at the beginning, but I approximated them pretty well later on). Thus, I classified the test cases into: trivial (1 distinct element = all elements equal = just 1 move is enough), small number of distinct elements (2-10), large number of distinct elements (188-207) and very large number of distinct elements (>60000). It seemed to me that the greedy approach would probably need to be optimized differently for each type of test case.
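In code, this classification is just counting distinct values; a small sketch (the thresholds mirror the observed tests, the function name is mine):

```cpp
#include <set>
#include <string>
#include <vector>
using namespace std;

// Classify a test by its number of distinct values, mirroring the four
// classes above (thresholds taken from the tests I identified).
string classify(const vector<int>& a) {
    size_t d = set<int>(a.begin(), a.end()).size();
    if (d == 1)  return "trivial";        // all equal: one move suffices
    if (d <= 10) return "few distinct";   // 2-10 values
    if (d <= 207) return "many distinct"; // 188-207 values
    return "huge";                        // >60000 values
}
```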
For tests with a small number of distinct elements I only searched for APs whose last position is close to the end of the array (within some interval) and whose second-to-last position is close to the last one (i.e. at most a bounded number of elements of the same value away; I also bounded the ratio, i.e. the step, of the AP). I also chose a lower bound on the number of elements of the AP, and stopped searching for longer APs as soon as an AP with at least that many elements was found (for instance, for 2 distinct elements I settled on 17 elements per AP; but that’s just because of the time limit - with a higher time limit I could find longer APs on my randomly generated cases, with 23-24 elements per AP). Anyway, that’s just the tradeoff I used.
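Here is roughly what such a bounded search looks like; the constants are illustrative placeholders, not my actual tuned values, and for brevity the bound on same-value distance is replaced by a plain bound on the step:

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Placeholder bounds, not the tuned contest values.
const int LAST_WINDOW = 40;  // last AP position must be this close to the end
const int MAX_STEP    = 60;  // bound on the step (common difference)
const int GOOD_ENOUGH = 17;  // stop as soon as an AP this long is found

vector<int> boundedSearch(const vector<int>& a) {
    int n = (int)a.size();
    vector<int> best;
    for (int last = n - 1; last >= max(0, n - LAST_WINDOW); --last)
        for (int step = 1; step <= MAX_STEP; ++step) {
            vector<int> cur;
            for (int p = last; p >= 0 && a[p] == a[last]; p -= step)
                cur.push_back(p);
            if (cur.size() > best.size()) {
                best = cur;
                if ((int)best.size() >= GOOD_ENOUGH) {  // early-exit tradeoff
                    reverse(best.begin(), best.end());
                    return best;
                }
            }
        }
    reverse(best.begin(), best.end());
    return best;
}
```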
For tests with a large number of distinct elements I used almost the same approach, with a small twist: I considered all the distinct values appearing in the last x positions of the array, and for each such value I allowed the last position of the AP to be any of the last y occurrences of that value; the other constraints were similar to the small-number-of-distinct-elements case.
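A sketch of that endpoint enumeration (X and Y stand for the bounds I call x and y above; the values here are arbitrary):

```cpp
#include <algorithm>
#include <set>
#include <vector>
using namespace std;

const int X = 50;  // look at distinct values among the last X positions
const int Y = 3;   // allow the last Y occurrences of each such value

vector<int> candidateEndpoints(const vector<int>& a) {
    int n = (int)a.size();
    set<int> values;  // distinct values in the tail of the array
    for (int i = max(0, n - X); i < n; ++i) values.insert(a[i]);

    vector<int> ends;
    for (int v : values) {
        int taken = 0;
        for (int i = n - 1; i >= 0 && taken < Y; --i)
            if (a[i] == v) { ends.push_back(i); ++taken; }
    }
    return ends;  // each endpoint is then fed to the bounded AP search
}
```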
For the very large number of distinct elements case I used an approach similar to the large-number-of-distinct-elements case, with some extra conditions: in this case there were many values with only 1 or 2 occurrences, so I tried to remove those whenever I had the chance before continuing. I didn’t bother to optimize this case too much because it seemed to me that there wasn’t much more to squeeze out of it (but I may have been wrong - @kgcstar handled this case smarter than me; I did not check how much better his score for this case is than mine).
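The bookkeeping for spotting those rare values is simple; a sketch of just that part (the actual policy for when to remove them was more involved):

```cpp
#include <unordered_map>
#include <vector>
using namespace std;

// Collect the values with only 1-2 occurrences, which I tried to clear
// early whenever possible in the very-large-distinct case.
vector<int> rareValues(const vector<int>& a) {
    unordered_map<int, int> cnt;
    for (int v : a) ++cnt[v];
    vector<int> rare;
    for (auto& [v, c] : cnt)
        if (c <= 2) rare.push_back(v);
    return rare;
}
```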
I also looked at @kgcstar 's code. In essence he also uses a greedy approach which tries to remove as many elements as possible at every move, and he also uses bounds for searching, like I did (e.g. the last element of the AP is allowed to be only one of the last x elements equal to that value). But there are differences. For the very large test case he has a smarter solution which tries to create longer APs for some numbers by removing certain values of which only one element was left - his Star function. For the other cases he also has an initial greedy with 1-step lookahead: he doesn’t just try to find an AP with the largest number of elements, but rather an AP such that its length plus the length of the largest AP in the remaining array is maximum. However, as this is more time consuming, he only uses it a limited number of times in the beginning (his doasc function). Then, when the array is large, he splits it into chunks of a fixed size (starting from the end) and removes the elements from the last chunk; when only a small number of elements is left in the chunk, he moves on to the next chunk (which also includes the remaining elements from the previous one). By using chunks which are small compared to the total number of elements he effectively limits the range of an AP, but can search much more thoroughly for long APs within each chunk.
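For illustration, the 1-step lookahead idea can be sketched like this (this is my reformulation, not @kgcstar 's actual code; the candidate enumerator and the longest-AP evaluator are supplied by the caller, e.g. the bounded search from earlier):

```cpp
#include <functional>
#include <vector>
using namespace std;

using Move = vector<int>;  // ascending positions removed in one step

// Pick the move maximizing its own length plus the length of the best AP
// left in the array after applying it (the 1-step lookahead).
Move lookahead1(const vector<int>& a,
                function<vector<Move>(const vector<int>&)> enumerate,
                function<int(const vector<int>&)> bestLen) {
    Move best;
    int bestScore = -1;
    for (const Move& m : enumerate(a)) {
        vector<int> rest = a;
        for (int i = (int)m.size() - 1; i >= 0; --i)
            rest.erase(rest.begin() + m[i]);        // apply candidate move
        int score = (int)m.size() + bestLen(rest);  // length + best follow-up
        if (score > bestScore) { bestScore = score; best = m; }
    }
    return best;
}
```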
Moreover, the test with 5 distinct elements had the smallest value of N (around 750). It seems that @kgcstar was able to fully identify this test case and compute the best solution for it offline - in his code you can find the sequence of moves precomputed for this test case (look for ans=147; in his code if you are curious).
Also speaking of @kgcstar 's solution, it’s funny that he hid his real score until the very end: if you look at his solutions just before his last one, you will see that for the case with 1 distinct element he was printing a large number of moves, not just 1. I guess he was trying to make his competitors think his solution was worse than it really was, so that we would have less incentive to optimize our own solutions too much. It seems, however, that even with this hidden score his solution was still better than everyone else’s.