KILOWHAT - Editorial

anton_lunyov · January 15, 2013, 3:00pm

PROBLEM LINK:

Practice
Contest

Author: David Stolp
Tester: Anton Lunyov
Editorialist: Anton Lunyov

DIFFICULTY:

CHALLENGE

PREREQUISITES:

Basics of plane geometry

PROBLEM:

You are given N points on the plane. Each point is either red or blue. You need to find non-self-intersecting polygon separating points of two colors. So that all red points are inside or on the border of the polygon, and all blue points are outside or on the border of the polygon, or vice versa. The goal is to minimize the perimeter of your polygon.

EXPLANATION:

As mentioned in a hint the simplest way to get AC is to construct a polygon simply having all N points on its boundary. We consider several ways how to do this.

Let’s sort all points in lexicographical order (at first by X-coordinate and in the case of a tie by Y-coordinate). Actually it is almost correct just to print this sorted array of points as the answer. Indeed, it will be non-self-intersecting except that the last side, that connects the first and the last vertexes, could intersect many other sides. To handle this issue it is sufficient to add just two points to the end of this list. Namely, the points (maxX, maxY + 1) and (minX − 1, maxY + 1) will do the job, where minX is the minimal X-coordinate among all points and maxX, maxY are defined similarly. Just draw the corresponding picture to the following example (it is already sorted):
8
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
Two additional points will be (3, 4) and (0, 4). Note that we exclude point (3, 3). This is to show necessity of the point (maxX, maxY + 1). Without adding this point you get self-intersection here. And actually you indeed get WA adding only the point (minX − 1, maxY + 1) to the end of the list. Also note that the point (minX, maxY + 1) instead of (minX − 1, maxY + 1) also leads to WA as the last side can completely cover the first side in this case (again refer to this example). See tester’s 1st solution as a reference to this approach. But such solution scores only 23.721313 points (note that ACRush was able to score less than 4.000 points!)
Actually, we could print only input points in some order to get AC. For this let’s choose some point inside or on the boundary of the convex hull of all our points, and sort them by the polar angle with respect to this point making some careful consideration of the points with the same polar angle. One of the simplest way to get AC by this scheme is to choose the point with the coordinates (sumX / N, sumY / N), where sumX is the sum of X-coordinates of all the points and sumY is the sum of Y-coordinates of all the points. Clearly, this point lies inside or on the boundary of the convex hull. Also this point is quite lucky and we should not care about the order of the points with the same angle. Simply sorting pairs (angle, index) and printing points in the order that this sorted arrays produces gets AC. And such solution gives slightly better score 14.734905. Refer to the tester’s 2nd solution for details.
Let’s play with the previous solution to achieve better score. Let’s choose some random convex linear combination of input points and sort all points with respect to this point. Doing this many times (namely, while time permits) we could seek for the polygon with the smallest perimeter. To construct convex linear combination of points
(X[i], Y[i]), i = 1, 2, …, N,
we choose some sequence of non-negative weights
w[1], w[2], …, w[N]
and consider the point of the form
(sumX / sumW, sumY / sumW), where
sumX = w[1] * X[1] + … + w[N] * X[N],
sumY = w[1] * Y[1] + … + w[N] * Y[N],
sumW = w[1] + … + w[N].
Such solution scores 13.651408. Refer to the tester’s 3rd solution for details. Note how the comparison of points by polar angle is performed there. It is recommended way how it should be always made.
The problem is NP-complete since we can reduce the classical NP-complete traveling salesman problem to it. For this we should take equal number of red and blue points and place them by pairs where in each pair we have one red and one blue point that are very close to each other. Than any polygon that separates points of different color should approximately have vertexes close to the points in pairs. So it will correspond to traveling salesman tour.
We can use this insight to achieve slightly better score. We still ignore colors of vertexes and simply seek for the traveling salesman tour that passes through each of our points exactly once. There are many approximate algorithms exist for this problem.
The fairly simple one was implemented by Rudradev Basak in his first solution. Namely, he chooses some random permutation as a route and then check for each pair of non-consecutive sides AB and CD whether we could improve the perimeter of the polygon by reversing the path from B to C, so that we replace |AB| + |CD| by |AC| + |BD| in the perimeter. It can be proved that once no such improvements are possible then we have non-self-intersecting polygon. This idea brings him 6.70761 points. Refer to the tester’s 4th solution as a clean and commented version of the described solution that also tries permutations while time permits, which allows to get a better score 6.428542.

ADVENTURES WITH SPECIAL JUDGE LEADING TO CONTEST EXTENSION:

Just 3 hours before the scheduled end of the contest, we received the new best solution by Rudradev Basak which scores only 3.614238 points, while ACRush was the only one who was able to score less than 4 points at that time and his score was about 3.99. Analyzing Rudradev’s submission we noticed that he simply uses the scheme above, adjusting the permutation, but only for points of some color (actually he tries both colors and also does this while time permits). So, his output is simply the polygon that contains points of one of the colors. Clearly, in general for such polygon we could have points of other color inside as well as outside it. So this output is wrong. David quickly realized the error in the special judge and fixed it by adding one addition check for such case. This happened just in half an hour before scheduled end of the contest. In mere minutes we made a decision to rejudge all AC submissions and extend the contest one day more. The actual extension was made just in 6 minutes before the scheduled end of the contest. Apparently, the last half an hour of the contest was insane for us.

You may find interesting to see how Rudradev was able to figure out our bug:

“It was a very unexpected combination of events that led me to the find the bug:
1) I made the same bug in my local tester as the online judge.
2) The logic for my solution happens to trigger the bug, but in a non trivial fashion. When I realized the above, I made a very simple change to the code which would make it absolutely clear if the online judge indeed has the same bug.
3) I submit this changed code, but I happen to submit it on a wrong problem, and getting WA, think that I might be mistaken about above.
4) Few hours later I happen to look at my KILOWHAT submission history, and am surprised to see no WA there.
5) Figure out everything, make the joke submit, and behold!
In any case, extending the contest was the right thing to do. Unfortunately I won’t have any time at all to work on this problem in the next 24 hrs.”

ADVANCED SOLUTION:

The author’s solution does not ignore colors and construct some non-trivial polygon that separates points of different colors, which allows to score 5.515622 in no time. It starts with an edge that will belong to the result, a set of points that will be inside the result polygon, and a set of points that will be outside. It begins by finding the convex hull of the “inside” points, then for each point inside, it assigns it to the nearest edge. Then for each edge it applies the algorithm recursively, swapping inside and outside points. It combines the sub-results into a single polygon. See author’s solution for implementation details and additional comments for understanding this approach. Probably adding some randomizations and running it while time permits could allow getting closer to unbeatable ACRush

AUTHOR’S AND TESTER’S SOLUTIONS:

Author's solution	can be found here. It scores 5.515622 points using 0.00 of time.
Tester's 1st solution	can be found here. It scores 23.721313 points using 0.00 of time.
Tester's 2nd solution	can be found here. It scores 14.734905 points using 0.00 of time.
Tester's 3rd solution	can be found here. It scores 13.651408 points using 0.98 of time per test.
Tester's 4th solution	can be found here. It scores 6.428542 points using 0.99 of time per test.

RELATED PROBLEMS:

Will be provided soon. All readers are welcomed to provide related problems in comments.

evandrix · January 15, 2013, 3:26pm

Is it possible to submit this as practice now that the contest is over?
I can’t find this problem in the practice section of this website.

anton_lunyov · January 15, 2013, 3:31pm

Wait for a moment. The problems should be added soon (like 15-20 minutes).

damians · January 15, 2013, 3:58pm

My solution is not comparable to ACRush’s or mugurelionut’s, but it’s quite simple to code, and gives some nice score/hardness ratio
Firstly one needs a good TSP (travelling salesman) on plane approximation. There are a lot of algorithms for that. I used the same as Rudradev (described in editorial).

Choose (e.g.) red points to be the inner ones. Let S=set of all red points.
Find a cycle through S (use TSP algorithm). We use our nice method to do it, so we get a valid polygon ;). Ok but now some blue points are probably inside the polygon. How sad Take all blue points inside, add them to the set S and repeat step 2.

We repeat until no new point is added to S.
This approach gives quite a lot of points. But to get score like 4.3 (0.9 points) you need to improve it a bit. For example take corners which are concave and red and delete them etc.

anton_lunyov · January 15, 2013, 4:07pm

Actually in his subsequent submits Rudradev try to use the same idea
But he do only one relaxation: add inner points and update the answer.
I should guess at that moment that this implies that the judge is wrong.
Since clearly after rebuilding we could have new blue points inside.

iscsi · January 15, 2013, 4:27pm

I spent 4-5 days at the end of the last year with TSP on kaggle despite of this I couldn’t solve this problem, after the rejudge… I was so upset, I would like to delete my account, but I found a post that I couldn’t do that. Now have a chance I will participate again…

I’m thinking about the bug, so in my case:
I’ve “solved” the problem on january 3, and 4. On January 14 announced the judge has a bug. I thought I remember well on the problem and I thought my implementation has a bug. I use one of my old code (which was bad, on the old judge too ) and try to find out on the server some assert what was the bug in the old judge, but I couldn’t etc…

So in my case If you told what was the bug, it still would be very hard to solve the problem again correctly in one day. I understand its my fault that I couldn’t solved the problem, and its my fault I doesn’t understand the problem well, But I still think you should announced the bug if we just get 24 hours. I feel 3 days would be the limit where you shouldn’t announced… I would like to apologize my behavior but I had a hard day yesterday, except for this problem too.

nims11 · January 15, 2013, 4:34pm

I was doing the same as the advanced solution but it seems my solution was leading to some case when there was no edge that could include a point inside without invalidating the polygon. Maybe because i was including a point X into edge AB simply by breaking it into AX and XB, so for such cases, i was falling back to the poor solution. Should have tried breaking the edge from between.

jbupit · January 15, 2013, 6:16pm

I used a modified version of Graham Scan to achieve the polygon. I took the “inside” point with minimum X coordinate as the origin, sorted with the polar angle, and handled the co-linear points by keeping the farthest “inside” point and nearest “outside” point and ignoring the rest.
Then while scanning, if the current point is an “inside” point, and it must be a part of the ‘hull’ (by examining angles with prev. 2 pts) then include it in the polygon. If, however, the point is an outside point, and must NOT be the part of the hull, include it in the polygon.
Anything wrong with this approach?

mugurelionut · January 15, 2013, 7:20pm

Let me explain the main idea of my solution (it got an average score of 3.981307 in the end which was about 0.994 of ACRush’s score).

Let’s assume that we want to keep points of a given type A (e.g. type A = the 1000 points) inside or on the border of the polygon and keep the points of the other type (e.g. the 1024 points) outside or on the border. We can get a spanning tree of the points of type A consisting only of line segments which do not contain other points (of type A or B) and which is non-self-intersecting. This spanning tree will be the “backbone” of the polygon. We will start with a polygon which consists of the spanning tree, traversing each edge twice (once in each direction) - this initial polygon will repeat some vertices (actually all of them except the leaves of the spanning tree), but that doesn’t matter now. The next step is to “convexify” the polygon - i.e. “fill in” concave corners.

In order to “convexify” a polygon we need to:

find a concave corner i, j, k (i, j and k are 3 consecutive vertices on the border of the polygon and j is a point of type A)
check if other points on the polygon boundary are located in the triangle i-j-k (if any, then this corner will be skipped - although it could be “convexified” a bit by adding extra points, I did not try this)
find the set of points of type B located inside the triangle i - j - k

– if no points of type B are found then simply add the edge i - k to the polygon (“cutting” the corner j)

– if any points of type B are found inside the triangle, then compute the convex hull of these points + the points i and k and add to the polygon the chain from i to k from this convex hull (“cutting” the corner j) – note that this path is still concave for the polygon, but it is less concave (or more convex) than the corner i - j - k.

Note that after each “convexification” step of 3 consecutive vertices i, j, and k, j will be “cut” from the polygon (reducing its number of occurrences on the polygon boundary), the length of the polygon will be reduced (because we will have a “shorter path” from i to k) and some points of type B are added on the border of the polygon (between i and k).

When no more corners can be “convexified” then the convexification process of the initial polygon ends. Note that it is possible to still have some repeated vertices on the polygon boundary - in this case the polygon is bad and will be disconsidered. Otherwise we got a new solution candidate, we compute its perimeter and keep the best one found so far.

OK… so that’s the main idea. Besides trying to optimize (for speed) the convexification process as much as possible, the question of how to generate good candidate spanning trees (e.g. without self-intersections) arises. It turns out that one of the best ways to do this (as far as I could find) is to generate spanning trees which are close to the minimum spanning tree of the points of type A (introduce some distance tolerances and some randomization). Initially, in my first AC submission, I only generated the spanning tree of each type of points and convexified it and that’s it (it scored about 4.375) – I generated more trees in the manner speicifed above later to get better results (actually, I generated a few for each type of points A and B and then generated a lot of trees only for the type which produced the best results).

However, doing what I did above, only got me (after many rounds of optimizations for speed - in order to generate more spanning trees - and much spanning tree generation parameter tweaking) to a score of around 4.08.

In order to get a score below 4.00 (and because we got an extra contest day ) I split the processing time as follows: the algorithm above up to a fixed time limit (in the end only up to 0.2 seconds). Then, I kept the spanning tree which produced the smallest perimeter and generated more trees from it by removing an edge and adding another one instead of it (in order to keep the tree connected). By randomly removing and adding edges (under some constraints, including edge length constraints) I generated new trees and “convexified” their polygons, repeatedly improving the perimeter of the solution.

Actually, on my computer, I was able to obtain even slightly better scores (on my own tests) but with a bit more time. So in the end I tried to optimize my source code for speed (the more trees I tested the higher the chances of improving the solution), but I couldn’t beat ACRush’s score (the one he obtained in the last hours of the contest)

Maybe I could have used the spanning tree improvement algorithm better (e.g. keep multiple spanning tree candidates sorted by their score and expand them in order to get new candidates), but there was no more time for trying that.

damians · January 15, 2013, 9:34pm

That’s pretty clever Actually I was thinking about something like that, but didn’t find such a nice way to make the polygon “convex”. Yesterday night I was thinking you are going to win this time, but (as we all know) ACRush never gives up. Btw. I’m very curious what’s his approach in this problem, kind of hard to deduce something from his 1600-lines-code

mugurelionut · January 15, 2013, 10:12pm

I only started participating in Codechef long contests in Oct. '12 and the Nov. '12 contest was the first one where I thought I was close to winning. I went to bed on the last contest day being on the 1st place and woke up on 2nd place - I wasn’t quite expecting that, then. But this time, although the same scenario occurred, I was kind of expecting it to happen (last night I had a score which was only slightly better than ACRush’s, which was not good enough to stand). Anyway, the whole reason for which I was able to get so close to ACRush’s solution was the extra contest day…

mugurelionut · January 15, 2013, 10:17pm

I am also curious about ACRush’s approach - it would be nice if he found some time to describe it a bit. The only thing that seems clear is that for grid graphs he seems to do some kind of DP (which makes sense given that their size is at most 14x14). But I cannot really tell what else he does.

anudeep2011 · January 16, 2013, 1:21am

I will explain my implementation here

Best score

4.143956 (0.955) in 46.94 sec with some randomization
4.429190 (0.893) in 1.63 sec

Say there are 2 types of points , Color A and Color B

1 > Solve for A,B and B,A and choose the best answer

2 > A points will be either on the fence or inside, B points either on the fence or outside

3 > Calculate the convex hull of A, Say this polygon P

4 > while any of the following 2 operations is needed perform it else return answer

If a B point is inside the polygon we need to get it on to the fence.

If a A point is outside the polygon we need to get it on to the fence.

Both the operations are performed in same manner.Let the point be S

For each edge XY in polygon P check if it is possible to break XY and construct XS and SY without intersecting any other edges.
For all such edges pick the edge that gives best change in length.

This way it gave 4.429190 which is 0.893 of ACRUSH’s score!

With simple randomization (random permutation of A,B points vectors) and running full time it gave 4.143956 which is 0.955 of ACRUSH’s score.

This needed some geometry routines like ConvexHull, LineLineIntersection , etc

ACRush · January 18, 2013, 4:42am

Before explaining the algorithm, I want to say, mugurelionut’s solution has much more potentials than mine. I believe mugurelionut will able to overcome me if we have one more day.

It’s also very interesting in reading those neat solutions getting a very good score.

The basic ideas of my solution : (Let’s call two sets of points as BLACK and WHITE points.)

compute TSP (same strategy as approach 4 in the editorial) of all WHITE points.
attach all inner BLACK points to the boundary of the polygon. (about 4.6)
run SA (simulated annealing) by a) remove one point. b) replace one point by another. c) replace one point a by b, then remove a. d) reverse some points. and etc.

The idea is relatively simply, the only advantage may be that it’s easy to test and refine.

One thing I did on the grid: (improve the scores from 4.1 to 4.0) apply dynamic programming strategy based on following assumptions,

edges are only between adjacent rows (or the same row)
at most 4 cross edges between any pair of adjacent rows.

Based on above, it’s not hard to write an efficient code.

For the contest,

Personally, I think we’d better not extend the end-time of the contest, even new problems added very late. Because that should be a fair game,
It may be better if we have 100K sourcecode limit for the challenge problem, otherwise removing spaces is very boring.
I enjoy the challenge problems these month, thanks for all. I’m also happy to share some thoughts of challenge problems in the future if possible.

damians · January 21, 2013, 1:16am

It looks like the DP for grid was crucial to beat mugurelionut.