Problem Link:
Difficulty:
Hard
Pre-requisites:
High School Geometry
Problem:
Given a set S of N points in plane, you have to draw two lines such that sum of squares of distance of each point from nearest line is smallest.
Explanation:
The first thing one realizes about this problem is that the business of taking minimum of the distances is causing all the trouble. If we had to draw a single line to optimize sum of squares of its distance from all points, it would be a piece of cake. High school calculus can be used to find the optimum equation of the line. Here is a brief sketch:
Let the equation of optimum line be y +mx +c = 0. The square of distance of a point (x0,y0) from this line is (y0 + mx0 + c)2/(1 + m2). Therefore, sum of squares of distance of all points is
(Y2 + m2X2 + c2 * N + 2 * m * XY + 2 * Y * c + 2 * m * X * c) / (1 + m2)
where Y2 = sumi yi2, X2 = sumi xi2, XY = sumi yixi, X = sumi xi, Y = sumi yi, N = sumi 1
We need to find optimum m and c. To do this, first differentiate with respect to c to get c as a (linear)function of m, plug it in the above, and differentiate with respect to m to get optimum value. Refer to solutions at the end for more precise details. Assuming that the one line case can be solved in constant time, given X, Y, XY, N, X2, Y2, we can now proceed.
Now imagine the optimal pair of lines. Let the lines be named A and B. Every point in our set S is closer to exactly one of A and B. Let SA be the set of points in our S closer to A than B. Similarly define SB. Note that SB = S - SA.
Now suppose, by some magic, we managed to find the set SA. Then we would again be done. This is because line A is the optimal line for the set SA, and line B is the optimal line for remaining points.
Cool ! So, we now have a O(N * 2N) solution, which basically iterates over all subsets SA of S, and reports the solution.
Obviously, the next step is to note that sets SA and SB cannot be arbitrary. If we are given a pair of lines A and B, then SA consists of exactly those points in S which do not lie in the colored region. Similarly, SB consists of those points which lie in the colored region. Justification is very straightforward and left to the reader. The colored region can be identified by a pair of perpendicular lines. For any two perpendicular lines L, M, let RL,M denote the the first and third quadrant of the co-ordinate system defined by line L, M. Formally, RL,M = {points P | tan ā PXL ā„ 0}, X being intersection point of L and M.
Due to discussion above, we know that the set SA cannot be arbitrary. The set SA has to be such that there exist two perpendicular lines L and M, so that SA = S ā© RL,M.
In fact, we give a list Z of pairs of lines(i.e. Z={(L1, M1), (L2,M2), ā¦ (LM,MM)}) such that, for any arbitrary pair of lines (L, M) we can obtain another pair of lines (Lā, Mā) which satisfies the following
- S ā© RL,M = S ā© RLā,Mā
- (Lā, Mā) ā Z
In other words, the possible lines L and M can be infinite in number, but we can find a finite set of pairs of lines which still induce all possible partitions of the given set S. In fact we can show that the required number of pairs of lines is polynomial in N.
Here is how to obtain Lā, Mā from any L, M.
-
Translate L, parallel to itself towards right, until L hits some point P1 ā S. Stop translating when it is infinitesimally small distance away from P1.
-
Now translate M, parallel to itself towards right, until it hits some point P2 ā S. Again stop translating when distance between line and point becomes infinitesimally small.
-
Rotate both L and M clockwise so that i) M remains perpendicular to L, ii) One end point of L is fixed at P1, iii) One end point of M is fixed at P2. The intersection point of L and M moves in a circle with P1P2 as diameter. Stop rotating when one of L or M hits some point P3 ā S. Again, stop rotating when it is infinitesimally small distance away from P3.
Now it is clear that the new lines Lā, Mā obtained by above process still induce same partition of S as L, M did. Moreover, Lā, Mā are completely defined by
- Points P1, P2, P3.
- One bit indicating whether P3 was hit from above or below. It doesnāt matter whether Lā actually hit P3 or Mā because they are interchangeable.
The set Z of all possible final pairs (Lā, Mā) obtained after translation/rotation has size 2n3, and can be easily enumerated. We can solve the problem in O(n^3) overall time as well, because the partitions imposed by these lines can be interconverted by adding/removing single points, and hence (X, Y, XY, X2, Y2) tuple can be re-calculated in constant time. See the solutions below for exact details.
Setterās Solution:
Can be found here
Testerās Solution:
Can be found here
Editorialistās Solution:
Can be found here