CHALLENGE - MEDIUM
Geometry, Data Structures and Simple Math
You are given a point cloud and a set of query points. For each query point, find the nearest point form the point cloud. You are awarded ‘1’ for a query solved successfully, and 0 otherwise.
This is the classic problem of finding nearest neighbors.
The limits on number of points in the point cloud - 50000 - and the number of queries - 50000 - ruled out brute force approaches. The limits on coordinates (-1e9 to 1e9) meant that approximate solutions would need very high precision to score well.
First, lets discuss what was needed to get AC.
One could simply brute force the solution for the first query point and print random values (albeit, valid) for the remaining queries. A lot of submissions received WA even for this simple approach.
The reason was - they were not using the right precision to store the distance. The squared distance can be as large as 1.2e19, which only fits completely inside the “unsigned 64-bit integer” datatype. Solutions using “double precision floating point” to store the distance were also doomed to fail because double only stores 15 digits of precision and the exponent.
The test cases were unforgiving with regard to precision and were crafted especially to fail solutions that relied on “double” or “long long”.
Since this was a CHALLENGE problem, there was no exact way to solve the problem.
The challenge for us (tester + setter) was to ensure that the test cases were hard enough to not be torn apart by any one single approach. To that end, we generated test cases that were very hard to solve because there were several points that could be considered CLOSEST within close precision. Our solutions could at best score ~20000 on the test data and that too after relaxing the time limits xD
Some of the possible solution themes that were explored were:
Approach 1. Clustering
- Let there be C clusters.
- Choose C points.
- For each point ‘c’, sort all the points according to distance form ‘c’. Choose the first K points as candidates that lie in the cluster of ‘c’.
Now, for each query, find the cluster that is closest to the query and find the closest point from among all the points in the cluster.
Approach 2. kd-tree
KD-Trees are orthogonal range search data structures that can report all points within a fixed x-min to x-max and y-min to y-max, very efficiently.
3D KD-Trees can be constructed for the set of points. For each query, the candidates to be tested are reduced by the kd-tree. They can be reduced further based on the precision of the search.
3D KD-Trees perform very well (scoring 50000!!) when the point cloud is gaussian [ and you can guess that the point cloud wasn’t gaussian ]
Approach 3. Metric tree
These are trees that partition the 3D space using spheres. There are several types of metric trees. We used the Cover Tree during testing. We found them performing very well on very specific point clouds, and pretty badly otherwise.
The top solutions in the June Challenge use metric trees along with transformations to the point cloud.
Will be soon updated.
Can be found here.