Data Structures,Randomized Algorithms
Given N servers, each server located on a point (x,y) in the plane. Each server has processing units. You will be given the number of seconds each processing unit needs to complete a task (they are not necessarily the same). At most one task can be assigned to a unit, and it cannot be used before it finishes the task.Q tasks are pushed one by one into your queue, you can also assume that each task is located on a point in the plane. You must assign each task to a processing unit of a server you choose. The cost would be the distance between the task’s and the server’s points plus the processing time of the unit you have assigned this task to. You must solve this problem ONLINE. Each task must be assigned immediately to a unit. The time interval between two tasks is exactly one second. The total cost of all assignments must be minimum.
The problem is a combination of two simplified real-life problems, KNN and min-cost job-scheduling.
KNN (k-nearest neighbors) is an important algorithm used in pattern recognition. Here we only had 2 dimensions to deal with. This is not the case for real problems. There are a lot of variations of the job-scheduling problem. Here we had an interactive version that could be solved (not optimally) by a greedy strategy.
KD-Trees are the perfect and most popular tool to solve the KNN part of the problems. Any other space partitioning data structure would do the job. Using any of them helps finding the closest server (or set of servers) to a particular task. Then one core had to be chosen only from that set of server(s).
An elegant approach is contracting the plane into a smaller one by scaling all points by some constant.
The other part of the question required a lot more creativity than data-structures. And there is no right answer. Most contestants used a priority-queue for each server in order to track the optimal core. Another option was to use a balanced binary-search tree (or even a sorted-array) in order to maintain the same set of cores. Choosing a core at random from the set of optimal (by distance) servers was another option.
A prospective approach was choosing servers at random (400 was the number used by quite good number of contestants) and assign the task to the optimal core of each of them. This approach is effective when you don’t have enough memory to a store a complex data structure. A lot of people would try to come up with more strategies instead of implementing hard data structures.
Exploit all available time for a testcase. Finishing a test in 0.5 seconds while you have one extra second is pointless. You can try a lot of random choices in the remaining time or maybe try additional strategies.
Although it’s harder to code, it is better to find a bag of ‘closest’ servers, the expected outcome of the algorithm won’t be that perfect if you always tend to few choices.
Trying random servers is not a bad idea (trying random solutions is not a bad idea in general). For some problems random solutions are easy to implement ,reliable and efficient. On the other hand deterministic solutions for them might be a nightmare. There is no guarantee that the (BEST) (again under some measure of performance) is the closest one to you. That is why the cost terms of the sum were so close to each other.