SCLUSTER - Editorial



Author: Snigdha Chandan
Tester: Hiroto Sekido
Editorialist: Kevin Atienza




Heuristics, facility location problem, simulated annealing, steiner tree


Roughly speaking, given an N\times N grid with individuals K in it and each having a number called its “interaction index”, you are to move some (maybe none, or all) individuals so that all K form a single connected component (where edge- and corner-adjacent individuals are considered adjacent). The goal is to minimize the value 1000A + 10B, where A is roughly the weighted sum of distances the individuals are moved, and B is the “average” difference between interaction indices of all adjacent clients. (The exact formula for A and B are given in the problem statement)


This problem was inspired by the “mobile facility location problem”, which is a variant of the classical facility location problem. In the mobile facility location problem, each facility and client is assigned to a start location in a metric graph and the goal is to find a destination node for each client and facility such that every client is sent to a node which is the destination of some facility. The objective is to minimize the total distance clients and facilities travel.

In this problem (“social cluster”), we can consider all the individuals on the grid as clients and our objective is to form a connected cluster while minimizing the total travelling distance of all the clients. Here, instead of summing up the distances travelled by individual clients, we will sum up the weighted distances (the weight is the interaction index). Here we are also minimizing an additional parameter: the average difference between interaction indices of all the adjacent clients with a particular client.

It is an NP-hard problem, so it’s really difficult to come up with the exact optimal solution. Therefore we will try to approximate the optimal solution. We will discuss some of the heuristics below. Take note that we are minimizing the following score: \text{Score} = 1000A + 10B, where A and B are two parameters defined in the problem statement. Roughly speaking, A represents the cost incurred by moving individuals. The longer the distance, the higher the cost, but it’s less costlier to move individuals with higher interaction indices. On the other hand, B represents the “average” difference between interaction indices of all adjacent clients.

First of all, we can forget about B and try to focus on parameter A, i.e. we will try to construct a connected cluster of by moving some of the clients. (After all, A is (exactly) one hundred times more important than B.)

We can use “simulated annealing” approach as a heuristic in order to construct the connected component. We can also consider it as a special case of classical K-median problem.

We can also create a metric Steiner tree consisting of all the clients at maximal position as terminal nodes and all the clients at non-maximal positions as Steiner nodes. Already a very simple 2-approximation algorithm exists for computing the metric Steiner tree.

Once we are somehow successfully done with constructing the cluster, we can focus on rearranging the positions of the clients while still maintaining the connectivity of the cluster in order to minimize B as well as to minimize the overall score.

We have to handle this situation very carefully as some arrangement may minimize B but may result in increasing A.

There are a lot of approaches for this problem, and some are more effective than others. Feel free to discuss your approaches below :slight_smile:



On the contest page, there was a note stating that more test datas are going to uploaded for this problem. Will/can these new test datas affect my contest rank?

The way I approached it was fairly simple.

First, ignore the B cost, so just optimize moving the people.

Then, pick a random cell to start growing our connected component. We repeat the following:

  1. Find the person who has the lowest cost of moving to an available cell.

  2. Move that person to that location.

  3. Mark all neighboring unoccupied locations as available.

I try this for a few random cells. Actually, I tried median of x/y positions or even center of board, but it didn’t seem to effect the score too much. Anyways, after doing this, you can see for each person how much they moved in the x/y direction. We can shift the entire connected component by the median of the x/y moves to get a better cost afterwards, so we do that as well (making sure the connected component stays in bounds).

This was able to get a score of about .9 I’m not sure how much it would have increased if I considered the B costs, but I wasn’t able to find a nice way to incorporate that without making my program too slow. (link to my submission: )


I think it is good to mention the programming language / compiler version for the programs that was uploaded on behalf of tester and setter.

I tried setter and tester solutions and one is giving wrong answer and other segmentation error. :slight_smile:

The setter and tester solution are for the problem “STETSKLX”.

Thank you for pointing it out. It has been corrected! (the broken links will hopefully be fixed soon)