PROBLEM LINK
Author: Hasan Jaddouh
Tester: Alexey Zayakin
Editorialist: Jakub Safin
DIFFICULTY
CHALLENGE
PREREQUISITIES
none
PROBLEM
You’re given two N\times N matrices C and D. Construct a connected undirected graph on N vertices while minimising its total cost. The cost of a graph with M edges (u_1,v_1),\dots,(u_M,v_M) and degrees of vertices d_1,\dots,d_N is
Note that the vertex costs aren’t necessarily increasing with increasing degrees. The test data are generated randomly.
QUICK EXPLANATION
What, you want a quick explanation for an optimisation problem?
Find a decent solution, then try adding/removing edges and hillclimbing. A decent solution is e.g. the minimum spanning tree or a graph with optimal degrees.
EXPLANATION
First, let’s list some statistical results.
-
the expected sum of N random integers, each in the range [0,M], is \mu=NM/2
-
Central Limit Theorem tells us that for large enough N, the standard deviation of this sum is \sigma=\frac{\sqrt{N}M}{\sqrt{12}}=\frac{\mu}{\sqrt{3N}}
-
the expected value of a minimum of N random numbers is M/(N+1); its standard deviation is \sigma=\mu\sqrt\frac{N}{N+2}
One thing we can do easily is take a random solution – with M edges and N vertices, the cost is ~N\frac{H_{max}}{2}+M\frac{C_{max}}{2} – and then try improving it by adding/removing edges (while keeping connectivity) and recalculating the cost; everytime if we find a better solution, we can keep improving that. Each improvement attempt runs in O(N+M), but the limiting factor is just checking if it’s still connected, so it can be done in O(1) per failed attempt e.g. if the graph is a spanning tree (M=N-1).
It’s usually a good idea to take less edges instead of many. We can also try improving a different solution than a random one.
Minimum spanning tree
Note that if the second sum didn’t exist, we’ve got a problem that’s well-known and easy to solve: computing the MST of a complete graph. This can be done in O(N^2) using a modification of Jarnik-Prim algorithm which uses a list instead of a heap. Turns out the expected cost of this tree is ~1.2C_{max}.
Of course, even if we find an MST, the second sum will give us N random costs, so the expected costs of this approach is ~N\frac{H_{max}}{2}+1.2C_{max}.
Optimal degrees
How about optimising the second sum? We can just pick the minimum H_{ij} for each i, but that doesn’t guarantee us if there even is a connected graph with these degrees. Regardless, the expected minimum is ~H_{max}/N, so their expected sum is ~H_{max} (instead of H_{max}N/2 for random choices).
The required conditions for a connected graph with degrees d_{1..N} are:
-
the sum of degrees must be even (the so-called handshake lemma)
-
\sum d_i \ge 2(N-1)
-
no vertex has degree 0
This works because we can just keep making edges between 2 vertices with the currently maximal d_i and decrementing those d_i, which gives some graph with \ge N-1 vertices; then we can keep removing a non-bridge edge from some component and using it to connect 2 components until the graph is connected.
It might turn out that the degree sequence which gives the minimum cost from H isn’t feasible, but non-zero degrees are trivial, the expected \sum d_i is ~N^2/2 (the indices j which give minima are random) so that condition shouldn’t be a problem and if the parity doesn’t fit, it can be fixed by changing just one degree. All in all, the cost of the second sum can be ~H_{max}.
We still need to choose edges to fit the degrees. This is good, because it allows us to minimise the cost of the first sum as far as possible and get something better than ~\frac{C_{max}}{2}\frac{N^2}{4} (there are \sum d_i/2 edges). After we choose the edges, we can still run the hill-climbing / random optimisation.
Something else
We have described two approaches which try to minimise one sum at the cost of keeping the other pretty large. Something in between would be even better – picking degrees so that the cost from H wouldn’t be toolarge and the number of edges wouldn’t be too large either, choosing worse edges to get better degrees… it also depends on the specific values of C_{max} and H_{max}.
What did you use?