Disjoint Set Data Structure, Simple Math
There are N nodes in a graph labeled from 1 to N. Each node has an associated cost with it, given by cost array. You wish to connect this graph by constructing edges between the nodes.
- You can construct an edge between two nodes a and b iff cost[a] ≥ 0 AND cost[b] ≥ 0
- The cost of the construction of the edge is equal to cost[a] + cost[b]
Some edges already exist. Hence, some nodes are already connected. You wish to find the minimum cost of connecting the entire graph.
We will be using a Disjoin Set Data Structure that supports the following operations
- init(N), initialize N disjoint sets.
- find(A), get the id of the disjoint set to which A belongs.
- join(A,B), merge the two disjoint sets in which A and B belong respectively.
Union-Find along with Union by Rank and Path Compression is the asymptotically optical way of maintaining Disjoint Sets.
After considering the existing set of edges, we will have some disjoint sets, where there are one or more nodes in each set. Let us assume there are K disjoint sets.
We only need to put K-1 edges
This is easily proven by considering that adding each edge reduces the number of disjoint sets by 1. If it doesn’t, then the edge is connecting two nodes that are already connecetd.
Thus, we only consider K-1 edges at most.
As we dwell deeper into discussing the solution to the problem we will encounter a lot of difficulty discussing the ideas if we keep reminding ourselves of ignoring the vertices with negative costs. To be able to handle them in the discussion below (and maybe in your implementation as well), we assume that each negative cost is replaced by a very large positive cost. We will see soon that doing this does not affect the optimal answer.
We can consider each disjoint set as an independent vertex and try to build a minimum spanning tree on the graph. The problem is, that for K disjoint sets, there may be as many as KC2 edges. We know that we can solve the problem of finding the minimum spanning tree by greedily considering the edges in increasing order of weights.
Here comes to our rescue the definition of the weight of edges in the problem statement.
Edge Weight for edge from a to b = cost[a] + cost[b]
Let x be the vertex for which cost[x] is smallest. An edge between x and any other vertex will have a lower cost than any other edges in the graph.
Also, considering all the edges with one end point on x means we are considering sufficient edges for a connected graph!
Note how assuming that all negative cost vertices were actually very large positive cost edges allows us to get around several edge cases
- The lowest cost vertex, x, is automatically the lowest non-negative cost vertex
- We will consider all the edges from x to other vertices in the order of increasing weights, thus we avoid considering edges to vertices with negative weights
- If we are forced to consider an edge to a vertex with large weight (negative cost vertex originally), then we know that we could not have connected to the connected component of that vertex through any vertex whose weight was positive (otherwise we already would be connected to that component).
Hence, if the answer could be reached before connecting to any vertex with very large weight, we can print the answer, otherwise, we will print -1.
Note that we can avoid the sorting step altogether and consider the minimum cost vertices in each connected component. This way, if we know the sum of costs of the minimum cost vertices in each component, the answer is simply
cost[x] * (K-1) + sum - cost[x]
Where x is the vertex with the smallest cost.
Can be found here.
Can be found here.