PAIRCLST - Editorial

PROBLEM LINK:

Practice
Contest

Author: Sergey Kulik
Tester: Vlad Mosko
Editorialist: Pushkar Mishra

DIFFICULTY:

Medium

PREREQUISITES:

Dijkstra’s Algorithm, Floyd-Warshall, All Pairs Shortest Path Problem

PROBLEM:

Given is a graph G with N nodes and weighted edges. Out of these N nodes, K are special nodes. We are required to find the minimum distance between any two special nodes.

EXPLANATION:

Subtask 1
In this subtask, N \leq 300. This means we can simply run the Floyd-Warshall algorithm to find the shortest distance between all pairs of nodes. Once we have the table of such distances, all we need to do is iterate over all pairs of special nodes and take the minimum of the distances between them. The Floyd-Warshall algorithm is \mathcal{O}(N^3), which is fine for this subtask.
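
For concreteness, a minimal sketch of the subtask-1 approach (the function name and signature here are mine, not the setter's actual code):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Subtask 1 sketch: Floyd-Warshall over all pairs, then scan special pairs.
// Nodes are 1-indexed; edges are (u, v, w) and undirected.
long long minSpecialDistFloyd(int n, const vector<array<long long,3>>& edges,
                              const vector<int>& special) {
    const long long INF = 1e18;
    vector<vector<long long>> d(n + 1, vector<long long>(n + 1, INF));
    for (int i = 1; i <= n; i++) d[i][i] = 0;
    for (const auto& e : edges) {
        d[e[0]][e[1]] = min(d[e[0]][e[1]], e[2]);
        d[e[1]][e[0]] = min(d[e[1]][e[0]], e[2]);
    }
    // Standard O(N^3) relaxation.
    for (int k = 1; k <= n; k++)
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= n; j++)
                d[i][j] = min(d[i][j], d[i][k] + d[k][j]);
    // Minimum over all pairs of distinct special nodes.
    long long ans = INF;
    for (int a : special)
        for (int b : special)
            if (a != b) ans = min(ans, d[a][b]);
    return ans;
}
```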

Subtask 2
For this subtask, N \leq 10^5. Since the graph is connected, there are at least as many edges. Clearly, we can’t apply the algorithm from the first subtask, since it will exceed the time limit. The problem asks for the shortest distance between any two special nodes, which leads to an observation: we only need to think about distances from the special nodes. In other words, we do not need an “all pairs shortest path” algorithm; we just need the shortest distances from the special nodes to all the nodes in the graph. For this subtask, the number of special nodes is at most 10, which is fairly small. All we have to do is run Dijkstra’s algorithm from each of the special nodes. This gives us the shortest distance from each special node to every other node, from which we can find the shortest distance between any two special nodes. One run of Dijkstra’s algorithm is \mathcal{O}(N\log N), so the overall complexity works out to \mathcal{O}(NK\log N). This is sufficient for this subtask.
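
A sketch of this approach (all names are mine; each Dijkstra run can stop as soon as it settles another special node, since nodes are settled in order of distance):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Subtask 2 sketch: run Dijkstra from every special node and stop each run
// as soon as another special node is settled.
// adj[u] holds pairs (v, w); nodes are 1-indexed.
long long minSpecialDistDijkstra(int n,
        const vector<vector<pair<int,long long>>>& adj,
        const vector<int>& special) {
    const long long INF = 1e18;
    vector<char> isSpecial(n + 1, 0);
    for (int s : special) isSpecial[s] = 1;
    long long best = INF;
    for (int src : special) {
        vector<long long> dist(n + 1, INF);
        priority_queue<pair<long long,int>,
                       vector<pair<long long,int>>, greater<>> pq;
        dist[src] = 0;
        pq.push({0, src});
        while (!pq.empty()) {
            auto [d, u] = pq.top(); pq.pop();
            if (d != dist[u]) continue;       // stale queue entry
            if (u != src && isSpecial[u]) {   // nearest other special node
                best = min(best, d);
                break;
            }
            for (auto [v, w] : adj[u])
                if (d + w < dist[v]) { dist[v] = d + w; pq.push({dist[v], v}); }
        }
    }
    return best;
}
```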

Subtask 3
In this subtask, the number of special nodes increases to 10^4. We can’t get the previous algorithm to work under the given time limits.

Let us start by thinking of an algorithm to solve a simpler version of the given problem, wherein all edges are of weight 1. We now propose the formal algorithm for this version and then prove its time complexity.

The formal algorithm is as follows:

  • We pick a random special node and perform a BFS from this point. We stop at the first level s which contains another special node.
  • We know that the minimum distance between any two special nodes can’t be more than s. So we again take a special node at random which hasn’t been taken before and perform a BFS, this time searching only up to distance s. If we don’t find any special node within distance s, we abandon this particular BFS (s stays the same). If we do, we update the value of s. Either way, we repeat the procedure with the remaining special nodes.
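
The steps above can be sketched as follows for the unit-weight case (a minimal reconstruction with all names mine; per the clarification further down the thread, the loop tries every special node rather than stopping after the first failed BFS):

```cpp
#include <bits/stdc++.h>
using namespace std;

// bfsUpTo returns the distance from src to the nearest other special node,
// exploring only distances below `bound`; -1 if none is found in that range.
int bfsUpTo(int src, int bound, const vector<vector<int>>& adj,
            const vector<char>& isSpecial) {
    vector<int> dist(adj.size(), -1);
    queue<int> q;
    dist[src] = 0;
    q.push(src);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u]) if (dist[v] == -1) {
            dist[v] = dist[u] + 1;
            if (isSpecial[v]) return dist[v];   // BFS: first hit is nearest
            if (dist[v] < bound) q.push(v);     // prune beyond the radius
        }
    }
    return -1;
}

// Try every special node, shrinking the search radius whenever we improve.
int minSpecialDistBFS(const vector<vector<int>>& adj,
                      const vector<int>& special) {
    vector<char> isSpecial(adj.size(), 0);
    for (int s : special) isSpecial[s] = 1;
    int best = INT_MAX;
    for (int src : special) {
        int found = bfsUpTo(src, best, adj, isSpecial);
        if (found != -1) best = min(best, found);
    }
    return best;
}
```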

It is pretty intuitive why this algorithm gives the correct answer. But why does it run within the given time limits? What is its time complexity?

At first look, the algorithm seems to have a complexity of \mathcal{O}(NK). But in fact it is \mathcal{O}(N\log N). Essentially, we show that after each iteration of the BFS, the minimum distance within which another special node must be searched for (henceforth referred to as the radius of search) is halved. Let us see why.

After the first run of BFS from the randomly chosen special node, we can have two cases:

  • s \geq \frac{N}{2}
  • s < \frac{N}{2}

In the second case, we can clearly see that the search radius for the next BFS will be at most \frac{N}{2}, i.e., half of the current one. This is because we will in any case terminate the search if we don’t find a special node within that radius. But what happens in the first case?

If s \geq \frac{N}{2}, then all the special nodes apart from the one we started our BFS on must be clustered within a radius of N-s. Since s \geq \frac{N}{2}, we have N-s \leq \frac{N}{2}, so in the next BFS we will definitely find a special node within a \frac{N}{2} radius of the one we start on.

Therefore, we see that in every iteration of BFS, we halve our search radius in the graph. The algorithm is therefore \mathcal{O}(N\log N).

We can now go back to our original problem. Since the original problem involves edges of different weights, instead of performing BFS we need to perform Dijkstra’s algorithm. But how many times? Following the same reasoning as in the simpler version of the problem, Dijkstra’s algorithm will be performed \mathcal{O}(\log S_e) times, where S_e is the sum of the weights of all edges in the graph. This gives a total runtime of \mathcal{O}(N\log N\log S_e), which is sufficient for the given constraints.
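
A minimal sketch of the weighted search step, under the same shrink-the-radius driver described above (all names here are mine). Dijkstra settles nodes in order of distance, so the first special node popped is the nearest, and anything at distance at least the current radius can be pruned:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Weighted sketch: distance from src to the nearest other special node,
// abandoning the search once settled distances reach `bound`; -1 if none.
long long dijkstraUpTo(int src, long long bound,
                       const vector<vector<pair<int,long long>>>& adj,
                       const vector<char>& isSpecial) {
    const long long INF = 1e18;
    vector<long long> dist(adj.size(), INF);
    priority_queue<pair<long long,int>,
                   vector<pair<long long,int>>, greater<>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d != dist[u]) continue;              // stale entry
        if (d >= bound) break;                   // prune: outside current radius
        if (u != src && isSpecial[u]) return d;  // settled in distance order
        for (auto [v, w] : adj[u])
            if (d + w < dist[v]) { dist[v] = d + w; pq.push({dist[v], v}); }
    }
    return -1;
}
```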

The editorialist’s program follows the editorial. Please see it for implementation details.

OPTIMAL COMPLEXITY:

\mathcal{O}(N\log N\log S_e) where S_e is the sum of weights of edges.

SAMPLE SOLUTIONS:

Author
Tester
Editorialist


Another approach: start a Dijsktra with the marked vertices being the initial set. For each vertex v, let boss[v] and dist[v] be the closest marked vertex to v (pick any one if there are multiple choices) and distance from v to boss[v] respectively (both of these can be found from that Dijsktra). Now for each edge u-v such that boss[u]!=boss[v], one candidate answer is dist[u]+dist[v]+len[u][v]. It’s easy to see that the answer will be the smallest of these candidate answers. Code: https://www.codechef.com/viewsolution/9751280
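
This approach might look like the following in C++ (boss and dist are the commenter’s names; the rest is my reconstruction, not their linked code):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Multi-source Dijkstra from all special nodes at once. boss[v] is the
// nearest special node to v, dist[v] the distance to it. The answer is the
// minimum of dist[u] + w + dist[v] over edges (u, v, w) with boss[u] != boss[v].
long long minSpecialDistBoss(int n, const vector<array<long long,3>>& edges,
                             const vector<int>& special) {
    const long long INF = 1e18;
    vector<vector<pair<int,long long>>> adj(n + 1);
    for (const auto& e : edges) {
        adj[e[0]].push_back({(int)e[1], e[2]});
        adj[e[1]].push_back({(int)e[0], e[2]});
    }
    vector<long long> dist(n + 1, INF);
    vector<int> boss(n + 1, 0);
    priority_queue<pair<long long,int>,
                   vector<pair<long long,int>>, greater<>> pq;
    for (int s : special) { dist[s] = 0; boss[s] = s; pq.push({0, s}); }
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d != dist[u]) continue;
        for (auto [v, w] : adj[u])
            if (d + w < dist[v]) {
                dist[v] = d + w;
                boss[v] = boss[u];        // inherit the nearest special node
                pq.push({dist[v], v});
            }
    }
    // Candidate answers: edges whose endpoints belong to different bosses.
    long long ans = INF;
    for (const auto& e : edges)
        if (boss[e[0]] != boss[e[1]])
            ans = min(ans, dist[e[0]] + dist[e[1]] + e[2]);
    return ans;
}
```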

I failed this in contest because this:

for (int i=1; i<=n; i++) {   // bug: the loop bound should be m (edges), not n
cin>>u>>v>>w;
...
}

:frowning:


I think the algorithm for the version wherein all edges are of weight 1 need not give the correct answer.
Consider this case:

2 - 3 - 4 - 5 - 6 - 7
|
8
|
9
|
10

where 2, 4, 6, 7, 10 are special nodes.
Say in the first BFS we choose 2; then the answer will be updated to 2 when 4 is found.
In the second BFS, if we choose 10, then we will not find any special node within a range of 2, so the algorithm will terminate. It will therefore output 2 as the answer. But the actual answer is 1 (nodes 6 and 7). Please correct me if I’m wrong.


The editorialist’s program gives 1 as output. Please refer to it, as it follows the editorial closely. If you still don’t understand, I shall try to explain.

I don’t see how the editorial’s solution works.

Just testing with CF custom invocation, the editorialist’s solution seems to TLE (>10s) on, for example, a “star” graph with 1e4 arms of length 9 and one center node. The tester’s solution behaves the same. The author’s solution doesn’t use the final input/output format but otherwise seems to be fine (same approach as AnonymousBunny’s solution).

I think the problem with the editorial’s reasoning is that the halving process, while true for one iteration, does not hold past the first. The process does not make a “smaller” version of the problem, since the number of nodes remains the same.

I really like the problem: interesting to think about, the author’s solution is great, and I also thought of my own totally different solution which I find pretty cool :slight_smile: I do think the long armed star graph is a pretty natural test for this problem though and it looks like a lot of bad solutions passed.

I might have missed something, if so please let me know!

Changing “long long” to “int” makes it accepted within the time limit :stuck_out_tongue:


Ok, I shall explain. When we BFS from 10, we don’t find anything within range of 2. But that doesn’t mean we terminate the algorithm. We start from another special node, say 4. The best answer remains 2 so we will still search a range of 2. When we do it from 6, we find a better answer, i.e., 1.


Could you please elaborate on the “star graph” test case? Could you produce such a test case and provide me the link?

In the editorialist’s solution,


for (int j = 1; j <= n; j++) d[j] = 1e9, vis[j] = 0;


The maximum value of K is 10^4 and of N is 10^5, which gives complexity O(N*K), i.e., 10^9 operations, so how does it pass the test cases?

Thanks in advance!!!

Sure yeah I don’t think this is standard terminology :stuck_out_tongue:

Basically a tree with k chains of length 9, with the root being 90001. Also this is unweighted basically. Tell me if there’s anything wrong with this generator.

Sorry, the wrong version of editorialist’s program was uploaded. Please click on the editorialist link again to view the updated version.

@waterfalls can you please share your solution/approach ?

Yeah, that was the needed one!

@waterfalls For a test case generated by your generator, the author’s solution gives “no luck at all”. Why so?

Hello,
I have implemented the algorithm mentioned in the editorial.
I have a solution in which I am using int64_t to store the distances, but I am getting TLE with it; on the other hand, if I use int32_t in place of int64_t, I get AC.
Why is that?

https://www.codechef.com/viewsolution/9754905
https://www.codechef.com/viewsolution/9754870

@pushkarmishra probably because the author’s solution seems to be using a different input/output format than the final problem? This is probably due to a revision of the problem. I think the int aa overflowed to a negative, hence aa<2. Try another accepted solution (like AnonymousBunny’s)?

Alternate solution that I find pretty cool:

We want to find shortest paths with a variety of starting points. This brought to mind the multiple source shortest path problem, which is basically the same as single source, except you add all starting points to your queue/priority_queue with distance 0.

Now, this doesn’t work on its own, since the path found may go special -> not_special -> same_special, which is not allowed: the start and end must be distinct special nodes.

The next idea is to only choose some of them as starting points. Then, if the optimal pair is (a,b), you need exactly one of (a,b) to be chosen as a starting point, and then you will find the answer as the shortest distance from a starting special point to a nonstarting special point.

How do we decide which nodes to choose as starting points? Here’s the nice step: for the optimal pair (a,b), we win if exactly one of them is chosen as a starting point. Notice that if we choose a RANDOM subset of the special nodes as starting points, there’s a 1/2 chance that exactly one of (a,b) is chosen. So if we repeat with a fresh random subset enough times, we will find the answer with high probability.

I chose MAGIC=20 (1/2^20 chance of failing a single test), here’s the implementation: https://www.codechef.com/viewsolution/9755374.

Complexity is O(nlogm) per iteration, and a bit of math can help choose a good number of iterations (or just until time runs out).
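
A sketch of this randomized idea (the round count, seed, and all names here are mine, chosen for illustration; the linked solution is the commenter’s actual implementation):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Randomized sketch: repeatedly pick a random subset of special nodes as
// multi-source Dijkstra sources; a candidate answer is the best distance to
// a special node outside the chosen subset. Each round splits the optimal
// pair (a, b) with probability 1/2, so ~20+ rounds succeed w.h.p.
long long minSpecialDistRandomized(int n, const vector<array<long long,3>>& edges,
                                   const vector<int>& special,
                                   int rounds, unsigned seed) {
    const long long INF = 1e18;
    vector<vector<pair<int,long long>>> adj(n + 1);
    for (const auto& e : edges) {
        adj[e[0]].push_back({(int)e[1], e[2]});
        adj[e[1]].push_back({(int)e[0], e[2]});
    }
    mt19937 rng(seed);
    long long ans = INF;
    for (int it = 0; it < rounds; it++) {
        vector<char> chosen(n + 1, 0);
        vector<long long> dist(n + 1, INF);
        priority_queue<pair<long long,int>,
                       vector<pair<long long,int>>, greater<>> pq;
        // Each special node becomes a source with probability 1/2.
        for (int s : special)
            if (rng() & 1) { chosen[s] = 1; dist[s] = 0; pq.push({0, s}); }
        while (!pq.empty()) {
            auto [d, u] = pq.top(); pq.pop();
            if (d != dist[u]) continue;
            for (auto [v, w] : adj[u])
                if (d + w < dist[v]) { dist[v] = d + w; pq.push({dist[v], v}); }
        }
        // Candidates: special nodes that were not chosen as sources.
        for (int s : special)
            if (!chosen[s] && dist[s] < ans) ans = dist[s];
    }
    return ans;
}
```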


@shariquemohd I posted an answer below.


https://www.codechef.com/viewsolution/9755226

how did this pass?

Thank you :slight_smile: