PROBLEM LINK:
Author: Jakub Safin
Tester: Kevin Atienza
Editorialists: Suhash Venkatesh and Pushkar Mishra
DIFFICULTY:
Medium-Hard
PREREQUISITES:
BFS, DFS, Probability, Expected value
PROBLEM:
Given a R\times C maze where each cell is either empty (colored white) or blocked (colored black), you can click on any white cell. Whenever you click on a white cell, it turns red and all cells reachable from it also turn red. The process is repeated until there is a red-colored connected path from start (0, 0) to end (R-1, C-1), when the game ends. You are required to find the expected number of clicks to finish the game.
EXPLANATION:
The first thing to notice is that given a set of connected white cells, clicking on any of them turns all these cells to red. Hence, we first need to group the cells into connected components. This can be achieved by using a DFS - Depth first search (or) BFS - Breadth first search.
It is given that there is at least one connected path of white cells from start (0, 0) to end (R-1, C-1). It is easy to note that after breaking the white cells into groups of connected components, there is exactly one component which contains both start and end cells. One more thing to note is that we are only interested in the number of cells in each component, since clicking on any cell in a component turns all cells in that component to red.
Let us assume that there are (n+1) connected components with sizes x_{0}, x_{1}, ..., x_{n}, where (n >= 0). Let x_{0} be the component which contains both the start and end cells. The game ends when we click on any of these x_{0} cells.
Subtask 1
For this subtask, R\cdot C <= 30. It is easy to observe that the maximum number of connected components is utmost (R\cdot C) / 2 = 15. In fact, it is lower than 15 since we havenât accounted for the existence of a path from start to end. But 15 is a good enough bound for us to be able to come up with a solution.
We can use a bit mask to denote which components have already been visited. At every step, try to click on any cell of any unclicked component. We can try every possible unclicked component in every step. The game ends when we choose to click on any of the x_{0} cells (component containing the start and end cells).
Pseudo code:
// There are (n+1) components totally => x0, x1, ... xn for (i = 0 to 2^(n+1)-1) { dp[mask] = -1 } double E(mask) { if (dp[mask] > 0) return dp[mask]; sum = 0, ans = 0; for (i = 0 to n) { if ((1<<i) & mask) continue; sum += x[i]; } for (i = 0 to n) { if ((1<<i) & mask) continue; if (i == 0) ans += (x[i] / sum) \* (1); else ans += (x[i] / sum) \* (1 + E(mask ^ (1<<i))); } return dp[mask] = ans; }
In the above code, calling
E(0)
gives the required answer. Let us take an example with n = 4, and we have to compute E(10010). This means that we have already clicked on the components 1 (x_{1}) and 4 (x_{4}). The number of remaining cells which can be clicked upon in this step is (x_{0} + x_{2} + x_{3}). The probability of clicking on any cell in component i is now x_{i} / (x_{0} + x_{2} + x_{3}), where i = 0, 2 (or) 3. Now, we can take any of these choices and the expected value can be written as:
This gives us an \mathcal{O}(n \cdot 2^n) solution, where n = \frac{R\cdot C}{2}. This is enough to pass the first subtask.
Subtask 2
For this subtask, R\cdot C <= 50000, The bit mask method used for subtask 1 is way too inefficient for this. We need to do something much smarter. Continuing from where we left in the previous subtask, letâs define E_{n}(x[0..n]) as the expected number of clicks to finish the game when there are (n+1) components of sizes x_{0}, x_{1}, ..., x_{n}. Let us also define E_{n-1}(x[0..n]\setminus x[i]) to be the same as above, but containing only n components, with the component x_{i} removed. We can write the expected value as:
This indicates that the expected value should have some closed form. Letâs try to find the expected values for small values of n. It is obvious that E_{0} = 1.
You can use a pen and paper to do the above simplification and see that it is straight forward to get to the above derivation. Looking at the pattern, you might be tempted to generalize the following:
Turns out that this is indeed the solution to the problem. However, we still need to prove this. Let us try to prove this by induction. Let us assume the following:
We can write E_{i+1} as follows:
In the last double summation, note that the order of summation doesnât matter. Hence, we can write the above equation as follows:
Hence, we have proved the above formula by induction. The time complexity to perform the BFS/DFS is \mathcal{O}(R\cdot C) and the expected value can be found in linear time in the number of components. We already showed above that \frac{R\cdot C}{2} is a loose upper bound for the maximum number of components. Hence, the problem can be solved in \mathcal{O}(R\cdot C) time.
COMPLEXITY:
\mathcal{O}(R\cdot C) per test case.