PROBLEM LINK:
Author: Praveen Dhinwa
Tester: Kevin Atienza
Editorialist: Kevin Atienza
DIFFICULTY:
Easy
PREREQUISITES:
Expected value
PROBLEM:
There are N coders numbered 0\ldots N-1 and ordered according to ratings. Each coder i, except the first, selects a person \text{choice}[i] with a higher rating to be teammates with, or -1 if the coder is lazy. (Thus, -1 \le \text{choice}[i] < i.) The first person has \text{choice} equal to -1 even though he/she is not lazy.
For each lazy person i, \text{choice}[i] is replaced with a uniformly random value from 0 to i-1 with 1/2 probability.
What is the expected value of the number of teams?
QUICK EXPLANATION:
The resulting graph will always be a forest, no matter what. Also, a forest with N nodes and E edges has exactly N-E components. Thus, the answer is N minus the expected number of edges.
Let L be the number of lazy people. Each lazy person has a 1/2 probability of having \text{choice}[i] be replaced, so by linearity of expectation, the expected number of edges across all lazy people is L/2, and the expected number of edges overall is L/2 + (N-1-L). Thus, the answer is N - (L/2 + (N-1-L)), or 1 + L/2.
EXPLANATION:
Let’s imagine setting the \text{choice}[i] of each lazy person i one by one. At any point in time, the coders form a bunch of teams, and after each update, two of these teams may merge, so the number of teams may increase by one. However, it’s possible that the number of teams doesn’t increase, in case i and \text{choice}[i] were already teammates to begin with. Now, when does this exactly happen?
Suppose we want to set \text{choice}[i] to some value j \in [0,i). Now, consider a path from j to i. Naturally, this sequence of nodes will sometimes decrease and sometimes increase. However, it’s impossible for the sequence to increase and then decrease, because that would mean that some person had two \text{choice} values. More specifically, remember that the only way for j < i to be connected by an edge is when \text{choice}[i] = j, which means that if the sequence went a < b > c, then \text{choice}[b] = a and \text{choice}[b] = c, which is absurd.
This only means that the sequence can only decrease, and then increase until it reaches i. But this means that the node j' right before i in the path is < i and adjacent to i, thus \text{choice}[i] = j', contradicting the fact that i is lazy. All of this means that it’s impossible to choose any \text{choice}[i] in such a way that will connect two people that are already teammates, in other words, form a cycle in the graph. In other words, the graph is always acyclic!
An undirected graph with no cycles is called a forest, and it’s easy to prove some elementary facts about forests. The following fact will be most useful for us:
In a forest of N nodes and E edges, the number of connected components is N - E.
This is easily proven by induction, and is actually pretty intuitive. I suggest drawing out a few random forests to get some intuition behind this.
In our case, each team corresponds to a connected component, so the number of teams is N - E. The expected number of teams is therefore equal to the expected value of N - E, which, by linearity of expected values, is equal to the expected value of N minus the expected value of E. Now N is constant throughout the whole procedure, which means we only need to compute the expected value of E.
Now we’re getting closer to the solution. Let L be the number of lazy people. Thus, there are N - L non-lazy people, and there are already N - 1 - L edges in place. (Remember that \text{choice}[0] = -1.) Now, let i_1, i_2, \ldots, i_L be the indices of the lazy people. We define L new random variables X_1, X_2, \ldots, X_L, where X_k is 1 if the coin lands tails for person i_k (thus \text{choice}[i_k] \not= -1), and 0 otherwise. Using the $X_k$s, we can now express the number of edges in the graph after the procedure:
But we want the expected value of this thing. Easy! By linearity of expected values again, this is simply equal to:
Now, we focus on E[X_k]. We know that there is 1/2 probability that X_k = 1, and 1/2 probability that X_k = 0. Thus, the expected value of X_k is simply 1/2. This applies to all other k s, thus the expression above is simply:
Finally, the expected number teams is:
which is a neat answer. Notice that we didn’t really care which people are lazy. We only need the number of lazy people!
Time Complexity:
O(N)