PROBLEM LINK:
Author: Kevin Atienza
Tester: Istvan Nagy and Kevin Atienza
Editorialist: Kevin Atienza
PREREQUISITES:
Probability, expected value, probability distribution, amortized analysis
PROBLEM:
You are given a sequence V_1, \ldots V_N. We define S_0 = 0 and S_i = S_{i1} + V_i. However, when adding two numbers and the sum exceeds 999999, we instead select the sum as a random integer uniformly chosen from 0 to 999999. What is the expected value of S_i for all 1 \le i \le N?
QUICK EXPLANATION:
Let P_i(v) be the probability that S_i is v, for 0 \le v \le 999999. Initially we have P_0(0) = 1 and P_0(v) = 0 for v > 0.
When we have a new value V_i, we need to update this array. We have the following:
In other words, the array is shifted V_i steps to the right, and the V_i entries that overflow are redistributed into the whole array uniformly. Also, the expected value of S_i is almost the expected value of S_{i1} plus V_i, except that we have to adjust for the entries that overflowed. The things we need to implement are the shifting operation and the computation of U_V = \displaystyle\sum_{j=1}^{V} P_{i1}(1000000j) for a given V.
To implement this efficiently, instead of creating the P array explicitly, we instead represent it as a list of runs: (L, p), which means a length of L elements with value p. For example, initially, we have [(1,1), (999999,0)]. Then to update the representation, we remove a “chunk” (i.e. a series of runs) of size V_i from the end of this list, use them to compute U_{V_i}, add the constant \frac{U_{V_i}}{1000000} to all remaining elements, and append a new run at the beginning. The expected value of S_i can also be updated efficiently.
Instead of adding a constant value to all elements of the array explicitly (which can be slow), we instead hold a variable T which is to be added to all values of the array, and simply increment T. Then as we access each run (L,p), we use instead the value p + T.
Now, in the worst case there can be up to N runs removed in a single operation. However, each run is only removed once from the array, so the amortized cost is O(1) and the overall complexity O(N)
One final hurdle is precision: the error accumulates quickly enough because T only increases. To fix this, we simply explicitly add T to all values of the list from time to time to make the error growth smaller. If we decide to add this every K operations, then the complexity becomes O(N^2/K). Choosing something like K = \sqrt{N} would yield a running time O(N\sqrt{N}).
There is also a way to solve this with a segment tree with no subtractions, and therefore possibly less precision error.
EXPLANATION:
The definition of the expected value of some random variable X is the sum of the possible values of X multiplied by their probabilities. We know that the possible values of each S_i are simply [0,1,\ldots,999999]. For i \ge 0 and 0 \le v \le 999999, let P_i(v) be the probability that S_i = v. Thus, the expected value of S_i is by definition
To compute each E[S_i], we will need to compute P_i(v) for all i and v quickly. For the base case i = 0, we have P_0(0) = 1 and P_0(v) = 0 for v > 0 (i.e. S_0 is equal to 0 with probability 1). Also, the values from P_i will be dependent on the values from P_{i1} because of the equation S_i = S_{i1} + V_i (where the “+” is our special “addition”). Thus, we will try to compute P_i and P_{i1}.
Suppose we have all values from P_{i1}, and now we will add V_i to S_{i1}. These probabilities will change. Let’s try to compute P_i(v). If v < V_i, then the only way to arrive at the “sum” v is when the addition S_{i1} + V_i overflowed. But this can only happen for the values S_{i1} \ge 10^6  V_i. Also, if this is true, then there is only a \frac{1}{10^6} chance of actually getting the value v (because all sums are equally likely). Therefore, we find that for v < V_i:
On the other hand, if v \ge V_i, then there’s another way to arrive at v, and that is when S_{i1} was v  V_i at the beginning. This occurs with probability P_{i1}(v  V_i), so we find that for v \ge V_i:
Combining these two, we get:
Thus, in order to compute the answer, we must solve two problems:
 Maintaining and updating the list of probabilities P_i as we increase i, and
 Quickly computing the expected value given these probabilities.
The update
We can restate the transformation from the array P_{i1} to P_i. Let P be an array of length 10^6, such that P[v] is initially P_{i1}(v). Then:
 Remove the last V_i elements of P. Let t be the sum of these removed elements.
 Add V_i elements at the beginning of P, each of value 0.
 Add \frac{t}{10^6} to all values of P.
(The first two steps taken together can also be seen as a right shift.) In addition, we also need to maintain the value \sum_v v\cdot P[v], so we can get E[S_i] easily. Initially, P[0] = 1 and P[v] = 0 for v > 0.
In the second operation, we add a contiguous run of 0 s at the beginning, which suggests to us that P contains long runs of data. (A run is a contiguous subsequence in which the same value occurs.) The third operation doesn’t do anything to change that: It may change the values of each run, but it doesn’t affect the structure / lengths of the runs at all. This hints that we can represent P as a list of runs. A run itself can be represented by two numbers (l,v): its length
and its value
. Initially, we have two runs: (1,1) and (10^61,0). Also, initially \sum_v v\cdot P[v] = 0.
Now, let’s implement the operations above on this representation of P.
First operation
Removing the last l elements is straightforward:
 Take the last run (l_k,v_k) in the list. If l_k \le l, then remove it from the list, decrement l by l_k, and repeat this step.
 Otherwise, the last run satisfies l_k > l. In this case, simply decrement l_k by l. (Don’t remove it.)
Along the way, we must be able to update the value of \sum_v v\cdot P[v], so when removing (l,v) from the back of the list, we must adjust it. If the total length of the runs is L, then the first index of this run in P must be L  l, so we need decrement S by
A similar adjustment can be done when decrementing the length of the last run. (As needed in the second step.)
Second operation
Now, what about adding a run of 0 s at the beginning? Suppose we want to add l zeroes. These zeroes don’t contribute to \sum_v v\cdot P[v], but the shifting of the remaining elements affect it. Specifically, the change in value is equal to
Thus, we also need to maintain \sum_v P[v] in addition to \sum_v v\cdot P[v].
Third operation
Finally, let’s tackle the third operation. We wish to add a value t to all runs in the sequence. We don’t want to walk through the list and add t individually to the values of the runs because that would be slow. Instead, what we can do is to maintain another variable “\text{add}” separately, indicating the value that we want to add to all runs’ values. To add t to all runs, we simply increment “\text{add}” by t. Then, when we access a given run (l,v), we just say its value is v+\text{add}.
We also need to remember to update \sum_v P[v] and \sum_v v\cdot P[v], but this is easy. If L is the length of P so far, then the change in \sum_v P[v] is equal to L\cdot t, and the change in \sum_v v\cdot P[v] is equal to \frac{L(L1)}{2}\cdot t.
Pseudocode
In case some details aren’t clear, the following is a pseudocode you may study:
runs = []
sumPv = 0 // represents sum(P[v])
sumvPv = 0 // represents sum(v*P[v])
L = 0 // length of P
add = 0 // the number to add to all runs
def init():
// initializes the structure
runs = [(1,1), (999999,0)]
sumPv = 1.0
sumvPv = 0.0
L = 1000000
add = 0.0
def remove_from_back(l):
// remove 'l' elements from the back
// returns the total of values removed
if l == 0: // do nothing
return 0
Let 'run' be the last run in 'runs'
// adjust
l_rem = min(l, run.l)
real_v = run.v + add // the real value
sumPv = real_v * l_rem
sumvPv = real_v * (L*(L1)/2  (Ll_rem)*(Ll_rem1)/2)
L = l_rem
if run.l > l:
run.l = l // decrement the run, and we're done.
return l * real_v
else:
pop the back of 'runs'
return run.l * real_v + remove_from_back(l  run.l) // continue removing
def add_zeroes_in_front(l):
// add 'l' zeroes in front
add (l,add) in front of 'runs'
// remember that "add" will be added to all values, so to add real zeroes,
// we need to add "add" in front
sumvPv += l * sumPv
L += l
def add_value_to_all(t):
// add 't' to the value of all runs
sumPv += L * t
sumvPv += L*(L1)/2 * t
add += t
We maintain the list of runs called “runs
” (which can be implemented as a deque, or a sliding array), and four values:

sumPv
representing \sum_v P[v] 
sumvPv
representing \sum_v v\cdot P[v] 
L
representing the length of P 
add
representing \text{add}
One noteworthy part of this code is in add_zeroes_in_front
. Instead of adding a run with value 0
, we use the value “add
”, because add
will be added to it later on. In remove_from_back
, we use real_v
during computation to update sumPv
and sumvPv
.
After performing the three operations, we can extract E[S_i] as sumvPv
.
Running time
Let’s analyze the running time of the algorithm. Type 2 and 3 operations run in O(1), while type 1 operations potentially runs linear in the number of runs, so this seems slow. However, notice that each run can only be pushed and popped at most once, so the overall running time of all type 1 operations is actually proportional to the number of runs added to the structure. But the second operation adds at most 1 run, so after N operations, there can only be at most N runs, and thus the running time for N operations is O(N). (In other words, the amortized running time of each operation is O(1).) This passes the time limit!
Precision
Although the approach is mathematically correct, it suffers from precision problems. This is because we update sumPv
and sumvPv
by adding and subtracting large numbers to it (e.g. \frac{L(L1)}{2} is quite large), so loss of significance happens. There are a few ways around it:
 You can use more significant data types. A
long double
seems to be enough to maintain precision.  You can reset “\text{add}” occasionally, by actually adding this value to all runs, then setting it to 0. The more often we do this, the more we maintain precision, but it comes at the cost of slower running time. Choose a frequency that still passes the time limit and maintains enough precision.
 You can find a different approach altogether. An algorithm that doesn’t use subtraction is desirable, because no catastrophic cancellation occurs. A solution involving segment trees is possible, though the running time is the slightly slower O(N \log N).
Time Complexity:
O(N \log N) or O(N)
AUTHOR’S AND TESTER’S SOLUTIONS:
Will be posted soon