PROBLEM LINKS
DIFFICULTY
MEDIUM-HARD
PREREQUISITES
Knuth-Morris-Pratt String Matching Algorithm, Dynamic Programming, Simple Probability
PROBLEM
Chef likes to toss a pizza many times. The sides of the pizza are denoted by F (the front side) and B (the back side). Each side has the same probability of showing up after each toss. A sequence of tosses can be represented by a string over alphabet {F, B}, where each character represents the side that shows up in the corresponding toss.
You are given a string S1 of length N. This is called the lucky sequence. You are also given a string S2 of length M. This string represents the sequence of tosses Chef has made today. Count the expected number of additional tosses Chef must make until the string representing the last N tosses is equal to S1. If Chef has already made a lucky sequence today, the answer is 0.
QUICK EXPLANATION
Use dynamic programming approach. The state is E[k] = the expected number of additional tosses when the last k tosses match the first k tosses in the lucky sequence.
EXPLANATION
For simplicity, we will assume that the strings S1 and S2 has been decoded.
We will use KMP String Matching Algorithm extensively in the solution. Specifically, we will use the so-called “failure function” feature from KMP. So, let’s recall how failure function works.
Consider a string S of length N. For convenience, we will use 1-based indexing. We will use notation S[a…b] to denote the contiguous substring of S from the a-th character to the b-th character, inclusive.
Let’s introduce fail[] array. fail[i] is equal to k, where k is the maximum integer such that S[1…k] = S[i-k+1…i]. In other words, fail[i] is the length of the longest prefix of S[1…i] that is also a suffix of it. This failure function table can be computed in O(N). The pseudocode is as follows.
fail[1] = 0 k = 0 for i = 2; i <= N; i++: while k > 0 and S[k+1] != S[i]: k = fail[k] if S[k+1] == S[i]: k = k + 1 fail[i] = k
Note that in some KMP references, k must be less than i (i.e., both the prefix and suffix must be proper). For the purpose of this problem, we will not use that restriction.
Now, let’s solve the problem. First, apply the failure function to S1 as fail_S1[]. Then, apply the failure function to S2 as fail_S2[] as well, but by comparing its characters to S1 as follows. In other words, fail_S2[i] = the maximum integer k such that S1[1…k] = S2[i-k+1…i].
k = 0 for i = 1; i <= M; i++: while k > 0 and S1[k+1] != S2[i]: k = fail_S2[k] if S1[k+1] == S2[i]: k = k + 1 fail_S2[i] = k if k == N: k = fail_S2[k]
There are two possible cases.
Case 1. The string S1 is a substring of S2. We can check this by utilizing the failure function we just built and use KMP to find the match. More specifically, we check whether there is integer k such that fail_S2[k] = N. The answer for this case is 0, since Chef has already have the lucky sequence and therefore does not have to make any additional tosses.
Case 2. The string S1 is not a substring of S2. We will use dynamic programming approach in this case.
Let E[k] = the expected number of additional tosses Chef must make if the string denoting the last k tosses of the current sequence is equal to S1[1…k]. The answer to the original problem would be E[fail_S2[M]].
The base case of this DP solution is E[N] = 0, i.e., if the string denoting the last N tosses of the current sequence = S1[1…N] = S1, then Chef has made the lucky sequence and he should stop tossing pizzas.
Let’s define the recurrence relation for E[k], where k < N. Chef must make at least one additional toss. Let C be the side of the pizza that shows up on the next toss. There are two cases:
- C is equal to S1[k+1]. Therefore, the expected number of additional tosses Chef must make is 1 + E[k+1].
- C is not equal to S1[k+1]. Let q be the maximum integer such that S1[k-q+1..k] + C = S1[1..k+1]. The value of q can be traced from fail_S2[] as follows:
q = k while q > 0 and S1[q+1] != C: q = fail[q]
The expected number of additional tosses Chef must make is 1 + E[q+1]. For convenience, let rematch[k] = q+1. Note that rematch[0] = 0; this is a special case.
Therefore, E[k] = (2 + E[k+1] + E[rematch[k]]) / 2. We can rewrite this equation into:
2E[k] = 2 + E[k+1] + E[rematch[k]]
E[k+1] = 2E[k] - E[rematch[k]] - 2
Because both k and rematch[k] are not greater than k+1, we have a noncircular function definition here. We will solve the equations by expressing each E[k] as P(k) + Q(k)E[0]. We will solve for P(k) and Q(k).
First, because E[1] = 2 Ă— E[0] - E[0] - 2, then P(1) = -2 and Q(1) = 1. Let t = rematch[k]. For k > 1,
E[k+1] = 2E[k] - E[t] - 2
= 2(P(k) + Q(k)E[k]) - (P(t) + Q(t)E[t]) - 2
= 2P(k) + 2Q(k)E[0] - P(t) - Q(t)E[0] - 2
= (2P(k) - P(t) - 2) + (2Q(k) - Q(t))E[0]
Therefore,
P(k+1) = 2P(k) - P(t) - 2
Q(k+1) = 2Q(k) - Q(t)
Having able to calculate P() and Q(), now we have this ultimate equation:
E[N] = P(N) + Q(N)E[0]
and since we know that E[N] = 0,
E[0] = -P(N) / Q(N)
(the division should be taken as inverse multiplication modulo 1,000,000,007.)
Finally, the answer to this problem is E[fail_S2[M]] = P(fail_S2[M]) + Q(fail_S2[M])E[0].
The time complexity of this solution is O(N).
Once more, do not forget to perform all calculations modulo 1,000,000,0007.
SETTER’S SOLUTION
Can be found here
TESTER’S SOLUTION
Can be found here