Problem Link
Author: Ivan Fekete
Tester: Misha Chorniy
Editorialist: Bhuvnesh Jain
Difficulty
MEDIUM
Prerequisites
Dynamic Programming, Tries, Hashing
Problem
You are given a empty string S and a taget string T. Youa re also provided with N patterns p[i]. You may perform any number of operations as follows:
- Insert character ‘x’ to the left of the string L. Cost: cl[x] * |L|
- Insert character ‘x’ to the right of the string L. Cost: cr[x] * |L|
- Insert string ‘p[i]’ to the left of the string L. Cost: kl[i] * |L|
- Insert string ‘p[i]’ to the right of the string L. Cost: kr[i] * |L|
Find the minimum cost to make string T from S using the above operations. Note that |L| denotes the current length of the string L, before the operation is performed.
Explanation
The problem statement clearly lists some transitions and cost related to it and we would like to minimise the overall cost. So, this hints towards a dynamic programming approach.
Let us define dp[i][j] as the minimum cost to make the string T[i..j] using the above operations. The transitions are exactly the same as given in the statement. Let L = (j - i + 1).
- dp[i+1][j] + cl[T[i]] * (L - 1)
- dp[i][j-1] + cr[T[j]] * (L - 1)
- dp[i+x][j] + kl[y] * (L - x), where x is length of any string p[y] such that T[i..(i+x-1)] = p[y].
- dp[i][j-x] + kr[y] * (L - x), where x is length of any string p[y] such that T[(j-x+1)..j] = p[y].
dp[i][j] is clearly the minimum over all possible transitions. The time complexity of the above logic is O(|S| * |S| * N * |P[i]|). This is enough to pass the first subtask.
But there is a small caveat while implementing the above solution. The most general way to implement dp solution is as follows:
for i in [0, |S|-1]:
for j in [i, |S|-1]:
calculate dp[i][j] using all transitions
But the above implementation is wrong as the transitions for dp[i][j] require us to have the correct values of all dp[x][y] calculated where i <= x <= y <= j. So, we want to calculate the dp of smaller substrings first and then extend it to bigger substrings. A working pseudo-code for the first subtask is as follows:
for len in [1, |T|]:
# First find for subtrings of length 1, then 2 and so on.
for i in [0, |T|-len]:
j = i + len - 1
res = dp[i+1][j] + cl[T[i]] * (len - 1)
res = min(res, dp[i][j-1] + cr[T[j]] * (len - 1))
for y in [1, n]:
x = |p[y]|
if x > len:
continue
# attach left
match = 1
for l in [0, x-1]:
if p[y][l] != T[i+l]:
match = 0
if match:
res = min(res, dp[i+x][j] + kl[y] * (len - x))
match = 1
for l in [0, x-1]:
if p[y][l] != T[j-x+1+l]:
match = 0
if match:
res = min(res, dp[i][j-x] + kr[y] * (len - x))
dp[i][j] = res
print dp[0][|T|-1]
Full Solution:
Note that in the above solution, the only time consuming step is the transitions from all possible strings p[y]. Since we know that the maximum length of any string p[y] is atmost 100, we see that dp[i][j] would take atmost 102 transitions (2 for characters). If we can do each transition in O(1) then we can completely solve the problem.
The first thing to observe is that we have atmost 100 different strings which could be appended to the left or right while calculating the cost of a substring. Instead of iterating over all possible patterns p[y], we need to efficiently find the cost of making that substring (if possible) from the set of patterns. This requires efficient string matching from a set of patterns. Below are 2 different approaches to it:
Author’s solution
Make 2 tries of all the patterns, one for the left side transitions and another for the right side transitions. For each node in the trie also store the minimum cost of making that string. For all nodes, it is initially INF, some large value. Only for nodes where a pattern p[y] ends, the cost is set to the minimum of kl[y] or kr[y] over all possible patterns ending there (Note that there can be multiple same patterns but with different cost). To efficiently search for the cost of the substring, we simply traverse the trie and after each transition update the dp values.
The overall time complexity of the above solution is now O(|S| * |S| * \max_{i=1}^{i=N} |P[i]| + \sum_{i=1}^{i=N} |P[i]|), where the first part is for dynamic programming calculating and the second part is for building the trie. The space complexity of the solution is O(|S| * |S| + |N| * |P| * 26), where the first part is for the dynamic programming table and second is for the trie of patterns.
Once, you are clear with the above idea, you can see the author implementation below for help.
Editorialist’s solution
We can use hashing for string comparison as well. So, we hash all the patterns and store the cost of them in a map. We also store the hash of text T as well. Now, we calculate the cost of every substring as follows:
# H[S] denotes the hash of stirng S.
# mp[H[P]] stores the cost for pattern P.
for i in [0, |S|-1]:
for j in [i, |S|-1]:
w = H[S[i..j]]
if w is present in mp:
cost_l[i][j] = mp[H[P]].first
cost_r[i][j] = mp[H[P]].second
else:
cost_l[i][j] = INF
cost_r[i][j] = INF
The time complexity for the above approach is O(\sum_{i=1}^{i=N} P[i] + |S| + |S| * |S| * \log{N} + |S| * |S| * \max_{i=1}^{i=N} |P[i]|), where the first 2 parts are for building the hashes, the third for the above pseudo-code (\log{N} for searching the map) and last part for dynammic programming approach. The space complexity of the approach is O(|S| * |S|) which is required for storing the dynammic programming table and also the cost of every substring.
NOTE: For the hash based solution the string lengths are small and also the number of strings to consider i.e. N are large so there can be chances of collision. So, it is preferred to use double hashing in this problem. For details refer to this blog on codeforces.
Once, you are clear with the above idea, you can see the editorialist implementation below for help.
Feel free to share your approach, if it was somewhat different.
Time Complexity
O(|S| * |S| * \max_{i=1}^{i=N} |P[i]| + \sum_{i=1}^{i=N} |P[i]|) or O(\sum_{i=1}^{i=N} P[i] + |S| + |S| * |S| * \log{N} + |S| * |S| * \max_{i=1}^{i=N} |P[i]|)
Space Complexity
O(|S| * |S| + |N| * |P| * 26) or O(|S| * |S|)
AUTHOR’S AND TESTER’S SOLUTIONS:
Author’s solution can be found here.
Tester’s solution can be found here.
Editorialist’s solution can be found here.