Problem Link
Author: Stacy Hong
Tester: Misha Chorniy
Editorialist: Bhuvnesh Jain
Difficulty
HARD
Prerequisites
Branch and Bound, Dynamic Programming, Big Integers, Hashing, DFS/Recursion
Problem
You are given a string S and integer N. So need to place some ‘+’ symbols in the strings S so that the value of the expression equals N.
Explanation
Subtask 1: N < 1000000
A simple knapsack like dynamic programming is sufficient to pass this subtask. The dp state is as follows:
A recursive pseudo-code for the above logic is given below:
def solve(int idx, int remain):
if idx == len(s):
return remain == 0
if dp[idx][remain] != -1:
return dp[idx][remain]
dp[idx][remain] = 0
num = 0
for i in [idx, len(s) - 1]:
num = num * 10 + s[i] - '0'
if (num > remain):
break
if (solve(i + 1, remain - num)):
PLUS[i] = 1;
return dp[idx][remain] = 1
return dp[idx][remain];
solve(0, N) # N = integer value of string s in input
for i in [0, len(s) - 1]:
if i and PLUS[i]:
print '+'
print s[i]
The above code works in time complexity O(|S| * N) where |S| is the length of string S and N is the given integer. The space complexity is also same. Note that Codechef servers allow programs using such large memory (around 1G) to pass but you can you bitset or other equivalent structures to reduce the memory of your program too. See the linked solution below for subtask 1 for more details.
The solution for this subtask also exists using hashing and dynamic programming.
Subtask 2: N < |S|
Adding 2 numbers of sizes x and y leads to final result having size at most max(x, y) + 1. This is easy to prove. The size will be max(x, y) if there is no carry-over in the end otherwise it will increase by 1. Extending the above logic to addition of m number, we can conclude that if the numbers have lengths x_1, x_2, \cdots, x_m, then the length of final result is bounded by (max(x_1 + x_2 + \cdots + x_m) + m - 1). You can easily prove it using induction.
Note that number of ways to partition the string S into different expression is exponential. But using the above observation, you can the conclude the following fact:
Given a string S and integer N, having the length as n, there is very less number way to achieve the target N if n is comparable to |S|. This is because most of the partitions will either have the maximum number in them as too low. Equivalently, if the number of partitions we put in S is large, then achieving a larger target N is not possible. This hints that for sufficiently large integers, the greedy technique of choosing a larger size partition and checking if the remaining part can add up to desired N will run very fast in practice as the number of iterations will not be large. For example: S = 114390211, N = 43915
Considering greedily large size partition of S such that their value is less than N, we have the following numbers: [11439, 14390, 43902, 39021]. (Note that the size of this selected numbers can be at most (|S| - n + 1).) Let us greedily start with 43902. The remaining parts of S are (11, 11) and the required sum now is (43915 - 43902) = 13. Note that there are 2 ways to achieve it (11 + 1 + 1) \text{ or } (1 + 1 + 11). As you can see, the number of iterations were less. The example is just a short one to make to understand how greedy recursion might behave for larger test cases.
But the other case where the integer N is small but |S| is very large, there can be large number of ways to achieve the desired result. For this, we have already seen that a dynamic programming solution already exists (Subtask 1). Trying the above greedy approach can be very bad in this case, a simple example being (S = 99999999999, N = 108).
With some of the above ideas, we design a simple branch and bound based algorithm for the problem:
range_tried = [0 for i in [1, len(S) - 1]]
# Find all range in S such that their value is less than N and sort
# them in decreasing order of their value. Let it be called "RANGES"
def recurse(range_idx, remain):
if remain < 0:
return false
if remain < LIMIT: # LIMIT ~ 100000
use dp based subtask_1 solution
else:
for i in [range_idx, len(RANGES) - 1]:
if conflict(current_range, range_tried):
# If current range we are trying conflicts with one
# already tried in recursion before.
continue
X = value of integer for RANGES[i]
# Update range_tried
if recurse(range_idx, remain - X):
return True
# Update range_tried to old value
return False
Note the above is a simple solution based on the initial observations. But do we need to really check for all possible ranges? Can we decide greedily at some point that given range can never result in an answer as N, i.e. Say we have selected some ranges, can we say that with the remaining ones we can never achieve the target N without checking all possible partitions or greedily checking large number ranges.
Actually, given initial choice of our ranges, we can bound the maximum number we can achieve with the remaining ones. A simple check which ignores the digits of remaining parts of S and considers all of them to be 9 and finds the maximum value possible is a good starting point. If this value is already less than N, then we can simple prune our solution. Even stronger checks based on actual values of digits in string S can lead to better pruning. So, the loop in the above code modifies as follows:
for i in [range_idx, len(RANGES) - 1]:
if conflict(current_range, range_tried):
# If current range we are trying conflicts with one
# already tried in recursion before.
continue
MAX_POSSIBLE = get_max(ranges_set)
if MAX_POSSIBLE < N:
# As we iterate in decreasing order of numbers, the
# MAX_POSSIBLE can only decrease as 'i' increases
break
X = value of integer for RANGES[i]
# Update range_tried
if recurse(range_idx, remain - X):
return True
# Update range_tried to old value
Another thing we can do is to remove the early exit of recursion where a check is based on “remain < 0”. This can be easily done by directly starting from ranges such that value of considered numbers is always less than “remain”. This is again helpful as after greedily choosing a large size partition, it is possible in most case the other large size partitions should be ignored in further recursion either due to conflicts in common ranges or “remain” decreasing too fast to become less than 0. For this, a simple binary search can help us to find the first index in “RANGES” from where we should begin our search. This is possible as we had initially stored our ranges in decreasing order of the value of integers they represent.
With the above ideas, the recursive solution based on branch and bound works quite fast in practice. A small analysis of the time complexity is as follows:
-
N < 100000: Assuming we set LIMIT to this value in above pseudo-code. A dynamic programming based solution is run with complexity O(|S| * N). Again to optimise this, we can use recursive version instead of iterative version. This is because will work in exactly the above complexity but in recursive solution, most of the states will be unvisited due to our initial choices of the ranges. This will significantly reduce the constant factor in your code.
-
For most of the other cases, as N becomes larger i.e. size n reaches closer to |S|, the possible solutions reduce significantly as explained before.
A more detailed analysis of time complexity is will available soon.
Ashmelev solution (Fastest in the contest): Branch and Bound with strong bound checking
Click to view
The solution is based on the following two ideas:
- Let’s consider all possible partitions of S. Of course, it is extremely slow, but if we iterate from big substrings to small, the required sum (N) will be decreased fastly. I used recursion to find all possible partitions: function
int go(int pos, string cur, int c, int csum)
where “pos” - number of the last taken substring (all substrings were sorted in decreasing order), “cur” - remaining necessary value (initially - N), “c” and “csum” - some useful variables.
The same function - “goSmall” - used for small numbers, where “cur” fits long long.
- This recursion is very slow still, it is necessary to add an additional check whether the solution may exist and exit the recursion in the bad case. I implemented function “tooBig” (and for small numbers - “tooBigSmall” ) that checks whether remaining number “cur” is too big and cannot be the sum of the remaining substrings, including that we may only use substrings starting from “pos”. This function should be quite smart to make the recursion algorithm really fast. I used DP approach to find the largest possible number we may obtain from the remaining substrings - “getans2” calculates this number approximately for string “cur” and “getans3” calculates this number exactly for small (long long) “cur”.
Tips for Big integer library (mostly for C++ users)
Languages like python and java already have big integer library implemented in them. But C++ users need to implement the same for their usage in this problem. A small tip is to store the numbers as groups instead of single digits. For example: S = 123456789123456789. Below are 2 possible ways to store S:
This helps to perform operations on base different than 10 and reduces the constant factor of your program. Generally, the base is chosen as {10}^{9} so that all possible operations like (+, -, * , /) fit inside the datatypes provided by the language. You can see setter’s library for example.
Time Complexity
To be described in detail later.
Space Complexity
To be described in detail later.
Solution Links
The solution for subtask 1 can be found here
Setter’s solution can be found here
Ashmelev’s solution can be found here