KOL1510 - Editorial

kevinsogo · December 26, 2015, 6:31am

PROBLEM LINK:

Author: admin
Tester: Kevin Atienza
Editorialist: Kevin Atienza

PREREQUISITES:

Shortest common supersequence, backtracking

PROBLEM:

Given K \le 16 binary strings s_1, \ldots s_K (with alphabet {B, G}), each of length up to 8, find the shortest common supersequence. If there are multiple solutions, output any.

QUICK EXPLANATION:

Just check each binary string up to length 16 in increasing order of length, and output the first one containing all K strings as subsequence. This works because every string of length up to 8 is a substring of the following string:

BGBGBGBGBGBGBGBG

EXPLANATION:

The small bounds of the problem suggest a brute-force approach may pass. A simple brute-force solution is as follows: try all possible strings in increasing order of length and find the first one having all K strings are subsequences. However, we don’t know how long such a solution might run. It may happen that this approach takes a very long time, so we need to find some upper bounds.

The most obvious upper bound is just the sum of the lengths of all strings, because a common supersequence can easily be built by appending all K strings together. However, obviously this is a very loose upper bound, and not a very useful one: the total length of all strings may reach 16\times 8 = 128, and enumerating all strings of length up to 128 is basically impossible.

Let’s take a look at small case. Take all K = 16 binary strings of length 4. Let’s find a shortest common supersequence. The first string that a brute-force solution gives is BGBGBGBG (or GBGBGBGB depending on the order you checked the strings.) Similarly, taking all K = 32 binary strings of length 5 gives the string BGBGBGBGBG (or GBGBGBGBGB). It seems that we have a pattern here: All binary strings of length L seems to be contained in the BGBGBG... string of length 2L. What’s so special about these alternating strings?

Well, it turns out that we can easily extract each length-4 string from BGBGBGBG. First, group the characters this way: (BG)(BG)(BG)(BG). Now, an arbitrary length-4 string can now be extracted as a subsequence of this by simply picking the correct letter from each group!

In fact, this gives us a better upper bound for our brute-force solution: 2M where M = \max_i |s_i|. In the current problem, this means that the answer is at most 16 characters, which leaves us with a much smaller set of strings to check, in fact so small that it makes the brute-force solution described above very fast!

The algorithm above runs in O(KM4^M), because it takes O(KM) time to verify each of the O(2^{2M}) strings of length up to 2M. With backtracking, this can be reduced to O(K4^M), though such a solution isn’t necessarily faster due to implementation overhead.

Implementation

Let’s describe some implementation details.

The first is about enumerating all strings of a certain length, say L. One can use backtracking for this, although there’s another way using bitmasks. We can think of a binary string as just a bit string of length L, or a nonnegative integer from 0 to 2^L-1, where the i th bit is on iff the i th character of the string is a G. Converting a bit string into a “BG” string, and vice versa, can be done straightforwardly with bitwise operations, but this makes enumerating all strings of length L easier: Simply loop through all values from 0 to 2^L - 1! (The latter can be written using bitwise shift operators as (1 << L) - 1.)

In fact, we can do away with “BG” strings altogether and just use bit strings all throughout! Depending on your implementation, you may gain some speedup from doing this.

You also need to be careful with the builtins and library functions you use. For example, in you might find some speedup by using char arrays instead of using std::string or Java’s String class.

Time Complexity:

O(K 4^{\max |s_i|})

AUTHOR’S AND TESTER’S SOLUTIONS:

setter
tester