STRDIG - Editorial

utkarsh_lath · August 19, 2013, 12:35am

Problem Link:

Practice
Contest

author: Tasnim Imran Sunny
tester: Roman Rubanenko
editorialist: Utkarsh Lath

Difficulty:

Easy-Medium

Pre-requisites:

ad-hoc, implementation

Problem:

A very long string P is formed by concatenating decimal representation of all numbers from 1 to 1e9. Thus

P = “123456789101112131415161718192021222324252627282930313233343536 … 9999999991000000000”

You are given some query strings, and for each query string Q, you need to find if it is a substring of P.

Quick Explanation:

The solution is based on following ideas:

All strings of length upto 9 appear somewhere in P.
Q does appear in P iff, for some integer a, Q is of the following form and 1<=a-1<a+l<=10^9

Q = (some suffix of str(a-1)) + str(a) + str(a+1) + str(a+2) … str(a+l-1) + (some prefix of str(a+l))

no of digits in ‘a’ can be at most 10, and str(a) starts somewhere among the first 9 positions in Q.
If Q is long enough, there can be at most 45 possible values of a. For each value of a, test if Q is of the above form.
If Q is not long enough, we still know a prefix of str(a) and a suffix of str(a-1). Check if there is a compatible ‘a’.

Explanation:

If |Q| < 10, then Q is always a substring of P. This is because, if

|Q|<=8, then “1”+Q is decimal representation of a number between 1 and 1e9, so it always appears in P.
|Q| = 9 and first character of Q is not ‘0’, then a = int(Q).
|Q| = 9 and all characters in Q are ‘0’, then Q appears in “1000000000”, hence in P.
|Q| = 9 and first non-zero character of Q is at position i. Then a = int(Q[i, 9]) * 10ⁱ + 1.

The problem is mainly implementation heavy, so here is psudo code for solution.



Q = input
n = |Q|  
if n<10 return true;
if Q=="1000000000" return true;
// Q spans over at at least two integers, if at all  
for offset = 0 to 9              // starting pos of str(a)
    for len = offset+1 to 10     // length of str(a)
        if solve(offset, len)
            return true
return false

As one can see, the above code tries to solve the problem for every possible starting position of str(a), and length of str(a). Here is how to solve if Q is “long enough”



bool solve(offset, num_digits)
    if offset + num_digits > n
        return solve2(offset, num_digits) // discussed later  
    if offset>=n or Q[offset]=='0'        // 'a' cannot have leading 0's
        return false  
    // str[a, b] is substring of str starting at position a(inclusive) and ending at b(exclusive)  
    a = int(Q[offset, offset+num_digits])
    prev = str(a-1)
    // if a-1 is 0, then offset would be 0 as well (offset< num_digits), so skipped it  
    if Q[0, offset] != prev[prev.length-offset, prev.length]
        return false  
    while offset < n
        if Q[offset, min(offset+num_digits, n)] != str(a)[0, min(n-offset, num_digits)]
            return false;  
        if a>1000000001
            return false
        a++
        offset += num_digits
        num_digits = str(a).length  
    return true

In above code, if the first iteration of while loop is skipped, because it would be “useless”, then make sure that a is 64 bit, otherwise an overflow might happen, as for

Q = “4294967297” + “0000000002”

Now, it remains to show how to solve if Q is “not long enough”.



bool solve2(offset, num_digits)
    if num_digits == 10
        return (Q[0, offset] == ("999999999")[0, offset])    and
               (Q[offset, n] == ("1000000000")[0, n-offset])
        // 'a' can be 10 digit only if a is 1e9  
    if Q[offset]]== '0'             // 'a' has leading 0's
        return false  
    suffix_of_a = str( 1+ int(Q[0, offset]) )[0, offset]
    suffix_len = suffix_of_a.length  
    prefix_of_a = Q[offset, n]
    prefix_len = prefix_of_a.length  
    common_len = suffix_len + prefix_len - n
    return prefix_of_a[prefix_len - common_len, prefix_len] == suffix_of_a[0, common_len]

Solutions:

You might find that Tester’s and editorialist’s code are shorter and cleaner than setter’s code.

Setter’s solution
Tester’s Solution
Editorialist’s Solution

bugkiller · August 19, 2013, 12:40am

Is it an easy-medium problem?

sobhagya · August 19, 2013, 1:36am

How can this be Easy-Medium problem, when only 18 solve this problem out of 1485 with 4.33% submission accuracy. Even, the top coders failed to submit at first go.

xellos0 · August 19, 2013, 1:39am

I’m surprised at the difficulty estimate, too. It’s true that the problem doesn’t require as much knowledge as DIVQUERY, but there’s so much casework that it’s much harder - you need to sweat out a lot of blood in order to make it work correctly.

phantom11 · August 19, 2013, 1:52am

Hats off to the problem setter. How could you think about the problem !!!

tripshock · August 19, 2013, 11:40am

Can we get the test cases for this problem? For that matter, why don’t you publish the test cases and solutions for all problems as part of the editorial?

utkarsh_lath · August 19, 2013, 5:13pm

@xellos0, @sobhagya, @bugkiller
We were ourselves quite surprised by the fact that only few people could solve this problem. We were expecting many more people to nail this one. About difficulty estimate, I think everyone would agree it is less harder than DFSGRID(which is medium-hard-ish) which has many more corner cases. So this problem can be at most medium.

Problems are rated by how difficult the ideas involved were, and the editorial gives a very clear demonstration that if you think neatly, there aren’t many corner cases. That said, the fact that medium problem was solved by more people is because of generally people have more experience in algorithms, and are lazy towards implementation heavy problems. That does not mean the implementation problems are hard by default.

utkarsh_lath · August 20, 2013, 10:04am

Solutions are provided for all problems, and editorials give away whatever corner cases are there. This information should be sufficient and you should not need the test cases in your hand to figure out what’s wrong with your code. You can always generate some random cases and compare the output of setter’s code with yours.

maweaver · August 20, 2013, 8:05pm

After rewriting several times due to many WAs, here is the method I ended up using, it is similar to the editorial method but I think has a few less edge cases.

I start at the beginning, and treat the first m digits of q as the last m digits of a k-digit string. I keep a working value as a string instead of an int, with ‘?’ for any unknown digits. The algorithm looks like this:

for numDigits = 1 through min(10, len(q)):
    for m = 1 through numDigits:
        cur = q[0:m], pre-padded with '?' so that len(cur) = numDigits
        if !valid(cur):
           continue;

        offset = m
        matches = true
        while offset < len(q):
            inc(cur)
            if !match(cur, offset) || !valid(cur):
               matches = false
            offset += len(cur)
        if matches:
            // Done, answer is YES

This uses a few sub-routines whose implementation is straight-forward:

inc : Increment the string, stopping when it hits a ‘?’
match : Check the string against q, assigning any ‘?’ to its corresponding value in q
valid : Check that the string does not start with 0 and is <= 10^9

The only special case is all zeroes, it will be marked as invalid by the valid() routine but is valid as long as its length is <= 9 (and is always invalid otherwise).

utkarsh_lath · August 20, 2013, 9:03pm

I think you forgot to update(replace ‘?’ with proper values) cur when comparing it with Q[m:] for the first time.

harrybauer247 · August 21, 2013, 4:08pm

I agree that this problem is harder then easy/medium if even very experienced coders can’t solve it at first.