PROBLEM LINK:
Author: Sergey Kulik
Tester: Vlad Mosko
Editorialist: Pushkar Mishra
DIFFICULTY:
Easy-Medium
PREREQUISITES:
None
PROBLEM:
Given a string S formed of As and Bs, we need to output a string T that is a subsequence of S and appears only twice in it.
EXPLANATION:
Subtask 1
For this subtask, the length n of string S is less than or equal to 3. That’s a fairly small number. We can easily consider all the cases:
- If the string S is of length 1 then the answer is -1.
- If the string S is of length 2 then the answer is -1 if the two letters in it are different. If they are the same then the letter is the answer. For example, if S = AA then answer is A.
- If the string S is of length 3 then the answer is -1 if all letters are the same. If not then the letter which appears twice is the answer.
Clearly, the complexity is \mathcal{O}(1) per test case.
Subtask 2
For this subtask, the length n of string S can be up to 10. The previous algorithm won’t work anymore. We can use bitmasking though. Let us treat As as 0s and Bs as 1s. We modify the string S to S' which contains 0s and 1s instead of As of Bs. Now, we iterate over each subsequence of S'. There can only be at max 2^{10} subsequences since n can be at max 10. Each subsequence represents a binary number. We convert it to decimal. This decimal must be between 0 and 1023 inclusive. We can keep an array counter[0..1023] to count the number of occurrences of each subsequence. So when a subsequence that is binary representation of number i appears, we add 1 to counter[i]. If there is a subsequence with exactly 2 occurrences, i.e., if counter[i] = 2, we output its corresponding string containing As and Bs.
The complexity of this algorithm is \mathcal{O}(2^n) per test case where n is the length of the string S.
Subtask 3
For this subtask, the length of string S can be as large as 5000. Clearly, the previous two approaches won’t work. We need to make some crucial insights into which subsequences can be possible candidates for the answer.
Let us assume that for a given string S, a string T only appears as a subsequence only twice; once at S \{ i_1, i_2, \dots, i_{|T|} \} and other at S \{ j_1, j_2, \dots, j_{|T|} \} such that i_1 < j_1. But if both of these subsequences form T then so does S \{ i_1, j_2, \dots, j_{|T|} \}. This is a contradiction. In other words, a string T can’t appear ONLY twice in this format wherein the two appearence differ by some position.
That eliminates many cases. The only ones left are these:
-
When there are only 2 occurrences of a letter in the given string S. For example, S = ABAABAAAA. In this case, the answer is simply that particular letter, i.e., B in this case.
-
When there exists a double occurrence of a letter which is surrounded by the other letter. For example, S = AABBAABABBA. In this case, BB appears at the index 2 (or 8; we can consider any) and has As before and after it. In this case, the answer is the entire string excluding one out of the two occurrences of B, i.e., AABAABABBA. This can’t work for triplets or more because {N \choose k} \neq 2 for N \geq 3.
So, all we need to do is iterate over the string and either determine if a letter only appears twice or look for a doublet surrounded by the other letter, i.e., the one not forming the doublet. This can be done in one scan of the string S. Thus, the complexity is linear in the size of the input string.
Editorialist’s program follows the editorial. Please see for implementation details.
OPTIMAL COMPLEXITY:
\mathcal{O}(N) per test case where N is the size of input string S.