PROBLEM LINK:
Author: Ankit Srivastava and Uttam Kanodia
Tester: Roman Rubanenko
Editorialist: Amit Pandey
DIFFICULTY:
Easy-Medium
PREREQUISITES:
Basic programming
PROBLEM:
Given a string containing ‘(’ and ‘)’ only. Find the longest such subsequence of the given string that is non-regular. Amongst all such distinct answers, output the lexicographically K^{th} amongst them.
QUICK EXPLANATION:
If the given string in not balanced, then the string itself is the largest non-regular(unbalanced) pattern. On the contrary, if the given string is regular(balanced), then all subsequnces having length (N-1) will be non-regular and there is a pattern in their lexicographic ordering.
EXPLANATION:
The problem can be broadly classified into two cases:
Case I: Given parenthesis string is not balanced, which can be checked using stack data structure described here. In this case, the largest unbalanced sub-sequence will be then the given string itself. So if K=1, then the answer will be the given string, -1 otherwise.
Case II: Given parenthesis string is the balanced one. Consider that L is the length of the given string, then there will L sub-sequences of length (L-1), and each of them will be unbalanced. This is because, in any sub-sequence, number of opening and closing parenthesis will not be the same.
How to find the number of distinct unbalanced sub sequences?
Consider a string S = “(())”, what will be number of distinct unbalanced sub-sequences? it can be easily claimed that if we delete S[0] or S[1], we will get same string i.e. “())”. So, it can be seen easily that if we delete any character from a contiguous chunk of the characters, we will get same unbalanced sub-sequence. It can be formally written as follows:
Consider the any contiguous chunk of same characters in the given string, Suppose S[i] = S[i+1] = S[i+2] =\ ....\ = S[j] =\ '(' or ')'. If we delete any character between i^{th} and j^{th} positions (both inclusive), it will produce the same sub-sequence, because deleting any of them will replace (j-i)+1 number of same characters by (j-i) number of same characters. So number of distinct sub-sequences of length (L-1) will be number of different contiguous chunks of same characters. For example, string (())((())) will have 4 distinct sub-sequence of length 9, and each of them will be unbalanced.
How to do lexicographic ordering of those sub-sequences?
Suppose that the string which we generate by deleting a character from the i^{th} chunk (0-indexed) is L_i. As our given string is balanced, it will consist of one or more opening parentheses and one or more closing parentheses alternatively (starting with opening). So, to produce L_{2n} we need to delete an opening parenthesis, and to produce L_{2n+1}, we need to delete a closing parenthesis. Now, let me claim that the lexicographic ordering:
So for string S = ''(())()((()))(())", the lexicographic order will be L_1 < L_3 < L_5 < L_7 < L_6 < L_4 < L_2 < L_0. To produce lexicographically 3^{rd} string we need to produce L_5 by deleting any of the bold characters : “(())()((()))(())”.
Proof:
Consider the proof L_{2i+1} < L_{2j+1},\hspace{2 mm} \forall \hspace{2 mm} (i < j). Others can be proved in the same manner.
Consider that the i^{th} chunk has A_i number of characters. Now consider the (A_1 + A_2 + .... A_{2i+1})^{th} character in L_{2i+1} and L_{2j+1}, it will be ‘)’ in L_{2j+1}, as i < j, and it will remain the same as in the original string, but it will be ‘(’ in L_{2i+1} and all previous characters will be the same in both of the strings. As opening parenthesis is lexicographically smaller than closing parenthesis, hence L_{2i+1} < L_{2j+1}.
Solution:
Setter’s solution can be found here
Tester’s solution can be found here