ANKPAREN - Editorial

amitpandeykgp · June 19, 2015, 4:38pm

PROBLEM LINK:

Practice
Contest

Author: Ankit Srivastava and Uttam Kanodia

Tester: Roman Rubanenko
Editorialist: Amit Pandey

DIFFICULTY:

Easy-Medium

PREREQUISITES:

Basic programming

PROBLEM:

Given a string containing ‘(’ and ‘)’ only. Find the longest such subsequence of the given string that is non-regular. Amongst all such distinct answers, output the lexicographically K^{th} amongst them.

QUICK EXPLANATION:

If the given string in not balanced, then the string itself is the largest non-regular(unbalanced) pattern. On the contrary, if the given string is regular(balanced), then all subsequnces having length (N-1) will be non-regular and there is a pattern in their lexicographic ordering.

EXPLANATION:

The problem can be broadly classified into two cases:

Case I: Given parenthesis string is not balanced, which can be checked using stack data structure described here. In this case, the largest unbalanced sub-sequence will be then the given string itself. So if K=1, then the answer will be the given string, -1 otherwise.

Case II: Given parenthesis string is the balanced one. Consider that L is the length of the given string, then there will L sub-sequences of length (L-1), and each of them will be unbalanced. This is because, in any sub-sequence, number of opening and closing parenthesis will not be the same.

How to find the number of distinct unbalanced sub sequences?

Consider a string S = “(())”, what will be number of distinct unbalanced sub-sequences? it can be easily claimed that if we delete S[0] or S[1], we will get same string i.e. “())”. So, it can be seen easily that if we delete any character from a contiguous chunk of the characters, we will get same unbalanced sub-sequence. It can be formally written as follows:

Consider the any contiguous chunk of same characters in the given string, Suppose S[i] = S[i+1] = S[i+2] =\ ....\ = S[j] =\ '(' or ')'. If we delete any character between i^{th} and j^{th} positions (both inclusive), it will produce the same sub-sequence, because deleting any of them will replace (j-i)+1 number of same characters by (j-i) number of same characters. So number of distinct sub-sequences of length (L-1) will be number of different contiguous chunks of same characters. For example, string (())((())) will have 4 distinct sub-sequence of length 9, and each of them will be unbalanced.

How to do lexicographic ordering of those sub-sequences?

Suppose that the string which we generate by deleting a character from the i^{th} chunk (0-indexed) is L_i. As our given string is balanced, it will consist of one or more opening parentheses and one or more closing parentheses alternatively (starting with opening). So, to produce L_{2n} we need to delete an opening parenthesis, and to produce L_{2n+1}, we need to delete a closing parenthesis. Now, let me claim that the lexicographic ordering:

L_{2i+1} < L_{2j+1}, \hspace{2 mm} L_{2i} > L_{2j} \hspace{2 mm} for \hspace{2 mm} (i < j)

L_{2i+1} < L_{2j} \hspace{2 mm} \text{for all} \hspace{2 mm}(i,j)

So for string S = ''(())()((()))(())", the lexicographic order will be L_1 < L_3 < L_5 < L_7 < L_6 < L_4 < L_2 < L_0. To produce lexicographically 3^{rd} string we need to produce L_5 by deleting any of the bold characters : “(())()((()))(())”.

Proof:
Consider the proof L_{2i+1} < L_{2j+1},\hspace{2 mm} \forall \hspace{2 mm} (i < j). Others can be proved in the same manner.

Consider that the i^{th} chunk has A_i number of characters. Now consider the (A_1 + A_2 + .... A_{2i+1})^{th} character in L_{2i+1} and L_{2j+1}, it will be ‘)’ in L_{2j+1}, as i < j, and it will remain the same as in the original string, but it will be ‘(’ in L_{2i+1} and all previous characters will be the same in both of the strings. As opening parenthesis is lexicographically smaller than closing parenthesis, hence L_{2i+1} < L_{2j+1}.

Solution:

Setter’s solution can be found here
Tester’s solution can be found here

vamsi_ism · June 22, 2015, 12:12am

Can any one help me understand why this code (gist link) is giving NZEC?

Edit: A slightly modified solution (gist link) to consider bad input gave me AC.

amitpandeykgp · June 22, 2015, 12:14am

Could you tell me your approach(in short)?

vamsi_ism · June 22, 2015, 12:21am

First, I am checking if string is regular or not.
If it is regular, Kth subsequence can be obtained by removing ‘)’ from Kth group of ‘)’ (I mean continuous sequence of ‘)’) from left to right.
If there are not enough groups of ‘)’ (i.e K is greater) then Kth subsequence can be obtained by removing ‘(’ from (K - no.of ‘)’ groups)th group of ‘(’ from right to left.

luc4sdreyer · June 22, 2015, 12:27am

I’m getting NZEC but I can’t understand why. I’ve verified my answers for all possible input strings up to length 12, and thousands of random input strings up to max length. Here is my code, it uses the same method as the official solution. public static String ANKPAREN(char[] s, int k, String str) on line 87 is the method that does the actual work.

EDIT: to be clear I don’t care if my solution is correct or not, I’m just looking for a way that it can break (NZEC).

EDIT 2: I found the problem: when I read the input I have to say:

String s = scan.next();

instead of

String s = scan.nextLine();

I always use nextLine() in CodeChef competitions, and both cases work on the sample input. This means my answer was correct but failed due to bad input. This is very disappointing.

arnab_0017 · June 22, 2015, 12:29am

what will be the answer for an empty sequence as in the question it is mentioned to be regular ?

amitpandeykgp · June 22, 2015, 12:30am

Length of the given string is greater than or equal to 1. It is given in the constraints.

dollarakshay · June 22, 2015, 12:41am

Blockquote
“So, to produce L2n we need to de…”

what is L2n ? more specifically what is n ?
If n is the length of the string then how can you delete the 2n th character ?

amitpandeykgp · June 22, 2015, 12:43am

Sorry for confusion. Length of the string is N not n. L_2n is the subsequence which gets generated by deleting a character from (2n)th chunk.

ar56 · June 22, 2015, 12:46am

Please help me to understand the following sentence of yours. How did you come up with this lexicographic order?

“So for string S=′′(())()((()))(())”, the lexicographic order will be L1<L3<L5<L7<L6<L4<L2<L0. To produce lexicographically 3rd string we need to produce L5 by deleting any of the bold characters : “(())()((()))(())”."

mohit22995 · June 22, 2015, 12:47am

Seems like there is some problem with input. My submission using BufferedReader gave NZEC while one using Scanner passed.

amitpandeykgp · June 22, 2015, 12:56am

You have problem in the statement or in the proof?

ironmandhruv · June 22, 2015, 1:01am

nice editorial!

snk967 · June 22, 2015, 1:01am

The links to the solutions do not work [ Access Denied ].

luc4sdreyer · June 22, 2015, 1:06am

My solution failed due to bad input, maybe you have the same problem: http://discuss.codechef.com/questions/71904/ankparen-editorial/72019

luc4sdreyer · June 22, 2015, 1:07am

I had a similar same problem, but I only realized it after the competition

I wonder if the mods will consider rejudging this problem.

dpraveen · June 22, 2015, 2:34am

Hi luc4sdreyer,

We are investigating the issue. If such a issue is there, we will rejudge the problem.

a · June 22, 2015, 7:27am

@code_master01 & @code_note & @amitpandeykgp.: The editorial is great. Thanks for that. But let me ask you one small thing…as you have said there are 2 key ideas -1. one of chunk 2.other of
L2i+1<L2j+1,L2i>L2jfor(i<j) L2i+1<L2jfor all(i,j). How did you arrive at this? Can you please tell me the intuitive idea? What ideas did you have before setting this beautiful question/writing the editorial…hope that will be helpful to beginners like me.

ar56 · June 22, 2015, 11:55am

I’m not getting the lexicographic order you’ve mentioned.

rajan_parmar · June 22, 2015, 12:17pm

@ar56
Given sequence is S="(())()((()))(())", You know that ‘(’ < ‘)’ . So let ‘(’ = 1 ans ‘)’ = 2.

Now S = 11 22 1 2 111 222 11 22

L0 = By deleting any character from 0th pair i.e 11 .

L1 = By deleting any character from 1st pair i.e 22 . and so on.

Therefore L1 will be : 11 2 1 2 111 222 11 22

L3 = 11 22 1 111 222 11 22

L5 = 11 22 1 2 111 22 11 22

L7 = 11 22 1 2 111 222 11 2 and so on .

As it become clear that L1<L3<L5<L7 . . .

lexicographically largest will be L0 = 1 22 1 2 111 222 11 22 .

Yet I want to know @amitpandeykgp how did you come with the scenario . Amazing question .