KGP16K- Editorial

vijju123 · February 13, 2018, 2:42pm

PROBLEM LINK:

Contest
Practice

Setter- Arjun Arul
Tester- Multiple
Editorialist- Abhishek Pandey

DIFFICULTY:

MEDIUM

PRE-REQUISITES:

KMP Algorithm, Looping, Logical thinking.

(Similar pattern finding algorithm may also work. The editorial is based on KMP Algorithm for pattern finding.)

PROBLEM:

Given a pattern and text, you need to find how many times the pattern, or a power of the pattern exists in the text.

QUICK EXPLANATION:

We notice that “power” of the sequence makes it unfeasible to do KMP Search on original array. So we modify it to store the character/number and how many times it occurs consecutively. We can now do KMP search in this array. We first search for where the pattern exists, irrespective of whether its a “power” of it or not. Then we do KMP search for, if some “power” (of any sequence) exists, irrespective of what sequence it is, or if its the given sequence or not. Then, we find where both of these (i.e. pattern and power) intersect, i.e. things/indices/positions which are common in both. The number of common instances gives us the answer.

EXPLANATION:

This is a very specific question in my opinion. Its quite easy actually, if you know what KMP Algorithm, or any other pattern finding algorithm works. If you dont, then it might be a bit hard for you. Nonetheless, I will try to make it as easy as possible. Just make sure to head over and learn KMP Algorithm first!! The link is in pre-requisite.

The editorial is divided into 3 parts discussing 3 steps each helping to find the answer, and one final conclusion on how to finally calculate answers.

1. Making the array/text KMP-Searchable-

Reading the question, and the KMP algorithm, one thing is clear- We cannot directly apply KMP Search in original array. Usual KMP Search works for power 1 only.

The Problem- When power (say k) is more than 1, same consecutive characters occur k times. It will hinder our construction of lps array, and will give a big difficulty in comparing text and pattern to determine if pattern or its power exists or not.

Solution- Its actually simple. We “reduce” the array to power 1. What we do, is, that for ever character ch, we store the character and how many times it occurs consecutively after that.

Eg-

11111 is stored as 1 5
44 is stored as 4 2
100 is stored as 100 1

& etc.

We can easily do so by creating 2 arrays, or our own custom data type. Now we can do KMP Search on this array using normal KMP Algorithm!!

Note that we must similarly modify the pattern as well!!

2. KMP Search of Pattern, irrespective of power-

This step is nothing but usual KMP Search. We will check at what places the pattern exists, IRRESPECTIVE of the fact if its a valid power or not.

Eg-

If pattern is abccd , and we see in our array a 1 b 1 c 4 d 1 , we will store this in "pattern exists" because a b c d occur (Representation by alphabets just for clarity- representation by numbers in next example)

If pattern is 50 40 10 10 10 20 and out (modified) array is "(50 3) (40 2) (10 5) (20 7)", we store this location (starting index of the pattern) because the sequence "50 40 10 20" occurs in it. We will bring in  powers in next step.

So not much for this section. Its the usual KMP Search, we will just search for pattern, and store the starting index of the pattern.

3. KMP Search for Powers-

Now we bring in powers. This section needs to address some questions.

First, how do we find power?

We already stored how many times a character occurs in text, along with the character, so this will not be problem. The problem will be, finding what power is it. Is it to the power 1? to the power 50? How do we accurately check that? This is actually more complex than one thinks and has an edge case in it. Hence, its divided into 2 parts.

a. Sequence has more than 4 elements-

We have one problem here. Say our sequence was 1 2 3 4 and our text is 1 1 1 2 2 3 3 4 4 4 4 4. How do we check that the text has the power 2 of the sequence?

In above example, our modified array will look like {(1 3},{2 2},{3 2},{4 5}). If we start from second 1, and end at second 4, we will find the pattern. What I want to say is, we cannot rely on starting and ending element to determine the power. All characters of starting and ending element may not be included in pattern! Eg- Here 4 occurs 5 times but only two fours are included in pattern.

However, note that middle elements are always included in pattern wholly. This will help us finding the power. What we will now do, is that, we will compare ratio of occurrence of second and third element in text to second and third element in sequence. As an example, in above example, we will check ratio of element 2 and element 3 in sequence and in text. If its equal, then it represents a valid power. However, checking ratio will involve floating point and decimals. So lets tinker the expression a little as-

Let Text(i)= Current Frequency element i occurs in text, and Pat(i)=Current Frequency element i occurs in pattern. The proposed formula is-

Text(i+1)/Text(i)==Pattern(i+1)/Pattern(i)

A simple cross multiplication and we get-

\implies Text(i+1)*Pattern(i)==Pattern(i+1)*Text(i).

Thus, we get rid of any decimals. In a gist, to see if the sequence in text is actually a power of given sequence, we check that ratio of Text(i+1)/Text(i)==Pattern(i+1)/Pattern(i) for all i such that i is not the starting and ending element of the pattern.

b. When sequence has only 3 elements-

This is a bit tricky. We dont have 2 middle elements to find if the sequence in text is a valid power or not. So, now what?

Turns out, we can easily deal with this case as well. All we need to make sure that we have “sufficient” starting and ending elements, which can combine with frequency of middle element to give a power.

Eg- if original sequence is “1 2 2 3” and original text is “1 1 1 2 2 2 2 3 3” , we see that middle element 2 occurs 4 times in text, and 2 times in sequence. This hints a power of 2. Now we see if we have sufficient 1s and 3s , as such to satisfy power 2 of the sequence. In this case, we clearly have that.

So if we have only 3 elements in the sequence, the test in part a. wont hold good, we just check for pattern and use if-else for checking if we have sufficient starting and ending elements or not. You can look at eidtorialist’s solution for details.

4. Concluding the final answer-

Now, we have positions where we see a valid pattern exists. Then, we also have positions where a valid power exists. We need to find number of positions where a valid pattern and a valid power exists- which is nothing but number of common elements in then.

So, we take the elements common in then, do checks like-

Checking for power - If 2 middle element of pattern is divisible by middle element of sequence found in text. Only once is sufficient- must be done especially when pattern has only 3 distinct elements/sequences.
Checking we have sufficient starting and ending elements- again, must be done especially for pattern containing only 3 distinct sequences.

And thats how we are done with another question

SOLUTION:

Editorialist’s Solution

TimeComplexity=O(N)
SpaceComplexity=O(N)

CHEF VIJJU’S CORNER:

This question seemed to be a bit on implementation side as well. But honestly, even at World Finals level of ICPC, we do get some implementation based question, so I will suggest better practice and suffer now than suffer then.
One interesting thing is - The worst case complexity of KMP Search is O(N). What if I ask you to prove it? Think a while, its an interesting one! (Hint: See how many comparisons made in worst case for different values of M and N, find that the maxima is at value \sqrt{N} )
As usual Arjun Arul’s question give me a hard time passing XD. Obligatory Screenshot-