KCOMPRES - Editorial

likecs · August 13, 2018, 3:01pm

Problem Link

Practice

Contest

Author: Bhuvnesh Jain

Tester: Mark Mikhno

Editorialist: Bhuvnesh Jain

Difficulty

MEDIUM

Prerequisites

Binary Search, Greedy, Segment trees.

Problem

You are given a sequence of integers A_1, A_2, \dots, A_N. For an integer K, let’s define a K-compressed sequence B_1, B_2, \dots, B_N as follows:

for each valid i, B_i is a positive integer.
for each valid i, if there are exactly X numbers smaller than or equal to A_i in the subsequence A_{\mathop{max}(1, i-K)}, \dots, A_{\mathop{min}(N, i+K)}, then there must be exactly X numbers smaller than or equal to B_i in the subsequence B_{\mathop{max}(1, i-K)}, \dots, B_{\mathop{min}(N, i+K)}.
B_1 + B_2 + \dots + B_N is minimum possible.

For a given integer S, find the number of values of K (0 \le K \le N) such that the sum of elements of the K-compressed sequence does not exceed S.

Explanation

Let us first understand how to construct a K-compressed array B for given array A and value K. For this, we will iterate over the numbers in increasing order and try to assign the smallest number not yet assigned to any number smaller than it until now. This will be optimal as we try to assign the smallest possible number to each number and iterating in increasing order ensures that if the number assigned to a given number can’t be decreased further, the sum can’t be minimised further. This proves our greedy strategy. Let me explain it through an example, where A = [4 2 8 1 4 3 8 1] and K = 3.

Let the current array B be [-, -, -, -, -, -, -, -]. We now group the numbers having the same value and iterate in increasing order. The following are the values of the array B after each step:

[-, -, -, 1, -, -, -, 1]. Since both 1 are the first number to be considered, we assign them the smallest possible number in B, i.e. 1.
[-, 2, -, 1, -, -, -, 1]. Since, index 2 has number 1 less than it in range [max(1, 2-3), min(8, 2+3)] = min[1, 5]. We assign it the next biggest number.
[-, 2, -, 1, -, 2, -, 1]. Since, index 6 has number 1 less than it in range [3, 8], we assign it next highest number not present in this range i.e. 2. Note that though A[2] < A[6], we can still have B[2] = B[6] as there ranges do not coincide to give a conflict.
[3, 2, -, 1, 3, 2, -, 1]. For both index 1 and 5, we have it as the third largest number in their respective range. So, we assign them the next biggest number which was not used.
[3, 2, 4, 1, 3, 2, 4, 1]. This is similar to the above process.

Thus, the minimum possible sum is (3+2+4+1+3+2+4+1) = 20. Make sure you are clear with the idea before you proceed further.

Now there are some issues which might occur while implementing the above approach. The first thing is that we should deal with all the numbers having the same value together should be dealt together instead of simply iterating in increasing order. One simple counterexample for this is the array A = [10, 30, 30, 20, 10] and K = 1. The compressed array should be B = [1, 3, 3, 2, 1] itself but if we simply iterate over numbers in increasing order, we might end up getting array B as [1, 2, 3, 2, 1] or [1, 2, 2, 2, 1] which is again highly dependent on your implementation.

The correct logic is to first group the numbers by their values. Find what value you might end up giving them based on their ranges. Then, we need to be sure that the value we might give is correct or now. For this, we again iterate over the numbers and group them if their ranges coincide with each other. We assign it the all the numbers in the group the largest number we thought of assigning to any number in the range. For the example A = [10, 30, 30, 20, 10] and K = 1, the process is as follows:

The initial array is [-, -, -, -, -].
[1, -, -, -, 1]. We group 10 first. The ranges for them are [1, 2] and [4, 5]. Since they do not conflict, we assign them separately the minimum number to them. Both of them end up getting 1.
[1, -, -, 2, 1]. Since there is only one 20, we simply assign it the smallest number in the range [3, 5] which is not assigned yet, i.e. 2.
[1, 3, 3, 2, 1]. The ranges for 30 are [1, 3] and [2, 4]. Since they both coincide, we will assign them value together. The value we might end up giving A[2] is 2 since it is the smallest which is not used till now in the range [1, 3] in B. The value we might end up giving A[3] is 3, which is the smallest not used till now in the range [2, 4] for B. Since the ranges coincide, we should give each of them 3, the maximum of the number we might think of assigning them.

Thus, we can easily build the K-compressed algorithm using the above ideas. But how fast can we do it?

Doing it naively will take O(N^2) as it will require you to find the smallest number not assigned yet in a range, taking O(N) for this step alone. But if you restate this problem, it is similar to the following 2 operations:

Update the number at given index.
Find the largest number in a range.

This is a very familiar problem which can be solved using segment trees in O(\log{N}) for each operation. You can read about it here.

Thus, we can built a K-compressed array for given array A and K in O(N * \log{N}) complexity.

To find what possible values of K will lead to compressed array having sum less than S, we can simply iterate over all possible values of K and update the answer. This approach has a complexity of O(N * N * \log{N}) which is enough to pass the first 2 subtasks.

The next thing to note that we can binary search on the answer. To prove this, we need to prove that the minimum sum for K-compressed array does not decrease with increasing K. This again relies on the way we build our K-compressed array using a greeedy algorithm. Since we make sure that each number is compressed to the smallest possible value and the sum can’t decrease with increasing K as the smallest number which might get assigned to a number can only increase if it’s range (or window) increases.

Thus, the overall complexity of the approach reduces to O(N * {\log}^2{N}). This is enough to pass all the subtasks as well.

Once, you are clear with the above idea, you can see the author implementation below for help.

Feel free to share your approach as well, if it was somewhat different.

Note from Author:

The test case data was bit weak for the small subtasks where some wrong greedy approaches passed as well. Though the large test case ensured, wrong solutions could not pass for the full solution, but still, I failed in generating stronger test data for smaller ones. This is not completely our fault as there are many ways the greedy solution for this compression could be written where even index errors could happen which implementing. I would like to thank all the people who testing their solutions as well and helped me strengthen the test as well (Special thanks to Stepan). But still, we alone cannot come up with all possible greedy solution which one might write, so stronger test could not be prepared by me. By the time, I came to know about this it was already to late for a rejudge, but I will take care of it in the future problem.

Also, the problem statement seemed quite tough to comprehend for most of the contestants as well. User acmonster even pointed out a flaw in the English statement as well. Below were his comments:

“I guess that the problem (and test data) actually requires that sequence B should preserve all the relative sizes for each index pair (i, j) such that |i - j| <= K. In other words, if A[i] < A[j], we should have B[i] < B[j]; if A[i] = A[j], we should have B[i] = B[j], etc.
If this is correct, the fix does not seem to be sufficient. Consider A = (1, 1, 1, 2, 3), B = (2, 1, 1, 1, 2), and K = 2: both sequences has a “signature” of (2, 2, 2, 2, 2), but, again, the optimal sequence B does not preserve A[1] = A[2] = A[3] < A[4] < A[5].”

Though most of the contestants got what the problem statement meant, I will make sure to make better statements in future as well.

Time Complexity

O(N * {\log}^2{N})

Space Complexity

O(N)

SOLUTIONS:

Author’s solution can be found here.

Tester’s solution can be found here.

Editorialist’s solution can be found here.

include_sajal · August 13, 2018, 4:34pm

That thing with balancing the similar numbers took too much time for me to solve finally and come above all those TLEs and WAs, because of some strong test cases in the last subtask. But, really a great question in the first place. Worth it!

dhruvgupta167 · August 13, 2018, 4:44pm

https://www.codechef.com/viewsolution/19654812
Can anyone please help me.My solution passed all the solution except the last one which is giving WA.
@include_sajal my solution was also giving TLE and WA first.I got rid of the TLE but there is still one WA.Could please tell what approach of handling those testcase.

hrishabh0901 · August 14, 2018, 10:15am

Can anyone please explain how O(N∗N∗logN) transformed into O(N∗log2N) using binary search??

namanjain007 · August 14, 2018, 11:24am

[hrishabh0901][1]
instead of iterating over increasing values of K,you can do bsearch as with increasing k,your sum would also be non decreasing…
[1]: https://discuss.codechef.com/users/229755/hrishabh0901

kshitij_07 · August 14, 2018, 4:39pm

Can anyone please explain the fault in my code. I coded as per the editorial, using only the greedy approach (No Segment Tree) and getting AC on 6 subtasks out of 10 but no complete AC in any task.
@likecs it would be great if you could kindly help out.

My Code

include_sajal · August 14, 2018, 5:27pm

Please tell me your B array for the input

1

20 2

3 6 7 5 3 5 6 2 9 1 2 7 0 9 3 6 0 6 2 6

decentgeek · August 14, 2018, 5:37pm

Please help me to find counter test case of my code of the problem KCOMPRES.
Only two task remaining.
My solution included segment tree approach.
My Solution

rohitp12 · August 14, 2018, 10:52pm

can someone explain the k-compress part of the question…thanks in advance
“for each valid i, if there are exactly X numbers smaller than or equal to Ai in the subsequence Amax(1,i−K),…,Amin(N,i+K), then there must be exactly X numbers smaller than or equal to Bi in the subsequence Bmax(1,i−K),…,Bmin(N,i+K).”

utkalsinha · August 15, 2018, 12:06am

You might want to change your custom comparator function. Particularly, if (a.first == b.first) . Because of it, your program might have been entering in an infinite loop and due to which you got TLE even for small input. Its behaviour is unpredictable so probably this is the reason you got another WA. Change it and try running it again and see if you can get all your test cases passed. See here for more on this: https://stackoverflow.com/questions/18291620/why-will-stdsort-crash-if-the-comparison-function-is-not-as-operator

Google more to find out about it.

faizz7 · August 15, 2018, 12:18am

Managed to optimize my brute-force approach and solve it without segment tree.

Time and space complexity remains the same as that of this editorial.

Binary search was the key in solving this problem.

My solution: https://www.codechef.com/viewsolution/19673140

chandyshot · August 15, 2018, 3:38am

at what value of k

chandyshot · August 15, 2018, 3:41am

Linear time is taken to traverse for each k, which can be reduced to logarithmic time if we use binary search for maximum k upto which sum<=s hold… thus reducing one N to logN

jlewis200 · August 15, 2018, 8:21am

I was able to solve all test cases during the competition with a completely different approach. The problem reminded me of software dependencies and I started looking at directed acyclic graphs and found that they fit this application fairly well. I sorted the list elements into a topological order where repeated elements were all given the same value. This is definitely similar to your grouping step in the editorial solution. For k = 0, each compressed element is given a value of 1. I then iterated through the values of k from 1 to N. For each incremental value of k, and each list element, there are at most 2 new “dependencies.” For example, at k = 0, each element must be greater than or equal to at most 0 elements. At k = 1, each element must be greater than or equal to at most 2 elements. At k = 2, each element must be greater than or equal to at most 4 elements, etc. … After each iteration of k, I found the max of the dependencies and incremented or equivalated as necessary. I did have an issue with solving the test cases for the largest values of N within the time constraints, but this was solved by recognizing that if S > ((N * (N + 1) / 2 ), then the result should be N + 1. That was my basic approach, but there was some optimization code to remove unnecessary dependencies, ones which would be resolved by other dependencies with a higher topological order. This was definitely a tough one, the comments section was full of frustrated and angry people, I was among them for a long time.

chandyshot · August 15, 2018, 8:35am

I’ve also observed this but couldn’t come out with a solution during the contest… sure it’s another way to solve this problem.

decentgeek · August 15, 2018, 4:04pm

I Changed, But didn’t work
Also, Thank you for your time which you give it.

rashomon · August 15, 2018, 11:11pm

I came up with this approach as well. What did you do to handle cases like [10, 30, 30, 20, 10], K = 1? Also, the expected complexity for this in my case was O(n^2 * lg(n)), since I was rebuilding my DAG for each K. Did you pass the last subtask with just the optimization on the value of S?

include_sajal · August 16, 2018, 1:04am

Oh yeah, sorry, for k=9.

jlewis200 · August 16, 2018, 7:03am

I retained the DAG for each incremental value of k and then just added the newest 0, 1 or 2 dependencies. Maintaining the dependencency list and eliminating the non-essential dependencies was key to getting this solution to work within the time constraints. I may have oversimplified my explanation a bit. For each list element I had what I called a “dependency” list, where the element in question must be greater than any element in the dependency list; and also an equivalency list, where the element in question must be greater than or equal to every element in the equivalency list.

jlewis200 · August 16, 2018, 7:05am

This approach mandates that k is iterated through from 1 to N, so the binary search approach isn’t possible here.