BINSTR - EDITORIAL

taran_1407 · November 28, 2018, 3:49pm

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2

Setter: Danya Smelskiy
Tester: Zhong Ziqian
Editorialist: Taranpreet Singh

DIFFICULTY:

Medium-Hard

PREREQUISITES:

XOR Operation, Binary Search, Persistent Data Structures, Segment Tree, Tries and a lot of Implementation.

PROBLEM:

Given N binary strings A_1, A_2, \ldots, A_N, we have to answer queries of the form (l,r,X): output the minimal index i (l \leq i \leq r) such that max(A_l \oplus X, A_{l+1} \oplus X, \ldots, A_r \oplus X) = A_i \oplus X.
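To pin down the query semantics, here is a brute-force reference in C++. It is only a sketch for testing, not the intended solution. It is 0-indexed, and the names xorPadded and bruteForce are hypothetical. Every string is padded with leading zeros to a common width, so comparing the XOR results as equal-length strings is the same as comparing them as numbers. The strict comparison keeps the smallest index on ties.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// XOR two MSB-first binary strings after padding both with leading zeros to width w.
std::string xorPadded(const std::string& a, const std::string& b, size_t w) {
    std::string pa = std::string(w - a.size(), '0') + a;
    std::string pb = std::string(w - b.size(), '0') + b;
    std::string res(w, '0');
    for (size_t i = 0; i < w; ++i) res[i] = (pa[i] != pb[i]) ? '1' : '0';
    return res;
}

// Reference answer for a query (l, r, X), 0-indexed and inclusive:
// the smallest i in [l, r] maximizing A[i] XOR X. Cost O((r - l + 1) * width).
int bruteForce(const std::vector<std::string>& A, int l, int r, const std::string& X) {
    assert(l <= r);
    size_t w = X.size();
    for (int i = l; i <= r; ++i) w = std::max(w, A[i].size());
    int best = l;
    std::string bestVal = xorPadded(A[l], X, w);
    for (int i = l + 1; i <= r; ++i) {
        std::string v = xorPadded(A[i], X, w);
        if (v > bestVal) { bestVal = v; best = i; }  // strict: earlier index wins ties
    }
    return best;
}
```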

SUPER QUICK EXPLANATION

If the length of X is greater than or equal to the maximum length of the strings in the range [l,r], we can ignore the bits of X above that maximum length and find the best string using the remaining bits of X.
Otherwise, at least one string in [l,r] is longer than X. Let D be the maximum length of the strings in [l,r]. To maximize the XOR, it is optimal to consider only the strings of length D. Among them, we keep those having the maximal prefix of length D-|X| (the part not affected by X). If several strings share this maximal prefix, we continue with this set and choose the best string using the bits of X.
To choose the best string, i.e. to maximize the XOR with X, at each bit we try to keep the subset of the current set whose current bit is opposite to the current bit of X. If no string in the set has such a bit, we are forced to keep the strings having the same bit.
To do all this, we build an array B holding the indices of the strings in sorted order of the strings. By building a persistent segment tree, inserting the indices from right to left at their positions in B, we can check whether a range of B contains at least one string belonging to the query range.
For queries of the second type, we find the segment of B holding the strings of length D with the maximal prefix: its right end is the position of the largest string of the query range, and its left end is found with another segment tree giving the length of the common prefix of a range of B.

EXPLANATION

This problem probably has a variety of solutions, but I’m focusing on the setter’s approach. Feel free to share your approach in the answers.

Let us consider a basic solution: insert all strings into a trie, where each node stores the set of indices of the strings passing through it. To answer a query, we walk down the trie for as many steps as the maximum length of the strings. In our problem, this length can be up to 10^6 in the worst case, which leads to TLE when answering 10^5 queries. Clearly, we need a faster approach.

We create an array B storing the indices of the strings, sorted by the value of the binary string: the first position holds the index of the smallest string, the second position the index of the second smallest string, and so on. Here, a string is smaller than another string if it is shorter, or if it has the same length and is lexicographically smaller.
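A minimal sketch of building B under this ordering, 0-indexed. The names buildB and buildPos are hypothetical, and breaking ties between identical strings by index is an assumption of this sketch. The inverse array pos, with B[pos[i]] = i, gives the position of each string inside B. This is what a persistent tree needs in order to insert each index at its position in B.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// B[p] = index of the p-th smallest string: shorter strings first, equal
// lengths compared lexicographically, identical strings by index (assumption).
std::vector<int> buildB(const std::vector<std::string>& A) {
    std::vector<int> B(A.size());
    std::iota(B.begin(), B.end(), 0);
    std::sort(B.begin(), B.end(), [&](int x, int y) {
        if (A[x].size() != A[y].size()) return A[x].size() < A[y].size();
        return A[x] != A[y] ? A[x] < A[y] : x < y;
    });
    return B;
}

// pos[i] = position of string i inside B, so B[pos[i]] == i.
std::vector<int> buildPos(const std::vector<int>& B) {
    std::vector<int> pos(B.size());
    for (int p = 0; p < (int)B.size(); ++p) {
        assert(B[p] >= 0 && B[p] < (int)B.size());
        pos[B[p]] = p;
    }
    return pos;
}
```

For A = {"101", "11", "1000", "11", "0"}, the sorted order is "0", "11", "11", "101", "1000", so B = {4, 1, 3, 0, 2}.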

Let us now answer queries. For a query (l,r,X), we maintain a range (ll,rr) over the B array, initially (1,n). We classify queries into two types: in the first type, no string in the range (l,r) is longer than the query string X; in the second type, at least one string in the range (l,r) is longer than X.

Consider only queries of the first type for now. In this case, all bits of X above the length of the longest string in the range (l,r) are useless and can be ignored. We maintain a range (ll,rr) of the B array in which our answer lies. While processing the ith bit, we maintain the invariant that all bits higher than the ith bit are the same for all strings at positions (ll,rr) of B. We also need a helper function check, which tells us whether the range (l,r) of the A array contains any string from the range (ll,rr) of the B array.

Now, how do we handle the query (l,r,X) with the range (ll,rr) of the B array while processing the ith bit? Since all strings in the range (ll,rr) have the same bits above the ith bit, we can use binary search to find K, the leftmost position in (ll,rr) whose string has the ith bit on. If the ith bit of X is off, we want the ith bit of the answer string to be on, so we use the helper function to check whether the range (K,rr) of the B array contains any string from the range (l,r). If yes, we update (ll,rr) to (K,rr) and move to the lower bit. If the helper function returns false, then every string of the range (l,r) lying in (ll,rr) has the ith bit off, so we update (ll,rr) to (ll,K-1) and move to the lower bit. Symmetrically, if the ith bit of X is on, we prefer (ll,K-1) and fall back to (K,rr). After the last bit, if multiple strings remain in the range (ll,rr), they all give the same XOR, so we just pick the smallest index among those belonging to (l,r).
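The narrowing loop can be sketched in C++ as follows, 0-indexed. To keep the focus on how (ll,rr) shrinks, check is a naive linear scan standing in for the persistent tree, and K is found by a linear scan instead of binary search. The sketch assumes the strings have no leading zeros, so that B's order matches numeric order. "Ignore the higher bits of X" is realized by starting from the prefix of B that holds the strings of length at most L, the longest length in A[l..r]. All names (bitAt, check, descend, answerFirstType) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Bit of significance i (0 = least significant) of an MSB-first string; 0 past its length.
int bitAt(const std::string& s, int i) {
    int len = (int)s.size();
    return i < len ? s[len - 1 - i] - '0' : 0;
}

// B: indices sorted by (length, lexicographic), identical strings by index.
std::vector<int> buildB(const std::vector<std::string>& A) {
    std::vector<int> B(A.size());
    std::iota(B.begin(), B.end(), 0);
    std::sort(B.begin(), B.end(), [&](int x, int y) {
        if (A[x].size() != A[y].size()) return A[x].size() < A[y].size();
        return A[x] != A[y] ? A[x] < A[y] : x < y;
    });
    return B;
}

// Naive stand-in for the helper: does B[ll..rr] hold an index in [l, r]?
bool check(const std::vector<int>& B, int l, int r, int ll, int rr) {
    for (int j = ll; j <= rr; ++j)
        if (B[j] >= l && B[j] <= r) return true;
    return false;
}

// Narrow (ll, rr) from bit topBit down to bit 0, then return the smallest
// original index in [l, r] that is still inside the range.
int descend(const std::vector<std::string>& A, const std::vector<int>& B,
            int l, int r, const std::string& X, int ll, int rr, int topBit) {
    for (int i = topBit; i >= 0; --i) {
        int K = ll;  // first position in [ll, rr] with bit i on (linear scan here)
        while (K <= rr && bitAt(A[B[K]], i) == 0) ++K;
        if (bitAt(X, i) == 0) {  // want bit i of the answer string on
            if (K <= rr && check(B, l, r, K, rr)) ll = K; else rr = K - 1;
        } else {                 // want bit i of the answer string off
            if (K > ll && check(B, l, r, ll, K - 1)) rr = K - 1; else ll = K;
        }
    }
    int best = -1;
    for (int j = ll; j <= rr; ++j)
        if (B[j] >= l && B[j] <= r && (best == -1 || B[j] < best)) best = B[j];
    return best;
}

// First-type query (0-indexed, inclusive): |X| >= every length in A[l..r].
int answerFirstType(const std::vector<std::string>& A, const std::vector<int>& B,
                    int l, int r, const std::string& X) {
    int L = 0;
    for (int i = l; i <= r; ++i) L = std::max(L, (int)A[i].size());
    assert((int)X.size() >= L);
    int hi = 0;  // B is sorted by length, so B[0..hi-1] are the strings of length <= L
    while (hi < (int)B.size() && (int)A[B[hi]].size() <= L) ++hi;
    return descend(A, B, l, r, X, 0, hi - 1, L - 1);
}
```

Each step keeps exactly the candidates that agree with the best achievable XOR prefix, so the range never loses a string of A[l..r] that could still be optimal.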

So, once the helper function works, we can easily answer queries of the first type. One way is to build a segment tree over the B array with an ordered set at each node, storing the original indices of the strings at those positions of B. Using this tree, each call of the helper function takes O(log^2N) time. Since the helper is called once per bit and the sum of the lengths of the query strings over all queries can be up to 10^6, O(log^2N) per call is too slow. We need a bit of persistence.

Instead, we build a persistent segment tree for range minimum queries, where each version represents a suffix (or a prefix, depending on your implementation) of the original array: version l contains the indices l, l+1, \ldots, n, each inserted at its position in the B array. Then the range (l,r) of the A array contains a string from the range (ll,rr) of the B array if and only if, in version l, the minimum index inserted in the range (ll,rr) is \leq r.
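A sketch of this persistent range-minimum tree in C++, 0-indexed. PersistentMinTree and its members are hypothetical names, and pos[i] denotes the position of string i in B. Versions are built from right to left, so version l shares every unchanged node with version l+1 and each insertion adds only O(log N) nodes. Node 0 is a shared empty node whose minimum is INT_MAX.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Version v holds the original indices v, v+1, ..., n-1, each written at
// position pos[i] of B; every other leaf holds INT_MAX.
struct PersistentMinTree {
    struct Node { int left, right, mn; };
    std::vector<Node> t;    // t[0] is the shared empty node
    std::vector<int> root;  // root[v] = root of version v
    int n;

    explicit PersistentMinTree(const std::vector<int>& pos) : n((int)pos.size()) {
        t.push_back({0, 0, INT_MAX});
        root.assign(n + 1, 0);            // version n is empty
        for (int i = n - 1; i >= 0; --i)  // insert from right to left
            root[i] = insert(root[i + 1], 0, n - 1, pos[i], i);
    }

    // New node: a copy of `prev` with leaf p set to val (path copying).
    int insert(int prev, int lo, int hi, int p, int val) {
        Node copy = t[prev];  // copy first: push_back may reallocate t
        int cur = (int)t.size();
        t.push_back(copy);
        if (lo == hi) { t[cur].mn = val; return cur; }
        int mid = (lo + hi) / 2;
        if (p <= mid) { int c = insert(copy.left, lo, mid, p, val); t[cur].left = c; }
        else          { int c = insert(copy.right, mid + 1, hi, p, val); t[cur].right = c; }
        t[cur].mn = std::min(t[t[cur].left].mn, t[t[cur].right].mn);
        return cur;
    }

    int queryMin(int node, int lo, int hi, int ql, int qr) const {
        if (node == 0 || qr < lo || hi < ql) return INT_MAX;
        if (ql <= lo && hi <= qr) return t[node].mn;
        int mid = (lo + hi) / 2;
        return std::min(queryMin(t[node].left, lo, mid, ql, qr),
                        queryMin(t[node].right, mid + 1, hi, ql, qr));
    }

    // The helper: does A[l..r] contain a string whose position in B lies in [ll, rr]?
    bool check(int l, int r, int ll, int rr) const {
        assert(0 <= l && l <= n);
        if (ll > rr) return false;
        return queryMin(root[l], 0, n - 1, ll, rr) <= r;
    }
};
```

Only version l is queried for a query starting at l, so the upper bound r is handled by comparing the minimum against r rather than by a second version.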

This way, we have answered queries of the first type. But what happens when a string in the range (l,r) is longer than X?

Consider the following alignment:

000000XXXXX (query string, padded with 0s to make the lengths equal)
XXXXXXXXXXX (maximum string in the query range)

To get the maximum XOR, we want the left portion of the answer string (the part above the length of X) to be as large as possible; if several strings share the same left portion, the best string is decided by the right portion. Due to the ordering of B, all strings of the same length having the same left portion form a consecutive segment of the B array, and this segment contains our answer string. So, we need to find this segment and then, if it contains more than one string, solve the query within it the same way as queries of the first type.

So, we need the endpoints of the segment of the B array whose strings have the same length and the same left portion as the maximal string in the query range. The right end is simply the position of the largest string of the query range in B, because all strings of the query range lie at or to the left of this position. This can be found with a query on our persistent segment tree.
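The editorial obtains the right end from the persistent tree. An equivalent and simpler option, swapped in here and not taken from the setter's code, is a range-maximum sparse table over pos[] (the position of each string in B). The largest string of A[l..r] is the one placed furthest right in B, so the right end is max(pos[l..r]). A minimal C++ sketch, 0-indexed, with the hypothetical name MaxSparseTable:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// up[k][i] = max(pos[i .. i + 2^k - 1]); pos[i] is the position of string i in B.
struct MaxSparseTable {
    std::vector<std::vector<int>> up;

    explicit MaxSparseTable(const std::vector<int>& pos) {
        int n = (int)pos.size();
        up.push_back(pos);
        for (int k = 1; (1 << k) <= n; ++k) {
            up.emplace_back(n - (1 << k) + 1);
            for (int i = 0; i + (1 << k) <= n; ++i)
                up[k][i] = std::max(up[k - 1][i], up[k - 1][i + (1 << (k - 1))]);
        }
    }

    // Right end of the answer segment in B for a query on A[l..r] (inclusive).
    int rightEnd(int l, int r) const {
        assert(l <= r);
        int k = 0;  // largest k with 2^k <= r - l + 1
        while ((2 << k) <= r - l + 1) ++k;
        return std::max(up[k][l], up[k][r - (1 << k) + 1]);
    }
};
```

The table costs O(N log N) memory but answers each right-end query with two lookups, and it avoids a descent on the persistent tree.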

For the left end, we know that the string at the left end has the same prefix as the maximal string up to the length of the left portion. We build another segment tree that answers the length of the common prefix of a given range of B; using it, we can easily find the leftmost position in the B array whose string has the same prefix as the maximal string. Now we know both ends of the segment of the B array that contains our answer, which depends only on the right portion of X. Within this range of the B array, we proceed the same way as for queries of the first type, thus solving the task.
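A sketch of the common-prefix tree and the left-end search in C++, 0-indexed, with hypothetical names. lcp[j] is the common prefix length of the strings at positions j-1 and j of B. The strings at positions j..p share a prefix of length need exactly when min(lcp[j+1..p]) >= need. This condition is monotone in j, so a binary search over j finds the leftmost valid position, at O(log^2 N) per query. Setting lcp to 0 when two adjacent strings differ in length is an assumption of this sketch; it keeps the segment inside the strings of length D. Here need would be D-|X| and p the right end.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <string>
#include <vector>

// Common prefix length; 0 when lengths differ (assumption: never mix lengths).
int commonPrefix(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return 0;
    int k = 0;
    while (k < (int)a.size() && a[k] == b[k]) ++k;
    return k;
}

// Iterative range-minimum segment tree over lcp[1..n-1]; leaf 0 is unused.
struct LcpMinTree {
    int n;
    std::vector<int> t;

    // sorted[j] is the string at position j of B.
    explicit LcpMinTree(const std::vector<std::string>& sorted)
        : n((int)sorted.size()), t(2 * n, INT_MAX) {
        for (int j = 1; j < n; ++j) t[n + j] = commonPrefix(sorted[j - 1], sorted[j]);
        for (int v = n - 1; v >= 1; --v) t[v] = std::min(t[2 * v], t[2 * v + 1]);
    }

    // min(lcp[lo..hi]), INT_MAX for an empty range.
    int rangeMin(int lo, int hi) const {
        int res = INT_MAX;
        for (lo += n, hi += n + 1; lo < hi; lo >>= 1, hi >>= 1) {
            if (lo & 1) res = std::min(res, t[lo++]);
            if (hi & 1) res = std::min(res, t[--hi]);
        }
        return res;
    }

    // Leftmost j <= p such that positions j..p share a prefix of length >= need.
    int leftEnd(int p, int need) const {
        assert(0 <= p && p < n);
        int lo = 0, hi = p;  // j = p always qualifies
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (rangeMin(mid + 1, p) >= need) hi = mid; else lo = mid + 1;
        }
        return lo;
    }
};
```

For the sorted strings "11", "1000", "1010", "1011", "1100", the lcp values are 0, 2, 3, 1. With p = 3 and need = 2, the left end is 1, so the segment is the three strings starting with "10".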

Implementation

The implementation of this problem is a bit complex, so feel free to refer to the solutions below.

Finding the smallest position K in the range (ll,rr) of the B array whose string has the ith bit on, given that all strings in the range (ll,rr) have the same bits above the ith bit:

Since all bits more significant than the ith bit are the same, the ordering of B guarantees that a prefix (possibly empty) of the range (ll,rr) has the ith bit off, while the remaining suffix (possibly empty) has the ith bit on. This allows us to use binary search to find the minimum position K in (ll,rr) having the ith bit on. Alternatively, we may build an additional bit-vector array for the same purpose.
The common prefix tree is just a range minimum tree, where at the base level we manually calculate the length of the common prefix of adjacent strings in B.
In alternate implementations, the strings can be reversed for ease of implementation, so that the bit at position 0 corresponds to the least significant bit.
The tester uses a complicated square-root-decomposition style implementation. It is not recommended, but you may refer to it if you are looking for a different solution and are not afraid.
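A sketch of the binary search for K in C++, following the reversed-string convention mentioned above: rev[s][0] is the least significant bit of string s. It is 0-indexed, bitRev and firstWithBitOn are hypothetical names, and it returns rr+1 when no string in the range has the ith bit on.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Bit of significance i of a string stored reversed; positions past the end are 0.
int bitRev(const std::string& rev, int i) {
    return i < (int)rev.size() ? rev[i] - '0' : 0;
}

// Smallest K in [ll, rr] whose string has bit i on, or rr + 1 if there is none.
// Requires B[ll..rr] to agree on all bits above i, so bit i reads 0...0 1...1.
int firstWithBitOn(const std::vector<std::string>& rev, const std::vector<int>& B,
                   int ll, int rr, int i) {
    assert(ll <= rr + 1);
    int lo = ll, hi = rr + 1;  // the answer always lies in [lo, hi]
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (bitRev(rev[B[mid]], i) == 1) hi = mid; else lo = mid + 1;
    }
    return lo;
}
```

With the strings 1000, 1010, 1011, 1100 stored reversed and B = {0, 1, 2, 3}, searching bit 2 over the whole range gives K = 3, the only string with that bit on.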

Related Problems

Two recommended problems related to tries are GPD and XRQRS. It is also good practice to solve only the first subtask in O(|X|) time using a persistent trie, where |X| is the sum of the lengths of all strings in the input. Apart from that, these resources on tries, persistence, persistence form good reading material for this and related problems.

For Segment tree and Binary search, I guess this would be sufficient.

Time Complexity

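Overall time complexity is O((N + Q + \sum|X|) \log N), where \sum|X| is the total length of the query strings: every processed bit of X costs one binary search and one persistent tree query, each O(\log N). This is in addition to the cost of reading and sorting the input strings.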

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter’s solution
Tester’s solution
Editorialist’s solution

Feel free to share your approach if it differs. Suggestions are always welcome.

aryanc403 · December 3, 2018, 3:36pm

Long Scary Editorial.

sdssudhu · December 3, 2018, 5:55pm

It can be solved by just compressing the trie : https://www.codechef.com/viewsolution/21617045

It is a bit faster than the setter’s solution also.

vijju123 · December 3, 2018, 6:51pm

*le me scrolls down the editorial*

*le me cut-copy-paste*

<copy>

Woah!! This long editorial, s̶c̶a̶r̶y̶ ̶e̶v̶e̶n̶ ̶a̶f̶t̶e̶r̶ ̶s̶o̶l̶v̶i̶n̶g̶ ̶t̶h̶i̶s̶ ̶p̶r̶o̶b̶l̶e̶m̶. 1000x more scary as I did not upsolve the problem yet.

</copy>

Also,

SUPER QUICK EXPLANATION

I am questioning life and universe and everything else after looking at this and the length of 4 points beneath it.

(Hopefully people know the context of this by now xD)

joffan · December 3, 2018, 7:21pm

Thanks so much for this detailed editorial. Worth the wait

l_returns · December 4, 2018, 10:50am

Hahahaha @vijju123

taran_1407 · December 4, 2018, 11:12am

Glad you liked it.

taran_1407 · December 4, 2018, 11:13am

Nice

It would be good for all if you explained your solution a bit.

taran_1407 · December 4, 2018, 11:14am

Thanks @vijju123 for this comment

dextrous · December 4, 2018, 1:18pm

The editorialist’s solution page is not visible.

ryan312 · December 4, 2018, 4:58pm

Editorialist’s solution page is not working. Please resolve it.

vijju123 · December 4, 2018, 5:20pm

No need to thank @taran_1407 :). You have earned this by your generous comments on FCTR editorial. Also, you’d be pleased to know that these comments are earned in a pack of at least 50 per comment :D.

l_returns · December 4, 2018, 5:26pm

XDD @vijju123
pack of 50 reminds me of that answer you wrote

taran_1407 · December 4, 2018, 5:28pm

You really deserve a thanks for commenting this here, and on problem ABGAME, and for coming editorials too.

taran_1407 · December 4, 2018, 5:53pm

Posted my solution at https://ideone.com/dyc1Pq

vijju123 · December 5, 2018, 6:10pm

No issues @taran_1407 , though I doubt you anticipate my comments so much as “someone else” on the forum

aryanc403 · December 5, 2018, 7:05pm

No issues @vijju123, though I doubt deep inside @taran_1407 knows he anticipates only one person’s comment on this forum

vijju123 · December 5, 2018, 8:53pm

@aryanc403 - True, perhaps he is anticipating that his list of people whose comments he anticipates grows ? :o

aryanc403 · December 5, 2018, 10:08pm

Or perhaps he is anticipating comments of a coin instead of heads/tails xD

taran_1407 · December 5, 2018, 10:41pm

I am anticipating the end of this useless conversation