BINSTR - EDITORIAL

I am anticipating the end of this useless conversation :P

Because from ashes of one’s end, rises another. :smiley:

1 Like

Yeah sure. Will do it by today evening.

2 Likes

So here is the full explanation of my solution.

The first thing we need to notice here is that searching in the trie is impossible if all input strings are going to be of different length.

So in order to solve this problem we pad 0’s to the beginning of all input strings to make to a length maxi, which denotes the length of the largest unmodified input strings.

Second thing to notice here is that if we have a query string whose length > maxi, we can trim to maxi. If we have a query string whose length < maxi , we can pad 0’s to it.

Now the only thing left here is that the zero padding. This can take a lot of space since there can be an input string of length 10^5 and all other strings of length 1. This would take the complexity to 10^10.
So instead what we do is that we just have a count of 0’s to be padded.

For the input strings we initially insert into the trie a string of maxi 0’s and store the memory address of all these zeroes in a map whose key is the count of 0’s before it.
What this does is that we can now identify the count of 0’s required for the input string and jump directly to that index and then start adding the input string to the trie. Thus, the complexity of input is just O(len + n*log(maxi)).

Also in the leaf node we add the index of the string to a vector.

Now we have to compress the trie. Initially while constructing the trie itself we have a pointer for the next node/nodes from this node. Now we run a dfs on the trie.
Also notice here that there are a lot of wasted padded 0’s which may be not required. For this what we do is that if a child of the node does not have any element in the index vector then we mark that child as NULL.

Now we have 3 cases :
1.If a node is a leaf node(has both children as NULL) we set next as that node itself.

2.If the node has exactly one child we set the next pointer as the next pointer of that child.

3.If the node has 2 children set next pointer as that node itself and merge the indices vectors of the 2 children into the index vector of this node.

Now on doing these steps from the leaves to nodes you end up with a compressed trie.

Now the last part is querying. For this as discussed earlier we make the length of the string as maxi either by removing characters or padding 0’s. Now we query similar to querying a normal trie.

If a node has exactly 1 child go to next pointer of that.
If a node has 2 children find the index with which you can maximise the XOR. Now check if that child has atleast 1 element from l to r. This can be done by binary searching on the index vector.

Now notice that when we jump in the trie we should also jump a specific number of characters in the input string. For this in the trie we store the depth of each node. This does not get modified during the compression.
So the element in the string can be found by refering to the particular index in the string. This is done as s[depth-pad_length], where pad_length is the number of 0’s that have to be padded to the input string. If depth<pad_length] then the index of the child to be preferred is given as 1(Since 0 is maximized with 1 in xor).

2 Likes

@taran_1407 Added the explanation as a separate answer here.

1 Like

Interesting solution. :slight_smile:

My first idea too was trie compression, but didn’t reach this solution. Thanks

1 Like

@sdssudhu , bro can u plzz explain this
Second thing to notice here is that if we have a query string whose length > maxi, we can trim to maxi , we can take the max value from the 0 to maxBit using the Trie but the trimmed bits can also contribute to the answer if 0’s can be appended at the input string. I know i am missing something . Plzz can u correct me out.

@devil2202 Yeah the trimmed bits also contribute to the answer no doubt. But what we want is the index of the element which makes the greatest xor and not the value.

So the trimmed bits are going to remain the same for all the elements, since all bits > maxi are going to be set as 0 and 0^x=x. Hence we can ignore it

What is the time complexity of querying the compressed trie solution as explained in the answer given by @sdssudhu? Isn’t it O(Q * sqrt(length of all input strings) * log(n)) since in worst case we might visit sqrt(length of all input strings) compressed trie nodes?

The input generated by the following python code takes < 1 sec on the editorial’s solution and > 20 sec on the compressed trie solution.

print("1000 100000")
print("1"*1000)
for i in range(1, 1000):
	t = "1"*i + "0" + "1"*(999-i)
	print(t)

for i in  range(100000):
	print ("1 1000 1")
1 Like