Problem Link
Author: Azat
Tester: Pawel Kacprzak and Misha Chorniy
Editorialist: Bhuvnesh Jain
Difficulty
EASY-MEDIUM
Prerequisites
Tries, State Transitions
Problem
You are given a list of websites, some of which are blocked. You need to find a set of prefixes such that each of the blocked websites are recognised by some prefix but none of the non-blocked websites are. The aim is to minimise the sum of the length of the prefixes found.
Quick Explanation
Build a tries of the words, storing information regarding the words, their type (blocked or non-blocked). Use transitions during the traversal of trie (dfs) to find the optimal answer, if it exists.
Explanation
First of all, the problem deals with prefixes. Generally, problems on strings related to prefixes can be solved using hashing or trie. We will use trie to solve this problem.
Before starting to solve the problem, like what kind of data structure we would require and what information it should store, let us first analyse when the solution exists or not. The solution will not exist only when the blocked site is a prefix of a non-blocked website.. In all other cases the solution will exist because in the worst case, we might select the whole word for all the blocked websites.
Now, suppose the solution exist. We would like to minimise the total length of the selected prefixes. Now, since the length of one prefix is independent of the other, the total length would be minimum when we minimise each prefix selected. For minimising each prefix selected, we just want to selected the first character which differentiates between 2 blocked and non-blocked websites. Such a letter will exist as otherwise the solution would not have existed.
Let us understand this through examples. Let the blocked website be “codeforces” and allowed website be “codechef”. Now, the 5^{th} letter (“f” vs “c”) is the one which differentiates the 2 websites. Let the blocked website be “bing” and allowed website be “google”. In this case, the 1^{st} letter will only differentiate the 2 websites. Let the blocked be “cs” and allowed website be “csacademy”. In this case the solution will definitely not exist. As all the prefixes, “c” and “cs” would block both the websites.
From the examples, above our data structure should be able to handle 2 type of operations. They are :
- Detecting whether the blocked website is a prefix of any non-blocked website.
- Finding efficiently the first point of difference between 2 websites names.
Doing this through brute force will take complexity O(\sum_{i=1}^{i=n} \sum_{j=i+1}^{j=n} min(L_i, L_j), where L_i = length of string i. This is because the comparison between 2 strings to do the above operations can be done using a simple for loop, which terminates in worst case when either of the string to be processed finishes. In the worst case, it will be quadratic in the sum of length of strings given. Thus, it will only be able to solve subtask 1. For the complete solution, we use tries and store the following information in it to be able to handle above operations efficiently.
- State 0 = no websites.
- State 1 = non-blocked websites only.
- State 2 = blocked websites only.
- State 3 = both blocked and non-blocked websites.
After building of the trie, we just do a traversal (dfs) on it. Now, few things should be noted. Whenever we are in state 3, we can go to state \{0, 1, 2, 3\} in the child node. When we are in state 1, we can only go to state \{0, 1\} and when we are in state 2, we can only go to \{0, 2\} in the child node. This information will be enough to be able to handle above operations. Below are the claims for it :
- Whenever, we make have a node at state $3$ and all of its children have state $\{0, 1\}$ only, we see that a blocked website is a prefix of a non-blocked website. Thus we should terminate our search here and claim that it is "impossible" to constrict an answer.
- Otherwise in all cases, the solution exist. To find the minimal length, whenever we see that a state $3$, has a child with states $\{0, 1, 2\}$, we are sure to have found a point of difference between the blocked websites and non-blocked websites. This is when we should stop are search together on this current branch and add the required prefix to our solution.
- Otherwise, the whole search space would only consist of wither blocked or non-blocked websites. This is easy as non-blocked website are not required to be searched. Also, for blocked websites, the $1^{st}$ character would only differentiate between the blocked and non-blocked website and thus we can terminate the search here too.
All the above operations can be done in O(\sum_{i=1}^{i=n} {L}_{i}), where L_i = length of string i. This is because the trie can be build in above complexity and can be traversed in above complexity too.
If you had some other logic/algorithm to solve the above problem, feel free to discuss below
Time Complexity
O(\sum_{i=1}^{i=n} {L}_{i}), where L_i = length of string i