Encoding Trees to Strings, KMP, Hash
Given an original labeled binary tree with N nodes, for a query labeled binary tree with M nodes, report whether it is a subtree of the original one. Subtree is a subgraph with a node as its root and all its direct or indirect children. We need to deal with Q queries in total.
This problem looks like a special cases, let consider a more general case. The tree is not restricted to binary. Furthermore, both nodes and edges are labeled. For example,
First of all, we should observe that in the Depth-First Search (DFS) traverse, all nodes in a subtree are consecutive in the DFS Sequence. Because of that, we can define the following recursive equations to transform the tree into a string.
if u is a leaf
= (nodeLabel(u) edgeLabel(u, 1) subtree(child_1) edgeLabel(u, 2) subtree(child_2) ……)
But here is another problem, how can we order the children? It can be solved by lexicographical order of “edgeLabel(u, i) subtree(child_i)”.
Take the subtree in the above for example, subtree(3) = “(b)”, subtree(4) = “(b)”, subtree(2) = “(bA(b)B(b))”, subtree(1) = “(aA(bB(a))B(bA(a)A(bA(b)B(b)))B(cA(a)))”.
But directly using strings is too expensive in time. So we can adopt hash to save time. When hash is adopted, the order of children can be solved by the numeric order of the hash values of “edgeLabel(u, i) subtree(child_i)”. And thus, we can finish the preparation in O(NlogN) time, logN is caused by the sorting children in a certain order.
Since we only have N subtrees in the original tree, simply store them in some data structures such as hash table, sorted array, balanced binary search tree, etc… can deal with queries. That is, we can deal a query in O(MlogM) time.
So, the total time complexity for Q queries is O(NlogN + QMlogM). In this problem, since we only have two children, sorting can be replaced by a single comparison, and thus the time complexity is O(N + QM).