SADPAIRS - Editorial

dpraveen · June 15, 2016, 11:55am

PROBLEM LINK:

Contest
Practice

Author: Kevin Charles Atienza
Testers: Istvan Nagy
Editorialist: Praveen Dhinwa

DIFFICULTY:

hard

PREREQUISITES:

tree dp, merging smaller component with largest component

PROBLEM:

You are given a graph network consisting of n nodes and m edges. i-th person uses color c_i. For each vertex i, you have to answer following quantity - assume If we remove i-th vertex, how many pairs of vertices with same colors will get disconnected. We call such pairs sad pairs.

QUICK EXPLANATION:

Solve the problem on a tree
Convert the given graph into block cut tree and now apply the algorithm for solving the problem on a tree.

EXPLANATION:

Solve on a tree
Let us first simplify the problem by solving it on a tree. Note that each vertex of tree will be an articulation point - a vertex in a connected graph is considered articulation point if removing it from the tree divides the graph into more than one disconnected components. To be more precise, all the non-leaf nodes will be articulation points.

Let us root the tree at any vertex (say vertex 1). Let us understand what will be number of sad pairs if we removing vertex 1. Let there be child children of vertex 1. Consider a color c, Let cnt_1, cnt_2, \dots cnt_{child} denote the number of colors c all the children subtrees of vertex 1. Now number of sad pairs for this color will be sum of cnt_i * cnt_j for i < j childs.

Directly iterating over i, j will take O({child}^2) time. Instead, for each j, we can find sum of terms multiplied with cnt_j efficiently. Essentially it will be cnt_1, cnt_2, \dots, cnt_{j - 1}. You can compute this quantity by maintaining a prefix sum. Pseudo-code follows.

sum[0] = 0
For (j = 2; j ≤ child; j++)
    sad += cnt[j] * (sum[j])
    sum[j] += cnt[j]

So, now we know to how to find sad pairs for a given color. Finally total sad pairs will be sum of sad pairs for each distinct color.

Let us maintain a map<int, int> for each node v in the tree - denoting the counts of various color in the subtree of v. So, the key will correspond to color and the value will denote the count of corresponding color in the subtree of v.

Now a we do a dfs from starting node 1. In the dfs, we will maintain a map as described above. See the pseudo code

void dfs(from, parent):
    childs = {};
    for each neighbour to of from:
        if to = parent:
            continue
        childs.add(to);
        dfs(to, from)
    // Now at this stage, we will have maps for the childs computed.
    // Let us create corresponding map for vertex from using the child maps.
    map fromMap
    for each child c in childs:
        add map corresponding to c to fromMap

Time complexity of this solution will be O(n^2) and similarly space complexity will also e O(n^2). We need to optimize both space and time complexity.

Now, we will use a trick of merging smaller subtrees into the largest one at each node. For a very clear understanding of this trick, please see this nice tutorial. If possible, also solve the problems provided in the practice section.

For each node, let us call the child whose subtree has largest number of nodes as heavy child (if there are more than one childs, pick any of those). Call the remaining children as lighter children.

While computing map for a node, we will merge the maps of lighter children into the map of heavy child. Now, the map corresponding to the current node will be same as that of heavy child.

void dfs(from, parent):
    childs = {};
    heavy = none;
    maxSize = 0
    for each neighbour to of from:
        if to = parent:
            continue
        dfs(to, from)
        childs.add(to);
        if size[to] >= maxSize:
            maxSize = size[to]
            heavy = to
    // Now at this stage, we will have maps for the childs computed.
    // Let us create corresponding map for vertex from using the child maps, by merging the smaller maps into heavy child.
    let heavyMap = map corresponding to heavy map; // Please note that this is not equivalent to creating a copy. It is reference to that map.
    for each child c in childs:
        if c != heavy:
            for each (key, value) pair in map[c]:
                heavyMap[key] += value
    map[from] = heavyMap; // Please note that this is also not a copy. You have to make sure that this is a reference. In other words, make sure that memory used for map is same as that of heavyMap.

Let us consider an edge of the tree (u, v) - u is parent of v. We will call the edge heavy if v is a heavy child of u, light otherwise.

Let us estimate time and space complexity of this approach. The key, value pair maintained for a color in vertex v is moved into the heavy subtree of parent u, if u is light child, i.e. the movement of (key, value) pair is happending across the light edge.

Number of light edges in any path from root 1 to any other vertex can be at max \mathcal{O}(log n). Hence, it means that each key will be moved at most log n times.

Hence, it means that time complexity of this approach will be \mathcal{O} (n log n) * log n \text{(for operations on map)}.

Note that space complexity of this approach is \mathcal{O}(n log n) too. Please note the very crucial point that when you have merged the ligher subtrees into heavy subtree, then you don’t have to create an entirely new map having the content of merged subtree. You can just reuse the memory that you had allocated for heavy subtree. You can implement this by either using references or pointers etc. Please see various solutions for understanding details.

Solve the problem on a graph
We will find all the articulation points of the graph. Construct block-cut tree corresponding to the graph using these articulation points. Now, on this block-cut tree, we can apply the same algorithm as previous one to solve the problem. Note that you don’t need to construct the tree explicitly, you can use the tree strcuture implicitly, see the code of tester for details.

Please note that I have skipped some details on how to compute the sad pairs using the maps created for each subtree. These details should be relatively easier to figure out.

Time Complexity:

\mathcal{O}(n + m) (for finding block-cut tree) + \mathcal{O}(n * {log n}^2) for solving the problem on created tree.

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter
Tester

atulshanbhag · June 15, 2016, 3:27pm

“We call such pairs such pairs”. Correct it… should be sad pairs

rajat1603 · June 15, 2016, 3:50pm

You can solve the question without block cut tree with just DFS in O(N\sqrt N\log(N)).

Lets solve for each color separately

If a color has appears more than \sqrt N time , then you can calculate answer for every node for this color in O(N) time with a dfs by looking at the discovery time and the topmost node visitable from this node with just 1 back edge

If a color appears less than \sqrt N time , then see all pairs of nodes with this color and add an update on path from one node to other and at the end see the value of each node and whether there is a back edge going up or not.

Complexity Proof

There are atmost O(\sqrt N) colors with frequency \geq \sqrt N So this part is O(N \sqrt N).

Now for colors with frequency \leq \sqrt N , lets say the frequency is X , then complexity for this color is X^2 So total can be atmost (N / X) * X^2 which is \leq O(N\sqrt N).

So Total Complexity is O(N\sqrt N log(N)) but the log factor is almost negligible because its just a binary search on adjacently list of a node.

AC code

TLE / RTE code but Neater

vishfrnds · June 15, 2016, 11:30pm

I have not read block cut tree. But this question can be simply solved by dfs (iterative due to stack over flow) and using John Hopcroft and Robert Tarjan articulation point algorithm’s low and depth variable to decide if child node values should contribute to its parent or be deferred to be merged at end after all contributing nodes are merged to its parent.

solution

pb_ds · June 16, 2016, 1:21pm

I used Tarjan’s algorithm and virtual trees to get O(n + m) + O(n \log n). It runs as quickly as I want.

The elicitation method of merging runs very quickly too, unlike other O(\log^2 n) algorithms.

fauzdar65 · June 16, 2016, 3:31pm

So basically you do similar Dp as in editorial but on DFS-tree instead of block-cut tree. right ?

wittyceaser · June 16, 2016, 5:24pm

Can you provide some link about virtual trees or explain what they are?

vishfrnds · June 17, 2016, 1:02am

No, it is not a dp. As it does not have overlapping sub problems. A simple dfs is enough.

r_64 · June 17, 2016, 6:30am

@pb_ds : “virtual tree” is called “auxiliary tree” in this editorial. Here is the code to create an auxiliary tree for a given set of points (from the editorial above):

void CreateEdges(vector<int> &nodes) {
    stack st;
    for (int v : nodes) {
        while (!st.empty()) {
            int u = st.back();
            if (u is an ancestor of v)
            {
                addEdge(u, v);
                break;
            }
            st.pop();
        }
        st.push(v);
    }
}

My approach for the problem :

First use tarjan and construct the dfs tree.

For every color c find all vertices of color c, and create their auxiliary tree. Suppose there are k vertices of color c. We are going to consider this color’s contribution to every vertex(in the original tree).

First calculate subtree sizes. Let’s denote this number as siz[x]: how many vertices in subtree x in auxiliary tree has color c.

For a vertex x, if it’s in the auxiliary tree, it’s answer is \binom{k}{2}-\sum_a \binom{s(a)}{2}, where a is a connected block if x is removed, and s(a) is its size(how many vertices of color c is in a). To calculate this value, we need to iterate all children y of x. Let z be the vertex in the original tree that is x's child and y's ancestor, if low[z] \ge dfn[x] then minus the answer by \binom{siz[y]}{2}. At last let s_{out}=k-\sum_{low[z]\ge dfn[x]}siz[y] which is the size of the connected block where root locates, minus the answer by \binom{s_{out}}{2}.
For a vertex x, if it lies on an edge of the auxiliary tree, let’s see if removing the edge between x and its parent (in the original tree) makes the graph disconnected. That means low[x] \ge dfn[parent[x]]. If it does, remove the edge in the auxiliary tree and it breaks into 2 connected components of size a and b. Then the answer is \binom{a+b}{2}-\binom{a}{2}-\binom{b}{2}.
For a vertex x, if none above, the answer is 0.

Consider how to do 2) efficiently. Let’s assign a value for each edge in the original tree. For an edge (p, c)(p is the parent) in the auxiliary tree calc the value Value = \binom{k}{2}-\binom{siz[c]}{2}-\binom{k-siz[c]}{2}, and add the value of all edges in the original tree Value, if this edge fulfills(p' is the parent of the edge and c' is the child):

low[c']\ge dfn[p']
This edge lies on the path from p to c.

It can be done in linear time, because we only perform add operations and at last query for all edges’ values.

The overall complexity is O(m+n\log n), because creating auxiliary tree requires a log.

fauzdar65 · June 17, 2016, 9:11am

If you are using map to store frequency count of a node and building it using maps of children, by definition it is dp even if subproblems are not overlapping

vishfrnds · June 18, 2016, 3:31am

Yes you are you are right.

dhirajmadan1 · April 2, 2017, 1:31pm

Hi
I have understood that each key can be copied at most log(n) times in any path.However I haven’t understood how that contributes to nlog(n)^2 in the overall time complexity.I think it should rather have been O(n log(n)).(Since there are n distinct keys each copied at most log(n) times).Is there any tutorial which explains this analysis in more detail?