CHSEQ22 - Editorial

djdolls · February 17, 2014, 3:00pm

PROBLEM LINK:

Practice
Contest

Author: Roman Furko
Tester: Hiroto Sekido, Gerald Agapov
Editorialist: Ajay K. Verma

DIFFICULTY:

Easy-Medium

PREREQUISITES:

Connected components

PROBLEM:

We are given an array of N elements and a set of M intervals. Initially all the elements in the array are set to zero. In a single operation an interval is picked and all the array elements inside that interval are flipped. The task is to find out how many different arrays can be obtained by applying the above operation repeatedly.

QUICK EXPLANATION:

Refer to the next section.

EXPLANATION:

Let us start with an example. Suppose we have 5 intervals: [3, 5), [2, 4), [2, 8), [4, 8), and [2, 13). Note that all the intervals are represented in left-closed, right-open way (i.e., the left end point is included in the interval, while right end point is not).

We perform operations on the array based on the above intervals, and obtain sequences. Let us say we have one such sequence w. The sequence w will start with a series of 0’s, at some point it will make a transition to a series of 1’s and then back to 0’s and so on. The positions at which these transitions happen must be one of the end points of the given intervals. We call these positions as the critical position of the sequence. It is easy to see that if we know the critical position of a sequence, and its length, we can in fact construct the sequence. In other words, there is a one-to-one correspondence between the subset of critical positions and the sequences obtained. Hence, the task reduces to find the number of valid subsets of critical positions.

Any subset of critical positions will only contain the positions which are the end points of some of the given M intervals. For example if, we flip the interval [2, 4), [4, 8) and [3, 5), then the obtained sequence will be [0, 1, 0, 0, 1, 1, 1, 0, …], which has critical positions {2, 3, 5, 8}.

It can be seen easily that if we pick a set of intervals I₁, I₂, …, I_k and flip then, then an endpoint will be a critical position in the obtained sequence if and only if this endpoint occurs in an odd number of the chosen intervals. In the above example 4 is the endpoint of two intervals [2, 4) and [4, 8), hence it is not among the critical positions of the sequence, while all other endpoints are the endpoints of a single intervals, and hence they are among the critical positions of the sequence.

This is because whenever we pick an interval it is equivalent to insert two critical positions corresponding to the endpoints of the interval, it means that we need to make a transition at these positions. However, if any of these positions are already present in the set of critical positions, then we are basically trying to make a double transition at this position, which is equivalent to no transition at all.

Reduction into a graph problem:

Based on the above observation, we can create a graph where nodes correspond to the end points of the given M intervals. For each interval, we create an edge between the nodes corresponding to its endpoints. Each edge can be labeled as 0 or 1, representing whether the corresponding interval is picked or not. For a given labeling of the edges, we can compute the critical positions of the obtained sequence easily: For each node compute the XOR of labels of edges incident on this node, if the value is one, then this endpoint is one of the critical positions of the sequence, otherwise it is not.

So now the problem reduces to the following: We are given an undirected graph, and we are allowed to label each edge as 0 or 1. How many different set of node labels can be obtained by such labeling (the label of a node is the XOR of the labels of edges incident on it).

The advantage of this reduction is that we can partition the graph into connected components, compute the number of possible node labels for each component, and then multiply these values together to get the number of possible node labels of the whole graph.

Number of node labels in a connected component:

In this section we show that if a connected graph has n nodes, then the number of possible node labels is (2^n-1). The proof is easy and is based on induction. First, we show the result for a tree, and then show that adding an edge between the nodes already connected does not affect the result.

Tree:

If there is a single node, then only one node label is possible namely zero. If there are two nodes connected by edge, then two node labels are possible: (0, 0) if the edge label is 0, and (1, 1), when the edge label is 1.

Now assume that the result holds for any tree with n nodes. And we insert a new node by connecting it to one of the existing node v. If we label the inserted edge as 0, then the new node will have label zero, while all other nodes will have the same label as before, i.e., we have 2^n-1 labels. On the other hand, if we label the inserted edge as 1, then the new node will have label 1, the label of v is flipped, and all other nodes will their label unchanged, i.e., we have additional 2^n-1 labels. This means, for a tree of (n + 1) nodes we have 2ⁿ possible node labels, which completes the induction step.

Edge insertion between two nodes which are already connected:

Let us say we have nodes u and v which are already connected by a path consisting of edges e₁, e₂, …, e_k. Now, if we connect the two nodes by an edge e, the labeling of this edge as 1 is equivalent to flip the label of node u and v, while keeping the labels of all other nodes as is. This could have also been achieved by flipping the edge labels of e₁, e₂, …, e_k. In other words, inserting the edge (u, v) does not create any extra set of node labels.

This completes the proof that number of node labels of a connected graph with n nodes is 2^n-1.

Algorithm:

Create the graph corresponding to intervals as described above. Let us say this graph has m nodes. Compute the connected components of the graph. Let us say there are k connected components of size m₁, m₂, …, m_k, then the answer would be
2^m1-1 * 2^m2-1 * … * 2^mk-1
= 2^m-k

TIME COMPLEXITY:

O (M)

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution will be put up soon.
Tester’s solution will be put up soon.

xellos0 · February 17, 2014, 5:16pm

The “number of node labels” part can be done without any induction at all.

It’s clear that the order of the operations doesn’t matter, and we don’t need to use any operation twice. Also, any operation is its own inverse.

Notice an analogy with linear independence of a set of vectors (vectors in the most general algebraic sense) and our intervals.

To be precise, applying several operations is their “sum”; if this sum is zero (no bits flipped) for any non-empty set of operations, where each operation is used at most once, we can remove any one of them without changing the result (we can replace applying this operation by applying all the other ones). That’s a linearly dependent set.

On the other hand, if there are 2 different ways (sets of operations A and B) to get the same array, then the set A U B, after removing pairs of identical operations, is lin. dependent. So, if there’s no lin. dependent set, then any sequence of operations gives a different array; if there are k sets left, the answer is 2^k.

Now, we need to know what a lin. dependent set is. If its operations are applied to an array, no bits are flipped, which is only possible if there’s an even number of endpoints between any 2 elements of the array (think why). In terms of a graph with vertices representing endpoints and edges representing operations (no labels here), it means an Eulerian subgraph, which a cycle obviously satisfies - so if there’s no lin. dependent set in a graph, it must be a forest. We can see that any tree (and therefore any forest) satisfies the conditions, because an Eulerian graph must contain at least one cycle, which a tree can’t.

The problem now reduced to counting connected components, just like in the editorial.

akumar3 · February 17, 2014, 7:54pm

If, we flip the interval [2, 4), [4, 8) and [3, 5), then the obtained sequence will be “[0, 1, 0, 0, 1, 1, 1, 0, …]” not what is written there…

There are people reading editorials for learning something new…
And its very irritating to read editorials with editing mistakes

Although, I accept that mistakes cannot be avoided, but they are in every editorial I am reading…common!

georgejoseph · February 17, 2014, 9:31pm

Sir,
I am new to the blog. So if any dialogue feels offensive i am really sorry.
I think that the problem doesn’t accept solutions of complexity O(n^2). But I see some solutions that are of this complexity.
I am attaching on of the accepted code that i saw with this complexity.

N = c = getn(), M = getn(), r = 0;

for(i = 0; i < M; i++){

	L = getn(), R = getn();
	if(L > c || r == N)
		continue;
	while(1){
		if(R == a[L])
			break;
		if(!a[L]){
			a[L] = R;
			if(L == c)
				while(a[c--]);
			r++;
			break;
		}
		if(R > a[L]){
			L = a[L]+1;
		}else{
			t = R, R = a[L], a[L] = t;
			L = a[L]+1;
		}
	}
}

I think for a sequence of n=k=10^5…and for a input

this code will exceed the time limit…

shashwat001 · February 17, 2014, 10:42pm

There are many mapping of this problem and I was very interested in knowing the one used in editorial. I reduced this problem to solving the rank of the matrix. Consider an M X N matrix ( M rows and N columns).
Each row has elements either 0 or 1. Each row corresponds to one of the range given in the problem. Consider range l_i - r_i, which corresponds to i^th row in matrix. The elements from l_i^th column to r_i^th column in the row are 1 and other are 0. So we need to find the rank of the formed matrix. If rank is R, the solution of the problem is 2^R.

The required rank can easily be found by simulating book method. First leave only one row with 1 in first column by carrying out suitable row subtractions. Then take on second column and so on. First sort the ranges in order of increasing first index, then subtract the ranges with same first index ( subtract smaller range from larger range).

Time Complexity: O(N log M)

akumar3 · February 17, 2014, 11:00pm

Wow!
I don’t have any intuitive definition for rank of a matrix except that “it is the size of the largest square sub-matrix with non-zero determinant”.
Can you please give the intuition behind calculating the rank of above formed matrix?
Thanks.

xellos0 · February 17, 2014, 11:08pm

^ It’s the dimension of the subspace of row vectors (or column vectors, they can be shown to be equivalent). If that helps

dawnavd · February 18, 2014, 9:58am

Really a very smart approach …

buda · February 27, 2014, 6:49pm

Do you have an argument that you don’t generate more than O(1) new ranges (on average) in the worst case when processing each range? I mean, as far as I understand your code, it’s based on the fact that if you have two ranges that start at the same position, then that’s the same as keeping the larger range (the one that ends more to the right) and replacing the smaller range with a new range that is basically its inverse within the larger range. This is good because it moves the problem to the right and it will eventually be solved. However, I was worried about the complexity of this…

lerence1201 · March 3, 2014, 2:08pm

I like the “Reduction into a graph problem” part. I stuck at this part in the contest. And it turns out if you can figure out this trick, this problem can be easily solved by Disjoint set.

atulsehgal · March 3, 2014, 6:56pm

I solved this problem by converting it to a graph problem, although my approach is a bit different…
I have tried to explain it here