F2NDMAX - Editorial

PROBLEM LINK:

Div1
Div2
Practice

Setter- Hasan Jaddouh
Tester- Ivan Safonov
Editorialist- Abhishek Pandey

DIFFICULTY:

Easy-Medium

PRE-REQUISITES:

Graph Theory, Concepts of Acyclic Graph like Indegree & Outdegree, Correlating problems to Graph Theory, Logical Reasoning.

PROBLEM:

Given an array A of N integers, you are given M relations which tell that element at index A is greater than element at index B. You need to find minimum number of questions to ask to determine the second maximum element, provided that Chef is trying to maximize the number of questions asked.

QUICK EXPLANATION:

Key Concept- Correlate the question to Graph Theory. Say there is a directed edge from A to B if A>B. Now, think about the final graph we get after we determine the second largest element. In it, the maximum array element will have a indegree 0, second maximum element must be a direct child of it.

Let us construct a graph from the details given in the question. The question guarantees consistency, hence cycles are not allowed in this graph (Why?). Note that the maximum element will have in degree 0. Also, indegree of second-largest element cannot be more than 1 in any case. Hence, remove all nodes with indegree >2 from list of candidate nodes for answer.

Find number of 0 indegree nodes in this graph. We must first connect each of them to determine the maximum element. However, we cannot arbitrarily connect nodes. In the ideal way, we tend to always ask questions/add edge between two 0 degree nodes first, as this removes at least 1 element from candidate elements for second maximum (we will realize this at the end - because, ultimately either that node, or its direct-son can be second maximum- in any case after adding the last edge we will see we eliminated 1 node per edge/query) . After that, we connect as such to minimize number of direct-sons of final 0 degree node (the maximum element).

Once this is done and we have found out the maximum element, check number of direct sons of it. By direct son, I mean "sons which have indegree=1" (we can easily see that second maximum element cannot have more than 1 indegree). If it has A children, then A-1 questions need to be further asked to determine second maximum (Why?).

EXPLANATION:

The answers of all the “(Why?)” will be given in Chef Vijju’s corner. This editorial will have a single section describing setter’s logic which is quite easy to implement.

Setter’s Solution-

Note a few observations with me first. This question is all about logic and observations :slight_smile:

Let us make a graph such that there is an edge from A to B if A is greater than B.
Think about the graph we will get at the end, after asking sufficient questions to determine the second largest element. Now, this will mean that-

  • Maximum element will have a indegree= 0.
  • Second maximum element will have an indegree 1 as maximum will be connected to it. An edge from ONLY maximum element can goto second maximum element.
  • Maximum and Second maximum must be connected. No connection between maximum element and second maximum element means we have not yet queried these elements. Say the nodes above are A and B. This means we still dont know which of the two is greater than other. Since it is impossible to have an edge from any other node to second maximum (see above 2 observations for that) this would mean that currently maximum and second-maximum are two separate connected components. We will need to connect them to tell which is greater by asking a question. An alternate of situation is in the tab below.
  • In view of above 3, any grandchild of maximum and further can be automatically removed from list of potential candidates of “second maximum” element, as they will have edges from elements other than maximum element. When I say maximum here, I mean, all 0-indegree nodes of that component.
Click to view

Say you have two connected components. For sake of easy explanation, assume that there is only one 0-indegree node in these 2 components. Now, you know that the node with in-degree 0 is greatest element of that component, as no edge goes to it. Say the two such nodes (or “roots” of the connected components) are A and B. We know that A is greatest in its component, and we know that B is greatest in its own component. Which of them is greater overall then, A or B? We cannot tell!! We need to ask a question/query/add-an-edge for this.

Please try to grasp the above 4 observations very well. These are the basic fundamental of this question. If you have a doubt here, proceeding further will be difficult. Feel free to ask doubts if above proofs dont help you.

By above, we establish that second-largest element, if determined, must have an edge from maximum element (and can ONLY have an edge to it from maximum element.)

For now, look at question from this point of view-

There are N nodes. We initially have M directed edges between them. Now, of course, there are no guarantees on value of M, which means that graph can be disconnected. The only assurance is that there are no cycles and self loops.

Now, by above 4 observations we can get that our very first job is to “connect” all the 0-indegree nodes by asking questions. (Recall that maximum element has indegree=0 and that second largest element will be a direct son of it.) Also, side-by-side we should aim to connect in such a way that number of direct sons of maximum element (say A) is minimized (as A-1 has to be added to answer at last).

Now, lets add Chef to the picture :slight_smile:

Chef also knows this, and he is free to permute the array after every question. He wants to maximize the number of queries asked. So, what will he do? When we ask him “Is node A greater than node B?", he will see the number of direct sons of A and B. If say A has more sons, he will say "A is greater than B.” This means, we will add an edge from A to B and increase the number of direct sons of A by 1. This way, he will hinder the minimizations of direct sons of maximum- for which we can do nothing.

Also, there is one thing to clarify. When I say “direct sons”, I intrinsically mean "Direct sons who have indegree=1". We have earlier observed that second maximum element cannot have an indegree >1, hence if there is any child of maxima which has indegree of 2 or more, we dont even need to query to remove it from list of potential candidate nodes for answer. To put it formally in terms of code, I will recognize direct son as-

“If ( Indegree of parent==0 AND indegree of this node==1)”

Indegree of parent condition is to verify that we are querying over only direct sons of maximum element (or maximum element/“root” of the connected component).

Now lets we know what Chef will do. Its time to think of what we can do to counter it! Ask yourself this question - “In what order I ask question about maximums of 2 connected components?”

One thing is obvious, in every question, chef will increase number of direct-sons of node A (the node with greater number of direct sons) by 1. We obviously want to minimize the number of direct sons. One good thing is, no matter what is the number of direct sons of B (node with lesser direct sons), only 1 is added to number of direct sons of A (as based on our 4 key observations, we can eliminate all children of B from list of potential candidates). This is good. There is an alternate reduction of this question, if you’re interested.

Click to view

The alternate reduction is, given an array of N elements, you can remove 2 elements and have to add max(a,b)+1 back to array. Minimize the final element left after all operations.

This is actually a well known question, and the optimal strategy followed relies on the fact that “Only 1 is added to direct sons of A no matter how many sons B has. Hence, greedy works. Start from the smallest 2 integers. At every iteration, pick the nodes with least and second-least number of direct sons and query. This way, you can remove 1 node from potential candidate per query, as well as minimize the number of direct sons of A at last.”

(Formal proof will be uploaded on Chef Vijju’s corner later. I first want you guys to give a try. If you succeed in doing so, I will happily grant you karma for your efforts :slight_smile: )

Whats left now is simply to simulate it!

We can implement it easily as well. While taking input, increase the indegree of the respective nodes as well. Then check how many nodes have indegree==0 and are directly connected to a node with indegree==1. This gives us the number of direct sons for each node. Now, make a priority_queue, or heap, and insert the number of direct sons for each connected component (or each node with indegree==0). Now at each iteration, pick the node with least and second-least number of direct sons, remove them from the queue, and add max(A_i,B_i)+1 back to the queue, where A_i is number of direct sons of A and same for B.

In the end, find how many direct sons are left and add K-1 to answer where K is number of direct sons. A commented implementation of this idea is below-

Click to view
for(int i=0;i<m;i++){
			list[i][0]=readIntSp(1,n);
			list[i][1]=readIntLn(1,n);    			
			deg[list[i][1]]++;//Increase degree    			
		}
for(int i=0;i<m;i++){
			if(deg[list[i][0]] == 0 && deg[list[i][1]] ==1){
				num[list[i][0]] ++ ;//Number of direct sons
			}
		}
		for(int i=1;i<=n;i++){
			if(deg[i] ==0 ){
 
				gg.push(-num[i]);//Push number of direct sons for each "root" of connected
			}//component
		}
		int sol=0;//Negative sign taken as its a "max-heap" instead of "min-heap". 
		while(gg.size() > 1){
			int a = -gg.top();gg.pop();//Pick least and second least
			int b= -gg.top();gg.pop();
			gg.push(-(b+1));//Add second least+1
			sol ++;
		}
		sol += -gg.top() - 1;//Add remaining direct sons.
		cout<<sol<<endl;

SOLUTION:

The codes of setter and tester are also pasted in tabs in case the links do not work. Please refer to them if the links dont work :slight_smile:

Setter

Click to view
#include <iostream>
#include <algorithm>
#include <string>
#include <queue>
#include <map>
#include <assert.h>
using namespace std;
 
 
 
long long readInt(long long l,long long r,char endd){
	long long x=0;
	int cnt=0;
	int fi=-1;
	bool is_neg=false;
	while(true){
		char g=getchar();
		if(g=='-'){
			assert(fi==-1);
			is_neg=true;
			continue;
		}
		if('0'<=g && g<='9'){
			x*=10;
			x+=g-'0';
			if(cnt==0){
				fi=g-'0';
			}
			cnt++;
			assert(fi!=0 || cnt==1);
			assert(fi!=0 || is_neg==false);
			
			assert(!(cnt>19 || ( cnt==19 && fi>1) ));
		} else if(g==endd){
			assert(cnt>0);
			if(is_neg){
				x= -x;
			}
			assert(l<=x && x<=r);
			return x;
		} else {
			assert(false);
		}
	}
}
string readString(int l,int r,char endd){
	string ret="";
	int cnt=0;
	while(true){
		char g=getchar();
		assert(g!=-1);
		if(g==endd){
			break;
		}
		cnt++;
		ret+=g;
	}
	assert(l<=cnt && cnt<=r);
	return ret;
}
long long readIntSp(long long l,long long r){
	return readInt(l,r,' ');
}
long long readIntLn(long long l,long long r){
	return readInt(l,r,'\n');
}
string readStringLn(int l,int r){
	return readString(l,r,'\n');
}
string readStringSp(int l,int r){
	return readString(l,r,' ');
}
 
 
int T;
int n,m;
int sm_n=0,sm_m=0;
int list[300300][9];
 
int deg[300300];
int num[300300];
priority_queue<int> gg;
 
queue<int> chk;
int deg_chk[300300];
vector<int> adj[300300];
 
bool isDag(){
	int cnt=0;
	for(int i=1;i<=n;i++){
		if(deg_chk[i] == 0){
			chk.push(i);
			cnt++;
		}
	}
	while(!chk.empty()){
		int nd=chk.front();
		chk.pop();
		for(int i=0;i<adj[nd].size();i++){
			int ch=adj[nd][i];
			deg_chk[ch] -- ;
			if(deg_chk[ch] == 0){
				chk.push(ch);
				cnt++;
			}
		}
	}
	return cnt == n;
}
int main(){
	//freopen("00.txt","r",stdin);
	//freopen("00o.txt","w",stdout);
	T=readIntLn(1,1000);
	while(T--){
		while(!gg.empty()){
			gg.pop();
		}
		n=readIntSp(2,300000);
		m=readIntLn(0,300000);
		sm_n += n;
		sm_m += m;
		assert(sm_n<=300000);
		assert(sm_m<=300000);
		while(!chk.empty()){
			chk.pop();
		}
		for(int i=1;i<=n;i++){
			deg[i]=0;
			adj[i].clear();
			deg_chk[i]=0;
			num[i]=0;
		}
		for(int i=0;i<m;i++){
			list[i][0]=readIntSp(1,n);
			list[i][10]=readIntLn(1,n);
			adj[list[i][0]].push_back(list[i][11]);
			deg[list[i][12]]++;
			deg_chk[list[i][13]]++;
		}
		assert(isDag());
		for(int i=0;i<m;i++){
			if(deg[list[i][0]] == 0 && deg[list[i][14]] ==1){
				num[list[i][0]] ++ ;
			}
		}
		for(int i=1;i<=n;i++){
			if(deg[i] ==0 ){
 
				gg.push(-num[i]);
			}
		}
		int sol=0;
		while(gg.size() > 1){
			int a = -gg.top();gg.pop();
			int b= -gg.top();gg.pop();
			gg.push(-(b+1));
			sol ++;
		}
		sol += -gg.top() - 1;
		cout<<sol<<endl;
	}
	assert(getchar()==-1);
} 

Tester

Click to view
#include <bits/stdc++.h>
#include <cassert>
 
using namespace std;
 
typedef long long ll;
typedef long double ld;
 
// read template
long long readInt(long long l,long long r,char endd){
	long long x=0;
	int cnt=0;
	int fi=-1;
	bool is_neg=false;
	while(true){
		char g=getchar();
		if(g=='-'){
			assert(fi==-1);
			is_neg=true;
			continue;
		}
		if('0'<=g && g<='9'){
			x*=10;
			x+=g-'0';
			if(cnt==0){
				fi=g-'0';
			}
			cnt++;
			assert(fi!=0 || cnt==1);
			assert(fi!=0 || is_neg==false);
			
			assert(!(cnt>19 || ( cnt==19 && fi>1) ));
		} else if(g==endd){
			assert(cnt>0);
			if(is_neg){
				x= -x;
			}
			assert(l<=x && x<=r);
			return x;
		} else {
			assert(false);
		}
	}
}
 
string readString(int l,int r,char endd){
	string ret="";
	int cnt=0;
	while(true){
		char g=getchar();
		assert(g!=-1);
		if(g==endd){
			break;
		}
		cnt++;
		ret+=g;
	}
	assert(l<=cnt && cnt<=r);
	return ret;
}
 
long long readIntSp(long long l,long long r){
	return readInt(l,r,' ');
}
 
long long readIntLn(long long l,long long r){
	return readInt(l,r,'\n');
}
 
string readStringLn(int l,int r){
	return readString(l,r,'\n');
}
 
string readStringSp(int l,int r){
	return readString(l,r,' ');
}
// end
 
const int M = 3 * 1e5 + 239;
 
int sn, sm;
bool used[M];
int n, m, in[M], pos[M], a[M], to[M];
vector<int> v[M];
vector<int> topsort;
vector<pair<int, int> > r;
 
void dfs(int p)
{
	used[p] = true;
	for (int i : v[p])
		if (!used[i])
			dfs(i);
	topsort.push_back(p);
}
 
void solve()
{
	n = readIntSp(2, (int)(3e5));
	m = readIntLn(0, (int)(3e5));	
	sn += n;
	sm += m;
	for (int i = 0; i < n; i++) in[i] = 0, used[i] = 0;
	for (int i = 0; i < n; i++) v[i].clear();
	r.clear();
	for (int i = 0; i < m; i++)
	{
		int f = readIntSp(1, n);
		int s = readIntLn(1, n);
		s--, f--;                    
		assert(s != f);
		r.push_back(make_pair(f, s));
		in[s]++;
		v[f].push_back(s);
		to[s] = f;
	}    
	sort(r.begin(), r.end());
	r.resize(unique(r.begin(), r.end()) - r.begin());
	assert((int)r.size() == m);
	topsort.clear();
	for (int i = 0; i < n; i++)
		if (in[i] == 0)
			dfs(i);
	assert((int)topsort.size() == n);
	reverse(topsort.begin(), topsort.end());
	for (int i = 0; i < n; i++) pos[topsort[i]] = i;
	for (pair<int, int> t : r) assert(pos[t.first] < pos[t.second]);
	for (int i = 0; i < n; i++) a[i] = 0;
	for (int i = 0; i < n; i++)
		if (in[i] == 1 && in[to[i]] == 0) 
			a[to[i]]++;
	multiset<int> q;
	for (int i = 0; i < n; i++)
		if (in[i] == 0)
			q.insert(a[i]);
	int ans = (int)q.size() - 1;
	while (q.size() > 1)
	{
		int x = (*q.begin());
		q.erase(q.begin());
		int y = (*q.begin());
		q.erase(q.begin());
		q.insert(max(x, y) + 1);
	}	
	ans += (*q.begin() - 1);
	cout << ans << "\n";
}
 
int main()
{ 
	int T = readIntLn(1, 1000);
	sn = 0;
	sm = 0;
	while (T--) solve();
	assert((sn <= (int)(3e5)));
	assert((sm <= (int)(3e5)));		
	assert(getchar() == -1);	
	return 0;
}  

Time Complexity= O(NLogN)

CHEF VIJJU’S CORNER :smiley:

1. Why cycles are not allowed in this graph-

Click to view

Take example of simplest cycle, A is connected to B and B is connected to A. This would imply that A>B and B< A. Hence an inconsistency.

2. “If maximum has A children, then we need to ask A-1 questions to find second maximum”

Click to view

After maximum, the second largest element should be the largest. This means, it is the largest element among all children of maxmimum. If we ask questions ideally, we can eliminate 1 child each time be comparing as out of 2 children only 1 can be greater. Hence, A-1 questions if there are A children, as we eliminate 1 child per query.

3. Argue on this deduction of this question - “This question can be reduced to - Add edges in such a way that all nodes are accessible from maximum element.” Is this correct?

4. Formal proof of our strategy of minimizing the direct sons of A. Will be given later here, for now try yourself. 25 karma for the one answering it :). Edit- Added it now.

Click to view

Refer to the alternative problem we discussed in editorial. We said that the problem can be reduced to "Given an array, you can do given operation- remove 2 elements a,b and add max(a,b)+1. Minimize the last element left in array."

Obviously, we must minimize the operations done with maximum element involved, else it will increase further more! Also, if we arbitrarily do any 2 elements, its possible that we get another element whose value is equal to maximum added in the array. This forces the answer to be at least CurrentMaximum+1. Eg, take this case- [5,4,3,1]. If we select 4,3 first then another 5 and we can see the final answer becomes 7. But if we use 3,1 we got 4 and thus are able to reduce the answer to 6. Hence, we not only want to not use maximum element, but also minimize the newly added element. This is possible if and only if we use the least and second-least valued element each operation.

By above we discussed why its not optimal to choose any random elements, and gave an intuition on why lowest 2 elements are chosen.

5. Setter’s Notes-

Click to view

First construct the graph (which is dag), now remove every node that has at least 2 indegree.

now only after removing those nodes, each component consist of 1 node of 0-indegree with a number of nodes of 1-indegree, there’s an edge between the 0-indegree node to all 1-indegree nodes, in other words,
each component is like a star shape.

now you have to ask questions, asking a question between two nodes of same component is never optimal (why?) of course unless there’s only 1 component.

asking a question that involves 1-indegree node is not optimal (unless there’s only one component left) (why? )

so as long as we have at least two components we should always ask questions between two 0-indegree nodes. now what happens when we get an answer, suppose two nodes that we asked about are A and B, now if A
is bigger than B, consider component of B it will be removed completely except B itself which will become a 1-indegree in A’s component. so let’s encode each component by the number of 1-indegree nodes. for example if we have sequence 3, 4, 5 it means we have 3 components, first have edges like A->B,A->C,A->D. second has edges like Z->Y, Z->X, Z->W, Z->V and so on.

one question means selecting two numbers from our sequence and delete one of the numbers and add 1 to the other number to return it to the sequence (why?)

Of course Chef will always select to remove the smaller number and add 1 to the bigger (because this is optimal for him)

Now after we have only one component (let its size be X) we clearly know that the 0-indegree in this node is the maximum element, we need to ask more X-1 questions to find out the second max.

So total number of questions is Y-1 + X-1, where Y is the number of components initially (star components), X is the size of last remaining component. Y is constant depending on initial M pieces of
information we have. so Chef’s brother need to minimize X, while Chef need to maximize it.

Now we have a sequence, chef’s brother will select two numbers from it, and chef will remove of them and add 1 to the other number and return it. of course it’s optimal for chef to always add 1 to bigger
number. but what is the order of choosing two numbers from the sequence to make it optimal for chef’s brother? it can be proven that he should always select the lowest two numbers (it’s clear from my implementation)

I felt the no of questions required is always equal to N - M. I don’t follow the graph theory part well.
Why?

M questions already asked. x < 2M nodes covered. But for simplicity, let’s take 2M as covered. I’m sure the x part will get eliminated in the end. Therefore, Left to cover = N - 2M. => questions required = (N - 2M)/2.

To get the second greatest, compare all the greatest nodes => N/2 questions.

total questions accounting for even and odd stuff => (N - 2M)/2 + N/2 = N - M.

I think the manner in which questions are covered is also necessary. Eg-

M questions already asked. 2M nodes covered

Not necessary. Say Chef’s brother initially asked-

1 2
1 3

1 4

and so on. Your assumption is not guaranteed to hold in input.

Take this test case-

Input
1
5 6
1 2
1 3
1 4
1 5
3 4
3 5

You’d print N-M=-1

This was an epic question !

I hope you liked the editorial as well :slight_smile:

@vijju123, I didnt understand the last sentence in this:“Find number of 0 indegree nodes in this graph. We must first connect each of them to determine the maximum element. However, we cannot arbitrarily connect nodes. In the ideal way, we tend to always ask questions/add edge between two 0 degree nodes first, as this removes at least 1 element from candidate elements for second maximum.”

If I select any 2 nodes(A and B) with indegree 0 and ask a question for relation between them, how does it remove 1 element from potential candidates? It can happen that out of them, A can be 2nd max or B can be 2nd max going further.

Never considered the possibility of a graph approach to the problem. Thanks for the in-depth explanation :slight_smile:

1 Like

@sushu-

That is because, Chef will permute his array after every question such that edge goes to the node with more children.

While you’re right, that just by asking one question you cannot conclude anything, think about the end-graph and following-

  • Our first step was finding maximum element. Hence w.r.t one element is removed from candidate for maximum element. Hence K-1 edges sufficed
  • Say A turns out to be the overall maximum element. In that case, A is eliminated from list after connecting all 0 degree nodes. If A is NOT the overall maximum element, then B is eliminated if there is an edge from A to B because B becomes grand child of maximum.

I will make the wording of that statement better, thanks :slight_smile:

@vijju123
So this is my understanding: Basically, you are trying to determine the direct sons of maximal element(say A= no of direct sons) by using greedy approach and then adding A-1 to the answer, right?

Not determine, we are trying to minimize that. If there are N nodes, then by optimum method you can determine within N-1 questions the maximum one among them. Its based on standard problems taught in DSA courses.

@vijju123, Link to the practice problem is wrong.

Fixed, thanks!

1 Like

@vijju123 , If there are no of connected components > 1 then there may be possibility leader of component (leader means max of a connected component) can be 2nd max element instead of direct child of leader of component who is gobally maximum.