The first thing you need to understand is the question itself. I found its language a bit weird and that it interchanged a couple of terms, so I’ll restate the question in purely technical terms (and hopefully not make a mess of it).
Given a tree with n nodes and weighted edges rooted at node 1, find the maximum weight of a sub-tree rooted at 1 with exactly q edges.
In other words, what they are calling branches are just edges.
While performing dp on trees our state usually looks like dp(i,something) is the best we can do for the sub-tree rooted at i such that something is true.
So, lets go with the most natural state we can think of, which is:
let dp(i,j) be the maximum weight of a sub-tree rooted at i with exactly j edges.
So, dp(i,0) will be 0 for all nodes since, a sub-tree with 0 branches will have 0 weight. For the leaves of the tree only dp(i,0) will exist since they don’t have the option of having any more branches.
The final result I want is dp(1,q).
Now, lets take some arbitrary node v with children w1, w2 …wm. (In this question there are always 2 children, but its better to understand the more general technique for m children)

Now, if we are trying to find dp(v,3) i.e. the best sub-tree rooted at v with 3 edges then we have the following choices:
- Sub-tree rooted at v, takes edge 1, and then best possible sub-tree with 2 edges rooted at w1
- Sub-tree rooted at v, takes edge 2, and then best possible sub-tree with 2 edges rooted at w2
- Sub-tree rooted at v, takes edges 1&2, and then one edge from either sub-tree rooted at w1 or w2
Note that when I said best possible sub-tree with 2 edges rooted at w1 that essentially means dp(w1,2).
So our dp is starting to take shape in that it now depends on smaller sub-problems, but it is still too tedious to actually compute all these cases.
So we decide to process children in order and define a new dp, say, dp_v s.t. dp_v(i,j) is the best possible sub-tree rooted at v with j edges considering only the first i children of v.
First we process w1:
Every result after processing w1 will go into dp_v(1,j) since, we have just considered w1 i.e. the first child, so far. Since we are only considering w1 right now, the only way for v to have j edges in its subtree is to take edge 1 and then take (j-1) edges from sub-tree at w1 in best possible way.
Therefore, dp_v(1,j)=weight_of(edge 1) + dp(w1,j-1)
Now, we process w2:
Every result we get now will be put into dp_v(2,j) since we will consider first 2 children (w1 and w2). So if we are calculating dp_v(2,j) then we have the following options:
- Don’t take edge 2 at all, and take all j edges from previous children only
- Take edge 2 and some k more edges from sub-tree at w2 (total k+1 edges) and take the remaining j-(k+1) edges from previous children. Since the weight of optimally chosen k edges from sub-tree of w2 is nothing but dp(w2,k), we get the following relation:
dp_v(2,j)=max(dp_v(1,j) , dp_v(1,j-(k+1)) + weight_of(edge 2) + dp(w2,k)) for all k
After we process all m (in this question m=2 always) children of v we will then have dp_v(m,j) which is the best sub-tree rooted at v with j edges using its first m children, but since v has only m children, dp_v(m,j) is nothing but best sub-tree rooted at v with j edges considering all children, in other words dp_v(m,j) is the best sub-tree rooted at v with j children which is nothing but dp(v,j).
I’ll attach my code soon using the same terminology. Hope this helped.
edit:
my solution
I have not been very efficient in my solution as the aim was to illustrate dp on trees. One of the very easy ways to optimize this is to see that we do not need to declare a separate n x n dp_v for each node. Instead we can get away with a global 1-D array of size n.