Problem link:
PREREQUISITES:
Dynamic Programming
PROBLEM:
We have a Project containing some files present over various directories on our system. We need to upload(push) the Project to Git. For each file it is given whether it must be present on the cloud or not. For this purpose we first put the files on a stage. The files are the finally pushed from stage to the git. To acheive our aim we are allowed to modify the stage by 2 commands:
-
git push path: This copies files specified by path from our system to the stage.
-
git remove path: This deletes the files specified by path from the stage. Note that there is no deletion on our system.
If this path is a directory, the command is applied recursively to the contents of this directory.
Note that no modification is done to files on our computer system.
EXPLANATION:
As we are given a valid file directory system, we can assume that all the files and directories can be arranged in a tree. Any directory/file is represented a node of the tree. Node C is the child of Node P if C is a directory/file present in directory P. All the files are represented as unique leaves of this tree.
For each file (∴ all leaves) we are given whether this file should be present (STAGE) or absent(UNSTAGE) in the final stage of the project.
Formally in this problem we have been given a tree. We have to modify the tree that represents the file system. For each leaf of the tree we have been given whether it must be present in the final configuration or not. We are allowed to do 2 modifications.
-Delete the subtree rooted at v from the current configuration.
-Add to the current tree the whole subtree rooted at v ( in the original tree).
We need to answer the minimum number of commands required to transform the tree to the final desired configuration.
Let us define dp[v][0] and dp[v][1] to achieve the optimal result.
dp[v][0] = minimum number of commands required to achieve the desired state of all the nodes present in the subtree rooted at v assuming that no node from the subtree is present on the stage.
dp[v][1] = minimum number of commands required to achieve the desired state of all the nodes present in the subtree rooted at v assuming that every node from the subtree is present on the stage.
Let ci’s be children of node v.
Let sum0 denote the sum of dp[ci][0]
Let sum1 denote the sum of dp[ci][1]
While calculating dp[v][1] state of node v, assume that the whole subtree rooted at v is present on stage. This could have happened by git push “some ancestor of v”.
Now we have two options
-perform no command for v and deal with the children : As the whole subtree rooted at v is present on stage, we just need to add the corresponding cost of children of v. => Cost = sum1
-perform an unstage of node v: This removes all the nodes present in the subtree of v from the stage. Since none of the children are now present on stage, cost = sum0 + 1. 1 is added as we used a remove(unstage) command here.
dp[v][1] = min(sum1, 1 + sum0)
While calculating dp[v][0] state of node v, assume that the none of the nodes present in subtree rooted at v is present on stage. This could have happened by git remove “some ancestor of v” or no command was used on any of the ancestors of v.
Now we have two options
-perform no command for v and deal with the children : As there is no node present in subtree rooted at v on stage, we just need to add the corresponding cost of children of v. => Cost = sum0
-perform an stage of node v: This adds all the nodes present in the subtree of v to the stage. Since now the whole subtree is present on stage, cost = sum1 + 1. 1 is added as we used a remove(unstage) command here.
dp[v][0] = min(sum0, 1 + sum1)
For base cases when v is leaf:
dp[v][0] = 1 if v needs to be present on final stage, otherwise 0.
dp[v][1] = 1 if v should not be present on final stage, otherwise 0.
Contrsuction of the tree can be done in linear time. For a given path, traverse the corresponding path in the tree. Create nodes when a node corresponding to a file/directory does not exist. Need to take special care when two files/diectiories share the same name but does ot share the same parent directory.
Time Complexity O(N + L) per test case