RECIPES - EDITORIAL

taran_1407 · November 23, 2018, 11:45am

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2

Setter: Bogdan Ciobanu
Tester: Zhong Ziqian
Editorialist: Taranpreet Singh

DIFFICULTY:

Medium-Hard

PREREQUISITES:

Linear Independence,Rank of a matrix, Gaussian Elimination, Monte-Carlo algorithm and a bit of Randomization.
Probabilistic analysis for proof only.

PROBLEM:

Given a set of Recipes of size N with some recipes being the base recipes while others being the combination of some bases/non-base recipes. A base decomposition of a recipe is the composition in terms of base recipes present in a recipe. In case two recipes having any same base recipe are merged, the recipe same in both, disappears from the merged recipe.

Answer Q queries, each giving K dishes, we need to answer, whether we can select two sets of recipes such that their base decomposition is the same.

SUPER QUICK EXPLANATION

Assign all base recipes some random number in F_{2^{64}} and all non-base recipes as xor-sum of all their constituent recipes.
For every query, for the set of recipes in the query, we represent each recipe in binary form to get a K*64 matrix. If this matrix’s rank is less than K, It means, that at least one of the dish can be represented as a linear combination of other dishes in the query, so we can simply answer yes. Rank of a matrix can be calculated in O(K^2) using Gaussian Elimination.
Due to random values assigned to base recipes, we may encounter a bad case where it might appear as if the matrix’s rank is less than K while it is not. So, It is advisable to run gaussian elimination multiple times, though, for K = 30, it was sufficient to run only once (Unless you are having a really unlucky day).

EXPLANATION

Let us discuss a solution intuitive from problem statement.

Suppose we have X base recipes. If all base recipes are assigned values 2^0, 2^1, \dots 2^{X-1}, we can represent all recipes as binary numbers having exactly X digits, ith bit specifying whether the ith base recipe is in the base decomposition of the current dish.

Using this, our queries become, given a set of Binary numbers, all having X digits, Find out whether we can select two sets of values such that if we take the xor-sum of all values in both sets, we get the same binary number.

One thing we can notice is, that if we merge these two numbers by taking its xor, we obtain 0 as the xor-sum.

So, now query become, to determine whether we can select a non-empty subset from a set of binary numbers having X digits the xor-sum of select elements is 0.

Assuming we consider each binary value as a vector having X values, we get a set of K vectors each of size X out of which we need to select a set of vectors such that xor-sum of all selected vectors is 0 for all corresponding values.

This seems much more familiar than the original statement if we are aware of the concept of Linear Independence of a vector.

A vector is said to be linearly dependent on a set of vectors if this vector can be written as a linear combination of other vectors. Hence, a set of vectors is said to be linearly independent, if we cannot write any vector as a linear combination of other vectors.

Now, For our problem, we have to answer query whether we can select a non-empty set of vectors such that their xor-sum is zero. Suppose we take out one vector from this set. It is not hard to see, that the xor-sum of all other vectors in the set is same as the vector removed.

This implies, that we are required to check if the given set of Linearly dependent or not.

To check this, we can resort to representing this set of vectors in a matrix and use the Row Echelon form of the matrix to find Rank of a matrix, which, is the maximum number of linearly independent vectors in a given set of vectors. If we have the rank of this matrix as K, No vector is linearly dependent on other, implying we cannot select a set of vectors which has xor-sum zero in all dimensions, Hence, the answer is no in this case.

If Rank of this matrix is less than K, we can select at least one vector which is the linear combination of a set of vectors, hence, the answer is yes in this case.

Now, To find the Rank of a matrix, we use Gaussian Elimination.

Procedure for finding the rank of a matrix is simple using row reduction and thus, is a part of the exercise.

But, We cannot even store N binary numbers having X digits each since both X and N can range up to 10^5. Also, Gaussian Elimination would be O(X*K*min(X, K)) per query. We need something faster.

We can use randomization here. Suppose, we assign each base recipe any random number out of F_{2^{64}} and work the same way. This way, when we can solve this problem faster.

But, Using just a random value and 64 bits in place of X bits has some consequences. The consequence is, that though we will report the set of vectors to be dependent when they are actually dependent, we may also end up reporting some linearly independent set of vectors as being linearly dependent. This happens, because of random values assigned to base recipes. I’ll give an example of such a case.

Three Base recipes numbered 1, 2 and 3 assigned random values 5, 17, and 20. In the query, we are given all three recipes. We can see, that 5^17 = 20 which may lead to recipe 3 being reported as the linear combination of recipe 1 and 2, while it is not true. Hence, If we run this process a number of times with different random values assigned to base recipes in each run, the probability of still reporting such false positive is negligible. Actually how much, we can see below.

Probabilistic Analysis

Using the above-mentioned randomized solution, we get a K*64 matrix having each value either 0 or 1. The probability of a matrix being singular is \frac{2^{64}-1}{2^{64}} when K = 1. When K = 2, The matrix can be singular when either all elements of the row are same as the previous row, or are zero, leading to the probability of no singular as \frac{2^{64}-2}{2^{64*2}}. For Higher values of K, we can see, that the matrix can be transformed into a singular matrix when values in the current row can be represented as the xor-sum of a subset of previous K-1 rows. This happens in 2^{K-1} cases. Hence, the probability of having K th row as a combination of xor-sum of a subset of rows, is \frac{2^{K-1}}{2^{64*K}}, leading to non singular cases as \frac{2^{64} - 2^{K-1}}{2^{64*K}}.

We can see, that probability of having matrix singular is the product of probabilities of having the ith row as xor combination of previous rows, leading to the probability of singular matrix as \frac{\prod{1-2^{64-i}}}{2^{64*K}} which is very small for K up to 30. Even a single run of algorithm suffices.

Note on Randomization

The Randomization we used above, comes under False based Monte Carlo algorithms, which always give the return false when the correct answer is false, but may also give true verdict even when the verdict is actually false. More can be read about Monte Carlo algorithms and probability analysis at Wikipedia page, here.

Randomization is really useful and can be effectively employed in problems if you can keep the probability of error below a certain threshold, as required. Try out this Nice Problem Factorize which uses a bit of randomization along with Number Theory, and has a detailed long Editorial.

Gaussian Elimination in O(K^2) (specific to this problem only)

For this particular problem, we have K*64 matrix represented by K values of 64 bits. In Gaussian elimination, when we reduce the contribution of a vector from the remaining rows below the current row, we subtract current row multiplied by a factor to eliminate current dimension by iterating over the whole row.

But here, we can use xor in place of subtraction, because -1 = 1(\bmod 2). Also, rows have either value 0 or 1, we also do not need to multiply the row by any constant value. This way, we can just xor the value of the current row with all subsequent rows which have current column bit standing. Details can be found in solutions.

Time Complexity

Time complexity is O(N+\sum{p}+Q*K^2*C) where C is the number of times we run above algorithm.

AUTHOR’S AND TESTER’S SOLUTIONS:

Setter’s solution
Tester’s solution
Editorialist’s solution

Feel free to Share your approach, If it differs. Suggestions are always welcomed.