For now we are publishing the author’s solution to the problem. Let us discuss the strategies that you used for the problem; later we will merge them into the editorial to create a more detailed version.
Author’s solution of the problem
There were no special prerequisites needed to solve this problem, as even the simplest strategy (don’t store anything and respond with -1 -1 to every query) was enough to score some points, and this was the starting strategy for a lot of contestants. It is pretty obvious that there are better strategies, so this one will probably get you something close to 0. One small observation: it is better to buy the cheapest HDD before any events happen. This way you will also have some crashes, and those are free. You will get a much better score (still close to zero, though) if you buy something first. This might seem counterintuitive, and it was a nice thing to discover during testing.
I will describe some of my ideas first.
One basic strategy would be to implement RAID 1 (i.e. keep a secondary copy of each primary HDD, basically storing each piece of information twice). This has a clear disadvantage: you have to buy twice as many HDDs as you need. For this strategy it is better to buy HDDs from the second category (expensive, but with low costs for reading/writing), as you will do a lot of copying. One can notice that it is better to copy data to secondary storage as soon as possible, because of a possible crash. Crashes are ‘solved’ by resyncing the two paired HDDs (copying from primary/secondary to secondary/primary).
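The mirroring idea can be sketched as follows. This is a local simulation only, not the judge’s actual interface: the `Disk` and `MirrorPair` classes, the dict-based storage, and the crash/resync methods are all assumptions made for illustration.

```python
class Disk:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}          # client id -> payload

    def write(self, client, payload):
        self.data[client] = payload

    def crash(self):
        self.data = {}          # a crash wipes everything on this disk


class MirrorPair:
    """RAID 1: keep identical data on two paired disks."""

    def __init__(self, capacity):
        self.primary = Disk(capacity)
        self.secondary = Disk(capacity)

    def store(self, client, payload):
        # Copy to the mirror immediately, so a crash of either disk
        # loses nothing (the editorial's "as soon as possible" point).
        self.primary.write(client, payload)
        self.secondary.write(client, payload)

    def read(self, client):
        # Either disk can serve the read; fall back to the mirror.
        if client in self.primary.data:
            return self.primary.data[client]
        return self.secondary.data.get(client)

    def resync_after_crash(self, crashed):
        # Repair a crash by copying everything from the surviving disk.
        survivor = self.secondary if crashed is self.primary else self.primary
        crashed.data = dict(survivor.data)
```

The immediate copy in `store` is what makes crashes cheap to handle: at any moment at least one disk of the pair holds the full data, so `resync_after_crash` fully restores the pair.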
When to buy a new HDD is also important. There is no point in buying them all at the very beginning. You can buy another one as soon as you believe that you won’t have enough storage space for the next event (i.e. you have less than 1000 units free, or even 500 if you are willing to risk a little).
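A minimal sketch of this lazy-buying rule. The 1000-unit threshold is the editorial’s figure; the bookkeeping (`free_space` as a dict of disk id to remaining units) is invented here for illustration.

```python
THRESHOLD = 1000   # worst-case size of the next store request

def ensure_capacity(disks, free_space, hdd_capacity):
    """disks: list of disk ids; free_space: disk id -> remaining units.
    Buy (append) a new disk only when no existing disk can be trusted
    to absorb THRESHOLD more units; return the new id, or None."""
    if all(free_space[d] < THRESHOLD for d in disks):
        new_id = len(disks)
        disks.append(new_id)
        free_space[new_id] = hdd_capacity
        return new_id          # bought a new HDD
    return None                # existing disks still suffice
```

Lowering `THRESHOLD` to 500 delays purchases further but gambles that the next request is small.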
Other RAID levels could also be implemented but it is much harder and one could not expect RAID 6 to be requested by a task in this kind of contest. You can learn more about RAID levels by reading the Wikipedia page: https://en.wikipedia.org/wiki/Standard_RAID_levels
Another strategy, harder to code but also adaptive, is to decide whether you need to store everything twice. If you have a lot of crashes (10% or 15%) it might be a good idea to store data twice. For 1% crashes you might prefer to pay a penalty (a couple of times) rather than buy twice as many HDDs and do all the copying associated with RAID 1.
How can you detect that you have only 1% crashes? By running some simpler strategy (the one above?) on a few hundred events and computing the probabilities of each type of event. This is why there were only 6 event types, and why they are described in the Test generation section.
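The estimation step could look like this. The event encoding (integer type ids, with `CRASH` marking a crash event), the sample size, and the decision threshold are all assumptions for the sketch, not values from the problem statement.

```python
from collections import Counter

CRASH = 3                           # hypothetical id of the "crash" event type
SAMPLE_SIZE = 300                   # "a few hundred events"
MIRROR_IF_CRASH_RATE_ABOVE = 0.05   # mirror at 10-15% crashes, not at 1%

def should_mirror(events):
    """Decide, from a prefix of observed events, whether duplicating all
    data (RAID 1) is likely cheaper than paying occasional crash penalties."""
    sample = events[:SAMPLE_SIZE]
    counts = Counter(sample)
    crash_rate = counts[CRASH] / len(sample)
    return crash_rate > MIRROR_IF_CRASH_RATE_ABOVE
```

In a real submission you would run the cheap single-copy strategy while collecting this sample, then switch to mirroring only if the estimated crash rate justifies the extra purchases and copying.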
I’d like to use this chance to invite a couple of contestants to explain their approaches.
Some (funny) notes about this problem:
- there were 5375 (3450 AC) submissions for this problem (when this editorial was written)
- the judge is (currently) 531 lines long and was easily hacked a couple of times by our tester @kingofnumbers
- the problem originally contained another type of operation, fragmentation (i.e. storing only a fraction of a client’s data, or storing multiple parts on multiple HDDs). This operation can currently be simulated by using a high-capacity HDD; it was deleted because the judge was getting way too complicated.
- @mugurelionut’s solution is 1546 lines long (I believe it is the longest for this problem)
- We had to increase the number of submissions to 500 because @allllekssssa managed to exhaust the standard set of 200 submissions.