PROBLEM LINKS
DIFFICULTY
HARD
PREREQUISITES
String Parsing, Predictive Parsing of an LL(1) Grammar, Regular Expressions
PROBLEM
You are given a string which defines what an ID should look like followed by a list of IDs. You have to report which one of them are valid.
QUICK EXPLANATION
Refer to the problem statement to carefully understand the description of the grammar for parsing the expressions that describe the valid IDs for each test case. Following that description should lead you to the following context free grammar (from the Problem Setterâs notes).
Tokens or : "or" times : "times" optional: "optional" letter : "letter" letters : "letters" digit : "digit" digits : "digits" to : "to" exactly : "exactly" upto : "upto" l : [A-Z] d : [0-9] num : [0-9]{2,} //take care of leading zeroes separately string : [A-Z0-9]+ Grammar S : E S' ; S' : E S' | ; E : T E' ; E' : or E | ; T : F T' ; T' : num T'' | d T'' | optional T' | ; T'' : times T' | ; F : d F1 | l F2 | upto num F3 | upto d F3 | exactly num F4 | exactly d F4 | digit | letter | string | num | ( S ) ; F1 : to d | ; F2 : to l | ; F3 : digits | letters ; F4 : digits | letters ;
Note that â|â followed by nothing indicates the âepsilon productionâ.
The grammar is indeed a LL(1) grammar. This means that you can always figure out which of the above rules to apply after parsing some part of the string, just by looking at the next token.
EXPLANATION
After taking the input you may first want to clean the spaces in the expression
- Add spaces around paranthesis
- Remove repeated spaces
This ensures that all tokens are now separated by single space. Now you can tokenize the string and mark each token to the type of token provided.
The next step is to parse the list of tokens according to the rules of the grammar provided. It is important to note that given a LL(1) grammar, you can parse the array of tokens in linear time. The testerâs solution below uses the Recursive Descent Parsing with the simplification that because the grammar is LL(1), you never have to back track.
The outcome of parsing the expression is creating a meaningful regular expression. As well as, calculating the largest possible ID that can be generated so as to discard the case if it exceeds the bound on the length of the IDs.
The GNU C library has <regex.h> include that lets you use create a regex using the POSIX regular expressions syntax.
Once you have the regular expression compiled, you can execute it on all the IDs to check whether they match or they donât.
Refer to the Setterâs Solution for a clean implementation of the Predictive parsing and learn how the various rules map to ideas in POSIX regular expressions. Each ruleâs corresponding method are named accordingly and the approach is easy to follow.
SETTERâS SOLUTION
Can be found here. (some permission issue on uploaded solution, fixing ASAP)
TESTERâS SOLUTION
Can be found here.