## Writing a parser: Calculator example
at 2008-02-10
in Examples
by friebe
(0 comments)
In many situations, we need to transform user input to machine readable data. Mostly, a regular expression, Using eval is not an option because you're not in control over what is passed - this creates security problems!The grammarThe grammar files you need to write are easy to read and resemble the syntax used in Yacc. Here's a basic skeleton: %{ TopSourcecode included in top is placed above the generated parser's class declaration statement. Here's what we'll start with:%{ TokensThe tokens section contains a list of tokens that consist of more than one character. A character on its own has a token number of its ASCII value and the character as value (e.g. the "+" sign is #43). The lexer is responsible for producing the tokens, so it can decide to return "mod" as one token. In this case, we need to declare this token (the name doesn't matter, but is suggested to be written in all-uppercase). We'll define two tokens by adding a line as follows: %token T_INTEGER T_DOUBLE RulesThis is the interesting part! Here's the general skeleton for a rule: name:This can be read as: "The rule name contains either a rule, a rule1 (followed by the token token1, in which case a Rule1 object containing the token values of rule1 and token1 is created and returned - via $$) or a rule2."Confused? Our first grammarThe first rule needs to be named "start". This is for the parser generator to know where to set the entry point. So we'll start out as follows: start: Read this as follows: "At the start, we expect an expression. An expression may be an integer number (in which case we create - and return - an Integer instance wrapped by a Constant), a double number (same procedure, but use a Double instead), or an expression surrounded by brackets (which will simply return the expression inside, to be used for grouping).Putting it togetherThis command will generate our parser: $ sh ~/devel/xp/trunk/ports/technologies/opt/jay/generate.sh grammar/calculator.jay php5 'math·' > math/Parser.class.phpAfter that, we'll have to create a lexer which will tokenize the input into tokens. For the expression 1 + 2.5, it should return T_INTEGER(1), 43('+') and T_DOUBLE(2.5) (no tokens are returned for whitespace).Here's an abbreviated version: class extends AbstractLexer { More rulesBy now, we can only parse number constants as mathematical expressions, which of course isn't satisfying yet. So we'll extend the grammar to allow for simple mathematical operations: expression: This looks rather intuitive, doesn't it? And it should just work out fine, right? Unfortunately not: When regenerating the parser, we'll see the following appear: phpJay: 16 shift/reduce conflicts. What the hell?The problem is here that this does not completely specify how to handle the following input: a + b + c - is that (a + b) + c or is it a + (b + c)? Of course, it doesn't matter here (it would if we had a + b * c, though!)So we will need to tell the parser generator which way to structure the input. We will use the following: %left '+' '-'...and will see the conflicts have disappeared again Quoting The Lex & Yacc Page: This describes the precedence and associativity of the four arithmetic operators. Plus and minus are left associative, and have lower precedence than star and slash, which are also left associative. Because we also want to support negative numbers (the tokenizer will split "-3" into #45("-") and T_NUMBER(3)) and also want something like "-(4 * 3)" to work, we will add a rule reading:| '-' expression { $$= new math·Multiplication(new math·Constant(new Integer(-1)), $2); } The complete grammar can be seen here.Making it workNow that we have the parser and lexer in place, we will want to try out the functionality. Therefore we will have to create the classes referenced in the grammar. The inheritance tree looks like this: + lang.ObjectBoth Constant and Operation classes implement the interface math.Expression (empty for the moment, we will need it later on!)The Constant class is simply a wrapper around a lang.types.Number:class extends Object implements { The Operation class is an abstract base class for the concrete operation implementations:abstract class extends Object implements { The concrete implementation for addition, subtraction, multiplication and divisions are empty subclasses of the above class: class extends { As the parser's parse() method returns whatever is returned from the start rule, we can use the following to see the parsing results:echo xp::stringOf(create(new ())->parse(new ($input))); Here's what will be output: - For the input string
`"1"`, we will see`math.Constant(lang.types.Integer(1))` - For the input string
`"1.5"`, we will see`math.Constant(lang.types.Double(1.5))` - Same applies if we surround input by brackets as in
`"(1)"` - Given an expression
`"3 * 5"`, we will read`math.Multiplication(math.Constant(lang.types.Integer(3)), math.Constant(lang.types.Integer(5)))` - If we do not pass anything, we'll receive a syntax error reading "Unexpected end-of-file" - our grammar does not allow for empty expressions!
Making it usefulWe have now succeeded in parsing expressions from the input source. Of course this is no real help just to see the object created but we'd like to see some results. For this purpose, we'll add an evaluate() method to the Expression interface as follows:interface {...and then add functionality in the implementing classes. The Constant class will simply return the contained number:class extends Object implements { The Operation class will unbox the left- and righthand sides' evaluation results to primitives, pass them to a perform method implemented by the concrete implementations and box the value back to a lang.types.Number.abstract class extends Object implements { This way, the code in the Addition class is quite simple (same goes for all the other operations):class extends { Command lineFinally, we'll create a command line class to provide a calculator utility: uses('util.cmd.Command', 'math.Lexer', 'math.Parser'); This can now be called from the command line $ xpcli cmd.Calculator [expression] and will output its evaluation's result.Of course, this is a very simple example for a very simple situation, but has hopefully given you a head start at using the parser generator distributed with the XP framework and maybe it's sparked some ideas on how to integrate it in your application The complete sourcecode can be found in the experiments directory | ## SubscribeYou can subscribe to the XP framework's news by using RSS syndication. ## CategoriesNewsGeneral PHP5 Announcements RFCs Further reading ExamplesEditorial EASC Experiments Unittests Databases 5.8-SERIES Unicode Language 5.9-SERIES 6.0-SERIES ## RelatedFind related articles by a search for «Writing». |