There’s an tutorial shortly due to appear at IBM developerWorks that covers the process behind building a calculator using the lex and yacc (or flex and bison) tools to build a parser. The tutorial covers a natural expression parser, i.e. one capable of processing:
(4+5)*6
I suggest a couple of extensions in the tutorial, namely a Reverse Polish Notation (RPN) calculator, and translators that convert to/from RPN and standard equation format. Here, we’re going to start with looking at the RPN calculator. The RPN system is more straightforward for people to learn when you think about typical equations, for example you might write:
4563 +
In RPN, you would enter this as:
45 63 +
From a parsing point of view, the process is also easier, because you can perform the calculation by pushing the numbers on to the stack and then performing a calculation with those two numbers. This hugely simplifies the parser, but it only has to push numbers and pop them off when it sees the operand, rather than having to extract both numbers and parser from the input text. Even better, compound calculations can be made easier because the result of one calculation can be pushed back on to the stack for the next part. For example, the following equation:
45 63 + 23 *
Can be read as:
- Push 45 on to stack
- Push 63 on to stack
- Pop value off stack, add it to another value popped off the stack
- Push result to stack
- Pop value off stack, multiply by value popped off stack
The lexical analysis component (i.e. the lex definitions) remain the same, it’s only the parser that changes. Before we examine the yacc rules, you need to see the simple stack system. It provides two functions, one pushes values on, and the other pops values off. All the values are stored in a simple array and a global stack pointer holds the current storage location so that values can be popped off or pushed back:
#include "globdefs.h"int sp=0;double val[MAXVAL];void push(f)double f;{ if (sp 0) return(val[--sp]); else { printf("error: stack emptyn"); return 0.0; }}
The yacc rules for a simple RPN parser are shown below (the rest of the surrounding code is identical).
%%list: /* nothing */ | list EOLN { printf( "%gn" , pop()); } | list exprlist EOLN { printf( "%gn" , pop()); } ;exprlist: shift_expr | exprlist shift_expr ;shift_expr: add_expr | shift_expr LEFTSHIFT { temp=pop(); push(((int)pop()) > ((int)temp)); } ;add_expr: mul_expr | add_expr PLUS { push(pop()+pop()); } | add_expr MINUS { temp=pop(); push(pop()-temp); } ;mul_expr: unary_expr | add_expr MUL { push(pop()*pop()); } | add_expr DIV { temp=pop(); push(pop()/temp); } | add_expr MOD { temp=pop(); push(fmod(pop(),temp)); } ; unary_expr: primary | MINUS primary %prec UNARYMINUS { push(-pop()); } | unary_expr INC { push(pop()+1); } | unary_expr DEC { push(pop()-1); } ; primary: NUMBER { push($1); } | PI { push(M_PI); } ;%%
You can see here that numbers are simply pushed onto the stack:
primary: NUMBER { push($1); } | PI { push(M_PI); } ;
While any calculation is a case of popping off the values and putting them back on the stack:
add_expr: mul_expr | add_expr PLUS { push(pop()+pop()); } | add_expr MINUS { temp=pop(); push(pop()-temp); } ;
The ruleset is shorter, partially because this RPN calculator is not as advanced, but also because the process is much simpler because the rules don’t need to take into account the complex structure of a typical equation line. In a future post we’ll cover the RPN to equation and equation to RPN parsers. You can download the complete code for the RPN calculator as rpn.tar.gz (Unix).