YACC – PARSER GENERATOR
The utility yacc (Yet Another Compiler Compiler) is an LALR parser (which is essentially an LR parser with one token look-ahead) generator. It parses a stream of token, typically generated by lex, according to a user-specified context free grammar.
First, a file, say translate.y, containing a Yacc specification of the translator is prepared.The UNIX system command yacc translate.y transforms the file translate.y into a C program called y.tab.c using the LALR method. The program y.tab.c is a representation of an LALR parser written in C,along with other C routines that the user may have prepared. By compiling y.tab.c we obtain the desired object program a.out that performs the translation specified by the original Yacc program. If other procedures are needed, they can be compiled or loaded with y.tab.c, just as with any C program.
How to execute ?
Compile lex file using lex filename1.l (to generate lex.yy.c)
Compile yacc file using yacc filename2.y (to generate y.tab.c)
Compile c file using cc y.tab.c (to generate executable file a.out)
Execute the file using ./a.out
YACC Program Specification
… declarations …
%%
… translation rules …
%%
… supporting C routines…
Declarations Part
There are two sections in the declarations part of a Yacc program, both are optional. In the first section, we put ordinary C declarations, delimited by %{ and %}. Here we place declarations of any temporaries used by the translation rules or procedures of the second and third sections. As with lex, all code in this section is copied to the beginning of the resulting C file. If lex is to return tokens that yacc will process, they have to agree on what tokens there are. This can be done by including token declarations in the declarations part in the yacc program as given below.
%token NUM
declares NUM to be a token. Tokens declared in this section can then be used in the second and third parts of the Yacc specification. If Lex is used to create the lexical analyzer that passes token to the Yacc parser, then these token declarations are also made available to the analyzer generated by Lex. Yacc provides a general mechanism for resolving shift reduce conflicts. In the declarations portion, we can also assign precedences and associativities to terminals. The declaration
%left ’+’ ’-’
makes + and – be of the same precedence and be left associative. We can declare an operator to be right associative by writing
%right ’ˆ’
The tokens are given precedences in the order in which they appear in the declarations part, lowest first. Tokens in the same declaration have the same precedence. The sequence of lines indicates increasing operator precedence and the keyword sets the associativity type.
Translation Rules Part
In the second part of the Yacc specification after the first %% pair, we put the translation rules. Each rule consists of a grammar production and the associated semantic action.
A set of productions that we have been writing:
In a Yacc production, unquoted strings of letters and digits not declared to be tokens are taken to be non terminals. Alternative bodies can be separated by a vertical bar, and a semicolon follows each head with its alternatives and their semantic actions. The first head is taken to be the start symbol.
A Yacc semantic action is a sequence of C statements. In a semantic action, the symbol $$ refers to the attribute value associated with the nonterminal of the head, while $i refers to the value associated with the ith grammar symbol (terminal or nonterminal) of the body. The semantic action is performed whenever we reduce by the associated production, so normally the semantic action computes a value for $$ in terms of the $i’s. For example consider the two E-productions
E–>E+E / NUM
and their associated semantic actions as:
E: E ’+’ E {$$ = $1 + $3;}
| NUM {$$ = $1;}
;
The semantic action associated with the first production adds the value of the expr1 and the expr2 of the body and assigns the result as the value for the nonterminal expr of the head.
The Supporting C-Routines Part
The minimal main program is
main()
{
yyparse();
return 0;
}
int yyerror()
{
}
The third part of a Yacc specification consists of supporting C-routines. Code section can be very elaborate, but the main ingredient is the call to yyparse(), the grammatical parse. Sometimes, yacc reports ‘syntax error’ and stops processing. This means that an un-expected symbol is found. In the case of syntax errors yacc calls the function yyerror(). The default implementation of the yyerror() produces a ‘syntax error’ message, but it can be redefined at will.
Lex – Yacc interaction
The lexical analyser created by Lex behaves in concert with the parser as follows. When called by the parser, the lexical analyzer begins reading its remaining input, one character at a time, until it finds the longest prefix of the input that matches one of the patterns Pi. It then executes the associated action Ai. Typically, Ai will return to the parser, but if it does not, then the lexical analyzer proceeds to find additional lexemes, until one of the corresponding actions causes a return to the parser. The lexer now no longer has a main function.
If your lexical analyzer is to be used along with a parser, each lex action should end with a return statement of the form:
return token; or return(token);
Here, token is an integer-valued variable, literal, or macro, or a character enclosed in single quotes. Yacc-generated parsers use yylex() to invoke the lexical analyzer. The lex rules will probably function by calling return every time they have parsed a token. The value returned to the parser indicates what kind of token the lexical analyzer has found. The parser then resumes control and later makes another call to the lexical analyzer (via a call to yylex()) when it needs another token.
The lexical analyzer yylex() produces tokens consisting of a token name and its associated attribute value. If a token name such as NUM is returned, the token name must be declared in the first section of the Yacc specification. The attribute value associated with a token is communicated to the parser through a Yacc defined variable yylval which is a global variable shared by lex and yacc. By default, this is defined as an int. The type of yylval is determined by YYSTYPE. In the case of floating point numbers YYSTYPE must be redefined as follows:
#define YYSTYPE float
The attribute value, whether it be another numeric code, a pointer to the symbol table, or nothing, is placed in a global variable yylval, which is shared between the lexical analyzer and parser, thereby making it simple to return both the name and an attribute value of a token. If the character is a number, the value of the number is stored in the variable yylval, and the token name NUM is returned. Otherwise, the character itself is returned as the token name.
YACC PROGRAMS
1) yacc-program-implement-arithmetic-operators
2) yacc-programs-implement-logical-operators
Really good writing on Yacc and Language Processors!! Thanks for your contribution .. much appreciated..