YACC  –  PARSER  GENERATOR

The utility yacc (Yet Another Compiler Compiler) is an LALR parser (which is essentially an LR parser with one token look-ahead) generator. It parses a stream of token, typically generated by lex, according to a user-specified context free grammar.

First,  a  file,  say  translate.y,  containing  a  Yacc specification  of  the translator  is  prepared.The  UNIX  system  command yacc translate.y transforms  the  file  translate.y  into  a  C  program  called  y.tab.c  using  the LALR  method. The  program  y.tab.c  is  a  representation of  an LALR  parser  written in C,along  with other C  routines  that the user  may  have prepared. By compiling  y.tab.c  we obtain  the  desired object program  a.out  that  performs  the  translation  specified  by  the  original Yacc program. If other procedures are needed, they can be compiled or loaded with y.tab.c, just as with any C program.

yacc

How to execute ?

Compile lex file using lex filename1.l (to generate lex.yy.c)

Compile yacc file using yacc filename2.y (to generate y.tab.c)

Compile c file using cc y.tab.c (to generate executable file a.out)

Execute the file using ./a.out

YACC Program Specification

… declarations …

%%

… translation rules …

%%

… supporting C routines…

Declarations Part

There are two sections in the declarations part of a Yacc program, both are optional.  In the first section, we put ordinary C declarations, delimited by %{ and %}. Here we place declarations of any temporaries used by the translation rules or procedures of the second and third sections.  As with lex, all code in this section is copied to the beginning of the resulting C file. If lex is to return tokens that yacc will process, they have to agree on what tokens there are. This can be done by including token declarations in the declarations part in the yacc program as given below.

%token NUM

declares NUM  to be a  token.  Tokens declared  in  this  section  can  then be used in the  second and third  parts  of  the  Yacc specification.  If  Lex  is  used to create  the  lexical analyzer  that  passes token to the  Yacc parser, then these token declarations  are  also  made available  to the  analyzer  generated by Lex. Yacc provides a general mechanism for resolving shift reduce conflicts.  In the declarations portion, we can also assign precedences and associativities to terminals. The declaration

%left ’+’ ’-’

makes +  and  –  be of  the  same precedence and  be left  associative.  We can declare an operator to be right associative by writing

%right ’ˆ’

The tokens are given precedences in the order in which they appear in the declarations part, lowest first.  Tokens in the same declaration have the same precedence. The sequence of lines indicates increasing operator precedence and the keyword sets the associativity type.

Translation Rules Part

In the second part of the Yacc specification after the first %% pair, we put the translation rules.  Each rule consists of a grammar production and the associated semantic action.

A set of productions that we have been writing:

translation rule

In a  Yacc production, unquoted  strings of  letters  and  digits not declared  to be  tokens are  taken  to be non terminals.  Alternative  bodies  can  be separated  by a  vertical bar,  and  a  semicolon follows each head  with  its alternatives and  their semantic actions.  The first head is taken to be the start symbol.

A  Yacc semantic action is a sequence of C statements.  In a semantic  action, the  symbol $$ refers  to the  attribute  value associated  with the  nonterminal of the  head, while $i  refers  to the  value associated  with  the  ith  grammar symbol (terminal or nonterminal)  of  the  body.  The  semantic action  is  performed whenever  we reduce  by the  associated  production,  so normally the  semantic action computes a  value for  $$ in terms  of  the  $i’s.  For example consider the two E-productions

E–>E+E / NUM

and their associated semantic actions as:

E: E ’+’ E            {$$ = $1 + $3;}

| NUM              {$$ = $1;}

;

The  semantic action  associated  with the  first  production  adds the  value of  the  expr1 and  the  expr2 of  the  body and assigns the  result  as  the  value for  the  nonterminal expr of  the  head.

The Supporting C-Routines Part

The minimal main program is

main()

{

yyparse();

return 0;

}

int yyerror()

{

}

The third part of a Yacc specification consists of supporting C-routines.  Code section can be very elaborate, but the main ingredient is the call to yyparse(), the grammatical parse. Sometimes, yacc reports ‘syntax error’ and stops processing. This means that an un-expected symbol is found. In the case of syntax errors yacc calls the function yyerror(). The default implementation of the yyerror() produces a ‘syntax error’ message, but it can be redefined at will.

Lex Yacc interaction

The lexical analyser created by Lex behaves in concert with the parser as follows.  When called  by the  parser, the  lexical  analyzer  begins  reading  its remaining  input,  one character  at a  time,  until it  finds the  longest prefix  of  the input  that  matches one  of  the patterns Pi.  It then executes the associated action Ai. Typically, Ai will  return  to the  parser, but  if  it  does not, then the  lexical analyzer  proceeds  to find additional  lexemes,  until one  of the  corresponding  actions causes  a  return  to the  parser. The lexer now no longer has a main function.

If your lexical analyzer is to be used along with a parser, each lex action should end with a return statement of the form:

return token;                    or               return(token);

Here, token is an integer-valued variable, literal, or macro, or a character enclosed in single quotes. Yacc-generated parsers use yylex() to invoke the lexical analyzer. The lex rules will probably function by calling return every time they have parsed a token. The value returned to the parser indicates what kind of token the lexical analyzer has found. The parser then resumes control and later makes another call to the lexical analyzer (via a call to yylex()) when it needs another token.

The  lexical analyzer  yylex()  produces  tokens  consisting of  a  token  name and  its  associated  attribute  value.  If  a  token  name  such as NUM is  returned, the  token  name  must  be  declared in the  first  section of  the  Yacc specification. The  attribute value  associated  with  a  token  is  communicated to the  parser through  a  Yacc defined variable  yylval which is a global variable shared by lex and yacc. By default, this is defined as an int. The type of yylval is determined by YYSTYPE. In the case of floating point numbers YYSTYPE must be redefined as follows:

#define YYSTYPE float

The attribute  value,  whether it be another  numeric  code,  a  pointer  to the  symbol table,  or nothing,  is placed in a  global variable yylval, which is shared  between  the  lexical analyzer  and parser, thereby making it  simple  to return both  the  name and  an attribute  value of  a token. If  the  character is  a number, the value of  the  number is  stored in  the  variable  yylval,  and  the  token  name NUM is  returned. Otherwise, the character itself is returned as the token name.

YACC  PROGRAMS

1) yacc-program-implement-arithmetic-operators

2) yacc-programs-implement-logical-operators

 

 

 

 

 

 

1 thought on “YACC”

Leave a Reply

Your email address will not be published. Required fields are marked *