This section describes the implementation of hoc1, a program that provides about the same capabilities as a minimal pocket calculator, and is substantially less portable.
Grammars
Ever since Backus-Naur Form was developed for Algol, languages have been described by formal grammars. The grammar for hoc1 is small and simple in its abstract representation:
    list:   expr \n
            list expr \n
    expr:   NUMBER
            expr + expr
            expr - expr
            expr * expr
            expr / expr
            ( expr )
In other words, a list is a sequence of expressions, each followed by a newline. An expression is a number, or a pair of expressions joined by an operator, or a parenthesized expression.
Overview of yacc
yacc is a parser generator, that is, a program for converting a grammatical specification of a language like the one above into a parser that will parse statements in the language. yacc provides a way to associate meanings with the components of the grammar in such a way that as the parsing takes place, the meaning can be "evaluated" as well. The stages in using yacc are the following.
- First, a grammar is written, like the one above, but more precise. This specifies the syntax of the language. yacc can be used at this stage to warn of errors and ambiguities in the grammar.
- Second, each rule or production of the grammar can be augmented with an action---a statement of what to do when an instance of that grammatical form is found in a program being parsed. The "What to do" part is written in C, with conventions for connecting the grammar to the C code. This defines the semantics of the language.
- Third, a lexical analyzer is needed, which will read the input being parsed and break it up into meaningful chunks for the parser. A NUMBER is an example of a lexical chunk that is several characters long; single-character operators like + and * are also chunks. A lexical chunk is traditionally called a token.
- Finally, a controlling routine is needed, to call the parser that yacc built.
yacc processes the grammar and the semantic actions into a parsing function, named yyparse, and writes it out as a file of C code. If yacc finds no errors, the parser, the lexical analyzer, and the control routine can be compiled, perhaps linked with other C routines, and executed. The operation of this program is to call repeatedly upon the lexical analyzer for tokens, recognize the grammatical (syntactic) structure in the input, and perform the semantic actions as each grammatical rule is recognized. The entry to the lexical analyzer must be named yylex, since that is the function that yyparse calls each time it wants another token. (All names used by yacc start with y.)
To be somewhat more precise, the input to yacc takes this form:
    %{
    C statements like #include, declarations, etc. This section is optional.
    %}
    yacc declarations: lexical tokens, grammar variables,
        precedence and associativity information
    %%
    grammar rules and actions
    %%
    more C statements (optional):
    main() {...; yyparse(); ...}
    yylex() {...}
    ...
This is processed by yacc and the result written into a file called y.tab.c, whose layout is like this:
    C statements from between %{ and %}, if any
    C statements from after second %%, if any:
    main() {...; yyparse(); ...}
    yylex() {...}
    ...
    yyparse() {parser, which calls yylex() }
It is typical of the UNIX approach that yacc produces C instead of a compiled object (.o) file. This is the most flexible arrangement---the generated code is portable and amenable to other processing whenever someone has a good idea.
yacc itself is a powerful tool. yacc-generated parsers are small, efficient, and correct; many nasty parsing problems are taken care of automatically. Language-recognizing programs are easy to build, and can be modified repeatedly as the language definition evolves.
Stage 1 program
The source code for hoc1 consists of a grammar with actions, a lexical routine yylex, and a main, all in one file hoc.y.
    %{
    #include <stdio.h>
    #include <ctype.h>
    #define YYSTYPE double  /* data type of yacc stack */

    /* declared here so the actions and main compile under any yacc */
    int yyparse(void);
    int yylex(void);
    void yyerror(const char *s);
    void warning(const char *s, const char *t);
    %}
    %token  NUMBER
    %left   '+' '-'     /* left associative, same precedence */
    %left   '*' '/'     /* left assoc., higher precedence */
    %%
    list:     /* nothing */
            | list '\n'
            | list expr '\n'    { printf("\t%.8g\n", $2); }
            ;
    expr:     NUMBER            { $$ = $1; }
            | expr '+' expr     { $$ = $1 + $3; }
            | expr '-' expr     { $$ = $1 - $3; }
            | expr '*' expr     { $$ = $1 * $3; }
            | expr '/' expr     { $$ = $1 / $3; }
            | '(' expr ')'      { $$ = $2; }
            ;
    %%
            /* end of grammar */

    char    *progname;      /* for error messages */
    int     lineno = 1;     /* current input line */

    int main(int argc, char *argv[])        /* hoc1 */
    {
            progname = argv[0];
            yyparse();
            return 0;
    }

    int yylex(void)         /* return next token; 0 means end of input */
    {
            int c;

            while ((c = getchar()) == ' ' || c == '\t')
                    ;       /* skip blanks and tabs */
            if (c == EOF)
                    return 0;
            if (c == '.' || isdigit(c)) {   /* number */
                    ungetc(c, stdin);
                    scanf("%lf", &yylval);
                    return NUMBER;
            }
            if (c == '\n')
                    lineno++;
            return c;       /* any other character is its own token */
    }

    void yyerror(const char *s)     /* called by the parser on a syntax error */
    {
            warning(s, (char *) 0);
    }

    void warning(const char *s, const char *t)      /* print warning message */
    {
            fprintf(stderr, "%s: %s", progname, s);
            if (t)
                    fprintf(stderr, " %s", t);
            fprintf(stderr, " near line %d\n", lineno);
    }
Compilation of a yacc program is a two-step process:
    $ yacc hoc.y
    $ cc y.tab.c -o hoc1
    $ ./hoc1
    2/3
    	0.66666667
    -3-4
    ./hoc1: syntax error near line 2
    $
A digression on make
The program make reads a specification of how the components of a program depend on each other, and how to process them to create an up-to-date version of the program. It checks the times at which the various components were last modified, figures out the minimum amount of recompilation that has to be done to make a consistent new version, then runs the processes. make also understands the intricacies of multi-step processes like yacc, so these tasks can be put into a make specification without spelling out the individual steps.
make is most useful when the program being created is large enough to be spread over several source files, but it's handy even for something as small as hoc1.
    $ cat makefile
    hoc1:   hoc.o
    	cc hoc.o -o hoc1
    $