This directory contains awkpretty, a prettyprinter for the awk
programming language.

It is based on minor modifications of Brian Kernighan's awk code in
../19990305 (== /u/sy/beebe/src/bwkawk/19990305). A study of that code
showed that awk operates by first calling yyparse() to build a parse
tree in memory, and then calling run() to execute that parse tree.
yyparse() in turn calls yylex() to get lexical tokens.

The idea is that by wrapping yylex() inside another function, wwlex(),
we can arrange to output the lexical token stream while grammar
conformance is being checked by yyparse(), and then skip the normal
run() step by providing a dummy version of that function.

Here are the changes made:

* completely new Makefile

* two new token types (COMMENT and WHITESPACE) added to awkgram.y

* modification of the two lex rules for COMMENT and WHITESPACE in
  lex.c to actually return those tokens, instead of discarding them

* new function, wwlex(), in wwlex.c, wrapping yylex(), allowing the
  output of a lexical token before returning to yyparse()

* dummy function, wwrun(), in wwlex.c, so that main.c needs no
  modifications

* compilation of main.c with -Drun=wwrun, so that the run() step is
  dummied out

* compilation of ytab.c (the yacc output from awkgram.y) with
  -Dyylex=wwlex, so that yyparse() calls the wwlex() wrapper instead
  of yylex(). It sees exactly the same token stream in either case.

* compilation with -Dtrue=ctrue -Dfalse=cfalse so that C++ compilers
  can be used

* (char*) typecasts added to calls to malloc() and realloc() in run.c
  to allow C++ compilation

* three () argument lists changed to (void) in lex.c to allow C++
  compilation

The lexical token stream, similar to that produced by bibclean and
biblex, is piped into a completely separate, and relatively simple,
prettyprinter, written in awk itself for compactness and ease of
modification.
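The wrapping idea can be illustrated with a minimal, self-contained
sketch. It is not the code in wwlex.c: every demo_* name, the token
codes, and the canned token stream are hypothetical stand-ins, and the
choice to swallow COMMENT and WHITESPACE tokens inside the wrapper is
an assumption about how the parser can keep seeing an unchanged token
stream.

```c
#include <stdio.h>

/* Hypothetical token codes for this sketch only; the real values
   come from the yacc-generated header. */
enum { DEMO_EOF = 0, DEMO_WORD = 300, DEMO_COMMENT = 301, DEMO_WHITESPACE = 302 };

struct demo_token { int code; const char *text; };

/* Canned token stream standing in for the real lexer's input. */
static const struct demo_token demo_stream[] = {
    { DEMO_WORD, "BEGIN" },
    { DEMO_WHITESPACE, " " },
    { '{', "{" },
    { DEMO_COMMENT, "# note" },
    { '}', "}" },
    { DEMO_EOF, "" }
};

static size_t demo_next = 0;        /* index of the next canned token */
static const char *demo_text = "";  /* text of the most recent token */
static int demo_emitted = 0;        /* number of tokens printed so far */

/* Stand-in for yylex(): return the next canned token and record its
   text. Once the end marker is reached, it is returned on every call. */
static int demo_yylex(void)
{
    const struct demo_token *t = &demo_stream[demo_next];
    if (t->code != DEMO_EOF)
        demo_next++;
    demo_text = t->text;
    return t->code;
}

/* Plays the role of wwlex(): print every token, but return only
   grammar tokens to the caller (assumed behaviour, see above). */
static int demo_wwlex(void)
{
    for (;;) {
        int code = demo_yylex();
        printf("%d\t%s\n", code, demo_text);
        demo_emitted++;
        if (code != DEMO_COMMENT && code != DEMO_WHITESPACE)
            return code;
    }
}
```

A parser that calls demo_wwlex() repeatedly receives only the word,
brace, and end-of-input tokens, while all six tokens are written to
standard output.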
This approach guarantees that the prettyprinter sees exactly the
tokens that awk sees and that the token sequence will have been
verified to conform to the awk grammar.

Best of all, it requires only the addition of two lines to awkgram.y
and about 50 lines to lex.c, modification of about 10 lines in
awklex.l, and a 140-line file, wwlex.c: under 200 new lines of C, lex,
and yacc code. This is a huge saving compared with writing almost
8,500 lines from scratch:

    % cat ../19990305/*.[chly] | wc -l
    8463

Here is a sample of the lexer output:

    ./awklex 'BEGIN {print "hello, world"}' /dev/null
    # line 1 "/dev/stdin"
    261	XBEGIN	BEGIN
    337	WHITESPACE	 
    123	token 123	{
    319	PRINT	print
    337	WHITESPACE	 
    334	STRING	"hello, world"
    59	token 59	}
    125	token 125	}
    0	token 0	}

Fields are tab-separated, but only the first two tabs on a line are
significant; all others are just data.

Since awkpretty has relatively complex logic, and the awk language has
some `dark corners', it is possible that a bug in awkpretty could
change the meaning of its input program. One simple such case would be
the incorrect introduction of a non-backslashed line break at the
space in the program "pattern {action}".

To increase confidence in awkpretty, extensive regression testing has
been carried out, using a test body of about 500,000 lines of real awk
programs found in the large file systems from several major UNIX
vendors at the author's site. The regression tests compare the token
stream produced by awklex on the original programs with that from the
prettyprinted programs: there should be no differences, except in line
number directives and horizontal and vertical spacing. The only tests
found to fail were those where the original programs contained syntax
errors.
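The record format and the comparison rule described above can be
sketched in C as follows. This is an illustration only, not code from
the distribution: the demo_* names and buffer sizes are hypothetical,
and treating exactly the "# line" directives and WHITESPACE records as
ignorable is an assumption about what the regression comparison
skips.

```c
#include <stdio.h>
#include <string.h>

/* One record of the token stream: numeric code, token name, raw text.
   The buffer sizes are arbitrary choices for this sketch. */
struct demo_record {
    int code;
    char name[64];
    char text[256];
};

/* Split a line at its first two tabs only; any later tabs stay inside
   the text field. Returns 1 on success, or 0 if the line has fewer
   than two tabs or no leading number (e.g. a "# line" directive). */
static int demo_parse(const char *line, struct demo_record *rec)
{
    const char *tab1 = strchr(line, '\t');
    const char *tab2 = tab1 ? strchr(tab1 + 1, '\t') : NULL;

    if (tab1 == NULL || tab2 == NULL)
        return 0;
    if (sscanf(line, "%d", &rec->code) != 1)
        return 0;
    snprintf(rec->name, sizeof rec->name, "%.*s",
             (int)(tab2 - tab1 - 1), tab1 + 1);
    snprintf(rec->text, sizeof rec->text, "%s", tab2 + 1);
    rec->text[strcspn(rec->text, "\n")] = '\0';  /* drop trailing newline */
    return 1;
}

/* Return 1 if a line should take part in a token-stream comparison,
   0 for line directives, WHITESPACE records, and unparsable lines. */
static int demo_significant(const char *line)
{
    struct demo_record rec;

    if (strncmp(line, "# line", 6) == 0)
        return 0;
    if (!demo_parse(line, &rec))
        return 0;
    return strcmp(rec.name, "WHITESPACE") != 0;
}
```

Filtering both token streams through demo_significant() before
comparing them line by line would leave only the records that must
match between an original program and its prettyprinted form.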
The validation suite included with the awkpretty distribution, and run
by "make check", tests the formatting of more than 3300 lines in
sample files that have been devised to expose problems for
prettyprinting, and to exhibit uses of all possible language
constructs. The results are compared against results obtained at the
author's site, which are believed to be correct.

There is another make target intended primarily for the developer:
"make maintainer-check" runs a regression test of the type described
above, using all of the check*in files used by "make check", plus all
of the awk programs included in the distribution (another 1200+ lines
of code).