$Id: README,v 1.4 1994/04/05 20:33:58 hays Exp $ Ah, the wisdom of the ages... Introduction ------------ What you are looking at is a basic (very basic) parser for a PL/M language. The parser does nothing useful, and it isn't even a terribly wonderful example. On the other hand, it appears that no one else has bothered to publish even this much, before. However, the parser does recognize a language very like PL/M-86, PL/M-286, or PL/M-386, as best we can determine. All the information used to derive this parser comes from published manuals, sold to the public. No proprietary information, trade secrets, patented information, corporate assets, or skulduggery was used to develop this parser. Neither of the authors has ever seen the source to a working PL/M compiler (or, for that matter, to a non-working PL/M compiler). Implementation Limits --------------------- This PL/M parser was developed and tested on a 486DX2/66 clone PC running Linux. The C code is written for an ANSI-compliant C compiler; GCC was used in our testing. Also, flex and bison were used, not lex and yacc. Paul Vixie's comp.sources.unix implementation of AVL trees was used to implement symbol table lookups. You should expect some problems if you plan on building this parser with a K&R style C compiler. Using yacc and/or lex may be problematic, as well. This parser does not support any of the "dollar" directives of a proper PL/M compiler. In fact, it will croak with the helpful message "parse error". Thus, implementing include files and compiler directives is left as an exercise for the reader. The macro facility (aka "literally" declarations) depends on the lexical analysis skeleton allowing multiple characters of push-back on the input stream. This is a very, very poor assumption, but, with flex, at least, workable for this example. A real PL/M compiler would allow literals of unlimited length. To find the offending code, grep for the string "very weak" in the file "plm-lex.l". No error recovery is implemented in the parser, at all. There are no shift-reduce conflicts, nor reduce-reduce conflicts. There are a couple of places in the parser where similar constructs cannot be distinguished, except by semantic analysis. These are marked by appropriate comments in the parser source file. The "scoped literal table" implementation depends on Paul Vixie's (paul@vix.com) public domain AVL tree code, available as comp.sources.unix Volume 27, Issue 34 (`avl-subs'), at a friendly ftp site near you. We use "gatekeeper.dec.com". The benefits of using AVL trees for a symbol table (versus, say, hashing) are not subject to discussion. We used the avl-subs source code because it is reliable and easy to use. This grammar has been validated against about 10,000 lines of real and artificial PL/M code. PL/M Quirks ----------- PL/M has some very interesting quirks. For example, a value is considered to be "true", for the purposes of an `if' test, if it is odd (low bit set). Thus, the value 0x3 is true, whereas 0x4 is not. The language itself, given a boolean expression, generates the value 0xff for true. [This factoid doesn't affect the parser per se, but does appear to be the main pitfall for those whose hubris leads them to translate PL/M to C.] String constants can contain any ASCII value, excepting a single apostrophe, a newline, or 0x81. The latter, presumably, has something to do with Kanji support. To embed a single apostrophe in a string constant, two apostrophes may be used. Thus, 'k''s' is a string consisting of a letter k, a single apostrophe, and a letter s. Strings are not null terminated, so our example string, 'k''s', requires just three bytes of storage. PL/M supports a macro language, of sorts, that is integrated into the language's declaration syntax: declare Ford literally 'Edsel'; declare Mercury literally 'Ford'; After the above declarations, any instance of the identifier "Ford" will be replaced with the string "Edsel", and any occurrence of the identifier "Mercury" will be replaced by the string "Ford", which will then be replaced by the string "Edsel". The literal string can be more complicated, of course. Only identifiers are subject to substitution - substitution does not occur inside string constants. Literal macros are parameterless, and obey the scoping rules of the language. Thus, it is possible to have different values for the same macro in different, non-nested scopes. [Exercise: Why can't you have different values for literals in nested scopes?] Keywords, of course, cannot be macro names, because they are not allowed as variable names. PL/M allows dollar signs ("$") to be used inside keywords, identifiers, and numerical constants. PL/M is also case insensitive. Thus, the following two identifiers are the "same": my_very_own_variable_02346 m$Y_$$$VeRy_$$O$$$$$W$$$$$$N_varIABLE$$$$$$$$$$_$02$346 Loverly, eh? Obfuscated C, stand to the side. Casting in PL/M (a relatively late addition to the language) is provided by a motley assortment of functions with the same names as the basic types to which they are casting, accepting a single argument of some other (or even the same) type. Note that the EBNF grammar published in what must be considered the definitive work, _PL/M Programmer's Guide_, Intel order number 452161-003, Appendix C, is incorrect in several respects. If you're interested in the differences, we've preserved, as much as is possible, the production names of that EBNF in the YACCable grammar. Some known problems with the published, Appendix C, EBNF grammar: - One of the productions is an orphan, ("scoping_statements"). - unary minus is shown as a prefix operator, and unary plus as a postfix operator ("secondary"). - Casting does not appear in the published grammar. - Nested structures do not appear in the published grammar, and the reference syntax for selecting a nested structure member is also missing. - The WORD type is missing from the "basic_type" production. - The "initialization" production allows the initial value list only after the INITIAL keyword, when, in fact, the initial value list may follow the DATA keyword, as well. On the other hand, the precedence of the expression operators is correct as written in the EBNF grammar, the dangling else problem is non-existent, and there are no associativity problems, as all operators associate left-to-right. To complicate matters, the above referenced manual may be out of print. A more recent version, which covers the PL/M-386 dialect only, is _PL/M-386 Programmer's Guide_, Intel order number 611052-001. The latter manual has some corrections, but has some introduced errors in the EBNF, as well. The problems with the unary minus and the "initialization" production are repaired, but the definition for a "binary_number" is malformed, as are the definitions for the "fractional_part", "string_body_element", "variable_element", and "if_condition" productions. We're right, they're wrong. The Authors ----------- Gary Funck (gary@intrepid.com) was responsible for starting this effort. He authored the original grammar. Kirk Hays (hays@ichips.intel.com) wrote the lexical analyzer and the scoped literal table implementation. He also validated and corrected the grammar, and extended it to cover documented features not appearing in the published EBNF. Future Plans ------------ If there is enough interest (or, even if there isn't), Kirk is planning on producing a PL/M front end for the GNU compiler. Contact him at the above Email address for further information. Donations of PL/M source code of any dialect (including PL/M-80, PL/M-51, and PL/M-96)(yes, we already have the Kermit implementations), or a willingness to be a pre-alpha tester with code you cannot donate, are sufficient grounds to contact Kirk.