HP UX Archive Centre

$Id: README,v 1.4 1994/04/05 20:33:58 hays Exp $

Ah, the wisdom of the ages...

Introduction
------------

What you are looking at is a basic (very basic) parser for a PL/M
language.

The parser does nothing useful, and it isn't even a terribly wonderful
example.  On the other hand, it appears that no one else has bothered
to publish even this much, before.

However, the parser does recognize a language very like PL/M-86,
PL/M-286, or PL/M-386, as best we can determine.

All the information used to derive this parser comes from published
manuals, sold to the public.  No proprietary information, trade
secrets, patented information, corporate assets, or skulduggery was
used to develop this parser.  Neither of the authors has ever seen the
source to a working PL/M compiler (or, for that matter, to a
non-working PL/M compiler).

Implementation Limits
---------------------

This PL/M parser was developed and tested on a 486DX2/66 clone PC
running Linux.  The C code is written for an ANSI-compliant C
compiler; GCC was used in our testing.  Also, flex and bison were
used, not lex and yacc.  Paul Vixie's comp.sources.unix implementation
of AVL trees was used to implement symbol table lookups.

You should expect some problems if you plan on building this parser
with a K&R style C compiler.  Using yacc and/or lex may be
problematic, as well.

This parser does not support any of the "dollar" directives of a
proper PL/M compiler.  In fact, it will croak with the helpful message
"parse error".  Thus, implementing include files and compiler
directives is left as an exercise for the reader.

The macro facility (aka "literally" declarations) depends on the
lexical analysis skeleton allowing multiple characters of push-back on
the input stream.  This is a very, very poor assumption, but, with
flex, at least, workable for this example.  A real PL/M compiler would
allow literals of unlimited length.  To find the offending code, grep
for the string "very weak" in the file "plm-lex.l".

No error recovery is implemented in the parser, at all.

There are no shift-reduce conflicts, nor reduce-reduce conflicts.

There are a couple of places in the parser where similar constructs
cannot be distinguished, except by semantic analysis.  These are
marked by appropriate comments in the parser source file.

The "scoped literal table" implementation depends on Paul Vixie's
(paul@vix.com) public domain AVL tree code, available as
comp.sources.unix Volume 27, Issue 34 (`avl-subs'), at a friendly ftp
site near you.  We use "gatekeeper.dec.com".  The benefits of using
AVL trees for a symbol table (versus, say, hashing) are not subject to
discussion.  We used the avl-subs source code because it is reliable
and easy to use.

This grammar has been validated against about 10,000 lines of real and
artificial PL/M code.

PL/M Quirks
-----------

PL/M has some very interesting quirks.  For example, a value is
considered to be "true", for the purposes of an `if' test, if it is
odd (low bit set).  Thus, the value 0x3 is true, whereas 0x4 is not.
The language itself, given a boolean expression, generates the value
0xff for true.  [This factoid doesn't affect the parser per se, but
does appear to be the main pitfall for those whose hubris leads them
to translate PL/M to C.]

String constants can contain any ASCII value, excepting a single
apostrophe, a newline, or 0x81.  The latter, presumably, has something
to do with Kanji support.

To embed a single apostrophe in a string constant, two apostrophes may
be used.  Thus, 'k''s' is a string consisting of a letter k, a single
apostrophe, and a letter s.  Strings are not null terminated, so our
example string, 'k''s', requires just three bytes of storage.

PL/M supports a macro language, of sorts, that is integrated into the
language's declaration syntax:

	declare Ford literally 'Edsel';
	declare Mercury literally 'Ford';

After the above declarations, any instance of the identifier "Ford"
will be replaced with the string "Edsel", and any occurrence of the
identifier "Mercury" will be replaced by the string "Ford", which will
then be replaced by the string "Edsel".  The literal string can be
more complicated, of course.  Only identifiers are subject to
substitution - substitution does not occur inside string constants.

Literal macros are parameterless, and obey the scoping rules of the
language.  Thus, it is possible to have different values for the same
macro in different, non-nested scopes.  [Exercise: Why can't you have
different values for literals in nested scopes?]

Keywords, of course, cannot be macro names, because they are not
allowed as variable names.

PL/M allows dollar signs ("$") to be used inside keywords,
identifiers, and numerical constants.  PL/M is also case insensitive.
Thus, the following two identifiers are the "same":

my_very_own_variable_02346

m$Y_$$$VeRy_$$O$$$$$W$$$$$$N_varIABLE$$$$$$$$$$_$02$346

Loverly, eh?  Obfuscated C, stand to the side.

Casting in PL/M (a relatively late addition to the language) is
provided by a motley assortment of functions with the same names as
the basic types to which they are casting, accepting a single argument
of some other (or even the same) type.

Note that the EBNF grammar published in what must be considered the
definitive work, _PL/M Programmer's Guide_, Intel order number
452161-003, Appendix C, is incorrect in several respects.  If you're
interested in the differences, we've preserved, as much as is
possible, the production names of that EBNF in the YACCable grammar.

Some known problems with the published, Appendix C, EBNF grammar:

     - One of the productions is an orphan, ("scoping_statements").

     - unary minus is shown as a prefix operator, and unary plus as a
       postfix operator ("secondary").
 
     - Casting does not appear in the published grammar.

     - Nested structures do not appear in the published grammar, and
       the reference syntax for selecting a nested structure member
       is also missing.

     - The WORD type is missing from the "basic_type" production.

     - The "initialization" production allows the initial value list
       only after the INITIAL keyword, when, in fact, the initial value 
       list may follow the DATA keyword, as well.

On the other hand, the precedence of the expression operators is
correct as written in the EBNF grammar, the dangling else problem is
non-existent, and there are no associativity problems, as all
operators associate left-to-right.

To complicate matters, the above referenced manual may be out of
print.  A more recent version, which covers the PL/M-386 dialect only,
is _PL/M-386 Programmer's Guide_, Intel order number 611052-001.

The latter manual has some corrections, but has some introduced errors
in the EBNF, as well.  The problems with the unary minus and the
"initialization" production are repaired, but the definition for a
"binary_number" is malformed, as are the definitions for the
"fractional_part", "string_body_element", "variable_element", and
"if_condition" productions.

We're right, they're wrong.

The Authors
-----------

Gary Funck (gary@intrepid.com) was responsible for starting this
effort.  He authored the original grammar.

Kirk Hays (hays@ichips.intel.com) wrote the lexical analyzer and the
scoped literal table implementation.  He also validated and corrected
the grammar, and extended it to cover documented features not
appearing in the published EBNF.

Future Plans
------------

If there is enough interest (or, even if there isn't), Kirk is
planning on producing a PL/M front end for the GNU compiler.  Contact
him at the above Email address for further information.  Donations of
PL/M source code of any dialect (including PL/M-80, PL/M-51, and
PL/M-96)(yes, we already have the Kermit implementations), or a
willingness to be a pre-alpha tester with code you cannot donate, are
sufficient grounds to contact Kirk.