A parser generator for extensible grammars

PPG is a parser generator for extensible grammars, based on the CUP parser generator. It provides the ability to extend an existing base language grammar written in CUP or PPG with localized, easily maintained changes.

Distribution of PPG

PPG was designed and written by Michael Brukman, Andrew C. Myers, and Chinawat Isradisaikul at Cornell University. It is distributed as part of the Polyglot extensible compiler framework.

PPG syntax

The PPG grammar syntax extends that of CUP with several new declarations:
  1. The first line of a PPG specification is
      include "filename"
    where filename is a relative path to the inherited specification. A PPG spec can include a CUP spec or another PPG spec. It is an error for the chain of included files to contain a cycle.
  2. A PPG grammar specification can modify the inherited grammar using the following commands:
    • precedence [ left | right | nonassoc ] tokenlist;
      is the syntax for specifying new precedence rules, which is exactly as they are specified in CUP. Specification of new precedence rules in a PPG specification replaces the precedence rules in the inherited grammar with these.
    • precedence;
      deletes all precedence rules from the inherited grammar, and is mutually exclusive with the syntax above: to add new precedence rules use the above syntax, use of this statement will only remove precedence rules.
    • drop { symbol }
      where symbol is an inherited terminal or nonterminal.
      The specified symbol is removed from grammar, and if a nonterminal, all productions where it is on the left-hand side are also eliminated. It is an error to drop a non-terminal and not drop productions where it is mentioned on the right-hand side.
    • drop { S ::= < productions> ; }
      where S is an inherited nonterminal, and productions are inherited productions. The specified productions are not inherited from the base grammar. The nonterminal remains in the grammar, even if ALL of its productions are dropped in this way; drop { S } must be used if the nonterminal is to be eliminated.
    • override S ::= < productions> ;
      The specified productions replace productions of S.
    • extend S ::= < productions> ;
      The specified productions are added to the nonterminal S.
    • transfer S to A1 { rhs1 } to An { rhsn }
      where rhsi is one or more right-hand sides of productions of S. Each of the nonterminals Ai is extended with productions as specified, and the transferred productions are not inherited by S, so a single production can be transferred to multiple nonterminals. Note that S may be one of the Ai, which has the effect of retaining the productions in S.
    • New terminals, nonterminals, and productions can be defined as in CUP and extend the base grammar.
  3. PPG supports multiple start symbols in grammars. To specify which symbols may be used as start symbols, PPG provides the following syntax:
    start with S1 func1
    start with Sn funcn
    where each Si is a non-terminal symbol that you would like to start the parser with, every time you call the function funcn. PPG will automatically generate a new nonterminal symbol for each of these productions to differentiate between the different nonterminals that the parsing may start with now, and patch the grammar accordingly. After the parser class is instantiated, any of the funci can be called on it to parse the appropriate part of the grammar.
    Note: if you are using multiple inheritance, make sure to specify the symbols class you are planning to use with the -symbols switch. See the section Running PPG.

Running PPG

The syntax for invocation of PPG is

Semantics of PPG

PPG is declarative: the order in which commands are specified is not significant. The resulting productions for any nonterminal N is as follows:

The resulting available set of terminals is simply:

where T is the set of terminals from base grammar, drop(T) is the set of dropped terminals using the drop command, newTerminals are terminals defined using CUP syntax. The resulting set of nonterminals is similar.

For precedence rules:

if "precedence;" is specified by the PPG grammar, thus not inheriting any precedence rules from the base grammar, or

where any newly defined precedence rules override any inherited precedence rules from the base grammar.

Debugging PPG grammars

PPG has a nearly unique feature that helps debug grammars: it generates explanatory counterexamples showing why grammar conflicts arise, either because the grammar is ambiguous or because it is simply not LALR. A counterexample is either a single string of nonterminals and terminals that can be parsed in two ways, or a pair of such strings that are identical up to the point of conflict. PPG both reports the string or strings and also shows how they are derived.

For more information, see the following paper:

Chinawat Isradisaikul, Andrew C. Myers. Finding counterexamples from parsing conflicts. Proceedings of the ACM Conference on Programming Language Design and Implementation (PLDI), May 2015.