Programming Assignment 2: Syntactic and Semantic Analysis
due Friday, March 5
In this programming assignment, you will build the syntactic and semantic analysis phases for the language Iota. The latest version of the Iota language definition may be found at http://www.cs.cornell.edu/cs412-sp99/05-iota.htm. Your program will determine whether a source file defines a legal interface or module. We expect you to build upon the code that you wrote for Programming Assignment 1, and to fix any problems with your lexical analysis phase.
Your program will have a main class, Iota.SemTest, that implements a syntactic/semantic checker. When invoked from the command line, this checker will implement at least the following minimal specification: it should expect to receive a single argument that is the name of a source file containing a module specification, with an extension of ".mod". Your checker will first determine whether the source file contains any syntactic errors, and if so, report them using the syntax "source-file-name:line-number:message". For example, a reasonable error message might look like the following:
hello.mod:4:syntax error at token "else"
Your syntactic analysis phase must be structured as a recursive-descent parser. In order to parse the Iota Language top-down, you will need to transform the Iota language grammar into an equivalent LL grammar, which you will turn in as part of your report. For this assignment, you are not permitted to use an automatic parser generator tool.
If the program is syntactically correct, your checker will produce an abstract syntax tree and analyze it to ensure that contains no semantic errors. As part of this analysis, the checker may need to read interface files (extension .int) to learn the signatures of operations from other modules. Interface files are read in when uses declarations are processed during semantic checking. The checker must report any syntactic or semantic errors found in the interface files. The checker should look for interface files in the same directory as the module file.
Semantic errors include the following categories: type errors, invalid types, undefined identifiers, overlaps of identifier scope (see the updated language definition). Your checker must report these categories of errors. It is not required to report more than one error; it may terminate after reporting the first syntactic or semantic error it encounters. These errors should be reported in the same format as syntax errors.
In addition, your checker must support a "-dump_ast" option. When invoked with this option and the name of a module file (e.g., jview Iota.SemTest -dump_ast hello.mod), the checker will print to System.out a representation of the decorated abstract syntax tree for the module it has checked. The AST representation that is printed should use parentheses to indicate the tree structure. For example, an IfStmt node with three children representing the tested expression E and the two alternative statements S1 and S2 should be printed as (if E S1 S2 :T), where E, S1, and S2 are the printed representations of the corresponding nodes in the abstract syntax tree. Those familiar with Scheme or Lisp will recognize this format as an s-expression. The first word within the parentheses -- if -- indicates the kind of node that this is; the remaining output within the parentheses describes the children of this node. For nodes representing expressions or statements that have a value, the type of the expression is included as the last component of the output for the node -- the :T, above. For example, a node representing the addition of two integers 2 and 3 could be output as follows:
(+ (integer 2) (integer 3) :int)In other words, the right-most component of the printed represention of an expression node should have the form :type, where type is the type of the expression. As shown in the example, it is not necessary to include this component for constants, where the type is obvious. Your printed representation may include newline characters to make the output more readable. This description has not been a precise specification of how to dump the AST; you have some latitude in choosing the output style. We recommend that you implement this output procedure by adding a method
void dumpAST(OutputStream);to the nodes of your abstract syntax tree; this method generates the appropriate representation on the named output stream, and can be implemented recursively.