CS 4120: Introduction to Compilers
Fall 2013

Programming Assignment 2: Syntactic Analysis

due: Monday, September 23

Update 9/16:

Uploaded full Xi example
Specified relaxation for desugaring: methods implemented in interfaces need not be taken care of for this assignment

Update 9/21:

Fixed flattening example

In this assignment you will implement a parser for the Cubex programming language. As discussed in class, a parser takes a stream of tokens and creates an abstract syntax tree from it. Your task will be to parse input files from the full Cubex language, desugar the programs and output the program, translated to the the core Cubex.

Note: the language specification has undergone some revisions and had to be changed slightly with regards to lexing: type names and type variables should be split into two different tokens. Type variables are now a single uppercase letter, and type names must be at least two characters long. This change does not affect the Lexer homework, but you may need to adapt your lexer to the new specification for this and future assignments.

Deliverables

Your submission has to have two parts:

An executable jar file containing the parser program and its sourcecode.
A text file as described below.

The parser

You have to use some language that uses the Java Virtual Machine and turn in an executable JAR-file. The program should accept a file name as an argument, read the specified file and print out the input program's core Cubex representation as a token stream similar to PA1. This time, however, names and literals will stay as they were in the input file.

The format of the output of you program should be as follows. For autograding purposes, you must exactly match this format.

As in PA1, you should output a single line of "tokens" (however, those should be the "tokens" of the desugared program), separated by a space where whitespace is permitted.
Curly braces enclosing a list of statements must appear exactly where they are semantically necessary and nowhere else. That imposes two restrictions on curly braces:
1. A list of statements must be surrounded by curly braces if and only if there are 0 or more than 1 statements in the list.
2. Lists of statements must be flattened -- that is, in a list of statements { s ... s } none of the s-es may be a list of statements. For example, { s1 ... sj { sk ... sm }} becomes {s1 ... sj sk ... sm }.
These restrictions are imposed for two reasons. First of all, you will eventually have to deal with the removal of unnecessary curly braces and the parser is a good place to do it. Second of all, narrowing down the allowable core Cubex programs in this way means that every program in full Cubex desugars to a unique program in core Cubex, which makes grading the assignment possible.

If a program cannot be lexed, your program should just print "lexer error", and nothing else.

If a program cannot be parsed, your program should just print "parser error", and nothing else.

Relaxation: you do not need to implement the full Cubex feature that methods can be implemented in interfaces. However, you should think about how to handle this feature later on. If you already implemented it, great! There won't be any test cases that feature method implementations in interfaces.

We encourage you to use a parser generator such as ANTLR in your implementation, but this is not required. If you do use ANTLR, a example code for the Xi language from a previous iteration of this course is provided (for use with the lexer grammar provided in our ANTLR guide).

You have to include the complete source code in the jar-file, including inputs for any lexer generators you might use. You will need a manifest file (MANIFEST.MF) that sets the classpath. Your manifest file should look like this:

Manifest-Version: 1.0
Class-Path: . antlr-4.1-complete.jar
Main-Class: [your-class-name-here]

The following code snippet does this packaging for you (provided Lexer is your main class; .g4 is the file extension for ANTLR files).

jar cvfm Parser.jar MANIFEST.MF *.class *.java *.g4

Hints

For this assignment, it is important that you carefully go through the language specification and understand the difference beween core Cubex and full Cubex and how to get from the latter to the former. Then use the parser to generate your internal core Cubex representation of an input program. From there you only need to implement methods to convert your AST nodes to strings and then traverse the tree.

The text file

Finally, we ask that you submit a small text file (.txt) that contains the following information:

Who you discussed the problem set with. Remember that the solution must be your own, but you can discuss your ideas with others provided you mention that in this file.
What the distribution of work in your team was like. Who did what?
Any additional comments you might have about the problem set.

Source control

We recommend that you use some kind of source control. Be aware that your repositories should not be publicly viewable on the web. GitHub offers private repositories for students.

Testing

You should thoroughly test your solutions before turning them in. We provided some basic test examples for you to play with, but keep in mind that we will test your submissions with more complex examples, too. The way a test works is that after running

java -jar Parser.jar parser_test1.in > core1.out

the content of core1.out should be equal to parser_test1.out .

Planning ahead

As with the last assignment, keep in mind that you should be able to re-use the code you create for this assignment in future assignments.

CS 4120: Introduction to Compilers Fall 2013