CS 4120: Introduction to Compilers
Fall 2013

Programming Assignment 2: Syntactic Analysis

due: Monday, September 23

Update 9/16:
Update 9/21:

In this assignment you will implement a parser for the Cubex programming language. As discussed in class, a parser takes a stream of tokens and creates an abstract syntax tree from it. Your task will be to parse input files from the full Cubex language, desugar the programs and output the program, translated to the the core Cubex.

Note: the language specification has undergone some revisions and had to be changed slightly with regards to lexing: type names and type variables should be split into two different tokens. Type variables are now a single uppercase letter, and type names must be at least two characters long. This change does not affect the Lexer homework, but you may need to adapt your lexer to the new specification for this and future assignments.


Your submission has to have two parts:

The parser

You have to use some language that uses the Java Virtual Machine and turn in an executable JAR-file. The program should accept a file name as an argument, read the specified file and print out the input program's core Cubex representation as a token stream similar to PA1. This time, however, names and literals will stay as they were in the input file.

The format of the output of you program should be as follows. For autograding purposes, you must exactly match this format.

If a program cannot be lexed, your program should just print "lexer error", and nothing else.

If a program cannot be parsed, your program should just print "parser error", and nothing else.

Relaxation: you do not need to implement the full Cubex feature that methods can be implemented in interfaces. However, you should think about how to handle this feature later on. If you already implemented it, great! There won't be any test cases that feature method implementations in interfaces.

We encourage you to use a parser generator such as ANTLR in your implementation, but this is not required. If you do use ANTLR, a example code for the Xi language from a previous iteration of this course is provided (for use with the lexer grammar provided in our ANTLR guide).

You have to include the complete source code in the jar-file, including inputs for any lexer generators you might use. You will need a manifest file (MANIFEST.MF) that sets the classpath. Your manifest file should look like this:

Manifest-Version: 1.0
Class-Path: . antlr-4.1-complete.jar
Main-Class: [your-class-name-here]

The following code snippet does this packaging for you (provided Lexer is your main class; .g4 is the file extension for ANTLR files).

jar cvfm Parser.jar MANIFEST.MF *.class *.java *.g4


For this assignment, it is important that you carefully go through the language specification and understand the difference beween core Cubex and full Cubex and how to get from the latter to the former. Then use the parser to generate your internal core Cubex representation of an input program. From there you only need to implement methods to convert your AST nodes to strings and then traverse the tree.

The text file

Finally, we ask that you submit a small text file (.txt) that contains the following information:

Source control

We recommend that you use some kind of source control. Be aware that your repositories should not be publicly viewable on the web. GitHub offers private repositories for students.


You should thoroughly test your solutions before turning them in. We provided some basic test examples for you to play with, but keep in mind that we will test your submissions with more complex examples, too. The way a test works is that after running

java -jar Parser.jar parser_test1.in > core1.out

the content of core1.out should be equal to parser_test1.out .

Planning ahead

As with the last assignment, keep in mind that you should be able to re-use the code you create for this assignment in future assignments.