CS412/413
Introduction to Compilers and Translators
Spring 2000
Cornell University Computer Science Department

Programming Assignment 1: Lexical Analysis

due Friday, February 4


In this programming assignment, you will build a lexical analysis phase for the language Iota, defined on the web page at http://courses.cs.cornell.edu/cs412/2000sp/iota/iota.html. Your lexer, or tokenizer, should have the following interface (it may have additional methods, of course):

class Lexer {
    Lexer(InputStream i);
        // Create a lexer that reads characters from the input stream i
    Token getToken() throws LexicalError;
        // Return the next language token on the input stream. Returns
        // a token representing the end of file as the last token, assuming
        // that no lexical error is encountered first.
}

The class Token may be defined in whatever way you prefer, but it should at least implement the following interface:

interface LexerResult {
    void unparse(OutputStream o);
        // Print a human-readable representation of this token on the
        // output stream o; one that contains all the relevant information
        // associated with the token. The representation has the form
        // <token-type, attribute, line-number>
    int lineNumber();
        // Return the number of the line that this token came from.
}

The class LexicalError should also implement the LexerResult interface, though you will want to choose a different output format for the unparse method.
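One possible shape for these two classes is sketched below. It is only an illustration, not a required design: the TokenType enum and its three members, the field names, and the error message format are all assumptions, and the real set of token kinds comes from the Iota definition. LexicalError extends Exception here so that getToken can throw it. The package declaration is left out so that the sketch compiles as a single file.

```java
import java.io.OutputStream;
import java.io.PrintStream;

// Copied from the assignment text.
interface LexerResult {
    void unparse(OutputStream o);
    int lineNumber();
}

// Hypothetical token kinds, for illustration only.
enum TokenType { ID, INTEGER, EOF }

class Token implements LexerResult {
    private final TokenType type;
    private final String attribute; // lexeme or literal value; "" if none
    private final int line;

    Token(TokenType type, String attribute, int line) {
        this.type = type;
        this.attribute = attribute;
        this.line = line;
    }

    TokenType type() { return type; }

    // Writes <token-type, attribute, line-number> to o.
    public void unparse(OutputStream o) {
        PrintStream p = new PrintStream(o);
        p.print("<" + type + ", " + attribute + ", " + line + ">");
        p.flush(); // flush, but leave the caller's stream open
    }

    public int lineNumber() { return line; }
}

// Extends Exception so that Lexer.getToken() can throw it.
// The output format below is an assumption; the assignment only asks
// that it differ from the token format.
class LexicalError extends Exception implements LexerResult {
    private final int line;

    LexicalError(String message, int line) {
        super(message);
        this.line = line;
    }

    public void unparse(OutputStream o) {
        PrintStream p = new PrintStream(o);
        p.print("line " + line + ": lexical error: " + getMessage());
        p.flush();
    }

    public int lineNumber() { return line; }
}
```

With these definitions, new Token(TokenType.ID, "x", 3).unparse(System.out) writes <ID, x, 3>. Because both classes implement LexerResult, a test program can print either one through the same call.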

You must also implement a lexer test-bed program. This program must be a class LexTest that implements the following behavior. When run from the command line, the LexTest program takes a single filename as an argument. It reads the file, breaks it into tokens, and uses the Token.unparse method to dump a representation of the file as a series of tokens. If a lexical error is encountered, it prints an error message that includes the line number on which the error occurred.
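The behavior described above might be sketched as follows. Everything except the LexTest driver is a stand-in so that the example runs on its own: the Lexer here is a placeholder that does not implement Iota (it splits the input on whitespace and rejects '@' purely to show the error path), and the dump helper, the usage message, the string-typed token kinds, and the decision to print the end-of-file token are assumptions rather than requirements. The package declaration is again omitted.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintStream;

// Compact stand-ins so that this sketch compiles on its own.
interface LexerResult {
    void unparse(OutputStream o);
    int lineNumber();
}

class Token implements LexerResult {
    final String type, attribute;
    final int line;
    Token(String type, String attribute, int line) {
        this.type = type; this.attribute = attribute; this.line = line;
    }
    public void unparse(OutputStream o) {
        PrintStream p = new PrintStream(o);
        p.print("<" + type + ", " + attribute + ", " + line + ">");
        p.flush();
    }
    public int lineNumber() { return line; }
}

class LexicalError extends Exception implements LexerResult {
    private final int line;
    LexicalError(String message, int line) { super(message); this.line = line; }
    public void unparse(OutputStream o) {
        PrintStream p = new PrintStream(o);
        p.print("line " + line + ": lexical error: " + getMessage());
        p.flush();
    }
    public int lineNumber() { return line; }
}

// Placeholder lexer, NOT Iota: every whitespace-separated word becomes a
// WORD token, and '@' is rejected only to exercise the error path.
class Lexer {
    private final InputStream in;
    private int line = 1;
    Lexer(InputStream i) { in = i; }
    private static boolean isSpace(int c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    Token getToken() throws LexicalError {
        try {
            int c = in.read();
            while (isSpace(c)) {
                if (c == '\n') line++;
                c = in.read();
            }
            if (c == -1) return new Token("EOF", "", line);
            int start = line;
            StringBuilder word = new StringBuilder();
            while (c != -1 && !isSpace(c)) {
                if (c == '@') throw new LexicalError("illegal character '@'", line);
                word.append((char) c);
                c = in.read();
            }
            if (c == '\n') line++; // the newline that ended the word is consumed
            return new Token("WORD", word.toString(), start);
        } catch (IOException e) {
            throw new LexicalError("I/O failure: " + e.getMessage(), line);
        }
    }
}

class LexTest {
    // Prints one token per line; on a lexical error, prints it and stops.
    static boolean dump(InputStream in, PrintStream out) {
        Lexer lexer = new Lexer(in);
        try {
            Token t;
            do {
                t = lexer.getToken();
                t.unparse(out);
                out.println();
            } while (!t.type.equals("EOF"));
            return true;
        } catch (LexicalError e) {
            e.unparse(out);
            out.println();
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LexTest <filename>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            dump(in, System.out);
        }
    }
}
```

Keeping the token loop in a helper that takes an InputStream, rather than in main, makes it easy to feed the test bed in-memory inputs, including erroneous ones, when testing the lexer.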

All of the classes you write should be in or under the package Iota, so the Lexer class will be Iota.Lexer, the test-bed will be Iota.LexTest, and so on.

You may use a lexer generator such as JLex to do this assignment. However, we do not take responsibility for helping you figure out how to use JLex; if you use it, you are on your own. If you use a lexer generator, you should turn in the lexer generator input rather than the Java source code that it emits!

We will test your lexer rigorously against our own test cases -- including programs that are lexically correct, and also programs that contain lexical errors. The correctness of your lexer will be important, and we will be more rigorous in our expectations for correctness if you use a lexer generator. We expect you to perform your own testing of the lexer. Often student projects do not handle erroneous input properly -- make sure that yours does! Testing your program on corner cases is also a good idea.

What to turn in

Because groups in this class are relatively large, we expect a higher level of quality in your product than in some other courses you have taken. Much of the value in a compiler (or any other large program) is in how easily it can be maintained. We place a high value on clarity and brevity, in both documentation and code.

Turn in on paper:

Turn in electronically:

Submission instructions

To submit Programming Assignment 1, drop your files in \\goose\courses\cs412-sp00\grpX\pa1, where grpX is your group identifier. Please organize your top-level directory structure as follows:

Note: Failure to submit your assignment in the proper format may result in deductions from your grade.