CS412/413
Programming Assignment 1: Lexical Analysis
due Monday, February 15
In this programming assignment, you will build a lexical analysis phase for the language Iota, defined in the accompanying handout (see http://www.cs.cornell.edu/cs412-sp99/05-iota.htm). Your lexer should have the following interface (it may have additional methods, of course):
class Lexer {
    Lexer(InputStream i);
    // Create a lexer that reads characters from the input stream i.

    Token getToken() throws LexicalError;
    // Return the next language token on the input stream. Returns
    // a token representing the end of file as the last token, assuming
    // that no lexical error is encountered first.
}
The class Token may be defined in whatever way you prefer, but it should provide at least the following methods:
class Token {
    void unparse(OutputStream o);
    // Print a human-readable representation of this token on the
    // output stream o; one that contains all the relevant information
    // associated with the token.

    int lineNumber();
    // Return the number of the line that this token came from.
}
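To make the two interfaces concrete, here is a minimal hand-written sketch of how they might fit together. It is only an illustration under stated assumptions, not a solution: the token kinds "ID", "INT" and "EOF", the (kind, text, line) layout of Token, the constructor of LexicalError, and the use of PushbackInputStream for one character of lookahead are all hypothetical choices, and the rules shown (runs of letters and digits, whitespace skipping) are not the lexical rules of Iota. The sketch also treats each input byte as one character and omits the package declaration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.PushbackInputStream;

// Assumption: LexicalError is a checked exception that records the line.
class LexicalError extends Exception {
    final int line;
    LexicalError(String message, int line) {
        super("line " + line + ": " + message);
        this.line = line;
    }
}

// Assumption: a token is a (kind, text, line) triple. The kind names are
// placeholders, not the token classes of the Iota definition.
class Token {
    final String kind;
    final String text;
    private final int line;

    Token(String kind, String text, int line) {
        this.kind = kind;
        this.text = text;
        this.line = line;
    }

    // Writes one line of the form:  <line>: <kind> "<text>"
    void unparse(OutputStream o) {
        PrintStream out = new PrintStream(o, true);
        out.print(line + ": " + kind + " \"" + text + "\"\n");
        out.flush();
    }

    int lineNumber() { return line; }
}

class Lexer {
    private final PushbackInputStream in; // allows one character of lookahead
    private int line = 1;                 // current 1-based line number

    Lexer(InputStream i) { in = new PushbackInputStream(i); }

    Token getToken() throws LexicalError {
        try {
            int c = in.read();
            // Skip whitespace, counting newlines as we go.
            while (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                if (c == '\n') line++;
                c = in.read();
            }
            if (c == -1) return new Token("EOF", "", line);
            if (Character.isLetter(c)) return scan("ID", c, true);
            if (Character.isDigit(c)) return scan("INT", c, false);
            throw new LexicalError("unexpected character '" + (char) c + "'", line);
        } catch (IOException e) {
            throw new LexicalError("I/O failure: " + e.getMessage(), line);
        }
    }

    // Consume the longest run of digits (and letters, if allowed) starting
    // at first, then push back the character that ended the run.
    private Token scan(String kind, int first, boolean allowLetters) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c = first;
        while (c != -1 && (Character.isDigit(c) || (allowLetters && Character.isLetter(c)))) {
            sb.append((char) c);
            c = in.read();
        }
        if (c != -1) in.unread(c);
        return new Token(kind, sb.toString(), line);
    }
}
```

The line counter lives in the lexer and is copied into each token when the token is created, so lineNumber() keeps reporting the right line even after the lexer has read further ahead.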
You must also implement a lexer test-bed program. This program must be a class LexTest that implements the following behavior. When run from the command line, the LexTest program takes a single filename as an argument. It reads the file, breaks it into tokens, and uses the Token.unparse method to dump a representation of the file as a series of tokens. If a lexical error is encountered, it prints an error message that includes the line number on which the error occurred.
All of the classes you write should be in or under the package Iota, so the Lexer class will be Iota.Lexer, the test bed will be Iota.LexTest, etc.
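The test-bed behavior described above might be sketched as follows. The driver loop in LexTest is the point of the example; the Lexer, Token and LexicalError classes shown with it are toy stand-ins so that the sketch compiles on its own, and a real submission would use its own classes instead. The isEOF() helper on Token, the run() method, the exit codes, and the usage message are all assumptions made for illustration; the handout does not prescribe them. The package declaration is omitted here, although each real source file would begin with `package Iota;`.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintStream;

// Stand-ins so the sketch compiles alone; a real submission would use
// its own Lexer, Token and LexicalError classes from package Iota.
class LexicalError extends Exception {
    LexicalError(String message, int line) { super("line " + line + ": " + message); }
}

class Token {
    private final String text; // null marks the end-of-file token
    private final int line;
    Token(String text, int line) { this.text = text; this.line = line; }
    boolean isEOF() { return text == null; } // hypothetical helper
    int lineNumber() { return line; }
    void unparse(OutputStream o) {
        PrintStream out = new PrintStream(o, true);
        out.print(line + ": " + (isEOF() ? "EOF" : text) + "\n");
        out.flush();
    }
}

class Lexer {
    private final InputStream in;
    private int line = 1;
    Lexer(InputStream i) { in = i; }
    // Toy rule: every non-blank byte is a token, and '#' is a lexical error.
    Token getToken() throws LexicalError {
        try {
            int c = in.read();
            while (c == ' ' || c == '\n') { if (c == '\n') line++; c = in.read(); }
            if (c == -1) return new Token(null, line);
            if (c == '#') throw new LexicalError("unexpected character '#'", line);
            return new Token(String.valueOf((char) c), line);
        } catch (IOException e) {
            throw new LexicalError("I/O failure: " + e.getMessage(), line);
        }
    }
}

class LexTest {
    // Dump every token read from in onto out, including the end-of-file
    // token. Returns 0 on success and 1 if a lexical error was reported.
    static int run(InputStream in, PrintStream out, PrintStream err) {
        Lexer lexer = new Lexer(in);
        try {
            Token t;
            do {
                t = lexer.getToken();
                t.unparse(out);
            } while (!t.isEOF());
            return 0;
        } catch (LexicalError e) {
            // The message already carries the line number.
            err.println("Lexical error: " + e.getMessage());
            return 1;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Iota.LexTest <filename>");
            System.exit(2);
        }
        int status;
        try (InputStream in = new FileInputStream(args[0])) {
            status = run(in, System.out, System.err);
        }
        System.exit(status);
    }
}
```

Keeping the loop in a method that takes streams, rather than inside main, makes it possible to run the test bed against in-memory input as well as against files.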
You may use a lexer generator such as JavaLex to do this assignment. However, we do not take responsibility for helping you figure out how to use JavaLex; if you use it, you are on your own. If you use a lexer generator, you should turn in the lexer generator input rather than the Java source code that it emits!
We will test your lexer rigorously against our own test cases -- including programs that are lexically correct, and also programs that contain lexical errors. The correctness of your lexer will be important.
Because groups in this class are relatively large, we will be expecting a higher level of quality in your product than in some other courses you have taken. Much of the value in a compiler (or any other large program) is in how easily it can be maintained. A high value is placed here on both clarity and brevity -- both in documentation and code.
Information about how to submit solutions electronically is forthcoming.