Due Wednesday, February 7
In this programming assignment, you will build a lexical analysis phase for the language Iota, defined on the web page at http://www.cs.cornell.edu/courses/cs412/2001sp/iota/iota.html. Your lexer, or tokenizer, should have the following interface (it may have additional methods, of course):
class Lexer {
    Lexer(InputStream i);
    // Create a lexer that reads characters from the input stream i.

    Token getToken() throws LexicalError;
    // Return the next language token on the input stream. Returns
    // a token representing the end of file as the last token, assuming
    // that no lexical error is encountered first.
}
The class Token may be defined in whatever way you prefer, but it should at least implement the following interface:
interface LexerResult {
    void unparse(OutputStream o) throws IOException;
    // Print a human-readable representation of this token on the
    // output stream o; one that contains all the relevant information
    // associated with the token. The representation has the form
    // <token-type, attribute, line-number>.
    // I/O exceptions on the output stream o are passed through.

    int lineNumber();
    // Return the number of the line that this token came from.
}
The class LexicalError should also implement the LexerResult interface, though you will want to choose a different output format for the unparse method.
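As an illustration of the two paragraphs above, here is a minimal sketch of one way Token and LexicalError could both implement LexerResult. The field names, the token-type name "ID", and the error message format are assumptions made for this example, not requirements of the assignment. The package declaration is omitted so that the snippet compiles on its own; in your submission these classes belong in the package Iota.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

interface LexerResult {
    void unparse(OutputStream o) throws IOException;
    int lineNumber();
}

class Token implements LexerResult {
    private final String type;       // hypothetical token-type name, e.g. "ID"
    private final String attribute;  // lexeme or literal value; may be empty
    private final int line;

    Token(String type, String attribute, int line) {
        this.type = type;
        this.attribute = attribute;
        this.line = line;
    }

    // Writes <token-type, attribute, line-number>; I/O exceptions pass through.
    public void unparse(OutputStream o) throws IOException {
        String text = "<" + type + ", " + attribute + ", " + line + ">";
        o.write(text.getBytes(StandardCharsets.US_ASCII));
    }

    public int lineNumber() { return line; }
}

class LexicalError extends Exception implements LexerResult {
    private final int line;

    LexicalError(String message, int line) {
        super(message);
        this.line = line;
    }

    // Uses a different format from Token (this particular wording is an assumption).
    public void unparse(OutputStream o) throws IOException {
        String text = "Lexical error at line " + line + ": " + getMessage();
        o.write(text.getBytes(StandardCharsets.US_ASCII));
    }

    public int lineNumber() { return line; }
}
```

With these definitions, new Token("ID", "x", 3) unparses as <ID, x, 3>. Making LexicalError a subclass of Exception lets getToken throw it directly, which matches the throws clause in the Lexer interface.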
You must also implement a lexer test-bed program. This program must be a class LexTest that implements the following behavior. When run from the command line, the LexTest program takes a single filename as an argument. It reads the file, breaks it into tokens, and uses the Token.unparse method to dump a representation of the file as a series of tokens. If a lexical error is encountered, it prints an error message that includes the line number on which the error occurred. It must report the first lexical error in the file; it may, but need not, report additional lexical errors.
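The driver loop described above might be sketched as follows. To keep the example runnable on its own, it includes stand-in Token, LexicalError, and Lexer classes; the stand-in lexer only splits on whitespace and treats '@' as illegal, which is NOT the Iota token set. The isEOF() helper, the class name LexTestSketch, the exit codes, and the error message wording are all assumptions made for this sketch.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Stand-in for the assignment's LexicalError, reduced to what the driver needs.
class LexicalError extends Exception {
    private final int line;
    LexicalError(String message, int line) { super(message); this.line = line; }
    int lineNumber() { return line; }
}

// Stand-in for the assignment's Token; isEOF() is a hypothetical helper.
class Token {
    private final String type, attribute;
    private final int line;
    Token(String type, String attribute, int line) {
        this.type = type; this.attribute = attribute; this.line = line;
    }
    boolean isEOF() { return type.equals("EOF"); }
    void unparse(OutputStream o) throws IOException {
        String text = "<" + type + ", " + attribute + ", " + line + ">";
        o.write(text.getBytes(StandardCharsets.US_ASCII));
    }
}

// Placeholder lexer: whitespace-separated words, with '@' treated as illegal.
// These are NOT the Iota token rules; they only give the driver something to run.
class Lexer {
    private final InputStream in;
    private int line = 1;
    Lexer(InputStream i) { in = i; }

    Token getToken() throws LexicalError {
        try {
            int c = in.read();
            while (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                if (c == '\n') line++;
                c = in.read();
            }
            if (c < 0) return new Token("EOF", "", line);
            int start = line;
            StringBuilder word = new StringBuilder();
            while (c >= 0 && c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                if (c == '@') throw new LexicalError("illegal character '@'", line);
                word.append((char) c);
                c = in.read();
            }
            if (c == '\n') line++;  // the delimiter was consumed, so count it here
            return new Token("WORD", word.toString(), start);
        } catch (IOException e) {
            throw new LexicalError("I/O error: " + e.getMessage(), line);
        }
    }
}

class LexTestSketch {
    // Dumps one token per line; returns 0 on success, 1 after the first lexical error.
    static int dump(InputStream in, PrintStream out) throws IOException {
        Lexer lexer = new Lexer(in);
        try {
            while (true) {
                Token t = lexer.getToken();
                t.unparse(out);
                out.println();
                if (t.isEOF()) return 0;
            }
        } catch (LexicalError e) {
            out.println("lexical error at line " + e.lineNumber() + ": " + e.getMessage());
            return 1;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LexTestSketch <file>");
            System.exit(2);
        }
        int status;
        try (InputStream in = new FileInputStream(args[0])) {
            status = dump(in, System.out);
        }
        System.exit(status);
    }
}
```

This sketch stops at the first lexical error, which is the minimum the assignment requires. A driver that reports additional errors would need the lexer to skip past the offending input and continue.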
String literals should be unparsed with their characters translated into canonical form. The canonical form for any printable character is itself, e.g., "\065BC" should be unparsed as "ABC". For non-printable characters you may choose a suitable canonical form, such as the "\ddd" or "\^c" escape sequences, as appropriate.
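The translation to canonical form could be sketched as below. This example makes several assumptions: \ddd is read as three decimal digits (consistent with "\065BC" unparsing as "ABC", since 65 is the code for 'A'), printable means ASCII 32 through 126, character codes fit in three digits, and \ddd is chosen as the canonical form for non-printable characters. The class and method names are hypothetical, and only the \ddd escape is decoded; consult the Iota definition for the full set of escapes.

```java
class StringCanon {
    private static boolean isDecimalDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Replace each \ddd escape (three decimal digits) in a literal's body with
    // the character it denotes. Other escape forms are left untouched here,
    // and the decoded value is not range-checked.
    static String decodeDecimalEscapes(String body) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            boolean escape = c == '\\' && i + 3 < body.length()
                    && isDecimalDigit(body.charAt(i + 1))
                    && isDecimalDigit(body.charAt(i + 2))
                    && isDecimalDigit(body.charAt(i + 3));
            if (escape) {
                sb.append((char) Integer.parseInt(body.substring(i + 1, i + 4)));
                i += 4;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    // Printable ASCII is emitted as itself; anything else as \ddd (zero-padded decimal).
    static String canonical(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c >= 32 && c <= 126) {
                sb.append(c);
            } else {
                sb.append('\\');
                if (c < 100) sb.append('0');
                if (c < 10) sb.append('0');
                sb.append((int) c);
            }
        }
        return sb.toString();
    }
}
```

For example, StringCanon.canonical(StringCanon.decodeDecimalEscapes("\\065BC")) yields "ABC" (the doubled backslash is Java source syntax for a single backslash), and a newline character is rendered as \010.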
All of the classes you write should be in or under the package Iota, so the Lexer class will be Iota.Lexer, the testbed will be Iota.LexTest, etc.
You may use a lexer generator such as JLex to do this assignment. However, we do not take responsibility for helping you figure out how to use JLex; if you use it, you are on your own. If you use a lexer generator, you should turn in the lexer generator input as well as the Java source code that it emits!
The correctness of your lexer will be important, and we will be more rigorous in our expectations for correctness if you use a lexer generator (though this should not discourage you from using such a tool). We expect you to perform your own testing of the lexer. Often student projects do not handle erroneous input properly -- make sure that yours does! You should develop a thorough test suite that, at a minimum, tests all legal tokens and all possible lexical errors. Testing your program on corner cases is also a good idea. We will test your lexer rigorously against our own test cases -- including programs that are lexically correct, and also programs that contain lexical errors.
This programming assignment is much smaller than the remainder of the assignments will be. Use this assignment as a warm-up and a chance to set up your code production process. Start thinking now about how you will manage the size and complexity of your source code and test cases. Although we cannot provide support in using them, CVS and Visual SourceSafe are both available for use in managing your code base. You may also wish to consider automation of your testing via shell scripts or other tools.
Because groups in this class are relatively large, we will be expecting a higher level of quality in your product than in some other courses you have taken. Much of the value in a compiler (or any other large program) is in how easily it can be maintained. A high value is placed here on both clarity and brevity -- both in documentation and code.
Your electronic submission is expected at the same time as your written submission: at the beginning of class on the due date. Electronic submissions after 10AM will be considered a day late. Place your files in \\goose\courses\cs412-sp01\grpX\pa1, where grpX is your group identifier. Please organize your top-level directory structure as follows:
src\ - all of your source code and class files. For example, we expect to find a class file for your main program in src\Iota\LexTest.class, and a companion .java file in the same directory.
doc\ - documentation, including your write-up and a README.TXT containing information on how to compile and run your project, a description of the class hierarchy in your src\ directory, brief descriptions of the major classes, any known bugs, and any other information that we might find useful when grading your assignment.
test\ - any test cases you used in testing your project.

Note: Failure to submit your assignment in the proper format may result in deductions from your grade.