due: Wednesday, September 11
In this assignment you will implement a lexer (also called a scanner or a tokenizer) for the Cubex programming language. As discussed in lecture 2, a lexer provides a stream of tokens (also called symbols or lexemes) given a stream of characters.
Your submission has to have two parts:
You must use a language that runs on the Java Virtual Machine and turn in an executable JAR-file. The program should accept a file name as an argument, read the specified file, and print the following text representations of tokens, separated by a space wherever whitespace would be permitted (there should be no leading or trailing spaces).
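To make the command-line contract concrete, here is a minimal sketch of a driver in Java. It is not a reference solution: the class name `LexerDriver` and the `tokenize` method are hypothetical, and `tokenize` is only a placeholder that splits on whitespace, where a real Cubex lexer (hand-written or generated) would go. The sketch also assumes UTF-8 input and prints no trailing newline; neither is specified above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LexerDriver {
    // Placeholder tokenizer (assumption): splits on whitespace only.
    // A real Cubex lexer would replace this method.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String piece : source.split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Joins the token texts with single spaces, which guarantees
    // that the result has no leading or trailing space.
    static String format(List<String> tokens) {
        return String.join(" ", tokens);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java -jar <jar-file> <input-file>");
            System.exit(1);
        }
        // Assumption: the input file is UTF-8 encoded.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String source = new String(bytes, StandardCharsets.UTF_8);
        System.out.print(format(tokenize(source)));
    }
}
```

Building the output with `String.join` rather than appending a space after every token avoids having to trim a trailing space afterwards.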
|Token type||Replace with|
|Type (variable) names||Name|
|Comments and whitespace||[remove]|
|Everything else||[leave as is]|
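The replacement rules in the table can be sketched as a small rendering step over already-classified tokens. This is an illustration only: the `Kind` enum, the `Token` class, and the `render` method are hypothetical names, and how a lexer decides that a token is a type name, a comment, or whitespace depends on the Cubex specification, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenPrinter {
    // Hypothetical token categories mirroring the rows of the table.
    enum Kind { TYPE_NAME, COMMENT, WHITESPACE, OTHER }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Applies the table: type names become "Name", comments and
    // whitespace are dropped, everything else is printed unchanged.
    static String render(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            switch (t.kind) {
                case TYPE_NAME:
                    out.add("Name");
                    break;
                case COMMENT:
                case WHITESPACE:
                    break; // removed from the output
                default:
                    out.add(t.text);
            }
        }
        return String.join(" ", out);
    }
}
```

Keeping classification (the lexer's job) separate from rendering keeps the output rules in one place, which may make them easier to test.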
We encourage you to use a lexer generator such as ANTLR in your implementation, but this is not required. If you do use ANTLR, basic setup information and sample grammars for the Xi language from a previous iteration of this course are provided.
You have to include the complete source code in the JAR-file, including the inputs for any lexer generators you use. You will need a manifest file (MANIFEST.MF) that sets the classpath. Your manifest file should look like this:
The following code snippet does this packaging for you (.g4 is the file extension for ANTLR files).
Finally, we ask that you submit a small text file (.txt) that contains the following information:
We recommend that you use some kind of source control. Be aware that your repositories should not be publicly viewable on the web. GitHub offers private repositories for students.
You should thoroughly test your solutions before turning them in. We have provided some basic test examples for you to play with, but keep in mind that we will also test your submissions with more complex examples. The way a test works is that after running
the content of tokens1.out should be equal to simple_test1.out.
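If you want to automate this comparison, one option is a small Java helper like the sketch below. The class name `CompareOutput` is hypothetical, and the byte-for-byte comparison is an assumption: the text above does not say whether a difference such as a trailing newline counts as a mismatch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class CompareOutput {
    // Returns true if both files contain exactly the same bytes.
    static boolean sameContent(Path actual, Path expected) throws IOException {
        return Arrays.equals(Files.readAllBytes(actual),
                             Files.readAllBytes(expected));
    }

    public static void main(String[] args) throws IOException {
        // Defaults follow the file names used in the example above.
        Path actual = Paths.get(args.length > 0 ? args[0] : "tokens1.out");
        Path expected = Paths.get(args.length > 1 ? args[1] : "simple_test1.out");
        if (sameContent(actual, expected)) {
            System.out.println("PASS");
        } else {
            System.out.println("FAIL");
            System.exit(1);
        }
    }
}
```

A non-zero exit status on mismatch makes the helper easy to call from a script that loops over many test files.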
There may be a few special cases where lexing is ambiguous. However, those cases would later be rejected by the parser no matter what rules are chosen. We promise not to have test cases that are affected by these ambiguities.
You need to form a group of three or four people in CMS for this assignment. These groups will be kept for all future programming assignments.
For this assignment, you should sit in front of one computer as a whole group and write the grammar together. This will help you get going as a group.
After that, one person can write the main function while the others generate test cases.
This assignment is much smaller than future assignments will be: it is intended primarily as a warmup that gives your group the chance to practice working together. The later assignments will test your ability to work effectively as a group, so this is a great time to learn how to do so. It is also a good time to set up the infrastructure that you will use for the rest of the semester.