CS412/413
Introduction to Compilers and Translators
Spring 1999

Homework 1: Lexical Analysis

Due: February 5, 1999 at the beginning of class


HW1 FAQ.

  1. One possible description of a floating point constant is as follows: it contains a numeric part followed by an optional exponent. The numeric part may be a positive or negative integer, or a number containing a decimal point and at least one digit on the left- or right-hand side. The numeric part may not start with leading zeros unless the zero is the only digit to the left of the decimal point. The exponent starts with either the character "e" or "E", followed by either a positive or negative integer. Write a concise regular expression that captures this definition.

  2. A string in C begins and ends with the double quotation character ". In the middle of the string, any printable character may occur. The sequence \" denotes the occurrence of a double quotation mark within the string, the sequence \n denotes a newline character, and the sequence \\ indicates the occurrence of a single backlash character. Other escape sequences are allowed, but we will ignore them here. Write a regular expression that recognizes strings of exactly this sort, using NEWLINE to represent "a new line" (since \n will be interpreted as the literal sequence \n). In practice, strings and comments usually are tokenized by special-case code that is invoked once the initial " is seen. Can you see why?

  3. Appel, problem 2.2 (a,b).

  4. Appel, problem 2.3.

  5. Appel, problem 2.5.

  6. Appel, problem 2.6.

  7. Appel, problem 2.9.