Due: February 5, 1999 at the beginning of class
Stats -- Min: 10, Max: 19, Mean: 14.2, Std.Dev.: 2.5
Notes: each problem was worth either 1 or 2 points, depending on whether or not it was reasonable to award partial credit for the problem. As might be expected, two-pointers left room for partial credit; one-pointers were deemed simple enough to warrant "all-or-nothing" grading. For two-pointers, I allowed three misses (i.e., anything that on its own would warrant a point deduction) before marking a 0.
A large population of the class does not quite understand the concept of regular expressions, characters, and metacharacters. This manifested itself in problems #1 and #2. Basically, any character is interpreted as a literal unless it is a regular expression metacharacter (those characters shown in Lecture 2). In particular, there is nothing special about double-quotation marks; they are a character just like any other that might be scanned. In general, I was lenient. But these must be learned for future homeworks and programming assignments.
Each solution is prefixed by a bracketed expression denoting the value of the problem, the range of scores earned on the problem, and the average and standard deviation of scores earned on the problem.
[2 points, 0-2, 0.9±0.4] One possible description of a floating point constant is as follows: it contains a numeric part followed by an optional exponent. The numeric part may be a positive or negative integer, or a number containing a decimal point and at least one digit on the left- or right-hand side. The numeric part may not start with leading zeros unless the zero is the only digit to the left of the decimal point. The exponent starts with either the character "e" or "E", followed by either a positive or negative integer. Write a concise regular expression that captures this definition.
We begin by constructing pieces of our regular expression:
NonZeroInteger [1-9][0-9]* Dot . SignedInteger -? NonZeroInteger Integer SignedInteger | 0 Digits [0-9]+ Numeric SignedInteger | Integer Dot | Dot Digits | Integer Dot Digits Exponent [Ee] SignedInteger Float Numeric Exponent?
Expanding, we have:
(-? [1-9][0-9]* | (-? [1-9][0-9]* | 0) . | . [0-9]+ | (-? [1-9][0-9]* | 0) . [0-9]+) ([Ee] -? [1-9][0-9]*)
Common mistakes for which points were not deducted:
Common mistakes for which points were deducted:
[2 points, 0-2, 0.8±0.6] A string in C begins and ends with the
double quotation character "
. In the middle of the string, any printable
character may occur. The sequence \"
denotes the occurrence of a double
quotation mark within the string, the sequence \n
denotes a newline
character, and the sequence \\
indicates the occurrence of a single backlash
character. Other escape sequences are allowed, but we will ignore them here. Write a
regular expression that recognizes strings of exactly this sort, using NEWLINE
to represent "a new line" (since \n
will be interpreted as the
literal sequence \n
). In practice, strings and comments usually are tokenized
by special-case code that is invoked once the initial "
is seen. Can you
see why?
Using the metacharacters ( ) [ ^ ] *
given in Lecture
2 yields (the first regular expression accepts embedded newlines, the second one does
not):
" ( [^"\] | NEWLINE | \["n\] )* " " ( [^"\] | \["n\] )* "
Breaking up the parenthesized expression within the double quotes:
[^"\].
Any character that is not a double quote or backslash. NEWLINE.
An embedded newline character. For example: "Hello, world!"
Really, NEWLINE
is included by [^"\]
since it is
"any character that is not a double quote or backslash".
\["n\].
A backslash followed by a double-quote, `n', or
backslash. Strings and comments are special-cased because of their arbitrary length. A long
comment or string would overrun a typical buffer allocated for a token in a typical lexer
(such as lex
). The AT&T lex
token buffer is 200 characters;
other implementations allocate as little as 100 characters. Modern lexers are better
prepared for long strings and comments; flex
uses 8k buffers by default.
There were some inconsistencies regarding the treatment of NEWLINE
in the
FAQ and in some email; answers that either matched or rejected embedded newline characters
were accepted.
Basically, misuse (or lack of use) of regular expression syntax earned a one-point deduction. Missing another thing resulted in a 0. Common mistakes:
[0-9a-zA-Z]
instead of [^"\]
. The first expression
excludes stuff like !@#$%... Appel, problem 2.2 (a,b).
(a) [1 point, 0-1, 0.7±0.5] A regular expression is equivalent to a DFA.
Recognizing the strings containing more a's than b's would require a DFA of unbounded
size, since there would have to be one DFA state for each difference in the number of a's
and b's, which itself is unbounded.
(b) [1 point, 0-1, 0.6±0.5] The language of palindromes would also require a DFA
of unbounded size, since it would need a state to represent the "beginning" and
"end" instances of a character in the palindrome. Since palindromes are of
arbitrary length, the DFA would be of arbitrary size.
Appel, problem 2.3.
(a) [1 point, 1-1, 1±0.0] This automata recognizes all four-character strings
consisting of "0"'s and "1"'s except for "0110".
(b) [1 point, 0-1, 0.9±0.3] This automata recognizes strings consisting of 5n
+ 1 consecutive `a' characters, where n is a non-negative integer.
(c) [1 point, 0-1, 0.6±0.5] This automata recognizes strings accepted by the
regular expressions
(0* (1 (0 1* 0)* 1)* )*or
( (1 (0 1* 0)* 1)* 0*)*or
(0 | 1 (0 1* 0)* 1)*Note that in general,
(a | b)*
will generate and recognize the
same strings as (a* b*)*
. The point is that you should be able
to algorithmically convert between DFAs and regular expressions.
The cuter solution was "all binary representations of multiples of 3".
(a) [2 points, 0-2, 1.8±0.5]
y 1234 --> [67] |^ x||z v| [567]
(b) [2 points, 0-2, 1.0±0.9] The leaves have transitions between themselves. They are all final states, because of state 6. Yes, there were 32 states. Those who stuck it out to the end would have seen the binary tree right away. Those who used only a state table missed out on this insight. Note the exponential increase in the number of states (32 = 25) from the original 6 states, and the balanced binary tree reflecting this 2n increase:
1---- |\__/ b a| 12 +--------------------------------+--------------------------------+ a| |b 123 13 +-----------------------------+ +-----------------------------+ a| |b a| |b 1234 134 124 14 +-----------+ +-----------+ +-----------+ +-----------+ a| |b a| |b a| |b a| |b 12345 1345 1245 145 1235 135 125 15 +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ a| |b a| |b a| |b a| |b a| |b a| |b a| |b a| |b 123 134 124 14 123 13 12 156 123 13 12 146 12 136 126 16 456 56 56 56 56 56 56 46 46 46 36
(c) [2 points, 1-2, 2±0.2]
c a {1,5,10,14} --> {2,6,11,15} --> {3,7,12,16} + +-----------------+-------+ r| |t [{13,17}] [{4,8}] + + s| |s [18] [9]
[2 points, 0-2, 1.8±0.5] Appel, problem 2.6.
One may readily represent this state diagram as a table:
transition 0 1 state 1 2 6 2 7 3 3 1 3 4 3 7 5 8 6 6 3 7 7 7 5 8 7 3
Note that the transitions leaving states {2,8} are the same. What this means is that once state 2 or 8 has been entered, the visitor of the DFA cannot distinguish between states 2 or 8. From the conditions for equivalent states given at the end of Appel section 2.4, it follows states 2 and 8 are equivalent. The same conditions also hold for states {4,6}.
Once states {2,8} and {4,6} have been minimized, you will see that states {1,5} can also be minimized. This yields the following table and DFA:
transition 0 1 +------------+ + state 15 28 46 v 0 1 |0 /|1 28 7 3 ->15--->28--->[3]<-+ 3 15 3 | ^ | ^ 46 3 7 1| \1 |0 + | 7 7 15 v \ v /|0 | 46--->7<-+ | 0| | +------------+
[3 points, 0-3, 2.1±1.1] Appel, problem 2.9.
The lexical specification given by Appel represents the following DFA:
>{1,7,10}---b-->[{11}] |a v [{2,8,11}]--b-->{3,8}--a-->[{4}]--a-->{5} |a |b ^ / v v \ / b [{9}]<----a----{8}-+ a \{6}v ^ / |/ b +From the DFA, we can generate the
edges
and
final
tables:
+-----------------+--------------------+---------+ | | Edges | Final | +-----------------+--------------------+---------+ | | a b | Action | | {1,7,10} | [{2,8,11}] [{11}] | 0 | | [{2,8,11}] | [{9}] {3,8} | 3 | | {3,8} | [{4}] {8} | 0 | | [{4}] | {5} | 1 | |State {5} | {6} | 0 | | {6} | [{4}] | 0 | | {8} | [{9}] [{8}] | 0 | | [{9}] | | 2 | | [{11}] | | 3 | +-----------------+--------------------+---------+