# Problem Set 2: Syntactic Analysis

due: Wednesday, September 18

Update 9/17:
• Clarified toplevel for problem 1.
1. Consider a simple markup language that uses tags. Its terminals are {<, >, /, =, word}.  Every tag begins with < and ends with >. The first token in an open tag is a word representing its name, followed by an optional list of attributes which are pairs of words related by =. Every open tag must be paired with a close tag, which is a tag with / before the name and no attributes. Any number of words or tags may appear between and open and close tags. A sentence of the whole language consists of an arbitrary number of words and tags.

For example, here is a valid string from this grammar:

<word word=word word=word><word>word word word</word><word></word></word>

1. Write a context-free grammar for this language.
3. Construct the LL(1) parse table, and clearly identify any conflicts in the table. Make sure the columns of the table come in the same order that the terminals do above: {<, >, /, =, word}.
4. What, fundamentally, makes this language not LL(1)? Is there a k for which it is LL(k)?
2. Consider the following grammar:

EE op E | (E) | num
op →  + | * | ^

This grammar is ambiguous; we would like the grammar to have parse trees in which exponentiation (^) has higher precedence than multiplication (*), and both have higher precedence than addition (+).

1. Show that the grammar is ambiguous.
2. Write an LL(1) grammar that accepts the same language as this grammar and respects the desired operator precedence.  (You need not ensure that the operators are left-associative, however.)
3. Show the derivation of the expression 2^3*4 + 5 using the grammar of part (b).
4. Write an LR(1) grammar that accepts the same language, but enforces both the correct precedence and left-associativity of the operators. You do not need to construct the parser tables.
3. [Dragon book, 4.40] Show that the following grammar is LR(1) but not LALR(1).

S →   Aa | bAc | Bc | bBa
A
→   d
B →   d

4. (CS5120 only) Describe a grammar for which every non-terminal has a unique applicable production given the first token (i.e. each non-terminal's productions' FIRST sets are disjoint) and yet the grammar is not LL(1).