CS212: Project 2B - Parse tree unnecessarily deep

For Part 2b, points were taken off for something similar to “parse tree unnecessarily deep.” Typically, this meant something similar to the following was used in the compiler (here in pseudocode, so it’s easy to follow)

[in StatementBlock.java]
void parse(Tokenizer t) {
      while(next token is not ‘}’) {
            Statement s = new Statement(s)
            s.parse(tokenizer)
            this.addChild(s);
      }
}
 

[in Statement.java]
Statement child;
void parse(Tokenizer t) {
      if (next token is “if”) {
            this.child = new IfStatement()
            if (next token is “for”)
                  this.child = new ForStatement()
      } else if ... {
	...
      }
      this.child.parse(t);
}

Why does this cause problems? Consider the parse tree that results for something like for(;;) {x = 1;}

ForStatement
          Statement
                    StatementBlock
                              Statement
                                        ExpressionStatement
                                                  Expression
                                                            AssignmentExpression
                                                                      ExpPart
                                                                                Name (x)
                                                                      Expression
                                                                                Constant (1)

This parse tree can be compacted greatly. For instance:

ForStatement
          StatementBlock
                    ExpressionStatement
                              AssignmentExpression
                                        Name (x)
                                        Constant (1)

There are four levels of objects in the first AST that mean nothing from the perspective of the program. When you walk down the tree, none of the Statement and Expression/ExpPart nodes contribute any code, and they can be easily removed. This was one of the things that was emphasized in the group meetings – your AST should represent the code as compactly as possible without losing any meaning. Clearly, the first tree is storing a lot more than it needs to. Moreover, the AST for the most part represents “has-a” relationships. In the first AST, this implies that a Statement “has-a” StatementBlock, when in fact a StatementBlock is a kind of Statement.

So how might you go about fixing this? The problem is that you know you want to parse a Statement next, but you don’t know what type. The only thing creating an instance of a Statement object does is allow that instance to correctly determine which type of statement is next during parsing; after that its job is done, but it remains in the AST. So consider the following code:

[in Statement.java]
public static Statement parse(Tokenizer t) {
      if (next token is “do”) {
            Statement code = Statement.parse(t);
            Pull next token (should be “until”);
            Pull next token (should be “(“);
            Expression condition = Expression.parse(t);
            return new DoUntilStatement(code, condition);
      } else if (next token is “{“) {
            Vector<Statement> v = new Vector<Statement>();
            While (next token not “}”) 
                  v.addElement(Statement.parse(t));
            return new StatementBlock(v);
      } else if ... {
            ...
      }
}

Note that this is a static method, which means you don’t need an instance of a Statement object to call it – you can simply call Statement.parse(). It also returns exactly the class needed for a given statement, so it will produce the second parse tree above. Finally, it’s obviously recursive: at any time, if it’s parsing a statement that requires another statement, it will call itself to get the node needed. You can imagine something similar in Expression, and anywhere else where you have subclasses of a given type but don’t necessarily know which one you need. Thus, in your parse code, wherever you need a Statement, or an Expression, or whatever, you simply call the static method on the class which does the parsing, and it returns you an instance of the actual class you need.

Hope this helps clarify things. If you have any questions, feel free to contact any one of the TAs for the course to schedule a meeting, and we'll be happy to discuss this further