Problem Set 1: Parse forest algorithms and grammar development
CS 474 Introduction to NLP Mats Rooth
Solve problem 1.2 in Revesz. In addition to a written solution, turn in grammar files a.gram, a.lex, a.start ..., b.gram, b.lex, ... etc. in symbolic lopar format. Make sure the parser can read your solutions.
Design and present in pseudo-code an algorithm which removes those symbols which do not contribute to an a complete analysis (i.e, an analysis in the tree language licensed by the grammar) from the chart returned by the CKY algorithm. A call to the algorithm has the form filter-chart(chart, N, T, S, F) , where chart is a two-dimensional array returned by CKY as presented in class, and N,T,S,F represent the grammar. The algorithm returns a chart.
One property of a solution is that in the filtered chart, chart[0,n] contains only S.
Consider sequences English auxiliary verbs and one main verb such as
According to a theory of Noam Chomsky, the sequences are generated from this pattern.
Parentheses represent optionality, and curly brackets represent choice among three possibilities. Modal represents modal verbs such as will and might. V represents main verbs such as discuss. The pattern is used to generate a sequence of symbols, and then the affixes hop over the following words. For instance
which spells out as had been eaten. A maximal-length sequence is will have been being eaten .
Write a context-free grammar in Chomsky Normal Form which generates the same strings as Chomsky's procedure; note that this is a finite set. Include just one modal will , and two main verbs eat and discuss . These have inflected forms such as discussing (from discuss-ing ), discusses (from discuss-es ) and ate (from eat-ed ). The grammar should generate terminal strings which are actual English phrases, such as has been discussing and ate .
In addition to a written solution, turn in grammar files in lopar format. It may also be useful to use lopar and charge in developing the grammar. Since we may use programs to test your solution, make sure you have the right file format.
If possible, use claws c5 preterminal tags, as specified on pages 141-142 of Manning and Schuetze. For instance, use VHZ as the preterminal catecory of has .