Bootstrapping lexical choice via multiple-sequence alignment
Regina Barzilay and Lillian Lee
Proceedings of EMNLP, pp. 164--171, 2002

An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically, labor-intensive knowledge-based methods are used to construct the dictionary. We instead propose to acquire it automatically via a novel multiple-pass algorithm employing multiple-sequence alignment, a technique commonly used in bioinformatics. Crucially, our method leverages latent information contained in multi-parallel corpora --- datasets that supply several verbalizations of the corresponding semantics rather than just one.
We used our techniques to generate natural language versions of computer-generated mathematical proofs, with good results on both a per-component and overall-output basis. For example, in evaluations involving a dozen human judges, our system produced output whose readability and faithfulness to the semantic input rivaled that of a traditional generation system.

@inproceedings{Barzilay+Lee:02a,
  author = {Regina Barzilay and Lillian Lee},
  title = {Bootstrapping lexical choice via multiple-sequence alignment},
  year = {2002},
  pages = {164--171},
  booktitle = {Proceedings of EMNLP}
}


This paper is based upon work supported in part by the National Science Foundation under ITR/IM grant IIS-0081334 and a Louis Morin scholarship. Any opinions, findings, and conclusions or recommendations expressed above are those of the authors and do not necessarily reflect the views of the National Science Foundation.