Representing mathematical content

We have designed an internal representation, called the quasi-prefix form, for handling mathematical content. It captures the full prefix form of mathematical expressions with operators and simple variables. The tree corresponding to [tex2html_wrap5254] has root + and children [tex2html_wrap5258] and [tex2html_wrap5260] and is represented as such internally.

In addition to linearizing the underlying tree structure, mathematical notation uses visual attributes such as superscripts and subscripts. We extend the prefix form to capture such visual attributes, hence the name quasi-prefix.
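The idea of a prefix tree extended with visual attributes can be sketched as a small data structure. This is only an illustration: the class name `Node`, its fields, and the bracketed output format are assumptions made for this sketch, not the representation AsTeR uses internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical quasi-prefix tree (illustrative only)."""
    content: str                                     # operator or variable name
    children: list = field(default_factory=list)     # operands, in prefix order
    attributes: dict = field(default_factory=dict)   # visual attributes, left uninterpreted


def to_prefix(node):
    """Return a parenthesized prefix string that keeps attributes as written."""
    text = node.content
    for name, value in sorted(node.attributes.items()):
        text += f"[{name}: {to_prefix(value)}]"
    if node.children:
        text = "(" + " ".join([text] + [to_prefix(c) for c in node.children]) + ")"
    return text


# The 2 is stored as a superscript attribute of x, not as an exponent.
x_sup = Node("x", attributes={"superscript": Node("2")})
tree = Node("+", children=[x_sup, Node("y")])
print(to_prefix(tree))  # (+ x[superscript: 2] y)
```

Keeping the attribute separate from the operator tree means nothing in the structure commits to reading the superscript as a power.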

The key feature of the quasi-prefix form is that it delays the assignment of semantic interpretation to instances of ambiguous written mathematics. For example, the superscripts in an expression are represented not as exponents but as a superscript attribute. This is because the meaning of these visual attributes is context-dependent. Assigning one of the several possible interpretations at the recognition step is unduly restrictive in a fully flexible rendering system. For example, interpreting the superscript as an exponent would result in [tex2html_wrap5262] being recognized correctly, but [tex2html_wrap5264] being recognized incorrectly. Further, it would be impossible to later distinguish between the correct and incorrect interpretations.

The quasi-prefix form captures the mathematical notation itself, leaving the assignment of semantic interpretation to a later step. By doing so, we can represent content for which we do not have sufficient semantic information. Thus, [tex2html_wrap5266] might denote the first derivative of [tex2html_wrap5268] with respect to [tex2html_wrap5270] in a specific context. The superscript and subscript might mean something entirely different in another context, e.g., as in [tex2html_wrap5272]. If more contextual information is available at the rendering step, AsTeR can speak [tex2html_wrap5274] as ``cap a transpose''. In the absence of such contextual information, the system can still produce an audio notation that maps different features of the written notation to unique audio dimensions.
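A minimal sketch of this late binding, assuming a hypothetical `context` hint that is supplied only at rendering time. The function names and context labels are invented for illustration, and the spoken fallback words merely stand in for the audio dimensions mentioned above.

```python
def speak_symbol(symbol):
    """Speak an upper-case letter as 'cap x'; leave other symbols unchanged."""
    return f"cap {symbol.lower()}" if symbol.isupper() else symbol


def speak_superscript(base, sup, context=None):
    """Choose a spoken form for a base carrying a superscript attribute.

    `context` is a hypothetical hint available only at rendering time.
    """
    if context == "matrix" and sup == "T":
        return f"{speak_symbol(base)} transpose"
    if context == "exponent":
        return f"{speak_symbol(base)} to the power {speak_symbol(sup)}"
    # No semantic information: describe the written notation itself.
    return f"{speak_symbol(base)} superscript {speak_symbol(sup)}"


print(speak_superscript("A", "T", context="matrix"))  # cap a transpose
print(speak_superscript("A", "T"))                    # cap a superscript cap t
```

Because the stored form records only the superscript, both readings remain available until a context is known.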

At the same time, the quasi-prefix form is sufficiently rich to permit renderings that are independent of the order in which the written symbols appear on paper. Linear renderings with the rendering order hard-coded into the system can be produced with a simpler representation, e.g., a linear list, or even the TeX encoding itself. This was shown by an earlier string-substitution-based program that directly transformed TeX source to produce linear renderings [Ram92][Ram91].

As an example, assume for the present that \kronecker is defined as an infix binary operator. Given the expression [tex2html_wrap5276] encoded as $a\kronecker b$, we can write a rendering rule for object kronecker represented in the quasi-prefix form to produce ``a kronecker product b''. A string-substitution approach can produce this rendering as well, but a simpler list-like representation restricts the system to this one form of rendering. Using the quasi-prefix form, AsTeR can also produce ``the kronecker product of a and b''.
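The two renderings can be sketched as alternative rules applied to the same tree. In this illustration a binary expression is a plain tuple `(operator, left, right)`, and the function `render` and its `style` argument are hypothetical; AsTeR's actual rendering rules are not shown here.

```python
def render(node, style):
    """Render a binary-operator tree as text in the requested style.

    A node is either a string (a simple variable) or a tuple
    (operator, left, right). Both styles read the same structure.
    """
    if isinstance(node, str):
        return node
    op, left, right = node
    l, r = render(left, style), render(right, style)
    if style == "infix":
        # Follows the order in which the symbols appear on paper.
        return f"{l} {op} product {r}"
    if style == "prefix":
        # Announces the operator first, independent of the written order.
        return f"the {op} product of {l} and {r}"
    raise ValueError(f"unknown style: {style}")


expr = ("kronecker", "a", "b")
print(render(expr, "infix"))   # a kronecker product b
print(render(expr, "prefix"))  # the kronecker product of a and b
```

A flat list of tokens would support the first rule only, since the second needs to know which symbol is the operator and which are its operands.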

Thus, even though the quasi-prefix form captures only the information present in the TeX encoding, it is still flexible enough to permit more sophisticated processing.

This power is necessary in overcoming the passive nature of listening. In producing printed output, it is sufficient to produce one view; once the information has been presented visually, a person reading the material can access it in any desired order. TeX itself therefore never builds up an internal representation like the quasi-prefix form; its purpose is to typeset the input according to a fixed set of rules, and the TeX encoding directly reflects the linear order in which expressions appear on paper. Thus, here, the displayed information is passive while the person reading it is active. The situation in presenting information orally is exactly the opposite; the information flows past a passive listener. In order to achieve effective oral communication, it is therefore important to be able to present multiple views of the information.

TV Raman
Thu Mar 9 20:10:41 EST 1995