
`$a+b$`

,

[LVerbatim574]

Converting a list as shown above to prefix form is a simple exercise that can be found in most programming language texts. Our implementation is based on the infix-to-prefix converter in the text on Common Lisp by Winston and Horn[+] [HW89].

Function `inf-to-pre` performs the infix-to-prefix conversion.
The input to this function is a list of math objects that have been
processed using the classification given
in Section s:classification-math. Each element of this list is a math
object with content and attributes but no children. Note that the
contents of the attributes are first converted to quasi-prefix
form. For example, when recognizing [tex2html_wrap5378], the
input is first converted to a list of five math objects containing the
quasi-prefix representations of [tex2html_wrap5380], +, [tex2html_wrap5384], + and
[tex2html_wrap5388] respectively. This is achieved by collecting the attributes
that appear on each math object and processing their content
recursively. Converting such a list to prefix form is then no
different from processing [tex2html_wrap5390].
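The conversion of a flat operand/operator list can be illustrated with a minimal sketch. It is written in Python rather than Lisp, uses precedence climbing (which may differ from the Winston and Horn converter), and is not the `inf-to-pre` implementation itself. The precedence values in `PRECEDENCE` are assumptions chosen for illustration, tokens are plain strings standing in for math objects, and the input is assumed to be a non-empty, well-formed alternation of operands and binary operators.

```python
# Hypothetical precedence levels for illustration only;
# these are NOT the entries of AsTeR's precedence table.
PRECEDENCE = {"=": 1, "+": 2, "-": 2, "*": 3, "/": 3}


def inf_to_pre(tokens):
    """Convert a flat infix token list into a nested prefix list.

    Assumes a non-empty list of the form: operand, operator, operand, ...
    """
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]  # an operand: a math object with no children
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            prec = PRECEDENCE.get(op)
            # Stop at an unknown token or at an operator that binds
            # more loosely than the caller allows.
            if prec is None or prec < min_prec:
                break
            pos += 1
            # prec + 1 makes operators of equal precedence left associative.
            right = parse(prec + 1)
            left = [op, left, right]
        return left

    return parse(0)
```

With these assumed precedences, `inf_to_pre(["a", "+", "b", "+", "c"])` yields `["+", ["+", "a", "b"], "c"]`, and `inf_to_pre(["a", "+", "b", "*", "c"])` yields `["+", "a", ["*", "b", "c"]]`.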

We now extend this algorithm to handle ambiguous mathematical notation. Conventional parsing techniques fail, since written mathematics does not adhere to a rigorous set of precedence rules. For example, the expression [tex2html_wrap5392] means [tex2html_wrap5394] rather than [tex2html_wrap5396], even though function application is normally assigned the highest precedence. Moreover, [tex2html_wrap5398] means [tex2html_wrap5400] rather than [tex2html_wrap5402]. We have taken many such anomalies into account.

The operator precedence table (t:precedence) lists operators in ascending order of precedence. Only one operator is shown at each level.

[table585]
**Table:** Precedence table for mathematical operators.

Functions `define-precedence` and
`remove-precedence` allow the user to modify the precedence
table. These, however, are not for use by a casual user of AsTeR,
since changes to the precedence table without a clear understanding of
the recognition algorithm can cause unexpected behavior.

As pointed out earlier, precedence rules alone are not sufficient to handle written mathematics. We adapt the algorithm by using the following heuristics:

- The big operators, *e.g.,* [tex2html_wrap5432] and [tex2html_wrap5434], are treated as unary. Everything up to the next operator of lower precedence than the operator in question is considered part of the operand of the big operator. Thus, in the expression

  [displaymath5374]

  everything up to the = sign is treated as the summand. This technique is particularly useful in recognizing expressions like [tex2html_wrap5438]. By our heuristic, the summation is correctly recognized as the second argument to the + sign. Further, the summand is terminated by the = sign. The expression is now equivalent to recognizing [tex2html_wrap5444], which can be handled by the standard algorithm.

- The integral operator can have an optional delimiter, as in [tex2html_wrap5446]. If the [tex2html_wrap5448] is present and is recognizable, *i.e.,* has been marked up as `\d{x}` as opposed to `dx`, it is recognized as the closing delimiter; the variable of integration[+] is inferred. However, this closing delimiter may not always be available: it may be encoded ambiguously, as in `$\int f dx$`, or the integral itself may not require a closing [tex2html_wrap5452], as in [tex2html_wrap5454]. In the former case, our recognizer treats the juxtaposition [tex2html_wrap5456] as the integrand. Though this may seem incorrect, it is in fact exactly what the typeset output means. In the latter case, the earlier rule (treating the operand of a big operator to be everything up to the first operator of lower precedence) applies. Hence, we can correctly recognize [tex2html_wrap5458].
- The closing delimiter [tex2html_wrap5460] is treated as such only if it occurs at the top level. Thus, in `$\frac{\dx}{x}$`, the `\dx` does not end the integrand. This allows us to recognize such integrals correctly, but we cannot now infer the variable of integration. There seems to be no clean solution for this problem. Written mathematical notation relies on the fact that [tex2html_wrap5462] means [tex2html_wrap5464] and the integrand is therefore [tex2html_wrap5466].
- Function application is treated as right associative. This results in [tex2html_wrap5468] being interpreted correctly. Since juxtaposition has been assigned a higher precedence than function application, [tex2html_wrap5470] continues to be recognized correctly. The following equation is a good example of such ambiguous notation; note the complete absence of parentheses:

  [displaymath5375]

- In written mathematics, delimiters do not always match. For
example, [tex2html_wrap5472] denotes a semi-open interval. There are also cases
where there is no matching closing delimiter. The recognizer is
aware of such anomalies and handles them correctly. When it sees an
open delimiter, it scans forward to the end of the math expression
for the first matching close delimiter of the same kind. If one is
found, then all of the input up to this point is treated as the
delimited expression. If no matching close delimiter of the same
kind is found, then the first unmatched close delimiter delimits the
input. Otherwise, the occurrence is treated as an unmatched
delimiter.
- The [tex2html_wrap5474] is one of the few postfix operators used in written mathematics. This is treated as a special case, and we confirm that the [tex2html_wrap5476] is indeed a factorial sign by making sure that it does not have any attributes. Thus, [tex2html_wrap5478] is not a factorial symbol.
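
The delimiter-matching heuristic in the list above can be sketched as follows. This is an illustration in Python under stated assumptions, not the recognizer's actual code: the function name `find_close`, the `CLOSER` pairing table, and the use of single-character string tokens are all hypothetical. The source also does not say how nested delimiters are counted, so the nesting logic here is one possible reading of "first matching close delimiter".

```python
# Assumed pairing of open and close delimiters (illustrative only).
CLOSER = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(CLOSER.values())


def find_close(tokens, start):
    """Return the index of the token that closes tokens[start], or None.

    tokens[start] must be an open delimiter listed in CLOSER.
    """
    opener = tokens[start]
    wanted = CLOSER[opener]

    # Rule 1: first close delimiter of the same kind, skipping over
    # nested pairs of that kind (assumed interpretation).
    depth = 0
    for i in range(start + 1, len(tokens)):
        if tokens[i] == opener:
            depth += 1
        elif tokens[i] == wanted:
            if depth == 0:
                return i
            depth -= 1

    # Rule 2: no close delimiter of the same kind exists, so the first
    # close delimiter not claimed by an inner open delimiter is used.
    pending = []  # close delimiters expected by opens seen inside the span
    for i in range(start + 1, len(tokens)):
        tok = tokens[i]
        if tok in CLOSER:
            pending.append(CLOSER[tok])
        elif tok in CLOSERS:
            if pending and pending[-1] == tok:
                pending.pop()
            else:
                return i

    # Rule 3: nothing closes it; the open delimiter is unmatched.
    return None
```

For a semi-open interval tokenized as `list("[0,1)")`, the call `find_close(list("[0,1)"), 0)` returns `4`, the index of the `)`. For `list("(a+b")` it returns `None`, marking the open parenthesis as unmatched.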


Thu Mar 9 20:10:41 EST 1995