AsTeR Demonstration

AsTeR is dedicated to my Guide-Dog. AsTeR (Audio System For Technical Readings) is a computing system for rendering technical documents in audio. I developed AsTeR for my PhD thesis (141 pages). An audio formatted version of the thesis (approximately 6 hours), produced by AsTeR, is being made available by RFB (Recordings For the Blind) as the first computer generated talking book. Here is the abstract in print, and here is an audio formatted version.

This hypertext document demonstrates the audio renderings generated by AsTeR. Here is an enhanced demo using inline images. Each example is made up of three components:

The original LaTeX input.
The audio formatted output produced by AsTeR. The speech is produced by a DECtalk and has been digitized as 8-bit mu-law audio. AsTeR uses stereo to render tables, an effect that is not conveyed by the 8-bit mono encoding.
The visually formatted version produced by LaTeX and DVIPS.

How to use this demo:

The examples in this demonstration get progressively more difficult. I suggest you go through the initial sections sequentially. For short demos, I typically show people the first three sections, and round it off with the continued fraction in section 4 and a quick overview of Faà di Bruno's formula.

Here is the Postscript file containing all the examples, in case you want to look over them first. I am not providing a single file containing all of the audio examples, since it would be about 9MB.

Section 1 simple fractions and expressions.

This set of examples demonstrates the use of voice inflection and pauses to convey grouping of sub-expressions succinctly.

Audio state is varied along a dimension in audio space before rendering sub-expressions.

Section 2 superscripts and subscripts.

To convey subscripts, superscripts, and other visual attributes, vary audio state along a dimension that is orthogonal to (independent of) the dimension used to convey sub-expressions. This allows these mutually independent concepts to be nested.
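The idea of two independent dimensions can be sketched in a few lines of Python. This is only an illustration, not AsTeR's implementation: the `AudioState` class and its `grouping` and `script` fields are hypothetical stand-ins for real speech-synthesizer parameters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AudioState:
    # Both fields are hypothetical; a real system would map them to
    # synthesizer parameters such as pitch or voice quality.
    grouping: int = 0  # dimension used for sub-expression nesting
    script: int = 0    # dimension used for superscripts (+) and subscripts (-)


def enter_subexpression(state):
    """Move one step along the grouping dimension only."""
    return replace(state, grouping=state.grouping + 1)


def enter_superscript(state):
    """Move one step up along the script dimension only."""
    return replace(state, script=state.script + 1)


def enter_subscript(state):
    """Move one step down along the script dimension only."""
    return replace(state, script=state.script - 1)


base = AudioState()
# Because the two dimensions are independent, the moves commute: a group
# inside a superscript and a superscript inside a group reach the same state.
a = enter_superscript(enter_subexpression(base))
b = enter_subexpression(enter_superscript(base))
print(a, a == b)
```

Since each move changes exactly one field, a listener can in principle tell "one level deeper" apart from "raised to a power" however the two are nested.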

Section 3 Knuth's examples of fractions and exponents.

These examples are taken verbatim from The TeXbook, by Donald Knuth, where they are used to demonstrate the power of the TeX layout operators. Notice that all of these examples consist of the same six symbols, but are very different! AsTeR can render these as unambiguously as TeX can.
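To give a feel for this kind of example, here are five expressions built from x, y, 2, k, 1 and + that differ only in grouping. They are reconstructed here as an illustration, on the assumption that they resemble the examples in the demo; they are not copied from the demo files.

```latex
$$x+y^2\over k+1$$
$${x+y^2\over k}+1$$
$$x+{y^2\over k}+1$$
$$x+{y^2\over k+1}$$
$$x+y^{2\over k+1}$$
```

Only the placement of the braces changes, yet each line typesets, and should be spoken, as a different expression.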

Section 4 A continued fraction.

Moving along a dimension in audio space defines a perceptibly monotonic change. This notion of perceptible monotonicity is vital in conveying nesting.

audio LaTeX Postscript
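A minimal sketch of perceptible monotonicity, assuming pitch is the dimension being varied: each nesting level of a continued fraction gets a strictly lower pitch. The function names, the 120 Hz base and the 0.9 step ratio are all assumptions made for illustration.

```python
def pitch_for_depth(depth, base_hz=120.0, step=0.9):
    """Hypothetical mapping: each nesting level lowers pitch by a fixed ratio."""
    return base_hz * step ** depth


def speak_continued_fraction(terms, depth=0):
    """Return (text, pitch) pairs for a0 + 1/(a1 + 1/(a2 + ...))."""
    head, rest = terms[0], terms[1:]
    events = [(str(head), pitch_for_depth(depth))]
    if rest:
        events.append(("plus 1 over", pitch_for_depth(depth)))
        # The denominator is one level deeper, so it is spoken at a lower pitch.
        events.extend(speak_continued_fraction(rest, depth + 1))
    return events


events = speak_continued_fraction([1, 2, 3, 4])
pitches = [pitch for _, pitch in events]
for text, pitch in events:
    print(f"{pitch:6.1f} Hz  {text}")
```

Because the pitch never rises as the fraction deepens, the listener always hears in which direction the nesting is moving.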

Section 5 Simple School algebra.

Section 6 square roots.

Notice the choice of unambiguous renderings for the following expressions:

Section 7 Trigonometric identities.

Written mathematical notation can be ambiguous and hard to recognize. Notice the complete absence of parentheses in some of the examples below. AsTeR uses several heuristics to construct the correct tree structure for these expressions.

Section 8 Logarithms.

Notice the context-specific rendering when speaking the base of the logarithm. The renderings are chosen to reduce cognitive load:

log base a of x

as opposed to

log of x to the base a
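The preferred word order can be written down as a tiny rendering function. This is a hedged sketch; `speak_log` is a hypothetical name, not part of AsTeR.

```python
def speak_log(argument, base=None):
    """Speak a logarithm, announcing the base (if any) before the argument."""
    if base is None:
        return f"log of {argument}"
    return f"log base {base} of {argument}"


print(speak_log("x", base="a"))
```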

Section 9 Series.

Context-specific rendering rules allow AsTeR to interpret the superscripts as exponents. Such interpretation is not hard-wired into the renderings; it is fully customizable by the user.
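One way to picture user-customizable, context-specific rules is a lookup table from context to rendering function. The sketch below is an assumption about how such a table could look; the context names and function names are invented for illustration and do not describe AsTeR's actual rule language.

```python
def as_exponent(base, script):
    """Speak the superscript as an exponent."""
    return f"{base} to the {script}"


def as_plain_superscript(base, script):
    """Speak the superscript without interpreting it."""
    return f"{base} superscript {script}"


# Hypothetical rule table: context name -> rendering rule.
RULES = {"default": as_plain_superscript, "series": as_exponent}


def speak_superscript(base, script, context="default", rules=RULES):
    """Pick the rule for this context, falling back to the default rule."""
    rule = rules.get(context, rules["default"])
    return rule(base, script)


print(speak_superscript("x", "n", context="series"))

# A user can swap in a different rule without touching the renderer itself.
custom = {**RULES, "series": lambda b, s: f"{b} raised to the power {s}"}
print(speak_superscript("x", "n", "series", custom))
```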

Section 10 Integrals.

The first of these examples, probably the most innocuous, is also the most difficult to recognize; it is impossible to determine the variable of integration.

Notice that AsTeR interprets triple integrals as the nested application of the integral operator. A user can browse the triple integral and listen to its sub-pieces.

The integrals shown in examples 3 and 4 can trick the most experienced of human readers into an error.

Section 11 Summations.

Notice that the same expression can be written in more than one way.
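For instance, the following two notations denote the same sum. They are given here as a generic illustration and are not necessarily the expressions used in the demo.

```latex
\[
  \sum_{i=1}^{n} a_i
  \qquad\text{and}\qquad
  \sum_{1 \le i \le n} a_i
\]
```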

Section 12 Limits.

Section 13 Cross-referenced equations.

The following section is meant to illustrate AsTeR's rendering of cross-references, and is most effective when AsTeR is used interactively.

AsTeR enables the listener to give meaningful names to cross-referenceable objects, and uses these names when referring to such objects in later cross-references.
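The naming mechanism can be sketched as a small registry. The `CrossReferences` class, its methods, and the labels below are hypothetical and only illustrate the idea.

```python
class CrossReferences:
    """Hypothetical registry mapping equation labels to listener-chosen names."""

    def __init__(self):
        self._names = {}

    def name(self, label, spoken_name):
        """Attach a meaningful spoken name to a cross-referenceable object."""
        self._names[label] = spoken_name

    def speak_reference(self, label):
        """Speak a later reference, using the listener's name when one exists."""
        # Fall back to the bare label when the listener has not named the object.
        return self._names.get(label, f"equation {label}")


refs = CrossReferences()
refs.name("eq:3", "the distance formula")
print(refs.speak_reference("eq:3"))
print(refs.speak_reference("eq:7"))
```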

Section 14 Distance formula.

Notice that AsTeR produces good intonational structure when speaking text that is intermixed with mathematics.

audio LaTeX Postscript

Section 15 Quantified expression.

The quantifiers present an interesting challenge to AsTeR's recognizer.

audio LaTeX Postscript

Section 16 Exponentiation.

Once again, perceptible monotonicity allows AsTeR to convey the following deeply nested expressions succinctly.

These examples were produced with the Emacs Calculator, a full-fledged symbolic algebra system. AsTeR interfaces directly with this calculator, and renders the output just as well as it can render any document.

Section 17 A generic matrix.

AsTeR uses stereo effects to convey the two-dimensional structure of the matrix. Rendering commences on the left, and moves progressively right as each element of any row is spoken.

audio LaTeX Postscript
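The left-to-right movement can be sketched by assigning each column a stereo pan position. The pan scale from -1.0 (left) to +1.0 (right) and the function names are assumptions for illustration only.

```python
def pan_positions(n_columns):
    """Evenly spaced pan values from -1.0 (far left) to +1.0 (far right)."""
    if n_columns == 1:
        return [0.0]
    return [-1.0 + 2.0 * j / (n_columns - 1) for j in range(n_columns)]


def speak_matrix(rows):
    """Return (entry, pan) pairs; every row starts again on the left."""
    events = []
    for row in rows:
        for entry, pan in zip(row, pan_positions(len(row))):
            events.append((entry, pan))
    return events


events = speak_matrix([["a11", "a12", "a13"],
                       ["a21", "a22", "a23"]])
for entry, pan in events:
    print(f"{pan:+.1f}  {entry}")
```

Entries in the same column share a pan position, so the column structure is carried by where the sound appears to come from.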

Section 18 Faà di Bruno's formula.

This section presents Faà di Bruno's formula, taken from Knuth's The Art of Computer Programming, Vol. 1. I first heard it spoken by an RFB reader on a talking book; it took 120 seconds to speak.

Since the renderings produced by AsTeR utilize features of the audio space not available to a human reader (I still have not met a reader who can change the size and shape of her head as she talks :-), the rendering takes under 80 seconds.

As you will hear soon, even this is too long; you forget the beginning by the time you hear the end.

Later, we present rendering using variable substitution, a powerful technique for conveying top-level structure of complex expressions.

Notice the proper intonational structure produced for text intermixed with mathematics.
audio LaTeX Postscript
audio LaTeX Postscript
Here is Faà di Bruno's formula in all its glory:
Audio (66 seconds) LaTeX Postscript
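For readers without the Postscript file at hand, the formula as it is usually stated for the n-th derivative of a composite function w(u(x)) is given below. The exact typesetting in the demo files may differ, and `\substack` assumes the amsmath package.

```latex
\[
  D_x^n w \;=\;
  \sum_{0 \le j \le n}\;
  \sum_{\substack{k_1 + k_2 + \cdots + k_n = j \\
                  k_1 + 2k_2 + \cdots + n k_n = n \\
                  k_1, k_2, \ldots, k_n \ge 0}}
  D_u^j w \,
  \frac{n!}{k_1!\,(1!)^{k_1} \cdots k_n!\,(n!)^{k_n}}\,
  (D_x^1 u)^{k_1} \cdots (D_x^n u)^{k_n}
\]
```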

AsTeR can process complex expressions like the above, and upon request, replace complex sub-expressions with meaningful identifiers. Such renderings convey top-level structure; the listener can then listen to the sub-expressions separately.

Since this substitution process is performed by AsTeR, there is no LaTeX or Postscript equivalent for the audio output in this case.

The top-level formula. audio (20 seconds)
Lower constraint 1. audio (20 seconds)
Numerator. audio (15 seconds)
Denominator. audio (14 seconds)
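The substitution step can be sketched on a toy expression tree. This is an illustration under stated assumptions, not AsTeR's algorithm: expressions are nested tuples of the form (operator, operand, ...), and `size`, `substitute`, the names list and the threshold are all hypothetical.

```python
def size(expr):
    """Count the leaves of a nested-tuple expression such as ('/', num, den)."""
    if isinstance(expr, str):
        return 1
    return sum(size(arg) for arg in expr[1:])


def substitute(expr, names, threshold=3):
    """Replace each large top-level operand by its name.

    Returns the simplified top-level expression and a dict mapping each
    name that was used to the sub-expression it stands for.
    """
    op, *args = expr
    top, bindings = [op], {}
    for arg, name in zip(args, names):
        if size(arg) > threshold:
            bindings[name] = arg  # spoken separately, on request
            top.append(name)
        else:
            top.append(arg)
    return tuple(top), bindings


formula = ("/", ("+", "a", "b", "c", "d"), ("*", "x", "y"))
top, parts = substitute(formula, ["numerator", "denominator"])
print(top)
print(parts)
```

The top-level rendering then speaks only the short form, and each binding can be rendered on its own, much as the numerator and denominator are offered separately above.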

T.V. Raman raman@crl.dec.com

Last modified: Fri Aug 5 10:06:00 1994