Auditory User Interfaces --- List of Figures
- Figure 1.1
- Computing applications typically consist of
obtaining user input, computing on this information, and finally
displaying the results. The first and third phases in this process
constitute the user interface. As can be seen, it is possible to
separate the user interface from the computational phase.
- Figure 1.2
- Calendars are displayed visually using a
two-dimensional layout that makes it easy to see the underlying
structure. The calendar display consists of a set of characters on the
screen; but the meaning of this display is as much in its visual
layout as in the characters themselves. Merely speaking the text fails
to convey this meaning. We can see that January 1, 2000 is a Saturday;
this information is missing when the visual display is spoken.
- Figure 2.1
- Sub-components of recorded prompts used by an IVR
system at a bank. Different prompts can be generated by concatenating
these sub-components.
- Figure 2.2
- Phonemes in American English. The various vowels
and consonants making up standard American English are shown using a
two-letter notation. Each phoneme is shown along with a word
containing that phoneme.
- Figure 2.3
- Textual description of a nested exponent. Notice
that when reading the prose making up this description, it is very
difficult to perceive the underlying structure of the mathematical
expression.
- Figure 2.4
- A call management system using word
spotting. Users can express the same command in several ways. The
recognition system looks for key phrases that determine the user
command, thereby allowing for a flexible system.
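The word-spotting approach of Figure 2.4 can be sketched as follows. The command names and key phrases below are illustrative assumptions, not the bank system's actual vocabulary:

```python
# Minimal word-spotting sketch: scan an utterance for key phrases
# that map to user commands. The phrase lists are illustrative only.
KEY_PHRASES = {
    "check_balance": ["balance", "how much"],
    "transfer": ["transfer", "move money"],
    "hang_up": ["goodbye", "hang up"],
}

def spot_command(utterance):
    """Return the first command whose key phrase occurs in the utterance."""
    text = utterance.lower()
    for command, phrases in KEY_PHRASES.items():
        if any(phrase in text for phrase in phrases):
            return command
    return None  # no key phrase found; the system would re-prompt

# Different phrasings of the same request map to one command:
# both "what is my balance please" and "tell me how much I have"
# are spotted as check_balance.
```

Because only the key phrases matter, callers can phrase the same request many ways and still be understood, which is what makes the interface flexible.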
- Figure 2.5
- Coarticulatory effects in continuous
speech. Coarticulatory effects (or the lack thereof) are often a
problem when trying to synthesize natural-sounding speech. Not
surprisingly, the presence of these same effects in human speech makes
the computer's task of recognizing continuous speech even harder.
- Figure 2.6
- Using spatial audio to encode information about
incoming email. Auditory cues indicate the arrival of new mail. These
auditory cues encode additional information such as urgency of the
message using spatial audio.
- Figure 3.1
- Visual realization of conversational gestures
---the building blocks for dialogues. User interface design tries to
bridge the impedance mismatch in man-machine communication by
inventing a basic set of conversational gestures that can be
effectively generated and interpreted by both man and machine.
- Figure 4.1
- The Emacspeak desktop consists of a set of active
buffer objects. This display shows a subset of currently active
buffers on my desktop.
- Figure 4.2
- A sample directory listing. The visual interface
exploits vertical alignment to implicitly encode the meaning of each
field in the listing.
- Figure 4.3
- A listing of running processes. The task manager
helps in tracking system resources. Processes can be killed or
suspended from the task manager.
- Figure 4.4
- Commands available while searching. A set of
highly context-specific conversational gestures.
- Figure 4.5
- Outline view of this section. It can be used to
move quickly to different logical components of the document.
- Figure 4.6
- Result of folding the lexical analyzer in
AsTeR. This is a document consisting of over 2,000 lines. Folding
helps in organizing the code and obtaining quick overviews, as well as
in navigating quickly between logical sections of the code.
- Figure 4.7
- Sample collection of dynamic macros available
when editing C-source. Standard C-constructs can be generated with a
few keystrokes.
- Figure 4.8
- A sample C-program. It can be created with a few
gestures when using dynamic macros.
- Figure 4.9
- A sample HTML page. Template-based authoring
makes creating such documents easy.
- Figure 4.10
- Visual display of a structured data record. The
data record is visually formatted to display each field name along
with its value.
- Figure 4.11
- An expense report. The semantics of the various
fields in each record are implicitly encoded in the visual layout.
- Figure 4.12
- Tracking an investment portfolio. Modifying
entries can cause complex changes to the rest of the document.
- Figure 4.13
- A train schedule. We typically look for the
information we want, rather than reading the entire timetable.
- Figure 4.14
- Commands in table browsing mode. The interface
enables the user to locate the desired item of information without
having to read the entire table.
- Figure 4.15
- A well-formatted display of the message headers
presents a succinct overview of an email message in the visual
interface. Speaking this visual display does not produce a pleasant
spoken interface ---the spoken summary needs to be composed directly
from the underlying information making up the visual display.
- Figure 4.16
- Newsgroups with unread articles are displayed in
a dedicated buffer. This buffer provides special commands for operating on
newsgroups. The visual interface shows the name of the group preceded
by the number of unread articles.
- Figure 4.17
- Unread articles are displayed in buffer
*Group Summary*. This buffer is augmented with special commands for reading and
responding to news postings. The visually formatted output succinctly
conveys article attributes such as author and subject.
- Figure 4.18
- More than one opening delimiter can appear on a
line. When typing the closing delimiter, Emacspeak speaks the line
containing the matching delimiter. The spoken feedback is designed to
accurately indicate which of the several open delimiters is being
closed.
- Figure 4.19
- An example of comparing different versions of a
file. Visual layout exploits changes in fonts to set apart the two
versions. The reader's attention is drawn to specific differences by
visual highlighting ---here, specific differences are shown in a bold
font. Visual interaction relies on the eye's ability to quickly
navigate a two-dimensional display. Directly speaking such displays is
both tedious and unproductive.
- Figure 4.20
- Browsing the Java Development Kit (JDK 1.1)
using a rich visual interface. Understanding large object-oriented
systems requires rich browsing tools. Emacspeak speech-enables a
powerful object-oriented browser to provide a pleasant software
browsing experience.
- Figure 4.21
- Emacspeak is implemented as a series of modular
layers. Low-level layers provide device-specific interfaces. Core
services are implemented on a device-independent
layer. Application-specific extensions rely on these core services.
- Figure 4.22
- Advice is a powerful technique for extending
functionality of pre-existing functions without modifying their source
code. Here, we show the calling sequence for a function $f$ that has
before, around, and after advice defined.
- Figure 4.23
- Example of advising a built-in Emacs command to
speak. Here, command next-line
is speech-enabled via an after advice that causes the current line to
be spoken after every user invocation of this command.
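Advice, as shown in Figures 4.22 and 4.23, is an Emacs Lisp facility; Emacspeak uses it via `defadvice`. The Python sketch below merely mimics the calling sequence (before, then around, then after) with a wrapper function, and the `speak` routine is a stand-in assumption for Emacspeak's speech output, not real Emacspeak code:

```python
# Sketch of the advice calling sequence from Figure 4.22, mimicked
# in Python. Real Emacspeak advice is written in Emacs Lisp; both
# the advise() wrapper and the speak() stand-in are illustrative.
SPOKEN = []  # records what would be sent to the speech server

def speak(text):
    SPOKEN.append(text)

def advise(before=None, around=None, after=None):
    """Wrap a function f so that before/around/after advice runs
    on every call, without modifying f's own source."""
    def decorator(f):
        def advised(*args, **kwargs):
            if before:
                before(*args, **kwargs)
            if around:
                result = around(f, *args, **kwargs)  # around controls the call
            else:
                result = f(*args, **kwargs)
            if after:
                after(*args, **kwargs)
            return result
        return advised
    return decorator

# Speech-enable a command with after advice, in the spirit of
# Figure 4.23's speech-enabled next-line:
@advise(after=lambda line: speak(f"line {line}"))
def next_line(line):
    return line + 1
```

The key property, as with Emacs Lisp advice, is that `next_line` itself is untouched: the speech feedback is layered on from outside.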
- Figure 5.1
- HTML pages on the WWW of the 1990's abound in
presentational markup. What does red text on a monochrome display
mean? What does it mean to (er) blink aurally?
- Figure 5.2
- A sample aural style sheet fragment for producing
audio formatted Web information. Audio formatting conveys document
structure implicitly in the aural rendering, allowing the listener to
focus on the information content.
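A fragment in the spirit of Figure 5.2 might look as follows. The property names come from the CSS2 aural style sheet properties; the selectors and values here are illustrative assumptions, not the book's actual style sheet:

```css
/* Illustrative aural style fragment using CSS2 aural properties;
   the specific values are assumptions for the sake of example. */
h1 { voice-family: paul; stress: 20; richness: 90; pause-before: 200ms }
em { pitch: high; speech-rate: slow }
a  { azimuth: center-right; cue-before: url("ding.wav") }
```

A heading thus sounds different from emphasized text or a link, so the listener perceives document structure without it being announced explicitly.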
- Figure 5.3
- The HTML-3.2 specification fails to separate the
underlying conversational gesture from its visual realization even
more dramatically than GUI toolkits. In this example, it is impossible
to decipher from the markup that the current dialogue expects the user
to enter a name and age ---in HTML-3.2, there is no association
between an edit field and its label.
- Figure 5.4
- The AltaVista main page. This page presents a
search dialogue using a visual interface. Emacspeak presents a
speech-enabled version of this dialogue that is derived from the
underlying HTML markup.

Last modified: Tue Aug 19 17:09:15 1997