Auditory User Interfaces -- List of Figures

Figure 1.1: Computing applications typically consist of obtaining user input, computing on this information, and finally displaying the results. The first and third phases in this process constitute the user interface. As can be seen, it is possible to separate the user interface from the computational phase.
Figure 1.2: Calendars are displayed visually using a two-dimensional layout that makes it easy to see the underlying structure. The calendar display consists of a set of characters on the screen, but the meaning of this display is as much in its visual layout as in the characters themselves. Merely speaking the text fails to convey that meaning. We can see that January 1, 2000 is a Saturday; this information is missing when the visual display is spoken.
Figure 2.1: Sub-components of recorded prompts used by an IVR system at a bank. Different prompts can be generated by concatenating appropriate components.
Figure 2.2: Phonemes in American English. The various vowels and consonants making up standard American English are shown using a two-letter notation. Each phoneme is shown along with a word containing that phoneme.
Figure 2.3: Textual description of a nested exponent. Notice that when reading the prose making up this description, it is very difficult to perceive the underlying structure of the mathematical expression.
Figure 2.4: A call management system using word spotting. Users can express the same command in several ways. The recognition system looks for key phrases that determine the user command, thereby allowing for a flexible system.
Figure 2.5: Coarticulatory effects in continuous speech. Coarticulatory effects (or the lack thereof) are often a problem when trying to synthesize natural-sounding speech. Not surprisingly, the presence of these same effects in human speech makes the computer's task of recognizing continuous speech even harder.
Figure 2.6: Using spatial audio to encode information about incoming email. Auditory cues indicate the arrival of new mail. These auditory cues encode additional information such as urgency of the message using spatial audio.
Figure 3.1: Visual realization of conversational gestures ---the building blocks for dialogues. User interface design tries to bridge the impedance mismatch in man-machine communication by inventing a basic set of conversational gestures that can be effectively generated and interpreted by both man and machine.
Figure 4.1: The Emacspeak desktop consists of a set of active buffer objects. This display shows a subset of currently active buffers on my desktop.
Figure 4.2: A sample directory listing. The visual interface exploits vertical alignment to implicitly encode the meaning of each field in the listing.
Figure 4.3: A listing of running processes. The task manager helps in tracking system resources. Processes can be killed or suspended from the task manager.
Figure 4.4: Commands available while searching. A set of highly context-specific conversational gestures.
Figure 4.5: Outline view of this section. It can be used to move quickly to different logical components of the document.
Figure 4.6: Result of folding the lexical analyzer in AsTeR. This is a document consisting of over 2,000 lines. Folding helps in organizing the code, obtaining quick overviews, as well as in efficient navigation.
Figure 4.7: Sample collection of dynamic macros available when editing C-source. Standard C-constructs can be generated with a few gestures.
Figure 4.8: A sample C-program. It can be created with a few gestures when using dynamic macros.
Figure 4.9: A sample HTML page. Template-based authoring makes creating such documents easy.
Figure 4.10: Visual display of a structured data record. The data record is visually formatted to display each field name along with its value.
Figure 4.11: An expense report. Semantics of the various fields in each record is implicitly encoded in the visual layout.
Figure 4.12: Tracking an investment portfolio. Modifying entries can cause complex changes to the rest of the document.
Figure 4.13: A train schedule. We typically look for the information we want, rather than reading the entire timetable.
Figure 4.14: Commands in table browsing mode. The interface enables the user to locate the desired item of information without having to read the entire table.
Figure 4.15: A well-formatted display of the message headers presents a succinct overview of an email message in the visual interface. Speaking this visual display does not produce a pleasant spoken interface ---the spoken summary needs to be composed directly from the underlying information making up the visual display.
Figure 4.16: Newsgroups with unread articles are displayed in a *Group* buffer. This buffer provides special commands for operating on newsgroups. The visual interface shows the name of the group preceded by the number of unread articles.
Figure 4.17: Unread articles are displayed in buffer *Group Summary*. This buffer is augmented with special commands for reading and responding to news postings. The visually formatted output succinctly conveys article attributes such as author and subject.
Figure 4.18: More than one opening delimiter can appear on a line. When typing the closing delimiter, Emacspeak speaks the line containing the matching delimiter. The spoken feedback is designed to accurately indicate which of the several open delimiters is being matched.
Figure 4.19: An example of comparing different versions of a file. Visual layout exploits changes in fonts to set apart the two versions. The reader's attention is drawn to specific differences by visual highlighting ---here, specific differences are shown in a bold font. Visual interaction relies on the eye's ability to quickly navigate a two-dimensional display. Directly speaking such displays is both tedious and unproductive.
Figure 4.20: Browsing the Java Development Kit (JDK 1.1) using a rich visual interface. Understanding large object-oriented systems requires rich browsing tools. Emacspeak speech-enables a powerful object-oriented browser to provide a pleasant software development environment.
Figure 4.21: Emacspeak is implemented as a series of modular layers. Low-level layers provide device-specific interfaces. Core services are implemented on a device-independent layer. Application-specific extensions rely on these core services.
Figure 4.22: Advice is a powerful technique for extending the functionality of pre-existing functions without modifying their source code. Here, we show the calling sequence for a function f that has before, around, and after advice defined.
Figure 4.23: Example of advising a built-in Emacs command to speak. Here, command next-line is speech-enabled via an after advice that causes the current line to be spoken after every user invocation of this command.
Figure 5.1: HTML pages on the WWW of the 1990s abound in presentational markup. What does red text on a monochrome display mean? What does it mean to (er) blink aurally?
Figure 5.2: A sample aural style sheet fragment for producing audio formatted Webformation. Audio formatting conveys document structure implicitly in the aural rendering, allowing the listener to focus on the information content.
Figure 5.3: The HTML-3.2 specification fails to separate the underlying conversational gesture from its visual realization even more dramatically than GUI toolkits. In this example, it is impossible to decipher from the markup that the current dialogue expects the user to enter a name and age ---in HTML-3.2, there is no association between an edit field and its label.
Figure 5.4: The AltaVista main page. This page presents a search dialogue using a visual interface. Emacspeak presents a speech-enabled version of this dialogue that is derived from the underlying HTML.

Email: raman@adobe.com

Last modified: Tue Aug 19 17:09:15 1997