The Complete Audio Desktop

T. V. Raman
Email: raman@cs.cornell.edu
URL: http://cs.cornell.edu/home/raman

Short Summary

Emacspeak is a speech interface that allows blind and visually impaired users to interact independently and efficiently with the computer. Available free of cost on the Internet, Emacspeak has dramatically changed how the author and hundreds of blind and visually impaired users around the world interact with the personal computer and the Internet.

Chapter 1
Emacspeak Case Study

1.1  Benefits

Emacspeak provides complete eyes-free access to daily computing tasks. By providing fluent spoken access to local and remote electronic information, the system opens up the wealth of information available on the Internet to visually impaired users.

Emacspeak introduces several improvements and innovations over screenreaders designed to allow blind users to interact with personal computers. Unlike screenreaders, which speak the contents of a visual display, Emacspeak speaks the underlying information. For example, using a calendar application with a screenreader results in the blind user hearing a sequence of meaningless numbers; in contrast, Emacspeak speaks the relevant date in an easy-to-comprehend manner (see Figure 1.1).

January 2000

Sun Mon Tue Wed Thu Fri Sat
                          1
  2   3   4   5   6   7   8
  9  10  11  12  13  14  15
 16  17  18  19  20  21  22
 23  24  25  26  27  28  29
 30  31

Figure 1.1: Calendars are displayed visually using a two-dimensional layout that makes the underlying structure easy to see. The calendar display consists of characters on the screen, but its meaning lies as much in its visual layout as in the characters themselves. Merely speaking the text fails to convey this meaning: we can see that January 1, 2000 is a Saturday, but this information is lost when the visual display is spoken.
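The difference described in Figure 1.1 can be sketched in a few lines of Python. This is not Emacspeak's code (Emacspeak is an Emacs extension written in Emacs Lisp); the function names are hypothetical, chosen only to contrast reading one row of the grid aloud with speaking the date under the cursor.

```python
import calendar
from datetime import date

# Sunday-first grid, matching the layout in Figure 1.1.
_CAL = calendar.Calendar(firstweekday=calendar.SUNDAY)


def speak_screen_row(year, month, week_index):
    """Screenreader-style: voice the digits that appear in one row of the grid."""
    week = _CAL.monthdayscalendar(year, month)[week_index]
    # Zeros are blank padding cells for days outside the month.
    return " ".join(str(d) for d in week if d != 0)


def speak_date(d):
    """Speech-enabled style: voice the date itself, including its weekday."""
    return f"{calendar.day_name[d.weekday()]}, {calendar.month_name[d.month]} {d.day}, {d.year}"


def speak_cell(year, month, week_index, column):
    """Speak the date in the grid cell under the cursor (column 0 is Sunday)."""
    day = _CAL.monthdayscalendar(year, month)[week_index][column]
    if day == 0:
        return "blank"
    return speak_date(date(year, month, day))
```

For January 2000, `speak_screen_row(2000, 1, 1)` yields only "2 3 4 5 6 7 8", while `speak_cell(2000, 1, 0, 6)` yields "Saturday, January 1, 2000": the weekday that the layout conveys visually is recovered from the underlying data rather than from the screen.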

The system uses the innovative technique of audio formatting to increase the bandwidth of aural communication. Changes in voice characteristics and inflection, combined with appropriate use of non-speech auditory icons, are used throughout the user interface to create the equivalent of the spatial layout, fonts, and graphical icons that are so important in the visual interface. This provides rich contextual feedback and shifts some of the burden of listening from the cognitive to the perceptual domain.
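As a rough illustration of audio formatting, the Python sketch below maps semantic roles to voice settings (the aural counterpart of fonts) and maps events to auditory icons. The voice parameters (pitch, rate and richness on a 0 to 100 scale), the role names and the sound file names are all assumptions for illustration, loosely in the spirit of CSS2 aural style sheets; a real system would send these instructions to a speech synthesizer rather than return them as a list.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Voice:
    # Hypothetical voice parameters on a 0-100 scale.
    pitch: int = 50
    rate: int = 50
    richness: int = 50


DEFAULT = Voice()

# Hypothetical "aural style sheet": each semantic role gets a distinct voice,
# playing the part that fonts and layout play on screen.
VOICE_FOR_ROLE = {
    "plain": DEFAULT,
    "heading": replace(DEFAULT, pitch=30, richness=80),
    "emphasis": replace(DEFAULT, pitch=65),
    "comment": replace(DEFAULT, rate=60, richness=30),
}

# Hypothetical auditory icons: short non-speech cues played for events.
ICON_FOR_EVENT = {
    "open-file": "open-object.wav",
    "save": "save-object.wav",
}


def audio_format(segments, event=None):
    """Turn (role, text) segments into rendering instructions.

    An optional event contributes an auditory icon before any speech, so the
    listener hears what happened before hearing the content.
    """
    instructions = []
    if event in ICON_FOR_EVENT:
        instructions.append(("icon", ICON_FOR_EVENT[event]))
    for role, text in segments:
        # Unknown roles fall back to the default voice.
        instructions.append(("speak", VOICE_FOR_ROLE.get(role, DEFAULT), text))
    return instructions
```

Keeping the role-to-voice mapping in a table, separate from the code that produces content, mirrors how a visual style sheet separates fonts from text.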

Finally, Emacspeak is completely free; in contrast, commercially available screenreaders typically double the cost of a personal computer. Together, these innovations have significantly increased the ability of visually impaired individuals throughout the world to use information technology effectively for work and leisure. As the user feedback included in this case study demonstrates, this has enabled visually impaired individuals to gain access to higher education as well as gainful employment in the high-technology field.

Benefits To Visually Impaired Users

  1. Enables users to interact more effectively with online information.

  2. Opens the doors to higher education.

  3. Provides the tools to obtain gainful employment.

  4. Zero cost of ownership lowers threshold of entry.

Long-term Benefits To Society

History has repeatedly shown that technologies first invented to serve individuals with special needs eventually prove beneficial to all of mainstream society. The technological innovations introduced by Emacspeak are likely to once again bear this out.

Initially designed to provide efficient information access to visually impaired users, these same technological innovations will play a pivotal role in enabling ubiquitous information access for mainstream society, e.g., accessing the wealth of information on the Internet via a mobile telephone or whilst driving.

A Revolutionary Change

Emacspeak drastically changes how daily computing tasks are performed in an eyes-free environment. Earlier technologies forced the individual to listen to and interpret the meaning of visually presented information; this placed a significant cognitive burden on the user. Emacspeak, on the other hand, treats speech interaction as a first-class modality in the human-computer interface. This allows the user to focus on the task at hand, as opposed to repeatedly asking questions of the form:

Where is the computer cursor now?

Where is the mouse pointer now?

What is the significance of this button?

Impact On Mainstream Society

Emacspeak has laid the groundwork for developing full-fledged conversational interfaces suitable for deployment in personal information appliances such as automobile computers, palm-sized computers and mobile telephones. The Emacspeak environment can already be used on a pocket-sized hand-held computer. As these technologies move out into the mainstream market, they will have a profound impact on how we work and play in the coming millennium.

1.2  Relevance Of Information Technology

Emacspeak takes advantage of the hundreds of man-years of research and development invested in the field of text-to-speech conversion, the technology that forms the basis of eyes-free information access. Emacspeak also benefits significantly from the wealth of information available on the Internet. We live in an age where almost all information exists as electronic bits at some point in its lifetime; this makes it possible for Emacspeak to level the playing field in information access for visually impaired users.

Technological Innovations

  1. Audio Formatting

    This technique is analogous to the well-understood notion of visual formatting. Spoken information is overlaid with changes in voice characteristics and inflection and combined with non-speech auditory icons to produce rich aural presentations. These presentations significantly increase the bandwidth of aural communication, thereby shifting some of the burden of listening from the cognitive to the perceptual domain.

  2. Speech-enabling Applications

    This technique builds speech interaction directly into the user application, as opposed to building a separate application that speaks the visual display. Speech-enabled applications are easier to use and more productive in an eyes-free environment, since they take full advantage of the features available in an auditory display.
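One way to picture speech-enabling is a wrapper that runs an application command and then speaks a summary computed from the command's own data, instead of reading back whatever appears on screen. The Python decorator below is a hypothetical sketch of that idea (Emacspeak does something comparable inside Emacs through its advice facility); `speak` merely records utterances in a list in place of a real speech server, and the mail-reader command is invented for illustration.

```python
from functools import wraps

spoken = []  # stand-in for a speech server: records what would be said


def speak(text):
    """Hypothetical speech output; a real system would send text to a synthesizer."""
    spoken.append(text)


def speech_enable(summarize):
    """Wrap a command so that, after it runs, a summary of its result is spoken.

    `summarize` receives the application's data, not the characters on screen.
    """
    def decorator(command):
        @wraps(command)
        def wrapper(*args, **kwargs):
            result = command(*args, **kwargs)
            speak(summarize(result))
            return result
        return wrapper
    return decorator


# Hypothetical mail-reader command that returns structured data.
@speech_enable(lambda msg: f"Message from {msg['from']}: {msg['subject']}")
def next_message(inbox, index):
    return inbox[index + 1]
```

The command itself is unchanged; speech is layered on top of it and is driven by the message fields rather than by the layout of a message display.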

1.3  User Acceptance

Emacspeak introduced the innovative speech-enabling approach to a target audience entrenched in the well-established (though inferior) technology of screenreaders. Consequently, user acceptance was initially lukewarm when the system was introduced in 1995. However, as more and more users try the system and find it more productive, this initial obstacle is progressively eroding. From an initial handful of users in 1995, the system has had a steadily growing user base. Since 1997, Emacspeak has been bundled with all popular distributions of the Linux operating system, which has resulted in an exponential rise in the user base. With early adopters now vociferously convincing other blind users of the advantages inherent in the system, the Emacspeak user community is growing rapidly. The community has its own mailing list on the Internet, which offers real-time support to new users as they find their way around the system.


  1. Developed out of the author's real need for such a system, Emacspeak is replete with features that were each designed in response to real user needs.

  2. Because the system is distributed free of cost, its zero cost of ownership significantly lowers the threshold of entry for visually impaired users.

  3. The system has enabled the author and many others to compete effectively in the workplace in the high-technology field.

  4. The system opens up higher education to visually impaired users, thereby significantly enhancing their ability to obtain productive employment in the high technology field.

In summary, speech-enabling all user applications and providing efficient aural communication via audio formatted output are Emacspeak's key original features. In 1995, Emacspeak was also the first system designed for visually impaired users to be distributed free of cost on the Internet.

Measure Of Success

The Emacspeak project has succeeded well beyond its original goal of providing a usable eyes-free audio desktop for the author's personal use. Today, hundreds of blind and visually impaired users around the world use Emacspeak to gain spoken access to the Internet. The system is widely used within several government agencies, notably the National Security Agency (NSA), to provide spoken access to computer workstations running the UNIX operating system. Emacspeak has also benefited users with learning disabilities, who find that listening to information, in addition to seeing it on the screen, enhances comprehension.

The success of the project is evinced by the following quotes from Emacspeak users:

A System Administrator At Vassar College Writes

Although I am sighted I have a reading impairment. Prior to using Emacspeak I relied on recorded audio tapes for my professional and personal reading, and on screen readers for my computer interaction.

Since discovering Emacspeak, I have never gone back to screen readers. Emacspeak provides spoken feedback in a way that is both aware of and responsive to applications and their data. As a result, I have been able to tailor my work environment to provide spoken feedback that is context-sensitive. This means that I am now able to work more efficiently and more enjoyably than ever before.

I have used Emacspeak to enable me to work on several NSF projects, most notably the Corpus Encoding Standard (CES) which is now used in the US and throughout Western, Central and Eastern Europe. I cannot imagine how I would have accomplished this work without the access afforded me by the Emacspeak interface. For the last several years I have been using Emacspeak not only in the conventional settings of desktop computers and workstations at the office and at home, but also as a full time digital assistant. I have found it an ideal output system for use in wearable computing. Using my current setup I am able to provide system administration to over 100 users and several dozen machines through the consistent interface that Emacspeak provides.

Many newspapers and professional publications are now available on the Internet. Using the Emacspeak interface, I am for the first time able to skim a newspaper or magazine, choose which articles I wish to read, and read them at the same time as my coworkers and friends who may be reading the same material from traditional sources. This allows me to discuss current literature with my peers in a timely fashion. While this may seem like a simple thing that most folks take for granted, it was not possible for me in the past. The time delay in getting materials read onto audio cassette always meant that by the time I had had the opportunity to read a journal, the rest of my peers had moved on to the next issue.

Thanks to the information and access available to me through Emacspeak I now have a greater degree of both social interaction and independence than in the past. Put simply, Emacspeak enables me to participate more fully in life.

Greg Priest-Dorman
Laboratory Coordinator, Computer Science Department
Vassar College, Poughkeepsie, NY
Email: priestdo@cs.vassar.edu

An Employee At GE Says

I feel that Emacspeak has exceeded its goals since it is a totally new and innovative approach to providing speech output to a blind computer user. For example, using audio icons in conjunction with spoken output gives the user the confidence that what he/she thinks should be taking place indeed really is. I have been using Emacspeak on my job as a software engineer at GE-Harris, Railway Electronics for the last 10 months. I started using Emacspeak in 1995 when I started the Computer Science Undergraduate program at the University of Central Florida. Thanks to Emacspeak, I work directly in the Unix environment. I do all my programming using Emacspeak.

In conclusion, I have benefited from the Emacspeak project both in obtaining my educational and professional goals.

Mo McCleary
GE Harris Railway Electronics, LLC
Email: mmcclear@ge-harris.com

A Recently Rehabilitated Systems Administrator Says

At first I found it extremely difficult learning to work within a sightless programming and admin environment. However, 12 months later and I can't imagine using any environment other than Emacspeak. I am again working as a system administrator, developing software and currently organizing to return to my PhD. I don't think I would have been able to do all of this without Emacspeak.

In conclusion, I would like to sincerely thank you for developing Emacspeak. I can honestly say that I doubt very much that I would have been able to return to either my development work or post graduate research if it had not been for Emacspeak.

Tim Cross
System Administrator
Email: tcross@tim.northnet.com.au

A Student From Australia Says

As a university student, I am undertaking graduate research in philosophy, whilst simultaneously studying law at an undergraduate level. To engage in these intellectual pursuits, I need access to electronic information, such as that which is available via the World Wide Web. Moreover, it is also necessary to prepare beautifully typeset documents, relying solely on speech output. The Emacspeak auditory interface uniquely supports and facilitates both of these activities, together with routine tasks such as file management and the manipulation of electronic mail. Also, Emacspeak has enabled me to gain a greater understanding of computing in general and, more specifically, the advantages and drawbacks of different types of user interface. In conclusion, I would point out that it is Emacspeak which has enabled me to move beyond the limitations imposed by the DOS environment, and take advantage of some of the most capable and flexible software currently available (such as the Emacs editor, the LaTeX typesetting system and indeed the Linux environment itself).

Jason John Griffin White
Email: jasonw@ariel.ucs.unimelb.edu.au

A New Employee At Gateway Writes

Emacspeak has been the single most effective tool I've used in the past 6 years. Effective in providing spoken feedback while I re-learned to write software without sight; Effective in allowing me to learn new skills, languages, and technologies from my Linux workstation, all of which contributed enormously to obtaining my new position at Gateway; Effective in providing a platform upon which I contributed to other projects that sought to increase accessibility of computers to the blind community. Emacspeak is a powerful tool, available to anyone at absolutely no cost. It has been an open door of opportunity for me, and remains open for everyone.

Brian L. Sellden
Email: brian@henge.com
Just another hack at Gateway
User of Emacspeak, making Unix talk.

1.4  Background And Project Evolution

After using off-the-shelf commercial screenreaders for about five years, I started developing Emacspeak for my personal use in the fall of 1994. The decision to develop a new system was motivated by my frustration with the quality of information access provided by conventional screenreader technology.

Drawing on my computer science background and my experience implementing AsTeR (Audio System For Technical Readings), I designed Emacspeak as a fully functional audio desktop. AsTeR had pioneered the technique of audio formatting for speaking structured documents; Emacspeak took this invention one step further by applying audio formatting to the entire computer interface.

By early 1995, I had replaced the commercial screenreader with Emacspeak for all my computing tasks; after further testing, the system was made freely available on the Internet.

I continue to extend and enhance Emacspeak based on user needs, making a stable release of the system on the Internet every six months. Once released, the system is mirrored worldwide by a network of software archives. Distributors of the freely available Linux operating system then include it on CD-ROM for wider circulation.

Obstacles Faced

Lack of resources has been the primary obstacle. Emacspeak is developed, maintained and supported entirely in the author's spare time. To extend the benefits of Emacspeak to the larger community of non-technical users, the project needs significant resources in order to provide the requisite training materials and user education.

As described earlier, Emacspeak also faced the initial hurdle of introducing a revolutionary solution to a user community entrenched in traditional screenreaders for non-visual access to computing. This obstacle has diminished as the user base has become better informed. However, considerable user education is still required to bring potential blind users and rehabilitation professionals up to speed with the innovations introduced by Emacspeak.

1.5  References

Gregg C. Vanderheiden, PhD
Professor - Human Factors
Department of Industrial Engineering
University of Wisconsin
Director - Trace R&D Center
Email: gv@trace.wisc.edu
Voice-mail: 608 263 5788

David Gries
William L. Lewis Professor of Engineering
Computer Science Department
Cornell University
Ithaca, NY
Email: gries@cs.cornell.edu
Voice-mail: 607 255 9207

1.6  Online Resources

  1. Author's WWW Home Page.
    URL: http://cs.cornell.edu/home/raman

  2. Emacspeak WWW Site.
    URL: http://cs.cornell.edu/home/raman/emacspeak

  3. Online Publications

    1. Envisioning Speech. Scientific American, September 1996. W. Wayt Gibbs.
      URL: http://www.sciam.com/0996issue/0996profile.html

    2. Netsurfing Without A Monitor. Scientific American, March 1997. T. V. Raman.
      URL: http://www.sciam.com/0397issue/0397raman.html

    3. Speaking Of Mathematics. American Scientist, March 1996. Brian Hayes.
      URL: http://www.amsci.org/amsci/issues/Comsci96/compsci96-03.html

    4. Emacspeak: A Speech Enabling Interface. Dr. Dobb's Journal, September 1997. T. V. Raman.
      URL: http://www.ddj.com/ddj/1997/1997_09/rama.htm

    5. User Interface: A Means To An End. Dr. Dobb's Journal, August 1997. T. V. Raman.
      URL: http://www.ddj.com/oped/1997/raman.htm

  4. AsTeR (Audio System For Technical Readings)
    URL: http://cs.cornell.edu/home/raman/aster

  5. Emacspeak mailing list archives
    URL: http://www.cs.vassar.edu/~priestdo/emacspeak/

Material For Time Capsule

Invented as an effective solution for providing eyes-free information access to the visually impaired, Emacspeak is poised to revolutionize the current state of the art in mobile, automobile and hand-held computing. The technological innovations introduced by Emacspeak could be deployed to profoundly impact how we interact with electronic information in our daily lives.

Is Our Society Prepared For This Application?

Today's information age society is more than prepared for this application. The exponential rise in the amount of information we process in our daily lives requires a concomitant increase in the efficiency of the tools we use to process it. Whereas existing interfaces overload the visual channel of communication with all information, speech-enabled interfaces as pioneered by Emacspeak help individuals deal more effectively with the information avalanche we all face. While technological innovations have progressively increased the computer's ability to perform more and more information-processing tasks in parallel, the user-interface innovations introduced by Emacspeak help balance the human side of the human-computer interface equation.

Is It Affordable?

Emacspeak is completely free of cost. With personal computers continuously increasing in speed (and falling in price), technological innovations such as Emacspeak will become part of the basic computing experience.

Was It Well-represented To The Public?

Emacspeak as a speech-enabled solution for visually impaired users has been well-represented to the relevant user community as demonstrated by the user feedback included in the accompanying case study. The technological innovations have been represented to the scientific community in numerous research publications that are also available online. These online resources have been discovered by the scientific and popular press, resulting in a number of articles in well-reputed magazines such as Scientific American that have described the technology for the general public. However, making the average citizen aware of the potential benefits of every-citizen interfaces remains an ongoing task.

Biographical Sketch

After receiving his PhD from Cornell University in 1994, T. V. Raman worked at Digital Equipment Corporation's Cambridge Research Lab (CRL) until September 1995. He now works at Adobe Systems, San Jose, Calif., on electronic documents and information reuse. He also continues to work on speech interaction and recently published a book, Auditory User Interfaces: Toward the Speaking Computer (Kluwer Academic Publishers, 1997).

Technical Background

My work on rich aural interfaces began with AsTeR (Audio System For Technical Readings), which is best introduced by the following quotation:

The advent of electronic documents makes information available in more than its visual form: electronic information can now be display-independent. We describe a computing system, AsTeR, that audio formats electronic documents to produce audio documents.

The development of AsTeR was the basis of my dissertation, which was presented to the Faculty of the Graduate School of Cornell University in fulfillment of the requirements for the degree of Doctor of Philosophy in 1994. This summary, written four years later, puts that work in perspective with respect to developments in electronic information and auditory interfaces between 1994 and 1998.

AsTeR was motivated by the insight that information presentation needs to take advantage of the specific perceptual modality in use. A typeset manuscript exploits features of visual interaction to convey information effectively; in the same vein, AsTeR introduced the notion of audio formatting to enable rich aural presentations of structured information.
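The idea of audio formatting a structured document can be sketched as a recursive walk over the document tree, where each node type has its own rendering rule. The node kinds, pause lengths and relative pitch offsets below are hypothetical and are not AsTeR's actual rules, which were considerably richer. Because each voice change is applied relative to the voice it inherits, nested structure (emphasis within emphasis, say) composes naturally.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    kind: str                                    # e.g. "section", "emph", "itemize", "item"
    children: List[Union["Node", str]] = field(default_factory=list)
    title: str = ""                              # used only by "section" nodes


def render(node, pitch=0, out=None):
    """Audio-format a document tree into ("say", pitch, text) and ("pause", ms) steps.

    Pitch is an offset relative to the reader's default voice; each rule adjusts
    the offset it inherits, so nested structures compose.
    """
    if out is None:
        out = []
    if isinstance(node, str):
        out.append(("say", pitch, node))
    elif node.kind == "section":
        out.append(("pause", 500))                   # silence sets the heading apart
        out.append(("say", pitch - 2, node.title))   # lower voice for the title
        out.append(("pause", 250))
        for child in node.children:
            render(child, pitch, out)
    elif node.kind == "emph":
        for child in node.children:
            render(child, pitch + 3, out)            # raised voice, like italics
    elif node.kind == "itemize":
        # Announce the size of the list before reading it.
        out.append(("say", pitch, f"list of {len(node.children)} items"))
        for child in node.children:
            render(child, pitch, out)
    else:                                            # other kinds: just read the children
        for child in node.children:
            render(child, pitch, out)
    return out
```

The output is a flat list of speech and pause steps, which keeps the rendering rules independent of any particular speech synthesizer.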

Speech-enabling Applications

The insights gained from developing and using AsTeR have been applied, starting in late 1994, to the more general problem of providing aural access to computer interfaces. Computer interfaces encapsulate man-machine dialogue. Once we realize that "The document is the interface", synthesizing effective aural presentations from the information itself, instead of from its visual presentation, leads naturally to the speech-enabling approach (see the relevant online publications).

The speech-enabling approach, a technique that separates computation from the user interface, is described in detail in the book Auditory User Interfaces: Toward the Speaking Computer. Application designers can implement desired features in the computational component and have different user interfaces expose the resulting functionality in the manner best suited to a given user environment. This leads to the design of high-quality Auditory User Interfaces (AUIs) that integrate speech into the user interface as a first-class citizen.
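The separation of computation from the user interface can be sketched as one computational component feeding two interchangeable front ends. In this hypothetical Python example (the `Appointment` type and all function names are invented for illustration), the same query result is rendered as a table for the eye and as a spoken summary for the ear.

```python
from dataclasses import dataclass


@dataclass
class Appointment:
    time: str    # "HH:MM", so string order matches time order
    title: str


def appointments_for(day, diary):
    """Computational component: knows nothing about screens or speech."""
    return sorted(diary.get(day, []), key=lambda a: a.time)


def visual_view(appts):
    """Visual front end: a compact table meant for the eye."""
    return "\n".join(f"{a.time:>5}  {a.title}" for a in appts)


def aural_view(appts):
    """Aural front end: a spoken summary that leads with what matters to the ear."""
    if not appts:
        return "No appointments."
    count = len(appts)
    noun = "appointment" if count == 1 else "appointments"
    first = appts[0]
    return f"{count} {noun}. First at {first.time}: {first.title}."
```

The aural view is not a reading of the visual table; it is designed for listening, announcing the count first so the listener knows how much is coming. Adding a third front end would require no change to `appointments_for`.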

Structured Information And The WWW

AsTeR pointed out the advantages to come in a world where documents are created electronically before being turned into modality-specific presentations such as typeset documents. The work also pointed out that such electronic information needs to be well-structured to enable computation on it.

The last few years have seen an explosive growth in electronic information on the Internet, fueled by the popularity of the WWW. The initial rush to the WWW resulted in publishers putting out rich visual content with a concomitant abuse of the kind of document structure envisioned in AsTeR. As a consequence, content providers on the WWW today face many of the challenges outlined in AsTeR when attempting to create electronic content that can be re-purposed for publishing online as well as in traditional print formats. This has also produced a vast amount of Webformation that is becoming increasingly difficult to navigate and categorize.
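Why the abuse of structure matters for computation can be illustrated with Python's standard `html.parser` module. The two sample fragments below may look much alike in a browser, but only the one that marks headings as headings yields an outline that a speech interface could navigate. The font-based fragment is an assumed example of presentational markup, and the `outline` helper is hypothetical.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects (level, text) for each <h1>..<h6> element: the document outline."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self.outline.append((self._level, ""))

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            level, text = self.outline[-1]
            self.outline[-1] = (level, text + data)


def outline(html):
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


STRUCTURED = "<h1>Emacspeak</h1><p>Intro</p><h2>Benefits</h2><p>Free of cost.</p>"
PRESENTATIONAL = ('<font size="6"><b>Emacspeak</b></font><p>Intro</p>'
                  '<font size="4"><b>Benefits</b></font><p>Free of cost.</p>')

print(outline(STRUCTURED))      # [(1, 'Emacspeak'), (2, 'Benefits')]
print(outline(PRESENTATIONAL))  # []
```

In the presentational version, the fact that "Benefits" is a section heading exists only in its visual appearance, so software, including a speech interface, cannot recover it.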


As humans, we see, hear, feel and smell. Human interaction is enriched by the redundancy that multimodal communication introduces. In stark contrast, computer interfaces until now have relied primarily on visual interaction; today's interfaces are like the silent movies of the past! As we approach the turn of the millennium, computers now have the ability to talk, listen and perhaps even understand. Integrating new modalities like speech into human-computer interaction requires rethinking how information systems are designed in today's world of visual computing.

Visually rich computing introduced the notion of What You See Is What You Get (WYSIWYG) documents; but by carrying it too far, we risk ending up in a world of "What You See Is All You Have" documents. On the positive side, the exponential growth in electronic information, combined with a desire to process this content intelligently and to access it whenever, wherever and however the user chooses, provides strong reasons to expect that the world will move away from the present situation of see-only documents.

That a blind person can navigate the Internet just as efficiently and effectively as any sighted person attests to the profound potential of digital documents to improve human communication. Printed documents are fixed snapshots of changing ideas; they limit the means of communication to the paper on which they are stored. But in electronic form, documents can become raw material for computers that can extract, catalogue and rearrange the ideas in them. Used properly, technology can separate the message from the medium so that we can access information wherever, whenever and in whatever form we want.

Archiving information in a structurally rich form will ensure that this vast repository of knowledge can be reused, searched and displayed in ways that best suit individuals' needs and abilities, using software not yet invented or even imagined. The coming millennium is likely to prove an exciting one in the world of electronic information and speech interaction.

T. V. Raman
Mountain View, CA.
URL: http://cs.cornell.edu/home/raman

December 15, 1998
