Multimedia - getting started

The primary focus of this document is Real Audio tools, audio file formats and text-to-speech applications, as needed for the CS519 project. This document is by no means complete. After reading this you'll probably have to do a lot of extra reading before starting work. Whenever I refer to specific setup issues such as path names, I refer to the particular setup on the project servers.

Real Audio

Real Audio is the name of the family of audio/video tools designed by Real Networks. The family consists of clients, servers, content creation tools and SDKs. The servers generate streams of constant bandwidth, and the clients use buffering to deal with delay jitter. As transport they use multicast IP, unicast UDP, TCP or HTTP. The protocol used for streaming is RTSP (Real Time Streaming Protocol), documented in RFC 2326. A lot of documentation is installed along with the Real products, and you can also consult the Real Networks support site if you want more.
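To give a feel for RTSP, here is a minimal Python sketch that builds the text of an RTSP request, following the request format in the RTSP specification: a request line, a CSeq sequence header, and a blank line that ends the headers. The helper name build_rtsp_request, the host server.example.com and the clip path are made up for illustration; this is not how the Real products build their requests, only what such a request looks like on the wire.

```python
def build_rtsp_request(method, url, cseq, headers=None):
    """Build the text of an RTSP/1.0 request (hypothetical helper)."""
    # Request line, then the mandatory CSeq header.
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    # Any extra headers supplied by the caller.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Lines end in CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


# Assumed host and clip path, for illustration only.
request = build_rtsp_request(
    "OPTIONS", "rtsp://server.example.com/content/clip.rm", 1
)
print(request)
```

A client would send this text over a TCP connection to the server (RTSP's default port is 554) and read back a status line such as "RTSP/1.0 200 OK" followed by headers.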

Real Audio clients

The Real Audio clients, also called players, can be used as stand-alone applications or as plug-ins. Real Player 5.0 is free and runs on every platform I can think of. The newer Real Player Plus G2 is freely downloadable for trial and runs only on Windows 95/98 and NT 4.0, but it's nicer. On the project servers, Real Player 5.0 is installed under both NT and Linux, and G2 only under NT.

The most important settings for the Real Player (under View / Preferences) are the bandwidth and transport type. I recommend setting the transport type to TCP under Linux.

Real Audio servers

There are many types of Real Servers. Under NT we use the one we have a license for, and under Linux we use one downloaded for trial. If it expires, let me know and I'll download a new one or we'll buy a license for Linux too. You have to start the servers manually. They are located in \Program Files\Real\RealServer\Bin (NT) and /usr/local/pnserver/ (Linux). The data files are in the "content" subdirectory.

You can set the maximum number of clients accepted by the server and limit the aggregate bandwidth that can be used by clients. There are a number of tools for on-line monitoring of the server (Java Performance Monitor, Server Status and Control, etc.).

File formats and mime types

Real Networks uses the following formats:

RealAudio clip (.ra) Audio encoded to RealAudio format. This type of file is delivered by RealServer and is played on a RealPlayer.
RealVideo clip (.rm) Audio and video encoded to RealVideo format. This file can contain multiple streams, including audio, video, image maps, and events. This type of file is delivered by RealServer and is played on a RealPlayer.
RealAudio or RealVideo metafile (.ram) Connects a Web page to one or more RealAudio or RealVideo clips. This metafile is located on your Web server and is linked to from your Web page. A metafile contains the URL(s) for one or more clips stored on your RealServer.
RealPlayer Plug-in metafile (.rpm) Like a RealAudio or RealVideo metafile (above), but used with RealPlayer Plug-in for Netscape Navigator and Internet Explorer 3.0 or later.
RealFlash clip (.swf) Animation in RealFlash format.
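As an illustration of the metafile idea, the following Python sketch generates the text of a .ram metafile from a list of clip URLs. It assumes the usual plain-text layout of one clip URL per line; the helper name build_metafile, the host server.example.com and the clip names are made up, so substitute the URLs of clips that actually exist on your RealServer.

```python
def build_metafile(clip_urls):
    """Return the text of a metafile: one clip URL per line (assumed layout)."""
    return "\n".join(clip_urls) + "\n"


# Hypothetical clips on a hypothetical server.
clips = [
    "rtsp://server.example.com/welcome.ra",
    "rtsp://server.example.com/lecture1.rm",
]
metafile_text = build_metafile(clips)
print(metafile_text, end="")

# To publish it, write the text to a file on the Web server, e.g.:
# with open("lecture.ram", "w") as f:
#     f.write(metafile_text)
```

The Web page then links to the .ram file rather than to the clips themselves, and the player fetches the metafile and follows the URLs inside it.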

You have to set the MIME type "audio/x-pn-realaudio-plugin" for .rpm files, which are used with the Real Audio plug-in, and the MIME type "audio/x-pn-realaudio" for .ra, .rm and .ram files, which are played by RealPlayer as a helper application.
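The extension-to-type mapping just described can be written down as a small lookup table. The Python sketch below is only an illustration of that mapping (the names REAL_MIME_TYPES and real_mime_type are made up); on a real Web server you would put the same pairs into the server's own MIME configuration.

```python
import os

# Extension -> MIME type, exactly as listed in the paragraph above.
REAL_MIME_TYPES = {
    ".rpm": "audio/x-pn-realaudio-plugin",  # handled by the plug-in
    ".ra": "audio/x-pn-realaudio",          # handled by the helper application
    ".rm": "audio/x-pn-realaudio",
    ".ram": "audio/x-pn-realaudio",
}


def real_mime_type(filename):
    """Return the MIME type for a Real file name, or None if it is not one."""
    extension = os.path.splitext(filename)[1].lower()
    return REAL_MIME_TYPES.get(extension)


print(real_mime_type("lecture.ram"))
print(real_mime_type("lecture.rpm"))
```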

Real tools

There are many tools for creating Real Audio content. The one best suited for our purposes is RealEncoder. With this program you can record .ra files interactively if you have a microphone, and you can convert other sound formats to .ra, even from the command line. We have it under both NT and Linux.

Real SDKs

The SDKs are available only under Windows and Mac, so if you're not content with what you can get out of the encoder and want to write your own code, you'll have to use NT. I installed 3 SDKs: the RealSystem G2 SDK in directory rmasdk under Program Files, the RealEncoder SDK under RealSDK\G2Encoder, and the Real Player SDK in \Real\REALSDK. There are huge amounts of documentation available for all of these, including the "RealSystem G2 SDK Developer's Guide" and the "RealEncoder SDK Developer's Guide".

Text to speech applications

I recommend trying http://www.bell-labs.com/project/tts/index.html, just for fun. We won't use that system in the project. There are two candidates for you to choose from: the Microsoft SAPI 4.0 (Speech API), already installed on the servers, and the Eloquence TTS system. Eloquence was used, among other things, in a program for reading mathematical text aloud to the visually impaired, developed by a CS grad student in 1996, which got a lot of publicity. You can read more about this on the "Computer Industry News" board on the 4th floor of Upson.

Microsoft SAPI

The Microsoft SAPI is installed under NT on all the servers. Start by reading the documentation, then try TTSDemo.exe and ttsapp.exe from the directory TTS. Good luck!

Eloquence TTS system

The ETI-Eloquence SDK is installed on all the servers under the "Eloquent Technology, Inc." directory in "Program Files". It sounds better than the Microsoft TTS system, it's a lot smaller, and the documentation should get you started quite fast. If you use this system you can use either the SAPI library or their own platform-independent API, called ECI.

Documentation: The ECI and SAPI .rtf files describe the API choices. The Help files for both the SAPI and ECI APIs contain the SAPI Tag instructions and the ECI Annotations, respectively, and are a very valuable resource.

Demos: EloqTalk uses the MS SAPI interface and Elocutor uses ECI. Elocutor understands only ECI annotations. With Elocutor, you can output speech, Wave files and SPR (Symbolic Phoneme Representation). Open any ECI-annotated text file (e.g. MainDemo.txt or TextProcessing.txt), highlight the text, and click on the first blue balloon in the tool bar to speak. EloqTalk understands only SAPI tags. Open any SAPI-tagged text file and click on the forward arrow (the leftmost tool) in the tool bar to speak.