An introduction to digital libraries
This is a fascinating period in the history of libraries and publishing. For the first time, it is possible to build large-scale services where collections of information are stored in digital formats and retrieved over networks. The materials are stored on computers. A network connects the computers to personal computers on the users' desks. In a completely digital library, nothing need ever reach paper.
This book provides an overview of this new field. Partly it is about technology, but equally it is about people and organizations. Digital libraries bring together facets of many disciplines, and experts with different backgrounds and different approaches. The book describes the contributions of these various disciplines and how they interact. It discusses the people who create information and the people who use it, their needs, motives, and economic incentives. It analyzes the profound changes that are occurring in publishing and libraries. It describes research into new technology, much of it based on the Internet and the World Wide Web. The topics range from technical aspects of computers and networks, through librarianship and publishing, to economics and law. The constant theme is change, with its social, organizational, and legal implications.
One book cannot cover all these topics in depth, and much has been left out or described at an introductory level. Most of the examples come from the United States, with prominence given to universities and the academic community, but the development of digital libraries is world-wide with contributions from many sources. Specialists in big American universities are not the only developers of digital libraries, though they are major contributors. There is a wealth and diversity of innovation in almost every discipline, in countries around the world.
An informal definition of a digital library is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network. A key part of this definition is that the information is managed. A stream of data sent to earth from a satellite is not a library. The same data, when organized systematically, becomes a digital library collection. Most people would not consider a database containing financial records of one company to be a digital library, but would accept a collection of such information from many companies as part of a library. Digital libraries contain diverse collections of information for use by many different users. Digital libraries range in size from tiny to huge. They can use any type of computing equipment and any suitable software. The unifying theme is that information is organized on computers and available over a network, with procedures to select the material in the collections, to organize it, to make it available to users, and to archive it.
In some ways, digital libraries are very different from traditional libraries, yet in others they are remarkably similar. People do not change because new technology is invented. They still create information that has to be organized, stored, and distributed. They still need to find information that others have created, and use it for study, reference, or entertainment. However, the form in which the information is expressed and the methods that are used to manage it are greatly influenced by technology and this creates change. Every year, the quantity and variety of collections available in digital form grows, while the supporting technology continues to improve steadily. Cumulatively, these changes are stimulating fundamental alterations in how people create information and how they use it.
To understand these forces requires an understanding of the people who are developing the libraries. Technology has dictated the pace at which digital libraries have been able to develop, but the manner in which the technology is used depends upon people. Two important communities are the source of much of this innovation. One group is the information professionals. They include librarians, publishers, and a wide range of information providers, such as indexing and abstracting services. The other community contains the computer science researchers and their offspring, the Internet developers. Until recently, these two communities had disappointingly little interaction; even now it is commonplace to find a computer scientist who knows nothing of the basic tools of librarianship, or a librarian whose concepts of information retrieval are years out of date. Over the past few years, however, there has been much more collaboration and understanding.
Partly this is a consequence of digital libraries becoming a recognized field for research, but an even more important factor is greater involvement from the users themselves. Low-cost equipment and simple software have made electronic information directly available to everybody. Authors no longer need the services of a publisher to distribute their works. Readers can have direct access to information without going through an intermediary. Many exciting developments come from academic or professional groups who develop digital libraries for their own needs. Medicine has a long tradition of creative developments; the pioneering legal information systems were developed by lawyers for lawyers; the web was initially developed by physicists, for their own use.
Technology influences the economic and social aspects of information, and vice versa. The technology of digital libraries is developing fast and so are the financial, organizational, and social frameworks. The various groups that are developing digital libraries bring different social conventions and different attitudes to money. Publishers and libraries have a long tradition of managing physical objects, notably books, but also maps, photographs, sound recordings and other artifacts. They evolved economic and legal frameworks that are based on buying and selling these objects. Their natural instinct is to transfer to digital libraries the concepts that have served them well for physical artifacts. Computer scientists and scientific users, such as physicists, have a different tradition. Their interest in digital information began in the days when computers were very expensive. Only a few well-funded researchers had computers on the first networks. They exchanged information informally and openly with colleagues, without payment. The networks have grown, but the tradition of open information remains.
The economic framework that is developing for digital libraries shows a mixture of these two approaches. Some digital libraries mimic traditional publishing by requiring a form of payment before users may access the collections and use the services. Other digital libraries use a different economic model. Their material is provided with open access to everybody. The costs of creating and distributing the information are borne by the producer, not the user of the information. This book describes many examples of both models and attempts to analyze the balance between them. Almost certainly, both have a long-term future, but the final balance is impossible to forecast.
Why digital libraries?
The fundamental reason for building digital libraries is a belief that they will provide better delivery of information than was possible in the past. Traditional libraries are a fundamental part of society, but they are not perfect. Can we do better?
Enthusiasts for digital libraries point out that computers and networks have already changed the ways in which people communicate with each other. In some disciplines, they argue, a professional or scholar is better served by sitting at a personal computer connected to a communications network than by making a visit to a library. Information that was previously available only to the professional is now directly available to all. From a personal computer, the user is able to consult materials that are stored on computers around the world. Conversely, all but the most diehard enthusiasts recognize that printed documents are so much part of civilization that their dominant role cannot change except gradually. While some important uses of printing may be replaced by electronic information, not everybody considers a large-scale movement to electronic information desirable, even if it is technically, economically, and legally feasible.
Here are some of the potential benefits of digital libraries.
To use a library requires access. Traditional methods require that the user goes to the library. In a university, the walk to a library takes a few minutes, but not many people are members of universities or have a nearby library. Many engineers or physicians carry out their work with depressingly poor access to the latest information.
A digital library brings the information to the user's desk, either at work or at home, making it easier to use and hence increasing its usage. With a digital library on the desk top, a user need never visit a library building. The library is wherever there is a personal computer and a network connection.
Computing power can be used to find information. Paper documents are convenient to read, but finding information that is stored on paper can be difficult. Despite the myriad of secondary tools and the skill of reference librarians, using a large library can be a tough challenge. A claim that used to be made for traditional libraries is that they stimulate serendipity, because readers stumble across unexpected items of value. The truth is that libraries are full of useful materials that readers discover only by accident.
In most aspects, computer systems are already better than manual methods for finding information. They are not as good as everybody would like, but they are good and improving steadily. Computers are particularly useful for reference work that involves repeated leaps from one source of information to another.
Libraries and archives contain much information that is unique. Placing digital information on a network makes it available to everybody. Many digital libraries or electronic publications are maintained at a single central site, perhaps with a few duplicate copies strategically placed around the world. This is a vast improvement over expensive physical duplication of little used material, or the inconvenience of unique material that is inaccessible without traveling to the location where it is stored.
Much important information needs to be brought up to date continually. Printed materials are awkward to update, since the entire document must be reprinted; all copies of the old version must be tracked down and replaced. Keeping information current is much less of a problem when the definitive version is in digital format and stored on a central computer.
Many libraries provide online the text of reference works, such as directories or encyclopedias. Whenever revisions are received from the publisher, they are installed on the library's computer. The new versions are available immediately. The Library of Congress has an online collection, called Thomas, that contains the latest drafts of all legislation currently before the U.S. Congress; it changes continually.
The doors of the digital library never close; a recent study at a British university found that about half the usage of a library's digital collections was at hours when the library buildings were closed. Materials are never checked out to other readers, mis-shelved or stolen; they are never in an off-campus warehouse. The scope of the collections expands beyond the walls of the library. Private papers in an office or the collections of a library on the other side of the world are as easy to use as materials in the local library.
Digital libraries are not perfect. Computer systems can fail and networks may be slow or unreliable, but, compared with a traditional library, information is much more likely to be available when and where the user wants it.
Most of what is stored in a conventional library is printed on paper, yet print is not always the best way to record and disseminate information. A database may be the best way to store census data, so that it can be analyzed by computer; satellite data can be rendered in many different ways; a mathematics library can store mathematical expressions, not as ink marks on paper but as computer symbols to be manipulated by programs such as Mathematica or Maple.
Even when the formats are similar, materials that are created explicitly for the digital world are not the same as materials originally designed for paper or other media. Words that are spoken have a different impact from words that are written, and online textual materials are subtly different from either the spoken or printed word. Good authors use words differently when they write for different media and users find new ways to use the information. Materials created for the digital world can have a vitality that is lacking in material that has been mechanically converted to digital formats, just as a feature film never looks quite right when shown on television.
Each of the benefits described above can be seen in existing digital libraries. There is another group of potential benefits, which have not yet been demonstrated, but hold tantalizing prospects. The hope is that digital libraries will develop from static repositories of immutable objects to provide a wide range of services that allow collaboration and exchange of ideas. The technology of digital libraries is closely related to the technology used in fields such as electronic mail and teleconferencing, which have historically had little relationship to libraries. The potential for convergence between these fields is exciting.
The cost of digital libraries
The final potential benefit of digital libraries is cost. This is a topic about which there has been a notable lack of hard data, but some of the underlying facts are clear.
Conventional libraries are expensive. They occupy expensive buildings on prime sites. Big libraries employ hundreds of people - well-educated, though poorly paid. Libraries never have enough money to acquire and process all the materials they desire. Publishing is also expensive. Converting to electronic publishing adds new expenses. In order to recover the costs of developing new products, publishers sometimes even charge more for a digital version than the printed equivalent.
Today's digital libraries are also expensive, initially more expensive. However, digital libraries are made from components that are declining rapidly in price. As the cost of the underlying technology continues to fall, digital libraries become steadily less expensive. In particular, the costs of distribution and storage of digital information decline. The reduction in cost will not be uniform. Some things are already cheaper by computer than by traditional methods. Other costs will not decline at the same rate or may even increase. Overall, however, there is a great opportunity to lower the costs of publishing and libraries.
Lower long-term costs are not necessarily good news for existing libraries and publishers. In the short term, the pressure to support traditional media alongside new digital collections is a heavy burden on budgets. Because people and organizations appreciate the benefits of online access and online publishing, they are prepared to spend an increasing amount of their money on computing, networks, and digital information. Most of this money, however, is going not to traditional libraries, but to new areas: computers and networks, web sites and webmasters.
Publishers face difficulties because the normal pricing model of selling individual items does not fit the cost structure of electronic publishing. Much of the cost of conventional publishing is in the production and distribution of individual copies of books, photographs, video tapes, or other artifacts. Digital information is different. The fixed cost of creating the information and mounting it on a computer may be substantial, but the cost of using it is almost zero. Because the marginal cost is negligible, much of the information on the networks has been made openly available, with no access restrictions. Not everything on the world's networks is freely available, but a great deal is open to everybody, undermining revenue for the publishers.
These pressures are inevitably changing the economic decisions that are made by authors, users, publishers, and libraries. Chapter 6 explores some of these financial considerations; the economics of digital information is a theme that recurs throughout the book.
The vision of the digital library is not new. This is a field in which progress has been achieved by the incremental efforts of numerous people over a long period of time. However, a few authors stand out because their writings have inspired future generations. Two of them are Vannevar Bush and J. C. R. Licklider.
As We May Think
In July 1945, Vannevar Bush, who was then director of the U.S. Office of Scientific Research and Development, published an article in The Atlantic Monthly, entitled "As We May Think". This article is an elegantly written exposition of the potential that technology offers the scientist to gather, store, find, and retrieve information. Much of his analysis rings as true today as it did fifty years ago.
Bush commented that, "our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose." He discussed recent technological advances and how they might conceivably be applied at some distant time in the future. He provided an outline of one possible technical approach, which he called Memex. An interesting historical footnote is that the Memex design used photography to store information. For many years, microfilm was the technology perceived as the most suitable for storing information cheaply.
Bush is often cited as the first person to articulate the new vision of a library, but that is incorrect. His article built on earlier work, much of it carried out in Germany before World War II. The importance of his article lies in its wonderful exposition of the inter-relationship between information and scientific research, and in the latent potential of technology.
The original article was presumably read only by those few people who happened to see that month's edition of the magazine. Now The Atlantic Monthly has placed a copy of the paper on its web site for the world to see. Everybody interested in libraries or scientific information should read it.
Libraries of the Future
In the 1960s, J. C. R. Licklider was one of several people at the Massachusetts Institute of Technology who studied how digital computing could transform libraries. As with Bush, Licklider's principal interest was the literature of science, but with the emergence of modern computing, he could see many of the trends that have subsequently occurred.
In his book, Libraries of the Future, Licklider described the research and development needed to build a truly usable digital library. When he wrote, time-shared computing was still in the research laboratory, and computer memory cost a dollar a byte, but he made a bold attempt to predict what a digital library might be like thirty years later, in 1994. His predictions proved remarkably accurate in their overall vision, though naturally he did not foretell every change that has happened in thirty years. In general, he under-estimated how much would be achieved by brute force methods, using huge amounts of cheap computer power, and over-estimated how much progress could be made from artificial intelligence and improvements in computer methods of natural language processing.
Licklider's book is hard to find and less well-known than it should be. It is one of the few important documents about digital libraries that is not available on the Internet.
The first serious attempts to store library information on computers date from the late 1960s. These early attempts faced serious technical barriers, including the high cost of computers, terse user interfaces, and the lack of networks. Because storage was expensive, the first applications were in areas where financial benefits could be gained from storing comparatively small volumes of data online. An early success was the work of the Library of Congress in developing a format for Machine-Readable Cataloguing (MARC) in the late 1960s. The MARC format was used by the Online Computer Library Center (OCLC) to share catalog records among many libraries. This resulted in large savings in costs for libraries.
Early information services, such as shared cataloguing, legal information systems, and the National Library of Medicine's Medline service, used the technology that existed when they were developed. Small quantities of information were mounted on a large central computer. Users sat at a dedicated terminal, connected by a low-speed communications link, which was either a telephone line or a special purpose network. These systems required a trained user who would accept a cryptic user interface in return for faster searching than could be carried out manually and access to information that was not available locally.
Such systems were no threat to the printed document. All that could be displayed was unformatted text, usually in a fixed spaced font, without diagrams, mathematics, or the graphic quality that is essential for easy reading. When these weaknesses were added to the inherent defects of early computer screens - poor contrast and low resolution - it is hardly surprising that most people were convinced that users would never willingly read from a screen.
The past thirty years have steadily eroded these technical barriers. During the early 1990s, a series of technical developments took place that removed the last fundamental barriers to building digital libraries. Some of this technology is still rough and ready, but low-cost computing has stimulated an explosion of online information services. Four technical areas stand out as being particularly important to digital libraries.
Large libraries are painfully expensive for even the richest organizations. Buildings are about a quarter of the total cost of most libraries. Behind the collections of many great libraries are huge, elderly buildings, with poor environmental control. Even when money is available, space for expansion is often hard to find in the center of a busy city or on a university campus.
The costs of constructing new buildings and maintaining old ones to store printed books and other artifacts will only increase with time, but electronic storage costs decrease by at least 30 percent per annum. In 1987, we began work on a digital library at Carnegie Mellon University, known as the Mercury library. The collections were stored on computers, each with ten gigabytes of disk storage. In 1987, the list price of these computers was about $120,000. In 1997, a much more powerful computer with the same storage cost about $4,000. In ten years, the price was reduced by about 97 percent. Moreover, there is every reason to believe that by 2007 the equipment will be reduced in price by another 97 percent.
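The arithmetic behind these figures can be checked with a short calculation. The sketch below is illustrative only: the function names are invented for this example, and the only inputs are the two list prices and the 30 percent annual decline quoted above. It suggests that the Mercury prices imply an average decline of a little under 30 percent a year, and that a steady 30 percent decline compounds to roughly 97 percent over a decade.

```python
def price_after(initial_price, annual_decline, years):
    """Price after a constant fractional decline applied once per year."""
    return initial_price * (1 - annual_decline) ** years


def implied_annual_decline(start_price, end_price, years):
    """Constant yearly decline that takes start_price to end_price."""
    return 1 - (end_price / start_price) ** (1 / years)


# List prices quoted in the text for computers with ten gigabytes of disk.
price_1987 = 120_000
price_1997 = 4_000
years = 10

# Fraction by which the price fell over the decade (about 0.967).
total_reduction = 1 - price_1997 / price_1987

# Average yearly decline implied by those two prices (a little under 0.29).
yearly_rate = implied_annual_decline(price_1987, price_1997, years)

# What a steady 30 percent yearly decline would give after ten years.
projected_1997 = price_after(price_1987, 0.30, years)

print(f"Total reduction over {years} years: {total_reduction:.1%}")
print(f"Implied average yearly decline: {yearly_rate:.1%}")
print(f"Price after ten years at 30% per year: ${projected_1997:,.0f}")
```

A decline of 97 percent per decade is therefore about what a compound rate near 30 percent per year produces; it does not require any single dramatic fall in price.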
Ten years ago, the cost of storing documents on CD-ROM was already less than the cost of books in libraries. Today, storing most forms of information on computers is much cheaper than storing artifacts in a library. Ten years ago, equipment costs were a major barrier to digital libraries. Today, they are much lower, though still noticeable, particularly for storing large objects such as digitized videos, extensive collections of images, or high-fidelity sound recordings. In ten years' time, equipment that is too expensive to buy today will be so cheap that the price will rarely be a factor in decision making.
Storage cost is not the only factor. Otherwise libraries would have standardized on microfilm years ago. Until recently, few people were happy to read from a computer. The quality of the representation of documents on the screen was too poor. The usual procedure was to print a paper copy. Recently, however, major advances have been made in the quality of computer displays, in the fonts which are displayed on them, and in the software that is used to manipulate and render information. People are beginning to read directly from computer screens, particularly materials that were designed for computer display, such as web pages. The best computer displays are still quite expensive, but every year they get cheaper and better. It will be a long time before computers match the convenience of books for general reading, but the high-resolution displays to be seen in research laboratories are very impressive indeed.
Most users of digital libraries have a mixed style of working, with only part of the materials that they use in digital form. Users still print materials from the digital library and read the printed version, but every year more people are reading more materials directly from the screen.
The growth of the Internet over the past few years has been phenomenal. Telecommunications companies compete to provide local and long distance Internet service across the United States; international links reach almost every country in the world; every sizable company has its internal network; universities have built campus networks; individuals can purchase low-cost, dial-up services for their homes.
The coverage is not universal. Even in the U.S. there are many gaps and some countries are not yet connected at all, but in many countries of the world it is easier to receive information over the Internet than to acquire printed books and journals by orthodox methods.
Although digital libraries are based around networks, their utility has been greatly enhanced by the development of portable, laptop computers. By attaching a laptop computer to a network connection, a user combines the digital library resources of the Internet with the personal work that is stored on the laptop. When the user disconnects the laptop, copies of selected library materials can be retained for personal use.
During the past few years, laptop computers have increased in power, while the quality of their screens has improved immeasurably. Although batteries remain a problem, laptops are no heavier than a large book, and the cost continues to decline steadily.
Access to digital libraries
Traditional libraries usually require that the user be a member of an organization that maintains expensive physical collections. In the United States, universities and some other organizations have excellent libraries, but most people do not belong to such an organization. In theory, much of the Library of Congress is open to anybody over the age of eighteen, and a few cities have excellent public libraries, but in practice, most people are restricted to the small collections held by their local public library. Even scientists often have poor library facilities. Doctors in large medical centers have excellent libraries, but those in remote locations typically have nothing. One of the motives that led the Institute of Electrical and Electronics Engineers (IEEE) to its early interest in electronic publishing was the fact that most engineers do not have access to an engineering library.
Users of digital libraries need a computer attached to the Internet. In the United States, many organizations provide every member of staff with a computer. Some have done so for many years. Across the nation, there are programs to bring computers to schools and to install them in public libraries. For individuals who must provide their own computing, adequate access to the Internet requires less than $2,000 worth of equipment, perhaps $20 per month for a dial-up connection, and a modicum of skill. Increase the costs a little and very attractive services can be obtained, with a powerful computer and a dedicated, higher speed connection. These are small investments for a prosperous professional, but can be a barrier for others. In 1998 it was estimated that 95 percent of people in the United States live in areas where there is reasonable access to the Internet. This percentage is growing rapidly.
Outside the United States, the situation varies. In most countries of the world, library services are worse than in the United States. For example, universities in Mexico report that reliable delivery of scholarly journals is impossible, even when funds are available. Some nations are well-supplied with computers and networks, but in most places equipment costs are higher than in the United States, people are less wealthy, monopolies keep communications costs high, and the support infrastructure is lacking. Digital libraries do bring information to many people who lack traditional libraries, but the Internet is far from being conveniently accessible world-wide.
A factor that must be considered in planning digital libraries is that the quality of the technology available to users varies greatly. A favored few have the latest personal computers on their desks, high-speed connections to the Internet, and the most recent release of software; they are supported by skilled staff who can configure and tune the equipment, solve problems, and keep the software up to date. Most people, however, have to make do with less. Their equipment may be old, their software out of date, their Internet connection troublesome, and their technical support from staff who are under-trained and over-worked. One of the great challenges in developing digital libraries is to build systems that take advantage of modern technology, yet perform adequately in less perfect situations.
Basic concepts and terminology
Terminology often proves to be a barrier in discussing digital libraries. The people who build digital libraries come from many disciplines and bring the terminology of those disciplines with them. Some words have such strong social, professional, legal, or technical connotations that they obstruct discussion between people of varying backgrounds. Simple words mean different things to different people. For example, the words "copy" and "publish" have different meanings to computing professionals, publishers, and lawyers. Common English usage is not the same as professional usage, the versions of English around the world have subtle variations of meaning, and discussions of digital libraries are not restricted to the English language.
Some words cause such misunderstandings that it is tempting to ban them from any discussion of digital libraries. In addition to "copy" and "publish", the list includes "document", "object", and "work". At the very least, such words must be used carefully and their exact meaning made clear whenever they are used. This book attempts to be precise when precision is needed. For example, in certain contexts the distinction must be made between "photograph" (an image on paper), and "digitized photograph" (a set of bits in a computer). Most of the time, however, such precision is mere pedantry. Where the context is clear, the book uses terms informally. Where the majority of the practitioners in the field use a word in a certain way, their usage is followed.
Digital libraries hold any information that can be encoded as sequences of bits. Sometimes these are digitized versions of conventional media, such as text, images, music, sound recordings, specifications and designs, and many, many more. As digital libraries expand, the contents are less often the digital equivalents of physical items and more often items that have no equivalent, such as data from scientific instruments, computer programs, video games, and databases.
The information stored in a digital library can be divided into data and metadata. Data is a general term to describe information that is encoded in digital form. Whether the word "data" is singular or plural is a source of contention. This book treats the word as a singular collective noun, following the custom in computing.
Metadata is data about other data. Many people dislike the word "metadata", but it is widely used. Common categories of metadata include descriptive metadata, such as bibliographic information; structural metadata about formats and structures; and administrative metadata, which includes rights, permissions, and other information that is used to manage access. One item of metadata is the identifier, which identifies an item to the outside world.
The distinction between data and metadata often depends upon the context. Catalog records or abstracts are usually considered to be metadata, because they describe other data, but in an online catalog or a database of abstracts they are the data.
No generic term has yet been established for the items that are stored in a digital library. In this book, several terms are used. The most general is material, which is anything that might be stored in a library. The word item is essentially synonymous. Neither word implies anything about the content, structure, or the user's view of the information. Either word can be used to describe physical objects or information in digital formats. The term digital material is used when needed for emphasis. A more precise term is digital object. This is used to describe an item as stored in a digital library, typically consisting of data, associated metadata, and an identifier.
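The parts of a digital object described above (data, metadata in its common categories, and an identifier) can be pictured as a small data structure. The following Python sketch is purely illustrative: the class name DigitalObject, the field names, and the sample identifier and values are assumptions made for this example, not a format defined in this book.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Illustrative model of a digital object: identifier, data, metadata."""
    identifier: str                                      # names the item to the outside world
    data: bytes                                          # the content, encoded as bits
    descriptive: dict = field(default_factory=dict)      # e.g. bibliographic information
    structural: dict = field(default_factory=dict)       # formats and structures
    administrative: dict = field(default_factory=dict)   # rights, permissions, access


# Hypothetical example: a digitized photograph.
photo = DigitalObject(
    identifier="example:photo/0001",          # made-up identifier
    data=b"placeholder bytes, not an image",  # stands in for the digitized image
    descriptive={"title": "Harbor at dawn"},
    structural={"format": "image/png"},
    administrative={"access": "public"},
)

print(photo.identifier)            # example:photo/0001
print(photo.structural["format"])  # image/png
```

Keeping the three categories of metadata in separate fields is only one possible design; a real system might store them in a single record or in separate databases.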
Some people call every item in a digital library a document. This book reserves the term for a digitized text, or for a digital object whose data is the digital equivalent of a physical document.
The term library object is useful for the user's view of what is stored in a library. Consider an article in an online periodical. The reader thinks of it as a single entity, a library object, but the article is probably stored on a computer as several separate objects. They contain pages of digitized text, graphics, perhaps even computer programs, or linked items stored on remote computers. From the user's viewpoint, this is one library object made up of several digital objects.
This example shows that library objects have internal structure. They usually have both data and associated metadata. Structural metadata is used to describe the formats and the relationship of the parts. This is a topic of Chapter 12.
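As a rough illustration of this structure, the sketch below models an online article as one library object assembled from several stored parts, with structural metadata recording each part's format and its role in the whole. The class names Part and LibraryObject, the identifiers, and the formats are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One stored digital object that forms part of a library object."""
    identifier: str   # hypothetical identifier
    media_type: str   # format of this part (structural metadata)
    role: str         # relationship of the part to the whole (structural metadata)


@dataclass
class LibraryObject:
    """The user's view: a single entity assembled from several parts."""
    title: str
    parts: list = field(default_factory=list)

    def formats(self):
        """Return the distinct formats used by the parts, sorted by name."""
        return sorted({p.media_type for p in self.parts})


article = LibraryObject(
    title="An example article",
    parts=[
        Part("example:article/1/text", "text/html", "body"),
        Part("example:article/1/fig1", "image/gif", "figure"),
        Part("example:article/1/fig2", "image/gif", "figure"),
    ],
)

print(len(article.parts))   # 3
print(article.formats())    # ['image/gif', 'text/html']
```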
The form in which information is stored in a digital library may be very different from the form in which it is used. A simulator used to train airplane pilots might be stored as several computer programs, data structures, digitized images, and other data. This is called the stored form of the object.
The user is provided with a series of images, synthesized sound, and control sequences. Some people use the term presentation for what is presented to the user and in many contexts this is appropriate terminology. A more general term is dissemination, which emphasizes that the transformation from the stored form to the user requires the execution of some computer program.
When digital information is received by a user's computer, it must be converted into the form that is provided to the user, typically by displaying on the computer screen, possibly augmented by a sound track or other presentation. This conversion is called rendering.
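The three stages (stored form, dissemination, and rendering) can be sketched as two small programs applied in turn. The example below is a deliberately simplified assumption: the function names disseminate and render, the dictionary layout, and the use of upper case to stand in for emphasis on a screen are all invented for illustration.

```python
# Stored form: separate pieces held by the library (all values hypothetical).
stored_form = {
    "text": "Simple Gifts",
    "markup": {"emphasis": True},
}


def disseminate(stored):
    """Run a program over the stored form to produce what is sent to the user."""
    text = stored["text"]
    if stored["markup"].get("emphasis"):
        return {"type": "text/html", "body": "<em>" + text + "</em>"}
    return {"type": "text/plain", "body": text}


def render(dissemination):
    """On the user's computer, convert the received form for display."""
    body = dissemination["body"]
    if dissemination["type"] == "text/html":
        # Crude stand-in for a browser: show emphasis as upper case.
        body = body.replace("<em>", "").replace("</em>", "").upper()
    return body


sent = disseminate(stored_form)
print(sent["body"])   # <em>Simple Gifts</em>
print(render(sent))   # SIMPLE GIFTS
```

The point of the sketch is only that the form held in the library, the form sent over the network, and the form the user finally sees can all differ.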
Finding terminology to describe content is especially complicated. Part of the problem is that the English language is very flexible. Words have varying meanings depending upon the context. Consider, for example, the phrase "the song Simple Gifts". Depending on the context, that phrase could refer to the song as a work with words and music, the score of the song, a performance of somebody singing it, a recording of the performance, an edition of music on compact disc, a specific compact disc, the act of playing the music from the recording, the performance encoded in a digital library, and various other aspects of the song. Such distinctions are important to the music industry, because they determine who receives money that is paid for a musical performance or recording.
Several digital library researchers have attempted to define a general hierarchy of terms that can be applied to all works and library objects. This is a bold and useful objective, but fraught with difficulties. The problem is that library materials have so much variety that a classification may match some types of material well but fail to describe others adequately.
Despite these problems, the words work and content are useful. Most people use the word content loosely, and this book does the same. The word is used in any context where the emphasis is on library materials, not as bits and bytes to be processed by a computer but as information that is of interest to a user. To misquote a famous judge, we can not define "content", but we know it when we see it.
While the word content is used as a loosely defined, general term, the word work is used more specifically. The term "literary work" is carefully defined in U.S. copyright law as the abstract content, the sequence of words or music independent of any particular stored representation, presentation, or performance. This book usually uses the word "work" roughly with this meaning, though not always with legal precision.
A variety of words are used to describe the people who are associated with digital libraries. One group of people are the creators of information in the library. Creators include authors, composers, photographers, map makers, designers, and anybody else who creates intellectual works. Some are professionals; some are amateurs. Some work individually, others in teams. They have many different reasons for creating information.
Another group are the users of the digital library. Depending on the context, users may be described by different terms. In libraries, they are often called "readers" or "patrons"; at other times they may be called the "audience", or the "customers". A characteristic of digital libraries is that creators and users are sometimes the same people. In academia, scholars and researchers use libraries as resources for their research, and publish their findings in forms that become part of digital library collections.
The final group of people is a broad one that includes everybody whose role is to support the creators and the users. They can be called information managers. The group includes computer specialists, librarians, publishers, editors, and many others. The World Wide Web has created a new profession of webmaster. Frequently a publisher will represent a creator, or a library will act on behalf of users, but publishers should not be confused with creators, or librarians with users. A single individual may be creator, user, and information manager.
Computers and networks
Digital libraries consist of many computers united by a communications network. The dominant network is the Internet, which is discussed in Chapter 2. The emergence of the Internet as a flexible, low-cost, world-wide network has been one of the key factors that have led to the growth of digital libraries.
Figure 1.1 shows some of the computers that are used in digital libraries. The computers have three main functions: to help users interact with the library, to store collections of materials, and to provide services.
In the terminology of computing, anybody who interacts with a computer is called a user or computer user. This is a broad term that covers creators, library users, information professionals, and anybody else who accesses the computer. To access a digital library, users normally use personal computers. These computers are given the general name clients. Sometimes, clients may interact with a digital library with no human user involved, such as the robots that automatically index library collections, and sensors that gather data, such as information about the weather, and supply it to digital libraries.
The next major group of computers in digital libraries are repositories, which store collections of information and provide access to them. An archive is a repository that is organized for long-term preservation of materials.
The figure shows two typical services which are provided by digital libraries: location systems and search systems. Search systems provide catalogs, indexes, and other services to help users find information. Location systems are used to identify and locate information.
In some circumstances there may be other computers that sit between the clients and computers that store information. These are not shown in the figure. Mirrors and caches store duplicate copies of information, for faster performance and reliability. The distinction between them is that mirrors replicate large sets of information, while caches store recently used information only. Proxies and gateways provide bridges between different types of computer system. They are particularly useful in reconciling systems that have conflicting technical specifications.
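The difference between a mirror and a cache can be made concrete with a short sketch. The code below is an illustration under stated assumptions, not a description of any real system: the class names are invented, the original collection is modeled as a Python dictionary, and the cache discards its least recently used item when full, which is one common policy among several.

```python
from collections import OrderedDict


class Mirror:
    """Holds a complete copy of a collection, replicated in advance."""

    def __init__(self, origin):
        self.store = dict(origin)   # copy everything up front

    def get(self, key):
        return self.store[key]


class Cache:
    """Holds only recently used items, up to a fixed capacity."""

    def __init__(self, origin, capacity):
        self.origin = origin
        self.capacity = capacity
        self.store = OrderedDict()  # least recently used item comes first

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)     # mark as recently used
            return self.store[key]
        value = self.origin[key]            # fetch from the original source
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used item
        return value


origin = {"a": "item A", "b": "item B", "c": "item C"}
mirror = Mirror(origin)
cache = Cache(origin, capacity=2)
for key in ["a", "b", "a", "c"]:
    cache.get(key)

print(sorted(mirror.store))  # ['a', 'b', 'c']
print(list(cache.store))     # ['a', 'c']
```

After the four requests, the mirror still holds the whole collection, while the cache holds only the two items used most recently.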
The generic term server is used to describe any computer other than the user's personal computer. A single server may provide several of the functions listed above, perhaps acting as a repository, search system, and location system. Conversely, individual functions can be distributed across many servers. For example, the domain name system, which is a locator system for computers on the Internet, is a single, integrated service that runs on thousands of separate servers.
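The idea of a single service spread across many servers can be sketched in a few lines. The example below is a toy model only: it does not reproduce how the domain name system actually works, and the class names, identifier prefixes, and host names are hypothetical. Each server holds part of a table that maps identifiers to locations, and the service passes each request to the server responsible for that identifier.

```python
class LocationServer:
    """One server holding part of the identifier-to-location table."""

    def __init__(self, prefix, table):
        self.prefix = prefix    # identifiers this server is responsible for
        self.table = table

    def lookup(self, identifier):
        return self.table.get(identifier)


class LocationService:
    """Presents many servers to the caller as a single service."""

    def __init__(self, servers):
        self.servers = servers

    def locate(self, identifier):
        # Pick the server whose prefix matches the identifier most specifically.
        matches = [s for s in self.servers if identifier.startswith(s.prefix)]
        if not matches:
            return None
        best = max(matches, key=lambda s: len(s.prefix))
        return best.lookup(identifier)


service = LocationService([
    LocationServer("maps/", {"maps/0001": "repository-a.example"}),
    LocationServer("music/", {"music/0042": "repository-b.example"}),
])

print(service.locate("music/0042"))  # repository-b.example
print(service.locate("video/0007"))  # None
```

The caller sees one locate operation and need not know how many servers lie behind it, or which of them answered.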
In computing terminology, a distributed system is a group of computers that work as a team to provide services to users. Digital libraries are some of the most complex and ambitious distributed systems ever built. The personal computers that users have on their desks have to exchange messages with the server computers; these computers are of every known type, managed by thousands of different organizations, running software that ranges from state-of-the-art to antiquated. The term interoperability refers to the task of building coherent services for users, when the individual components are technically different and managed by different organizations. Some people argue that all technical problems in digital libraries are aspects of this one problem, interoperability. This is probably an overstatement, but it is certainly true that interoperability is a fundamental challenge in all aspects of digital libraries.
The challenge of change
If digital technology is so splendid, what is stopping every library immediately becoming entirely digital? Part of the answer is that the technology of digital libraries is still immature, but the challenge is much more than technology. An equal challenge is the ability of individuals and organizations to devise ways that use technology effectively, to absorb the inevitable changes, and to create the required social frameworks. The world of information is like a huge machine with many participants each contributing their experience, expertise, and resources. To make fundamental changes in the system requires inter-related shifts in the economic, social and legal relationships amongst these parties. These topics are studied in Chapters 5 and 6, but the underlying theme of social change runs throughout the book.
Digital libraries depend on people and can not be introduced faster than people and organizations can adapt. This applies equally to the creators, users, and the professionals who support them. The relationships amongst these groups are changing. With digital libraries, readers are more likely to go directly to information, without visiting a library building or having any contact with a professional intermediary. Authors carry out more of the preparation of a manuscript. Professionals need new skills and new training to support these new relationships. Some of these skills are absorbed through experience, while others can be taught. Since librarians have a career path based around schools of librarianship, these schools are adapting their curriculum, but it will be many years before the changes work through the system. The traditions of hundreds of years go deep.
The general wisdom is that, except in a few specialized areas, digital libraries and conventional collections are going to coexist for the foreseeable future. Institutional libraries will maintain large collections of traditional materials in parallel with their digital services, while publishers will continue to have large markets for their existing products. This does not imply that the organizations need not change, as new services extend the old. The full deployment of digital libraries will require extensive reallocation of money, with funds moving from the areas where savings are made to the areas that incur increased cost. Within an institution, such reallocations are painful to achieve, though they will eventually take place, but some of the changes are on a larger scale.
When a new technology and an old one compete, the new technology is never an exact match. Typically, the new has some features that are not in the old, but lacks some basic characteristics of the old. Therefore the old and new usually exist side by side. However, the spectacular and continuing decline in the cost of computing, with the corresponding increase in capabilities, sometimes leads to complete substitution. Word processors were such an improvement that they supplanted typewriters in barely ten years. Card catalogs in libraries are on the same track. In 1980, only a handful of libraries could afford an online catalog. Twenty years later, a card catalog is becoming a historic curiosity in American libraries. In some specialized areas, digital libraries may completely replace conventional library materials.
Since established organizations have difficulties changing rapidly, many exciting developments in digital libraries have been introduced by new organizations. New organizations can begin afresh, but older organizations are faced with the problems of maintaining old services while introducing the new. The likely effect of digital libraries will be a massive transfer of money from traditional suppliers of information to new information entrepreneurs and to the computing industry. Naturally, existing organizations will try hard to discourage any change in which their importance diminishes, but the economic relationships between the various parties are already changing. Some important organizations will undoubtedly shrink in size or even go out of business. Predicting these changes is made particularly difficult by uncertainties about the finances of digital libraries and electronic publishing, and by the need for the legal system to adapt. Eventually, the pressures of the marketplace will establish a new order. At some stage, the market will have settled down sufficiently for the legal rules to be clarified. Until then, economic and legal uncertainties are annoying, though they have not proved to be serious barriers to progress.
Overall, there appear to be no barriers to digital libraries and electronic publishing. Technical, economic, social, and legal challenges abound, but they are being overcome steadily. We can not be sure exactly what form digital libraries will take, but it is clear that they are here to stay.
Last revision of content: January 1999
Formatted for the Web: December 2002
(c) Copyright The MIT Press 2000