So why are we gathered here tonight, when we could instead be spending the weekend in the comfort of our homes or offices surfing the internet? Perhaps some of you are here for the guaranteed warm and sunny fall weather in New Mexico (as you can see, I wrote this yesterday). I see a heterogeneous audience: many representatives from the American Physical Society, members of the library community, and a small representation of dedicated physicists. Some of you are here because you're curious, some of you are anxious, and some are very anxious about the future. It is clear that many roles will change, though exactly what role will be played by whom remains unclear.
In principle, what we'd like to do tomorrow is map out the ideal research communication medium of the future. That will be difficult within the programmed agenda, but we need at least to ensure that the physicists, who play a privileged role in this as both providers and consumers of the information, will not only be heard but be given the strongest voice.
If you come away from this weekend with the feeling that this was a marvelous meeting and a wonderful time was had by all, and then go back to comatose business as usual, then this will have been a failure. We need to dislodge definitively the curiously prevalent notion that the future electronic medium will strictly duplicate, inadequacy for inadequacy, ashes for ashes, dust for dust, the current print medium; and that means you must go home and rethink many of your operations.
I've been asked to set the stage by relating a bit of the history of the "e-print archives" and whatever has occurred since mid 1991, but that would be tedious.
Instead I will concentrate only on some highlights that serve to illustrate the major lessons learned to date, and suggest their implications for the future. (For additional background information, see my article "First Steps Towards Electronic Research Communication," Computers in Physics, Vol. 8, No. 4, Jul/Aug 1994, p. 390, originally adapted from a letter to Physics Today, June 1992.)
The first database, hep-th (for High Energy Physics -- Theory), was started in August of '91 and was intended for usage by a small subcommunity of fewer than 200 physicists, then working on a so-called "matrix model" approach to studying string theory and two-dimensional gravity. (Mermin [Reference Frame, Physics Today, Apr 1992, p.9] later described the establishment of these electronic research archives for string theorists as potentially "their greatest contribution to science.") Within a few months, the original hep-th had expanded in scope to over 1000 users, and after little more than three years it now has over 3600 users. More significantly, there are numerous other physics databases now in operation (see xxx physics e-print archives) that currently serve over 25,000 physicists and typically process more than 40,000 electronic transactions per day (i.e. as of 10/94).
These systems are entirely automated (including submission process and indexing of titles/authors/abstracts), and allow access via e-mail, anonymous ftp, and the WorldWideWeb. The communication of research results occurs on a dramatically accelerated timescale and much of the waste of the hardcopy distribution scheme is eliminated. In addition, researchers who might not ordinarily communicate with one another can quickly set up a virtual meeting ground, and ultimately disband if things do not pan out, all with infinitely greater ease and flexibility than is provided by current publication media.
It is important to distinguish the form of communication facilitated by these systems from that of usenet newsgroups or garden variety "bulletin board" systems. In "e-print archives," researchers communicate exclusively via research abstracts that describe material otherwise suitable for conventional publication. This is a very formal mode of communication in which each entry is archived and indexed for retrieval at arbitrarily later times; Usenet newsgroups and bulletin boards, on the other hand, represent an informal mode of communication, more akin to ordinary conversation, with unindexed entries that typically disappear after a short time.
It is also useful to dispatch some of the usual red herrings that invariably get raised. While the high energy physics community, for example, did have a pre-existing hardcopy preprint habit that had already largely supplanted journals as our primary communication medium, this is not a necessary initial condition for acceptance of an electronic preprint archive, as evidenced by recent growth into other areas of physics and mathematics, and even to computation and linguistics. The economics for all this remains favorable, with a gigabyte of hard disk storage currently averaging under $700 (i.e. roughly 25,000 papers including figures can be stored for an average of less than 3 cents apiece). Finally, politically correct elements typically fret over leaving the third world in the dust -- but the reality is that less developed countries are already better off than they were before: researchers in eastern Europe, South America, and the Far East frequently report how lost they would be without these electronic communication systems, and how they can finally participate in the ongoing research loop. It will always remain easier and less expensive to get a computer connected to the internet than to build, stock, and maintain conventional libraries -- the conventional journal system had always been much less fair to the underprivileged.
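The storage arithmetic quoted above can be checked with a few lines of Python. This is only a rough sketch using the two figures given in the talk ($700 per gigabyte, taken as an upper bound, and roughly 25,000 papers per gigabyte). The function names are illustrative, and the decimal convention of 10**9 bytes per gigabyte is an assumption, since the talk does not specify one. The implied average paper size is derived here and is not a figure stated in the talk.

```python
# Figures quoted in the talk (as of late 1994).
DISK_COST_PER_GB_USD = 700.0   # "under $700" per gigabyte, used as an upper bound
PAPERS_PER_GB = 25_000         # "roughly 25,000 papers including figures"

# Assumption: 1 gigabyte = 10**9 bytes (the talk does not say which convention).
BYTES_PER_GB = 10**9


def cost_per_paper_cents(disk_cost_usd: float, papers: int) -> float:
    """Storage cost per paper, in cents."""
    return disk_cost_usd / papers * 100


def average_paper_size_kb(papers: int, bytes_per_gb: int = BYTES_PER_GB) -> float:
    """Implied average paper size in kilobytes (1 kB = 1000 bytes)."""
    return bytes_per_gb / papers / 1000


if __name__ == "__main__":
    cents = cost_per_paper_cents(DISK_COST_PER_GB_USD, PAPERS_PER_GB)
    size_kb = average_paper_size_kb(PAPERS_PER_GB)
    print(f"{cents:.1f} cents per paper")
    print(f"{size_kb:.0f} kB per paper (implied average)")
```

With these inputs the cost comes to about 2.8 cents per paper, consistent with "less than 3 cents apiece," and the implied average size is about 40 kB per paper under the decimal-gigabyte assumption.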
To summarize, to date we've learned:
Before continuing, we must distinguish at this point between two very different types of publication, formerly grouped together only due to accidental similarities in their modes of production and distribution. Understanding this distinction is crucial to the future of scholarly publishing endeavors. (My comments here have been strongly influenced by e-mail discussions with Stevan Harnad and correspondents, some of which are available at this ftp url. Other relevant discussions of electronic publishing issues by Harnad, with further references, are available at this http url or equivalently at this ftp url).
In scholarly publication (a.k.a. "Esoteric Scholarly Publication"), we are writing to communicate research information and to establish our research reputations. We are not writing in order to make money in the form of royalties based on the size of a paying readership. We have every desire to see maximal distribution of our work (properly credited of course), and would fight any attempt to suppress that distribution. In trade publication, on the other hand, authors write specifically to sell their articles and books, and have direct financial remuneration in mind from the outset. It is consequently in their interest as well to maximize distribution, but at the same time to ensure that each reader pays per view; for this the intermediation of a publishing company to maintain an infrastructure to exact money from paying customers and to root out bootleg distribution may well remain welcome.
(To make the distinction graphic, suppose I write a paper that revolutionizes physics and transforms the way we think about nature: then I get perhaps a few thousand readers. If I write a typical relatively influential paper then I get a few hundred readers, and if I write an average paper I get a few tens of readers. On the other hand, if I write a completely fictitious account of an alleged love triangle involving O.J. Simpson, Tonya Harding, and Lorena Bobbitt then I instantly get over 200,000,000 eagerly paying customers.)
So in scholarly publication, we have a situation wherein authors can joke that they would pay people to read their articles. (N.B. this potential paucity of readership for any given article must not be used as an argument that support of basic research is intrinsically wasteful -- it simply results from the naturally restricted size of a highly specialized community, and does not directly measure the ultimate utility of the research.) So the essential point is now self-evident: if we the researchers are not writing with the expectation of making money directly from our efforts, then there is no earthly reason why anyone else should make money in the process (except for a fair return on any non-trivial "value-added" they may provide; or except if, as was formerly the case in the paper-only era, the true costs of making our documents publicly available are sufficiently high to require that they be sold for a fee). Now we are ready to consider the current role played by publishers of physics research information (at least in certain fields).
At dinner last night at a restaurant (Gabriel's) along the highway between Santa Fe and Los Alamos, the discussion turned (as it so often does) to the role of physics journals. It is ordinarily claimed that journals play two intellectual roles: a) to communicate research information, and b) to validate this information for the purpose of job and grant allocation.
As I've explained, the role of journals as communicators of information has long since been supplanted in certain fields of physics, so let's consider their other role. Having queried a number of colleagues concerning the criteria they use in evaluating job applicants and grant proposals, I find that the otherwise unqualified number of published papers is too coarse a criterion and plays essentially no role. Researchers are typically familiar with the research in their own field, and must in any event independently evaluate it together with letters of recommendation from trusted sources. Recent activity levels of candidates were mentioned as a criterion, but that too is independent of publication per se: "hot preprints" on a CV can be as important as any publication.
So we came to the conclusion last night (actually some of us had realized this some time ago) that certain physics journals currently play NO role whatsoever for physicists. Their primary role seems to be to provide a revenue stream to publishers, a revenue stream invisibly siphoned from overhead on research contracts through library systems.
So this goes a long way to explaining how it could possibly be that a system whose only virtue is instant retransmission is able to supplant entirely established journals as a credible information source in certain fields. Make no mistake -- the current electronic archiving systems remain unspeakably primitive (but "in the country of the blind, Polyphemus is king").
With an example of an electronic system that physicists will voluntarily and actively use in hand, it is illuminating to consider how a hopeless misunderstanding of the properties and potentialities of the electronic medium can lead to badly mistaken intents and implementations. To illustrate what physicists do not want, let me take as an example a recent APS "request for proposals" for an on-line version of Physical Review Letters. (I use this example not only because of APS representation here, but also because -- due to a curious twist of fate -- I happened to be on the committee that evaluated these proposals early in '94.) In this case, the "request for proposals" itself was fundamentally flawed for both superficial and profound reasons ("We come here not to praise APS, ..."), though none of the respondents had the temerity (nor perhaps insight) to point this out explicitly.
So even benign, nonprofit organizations and learned societies, having tasted the amenities of scholarly publishing, tend to become addicted and lose track of their original mandate. ("if it were so, it was a grievous fault...") Until recently, there were few effective options for physicists to break into an intellectually void closed loop involving only publisher and library systems. The resources necessary for production and distribution of conventional printed journals allowed publishers to focus on their mechanics, and avoid any pressure to rethink the intellectual content and quality of their operations. The on-line electronic format will allow us to transcend the current inadequate system for "validating" research in a variety of ways. No longer need we be tied to a one-time all-or-nothing referee system which provides insufficient intellectual signal, and a static past database. We eagerly anticipate a vastly improved and more useful electronic literature, taking advantage of the flexibility afforded by the electronic medium and unhindered by artifacts of its evolution from paper.
Not to disparage this atmosphere of conviviality and friendship, but there are some dark clouds on the horizon.
For the moment, conventional publishers have continued to express their unbridled enthusiasm for open electronic dissemination systems, despite an intrinsic potential for subversion. As long as their bottom line is unaffected, they can afford to be arbitrarily magnanimous in their desire for peaceful coexistence: "After all we have long been in the business of propagating research information, we would never dream of trying to suppress it in any way..."
But ever financially pressed research libraries are poised for triage of their journal subscriptions. And as pointed out by Quinn (1994), there's a potential explicit mechanism to encourage preferential cutting of subscriptions to physics journals: Libraries, faced with difficult choices, may decide that physicists already have an alternate information feed from the raw global electronic database; and physicists may well complain the least (or not at all) when their journals are threatened with cancellation. (Indeed this is already reported to be happening in India and other places with severely limited financial resources -- as argued above, the less developed countries stand to benefit at least equally from recent technological developments).
One positive note in this regard for the APS: its comparatively low-priced journals will give a bit of extra breathing room in the short term. In the long term, however, it is difficult to imagine how the current model of funding publishing companies through research libraries (in turn funded by overhead on research grants) can possibly persist. As argued by Odlyzko (1994), it is premised on a paper medium that was difficult to produce, difficult to distribute, difficult to archive, and difficult to duplicate -- a medium that hence required numerous local redistribution points in the form of research libraries. The electronic medium shares none of these features and thus naturally facilitates large-scale disintermediation, with attendant increases in efficiency benefitting both researchers and their sources of funding. As described above, recent developments have exposed the extent to which current publishers have defined themselves in terms of production and distribution, roles which we now regard as trivially automated. But there remains a pressing need for organization of intellectual value-added, which by definition cannot be automated even in principle, and that leaves significant opportunities for any agency willing to listen to what researchers want and need.
And this brings us back to what should be the primary agenda for the meeting tomorrow, which is to fantasize about the ideal research communication system, and to start mapping out who will play what role in realizing and maintaining it. Allez.
FAIR USE: I reserve the right to distribute this electronic document in any way I so desire. It is publicly posted to the internet on my server, and anyone is free to establish a link to it from a subsidiary server (but not to copy it for public posting on a remote server, since that could lead to an undesirable proliferation of obsolete versions). It should not be reprinted for inclusion in any publication for sale without my explicit permission.