Some comments on e-biosci

Paul Ginsparg, 31 Aug '99

[from PubMed Central and Beyond: Page 5, In HMS Beagle: The BioMedNet Magazine (http://news.bmn.com/hmsbeagle/61/viewpts/page5), Issue 61 (3 Sep 1999)]

The crucial question underlying this debate is how our scientific research communications infrastructure should be reconfigured to take maximal advantage of newly evolving electronic resources. Rather than just "electronic publishing", which connotes a rather straightforward cloning of the paper methodology to the electronic network, we'd prefer to see new methodologies leading to some form of global "knowledge network" of the future.

My own involvement in what evolved to become the current NIH proposal was a talk I gave in December '98 at the Banbury Center in Cold Spring Harbor, where I encouraged the biological and life scientists present to move in the direction of broader global archiving systems. (And frankly the participants at this meeting, many very dissatisfied with the current system, needed vanishingly little encouragement -- it's thrilling if the biomedical people are ready to join the 1990s, better late than never...) I described how the Los Alamos e-print archives, since their inception in '91 (where "e-print" denotes self-archiving by the author), have become a major forum for dissemination of results in physics and mathematics, and suggested some of what we foresee as the advantages of a unified global archive for research in these fields. I also pointed out that these e-print archives are entirely scientist driven, and are flexible enough either to co-exist with the pre-existing publication system, or help it evolve to something better optimized for researcher needs. In particular, the rapid dissemination they provide is not in the least inconsistent with concurrent or post facto peer review, and in the long run provides a possible framework for a far more functional archival structuring of the literature than is provided by current peer review methodology. The subsequent direct NIH involvement became an enormous opportunity to build on the existing resources at NCBI, and a potential model for other funding agencies, provided major miscues could be avoided. My primary comments since have been simply to encourage direct communication to the target scientists, trying to ensure their direct support and participation, rather than dealing through intermediaries whose vested interests might hinder more rapid progress.

The need to reconsider the current methodology of research dissemination and validation is suggested by the fact that each article typically costs (taxpayers?) many tens of thousands of dollars minimum in salary, and much more in equipment, overhead, etc. The key point of the electronic communication medium is that for less than an additional dollar per article it's possible to archive that article and make it freely available to the entire world in perpetuity -- that's the real lesson of the physics archives. So why cede control to some third party that brings in on average a few thousand dollars in revenue per article by contributing typically a few hundred dollars of value added in typesetting? Of course, some are also responsible for organizing peer review, though typically this comes from the donated time and energy of the research community, and is subsidized by the same grant funds and institutions sponsoring the research in the first place. The question crystallized by the new communications medium is whether this arrangement remains the most efficient way to perform all these functions, or if the dissemination and authentication systems can now be more naturally disentangled to create a more forward-looking research communications infrastructure for the next millennium.

In writing these "retrospective" comments (31 Aug '99), I now have the benefit of having seen the latest description of the proposed "PubMed Central" site in Harold Varmus' letter of 30 Aug '99, in which it becomes a form of clearing house of materials validated or screened by outside parties with no direct relationship to the NIH. This evolution in conception is consistent both with NIH's role as a funding agency, helping to provide the infrastructure essential to the research it supports, and also with its obligation to the more general public. (Ironically, the NIH's prominence gives it less latitude to disseminate unscreened materials since no matter the disclaimer, appearance on an NIH server could be construed as some form of effective imprimatur.) Certainly, promoting the easy availability of publicly funded research serves the public policy interest in promoting the use of that research.

Some of the confused and overly strident responses over the past half-year to various forms of the proposal, from professional societies and other agencies supposedly acting on behalf of researchers, have been disappointing both in their presumption of entitlement and in their inability to recognize the potential benefits to researchers themselves. My own professional society (the American Physical Society), while itself already a publisher, has been much more forward-looking in trying to adapt its operations to the needs of its constituent research community, has endorsed the physics archives, and has a number of cooperative arrangements in place, with more planned for the future. Many different overlays (representing different cross sections) of the global archive will be possible, and their implicit competition will serve the needs of a heterogeneous community. Happily, I believe my own professional society may have avoided the pitfalls I was concerned about five years ago when I wrote: "Even benign, nonprofit organizations and learned societies can easily become addicted to the amenities of scholarly publishing and lose track of their original mandate: thus placing the revenue-generating potential of their established publishing enterprises above the need to furnish creative intellectual services to their constituents."

I'd also like to take this opportunity to comment on some of the attempts in the past half year to isolate physicists, or rather to distinguish their research practices from those of the rest of the scientific community, in an attempt to assert that what has been so successful and continues to grow "couldn't possibly" work in, say, the biological or life sciences. Some of the suggested differences are actually amusing, for example that physicists are intrinsically "less competitive" than biologists (!). Perhaps physicists had instead simply abstracted long ago the essence of their research communication from its physical embodiment in paper, and so found the move to new functionalities afforded by the new medium both obvious and natural.

Then there's the oft-repeated claim that physics is invariably done in big labs, with large teams, and the papers are all written by hundreds of authors, or that they publish much less. Actually there is only one small area of physics in which this is even partially true, namely high energy *experimental* physics, but this area involves well under 1000 papers per year, and even then only a small percentage of those have "hundreds of authors", hence under 1% of physics papers have "hundreds of authors". Characterizing the entire field in this fashion is an over-simplification at best. Note further that the physics archives started not with high energy experimental papers anyway, but with theoretical papers, and these in general have *fewer* authors than biomedical papers, and are written by people who publish just as frequently as biomedical scientists. The real distinction here is probably between experimentalists and theorists (with the theorists typically more computer literate and forward-looking), and so the sort of cultural difference that may be more at work here is that in biomedical research there are fewer of what we would regard as pure theorists. While in physics the pure experimentalists were slower to make use of the archives, they too benefit from the enhanced readership and ease of dissemination they provide, and have eventually adopted and adapted to this mode as enthusiastically as the rest.

Other suggestions, say that "physicists are more interested in hypotheses and less in confirmed results", come from an equally skewed view of how physicists operate. And from the physicists' point of view, the biologists frequently seem an exceedingly timid group, having ceded direct control over their research results to parties not always acting in their interests (for instance, exerting stringent copyright control that might limit readership, enforcing embargoes to limit prior disclosure and discussion, and sometimes placing other obstacles in the path of research communication). From this viewpoint, even the way biologists are said to stake intellectual property claims is intrinsically irrational -- that is, waiting for an official journal publication date, as though the work is not intrinsically correct until officially "validated" (a practice which we're told can lead to the complete non sequitur outcome that results could be discussed at a meeting, some other lab could rush to reproduce and rush to publish, and the latter could get full "credit"). Physicists seem to have been doing this in a more rational way since long before the advent of the electronic communications network.

But I should also say that the biological and life scientists I meet in real life are rarely much different from physicists (modulo a certain computerphobia): clear-thinking, competent, autonomous -- so I'm not sure who exactly is exaggerating the differences and to what end; perhaps I don't meet a representative sample. Ultimately the issues regarding the correct configuration of the electronic research infrastructure will be decided experimentally. So why the reticence to experiment, what's the harm in finding out one way or another? Who exactly stands to lose?

Regardless of how different research areas move into the future (perhaps by some parallel and ultimately convergent evolutionary paths), I strongly suspect that, on the one- to two-decade time scale, serious research biologists will also have moved to some form of global unified archive system, without the current partitioning and access restrictions familiar from the paper medium, for the simple reason that it is the best way to communicate knowledge and hence to create new knowledge.