Alternative Models of Scholarly Publishing

William Arms
Corporation for National Research Initiatives


This is a lightly edited transcript of a presentation to the LAUC Conference on Alternative Models for Scholarly Publishing, University of California at Berkeley, November 1998.


I want to start by considering the title of today's activities, "Alternative Models for Scholarly Publishing". I greatly enjoyed listening to the thoughtful discussions in the first session from several people whose work I admire. But all three of the speakers essentially started with the traditional concepts of journal publishing and asked how to move these concepts into the online computing world. My hypothesis is that the changes and improvements to be made in scholarly and scientific communication are bigger than replicating the paper world online.

My background is in computing. The history of computing shows that, when you get sufficient technological developments, change ceases to be incremental and fundamental alterations take place. When I came to the United States in 1978, American computing consisted of IBM and the seven dwarfs. The dwarfs had names like Univac, CDC, Honeywell, and NCR. Where are these companies? Gone! The next wave of technology was minicomputers. The leading companies were Digital, Wang, Data General, etc. Where are they? Gone! It is not that the importance of these companies marginally shifted; they went out of business. The early personal computer companies suffered the same fate. The first personal computer I had was a Terak. Both Paul Ginsparg and I had NeXT computers. They were lovely computers. But where are they today? Gone!

In scientific publishing, we are dealing with a world that is moving beyond incremental change. We should expect really major change. We should not assume that the ACM will continue to be part of this world. Why should Oxford University Press continue? Why should Elsevier or WestLaw or any other organization assume that it will remain important? These organizations have done a tremendous job with old technology. What makes us think that they will be leaders with new technology?

I can not predict the future, but I can tell you what I observe today. In particular, I can make some observations on being an author and a reader in this world. For the rest of this talk I will give a few examples from my own personal experience and make some comments about them. I will suggest a few conclusions. But this is an experimental field. We each look at the observations and we each try to draw conclusions.

A good starting point is, "As We May Think." This is, as you probably recognize, the title of a paper published in 1945 by Vannevar Bush; many people see his paper as an inspiration for what we do today. It is widely read today because it is on the web with open access. How many people read the original article in the Atlantic Monthly? Very few. The web has brought the paper to millions of us.

The first half of the article is a discussion of why the traditional methods of scientific journal publishing are ill-suited to modern science. This part of the article reads as well today as it did then. The reason is that the traditional model of scientific journal publishing is often a barrier to communication. We can do better.

Let me put this in context. I have spent most of my life in universities. I now work for a small research organization that does not have a library. Yet I think I am more up to date with current research in my fields of digital libraries, electronic publishing, and networks than at any time in my career, at least since I was a young faculty member. As a reader who does not have access to a conventional library, I read only online and only open access material. If information is not online with open access, it does not exist. And fortunately in my fields, everything appears first online with open access. Some things appear later in the traditional journal literature.

I am not only a reader; I am also an author. The weakness of the conventional journal literature was brought home to me about six or seven years ago. There was a workshop on electronic information in the humanities, at UC Irvine, organized by the Getty Trust amongst others. They sponsored five papers to set the scene. I wrote one of the five, which tried to be provocative and to challenge people. It stimulated a lively discussion at the workshop and also among people who were not at the workshop. So I put up a copy on my gopher site. (The use of gopher tells you the date of the meeting.) Some time later, I got a request from the organizers. They planned to publish these five papers in a special issue of "Leonardo," a journal published by MIT Press. I said, "Of course! If you want to publish my paper, please do." Shortly afterwards I received a copyright transfer form which, amongst other things, stated that I could not leave the paper on my gopher site. Consider my choice. Do I publish this paper in a journal which none of my colleagues will ever read? Or do I leave it on my gopher site where people will read it? I did the obvious thing: I signed the copyright transfer form and left the paper on the gopher site. It is the last copyright transfer form I will ever sign, unless the publisher pays me.

In thinking about scientific information, I repeatedly return to two principles. My first principle is that authors are best served by open access to their work. If we really want to develop systems that serve scientific and scholarly authors well, we need "open access." My second principle is that scientific information needs to be managed by professionals, and professional staff need to be paid. This is the dilemma. Scientific communication needs open access, and yet we have to pay the staff. How do we reconcile these two principles?

One aspect of the current journal trends worries me. You heard President Atkinson's talk about the California Digital Library. If you work here at the University of California at Berkeley, you will soon be in the position that large numbers of journals appear to be open access -- because the University of California has subscribed to them. But only a privileged few belong to rich universities. Let me give you an example. The IEEE (the Institute of Electrical and Electronics Engineers) did a survey of its members some time ago. As I recall, more than half of them work in organizations of five or fewer engineers. Most professional engineers do not have access to major libraries. The same is true for medicine -- and the same for law. Right across the board. If you want to create an elitist society in which the people who belong to expensive universities and major corporations are the only people who have access to scientific information, follow the path that we are on. But if you really want to get your information out into the world, it has to be open access.

Now for some specific examples. First of all, I want to talk about ACM's Digital Library. This is an example of taking the traditional way of doing things and building the electronic equivalent. Although I believe that this is not the long-term solution, it is a good example of what many professional societies have done recently.

Over the last few years the ACM has created a production process by which manuscripts received from authors are marked up in SGML by a professional staff. This SGML source is used for both the print and online versions, which are different manifestations of the same source. The processes of peer review, selection of materials, and copy editing are unchanged. This new style of production uses new technology, but it is basically the old process. The division of effort follows a fairly common academic model. The editors-in-chief of the journals are volunteers. The reviewers are also volunteers. The production staff are professionals. (I might say in passing, if ever you're invited to be editor-in-chief of an ACM journal, think before accepting. It is an unbelievable workload. I can not imagine how anybody does it.) One of our principal problems at present is getting really good reviewers. The quality of review is vital if you want to publish good journals of this type. Reviewing is a burden for people to undertake. Waiting for reviews puts delays into the system.
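
To make the single-source idea concrete, here is a minimal sketch. It is not ACM's actual toolchain; the markup, element names, and rendering functions are invented for illustration, and a tiny XML-conformant fragment stands in for the SGML source. The point is simply that one marked-up manuscript yields both an online and a print-oriented manifestation.

```python
# Illustrative only: a toy single-source pipeline, not ACM's production system.
import xml.etree.ElementTree as ET

SOURCE = """
<article>
  <title>An Example Article</title>
  <author>A. N. Author</author>
  <abstract>One source, many manifestations.</abstract>
</article>
"""

def to_html(root):
    # Online manifestation: a minimal HTML rendering.
    return (f"<h1>{root.findtext('title')}</h1>\n"
            f"<p><em>{root.findtext('author')}</em></p>\n"
            f"<p>{root.findtext('abstract')}</p>")

def to_print(root):
    # Print manifestation: plain text that a typesetting step could consume.
    return (f"{root.findtext('title').upper()}\n"
            f"by {root.findtext('author')}\n\n"
            f"{root.findtext('abstract')}")

root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_print(root))
```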

ACM's Digital Library has been a great success, but it has two serious problems: the delays between submission and publication, and the need to be a subscriber to access the collections. I agree with Vannevar Bush; we need to do better. If these are fundamental problems, you might ask why I am Chair of the ACM Publications Board. The answer is that I hope that it is possible to work with the community to make things better. Unlike some society publishers, where publishing has become a business, ACM is very conscious of the basic questions. What do our members want? How can we help our members? If we keep asking these questions, we have some hope of making progress.

The business relationship between authors and ACM is contained in a document called "The ACM Interim Copyright Policy." I dislike the fact that it is called a "copyright policy," but that is the way that these things are written. My contribution is the word "interim" in the title, because I believe the policy is in a state of flux. The background is that there was a committee of about five people who created the current version of the policy. Some were cautious in their point of view; others were more progressive. (I think you can guess which group I was part of.) Nobody is completely happy with the compromises we came up with, but we agreed that we could accept them for a little while and review the policy later. In passing, I might mention that three years ago I was definitely an extremist on the ACM's Publications Board. I would now say that I am close to the center. It is not that I have changed my opinions; the community has changed.

There are two very important parts of this policy. First, we encourage authors to post their material online as soon as they have done their research, to "pre-publish" it; we also encourage them to put the final version of their papers on their own web sites. Some other publishers take the attitude that they will not publish research that has been distributed previously and that authors can not distribute their work independently after it has been published. In my opinion, this is totally irresponsible. With the ACM policy, pre-publication is encouraged.

The second aspect of the policy that I want to emphasize is something that I disagree with but am prepared to accept as a compromise. ACM expects authors to transfer the copyright of articles. In return, ACM allows authors some specified actions. Compared with other publishers, ACM is very generous in permitting authors to do almost everything that a reasonable person might want, but I dislike the mindset. Quite simply, the mindset is that authors are competition to the publishers, and that authors who distribute their own information may be undercutting the market for the publishers. To protect themselves from their own authors, publishers demand the transfer of copyright from author to publisher. (Personally, I told you earlier about the pledge that I took as an author several years ago. I will transfer copyright only if I am paid.)

Let me move on to another example. I hope many of you know D-Lib Magazine, an online magazine that we publish. It is a monthly serial about digital libraries research. I am the publisher; currently, I am also the interim editor, following Amy Friedlander, who was the founding editor. The two of us started D-Lib Magazine about three years ago, at the time when the Digital Libraries Initiative from NSF, DARPA, and NASA was starting. People coming from different communities were setting out on digital libraries research without knowing the most basic facts about the other disciplines. Our aim was to help computer scientists discover what librarians do, librarians to learn about publishing, and so on.

The production process of D-Lib Magazine emphasizes speed. Monthly issues of the magazine come out on the fifteenth of each month at noon (not about noon, but at noon promptly). We ask authors to submit their manuscripts by the beginning of the month, so that we can go through a few editorial cycles, but we will accept articles up to the eleventh of the month for publication on the fifteenth. D-Lib Magazine is online only. You can print it out if you like -- we are very happy for you to do so -- but all we provide is online. It is open access -- I did not have to tell you that. It follows my first principle. Authors retain copyright. I wrote the original copyright policy. It basically said, do anything you want with these materials but be nice to the authors. If you copy something, say who it comes from. The current policy is slightly more complex, but not much.

There is a professional editor, who is paid to edit the magazine, but there is no external review. Conventional wisdom says that you can not have high quality without peer review. Yet D-Lib Magazine is clearly high quality. It shows that, if you have a proper editorial structure, you can get high quality material without peer review. Amy Friedlander and I put our professional reputations behind the magazine. We are both very knowledgeable in the field of digital libraries. There are not many topics covered in D-Lib Magazine that I am not competent to review and edit thoroughly.

What is the impact of D-Lib Magazine? We are very widely read. I was interested to see the figures that Michael Keller showed about the readership of the HighWire Press journals. It is difficult to make comparisons without details about all of the journals, but D-Lib Magazine sees circulation figures that certainly rival those, despite having no peer review. Perhaps most interesting is that the existence of a widely read, open access magazine has helped to resist the creation of unnecessary, conventional journals by commercial publishers in the emerging field of digital libraries.

D-Lib Magazine is cited in all the pompous places. We are indexed in the indexing and abstracting services. We are cited in journal articles, conference proceedings, and National Academy reports. We are perceived as providing academic prestige. I was recently an external reviewer for the promotion of a computer science faculty member at a leading university. In the list of publications that he submitted to the promotion and tenure committee, the first two were papers in D-Lib Magazine. The leading universities, the ones that are confident in what they do, are the ones that can show others the way. The leading universities want good people; they evaluate people for promotion by how good they think they are, and they can change the rules in ways that some of the less self-confident universities may hesitate to do.

D-Lib Magazine is an example of a scientific publication, in which open access is fundamental. However, there is one big "but". It depends upon external sources of funding. My strategy is quite simple. We started off with some money from DARPA, which was assigned for a related purpose, with enthusiastic support from the program manager. Our aim is to make the magazine so important to the digital libraries community that they will not let it die. But it does require external funding and we will have to keep finding it.

Now for another example of a non-standard form of publication. At CNRI, we are the host for the Secretariat of the Internet Engineering Task Force (known as the "IETF"). The IETF, which is the technical arm of the Internet, publishes a very, very influential and important series called the RFC series. The process is as follows. Suppose that as you sit here you have a wonderful idea about how to re-engineer the Internet. You write an Internet Draft which describes your ideas. At the beginning of the draft you put in a formal notice stating that this is a draft. It has a six-month life, and at the end of that time it is scheduled to vanish. You submit the draft to the IETF Secretariat; they immediately post it as an Internet Draft. They also circulate a notice to a mailing list saying that there is a new Internet Draft on a certain topic. Discussion of the draft takes place on the mailing list, with comments that range from minor suggestions for improvement to quite vigorous criticism. At a forthcoming IETF meeting -- they have three meetings a year -- the working group will get together and accept or reject the paper. If accepted, it becomes an official member of the RFC series and is placed online. Thus, the RFCs combine open access, immediate dissemination of ideas, and a wonderful form of peer review, the best peer review of anything I know about.
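
For readers who like to see a process written down, here is a toy sketch of the lifecycle just described. It is purely illustrative: the class, field names, and dates are invented, the six-month expiry is modeled crudely as 182 days, and none of this is actual IETF software.

```python
# Purely illustrative model of the Internet Draft lifecycle described above.
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class InternetDraft:
    title: str
    posted: date
    comments: list = field(default_factory=list)
    status: str = "draft"           # draft -> expired | rfc

    @property
    def expires(self):
        # A draft has a six-month life unless it is replaced or accepted.
        return self.posted + timedelta(days=182)

    def discuss(self, comment):
        # Open review happens on the working group's mailing list.
        self.comments.append(comment)

    def working_group_decision(self, accepted: bool, today: date):
        if today > self.expires:
            self.status = "expired"
        elif accepted:
            self.status = "rfc"     # becomes a member of the RFC series

draft = InternetDraft("Re-engineering the Internet", posted=date(1998, 11, 1))
draft.discuss("Section 3 needs a discussion of security considerations.")
draft.working_group_decision(accepted=True, today=date(1999, 2, 1))
print(draft.status)  # rfc
```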

The impact of the RFC series can not be overstated. They are very widely read. They are crucial to the Internet's continuing success. You can argue that, if the Internet had followed the procedures of most standards bodies and charged for access to its standards, it would not have survived the turmoil it has gone through. The fact that people compete in this open access process to have their ideas accepted by the community is how the Internet keeps overcoming technical hurdles.

The conservatism of publishers means that you will not find RFCs in many abstracting and indexing services. But the very reasons that they are ignored by traditionalists -- online only, open access, no conventional peer review -- are the reasons for their tremendous importance. In the networking community, they also convey academic prestige and open the door to big research grants. To be the author of some major RFCs is worth any number of journal articles. In networking, RFCs are how reputations are made.

The RFCs have a different payment model. Fundamentally, conference fees pay for the cost of the professional staff who manage the process. There are some other miscellaneous revenues, but the fundamental point is that there is no charge to the authors who write RFCs or to the readers.

My final example is also an ACM example. While the ACM Digital Library provides formal publication of computer science research, ACM is very conscious of the fact that computer scientists pre-publish most of their research, historically in department technical reports, more recently on web sites. ACM asked the question: is there anything we can do to enhance this pre-publication and early dissemination of results? This is an important question, because quite a lot of good material never gets beyond the technical reports. One of my favorite web sites is the Stanford Digital Library web site, which has working papers on it. On this site, there are several rough drafts by Terry Winograd. He emphasizes that they are drafts, and will probably never be finished. But Terry Winograd's rough thoughts are pretty good rough thoughts and well worth reading. This informal literature is important.

A member of the ACM Publications Board, Joe Halpern of Cornell University, proposed that computing should have a pre-print service rather like what has been done in physics. Paul Ginsparg, who is the next speaker today, has been the leader of the physics effort. By chance, there was another person in the Computer Science Department at Cornell, Carl Lagoze, who was already working in these areas. By a wonderful coincidence, that very month, Paul Ginsparg came to Cornell and gave a talk. From this visit, a three-way partnership developed, called the Computing Research Repository (CoRR), which provides an e-print service to complement computing journals. The actual management of the collections is done by Los Alamos, using the existing systems with some enhancements. When a paper is submitted to the e-print service, it is not reviewed; it goes through a very simple moderation. The only question is whether a paper that claims to be on networking appears to be a paper on networking. The moderator does not ask whether it is a good paper on networking, or an original paper on networking, but only whether it is on networking and not anthropology or some other field.
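
As a purely hypothetical sketch of that decision -- in practice it is made by a human moderator, and the function, keyword lists, and examples below are invented, not the actual CoRR or Los Alamos software -- the check amounts to something like this:

```python
# A light-touch subject check: no judgement of quality or originality is made.
def moderate(claimed_area: str, abstract: str, area_keywords: dict) -> bool:
    """Accept if the abstract plausibly belongs to the claimed subject area."""
    keywords = area_keywords.get(claimed_area, [])
    text = abstract.lower()
    return any(word in text for word in keywords)

# Hypothetical keyword lists standing in for a moderator's informal judgement.
AREAS = {
    "networking": ["protocol", "router", "tcp", "congestion", "packet"],
    "theory": ["complexity", "proof", "algorithm", "np-hard"],
}

print(moderate("networking",
               "We measure TCP congestion control across backbone routers.",
               AREAS))   # True: plausibly a networking paper
print(moderate("networking",
               "An ethnography of kinship structures in coastal villages.",
               AREAS))   # False: not a networking paper
```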

ACM is sponsoring CoRR. We are happy for other societies to join us, though we would not be happy to be joined by a society that wishes to say that any paper which is subsequently published in one of its journals should be withdrawn from CoRR. It is too early to judge its success, but I am optimistic. CoRR satisfies my two principles. It is open access, yet it is run by professionals. It is a low-cost system, with mainly automatic processing, but it still requires external funding.

To finish this talk, I want to point out how much complexity is eliminated by publishing with open access. Much of the complexity of building electronic publishing systems arises because restricting access to information and collecting money are complicated. I was recently at a workshop at the University of Maryland. There was a speaker from Elsevier, who described their electronic journals and listed ten challenges. Eight of those challenges were difficulties in restricting access, questions like how to authenticate people. If Elsevier allowed everybody access, they would have only two challenges.

The three open access services that I mentioned all have trivial copyright policies. I described D-Lib Magazine's policy. On one occasion, I spent a very happy half hour trying to find a mention of copyright on the Los Alamos site. There are some vague suggestions buried away. And as I prepared this talk, I realized that I do not have any idea what the copyright policy is for the Internet RFCs. It does not matter; they are open access. ACM's copyright policy is much more complex. Open access systems are much simpler than closed access systems.

To conclude, I see two principles. First, authors are best served by open access to their work. Second, we need information professionals -- editors, publishers, managers of information, archivists, and so on -- and professionals need to be paid. These principles may appear contradictory, but don't give up. There are ways to reconcile them. I have shown a few existence proofs and I believe there are going to be many more.


William Y. Arms
wya@cs.cornell.edu