The hybrid information environment and our Intranet solution to access it
Herbert Van de Sompel
This is NOT a publication
During the last decade, technological innovations have generated possibilities for libraries to deliver new, appealing services to end-users. Two waves of innovations have resulted in two major reorientations in library automation.
A first, remarkable reorientation occurred when the focus shifted from automating in-house library procedures (cataloguing, loan, acquisitions, serial control, ...) that controlled the traditional print-based collection, to automating information and information-delivery procedures. This was a shift from empowering the library towards empowering the end-user. Seminars addressing this reorientation carried titles such as “From automated housekeeping of archives and libraries to automated information” (Ref 1).
This shift originated from a technology-based impulse, in which the sudden availability of secondary information on CD-ROM and the increasing use of Local Area Networks were the main catalysts. Suddenly, technology enabled libraries to offer their end-user community more than catalogue information, and an ever-increasing number of libraries have chosen that new path.
But - although the possibility of delivering extra services to the end-user seems obvious and appealing - it is worth noting that the shift has been a slow one. The basic required technologies became available between 1985 and 1990, but numerous libraries waited a long time before taking concrete action. An important cause of this is the underestimation of the immense importance of secondary sources in an academic environment.
While some libraries were still evaluating the feasibility of riding the first reorientation wave, there was already a new one to surf. An inspiring résumé of what this new wave is about was found in the title of a seminar held at the University of Padua: “From database networking to the digital library” (Ref 2). This title suggests that a future library solution is more than the sum of electronic networked databases.
In order to clarify this important nuance, it is inspiring to use a metaphor taken from Marvin Minsky (Ref 3). In “The society of mind”, Minsky reveals his ideas on the functioning of the human brain. He explains the notion of an agent, the smallest operational entity in the human brain, capable of performing a very simple, specific task. The interaction between agents that form a group can result in handling a more complex task, and the co-operation between groups of agents in yet more complex tasks. Finally, given the right interactions between agents and groups of agents, intelligence emerges.
The solutions that resulted from the first reorientation are characterised by information made accessible on a network, searchable via different kinds of monolithic software packages. The fact that there is no interaction between these packages results - for the end-user - in a lack of integration between pieces of information that are, or should be, related (see slide in lecture 2). In Minsky's terms, the agents in a first-shift solution are not co-operating. The second reorientation means building intelligent solutions by creating interactions between information entities, and by doing so, exploiting the sum of the pieces rather than the individual parts (see slide in lecture 2). The solution in the second reorientation is more than a browsable list of searchable items; it is an easy-to-use entry point to an intelligent, interlinked information environment.
Again, this second shift is technology driven, with open client-server technologies, inter-application tools, global networking and the increasing availability of primary electronic information as the most important catalysts. Amongst the first building blocks - of particular relevance for libraries - were the WWW, client-server based CD-ROM solutions (SilverPlatter ERL, Ovid) and the Z39.50 protocol. The potential of DVD-ROM in this context has only very recently been investigated (Ref 4).
It is more than likely that this shift will happen much faster, and some libraries that have been slow in reacting to the first shift will probably move immediately into the second one. There are very good reasons for doing so. According to EC experts (Ref 5), the future of libraries depends on how successfully new networked library services are realised. This means that - in order to survive - the implementation of certain electronic services is no longer optional, but a must. If their “own” library does not deliver these, researchers - no longer limited by geographical boundaries - will turn to other libraries, making the raison d'être of the “own” library less and less relevant.
Here, the survival of the library itself is not the main issue, but the survival of library values, which might become endangered. Libraries must - in the context at hand - also be interpreted as new solutions originating from innovative companies, competing with services traditionally delivered by or via “real” libraries, but operating from a purely commercial perspective. Traditional libraries should be active in this domain, by implementing new solutions and by co-operating with such new companies, while constantly keeping a close watch on crucial values such as the democratic provision of information, the archiving of information, the integrity of information, ...
Some major features distinguish second wave from first wave solutions. The following will be discussed below: the hybrid information environment, the interlinked information environment, and accessibility.
Due to the Internet explosion and the increasing availability of digital content from traditional publishers, the spectrum of the information environment has diversified far beyond the traditional print-oriented library. New solutions should deal with that environment as a whole:
information | formal | non-formal
paper based | traditional library | traditional library
digital | digital/formal | digital/informal
Using these parameters, it can be seen that a solution to access the whole environment is composed of three sub-environments. Fig. 3 (a slide in lecture 2) shows a representation of these, with the direction from left to right indicating the shift from paper to digital and from formal to non-formal information. Indicated from top to bottom are the various steps of the consultation chain (Ref 7). We distinguish:
· The traditional library solution, aiming at the optimisation of access to print-only, primary information. The tools used are catalogues revealing the locations of that information; secondary tools providing insight into scientific production independent of local holdings; and document delivery mechanisms.
· The digital/informal solution, aiming at the optimisation of access to digital-only, non-formal information. Some interesting initiatives (Ref 8) have been undertaken to disclose information of academic relevance in user-friendly ways. This is a very challenging domain, and the solutions being presented range from manual compilation of an academic Internet catalogue to automatic classification of Internet resources. Due to the extent and the dynamics of this problem, and given the financial and organisational restrictions faced by most libraries (Ref 9), only a few will be able to provide relevant input in this domain. Luckily - for the time being - libraries can point to the solutions created by their colleagues or direct users to the free Internet search engines. No doubt we shall see new commercial secondary services arise in this area.
· The digital/formal solution, aiming at the optimisation of access to digital-only, formal information. It is obvious that the digital-only solutions will be inspired by Internet mechanisms, linking directly from secondary tools to primary electronic information. In the commercial arena, this is the domain of an increasing number of vapour-, paper- or operational solutions such as Blackwell Navigator, SwetsNet, Elsevier Science Direct, UMI ProQuest Direct, SilverPlatter's SilverLinker, ISI's Electronic Library and Ovid's Biomedical Collection. In the non-commercial arena, this is the domain of in-house, peer-reviewed electronic publishing, a possibility that is increasingly being investigated by academic institutions. For the discussion at hand, it is important to point out that in-house electronic publishing as such is not enough: we want users to be able to find the publications as well. This means that secondary services remain essential, and should be provided either via the established - commercial - information providers or via new tools.
From the above we might conclude that none of the existing terms identifying the new library solutions - virtual library, electronic library, digital library - adequately names the complex problem at hand.
· Traditional libraries do not operate in a purely virtual, digital or electronic environment. Their services will - for years to come - be based on both print and digital material.
· The term “library” itself almost implies “formal information” and “structure”. Therefore, it might not be very appropriate to use it when referring to future solutions that take into account the “non-formal” information flow generated by the Internet.
The bottom line remains that libraries will have to deliver solutions in a hybrid information environment.
The expectations of a net-traveller when using a library solution are inspired by his hyperlinked experiences in the non-formal information environment. To this user, it is incomprehensible that formal secondary sources, catalogues and primary sources that are logically related are not functionally hyperlinked. As illustrated conclusively in the Mecano EC Project (Ref 10), linking information in a first-shift library solution is hardly feasible. But the client-server based building blocks of second-shift solutions do enable the development of linking mechanisms. The library-holdings link - between secondary and catalogue information - is an obvious, and meanwhile popular, example (Refs 7, 10, 11). A gateway from secondary information to an interlibrary loan service - a special case of document delivery - is another example, as is the link between a catalogue system and primary document servers within the institution or at an information provider's site.
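As a concrete illustration of such a library-holdings link, the minimal sketch below (in present-day Python, not the tooling of the implementations discussed here) builds a catalogue query URL from the ISSN found in a secondary record; the catalogue base URL and parameter names are assumptions, since every OPAC exposes its own query syntax.

```python
# Minimal sketch of a library-holdings link: given the ISSN found in a
# secondary (bibliographic) record, build a URL that queries the local
# catalogue for holdings. The base URL and parameter names are hypothetical.
from urllib.parse import urlencode

CATALOGUE_SEARCH_URL = "https://opac.example.edu/search"  # hypothetical OPAC

def holdings_link(issn: str) -> str:
    """Return a catalogue query URL for the journal identified by `issn`."""
    return CATALOGUE_SEARCH_URL + "?" + urlencode({"index": "issn", "term": issn})

# A secondary record mentioning ISSN 0028-0836 would thus be decorated with
# a clickable link such as:
print(holdings_link("0028-0836"))
# -> https://opac.example.edu/search?index=issn&term=0028-0836
```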
The realisation of these links requires co-operation amongst libraries, providers of library housekeeping systems, providers of secondary services and providers of digital library solutions. In the course of building a next-generation solution, libraries will choose amongst building blocks from different providers. Among the wide variety of criteria used in that process, the willingness of a provider to co-operate in integrating the information environment might be one of the most important.
Because of technical limitations of the building blocks, first-shift solutions have very rarely been accessible to the whole community they were addressing. Only in some cases, where - for instance - computer centres have been able to impose certain technical choices upon their peers (Ref 12), have we seen the realisation of general, uniform access to the available information. Second-shift solutions arise in a completely different setting, characterised by client-server building blocks, cross-platform protocols and software (TCP/IP, HTTP, HTTPD servers and Web browsers) and even cross-platform “operating systems” (Java). Therefore, the new solutions can fundamentally broaden the scope of the potential user community by implementing:
· location-independent solutions: Even in most first-shift solutions, the user goes to the library for electronic retrieval. The second-shift library reaches out to the user. The user accesses the library through the network, no longer restricted by geographical boundaries, since he is wired to the global network.
· platform-independent solutions: The second-shift library delivers services to a heterogeneous user community, in which each user chooses a computer platform that fits his overall needs. Therefore access to the new library should be possible through the most common computer platforms, and access software is required for all these platforms.
· access via standard user interfaces: The library user picks his own brand of software to wander on the information highway. Therefore, the new library solutions should be able to deliver services to a variety of de facto standard net-communication software packages.
· access-control and accounting mechanisms: In the course of gathering commercial information via the library solution, users will access both intranet- and Internet-based services. The access mechanisms and accounting procedures of the information providers “visited” during searching will differ considerably. A new solution should free users from the burden of keeping track of numerous passwords and payment mechanisms, by providing access-control and accounting mechanisms between end-users and libraries and between libraries and information services.
The theoretical discussion above, and the points of view expressed therein, have led to the definition of a concept for a new integrated library solution to serve a large academic user community. The proposed solution is the result of extensive discussions and experiments within and between the library automation teams of the Universities of Ghent, Louvain and Antwerp. The three teams join forces in the technical group of the Flemish-government-funded Elektron project, to build a maquette for a system that should eventually be able to address the information needs of staff and students in Flemish universities and non-university institutions of higher education. Meanwhile, the participating universities are taking a realistic approach, bringing parts of the anticipated solution into production, implementing pragmatic approaches for other parts, or simply leaving certain parts out. The discussion below introduces the lay-out of the anticipated Elektron solution, and focuses on specific implementations at the University of Ghent whenever appropriate.
The web-based Elektron solution aims at a modular architecture, consisting of the following building blocks (see slide in lecture 2):
· an authentication-module [1],
· a module for session management [2],
· a menu-system [3],
· a connection-module [4],
· a module for linking information systems [5].
These modules facilitate access to different kinds of web information systems. At the current stage, the following types have been selected (see slide in lecture 2):
· SilverPlatter's Electronic Reference Library, using the proprietary DXP protocol [6]
· Systems compliant to the Z39.50 protocol [7]
· Systems that can be connected via HTTP
· Thin client systems to access traditional CD-ROMs [8]
The issue of user authentication in an environment with distributed services and a distributed user community is far from trivial. The recent “White paper on authentication” by CNI's Clifford Lynch presents a clear overview of the problem at hand, unfortunately without really proposing a way out.
It is clear that authentication is essential in the Elektron environment:
· The information servers hold information that requires licences and hence may only be made accessible within a controlled environment. The pragmatic approach - controlling the environment using IP-based techniques - is not sustainable in the long run, because it is not watertight and in many cases cannot even be implemented (e.g. dynamic IP addressing, remote access).
· Personalised services - for instance, those that require financial transactions (e.g. document delivery, pay-per-view) - can only be delivered when the identity of the user is known.
The aim of the Elektron project is to implement a solution in which a single user identification is required when the user connects to the Elektron web entry page. This single identification potentially authorises the user for access to multiple distributed services available in the environment. This Single Sign-On approach is in line with a recent trend in IT that tries to overcome the burden of having a separate username-password combination for every single service: file server, e-mail, UNIX account, library catalogue, bibliographic databases, …
This Single Sign-On idea is not new in the electronic library environment. An interesting solution has been implemented by the Athens group of the e-Lib project, providing a single point of identification for a very large user community as a gateway to a wide range of electronic services. That approach builds on a central relational database of users, administered in a decentralised way. The approach proposed here is somewhat different and builds on the Lightweight Directory Access Protocol (LDAP). LDAP version 3 (RFC 2251) is a standard presented as a lightweight alternative to the ISO X.500 directory service standard. At first, LDAP was mainly used for setting up standardised servers holding address books and directories. For instance, the Netscape Messenger mail software contains an LDAP client that can be pointed to several LDAP servers, enabling searches for addressees in several directories. But increasingly, LDAP is being proposed as an authentication tool. In Sun's Solaris 2.7 operating system, authentication via an external LDAP server is offered next to the traditional UNIX passwd-file approach and the proprietary distributed NIS approach. Other operating systems - such as Windows NT 5 - are said to be following the same direction.
LDAP can play a strategic role in the realisation of a Single Sign-On environment, given its distributed and standardised nature. The idea within Elektron is that each institution will run its own LDAP server, holding the database describing its community of students and staff. This evolution is already under way, since several computing centres are currently experimenting with LDAP to replace their current electronic phone-book solutions. Once the typical Intranet applications such as access to file servers, e-mail, UNIX systems, etc. support LDAP authentication, all these services could be authenticated by the single LDAP server of an institution. Within Elektron, the aim is likewise to replace the proprietary authentication modules of typical library applications (catalogue, bibliographic databases) by an LDAP authentication scheme. And of course, access to the Elektron environment itself would be granted by consulting the same LDAP server. This finally leads to access to all institutional services using a single name and password combination.
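As a minimal sketch of the entry-page check described above - assuming a present-day Python environment with the third-party ldap3 package, and a directory layout (host name, uid=...,ou=people,... DN structure) that is purely illustrative - the authentication amounts to an LDAP simple bind with the credentials the user supplies:

```python
# Sketch of the single sign-on check: the entry page verifies a
# username/password by attempting an LDAP "bind" against the institutional
# directory. Server name and DN layout are assumptions.
from ldap3 import Server, Connection, ALL
from ldap3.core.exceptions import LDAPBindError

LDAP_SERVER = "ldap.example-university.be"              # hypothetical host
PEOPLE_BASE = "ou=people,dc=example-university,dc=be"   # hypothetical base DN

def authenticate(username: str, password: str) -> bool:
    """Return True if the institutional LDAP server accepts the credentials."""
    server = Server(LDAP_SERVER, get_info=ALL)
    user_dn = f"uid={username},{PEOPLE_BASE}"
    try:
        # A successful simple bind proves the password is correct.
        conn = Connection(server, user=user_dn, password=password, auto_bind=True)
        conn.unbind()
        return True
    except LDAPBindError:
        return False
```

The same bind could, in principle, be reused by the catalogue, the bibliographic databases and the other Intranet services, which is what makes the single name/password combination possible.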
Since the proposed solution:
· is based on a standard,
· can be implemented in a distributed manner,
· builds on work in progress at computer centres,
· relieves the library community from setting up and maintaining its own identification service,
the authors believe it has a real chance of realising an institution-wide single sign-on environment.
Ongoing tests with LDAP authentication for services provided to the library personnel at the University of Antwerp are very positive. Nevertheless, several issues still need to be addressed:
· selection of a standardised directory scheme for Elektron (e.g. LIPS),
· the role of certificates in the set-up,
· securing transactions of username-password combinations,
· implementation in a distributed environment.
As a result of a positive authentication, the user's browser is given a session ID that will be maintained throughout the rest of the consultations. As things stand now, the idea is to store this session ID at the client side, for instance as a cookie. This ID will play a crucial role later on, when further authentication might be required by information systems (see 3.4. Connection module). The ID information needs to be readable from the browser at any time.
Other relevant information read from the LDAP server such as:
· username
· e-mail address
· …
as well as dynamic, session specific information such as:
· interface-language,
· status of the menu-system,
· session IDs in information systems that are part of the environment,
· …
could be stored either at the client or at the server side.
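The following sketch illustrates how such a session-management module might behave, using only the Python standard library; the cookie name, the stored fields and the in-memory store are assumptions, since this module is still to be developed.

```python
# Sketch of session management: after a successful LDAP login, a random
# session ID is generated, handed to the browser as a cookie, and a
# server-side record keeps static (LDAP) and dynamic session data.
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # in-memory store: session ID -> session data (illustrative)

def open_session(username: str, email: str, language: str = "en") -> str:
    """Create a session record and return the Set-Cookie header line."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {
        "username": username,            # read from the LDAP server
        "email": email,                  # read from the LDAP server
        "interface_language": language,  # dynamic, session-specific
        "remote_session_ids": {},        # IDs in connected information systems
    }
    cookie = SimpleCookie()
    cookie["elektron_session"] = session_id
    cookie["elektron_session"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")

print(open_session("user1", "user1@example-university.be"))
```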
The first real interface element presented to the user - one that will probably reappear during the consultations - is the menu system, presenting the user with views on the database of available information sources. For this purpose, a package has been developed at the University of Ghent, where it has been in production since June 1997. The package is part of the Elektron maquette, and there is genuine interest from the Universities of Antwerp and Louvain to implement it in their environments. The Ghent implementation focuses on the representation of a combination of links to formal sources available to the user (typically networked databases) and a very limited number of links to high-quality Internet sites aiming at the disclosure of informal scientific Internet sources. For instance, the menu header Secondary Sources gives access to Current Contents, as well as to the major Internet search engines. It is most likely that a reference to most of the e-Lib libraries will be found under the same header. Catalogues links to several important Belgian library catalogues, as well as to a catalogue of electronic journals. Primary Sources would link to publishers' sites - where electronic versions of current subscriptions are available (for free) - as well as to a selection of free Internet electronic journals.
The aim of the Ghent team was to develop a generic web-based menu system, able to represent the database of menu-items from different perspectives. The system is set up using a relational database, in which the table of menu-items is the central element. In order to achieve multiple perspectives, the system allows for the definition of attribute tables and tables linking attribute values to menu-items. Each attribute can be used as a perspective and as a filter. As such, a set-up as illustrated in the example below can result in several representations of the database of menu-items, for instance:
· ordered by database type, using discipline as a filter mechanism,
· ordered by discipline, with database type as a filter,
· a screen providing facilities to search for menu-items and to filter the results using one or more attributes.
attribute: database type | menu-item | attribute: discipline
Secondary | Current Contents | Economy
 | ABI/Inform | Medicine
Primary | Engi virtual Lib | Sciences
 | IOP | Applied
 | Aleph | …
While a menu-item can have multiple values for a certain attribute (e.g. Current Contents and the attribute discipline), it can only have a single value for another type of property, defined as a flag. Typical flags are - for instance - information type (formal/non-formal), collection (internal/external) and item status (normal/demo/new). They can be represented in the output in several ways, for instance as distinct colours for the menu-items, as special icons next to the menu-items, etc.
The system is entirely built using public domain tools such as mSQL and Perl. The programming is object-oriented and has been implemented with the following in mind: extensive use by a broad user community, flexible management and flexibility in data representation.
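To make the data model tangible, the sketch below rebuilds the menu-item/attribute/link-table structure and runs one "perspective plus filter" query. The production package is written in Perl against mSQL; SQLite and Python are used here purely to keep the illustration self-contained, and all table and column names are illustrative.

```python
# Sketch of the menu-system data model: a central menu_item table plus an
# attribute table and a link table tying attribute values to menu-items,
# so that any attribute can serve as a perspective (ordering) or a filter.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE menu_item (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT);   -- e.g. 'database type', 'discipline'
CREATE TABLE item_attribute (item_id INTEGER, attribute_id INTEGER, value TEXT);
""")
db.executemany("INSERT INTO menu_item VALUES (?,?,?)",
               [(1, "Current Contents", "http://example/cc"),
                (2, "ABI/Inform", "http://example/abi")])
db.executemany("INSERT INTO attribute VALUES (?,?)",
               [(1, "database type"), (2, "discipline")])
db.executemany("INSERT INTO item_attribute VALUES (?,?,?)",
               [(1, 1, "Secondary"), (1, 2, "Economy"), (1, 2, "Medicine"),
                (2, 1, "Secondary"), (2, 2, "Economy")])

# Perspective 'database type', filtered on discipline = 'Medicine':
rows = db.execute("""
SELECT t.value AS database_type, m.name
FROM menu_item m
JOIN item_attribute t ON t.item_id = m.id AND t.attribute_id = 1
JOIN item_attribute d ON d.item_id = m.id AND d.attribute_id = 2
WHERE d.value = 'Medicine'
ORDER BY t.value, m.name
""").fetchall()
print(rows)   # -> [('Secondary', 'Current Contents')]
```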
Although the user has been authenticated when connecting to the Elektron entry page, the choice of certain menu-items will require an extra authentication in the information system that runs the requested database. This can be the case for the consultation of an ERL database, issuing a hold request in the OPAC, reading an electronic paper at a publisher's site, and so on. In short, it can be the case for each information system that can also be accessed directly instead of via the menu system. This extra authentication conflicts with the single sign-on concept presented above, and as such, the aim of the solution is to relieve the user of this burden. For this purpose, the concept of a connection module is introduced (see slide in lecture 2). This module takes care of further authentications in the background. Upon selection of a menu-item, the connection module performs a table look-up in order to determine the authentication scheme to be used:
menu-item | original authentication | new authentication
Aleph 500 (local system) | Aleph 500 | LDAP & session-ID
Current Contents (local system) | local ERL | LDAP & session-ID
Emerald (remote system) | Emerald-login | LDAP-lib Emerald-login
IOP (remote system) | IP based | Proxy if remote user
Medline (local system) | local ERL | LDAP & session-ID
PsycLit (remote system) | remote ERL | LDAP-lib remote ERL
Springer Verlag (remote system) | IP based | Proxy if remote user
Basically, the systems that require authentication fall into one of the following categories [9]:
· Systems that only use IP-based access-control mechanisms: in this case, the connection module has no other task than redirecting the user to the desired page, where the IP number of the browser will be checked. In case the user is working from a remote location (i.e. his IP address is not part of the institutional pool known by the system to which the user wants to connect), the connection will be made via a proxy server.
· Systems that are under local control (the “LDAP & session-ID” entries in the table above): for these systems, the authentication module will be replaced. The new module should support two parallel schemes:
· The institutional LDAP authentication mechanisms (see 3.1.) that will be used whenever the user connects directly to the system, without going via the general entry-page.
· A mechanism based on a username/session-ID combination. This will be used when the user comes from the general entry page. As explained above, in this case the browser holds the username/session-ID. The only task for the connection module is to redirect the user to the appropriate information system, which will be able to read the necessary authentication information from the user's browser.
· Systems that are under external control and that can be accessed using a username/password combination (the “LDAP-lib” entries in the table above). In this case, a parallel, library-specific LDAP-lib server will be used to determine the username/password combination to be provided to the external application. This LDAP-lib server has a non-standard structure, as shown below. The usernames are identical to those in the general LDAP server. The passwords are updated immediately after the general login (see 3.1) by the session-management module, which sets the LDAP-lib password for the user equal to his session-ID. Further fields in the LDAP-lib directory are pairs of usernames/passwords for external systems. As such, the connection module can use the username/session-ID from the user's browser to do a look-up in the LDAP-lib directory, in order to retrieve the necessary authentication information for the requested service. This, in combination with knowledge of the external logon mechanism, enables the connection module to perform a background logon for the user.
ldap username | password | Emerald name | Emerald password | remote ERL name | remote ERL password
user1 | session-ID1 | biomed | r5ttQ4 | rugent | our_erl
user2 | session-ID2 | biomed | r5ttQ4 | rugent | our_erl
… | | | | |
usern | session-IDn | agricult | U88ol? | |
… | | | | |
It should be clear that maintaining such a separate LDAP-lib server is a non-trivial task:
· it requires permanent synchronisation of the usernames with the general LDAP server,
· in practice, it will only be possible to maintain username/password combinations for those external systems that allow generic logons for a reasonably large user group.
As indicated in “2.2 the interlinked information environment”, one of the important aims of a new library solution is to functionally connect information that is related, even if the information is held on different information systems.
The University of Ghent has been very active in this area:
· A link-to-holdings between ERL and the Aleph 500 system was created as soon as the Aleph system was taken into production (June 1997).
· A link between the ERL-Inspec database and the IEEE electronic library collection was implemented on behalf of the IMEC research institute. This realisation is a joint effort of the Belgian SilverPlatter distributor IVS, IMEC and the University of Ghent (fourth quarter of 1997).
· Experiments have been conducted linking from ERL to the SwetsNet solution.
These efforts have inspired the Universities of Antwerp and Louvain to set up similar mechanisms linking from ERL-systems to their catalogues and to selected publishers’ sites.
Most of these implementations have been set up in a very pragmatic manner, and they share a lack of sustainability. The ongoing developments have led to new insights, pointing at the need for a generic linking mechanism, able to handle all the desired connections between information systems.
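One possible shape for such a generic mechanism is sketched below: instead of hard-coding each individual link (ERL to Aleph, ERL to IEEE, ERL to SwetsNet), the source system hands over whatever metadata the record contains, and a single resolver maps the (metadata, target) pair to a URL via per-target templates, so that adding a new link means adding a template rather than writing new code. The target names and URL templates are hypothetical.

```python
# Sketch of a generic linking mechanism: per-target URL templates plus one
# resolver that fills them with the metadata supplied by the source system.
from urllib.parse import quote

LINK_TEMPLATES = {
    # hypothetical targets and templates
    "catalogue-holdings": "https://opac.example.edu/search?index=issn&term={issn}",
    "swetsnet-article":   "https://swetsnet.example/link?issn={issn}&vol={volume}&page={spage}",
}

def resolve(target: str, **metadata: str) -> str:
    """Build the link for `target` from whatever metadata the source supplies."""
    template = LINK_TEMPLATES[target]
    return template.format(**{k: quote(v) for k, v in metadata.items()})

print(resolve("catalogue-holdings", issn="0028-0836"))
print(resolve("swetsnet-article", issn="0028-0836", volume="391", spage="59"))
```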
References
(1) Kris Clara & Julien Van Borm (editors) (1993) Van geautomatiseerd beheer van archieven en bibliotheken naar geautomatiseerde informatie, Bibliotheekkunde, 51
(2) Universita degli Studi di Padova, Sistema Bibliotecario di Ateneo (1997) From database networking to the digital library: a European perspective: seminar held on March 6th 1997
(3) Marvin Minsky (1986) The society of mind, Simon and Schuster, New York
(4) SilverPlatter backs DVD (1997) Information World Review, January 1997, no 21, p 1
(5) J.S. Mackenzie Owen & A. Wiercx (1996) Knowledge Models for Networked Library Services: final report, EC Contract PROLIB/KMS 10119
(6) Nicholas Negroponte (1995) Being Digital, Knopf, New York
(7) Herbert Van de Sompel & Guido Van Hooydonk (1994) Technology and collaboration: creating an effective electronic information environment in an academic context, Proceedings of the 18th International Online Information Meeting, London, 6-8 December 1994, pp 579-592
(8) http://www.sosig.ac.uk & http://bubl.ac.uk/link/subjects/ & http://www.ub2.lu.se/netlab.html
(9) S. Michael Malinconico & Jane C. Warth (1996) Electronic libraries: how soon? Program, vol 30, no. 2, April 1996, pp 133-148
(10) Mecano (1994), EC Project LIB-MECANO/4-2045
(11) Jerry V. Caswell & others (1995) Importance and use of holding links between citation databases and online catalogs, The Journal of Academic Librarianship, March 1995, pp 92-96
(12) H. Geleijnse (1994) Campuswide information services at Tilburg University Libri (International Library Review), vol. 44, issue 4, Dec 1994, pp. 272-278
(13) http://www.lib.rug.ac.be
(14) Herbert Van de Sompel (1993) Optimalisatie van de konsultatieketen aan de Universiteit Gent, Van geautomatiseerd beheer van archieven en bibliotheken naar geautomatiseerde informatie, Bibliotheekkunde, 51, 263-278
[1] Experimental module at the University of Antwerp
[2] Still to be developed
[3] In production at the University of Ghent
[4] Experimental module at the University of Ghent
[5] Several experiments at the Universities of Ghent, Antwerp and Louvain
[6] In production at the Universities of Ghent, Antwerp and Louvain
[7] Accessed via the Z39.50 module of the University of Louvain’s LibriVision
[8] In production at the University of Ghent
[9] Certificate-based solutions are left out of the discussion because they are currently not commonly used in the library environment. According to Lynch, user know-how about these solutions is very limited, and as such, introduction by information providers seems unlikely in the near future.