From: Eliot Christian [echristi@usgs.gov]
Sent: Monday, February 07, 2000 6:55 AM
To: 'meta-harmony@mailbase.ac.uk'
Subject: A GILS perspective on "Our Example"

At 05:17 AM 2000-02-02 -0500, Carl Lagoze wrote:
>[...]
>A 130 min video (VHS) of a "Live at Lincoln Center Performance". The
>conductor is Kurt Masur. The Orchestra is the New York Philharmonic.
>The performance was on April 7, 1998 at 8PM Eastern Time. The video
>was produced by PBS and broadcast live on the BBC. The TV director
>was Brian Large and the sound recordist was Lydia Bancroft. The
>narration and program notes are by Martin Bookspan and in english.
>
>The three pieces performed were are:
>
>- The Rite of Spring by Igor Stravinsky, written in 1911. Its length
>  is 35 minutes
>- Beethoven Symphony No. 9, written in 1824. Its length is 65 minutes
>- Concerto for Violin by Phillip Glass written in 1992. With Robert
>  McDuffie solo on the Violin. Its length is 25 minutes.
>
>Copyright for the entire performance is held by Lincoln Center for the
>Performing Arts.

Because GILS focuses just on interoperability for search, there is not really any "best" way to model the real world event or the given description of it. One can only look at how well any one model works in terms of exposing all that the provider intends to reveal, and taking full advantage of the search tool used by the searcher.

Looking first at the real world event, it seems obvious to me that the information space is open-ended. An interested person could dig out all sorts of related facts, e.g., who attended the performance, who was capable of receiving the PBS broadcast, what kind of violin is owned by Robert McDuffie, what the process is if one wants to copy the video, ... For some purposes, these and a host of other real world aspects could be essential metadata of the event or products. (I think the archival perspective makes this same point.)
Since such an information space is unbounded and almost infinitely complex, I don't find it very useful to talk about modeling "the" real world in this case.

What we do have in the given example is an information resource in the form of a text narrative. Although GILS can be used with complex and extensive information resources, the first goal in this case is to help searchers find this text narrative. To make the task non-trivial, we will assume that the text narrative is embedded within a collection of disparate stuff such as Web pages.

So, we know that a provider intends to reveal the text narrative to searchers, and we will assume it is part of a collection that includes a search facility. To mediate the search, we also need to characterize the searcher. From the GILS perspective, the searcher is a process that interacts with the collection offered by the provider. That process may be under direct and immediate control of a human, but it may also be operating much more indirectly (agent, crawler, etc.). In general, the search process is modeled in GILS as a network client-server interaction, with the searcher as client and the provider as server. (The client initiates a search session.)

To support a range of search clients, the GILS search designer maps registered well-known search concepts against the characteristics available in the target data. (Locally-defined concepts can also be used, but those are by definition not interoperable.) In the example case, a map would associate the search concept "anywhere" with the entire text narrative. This supports a "full-text" search client of the kind commonly found on most Internet-wide search services.

An intermediary without access to other information about the text would have to make inferences about the text content in order to offer more precise searching. Some contextual information can be inferred during the original ingest of the data.
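
As a rough illustration of the kind of map described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `CONCEPT_MAP`, `search`, and `NARRATIVE` are hypothetical, the matching is a plain case-insensitive substring test, and none of this is an actual GILS or Z39.50 interface. It only shows the search concept "anywhere" being associated with the entire text narrative, and a search client naming a concept rather than a field of the provider's data.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the provider's text narrative.
NARRATIVE = (
    'A 130 min video (VHS) of a "Live at Lincoln Center Performance". '
    "The conductor is Kurt Masur. The Orchestra is the New York Philharmonic."
)

# Hypothetical concept map: each search concept name is associated with a
# function that returns the part of a record that concept searches.
# Here "anywhere" is mapped to the entire text, as in the example case.
CONCEPT_MAP: Dict[str, Callable[[str], str]] = {
    "anywhere": lambda record: record,
}


def search(records: List[str], concept: str, term: str) -> List[str]:
    """Return the records whose mapped text for `concept` contains `term`."""
    extract = CONCEPT_MAP.get(concept)
    if extract is None:
        # The provider has not mapped this concept to anything in its data.
        raise ValueError(f"search concept not supported by this map: {concept!r}")
    needle = term.lower()
    return [r for r in records if needle in extract(r).lower()]
```

A call such as `search([NARRATIVE], "anywhere", "Kurt Masur")` returns the narrative, while a concept that the map does not cover (say "title") is rejected instead of being silently treated as full text. More precise concepts would be added to the map only once the provider or intermediary can point them at some component of the data.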
There are techniques to infer from unstructured text what it may be about and to make subject terms or coded classifications of the content. There are also ways to extract distinctive text such as personal names, place names, dates and times. One could also use natural language processing techniques to construct a graph representation of the entire text narrative and make some inferences about its more abstract content.

Whether through machine guesswork or original human encoding, it is possible to expose for searching much of the component information in the text. This information can be modeled by the provider or intermediary in the form of XML, XML/RDF, SOIF, relational database, LDAP, Dublin Core attributes, RFC 822, IAFA, or any other construct. For the GILS search designer, it is only necessary that there is a way to map a search concept to a component piece of information.

If we now assume that the provider or intermediary has decomposed the text narrative to expose information components, the essential issue concerns what search concepts are available at the search client. In GILS, there are a couple of hundred available search concepts, though in practice only 10 are commonly used. The GILS search designer is wise to give primary attention to mapping the concepts of title, author, subject, date, place, publisher, and cataloger. (The search concepts are defined using ISO 11179. They are registered in the Basic Semantic Registry and also listed at )

It should be understood that multiple views can be applied to the same collection--perhaps ranging from simple to complicated or covering a variety of languages.

I hope this explanation of the GILS perspective has been helpful. I understand that some observers have been under the impression that there is a GILS "metadata format" directly comparable to MARC or some other schema.
On the contrary, GILS ought to be seen as a way to achieve search interoperability _despite_ the absence of a pre-agreed metadata format. While it may not address all the goals of the ABC initiative, I don't know of any better way to achieve search interoperability given the amazing diversity in play across communities of interest.

Eliot Christian, US Geological Survey, 802 National Center, Reston VA 20192
echristi@usgs.gov Office 703-648-7245 FAX 703-648-7112 Home 703-476-6134