From eyh5@ee.cornell.edu Wed Oct 31 22:04:43 2001
Date: Wed, 31 Oct 2001 22:03:03 -0500 (EST)
From: Edward Hua
To: egs@CS.Cornell.EDU
Subject: 615 Paper 31

Semantic File Systems
David K. Gifford, Pierre Jouvelot, Mark A. Sheldon, James W. O'Toole, Jr.

A semantic file system is an information storage system that provides flexible associative access to the system's contents by automatically extracting attributes from files with file-type-specific transducers. In this scheme, the user-programmable transducers use information about the semantics of the file system objects to extract the properties used for file indexing. One of the advantages of a semantic file system, as the authors claim, is its ease of integration into existing file systems.

In a semantic file system, queries can be mapped into tree-structured path names. Queries are performed by means of virtual directories that describe a desired view of the file system contents. Unlike conventional directories, virtual directories do not have to be explicitly created to be accessed. This has the advantage of conserving disk space on the server. A field virtual directory contains one entry for each possible value of its corresponding field, and these entries are collectively called value virtual directories, each of which has one entry for each entity described by the field-value pair. This forms the basis of rapid indexing in the semantic file system, as the field-value pair refers to entries that are symbolic links to the actual files.
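As a rough illustration of the field and value virtual directories described above, the following Python sketch resolves a path of alternating field and value names against an in-memory index. The function names, the sample files, and the convention that path components simply alternate between field and value are assumptions made for this example; it is not the paper's implementation.

```python
from collections import defaultdict


def build_index(entities):
    """Build field -> value -> set of file paths from {path: {field: value}}."""
    index = defaultdict(lambda: defaultdict(set))
    for path, attrs in entities.items():
        for field, value in attrs.items():
            index[field][value].add(path)
    return index


def list_virtual_dir(index, components):
    """List a virtual directory named by alternating field, value components.

    An even number of components lists the files matching every pair
    (or the available fields when the path is empty). An odd number ends
    in a field name and lists the values of that field that still match.
    """
    pairs = zip(components[0::2], components[1::2])
    matches = None  # None means "no constraint yet"
    for field, value in pairs:
        files = index.get(field, {}).get(value, set())
        matches = files if matches is None else matches & files
    if len(components) % 2 == 1:
        values = index.get(components[-1], {})
        return sorted(v for v, files in values.items()
                      if matches is None or files & matches)
    return sorted(matches) if matches is not None else sorted(index)


# Hypothetical sample data, for illustration only.
entities = {
    "/home/a/notes.txt": {"owner": "smith", "ext": "txt"},
    "/home/a/util.c": {"owner": "smith", "ext": "c"},
    "/home/b/main.c": {"owner": "jones", "ext": "c"},
}
index = build_index(entities)
print(list_virtual_dir(index, ["owner"]))
print(list_virtual_dir(index, ["owner", "smith", "ext", "c"]))
```

Nothing is stored per directory in this sketch: each listing is computed from the index when the path is looked up, which is the sense in which virtual directories need not be explicitly created.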
The authors of the paper have implemented a semantic file system and evaluated its performance. The experiment of indexing a large number of user files is done with both full and incremental indexing. Two positive results are drawn from the experiment to support the authors' claim that the semantic file system is more effective than a traditional file system for information sharing: the time saved compared with a linear search through the entire file system, and the ease of integrating the semantic file system with the existing programs in the development system.

This paper was written in 1991, well before the revolution in Internet and networking technologies took place. However, it is not hard to discern traces of its features in some of today's more popular programming languages that assist system administrators in their daily tasks. For example, the semantic file system shares many traits with the Oracle database maintenance language. This paper can therefore be said to have laid the foundation for a later generation of programming languages that facilitate the operation and administration of large systems and networks.

From ramasv@CS.Cornell.EDU Wed Oct 31 23:53:03 2001
Date: Wed, 31 Oct 2001 23:53:01 -0500
From: "Venu Ramasubramanian"
To: "Emin Gun Sirer"
Subject: cs615 PAPER 31

Semantic File Systems

This paper describes a naming scheme based on attribute-value (AV) pairs that is translated into a standard directory-based file system interface. AV-based naming lets users search for and find desired files much as they would in a database. Using user-specified transducers to generate the AV pairs associated with a file increases the usability of this system.

Each file has a transducer that can be specified by the user, whose role is to identify the attribute-value pairs associated with the file that can be used at search time to locate it. Users can search for files by specifying a name consisting of AV pairs. The name is translated into a tree (directory structure), and access to the file is provided through the standard FS interface. Virtual links are used to associate a file with multiple virtual directories.

Query processing is of course the main component of this system. Several indexes are created for the attributes and values generated by the transducers. The indexes are updated periodically.
The query processing component uses this index to locate the required file and materialize a directory structure containing it.

The incremental indexing operation is, however, shown to be expensive. Thus, in practice, reindexing would have to be done infrequently and during times of low load. I think that the reindexing scheme could be improved further.

It may not be possible to translate all kinds of queries into directory-based names. Insisting on a file system interface forces the system to lose flexibility in query specification. Searching through the contents of a text file, as is done on the Internet, may require specifying the entire file as values of certain attributes. The AV pairs so created might be huge (a duplication of the file), and so would be the index required to access them. The query resolution process specified by this system may be inefficient for such searches.

Yet the user-defined transducers and directory-based naming make the system very attractive to use.

From avneesh@csl.cornell.edu Thu Nov 1 11:20:24 2001
Date: Thu, 1 Nov 2001 11:22:01 -0500
From: "Avneesh Bhatnagar"
Subject: 615 Paper 31

Semantic File Systems

This paper discusses a directory access scheme that exposes a lower level of access granularity through the use of attribute-value pairs. Using specialized as well as generic transducers, the system gives the user flexible access and a higher degree of usability. The query mechanism works with the help of field virtual directories, which do not have to be explicitly created. Each field virtual directory is named by the field and has one entry for each possible value of the field. The entries in this directory are the value virtual directories, each of which has an entry for each entity matching that field-value pair. The main idea is to map queries onto a tree-structured path name.

The authors measure the performance of this system, and as expected the initial indexing process takes a significant amount of time, due to the larger number of disk accesses. However, subsequent queries, when cached, have a lower turnaround time. The attractive features of this system are the ability of the user to specify transducers and the ease with which it can be integrated with existing file systems and applications.
It would be interesting to note the overheads when the data is more complex, e.g., multimedia content. The authors also note that real-time indexing would be a hard task. Considering the time when this paper was written, I think that the authors have introduced a novel idea.

From c.tavoularis@utoronto.ca Thu Nov 1 11:23:23 2001
Date: Thu, 01 Nov 2001 11:22:53 -0500 (EST)
From: c.tavoularis@utoronto.ca
To: COM S 615
Subject: 615 PAPER 31

This paper presents the semantic file system as an alternative to traditional tree-structured file systems. This system provides associative access to files and employs transducers to extract attributes and values from the files. The term ‘semantic’ refers to the programmable transducers that use information about the semantics of object files and directories for indexing.

A transducer is file-type specific and produces attributes in the form of field-value pairs to describe each file. Each file can have more than one attribute, and the collective attributes are known as an entity. A user interface via a browser allows users to query files based on their attributes. Also, an API permits users to dynamically add new types of transducers to the system. A transducer table is used to match a transducer to a file type.
The semantic file system can be layered over systems such as NFS by providing virtual directories, such that the virtual directory names are interpreted as queries. This system has many applications, both for individuals to query for interesting files and for groups of users to keep themselves up to date on shared files.

This system by no means responds in real time and has a long indexing setup, which is not critical, but speed could be improved. I think a good alternative would be to allow users to specify the attributes of their files rather than employing transducers. This would make the overall system more efficient and more accurate.

From teifel@csl.cornell.edu Thu Nov 1 11:32:13 2001
Date: Thu, 1 Nov 2001 11:32:06 -0500 (EST)
From: "John R. Teifel"
Subject: 615 PAPER 31

Semantic Files:

This paper describes semantic file systems--file systems that provide flexible associative access to files by extracting attributes from files with file transducers. The authors introduce virtual directories--which are queries to find other files and directories. Properties of file system objects are automatically extracted and indexed. They claim that semantic file systems are more effective than traditional tree-structured file systems in terms of information sharing and command level programming(?).
A transducer is a filter that reads in a file and outputs the file's entities and their corresponding attributes. This is a slight limitation of the file system, because a transducer needs to be written for every file type--although there _might_ be a way to have a generic transducer for uncommon file types (I guess it would be OK for the default for most UNIX-type things to be the text transducer, but in general this may be an annoyance).

Actually, I'm not quite sure why we read this paper. I suppose this file system may be extended to support dynamic, mobile networks...but I don't think that is really what it was designed for.

From ranveer@CS.Cornell.EDU Thu Nov 1 11:32:21 2001
Date: Thu, 1 Nov 2001 11:32:20 -0500
From: "Ranveer Chandra"
To: "Emin Gun Sirer"
Subject: 615 PAPER 31

Semantic File Systems

This paper proposes a Semantic File System (SFS) that implements a file system layer on top of any existing file system. In this paper SFS is implemented over NFS, and supports virtual directories and content extraction. All accesses to files go through queries to the SFS layer to maintain metadata consistency and synchronization. SFS supports the creation of virtual directories, each directory pointing to files that satisfy a query.
SFS uses user-written transducers that are file-type-specific content extractors.

Semantic File Systems aim to help users organize their files by content and provide a convenient means to do so. It is extremely difficult to manage and query large file systems; for example, even experienced UNIX users end up trying /usr/bin or /usr/local/bin or /usr/sbin or something else! SFS provides an efficient mechanism to handle shared data. In addition, the idea of an attribute as a field-value pair is interesting and has been used in many later systems, as recently as INS for resource discovery. Furthermore, SFS concepts have been used to develop other resource discovery protocols, such as Discover in 1995.

Although SFS was proposed in 1991, it has still not become popular. One reason could be the large amount of space used by SFS, and the bigger reason could be that the hierarchical scheme presently in use has its own benefits. A hybrid scheme combining SFS and the hierarchical scheme would be an interesting area of future work.

From daehyun@csl.cornell.edu Thu Nov 1 11:46:55 2001
Date: Thu, 1 Nov 2001 11:46:48 -0500 (EST)
From: Daehyun Kim
To: egs@CS.Cornell.EDU
Subject: 615 PAPER 31

This paper presents an information storage system called the Semantic File System (SFS). The main idea of SFS is the user-programmable transducer.
The transducer is a filter whose input is the contents of a file and whose output is a set of file attributes, each of which is a field-value pair. Files stored in SFS are interpreted by the transducer to produce attributes. Later, the attributes are used for retrieval of the file. File access is based on queries that describe the desired attributes. Queries are boolean combinations of attributes. The result of a query is a set of files or directories.

Compatibility with existing file systems is an important issue. SFS provides it by introducing the concept of a virtual directory. Virtual directory names are interpreted as queries and provide access to files and directories in a manner compatible with existing file systems. The authors implemented an SFS that supports the NFS protocol as its external interface.

In my opinion, the main contribution of this paper is to introduce another layer of file system, which is supported by the transducer. The transducer acts as a filter between users and files and provides more flexible and versatile file access. It is also extensible by adding new transducers.

However, the latency of file access might be a problem. It is clear that SFS requires more computation than conventional file systems, so I'm not sure that SFS is suitable for heavily loaded file servers.

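The transducer-as-filter idea and the attribute queries described in this review can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the transducer, the extension-keyed transducer table, and the `all_of`/`none_of` query form are all hypothetical names invented here, and the paper's own query syntax and transducer interface may differ.

```python
import os


def text_transducer(path, contents):
    """Hypothetical transducer: turn a text file into (field, value) pairs."""
    attrs = {("ext", os.path.splitext(path)[1].lstrip("."))}
    for word in contents.lower().split():
        attrs.add(("text", word.strip(".,;:!?")))
    return attrs


# Assumed transducer table, keyed by file extension.
TRANSDUCERS = {".txt": text_transducer}


def index_file(index, path, contents):
    """Run the matching transducer and record the file's attributes."""
    ext = os.path.splitext(path)[1]
    transducer = TRANSDUCERS.get(ext, text_transducer)  # fall back to text
    index[path] = transducer(path, contents)


def query(index, all_of=(), none_of=()):
    """Return files that have every attribute in all_of and none in none_of."""
    return sorted(
        path for path, attrs in index.items()
        if all(a in attrs for a in all_of)
        and not any(a in attrs for a in none_of)
    )


index = {}
index_file(index, "a.txt", "Semantic file systems.")
index_file(index, "b.txt", "Ordinary file.")
print(query(index, all_of=[("text", "file")]))
print(query(index, all_of=[("text", "file")], none_of=[("text", "semantic")]))
```

The extra cost the review worries about shows up in `index_file`: every stored file is read and run through a transducer before it can be found by a query.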
From papadp@ee.cornell.edu Thu Nov 1 12:37:27 2001
Date: Thu, 01 Nov 2001 12:36:55 -0500
From: Panagiotis Papadimitratos
Reply-To: papadp@ece.cornell.edu
To: Emin Gun Sirer
Subject: 615 PAPER 31

Review of: "Semantic File Systems," by D.K. Gifford, P. Jouvelot, M.A. Sheldon, J.W. O'Toole

Panagiotis Papadimitratos
papadp@ece.cornell.edu

The Semantic File System (SFS) provides an abstraction over existing file systems by introducing associative access based on files' attributes. The attributes are content-based values, and the interface provided by SFS resembles that of classical file systems; an SFS access results in a 'virtual directory', i.e., a set of symbolic links to the associated files. Moreover, SFS provides a programming interface that allows for a finer-grained definition of attribute extraction. This is possible through user-defined 'transducers', i.e., tools that 'understand' the semantics of different types of files and generate indexing structures for the corresponding files. In effect, transducers define the framework of the querying mechanisms, and the resultant list of files, selected according to the sought attribute(s), constitutes a virtual directory whose name is in essence the name of a query.
The apparent benefits of such an approach can be summarized as the flexibility to define fully custom, user-defined associative access that can effectively support collaborative environments, and backward compatibility. The latter is due to the adherence to the file system interface (see above - symbolic links), which provides transparent access to the SFS structure. On the other hand, there is a trade-off between the latency of retrieving SFS data and the increasing size of the indices (and programmable units) that are needed to support such a system. This appears to hinder an SFS approach from co-existing with a 'legacy' system; the issue of interaction between SFS and non-SFS systems is a different one.

From samar@ece.cornell.edu Thu Nov 1 12:47:38 2001
Date: Thu, 1 Nov 2001 12:46:24 -0500 (EST)
From: Prince Samar
To: egs@CS.Cornell.EDU
Subject: 615 PAPER 31

31) Semantic File Systems

This paper presents the Semantic File System, which the authors believe is more efficient than traditional tree-structured file systems. The authors use an attribute-value pair naming scheme that gives the user a flexible and easy way to search for desired files. Files stored in a Semantic File System are interpreted by file-type-specific transducers to produce a set of attributes that facilitate later retrieval of the file. Transducers, which can be specified by the users, are maintained for each file.
Transducers identify the attribute-value pairs for a file, which can later be used for file searching. The name, consisting of attribute-value pairs, that a user specifies at the time of searching is translated into a tree. The file is accessed using the standard file system interface. Virtual links can be used to associate a file with multiple virtual directories.

The Semantic File System aims to help a user organize his files more efficiently, based on the content of the files, using the attribute-value pairs. This idea has been used in many systems, including the Intentional Naming System. However, as the authors point out, the implementation of real-time indexing may require a substantial amount of computing power at the semantic file server. This may hurt the latency of such a system.

From andre@CS.Cornell.EDU Thu Nov 1 12:57:13 2001
Date: Thu, 1 Nov 2001 12:59:06 +0100
From: André Allavena
To: egs@CS.Cornell.EDU
Cc: andre@CS.Cornell.EDU
Subject: 615 PAPER 31

Semantic File System

This paper introduces the idea of characterising each file in the file system with a small set of attributes (author, extension, keywords...).
The usual tree directory structure would be a specific case of this, with the name of the directory being an attribute. In other words, consider the file system to be a database into which all the files are automatically inserted with their attributes. These attributes allow much easier search and retrieval.

Note: filters need to be written for each kind of file; if not, the system won't be able to index it automatically.

Their implementation is transparent to the file system: each attribute is shown as a virtual directory under the FS. So ls, completion and so on work.

This is a really nice idea. I didn't see any development for Unix though (their implementation was BSD). Maybe people don't really need the power of this tool to look for files?

--
André Allavena
École Centrale Paris (France)
Cornell University (NY), PhD in Computer Science

From gupta@CS.Cornell.EDU Thu Nov 1 13:00:40 2001
Date: Thu, 1 Nov 2001 13:00:39 -0500 (EST)
From: Indranil Gupta
To: egs@CS.Cornell.EDU
Subject: 615 PAPER 31

Semantic File Systems, Gifford, Jouvelot, Sheldon and O'Toole Jr.
Reviewer: Indranil Gupta

This paper describes an overlay file system where users can view virtual directories by specifying semantic search criteria. These criteria are specified using transducers, which support attribute-value pairs only.
Indices are maintained per file by a periodic (2 minute) indexing process. As a result, the semantic file system might be out of date with respect to recent updates.

Comments:

Only individual file field-value attributes are considered. More general semantics for file systems (such as attributes for linked object files) are not considered.

The user has to program a transducer all by herself (from scratch) - no templates or abstractions are provided for this.

Updates are reflected lazily in the semantic file system. This is effectively a (fixed) tradeoff between the cost of actual actions on the file system and the level of consistency. This issue needs to be explored more in the scenario of a set of objects over an ad-hoc network. In other words, the tradeoff should not be static but should depend on access patterns, mobility, etc.
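The lazy update behaviour discussed in this review can be illustrated with a small sketch of one incremental indexing pass. This is an assumption-laden Python example, not the paper's indexer: the `reindex` function, the use of modification times to detect changes, and the in-memory `files` mapping are all hypothetical. Between passes the index is stale, and how often the pass runs is the knob that trades indexing cost against consistency.

```python
def reindex(index, seen_mtimes, files, transducer):
    """Run one incremental pass and return the sorted paths that were re-indexed.

    files maps path -> (mtime, contents). Only files whose mtime differs
    from the one recorded on the previous pass are run through the
    transducer again; entries for files that no longer exist are dropped.
    """
    updated = []
    for path, (mtime, contents) in files.items():
        if seen_mtimes.get(path) != mtime:
            index[path] = transducer(contents)
            seen_mtimes[path] = mtime
            updated.append(path)
    for path in set(index) - set(files):
        del index[path]
        seen_mtimes.pop(path, None)
    return sorted(updated)


def word_transducer(contents):
    """Hypothetical transducer: the set of words in the file."""
    return set(contents.split())


index, seen = {}, {}
files = {"a": (1, "x y"), "b": (1, "z")}
print(reindex(index, seen, files, word_transducer))  # first pass indexes everything
files["a"] = (2, "x")                                # a is modified after the pass
print(index["a"])                                    # still the old attributes
print(reindex(index, seen, files, word_transducer))  # next pass picks up the change
```

An adaptive variant of the kind the review asks for would choose the interval between calls to `reindex` from observed access patterns rather than fixing it in advance.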