From hs247@cornell.edu Tue Oct 8 00:25:04 2002 Received: from mailout5-0.nyroc.rr.com (mailout5-0.nyroc.rr.com [24.92.226.122]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g984P3h08587 for ; Tue, 8 Oct 2002 00:25:03 -0400 (EDT) Received: from hubby.cornell.edu (syr-24-58-42-130.twcny.rr.com [24.58.42.130]) by mailout5-0.nyroc.rr.com (8.11.6/RoadRunner 1.20) with ESMTP id g984P1p25891 for ; Tue, 8 Oct 2002 00:25:01 -0400 (EDT) Message-Id: <5.1.0.14.2.20021008002458.00b86b78@postoffice2.mail.cornell.edu> X-Sender: hs247@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 08 Oct 2002 00:25:10 -0400 To: egs@CS.Cornell.EDU From: Hubert Sun Subject: 615 Paper 31 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed
The semantic file system is introduced in this paper. A semantic file system is a file system where a user can request a listing of directories or files based on their attributes, e.g. all the files that were created between December 21st and January 5th. In their implementation, these attributes are exposed as virtual directories on top of existing file systems. The file system described offers two interfaces: the user interface and the application interface. The application interface allows programmers to define transducers. Transducers filter through files and directories and gather attribute information. An example described is a transducer that searches through C files and figures out what they import or what their methods are. A user can then query the system through the user interface and find all files that include "iostream.h". So how does this apply to ad-hoc networks and naming services? One could imagine this file system running on a distributed system or ad-hoc network. (How this is done will be glossed over; we'll just assume it can be.) Then we can imagine services to be file descriptors. 
When a service is added to the system, we can write special transducers for it. For a printer, we could sort by location, colour/non-colour, laser/bubblejet, etc. If a person then wanted to find the closest printer, all he would have to do is query for all printers listed by location, or query directly for the closest printer. As with INS, the problem of finding the printer closest to Alice remains: one would have to know where Alice is. Caching this information could help, but the cached information may not be up to date. From the file system's perspective, since Alice is not a file or directory, we might also have to modify the transducers to track users on the system. Though the paper does describe this system as a possibility for a distributed file system, it doesn't mention anything about mobile nodes or ad-hoc networks. For the file system to be consistent, when a file is added, the file and its attribute information have to be propagated to all nodes. One could look at a semantic file system as a database. The data is all the files and directories in the system. We can then form views, tables or indexes over this data (via transducers). A user could then use a query language like SQL to do searches (e.g. select printers with resolution = 300). One open problem, however, is how this applies to ad-hoc networks, where data can change very frequently. 
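The transducer example from this review (C files indexed by what they include, then queried for "iostream.h") can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the in-memory `files` dictionary, and the index layout are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical pattern for C-style include lines; not taken from the paper.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def c_include_transducer(text):
    """Return (attribute, value) pairs, one per #include found in the text."""
    return [("include", name) for name in INCLUDE_RE.findall(text)]


def build_index(files, transducer):
    """Map each (attribute, value) pair to the set of file names carrying it."""
    index = defaultdict(set)
    for name, text in files.items():
        for pair in transducer(text):
            index[pair].add(name)
    return index


def virtual_directory(index, attribute, value):
    """List the files a query such as include:iostream.h would expose."""
    return sorted(index.get((attribute, value), set()))


# Stand-in for real files on disk (assumed contents).
files = {
    "main.c": '#include <iostream.h>\n#include "util.h"\nint main() { return 0; }\n',
    "util.c": '#include "util.h"\n',
}
index = build_index(files, c_include_transducer)
print(virtual_directory(index, "include", "iostream.h"))
```

A printer "file" could be indexed the same way, with a transducer that reports attributes such as location or resolution instead of includes.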
From mvp9@cornell.edu Tue Oct 8 00:57:59 2002 Received: from postoffice.mail.cornell.edu (postoffice.mail.cornell.edu [132.236.56.7]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g984vwh14226 for ; Tue, 8 Oct 2002 00:57:59 -0400 (EDT) Received: from zoopark.cornell.edu (syr-24-58-46-186.twcny.rr.com [24.58.46.186]) by postoffice.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id AAA11350 for ; Tue, 8 Oct 2002 00:57:56 -0400 (EDT) Message-Id: <5.1.0.14.2.20021008005721.01aa0360@postoffice.mail.cornell.edu> X-Sender: mvp9@postoffice.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 08 Oct 2002 00:57:58 -0400 To: egs@CS.Cornell.EDU From: mike polyakov Subject: 615 PAPER 31 Mime-Version: 1.0 Content-Type: text/html; charset="us-ascii"
This paper presents a different type of file system, one that is indexed by the meanings (semantics) of documents, not their physical location. The file system is layered on top of the existing one, such that no additional software or browsers are necessary for clients. Directories, files, and components of files are periodically indexed to allow creation of categorization in the form of virtual directories on the fly. The benefits are twofold. First, no new file system or software to interact with it needs to be created. Second, so-called “transducers,” which are essentially sophisticated filters, allow users to submit arbitrary, complicated queries. The performance is also reasonable.
        The SFS presents a more organized and efficient tool for the tasks Unix operators have performed for decades with command-line utilities. The functionality in fact seems to be a subset of Perl, although the mysterious ‘transducers’ are never described in any detail. The real improvement is in speed, due to proactive indexing. The authors claim that under expected use the indexing performs relatively well (that is, the system is expected to ‘converge’ to consistency). The weakest part of the paper is the evaluation, although analysis across the variety of environments and loads inherent in the task is a formidable challenge. Still, it would be nice to take some relevant task for which a simple command is not readily available, like looking through articles by category, and compare the lookup time of average users with and without the system.
        Besides the many extensions proposed in the paper itself, several come to mind. Although lookup is much less intensive than indexing, some sort of caching could be used in complicated scenarios. Similarly, how can performance be improved in the presence of multiple users, and will they in fact interfere with each other? But in the end, my biggest question is: how useful is this, really? Directory names are supposed to reflect contents, and if parallel attributes are desired, databases present a more obvious solution. For sorting files by content, perl, awk, and grep suffice. How much does this improve the efficiency of the average user/programmer?
From shafat@CS.Cornell.EDU Tue Oct 8 01:53:19 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g985rJh24307 for ; Tue, 8 Oct 2002 01:53:19 -0400 (EDT) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Subject: 615 PAPER 31 X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3 Date: Tue, 8 Oct 2002 01:53:19 -0400 Message-ID: <47BCBC2A65D1D5478176F5615EA7976D134FA1@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 PAPER 31 Thread-Index: AcJtmzmh9xvpa3m4SuSA/tQXKQLL7w== From: "Syed Shafat Zaman" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id g985rJh24307
This paper implements an information storage system called the Semantic File System (SFS), which aims to present a more effective storage abstraction for information sharing and command-level programming. In SFS, user-programmable "transducers" are used to extract attributes from files and directories. A host of associative access facilities is provided to help the user discover and share relevant file objects. SFS is a query-based file system that creates virtual directories based on the attributes the user asks for. A transducer table maintains the list of transducers to be used with different file types; the extracted attributes are then used to filter files and create the virtual directories. SFS can be integrated into existing file systems, and the paper discusses at length its implementation on top of NFS. The paper also presents a fair amount of experimental results that investigate SFS's performance and its effectiveness as a storage abstraction. I got the feeling that a lot of numerical figures were stated in the evaluation section without sufficient elaboration on their relevance or importance to the tests. 
The paper did not do a strong job of explaining transducers, which are essentially the heart of the system. It fails to address how these transducers are written, and how they can be developed promptly to handle files of all possible types. In fact, this appears to be a major drawback of the system. One or two examples of SFS usage in application programs would also have been helpful. In the scenario considered last week, where a user is wandering around a building with a laptop connected to the wireless network and looking for the nearest printer, SFS can be useful only if the system maintains a "location" attribute associated with each object. Every device/object has to be represented as a file on the network system, and only then can SFS be used to locate a service meeting a set of requirements. However, my guess is that in mobile networks SFS will not be able to function effectively because of the dynamic nature of the network. The location attributes have to be updated constantly, and for queries with a large set of attributes the overhead might be just too high for the current version of SFS to handle. 
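The transducer table mentioned in this review could look something like the sketch below: a mapping from file type to the transducer that handles it, with unknown types simply yielding no attributes. The suffixes, the two toy transducers, and the function names are all hypothetical; the paper's actual table format is not reproduced here.

```python
import os


def mail_transducer(text):
    """Extract from/to attributes from simple 'From:' and 'To:' lines."""
    pairs = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.lower() in ("from", "to"):
            pairs.append((key.lower(), value.strip()))
    return pairs


def text_transducer(text):
    """Extract an author attribute from a leading 'Author:' line, if present."""
    first = text.splitlines()[0] if text else ""
    if first.lower().startswith("author:"):
        return [("author", first.split(":", 1)[1].strip())]
    return []


# Hypothetical transducer table keyed by file suffix (an assumption).
TRANSDUCER_TABLE = {
    ".mail": mail_transducer,
    ".txt": text_transducer,
}


def extract_attributes(filename, text):
    """Pick the transducer for this file type; unknown types yield nothing."""
    suffix = os.path.splitext(filename)[1]
    transducer = TRANSDUCER_TABLE.get(suffix)
    return transducer(text) if transducer else []


print(extract_attributes("note.mail", "From: alice\nTo: bob\n\nhello"))
```

The empty result for unknown suffixes illustrates the drawback raised above: a file type without a transducer is invisible to attribute queries until someone writes one.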
From bd39@cornell.edu Tue Oct 8 02:11:39 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g986Bch27619 for ; Tue, 8 Oct 2002 02:11:38 -0400 (EDT) Received: from boweilaptop.cornell.edu (r102439.resnet.cornell.edu [128.253.163.42]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id CAA17476 for ; Tue, 8 Oct 2002 02:11:38 -0400 (EDT) Message-Id: <5.1.0.14.2.20021008020952.00b75140@postoffice2.mail.cornell.edu> X-Sender: bd39@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 08 Oct 2002 02:10:17 -0400 To: egs@CS.Cornell.EDU From: Bowei Du Subject: 615 PAPER 31 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed
Paper 31 Semantic Filesystems
The main contribution of this paper is the ability for an operating system to extract file attribute information from the contents of a file through application-specific "transducers" and to organize this information through a virtual file system. User-written transducers examine the contents of files and expose attribute/value pairs, which are indexed in the file system. The semantic information is integrated with existing file system structures through the use of virtual directories, which represent semantic content as "attrib:/value/" directory entries. A query over several attributes is represented as the logical AND of the attributes specified in the directory path. These hierarchies can work in the same way as attribute trees in the Intentional Naming System. (One can imagine replacing INS queries with this file system structure.) Instead of simply indexing files, the transducers could index services and present them in a global namespace, in a manner similar to AFS. The advantage of such a scheme is that the naming mechanism is easily integrated into existing schemes for naming operating system resources. 
For example, we could use "/location:/alice/printers:/lpr1" to print a file near Alice, which would be transparent to the applications using the printer. New search criteria could be implemented by writing new transducers. The big question is how this scheme would be implemented in an ad-hoc network. How will consistency of the directory information be maintained? How will transducers be distributed into the network? I thought the on-demand generation of the directories was a good idea; one can imagine flooding the network with a query and caching the response at the local node, a la on-demand (reactive) routing protocols. Caching would improve performance only if repeated queries were directed at the same category. In the Alice example, if Alice were highly mobile, then caching would be of little use: every time we queried the cached state, we would end up with a broken link. However, resources that stay put would benefit, i.e. printers, not Alice.
=== Active Names
Active Names contributes an interesting idea: that a resource name can also be bound to the processing of the named resource, i.e. how it is located, transported, etc. The idea is that some intelligence may be needed in the use of a service, and that this can be described in the name of the service. This is somewhat similar to INS in that the service, not the server, is named. Active Names assumes the ability to execute mobile code to provide the services requested in the name. Active Names associates names with namespace programs, which are pieces of mobile code that can be downloaded and run on any Active Name server. A request is transformed by the named services in a pipeline fashion, one service applied after another. Effects that need to be applied to the data after a service is performed are carried with the data in the form of small "after effect" code snippets. 
For the purpose of naming similar to INS and the Semantic Filesystem, one would write an Active Name service that performs the requested query. In the Alice example, we would have an Active Name service "Find Nearest Printer". Active Names offers a very flexible framework in which to perform queries: essentially any program can be written. The composition of services is also very interesting: there could be one service that performs load balancing and another that locates devices close to a location. Composing the two in a query would yield the least-loaded device at a nearby location. Caching of Active Name programs would reduce bandwidth consumption, and intelligent use would reduce it further (the PDA graphics example). I would imagine Active Name programs would also cache some of their state between runs. One problem with Active Names is that the functionality of the system is too general. Basically any program can be an Active Name service. Active Names suggests that services can be made to work in a pipeline/interchangeable fashion, and also that services in a network can be mobile, moving from node to node. Beyond that, the functionality of the system is basically wide open. 
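The composition described in this review (a locality service followed by a load-balancing service) can be illustrated with ordinary function composition. This is only a sketch of the pipeline idea under invented data: the printer records, field names, and stage functions are hypothetical, and nothing here models mobile code or real Active Names resolvers.

```python
from functools import reduce

# Hypothetical device records; names and fields are made up for the example.
PRINTERS = [
    {"name": "lpr1", "floor": 3, "queue_length": 4},
    {"name": "lpr2", "floor": 3, "queue_length": 1},
    {"name": "lpr3", "floor": 1, "queue_length": 0},
]


def near_floor(floor):
    """Build a stage that keeps only devices on the given floor."""
    def stage(devices):
        return [d for d in devices if d["floor"] == floor]
    return stage


def least_loaded(devices):
    """Stage that keeps the single device with the shortest queue."""
    return sorted(devices, key=lambda d: d["queue_length"])[:1]


def compose(*stages):
    """Chain stages so each one transforms the previous stage's output."""
    def pipeline(devices):
        return reduce(lambda acc, stage: stage(acc), stages, devices)
    return pipeline


find_printer = compose(near_floor(3), least_loaded)
print(find_printer(PRINTERS)[0]["name"])
```

Swapping the order of the stages changes the answer (the globally least-loaded printer may not be nearby), which is one reason the ordering of composed services matters.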
From jsy6@postoffice2.mail.cornell.edu Tue Oct 8 02:21:28 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g986LSh29335 for ; Tue, 8 Oct 2002 02:21:28 -0400 (EDT) Received: from Janet.cornell.edu (syr-24-58-41-193.twcny.rr.com [24.58.41.193]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id CAA03175 for ; Tue, 8 Oct 2002 02:21:26 -0400 (EDT) Message-Id: <5.1.0.14.2.20021008010217.00b49e90@postoffice2.mail.cornell.edu> X-Sender: jsy6@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 08 Oct 2002 02:21:10 -0400 To: egs@CS.Cornell.EDU From: Janet Suzie Yoon Subject: 615 PAPER 31 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed
The intent of INS is to complement and not replace DNS. Likewise, Active Names is a flexible name resolution scheme that is only meant to extend the current Internet Domain Name System. Active Names and INS share some goals. Both use naming to describe intent rather than location, but the naming abstraction provided by Active Names is programmable. Active Names was created for wide-area distributed services. The major contributions of Active Names are its extensibility, location independence, composability, and efficient use of network resources. Active Names are similar to DNS in that they form hierarchical namespaces. Each namespace has a program associated with it for interpreting that namespace in any desired fashion. The program associated with a namespace is selected by the owner of the namespace. The client is the owner of the root namespace. Unlike DNS, a user only needs to name the service they wish to use and not the specific transport protocol. Suppose we want to find the printer closest to Alice. We will either need a location-support system integrated into the system, or we must pre-compute relative geographical distances within the building. 
The hierarchical namespaces of Active Names could correspond to a hierarchical geographical representation of the printers. The query then first finds the closest printers with respect to the same building, then floor, then room, and then actual position in the room.
From mr228@cornell.edu Tue Oct 8 03:53:42 2002 Received: from cornell.edu (cornell.edu [132.236.56.6]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g987rgh15729 for ; Tue, 8 Oct 2002 03:53:42 -0400 (EDT) Received: from cornell.edu (pptp-032.cs.cornell.edu [128.84.227.32]) by cornell.edu (8.9.3/8.9.3) with ESMTP id DAA03908 for ; Tue, 8 Oct 2002 03:53:42 -0400 (EDT) Message-ID: <3DA28F1D.14AAE3A@cornell.edu> Date: Tue, 08 Oct 2002 03:54:05 -0400 From: Mark Robson X-Mailer: Mozilla 4.76 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: egs@CS.Cornell.EDU Subject: 615 PAPER 31 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit
Semantic File Systems is an interesting new way to look at data. SFS takes all the files in a filesystem and extracts metadata from them via transducers. Transducers are small programs that tell SFS how to extract various types of metadata from the files. As image files would probably be processed very differently than text files, there is (potentially) a different transducer for each file type. Instead of (or maybe in addition to) the traditional way of looking at files and directories, SFS proposes that files be grouped into virtual directories. Virtual directories are nothing more than a result set for some query. A virtual directory might be "all the files created after date D" or "all the files whose size is X", etc. The paper argues that this is a more natural (read: better) way to look at your data. While it's not clear how this would be immediately applied to ad-hoc networks, there are some obvious wins. 
If you let services be files and attributes of the services be equivalent to the files' metadata, then you have a system much like INS. The problems arise when figuring out where the data is stored, cached, etc. Is there in-network processing? That is, who is responsible for applying the transducers to the data, who gets the results of this processing, etc.? Future work might explore more precisely what it would take to implement this in an ad-hoc world -- either as is, or modified for services and their attributes.
From xz56@cornell.edu Tue Oct 8 04:57:34 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g988vYh28101 for ; Tue, 8 Oct 2002 04:57:34 -0400 (EDT) Received: from XIN (ex120.dialup.cornell.edu [132.236.102.120]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with SMTP id EAA16056 for ; Tue, 8 Oct 2002 04:57:32 -0400 (EDT) Message-ID: <004501c26ea8$bc24e1d0$af66ec84@XIN> From: "Xin Zhang" To: "Emin Gun Sirer" Subject: 615 PAPER 31 Date: Tue, 8 Oct 2002 04:53:15 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2600.0000 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Semantic File System presents a file system in which files can be accessed with the aid of automatic extraction and indexing of file features, which helps in information searching, sharing and programming. Constructed on top of the traditional tree file structure, it is highly compatible. (INS also seeks compatibility with some traditional lower layers.) Access to files, much like nodes in ad hoc networks accessing services, can be demand-oriented (instead of, as traditionally, through locations). So property extraction also takes the form of attribute-value pairs. The programs that extract these av-pairs are called "transducers". 
By arbitrarily designing transducers, different demands (for the file being searched for) can be met. From my understanding, the semantic file system is much like INS (or I should say INS-99 is like semantic-91). Here, the transducer is just like the whole set of INRs. They choose the file, or forward the packet to the server, according to the demand expressed in av-pairs. The only difference is that INRs work distributively and the ad hoc network is more dynamic. So performance in the semantic file system should be improvable through caching.
From vrg3@cornell.edu Tue Oct 8 10:26:05 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98EQ4h11529 for ; Tue, 8 Oct 2002 10:26:05 -0400 (EDT) Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id KAA00285; Tue, 8 Oct 2002 10:26:03 -0400 (EDT) Date: Tue, 8 Oct 2002 10:26:03 -0400 (EDT) From: vrg3@cornell.edu X-Sender: vrg3@travelers.mail.cornell.edu To: egs@CS.Cornell.EDU Subject: 615 PAPER 31 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII
This paper presents the concept of Semantic File Systems. Traditional filesystems, like traditional network naming schemes, are robotically based on a hierarchy of an abstract concept of location or position. SFSs index files based on more useful properties, such as libraries used (when referring to source code) or genre of music (when referring to MPEG audio and/or video). This type of metadata is extracted from files using an extensible set of "transducers." The attributes and values of a file determine where it can stand in the virtual directory hierarchy. A virtual directory is essentially a directory whose name represents a search query and whose contents represent the results of the search. 
Any application on the system which accesses files can do so using the SFS, so the natural description of a file can be used to locate it at all times, leaving any underlying traditional directory structure hidden. Although the paper presents the concept in terms of files, in UNIX everything is a file anyway, so we could also consider the same scheme to organize and locate nodes of a network. With our printer example, a printer transducer might query the spooler for its properties. It is unclear, however, how often to update the table of attribute values. For simple data files it makes sense to update whenever the file is accessed for writing, but for files which represent other things you would have to do periodic updates as well. Incorporating support for requests like "nearest printer" would probably be best done using application-level location determination, followed by searches on location. A "printer nearest Alice" query would work the same way, by first determining Alice's location and then searching on that location. 
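The two-step "printer nearest Alice" lookup suggested in this review (first determine Alice's location, then search on that location) might be sketched as below. The tables, location tuples, and printer names are invented for illustration; how the location would actually be determined is left open, as it is in the review.

```python
# Hypothetical application-level location table (an assumption, not from the paper).
USER_LOCATIONS = {"alice": ("upson", 4)}

# Hypothetical index from a location attribute value to printer names.
PRINTER_INDEX = {
    ("upson", 4): ["lpr-upson-4a"],
    ("upson", 3): ["lpr-upson-3a", "lpr-upson-3b"],
    ("rhodes", 4): ["lpr-rhodes-4a"],
}


def locate_user(user):
    """Step 1: application-level location determination (a table lookup here)."""
    return USER_LOCATIONS.get(user)


def printers_near(user):
    """Step 2: search the printer index on the location found in step 1."""
    location = locate_user(user)
    if location is None:
        return []  # unknown user: nothing to search on
    return PRINTER_INDEX.get(location, [])


print(printers_near("alice"))
```

If Alice moves, only `USER_LOCATIONS` needs to change, while the printer index stays valid. That matches the observation elsewhere in these reviews that stationary resources are the ones that index and cache well.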
From kwalsh@CS.Cornell.EDU Tue Oct 8 11:03:06 2002 Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98F36h19440 for ; Tue, 8 Oct 2002 11:03:06 -0400 (EDT) Received: from localhost (larry.cs.duke.edu [152.3.140.75]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id LAA03169 for ; Tue, 8 Oct 2002 11:03:05 -0400 (EDT) From: kwalsh@CS.Cornell.EDU Received: from 132.236.29.70 ( [132.236.29.70]) as user walsh@imap.cs.duke.edu by login.cs.duke.edu with HTTP; Tue, 8 Oct 2002 11:03:05 -0400 Message-ID: <1034089385.3da2f3a9a131e@login.cs.duke.edu> Date: Tue, 8 Oct 2002 11:03:05 -0400 To: egs@CS.Cornell.EDU Subject: 615 PAPER 31 MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit User-Agent: Internet Messaging Program (IMP) 3.0 X-Originating-IP: 132.236.29.70
Semantic File Systems
The SFS system presents a natural extension to hierarchical file systems in which virtual directories are created on demand based on user queries. Little direct attention is given to the query language, other than to state that it allows boolean operations on attribute/value pairs. The only matching operator is apparently '='. The authors do mention a more flexible query language as future work. It appears much simpler in SFS than in INS to add, remove, and restructure the attribute names, as no meaning is given to them whatsoever. As with INS, though, it seems difficult to perform queries such as "nearest object". The same techniques as in INS would work, such as a ring search or a better query language. Since SFS is concerned only with file system contents, it is easy for the server to track and manage a cache of past queries. Changes to file contents necessarily pass through the SFS server, which then has the opportunity to update or invalidate cache entries. SFS might be applicable to ad hoc networks in at least two ways. 
First, traditional network file system tasks are very heavyweight (e.g., all searching or filtering is done on the client). With SFS, much of this work can be efficiently offloaded to the server, reducing the network load in much the same way as SQL stored procedures. Second, a distributed version of SFS might serve as a naming mechanism for services and objects in an ad hoc network. This idea, however, suffers from the same problems as INS.
Active Names
Through user-extensible routing and name resolution, Active Names gains extraordinary flexibility. It is the only system of the three that could directly support "nearest" operators, or operators which simultaneously balance multiple metrics ("nearest" and "least loaded" and "fastest", etc.). These would all be implemented as user-defined extensions, uploaded into the Active Name resolvers. This flexibility comes at a high price, of course. In order to be useful, installations will need to populate resolvers with many types of name-resolution programs, filters, and routing mechanisms, since individual users cannot routinely be expected to do so. 
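The server-side query cache described in the SFS part of this review (every write passes through the server, which can then invalidate cached results) can be sketched as follows. The class, its method names, and the choice to clear the whole cache on any write are simplifying assumptions for illustration; the paper's actual cache policy is not reproduced here.

```python
class CachingIndex:
    """Toy attribute index that caches conjunctive query results."""

    def __init__(self):
        self.attrs = {}  # file name -> set of (attribute, value) pairs
        self.cache = {}  # frozenset of queried pairs -> sorted result list

    def write(self, name, pairs):
        """All updates pass through here, so stale results can be dropped."""
        self.attrs[name] = set(pairs)
        self.cache.clear()  # coarse invalidation: an assumption, not SFS's policy

    def query(self, *pairs):
        """Return files carrying every queried pair (logical AND), using the cache."""
        key = frozenset(pairs)
        if key not in self.cache:
            self.cache[key] = sorted(
                name for name, have in self.attrs.items() if key <= have
            )
        return self.cache[key]


idx = CachingIndex()
idx.write("a.c", [("include", "stdio.h")])
print(idx.query(("include", "stdio.h")))
idx.write("b.c", [("include", "stdio.h"), ("author", "bob")])
print(idx.query(("include", "stdio.h")))
```

A finer-grained design would invalidate only the cached queries whose attribute pairs overlap the changed file, at the cost of more bookkeeping.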
From smw17@cornell.edu Tue Oct 8 11:13:53 2002 Received: from cornell.edu (cornell.edu [132.236.56.6]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98FDqh21580 for ; Tue, 8 Oct 2002 11:13:53 -0400 (EDT) Received: from cornell.edu (syr-24-161-107-202.twcny.rr.com [24.161.107.202]) by cornell.edu (8.9.3/8.9.3) with ESMTP id LAA11831 for ; Tue, 8 Oct 2002 11:13:52 -0400 (EDT) Message-ID: <3DA2F557.3090401@cornell.edu> Date: Tue, 08 Oct 2002 11:10:15 -0400 From: Sean Welch User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4.1) Gecko/20020508 Netscape6/6.2.3 X-Accept-Language: en-us MIME-Version: 1.0 To: Emin Gun Sirer Subject: 615 PAPER 31 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit
Intentional Naming System - The Intentional Naming System (INS) is a naming methodology intended to combine resource discovery and routing into a single service/system. The end system is similar in some respects to DNS, in that there is a pre-defined, well-known server (the DSR) that coordinates the creation and maintenance of the Intentional Name Resolvers (INRs) into a well-defined tree. Node attributes are application-specified attribute-value pairs (av pairs) arranged hierarchically into a tree structure layered above the IP layer in the network stack. The protocol operates in close conjunction with the application layer, with applications specifying av pairs and routing metrics and providing the periodic service updates necessary to maintain freshness in the INS network. User applications provide the naming service with the intended service rather than a network address, and the INS makes its best effort to deliver the message to the optimal host or to the defined subset of nodes matching the naming criteria. 
INS operates in close conjunction with the application layer, which is responsible for defining attributes and metrics in a way that makes this system useful. The INS implements a routing protocol based on the services requested and the advertised application metrics. While this may be reasonable in the average static case, the proactive, centralized network creation and maintenance limits the applicability of the presented algorithm to more mobile situations. In addition, while it is undoubtedly easier to push the problem of metrics up to the application layer, it does not make the problem inherently simpler. A user looking for the nearest printer, for instance, would most likely prefer a printer two rooms down over a physically nearer printer on a lower floor. The examples presented are depictions of largely static networks, such as a camera in the White House. Extending this mechanism to mobile nodes will likely require a more sophisticated routing and update algorithm to achieve reasonable performance.
Active Names - Active Names is another mechanism for resource/service discovery and transport through naming control. In contrast to INS, the Active Names protocol achieves routing by distributing the routing and processing throughout the network. Each routing path consists of (potentially) a series of name resolution steps at various resolver nodes throughout the network. These are distributed, location-independent functions that, when applied in series, provide the routing functionality. The advantages of an active naming scheme are that it allows for a very flexible, extensible system capable of encompassing many different types of services; it provides for distribution of various tasks throughout the network, potentially permitting better tradeoffs between processing required and transmission bandwidth; and it allows the client to define after-methods to apply to the returned data to make better use of available resources. 
Active Names is an interesting concept. Distributing the load in a dynamic manner should allow the Active Names system to be more applicable in high-mobility situations than INS. In addition, the location-independent functional behavior could also be advantageous in heterogeneous or highly congested networks, by adjusting the load distribution based on actual or measured capacity. Unfortunately, dynamic distribution of executable code introduces a number of problems. First, an active network system must consist of nodes capable of executing platform-independent code. While this may not be a problem for PC-type systems in wired networks, extending this to an ad-hoc PDA network may have a significant impact on the networked machines. Second, there are a number of security concerns inherent in allowing arbitrary code to be distributed as part of a routing protocol. Even with secure authentication, a single compromised system may be capable of infecting numerous other systems also running an active name system. Maintaining acceptable levels of system security becomes a more difficult problem when running active names schemes, especially in the case of a standardized routing architecture with a standard hardware and software configuration. Finally, there is the same problem in an active naming system as in the INS discussed above. While Active Names does provide a mechanism to distribute the routing load for better performance, it does not solve the basic problem of how to convert traditional routing metrics into user expectations without explicit outside information.
Semantic Filesystems - Semantic Filesystems are a filesystem interface that modifies the traditional file system tree to create a virtual filesystem. This virtual filesystem is composed of files combined with a collection of transducers, which are special programs that can extract the set of attributes from a given type of file. 
Examples presented include author for text files, imports and exports from various types of code files, and {from, to} attributes from mail files. Accessing the semantic file system includes a query implicit in the file reference that searches the transducer outputs for files matching the queried attributes. Multiple conjunctive searches are permitted, but disjunctive searches had not yet been implemented. Recent searches are cached for improved performance, and only partial re-indexing is performed during operational updates and modifications, with scheduled full re-indexing. Semantic filesystems are an idea that may actually be better suited to network resource discovery than to filesystems as presented here. In networks, the larger latency for data transmission and lower bandwidth may make a query-based implementation more attractive, especially in ad-hoc networks where the cost of at-node processing is less than that of transmission over a wireless link. This is similar to the intentional naming system, where the transducers have been replaced with application-defined values and metrics, and where the filesystem has been replaced by a resource naming scheme.

a) Rapid/Harsh Environment Sensor Networks
- Concept - Enable the rapid deployment of lightweight sensor nodes, potentially in hostile or remote locations that make conventional deployments unattractive (some similarities to smart dust systems).
- Active networks may have some attractive implementations in heterogeneous systems by allowing more powerful nodes to take on more of the communications processing loads. INS does not appear particularly advantageous for this style of network.
b) Appliance Networks
- Concept - Enable intelligent appliances and industrial systems capable of inter-system coordination for better resource use.
- Intentional naming schemes may be a useful abstraction in these types of networks, as the name structure itself is conducive to specifying and locating different systems and classes of systems with minimal load on the devices themselves. This concept may permit better integration of low computational power devices into a heterogeneous network (such as a factory floor or home kitchen) at the cost of some initial setup or pre-defined discovery script. Active naming may also be useful, as it provides both a mechanism to offload processing from relatively dumb devices and a means to distill returned data into a more efficient form based on the desired return destination.

c) Intelligent Resource Detection and Utilization
- Concept - Provide a meaningful interface to human users, such as 'print to closest printer' rather than 'print to device attached to node 172.22.5.233'.
- Naming Systems - The naming systems suggested above can be used to support some degree of intelligent resource use. The critical issue in the case of the INS is that the quality of the results will depend heavily on the intelligence of the application layer controlling it. The command "find the nearest printer" is a fairly simple concept to a human, but defining its precise meaning in a heterogeneous computing environment is considerably more involved. Using the example of a hybrid wired/wireless network, any application implementing a hop count will see considerable variation in the number of hops per meter of 'real' distance, and may still return a printer that is physically close but inconvenient. Neither scheme provides a way to translate simple user concepts, or to separate nodes into useful subdivisions (such as the set of nodes present in a room), without external sources of information.
Active Names suffers from a similar problem. It may be possible to do better than INS by delaying more precise determinations until you are closer to the destination (where presumably more and better information may be available), but the issue mentioned above remains. Neither system provides an improved mechanism for translating human language concepts into effective algorithms; each merely pushes the issue to higher level protocols and explicit knowledge.

d) Movement Aware Routing
- Concept - Use the location information to judge the movement rate, and to estimate when link breakage is probable. From here, schemes such as intelligent route invalidation and link forwarding/handoff can be implemented.
- Pseudo-static routes - From the network information, provide a mechanism for nodes to identify and make efficient use of relatively stationary, stable routes.
- More a network connectivity issue below the IP layer, with little interplay with these naming schemes.

e) Adaptive Link Adjustment
- Concept - Use geographic feedback and either directional or adjustable power systems to improve link coverage in sparse areas.
- Again, more an issue of network connectivity (PHY/DLC/MAC), sitting well below the area affected by these naming protocols.

Caching - Both systems can potentially benefit from caching to improve overall system performance. The INS structure makes caching relatively simple to implement, as the data streams are referenced in a relatively clear, unambiguous manner. The basic cache here is not particularly different from a standard cache with regard to data content, but the naming structure inherent in INS may well provide more efficient use of the cache for popular repeating or often updated data sources. Active names can also benefit from caching, and additionally offer the possibility of using downstream caches as a simplistic form of a dynamic server.
By caching not only the data, but enough program functionality to implement some basic functions, the authors suggest that cache performance may improve significantly over simple static caches. They suggest that implementing an active cache through their active naming system may increase the utility of web caches, traditionally of limited effectiveness, by also caching and offloading some limited functional complexity (for instance, an ad determination function for banner ads).

From tmroeder@CS.Cornell.EDU Tue Oct 8 11:47:09 2002 Received: from dhcp99-233.cs.cornell.edu (dhcp99-233.cs.cornell.edu [128.84.99.233]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98Fl9h29603 for ; Tue, 8 Oct 2002 11:47:09 -0400 (EDT) Received: (from tmroeder@localhost) by dhcp99-233.cs.cornell.edu (8.11.6/8.11.6) id g98FjPw02514; Tue, 8 Oct 2002 11:45:25 -0400 From: Thomas Roeder MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15778.64917.21515.71918@dhcp99-233.cs.cornell.edu> Date: Tue, 8 Oct 2002 11:45:25 -0400 To: Emin Gun Sirer Subject: 615 PAPER #31 X-Mailer: VM 7.07 under Emacs 21.2.1

The semantic filesystems paper describes an addition to the standard filesystem protocols which allows properties of files to be searched using virtual directories, which are computed on the fly by transducers specific to given file types. Although this is an old paper, the ideas apply well to using mobile devices in resource discovery. If, for instance, a host fileserver annotated its files with geographic location from one of the protocols from last week, a mobile node could query for all files which are geographically near. In general, since files can be used as an abstraction for processes, and named pipes could be set up to printers, the filesystem abstraction allows us to compute geographical nearness using the Semantic Filesystems.
It seems to be a strange workaround more than a solution, however, and ActiveNames seems to use a similar adapter pattern, this time interpreting data in-flight, rather than indexing files, to better effect. The Active Names protocol allows a chain of programs to be constructed to and from a service; these programs move between servers to transform the data according to their whim (and hopefully according to some metrics to improve performance). Here, given a localization, we can specify the "printer nearest me" by a name resolution program for printing which wanders the network via some geographic search until it minimizes the distance metric (or gets close enough). Caching would indeed help here, so that we would not have to wander more than once. This would work well for printers, but not particularly well for the "printer nearest Alice", unless Alice were asleep, or otherwise relatively immobile. The accuracy of the searches in Active Names depends greatly on the relative mobility of the services in the network, which bodes well for increasing the performance of HTTP, and not so well for IRC.

From ag75@cornell.edu Tue Oct 8 12:02:13 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98G2Dh03193 for ; Tue, 8 Oct 2002 12:02:13 -0400 (EDT) Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id MAA06784 for ; Tue, 8 Oct 2002 12:02:10 -0400 (EDT) Date: Tue, 8 Oct 2002 12:02:10 -0400 (EDT) From: ag75@cornell.edu X-Sender: ag75@travelers.mail.cornell.edu To: egs@CS.Cornell.EDU Subject: 615 PAPER 31 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

In this paper we are presented with a Semantic File System.
A semantic file system is an information storage system that provides associative access to the system's contents by automatically extracting attributes from files with file-type-specific transducers. Files can therefore be located based upon transducer-generated attributes such as type, title, author, etc. A transducer is a filter that takes the contents of the file as input and outputs the attributes. Of course, one has to write a transducer for every type of file that one wants to have interpreted. Once the files are interpreted, queries based on attributes are used to access the desired files. It's easy to see how this system can be extended for our purposes: we can treat printers, cameras, etc. as file types with specific attributes and go from there. However, SFS suffers from the same problems as INS. It is good for describing what kind of service is needed, but it can't express relative positions. Additionally, it's not clear how SFS would function in an ad hoc network, with all the challenges that come from working in such an environment.
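To make the transducer idea in the review above concrete, here is a minimal sketch. It assumes two hypothetical transducers (one for mail headers, one for C `#include` lines), picks a transducer by file suffix, and answers conjunctive attribute queries from an in-memory index. None of these names come from the paper.

```python
import re
from typing import Callable, Dict, Set, Tuple

Attribute = Tuple[str, str]  # (field, value)
Index = Dict[Attribute, Set[str]]


def mail_transducer(contents: str) -> Set[Attribute]:
    """Extract from/to attributes from the header block of a mail file."""
    attrs: Set[Attribute] = set()
    for line in contents.splitlines():
        if not line.strip():
            break  # the header block ends at the first blank line
        field, sep, value = line.partition(":")
        if sep and field.lower() in ("from", "to"):
            attrs.add((field.lower(), value.strip()))
    return attrs


def c_transducer(contents: str) -> Set[Attribute]:
    """Extract one 'include' attribute per #include directive."""
    names = re.findall(r'#include\s*[<"]([^>"]+)[>"]', contents)
    return {("include", name) for name in names}


# Hypothetical transducer table keyed by file suffix.
TRANSDUCERS: Dict[str, Callable[[str], Set[Attribute]]] = {
    ".mail": mail_transducer,
    ".c": c_transducer,
}


def build_index(files: Dict[str, str]) -> Index:
    """Run the matching transducer over each file and index its attributes."""
    index: Index = {}
    for path, contents in files.items():
        for suffix, transducer in TRANSDUCERS.items():
            if path.endswith(suffix):
                for attr in transducer(contents):
                    index.setdefault(attr, set()).add(path)
    return index


def query(index: Index, *attrs: Attribute) -> Set[str]:
    """Return the files that carry every requested attribute (a conjunction)."""
    hits = [index.get(attr, set()) for attr in attrs]
    return set.intersection(*hits) if hits else set()
```

Treating a printer or camera as a "file type" would then amount to writing one more transducer and adding it to the table; the query side would stay the same.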
From liuhz@CS.Cornell.EDU Tue Oct 8 12:10:02 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98GA2h05257 for ; Tue, 8 Oct 2002 12:10:02 -0400 (EDT) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Subject: 615 PAPER 31 X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3 Date: Tue, 8 Oct 2002 12:10:01 -0400 Message-ID: <706871B20764CD449DB0E8E3D81C4D4302CEE65E@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 PAPER 31 Thread-Index: AcJu5Sggo4/02nPuQ1S9rEHM2HJ/3Q== From: "Hongzhou Liu" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id g98GA2h05257

The main contribution of this paper is that it introduces a semantic file system that can provide associative attribute-based access to the contents of an information storage system and integrates this access into the existing tree-structured file system with virtual directories. Virtual directories also enable unmodified remote hosts to access the facilities of a semantic file system with existing network file system protocols. In SFS, an attribute is a field-value pair, where a field describes a property of a file, and a value is a string or an integer that gives the value of the property. Virtual directories include field virtual directories and value virtual directories. A field virtual directory is named by a field, and has one entry for each possible value of its corresponding field. Value virtual directories are contained in field virtual directories and have one entry for each entity described by the field-value pair. Accessing a path with virtual directories is actually querying for entities which have all the attributes described by the field-value pairs along the path. The mapping between attributes and entities is maintained by transducers. SFS checks the file system periodically.
Once a file is modified, the corresponding transducer is called to extract updated information from the modified file. Different types of files have different transducers. Transducers can be programmed by users to perform arbitrary interpretation of file and directory contents in order to produce a desired set of field-value pairs for later retrieval. The use of fields allows transducers to describe many aspects of a file, and thus permits subsequent sophisticated associative access to computed properties. Transducers are highly flexible. They can identify entities within files as independent objects for retrieval. SFS can describe location information by a field-value pair (e.g. Location - room5155). However, if you want to find "the closest printer", you first need to know where you are now, with the help of some localization system. You also need to know the distance to each location where there are printers. In other words, SFS by itself cannot support requests like "print to the closest printer". SFS also caches computed results for each query at the SFS server. This cache can greatly reduce the number of disk accesses, and thus improves the performance of the file system.
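The field and value virtual directories described in this review can be sketched as a path interpreter over an attribute index. The sketch below makes some assumptions: field components end in a colon, paths are taken relative to the SFS mount point, and a trailing field directory lists only values consistent with the constraints already on the path. The index contents are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Index = Dict[Tuple[str, str], Set[str]]  # (field, value) -> entities


def list_virtual_dir(index: Index, path: str) -> List[str]:
    """List a virtual directory given a path of alternating field:/value parts."""
    parts = [p for p in path.split("/") if p]
    matches: Optional[Set[str]] = None  # None means "no constraint yet"

    # Consume complete field/value pairs; each one narrows the result set.
    while len(parts) >= 2:
        field, value = parts[0], parts[1]
        if not field.endswith(":"):
            raise ValueError(f"expected a field name ending in ':', got {field!r}")
        hits = index.get((field[:-1], value), set())
        matches = hits if matches is None else matches & hits
        parts = parts[2:]

    if parts:
        # Trailing field: a field virtual directory, one entry per value.
        field = parts[0]
        if not field.endswith(":"):
            raise ValueError(f"expected a field name ending in ':', got {field!r}")
        return sorted(
            value
            for (name, value), entities in index.items()
            if name == field[:-1] and (matches is None or entities & matches)
        )

    # Otherwise a value virtual directory: one entry per matching entity.
    return sorted(matches or [])
```

With an index mapping `("location", "room5155")` to a printer, the path `location:/room5155` would list that printer. Nothing in this lookup can rank entries by distance, which is the limitation the review points out.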
From yao@CS.Cornell.EDU Tue Oct 8 12:45:40 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98Gjdh13722 for ; Tue, 8 Oct 2002 12:45:39 -0400 (EDT) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Subject: 615 PAPER 31 X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3 Date: Tue, 8 Oct 2002 12:45:39 -0400 Message-ID: <706871B20764CD449DB0E8E3D81C4D4302ED4C4A@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 PAPER 31 Thread-Index: AcJu6iI1iWSzu6H6TJqBviAVVzi2xQ== From: "Yong Yao" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id g98Gjdh13722

The paper presents an approach to information storage that permits both effective information sharing and reductions in programming complexity. Another expected property of such an approach is easy incorporation into existing file systems. A semantic file system is a more effective storage abstraction, which automatically extracts attributes from files and organizes them in a tree structure. It provides flexible associative access to the system's contents with specialized transducers. It performs automatic indexing when files or directories are created or updated. An attribute has two components: a field describes a property of the file, while a value is the corresponding content of the property. A field is not unique for a file. A transducer is a kind of filter that takes the contents of a file as input and outputs field-value pairs. The transducer table determines which transducer interprets a given file type. Queries are used to access a semantic file system. A query is composed of a set of attributes, where each attribute describes the desired value of a field. The semantic file system is query consistent.
The query result is a set of files that include the entities described. A query is executed through the use of virtual directories, which are computed on demand and are no different from ordinary directories to a client program. The authors do not mention how to apply the approach directly in an ad-hoc network. One possibility is to map the storage at each node as a file. The whole network then becomes a large distributed file system. Users access individual nodes through queries over attributes and values. A transducer can automatically publish the content of a file into the network for future reference. Yong

From ashieh@CS.Cornell.EDU Tue Oct 8 12:53:05 2002 Received: from zinger.cs.cornell.edu (zinger.cs.cornell.edu [128.84.96.55]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98Gr5h15334 for ; Tue, 8 Oct 2002 12:53:05 -0400 (EDT) Received: from localhost (ashieh@localhost) by zinger.cs.cornell.edu (8.11.3/8.11.3/C-3.2) with ESMTP id g98Gr5027827 for ; Tue, 8 Oct 2002 12:53:05 -0400 (EDT) Date: Tue, 8 Oct 2002 12:53:05 -0400 (EDT) From: Alan Shieh To: Subject: 615 PAPER 31 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

The semantic file system is a system that transparently filters files stored in a file system based on domain-specific knowledge. A transducer is associated with each file type; this transducer extracts key-value pairs from the files, which are indexed periodically, and may also generate "virtual files", e.g. represent individual mail messages as files. To provide transparency with UNIX file system semantics, queries are encoded (only conjunctions are supported) in directory paths. Results from a query are returned as files in the directory.

** Shortcomings
The paper does not describe a mechanism for coordinating collisions between attributes that are defined by different transducers. The flattened attribute system pushes much of the semantics of the attributes into the transducer and interpreter programs.
While this is the case with schemas in most databases, other systems typically provide ways to impose some structural relationship between the data, thus allowing more room for generic query optimizations. ** Future work - For the system to be usable as a service-discovery engine, a light-weight index update needs to be added. This doesn't need to be particularly fancy, as it would primarily support things such as bits for presence, location, and some space for transient state (printer queue size). These fields either have a small range, or are rarely changing. - If the UNIX file abstraction is to be retained, then services should be expressed as sockets or devices. - Add a mechanism for performing joins between query results. From mp98@cornell.edu Tue Oct 8 13:24:57 2002 Received: from postoffice.mail.cornell.edu (postoffice.mail.cornell.edu [132.236.56.7]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98HOvh23178 for ; Tue, 8 Oct 2002 13:24:57 -0400 (EDT) Received: from cornell.edu (r109493.resnet.cornell.edu [128.253.240.252]) by postoffice.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id NAA27214 for ; Tue, 8 Oct 2002 13:24:56 -0400 (EDT) From: mp98@cornell.edu Date: Tue, 8 Oct 2002 13:24:56 -0400 Mime-Version: 1.0 (Apple Message framework v546) Content-Type: text/plain; charset=US-ASCII; format=flowed Subject: 615 Paper 31 To: egs@CS.Cornell.EDU Content-Transfer-Encoding: 7bit Message-Id: X-Mailer: Apple Mail (2.546) I sent this to the wrong email address. Begin forwarded message: From: mp98@cornell.edu Date: Tue Oct 8, 2002 11:52:41 AM US/Eastern To: egs@cornell.edu Subject: 615 Paper 31 The semantic file system is an attempt to build on top of traditional file systems, using virtual directories to allow the user to do searches based on file attributes. The file attributes are extracted using 'transducers', which are basically large content parsing scripts. 
Obviously the system's usefulness is limited by the ability to write good transducers (it is hard, for example, to write a transducer for a binary file). To get all files owned by Alice that contain the word 'cat', one could 'cd /sfs/owner:/alice/text:/cat' and be in a virtual directory containing all matching files. The authors of this paper implemented it on top of an NFS server. The response times are a bit slow (are two seconds too great a latency for a good interface?), but in a file system environment, it does provide a good way to quickly find scattered files. This system might not be so appropriate for a wireless network. One could possibly imagine a transducer on a centralized server, accessible from wireless nodes, that determines the location of networked devices; a node could then find the close printers by doing 'ls -F /sfs/service:/printer/location:/`whatismylocation`'. In this case we have added a location attribute to the SFS. However, this is trying to extend a system based around files to a system for services, which seems a bit awkward. To find your closest neighbor you might have to do something like 'ls -F /sfs/type:/wirelessnode/location:/`whatismylocation`'. Note, however, that the system as it stands only allows for attribute matching. I can find which printer is close, but not which is closest. To improve this, one would have to implement either a set of attributes like 'distance:/' with values like '<5/', which would require the server to know the location of the client, or attributes like 'distance:/' coupled with a 'from:/' qualifier. In the end, however, the advantage of using a file system for such information breaks down when one is no longer interested primarily in files, but in services, neighbors, and so forth instead.
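The gap between "close" and "closest" noted above can be shown with a small sketch. It assumes each device carries both a symbolic location attribute and coordinates, and that the client's position is known. The device names, coordinates, and the use of straight-line distance are all illustrative assumptions, not anything the paper provides.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical device records: a symbolic room plus planar coordinates.
PRINTERS: Dict[str, dict] = {
    "lp1": {"room": "5155", "pos": (0.0, 0.0)},
    "lp2": {"room": "5160", "pos": (3.0, 4.0)},
    "lp3": {"room": "5155", "pos": (10.0, 0.0)},
}


def in_room(devices: Dict[str, dict], room: str) -> List[str]:
    """Pure attribute matching: what a location:/<room> directory could list."""
    return sorted(name for name, attrs in devices.items() if attrs["room"] == room)


def within(devices: Dict[str, dict], client: Point, radius: float) -> List[str]:
    """A 'distance:/<r' style query; it needs the client's position as input."""
    return sorted(
        name for name, attrs in devices.items()
        if math.dist(attrs["pos"], client) < radius
    )


def closest(devices: Dict[str, dict], client: Point) -> str:
    """Ranking by distance, which plain attribute matching cannot express."""
    return min(devices, key=lambda name: math.dist(devices[name]["pos"], client))
```

For a client at `(4.0, 4.0)`, `in_room(PRINTERS, "5155")` returns both printers in that room, while `closest(PRINTERS, (4.0, 4.0))` picks `lp2`, which is in a different room. Straight-line distance is itself only a stand-in: a printer on another floor may be nearer in these terms yet less convenient to reach.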
From sc329@cornell.edu Tue Oct 8 13:32:30 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98HWTh24909 for ; Tue, 8 Oct 2002 13:32:29 -0400 (EDT) Received: from sangeeth.cornell.edu (syr-24-58-36-135.twcny.rr.com [24.58.36.135]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id NAA03629 for ; Tue, 8 Oct 2002 13:32:28 -0400 (EDT) Message-Id: <5.1.0.14.2.20021008133101.031581f8@postoffice2.mail.cornell.edu> X-Sender: sc329@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 08 Oct 2002 13:32:31 -0400 To: egs@CS.Cornell.EDU From: Sangeeth Chandrakumar Subject: 615 PAPER 31 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed

Submitted by - Sangeeth Chandrakumar

This paper presents the semantic file system, an information storage system which provides flexible access to its contents based on attributes extracted from files. In the paper, the authors explore the claim that a semantic storage system is a much better abstraction for information sharing than the traditional tree-structured file system. SFS also provides associative access to file servers in a distributed system. SFS presents the notion of "transducers", which are specific to file types and can extract a set of attributes that enable later retrieval of the files. Every file has a set of field-value pairs, where the fields describe the properties of a file. A transducer is a filter that takes as its input the contents of a file, and outputs the file's entries and their corresponding attributes. The associative access interface to a semantic file system is based upon queries that describe desired attributes of entities. The result of a query is a set of files and/or directories that contain the entities described.
SFS is query consistent: if updates to the contents of a semantic file system cease, it will eventually become consistent. SFS supports the use of virtual directories to describe a view of file system contents. A virtual directory is computed on demand by SFS. Another feature of SFS is the compatibility of virtual directories with existing file systems. The paper says very little about systems which have mobile and ad hoc nodes. So the scheme of "printing to the nearest printer" will not work in this case, as you are not aware of your location. In a mobile environment, which would require frequent updates, this scheme may not be effective. Also, having to specify all the field-value pairs would increase the network overhead in discovering new services.

From pj39@CS.Cornell.EDU Tue Oct 8 14:08:44 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98I8ih03490 for ; Tue, 8 Oct 2002 14:08:44 -0400 (EDT) Received: from cornell-yb3go20.cornell.edu (syr-24-59-67-50.twcny.rr.com [24.59.67.50]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id OAA01609 for ; Tue, 8 Oct 2002 14:08:43 -0400 (EDT) Message-Id: <5.1.0.14.2.20021008140616.02464a50@postoffice2.mail.cornell.edu> X-Sender: pj39@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 08 Oct 2002 14:08:38 -0400 To: egs@CS.Cornell.EDU From: Piyoosh Jalan Subject: 615 PAPER 31 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed

Semantic File System

This paper describes the Semantic File System, an information storage system that provides flexible associative access to system resources. It provides attribute-based access to system contents based on automatic extraction and indexing of key properties of system objects.
SFS provides two types of interface to its associative access facilities: a user interface (UI) and an application programming interface (API). SFS creates virtual directories based on the attributes the user asks for. SFS can be integrated with existing tree-structured file systems through the concept of virtual directories. A transducer table maintains a list of transducers to be used with different files, before filtering them based on some given attributes and creating the virtual directories. One way of implementing the transducer table is to allow users to store subtree-specific transducers in the subtree's parent directory and to search the directory structure for an appropriate transducer during lookup. Implementing SFS with existing protocols like Network File System (NFS) and Andrew File System (AFS) would be easy because of its backward compatibility. SFS can also function as a Distributed File System for remote access to files. Associative access to an SFS is based on lookup queries over the desired attributes. The result of the query is a set of files and/or directories that contain the entities described. The paper presents a fair number of experiments to test its base thesis that SFS presents a more effective storage abstraction than do traditional tree-structured file systems. One of the major drawbacks of the paper is that the distributed properties of SFS need to be discussed in more detail for distributed access to files (DFS). The paper mentions DFS in one or two places but does not go into sufficient detail. The queries "nearest printer" or "nearest printer to Alice" would work in SFS if it also maintained the metrics associated with the location of the files (everything is a file in SFS). SFS is not particularly suited for mobile wireless networks because it was not designed with MANETs in mind.
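The subtree-specific transducer lookup mentioned in this review could work roughly as sketched below. This is one reading of the idea, under the assumption that each directory may carry its own small table mapping file suffixes to transducer names, and that lookup walks from the file's directory up to the root. The tables and transducer names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Hypothetical per-directory transducer tables: directory -> {suffix: transducer}.
TABLES: Dict[str, Dict[str, str]] = {
    "/": {".txt": "generic-text"},
    "/home/alice/src": {".c": "c-source", ".txt": "design-notes"},
}


def find_transducer(path: str) -> Optional[str]:
    """Return the transducer from the nearest enclosing directory that has one."""
    file_path = PurePosixPath(path)
    suffix = file_path.suffix
    # `parents` yields the containing directory first and the root last,
    # so a table deeper in the tree overrides one closer to the root.
    for directory in file_path.parents:
        table = TABLES.get(str(directory), {})
        if suffix in table:
            return table[suffix]
    return None
```

Under these tables, a `.txt` file below `/home/alice/src` is handled by the subtree's own transducer, while a `.txt` file elsewhere falls back to the one registered at the root.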
From ks238@cornell.edu Tue Oct 8 14:27:32 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98IRWh08394 for ; Tue, 8 Oct 2002 14:27:32 -0400 (EDT) Received: by travelers.mail.cornell.edu (8.9.3/8.9.3) id OAA00184; Tue, 8 Oct 2002 14:27:29 -0400 (EDT) Date: Tue, 8 Oct 2002 14:27:29 -0400 (EDT) From: ks238@cornell.edu Message-Id: <200210081827.OAA00184@travelers.mail.cornell.edu> To: egs@CS.Cornell.EDU Errors-To: ks238@cornell.edu Reply-To: ks238@cornell.edu MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: IMP/PHP3 Imap webMail Program 2.0.9 Sender: ks238@cornell.edu X-Originating-IP: 128.84.99.8 Subject: 615 Paper #31 Semantic File Systems moves away from file systems based on traditional tree structures to one which is based on the attributes associated with files. The primary contribution of this paper comes from the idea of using transducers to glean certain information from a file and make corresponding assumptions about its attributes and values. This kind of information can be attained with associative access to the file in order to conduct content based searches of a file. When conducting a query within a semantic file system, the attributes in demand are searched for in the file system and the files matching this attribute are returned. Queries are conducted through the use of virtual directories which categorize different files based on their respective attributes and allow users to code specific transducers to search the contents of these files. Based on these searches of the virtual directories effective attribute based indexes of the file system can be created in order to efficiently search through the system. This is critically new and effective structure for information and resource sharing in file systems. 
A lot of the same problems that users see in web searches can also be seen in implementing transducers that are able to attain attribute information of the web. The primary issue is that denoting a file with a given attribute after gleaning information such as authors and title may return erroneous attributes in regards to the file. I think the two biggest issues with finding the printer “closest to Alice” are one, the ability to define location as the attribute of choice. Location and proximity to the point at which the query is initiated is difficult. Also, the integrity of the attributes is a critical problem with SFS. The biggest problem with ad hoc networks is the constant flux and change in the data upon which attributes are searched. So, should a printer have excessive load or should they have changed location the according data needs to be updated. Thus, I don’t see an SFS being quite robust. From linga@CS.Cornell.EDU Tue Oct 8 14:43:19 2002 Received: from snoball.cs.cornell.edu (snoball.cs.cornell.edu [128.84.96.54]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98IhIh12205 for ; Tue, 8 Oct 2002 14:43:18 -0400 (EDT) Received: from localhost (linga@localhost) by snoball.cs.cornell.edu (8.11.3/8.11.3/C-3.2) with ESMTP id g98IhIV07182 for ; Tue, 8 Oct 2002 14:43:18 -0400 (EDT) Date: Tue, 8 Oct 2002 14:43:18 -0400 (EDT) From: Prakash Linga To: Emin Gun Sirer Subject: 615 PAPER 31 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Semantic Filesystems -------------------- This is a new information storage system (like a traditional file system) which automates extracting attributes from certain files and hence promotes file sharing (using associative access). This automated indexing of files is "semantic" because the semantics of updated file system objects are used to extract the properties for indexing. 
A filetype-specific transducer is used to interpret files stored in a semantic file system and produce attributes (an attribute is a field-value pair) which help in retrieval of files. A transducer table is used to decide which transducer to use. The granularity of associative access is defined to be an entity. An entity could be a directory, a file, or an object in a file. A transducer functions as a filter that processes the file and outputs the file's entries and corresponding attributes. New transducers can be used to extend or modify the semantics of the file system. The associative access interface to a semantic file system is based on queries. A query describes the desired attributes of entities, and the result of a query is a set of files/directories that contain the entities described by the query. To process queries in a semantic file system, virtual directories are used at each level of the directory tree. Virtual directories are indistinguishable from normal directories. The authors have built a semantic file system that implements the NFS protocol as its external interface. A series of results is presented showing that semantic file systems are a more effective storage abstraction when compared to traditional hierarchical file systems. Pros: Semantic file systems promote associative access based on the content of the files. This can be easily integrated into traditional file systems using virtual directories. Cons: This does not work well in dynamic and mobile environments. The benefits of virtual directories for application programmers are not very apparent and need to be investigated. The results are not complete and exhaustive. Comments: As for the effectiveness of semantic file systems for users of mobile devices under the scenarios considered last week, it does not seem to work well. It is not clear how this works in a dynamic and mobile environment. It is also not easy to express queries like nearest printer, printer nearest to Alice, etc. in this framework.
From vivi@CS.Cornell.EDU Tue Oct 8 14:52:48 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98Iqmh14386 for ; Tue, 8 Oct 2002 14:52:48 -0400 (EDT) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Subject: 615paper31 X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3 Date: Tue, 8 Oct 2002 14:52:48 -0400 Message-ID: <47BCBC2A65D1D5478176F5615EA7976D11AF88@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615paper31 Thread-Index: AcJu+9fmNr63UPzcR2udpXGLptRHiA== From: "Vivek Vishnumurthy" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id g98Iqmh14386

Semantic File Systems provide the user with a view conforming to the semantics of the files. The system builds an associative access interface on top of the existing file system. Here "transducers" extract attributes from the files. This allows users to look for files having certain properties, such as tex files authored by a certain person, or object files importing certain library files. Semantic File Systems are applicable in the scenarios considered last week. We could have devices listed as files in the file system, and have transducers extract the physical location of each device. A mobile user aware of his/her location can then issue semantic requests to look for required devices in the vicinity. Thus the "nearest printer" request can be handled. The accuracy of the results depends on the frequency of updates in the system. (The more frequent the updates, the more accurate the results.) The SFS server uses caching to respond to virtual directory queries. This is effective when the file system is rarely modified, and when the same virtual directory queries occur frequently.
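The caching trade-off described above (repeated queries are cheap, updates are costly) can be seen in a minimal sketch. It assumes the simplest possible policy, dropping every cached result whenever the index changes; this is an illustrative choice, not the policy the SFS server actually uses, and the class and counters are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Attribute = Tuple[str, str]  # (field, value)


class CachedIndex:
    """Attribute index with a query-result cache that is cleared on every update."""

    def __init__(self) -> None:
        self._index: Dict[Attribute, Set[str]] = {}
        self._cache: Dict[FrozenSet[Attribute], FrozenSet[str]] = {}
        self.hits = 0
        self.misses = 0

    def add(self, entity: str, field: str, value: str) -> None:
        """Record an attribute for an entity and invalidate cached results."""
        self._index.setdefault((field, value), set()).add(entity)
        self._cache.clear()  # assumption: any update may change any result

    def query(self, *attrs: Attribute) -> FrozenSet[str]:
        """Answer a conjunctive query, from the cache when possible."""
        key = frozenset(attrs)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        sets = [self._index.get(attr, set()) for attr in attrs]
        result = frozenset(set.intersection(*sets)) if sets else frozenset()
        self._cache[key] = result
        return result
```

With rarely changing data, repeated queries are served from the cache. In a mobile setting where device attributes such as location change often, each `add` empties the cache and most queries miss, which matches the review's point about when caching helps.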