From hs247@cornell.edu Tue Oct  8 00:25:04 2002
Received: from mailout5-0.nyroc.rr.com (mailout5-0.nyroc.rr.com [24.92.226.122])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g984P3h08587
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 00:25:03 -0400 (EDT)
Received: from hubby.cornell.edu (syr-24-58-42-130.twcny.rr.com [24.58.42.130])
	by mailout5-0.nyroc.rr.com (8.11.6/RoadRunner 1.20) with ESMTP id g984P1p25891
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 00:25:01 -0400 (EDT)
Message-Id: <5.1.0.14.2.20021008002458.00b86b78@postoffice2.mail.cornell.edu>
X-Sender: hs247@postoffice2.mail.cornell.edu (Unverified)
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Tue, 08 Oct 2002 00:25:10 -0400
To: egs@CS.Cornell.EDU
From: Hubert Sun <hs247@cornell.edu>
Subject: 615 Paper 31
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed

This paper introduces the semantic file system: a file system in which a 
user can list directories or files based on their attributes (e.g., "give 
me all the files that were created between December 21st and January 
5th").  In the authors' implementation, these attribute queries are exposed 
as virtual directories on top of existing file systems.

The file system described offers two interfaces: a user interface and an 
application interface.  The application interface allows programmers to 
write transducers.  Transducers filter through files and directories and 
gather attribute information.  One example described is a transducer that 
searches through C files and determines what they import or which functions 
they define.  A user can then query the system through the user interface 
and find all files that include "iostream.h".
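A minimal sketch of such a transducer, assuming a transducer is simply a function from a file's text to a list of (attribute, value) pairs. The attribute names `includes` and `functions`, the regular expressions, and the `query` helper are hypothetical, not the paper's interface; the function-definition pattern is deliberately crude (it only recognizes signatures whose opening brace sits on the same line).

```python
import re

# Hypothetical transducer for C/C++ sources: text in, (attribute, value) pairs out.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
# Crude: a return type, a name, a parameter list, and "{" on the same line.
FUNC_RE = re.compile(
    r'^[A-Za-z_][\w \t*]*?\b([A-Za-z_]\w*)[ \t]*\([^;{)]*\)[ \t]*\{', re.MULTILINE)


def c_transducer(text):
    pairs = [("includes", name) for name in INCLUDE_RE.findall(text)]
    pairs += [("functions", name) for name in FUNC_RE.findall(text)]
    return pairs


def query(index, attr, value):
    """Names of all indexed files carrying the given attribute/value pair."""
    return sorted(path for path, pairs in index.items() if (attr, value) in pairs)


sources = {
    "hello.cc": '#include <iostream.h>\n#include "util.h"\n\nint main(void) {\n    return 0;\n}\n',
    "util.c": '#include "util.h"\n\nstatic int helper(int x) {\n    return x + 1;\n}\n',
}
index = {path: c_transducer(text) for path, text in sources.items()}
print(query(index, "includes", "iostream.h"))  # ['hello.cc']
```

The index is built once by running the transducer over every file, and queries only consult the index, which is where the speedup over grepping each file comes from.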

So how does this apply to ad-hoc networks and naming services?  One could 
imagine this file system running on a distributed system or ad-hoc 
network.  (We will gloss over how this is done and simply assume it can 
be.)  We can then imagine services being represented as files.  When a 
service is added to the system, we can write special transducers for 
special kinds of services.  For a printer, we can sort by location, 
colour/non-colour, laser/bubblejet, etc.  Now if a person wanted to find 
the closest printer, all they would have to do is list all printers by 
location, or query for the closest printer.  As with INS, though, the 
problem of finding the printer closest to Alice remains: one would have to 
know where Alice is.  Caching location information could help, but that 
information may not be up to date.  And from the file system's perspective, 
since Alice is not a file or directory, we might have to modify the 
transducers to track users on the system.

Though the paper does describe this system as a possibility for a 
distributed file system, it doesn't mention anything about mobile nodes or 
ad-hoc networks.  For the file system to be consistent, when a file is 
added, the file and its attribute information have to be propagated to all 
nodes.  One could look at a semantic file system as a database.  The data 
is all the files and directories in the system.  We can then form views, 
tables or indexes to look at this data (via transducers).  A user could 
then use a query language like SQL to do searches (e.g., select printers 
with resolution = 300).  One open question, however, is how this would 
apply to ad-hoc networks, where data can change very frequently.
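The database view can be sketched with Python's built-in `sqlite3` module, assuming each (object, attribute, value) triple produced by a transducer is stored as one row. The table name `attrs`, the index name, the sample printers, and the `select` helper are all hypothetical; a conjunctive query becomes one `SELECT` per attribute test, combined with `INTERSECT`.

```python
import sqlite3

# Hypothetical schema: one row per (object, attribute, value) triple from a transducer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (object TEXT, attr TEXT, value TEXT)")
conn.execute("CREATE INDEX attrs_by_pair ON attrs (attr, value)")
conn.executemany(
    "INSERT INTO attrs VALUES (?, ?, ?)",
    [
        ("lpr1", "type", "printer"), ("lpr1", "resolution", "300"),
        ("lpr2", "type", "printer"), ("lpr2", "resolution", "600"),
        ("scan1", "type", "scanner"), ("scan1", "resolution", "300"),
    ],
)


def select(conn, criteria):
    """Objects matching every (attr, value) pair: one SELECT per pair, INTERSECTed."""
    if not criteria:
        raise ValueError("need at least one (attr, value) pair")
    one = "SELECT object FROM attrs WHERE attr = ? AND value = ?"
    sql = " INTERSECT ".join([one] * len(criteria)) + " ORDER BY 1"
    params = [item for pair in criteria for item in pair]
    return [row[0] for row in conn.execute(sql, params)]


print(select(conn, [("type", "printer"), ("resolution", "300")]))  # ['lpr1']
```

In an ad-hoc network every row here could go stale quickly, which is exactly the open question raised above.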


From mvp9@cornell.edu Tue Oct  8 00:57:59 2002
Received: from postoffice.mail.cornell.edu (postoffice.mail.cornell.edu [132.236.56.7])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g984vwh14226
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 00:57:59 -0400 (EDT)
Received: from zoopark.cornell.edu (syr-24-58-46-186.twcny.rr.com [24.58.46.186])
	by postoffice.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id AAA11350
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 00:57:56 -0400 (EDT)
Message-Id: <5.1.0.14.2.20021008005721.01aa0360@postoffice.mail.cornell.edu>
X-Sender: mvp9@postoffice.mail.cornell.edu (Unverified)
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Tue, 08 Oct 2002 00:57:58 -0400
To: egs@CS.Cornell.EDU
From: mike polyakov <mvp9@cornell.edu>
Subject: 615 PAPER 31
Mime-Version: 1.0
Content-Type: text/html; charset="us-ascii"

<html>
<font face="Times New Roman, Times">This paper presents a different type
of file system, one that is indexed by the meanings (semantics) of
documents rather than their physical location.&nbsp; The file system is
layered on top of the existing one, so no additional software or browsers
are necessary for clients.&nbsp; Directories, files, and components of
files are periodically indexed, allowing categorizations to be created on
the fly in the form of virtual directories.&nbsp; The benefits are
twofold.&nbsp; First, no new file system or software to interact with it
needs to be created.&nbsp; Second, so-called “transducers,” which are
essentially sophisticated filters, allow users to submit arbitrary,
complicated queries.&nbsp; The performance is also reasonable.<br>
<x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>SFS
presents a more organized and efficient tool for tasks that UNIX users
have performed for decades with command-line utilities.&nbsp; Its
functionality in fact seems to be a subset of Perl's, although the
mysterious ‘transducers’ are never described in any detail.&nbsp; The
real improvement comes in speed, due to proactive indexing.&nbsp; The
authors claim that under expected use, the indexing performs relatively
well (that is, the system is expected to ‘converge’ to consistency).&nbsp;
The weakest part of the paper is the evaluation, although evaluating the
system across the variety of environments and loads inherent in the task
is a formidable challenge.&nbsp; Still, it would be nice to take some
relevant task for which no simple command is readily available, such as
looking through articles by category, and compare the lookup times of
average users with and without the system.<br>
<x-tab>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</x-tab>Besides
the many extensions proposed in the paper itself, several come to
mind.&nbsp; Although lookup is much less intensive than indexing, some
sort of caching could be used in complicated scenarios.&nbsp; Similarly,
how can performance be improved in the presence of multiple users, and
will they, in fact, interfere with each other?&nbsp; But, in the end, my
biggest question is: how useful is this, really?&nbsp; Directory names
are supposed to reflect contents, and if parallel attributes are desired,
databases present a more obvious solution.&nbsp; For sorting by file
content, perl, awk, and grep suffice.&nbsp; How much does this improve
the efficiency of the average user or programmer?<br>
</font></html>

From shafat@CS.Cornell.EDU Tue Oct  8 01:53:19 2002
Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g985rJh24307
	for <egs@popsrv.cs.cornell.edu>; Tue, 8 Oct 2002 01:53:19 -0400 (EDT)
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Subject: 615 PAPER 31
X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
Date: Tue, 8 Oct 2002 01:53:19 -0400
Message-ID: <47BCBC2A65D1D5478176F5615EA7976D134FA1@opus.cs.cornell.edu>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: 615 PAPER 31
Thread-Index: AcJtmzmh9xvpa3m4SuSA/tQXKQLL7w==
From: "Syed Shafat Zaman" <shafat@CS.Cornell.EDU>
To: "Gun Sirer" <egs@CS.Cornell.EDU>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id g985rJh24307

This paper implements an information storage system called the Semantic File System (SFS)
that aims to provide a more effective storage abstraction for information sharing and
command-level programming. In SFS, user-programmable "transducers" are used to extract
attributes from files and directories. A host of associative access facilities is provided
to help the user discover and share relevant file objects. SFS is a query-based file system
that creates virtual directories based on the attributes the user specifies. A transducer
table records which transducer to use for each kind of file; the extracted attributes are
then filtered against the requested attributes to create the virtual directories. SFS can be
integrated into existing file systems, and the paper discusses at length its implementation
on top of NFS.
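The transducer table can be pictured as a dispatch map from file type to extraction function. This is a minimal sketch, assuming the file type is decided by filename extension and that each transducer returns (attribute, value) pairs; the `.mail` extension, the `title` attribute, both transducers, and `index_files` are hypothetical stand-ins rather than the paper's actual table.

```python
import os


def text_transducer(text):
    # Hypothetical: treat the first line of a text file as its title.
    lines = text.splitlines()
    return [("title", lines[0])] if lines else []


def mail_transducer(text):
    # Extract selected header fields up to the first blank line.
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("from", "to", "subject"):
            pairs.append((key.strip().lower(), value.strip()))
    return pairs


# The transducer table: which transducer handles which kind of file.
TRANSDUCER_TABLE = {".txt": text_transducer, ".mail": mail_transducer}


def index_files(files):
    """Run the matching transducer over each file; skip unknown types."""
    index = {}
    for path, text in files.items():
        transducer = TRANSDUCER_TABLE.get(os.path.splitext(path)[1])
        if transducer is not None:
            index[path] = transducer(text)
    return index


files = {
    "notes.txt": "Meeting notes\nDiscuss printers.",
    "msg1.mail": "From: alice\nTo: bob\nSubject: printers\n\nFrom: quoted text is ignored",
    "photo.gif": "GIF89a...",
}
index = index_files(files)
```

Files with no registered transducer (here the GIF) simply contribute no attributes, which illustrates the review's point that coverage depends entirely on how many transducers someone has written.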
 
The paper also presents a fair number of experimental results that investigate the
performance of SFS. I got the feeling that many numerical figures were stated in the
evaluation section without sufficient elaboration on their relevance or importance to the
tests. The paper also does not do a strong job of explaining transducers, which are
essentially the heart of the system. It does not address how these transducers are
generated, or how they can be developed quickly enough to handle files of all possible
types. In fact, this appears to be a major drawback of the system. One or two examples of
SFS's use in application programs would also have been helpful.
 
In the scenario considered last week, where a user wanders around a building with a
laptop connected to the wireless network, looking for the nearest printer, SFS can be
useful only if the system maintains a "location" attribute associated with each object.
Every device/object has to be represented as a file on the network system, and only then can
SFS be used to locate a service meeting a set of requirements. However, my guess
would be that in mobile networks, SFS will not be able to function effectively because of the
dynamic nature of the network. The location attributes would have to be updated constantly, and for
queries with a large set of attributes, the overhead might simply be too high for the
current version of SFS to handle.
 

From bd39@cornell.edu Tue Oct  8 02:11:39 2002
Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g986Bch27619
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 02:11:38 -0400 (EDT)
Received: from boweilaptop.cornell.edu (r102439.resnet.cornell.edu [128.253.163.42])
	by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id CAA17476
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 02:11:38 -0400 (EDT)
Message-Id: <5.1.0.14.2.20021008020952.00b75140@postoffice2.mail.cornell.edu>
X-Sender: bd39@postoffice2.mail.cornell.edu (Unverified)
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Tue, 08 Oct 2002 02:10:17 -0400
To: egs@CS.Cornell.EDU
From: Bowei Du <bd39@cornell.edu>
Subject: 615 PAPER 31
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed

Paper 31

Semantic Filesystems

The main contribution of this paper is the ability for an operating
system to extract file attribute information from the contents of files
through application-specific "transducers" and to organize this
information through a virtual file system. User-written transducers
examine the contents of files and expose attribute/value pairs, which
are indexed in the file system.

The semantic information is integrated with existing file system
structures through virtual directories, which represent semantic
content as "attrib:/value/" directory entries. A query is expressed as
the logical AND of the attributes specified in the directory path.
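The path-as-query idea can be sketched as follows, assuming paths (relative to the SFS mount point) alternate `attr:` and value components, as in the `attrib:/value/` form above. The helper names and the sample index are hypothetical, and the sketch ignores trailing file-name components. Each `attr:/value` pair narrows the listing, so a path is the logical AND of its pairs.

```python
def parse_virtual_path(path):
    """Turn 'owner:/smith/ext:/c' into [('owner', 'smith'), ('ext', 'c')]."""
    parts = [p for p in path.split("/") if p]
    if len(parts) % 2 or not all(p.endswith(":") for p in parts[::2]):
        raise ValueError(f"expected alternating 'attr:' and value components: {path!r}")
    return [(parts[i][:-1], parts[i + 1]) for i in range(0, len(parts), 2)]


def lookup(index, path):
    """Virtual directory listing: files having ALL the pairs named in the path."""
    wanted = parse_virtual_path(path)
    return sorted(name for name, pairs in index.items() if all(p in pairs for p in wanted))


index = {
    "fault.c": [("owner", "smith"), ("ext", "c")],
    "plot.py": [("owner", "smith"), ("ext", "py")],
    "vm.c": [("owner", "jones"), ("ext", "c")],
}
print(lookup(index, "/owner:/smith/ext:/c/"))  # ['fault.c']
```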

These hierarchies can work in the same way as attribute trees in the
Intentional Naming System (one can imagine replacing INS queries with
this file system structure). Instead of simply indexing files, the
transducers could index services and present them in a global
namespace similar to AFS. The advantage of such a scheme is that the
naming mechanism integrates easily into existing schemes for naming
operating system resources. For example, we could use
"/location:/alice/printers:/lpr1" to print a file near Alice, which
would be transparent to the applications using the printer. New search
criteria could be implemented by writing new transducers. The big
question is how this scheme would be implemented in an ad-hoc
network. How would consistency of the directory state be maintained?
How would transducers be distributed into the network? I thought the
on-demand generation of the directories was a good idea: one can
imagine flooding the network with a query and caching the response at
the local node, much like on-demand (reactive) routing protocols.

Caching would improve performance only if repeated queries were
directed at the same category. In the Alice example, if Alice were
highly mobile, caching would be of little use: every time we query the
local state, we would end up with a broken link. However, resources
that stay put, e.g. printers (but not Alice), would benefit.
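One way to act on this observation is to give each cached answer a lifetime matched to how fast the named thing moves. This is a minimal sketch, assuming a simple time-to-live (TTL) policy, which is a common caching technique rather than something either paper prescribes. The query strings, room names, and TTL values are hypothetical, and the clock is injectable so the example is deterministic.

```python
import time


class QueryCache:
    """Cache query answers with a per-entry time-to-live (TTL)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # query -> (answer, expiry time)

    def put(self, query, answer, ttl):
        self._entries[query] = (answer, self._clock() + ttl)

    def get(self, query):
        """Return the cached answer, or None if absent or expired."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        answer, expires = entry
        if self._clock() >= expires:
            del self._entries[query]  # stale: caller must re-query the network
            return None
        return answer


now = [0.0]  # fake clock
cache = QueryCache(clock=lambda: now[0])
cache.put("where is lpr1", "room 4", ttl=3600)  # printers stay put: long TTL
cache.put("where is alice", "room 7", ttl=5)    # Alice moves: short TTL
now[0] = 60.0
print(cache.get("where is lpr1"), cache.get("where is alice"))  # room 4 None
```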

===

Active Names

Active Names contributes an interesting idea: resource names can also
be bound to how the named resource is processed, located, transported,
etc. The idea is that some intelligence may be needed in the use of a
service, and that this can be described in the name of the service.
This is somewhat similar to INS in that the service, not the server, is
named. Active Names assumes the ability to execute mobile code to
provide the services requested in the name.

Active Names associates names with namespace programs, which are pieces
of mobile code that can be downloaded and run on any Active Name
server. A request is transformed by the named services in a pipeline
fashion, one service applied after another. Effects that need to be
applied to the data after a service is performed are carried with the
data in the form of small "after effect" code snippets.
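The pipeline and the after-effect snippets can be modeled with plain functions. This is a toy sketch, assuming each namespace program takes the current data plus the list of pending after-effects and returns updated versions of both. `resolve`, `fetch`, `distill`, and the string "images" are hypothetical, not the Active Names API, and the order in which queued after-effects run is an assumption; real namespace programs would be mobile code running on remote servers.

```python
def resolve(programs, data):
    """Run namespace programs in sequence, then apply the after-effects they queued."""
    after_effects = []
    for program in programs:
        data, after_effects = program(data, after_effects)
    # Assumption: queued after-effects run last, most recently queued first.
    for effect in reversed(after_effects):
        data = effect(data)
    return data


def fetch(name, after_effects):
    # Hypothetical first stage: locate and retrieve the named resource.
    return f"image:{name}:1024x768", after_effects


def distill(data, after_effects):
    # Hypothetical second stage: shrink the image for a PDA screen and queue
    # an after-effect that tags the result once the pipeline finishes.
    smaller = data.replace("1024x768", "160x160")
    return smaller, after_effects + [lambda d: d + ":for-pda"]


print(resolve([fetch, distill], "map"))  # image:map:160x160:for-pda
```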

For naming similar to INS and the Semantic File System, one would
write an Active Name service that performs the requested query. In the
Alice example, we would have an Active Name service "Find Nearest
Printer". Active Names offers a very flexible framework in which to
perform queries: essentially any program can be written. The
composition of services is also very interesting: there could be one
service that performs load balancing and another that locates devices
close to a location. Composing the two in a query would yield the
least-loaded device at a nearby location.
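The composition described here can be sketched as ordinary function chaining, assuming each "service" maps a list of devices to a narrower result. The `nearby`, `least_loaded`, and `compose` names, the coordinates, and the load figures are hypothetical illustrations, not Active Names code.

```python
import math


def nearby(here, radius):
    """Hypothetical service: keep devices within `radius` of `here`."""
    return lambda devices: [d for d in devices if math.dist(d["pos"], here) <= radius]


def least_loaded(devices):
    """Hypothetical service: the device with the shortest queue, if any."""
    return min(devices, key=lambda d: d["load"], default=None)


def compose(*services):
    """Chain services so each one's output feeds the next."""
    def run(value):
        for service in services:
            value = service(value)
        return value
    return run


printers = [
    {"name": "lpr1", "pos": (0, 0), "load": 5},
    {"name": "lpr2", "pos": (3, 4), "load": 1},
    {"name": "lpr3", "pos": (30, 40), "load": 0},  # idle, but far away
]
find = compose(nearby(here=(1, 1), radius=10), least_loaded)
print(find(printers)["name"])  # lpr2
```

Order matters: filtering by location first and then balancing load excludes the idle but distant `lpr3`, whereas balancing first would pick it.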

Caching Active Name programs would reduce bandwidth consumption, and
intelligent use of them could reduce it further (as in the PDA graphics
example). I would imagine Active Name programs would also cache some
of their state between runs. One problem with Active Names is that the
functionality of the system is too general: basically any program can
be an Active Name service. Active Names suggests that services can be
made to work in a pipelined, interchangeable fashion, and also that
services in a network can be mobile, moving from node to node. Beyond
that, the functionality of the system is basically wide open.

From jsy6@postoffice2.mail.cornell.edu Tue Oct  8 02:21:28 2002
Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g986LSh29335
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 02:21:28 -0400 (EDT)
Received: from Janet.cornell.edu (syr-24-58-41-193.twcny.rr.com [24.58.41.193])
	by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id CAA03175
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 02:21:26 -0400 (EDT)
Message-Id: <5.1.0.14.2.20021008010217.00b49e90@postoffice2.mail.cornell.edu>
X-Sender: jsy6@postoffice2.mail.cornell.edu (Unverified)
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Tue, 08 Oct 2002 02:21:10 -0400
To: egs@CS.Cornell.EDU
From: Janet Suzie Yoon <jsy6@cornell.edu>
Subject: 615 PAPER 31
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed


The intent of INS is to complement, not replace, DNS.  Likewise, Active 
Names, a flexible name resolution scheme, is meant only to extend the 
current Internet Domain Name System.  Active Names and INS share some 
goals.  Both use naming to describe intent rather than location, but the 
naming abstraction provided by Active Names is programmable.  Active Names 
is designed for wide-area distributed services.  The major contributions of 
Active Names are its extensibility, location independence, composability, 
and efficient use of network resources.  Active Names are similar to DNS in 
that they are hierarchical namespaces.  Each namespace has a program 
associated with it that interprets that namespace in any desired 
fashion.  The program associated with a namespace is selected by the owner 
of the namespace.  The client is the owner of the root namespace.  Unlike 
with DNS, a user only needs to name the service they wish to use, not the 
specific transport protocol.
Suppose we want to find the printer closest to Alice.  We will need either 
a location-support system integrated into the system or pre-computed 
relative geographical distances within the building.  The hierarchical 
namespaces of Active Names could correspond to the hierarchical 
geographical representation of the printers.  The query would first find 
the closest printers within the same building, then the same floor and 
room, and finally by actual position within the room.
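The coarse-to-fine lookup can be sketched as a ranking rule, assuming a location-support system already supplies each party's building, floor, room, and floor-plan coordinates. Every name and value here is hypothetical, and a real Active Names version would spread these levels across namespace programs rather than one function.

```python
import math

LEVELS = ("building", "floor", "room")  # coarse to fine


def shared_levels(a, b):
    """How many leading levels of the location hierarchy a and b share."""
    count = 0
    for level in LEVELS:
        if a[level] != b[level]:
            break
        count += 1
    return count


def closest_printer(printers, person):
    # Deepest shared level wins; position on the floor plan breaks ties.
    return min(printers,
               key=lambda p: (-shared_levels(p, person), math.dist(p["xy"], person["xy"])))


alice = {"building": "A", "floor": 4, "room": "401", "xy": (5, 5)}
printers = [
    {"name": "p1", "building": "A", "floor": 4, "room": "401", "xy": (1, 1)},
    {"name": "p2", "building": "A", "floor": 4, "room": "402", "xy": (5, 6)},
    {"name": "p3", "building": "A", "floor": 3, "room": "301", "xy": (5, 5)},
]
print(closest_printer(printers, alice)["name"])  # p1: same room wins
```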


From mr228@cornell.edu Tue Oct  8 03:53:42 2002
Received: from cornell.edu (cornell.edu [132.236.56.6])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g987rgh15729
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 03:53:42 -0400 (EDT)
Received: from cornell.edu (pptp-032.cs.cornell.edu [128.84.227.32])
	by cornell.edu (8.9.3/8.9.3) with ESMTP id DAA03908
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 03:53:42 -0400 (EDT)
Message-ID: <3DA28F1D.14AAE3A@cornell.edu>
Date: Tue, 08 Oct 2002 03:54:05 -0400
From: Mark Robson <mr228@cornell.edu>
X-Mailer: Mozilla 4.76 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: egs@CS.Cornell.EDU
Subject: 615 PAPER 31
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

The Semantic File System is an interesting new way to look at data.  SFS
takes all the files in a filesystem and extracts meta data from them via
transducers.  Transducers are small programs that tell SFS how to
extract various types of meta data from the files.  As image files would
probably be processed very differently than text files, there is
(potentially) a different transducer for each file type.

Instead of (or maybe in addition to) the traditional way of looking at
files and directories, SFS proposes that files be grouped into virtual
directories.  Virtual directories are nothing more than a result set for
some query.  A virtual directory might be "all the files created after
date D" or "all the files whose size is X", etc.  The paper argues that
this is a more natural (read: better) way to look at your data.
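Treating a virtual directory as a query result set, a minimal sketch over real file metadata looks like this. It assumes the "files created after date D" predicate is approximated with modification time (`st_mtime`), because a portable creation time is not available from `os.stat` on all platforms. `virtual_directory` is a hypothetical helper, not part of SFS.

```python
import os
import tempfile
import time


def virtual_directory(root, predicate):
    """List files under `root` (relative paths) whose os.stat() result satisfies `predicate`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if predicate(os.stat(path)):
                matches.append(os.path.relpath(path, root))
    return sorted(matches)


with tempfile.TemporaryDirectory() as root:
    for name, size in [("a.txt", 10), ("b.txt", 20), ("c.txt", 10)]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"x" * size)
    date_d = time.time() - 3600  # "date D": one hour ago
    old = date_d - 86400
    os.utime(os.path.join(root, "a.txt"), (old, old))  # pretend a.txt predates D
    size_10 = virtual_directory(root, lambda st: st.st_size == 10)
    after_d = virtual_directory(root, lambda st: st.st_mtime > date_d)

print(size_10, after_d)  # ['a.txt', 'c.txt'] ['b.txt', 'c.txt']
```

This version rescans on every listing. SFS instead indexes ahead of time, which is what makes its virtual directories cheap to list.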

While it's not clear how this would be immediately applied to ad-hoc
networks, there are some obvious wins.  If you let services be files and
attributes of the services be equivalent to the files' meta data, then
you have a system much like INS.  The problems arise when figuring out
where the data is stored, cached, etc.  Is there in-network processing?
That is, who is responsible for applying the transducers to the data,
who gets the results of this processing, etc.?

Future work might explore more precisely what it would take to implement
this in an ad-hoc world -- either as is, or modified for services and
their attributes.

From xz56@cornell.edu Tue Oct  8 04:57:34 2002
Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g988vYh28101
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 04:57:34 -0400 (EDT)
Received: from XIN (ex120.dialup.cornell.edu [132.236.102.120])
	by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with SMTP id EAA16056
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 04:57:32 -0400 (EDT)
Message-ID: <004501c26ea8$bc24e1d0$af66ec84@XIN>
From: "Xin Zhang" <xz56@cornell.edu>
To: "Emin Gun Sirer" <egs@CS.Cornell.EDU>
Subject: 615 PAPER 31
Date: Tue, 8 Oct 2002 04:53:15 -0400
MIME-Version: 1.0
Content-Type: text/plain;
	charset="Windows-1252"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2600.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000

Semantic File System presents a file system in which files can be accessed
with the aid of automatic extraction and indexing of file features, which
helps in information searching, sharing and programming. Because it is
constructed above the traditional tree file structure, it is highly
compatible. (INS likewise seeks compatibility with traditional lower
layers.) Much like nodes in an ad hoc network accessing its services, SFS
accesses files in a demand-oriented way (instead of traditionally through
locations). The extracted properties take the form of attribute-value
pairs, produced by programs the authors call "transducers". By designing
transducers as needed, different demands (for searching files) can be met.
As I understand it, the semantic file system is much like INS (or I should
say INS-99 is like Semantic-91). Here, the transducers play the role of the
whole set of INRs: they choose the file (or forward the packet to the
server) according to the demand expressed in av-pairs. The only
differences are that INRs work in a distributed fashion and that the ad hoc
network is more dynamic. So performance in the semantic file system should
be improved through caching.


From vrg3@cornell.edu Tue Oct  8 10:26:05 2002
Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98EQ4h11529
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 10:26:05 -0400 (EDT)
Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13])
	by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id KAA00285;
	Tue, 8 Oct 2002 10:26:03 -0400 (EDT)
Date: Tue, 8 Oct 2002 10:26:03 -0400 (EDT)
From: vrg3@cornell.edu
X-Sender: vrg3@travelers.mail.cornell.edu
To: egs@CS.Cornell.EDU
Subject: 615 PAPER 31
Message-ID: <Pine.SOL.3.91.1021008102549.16259F@travelers.mail.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

This paper presents the concept of Semantic File Systems. Traditional 
filesystems, like traditional network naming schemes, are rigidly 
based on a hierarchy of an abstract concept of location or position. SFSs 
index files based on more useful properties, such as libraries used (when 
referring to source code) or genre of music (when referring to MPEG audio 
and/or video). This type of metadata is extracted from files using an 
extensible set of "transducers."

The attributes and values of a file determine where it appears in the 
virtual directory hierarchy. A virtual directory is essentially a 
directory whose name represents a search query and whose contents 
represent the results of the search. Any application on the system which 
accesses files can do so using the SFS, so the natural description of a 
file can be used to locate it at all times, leaving any underlying 
traditional directory structure hidden.

Although the paper presents the concept in terms of files, in UNIX 
everything is a file anyway, so we could also consider the same scheme to 
organize and locate nodes of a network. With our printer example, a 
printer transducer might query the spooler for its properties. It is 
unclear, however, how often to update the table of attribute values. For 
simple data files it makes sense to update whenever the file is accessed 
for writing, but for files which represent other things you would have to 
do periodic updates as well. Incorporating support for requests like 
"nearest printer" would probably be best done using application-level 
location determination, followed by searches on location. A "printer 
nearest Alice" query would work the same way, by first determining 
Alice's location and then searching on that location.
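The two update policies in this paragraph can be combined in one table: re-indexing on write for ordinary files, and periodic polling for things like printers. This is a minimal sketch, assuming each entry has a zero-argument transducer that returns its current attribute pairs and that the current time is passed in explicitly. The class, its method names, and the `paper` attribute are hypothetical.

```python
class AttributeTable:
    """Attribute values per object, refreshed on write or by periodic polling."""

    def __init__(self, transducers):
        self.transducers = transducers  # name -> zero-argument function returning pairs
        self.values = {}
        self.updated = {}

    def _update(self, name, now):
        self.values[name] = self.transducers[name]()
        self.updated[name] = now

    def on_write(self, name, now):
        # Ordinary data files: re-run the transducer whenever the file is written.
        self._update(name, now)

    def refresh_stale(self, now, period):
        # Files standing for devices: poll any entry not refreshed within `period`.
        for name in self.transducers:
            if now - self.updated.get(name, float("-inf")) >= period:
                self._update(name, now)


spooler = {"paper": "ok"}  # stand-in for querying a real print spooler
table = AttributeTable({"lpr1": lambda: [("paper", spooler["paper"])]})
table.refresh_stale(now=0, period=60)
spooler["paper"] = "empty"
table.refresh_stale(now=30, period=60)  # too soon: still reports "ok"
stale = table.values["lpr1"]
table.refresh_stale(now=60, period=60)  # period elapsed: re-polled
```

The window between polls (here up to 60 time units) is exactly when the table can report outdated attributes.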

From kwalsh@CS.Cornell.EDU Tue Oct  8 11:03:06 2002
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98F36h19440
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 11:03:06 -0400 (EDT)
Received: from localhost (larry.cs.duke.edu [152.3.140.75])
	by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id LAA03169
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 11:03:05 -0400 (EDT)
From: kwalsh@CS.Cornell.EDU
Received: from 132.236.29.70 ( [132.236.29.70])
	as user walsh@imap.cs.duke.edu by login.cs.duke.edu with HTTP;
	Tue,  8 Oct 2002 11:03:05 -0400
Message-ID: <1034089385.3da2f3a9a131e@login.cs.duke.edu>
Date: Tue,  8 Oct 2002 11:03:05 -0400
To: egs@CS.Cornell.EDU
Subject: 615 PAPER 31
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
User-Agent: Internet Messaging Program (IMP) 3.0
X-Originating-IP: 132.236.29.70


Semantic File Systems

	The SFS system presents a natural extension to hierarchical file 
systems in which virtual directories are created on demand based on user 
queries. Little direct attention is given to the query language, other than to 
state that it allows boolean operations on attribute/value pairs. The only 
matching operator is apparently '='. The authors do mention a more flexible 
query language as future work. It appears much simpler in SFS than in INS to 
add, remove, and restructure attribute names, as no meaning is given to them 
whatsoever. As with INS, though, it seems difficult to perform queries such 
as "nearest object". The same techniques as in INS would work, such as a ring 
search or a better query language.
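A query language of this kind fits in a few lines: boolean combinations of attribute/value tests, with '=' as the only matching operator. This sketch assumes queries are written as nested tuples. That representation and the `matches` name are hypothetical, not SFS's actual syntax.

```python
def matches(query, pairs):
    """Evaluate a boolean query against a file's set of (attr, value) pairs.

    query is ("=", attr, value), ("and", q, ...), ("or", q, ...), or ("not", q).
    """
    op = query[0]
    if op == "=":
        return (query[1], query[2]) in pairs
    if op == "and":
        return all(matches(q, pairs) for q in query[1:])
    if op == "or":
        return any(matches(q, pairs) for q in query[1:])
    if op == "not":
        return not matches(query[1], pairs)
    raise ValueError(f"unknown operator: {op!r}")


pairs = {("owner", "smith"), ("ext", "c")}
print(matches(("and", ("=", "owner", "smith"), ("not", ("=", "ext", "py"))), pairs))  # True
```

A "nearest" operator does not fit this scheme, since it ranks candidates instead of testing each file in isolation. That is why it needs ring search or a richer language.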
	Since SFS is concerned only with file system contents, it is easy for 
the server to track and manage a cache of past queries. Changes to file 
contents necessarily pass through the SFS server, which then has the 
opportunity to update or invalidate cache entries.
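Because every write passes through the server, invalidation can be precise. This is a minimal sketch, assuming single-pair queries and a transducer that maps file text to (attribute, value) pairs. The `CachingServer` class and the word-splitting transducer are hypothetical. On each write, only cached answers for pairs the file had before or has now are dropped, since no other answer can change.

```python
class CachingServer:
    """Hypothetical SFS-like server that caches single-pair query results."""

    def __init__(self, transducer):
        self.transducer = transducer
        self.files = {}
        self.index = {}  # path -> set of (attr, value) pairs
        self.cache = {}  # (attr, value) -> sorted list of matching paths

    def write(self, path, text):
        # Re-index the file, then invalidate answers the change could affect.
        old = self.index.get(path, set())
        self.files[path] = text
        self.index[path] = set(self.transducer(text))
        for key in old | self.index[path]:
            self.cache.pop(key, None)

    def query(self, attr, value):
        key = (attr, value)
        if key not in self.cache:
            self.cache[key] = sorted(p for p, pairs in self.index.items() if key in pairs)
        return self.cache[key]


server = CachingServer(lambda text: [("word", w) for w in text.split()])
server.write("a", "printer laser")
first = server.query("word", "laser")
server.write("b", "laser")
second = server.query("word", "laser")
server.write("a", "inkjet")
third = server.query("word", "laser")
print(first, second, third)  # ['a'] ['a', 'b'] ['b']
```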
	SFS might be applicable to ad hoc networks in at least two ways. First, 
traditional network file system tasks are very heavyweight (e.g., all searching 
or filtering is done on the client). With SFS, much of this work can be 
efficiently offloaded to the server, reducing the network load in much the same 
way as SQL stored procedures. Second, a distributed version of SFS might serve 
as a naming mechanism for services and objects in an ad hoc network. This idea, 
however, suffers from the same problems as INS.

Active Names

	Through user-extensible routing and name resolution, Active Names gains 
extraordinary flexibility. It is the only system of the three that could 
directly support "nearest" operators, or operators that simultaneously balance 
multiple metrics ("nearest" and "least loaded" and "fastest", etc.). These 
would all be implemented as user-defined extensions, uploaded into the Active 
Name resolvers. This flexibility comes at a high price, of course. To be 
useful, installations will need to populate the network with many types of 
resolvers, filters, and routing mechanisms, since individual users cannot be 
routinely expected to do so.


From smw17@cornell.edu Tue Oct  8 11:13:53 2002
Received: from cornell.edu (cornell.edu [132.236.56.6])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98FDqh21580
	for <egs@CS.Cornell.EDU>; Tue, 8 Oct 2002 11:13:53 -0400 (EDT)
Received: from cornell.edu (syr-24-161-107-202.twcny.rr.com [24.161.107.202])
	by cornell.edu (8.9.3/8.9.3) with ESMTP id LAA11831
	for <egs@CS.Cornell.EDU>; Tue, 8 Oct 2002 11:13:52 -0400 (EDT)
Message-ID: <3DA2F557.3090401@cornell.edu>
Date: Tue, 08 Oct 2002 11:10:15 -0400
From: Sean Welch <smw17@cornell.edu>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4.1) Gecko/20020508 Netscape6/6.2.3
X-Accept-Language: en-us
MIME-Version: 1.0
To: Emin Gun Sirer <egs@CS.Cornell.EDU>
Subject: 615 PAPER 31
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit


Intentional Naming System - The Intentional Naming System (INS) is a
naming methodology intended to combine resource discovery and routing
into a single service/system.  The end system is similar in some
respects to DNS, in that there is a pre-defined, well-known server (the
DSR) that coordinates the creation and maintenance of the Intentional
Naming Routers (INRs) into a well-defined tree.  Node attributes are
application-specified attribute-value pairs (av pairs) arranged
hierarchically into a tree structure layered above the IP layer in the
network stack.  The protocol operates in close conjunction with the
application layer, with applications specifying av pairs and routing
metrics, and providing the periodic service updates necessary to
maintain freshness in the INS network.  User applications provide the
naming service with the intended service rather than a network
address, and INS makes its best effort to deliver the message to the
optimal host or to the defined subset of nodes matching the naming
criteria.  The application layer is thus responsible for defining
attributes and metrics in a way that makes this system useful.
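Matching a request against advertised av-pairs can be sketched as a recursive walk over two trees. This is a simplified sketch, assuming av-trees are nested Python dicts and that "*" acts as a wildcard value. It is not INS's actual name-tree lookup algorithm, and the camera advertisement is a hypothetical example modeled on the White House camera mentioned in this review.

```python
def ins_match(query, advert):
    """True if every av-pair in `query` appears, with the same nesting, in `advert`.

    Both arguments are nested dicts: attribute -> (value, child av-pairs).
    A query value of "*" matches any value; attributes the query omits are ignored.
    """
    for attr, (value, children) in query.items():
        if attr not in advert:
            return False  # simplification: a missing attribute never matches
        adv_value, adv_children = advert[attr]
        if value != "*" and value != adv_value:
            return False
        if not ins_match(children, adv_children):
            return False
    return True


camera = {
    "city": ("washington", {"building": ("whitehouse", {})}),
    "service": ("camera", {"resolution": ("640x480", {})}),
}
print(ins_match({"service": ("camera", {})}, camera))  # True
```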

The INS implements a routing protocol based on the services requested
and the advertised application metrics.  While this may be reasonable
in the average static case, the proactive, centralized network creation
and maintenance limits the applicability of the presented algorithm to
more mobile situations.  In addition, while it is undoubtedly easier to
push the problem of metrics up to the application layer, doing so does
not make the problem inherently simpler.  A user looking for the nearest
printer, for instance, would most likely prefer a printer two rooms down
over a physically nearer printer on a lower floor.  The examples
presented are descriptions of largely static networks, such as a camera
in the White House.  Extending this mechanism to mobile nodes will
likely require a more sophisticated routing and update algorithm to
achieve reasonable performance.
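The printer-on-another-floor point can be made concrete by comparing two distance metrics. This is a sketch under stated assumptions: the 4 m storey height and the 50 m per-floor walking penalty are made-up numbers, and the metric names are hypothetical. A real deployment would need building-specific knowledge to choose them.

```python
import math

FLOOR_HEIGHT_M = 4.0    # hypothetical storey height
FLOOR_PENALTY_M = 50.0  # hypothetical walking cost of changing floors


def straight_line(a, b):
    """Physical (3-D Euclidean) distance."""
    return math.dist((*a["xy"], FLOOR_HEIGHT_M * a["floor"]),
                     (*b["xy"], FLOOR_HEIGHT_M * b["floor"]))


def walking_cost(a, b):
    """Horizontal distance plus a fixed penalty per floor changed."""
    return math.dist(a["xy"], b["xy"]) + FLOOR_PENALTY_M * abs(a["floor"] - b["floor"])


user = {"xy": (0, 0), "floor": 3}
printers = [
    {"name": "two rooms down", "xy": (12, 0), "floor": 3},
    {"name": "floor below", "xy": (0, 0), "floor": 2},
]
print(min(printers, key=lambda p: straight_line(user, p))["name"])  # floor below
print(min(printers, key=lambda p: walking_cost(user, p))["name"])   # two rooms down
```

Both metrics are trivial to compute. The hard part is knowing which one matches the user's expectation, which is information the network layer does not have.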


Active Names - Active Names is another mechanism for resource/service
discovery and transport through naming control.  In contrast to INS,
the Active Names protocol achieves routing by distributing the routing
and processing throughout the network.  Each routing path is composed
of (potentially) a series of name resolution steps at various resolver
nodes throughout the network.  These are distributed, location-independent
functions that, when applied in series, provide the routing
functionality.  The advantages of an active naming scheme are that it
allows for a very flexible, extensible system capable of encompassing
many different types of services; it distributes various tasks
throughout the network, potentially permitting better tradeoffs between
required processing and transmission bandwidth; and it allows the client
to define after methods to apply to the returned data to make better use
of available resources.

Active names are an interesting concept.  Distributing the load
dynamically should make the Active Names system more applicable than
INS in high-mobility situations.  In addition, the location-independent
functional behavior could be advantageous in heterogeneous or highly
congested networks, since the load distribution can be adjusted based
on actual or measured capacity.  Unfortunately, dynamic distribution of
executable code raises a number of problems.  First, an active network
system must be composed of nodes capable of executing
platform-independent code.  While this may not be a problem for PC-type
systems in wired networks, extending this to an ad-hoc PDA network may
have a significant impact on the networked machines.  Second, there are
a number of security concerns inherent in allowing arbitrary code to be
distributed as part of a routing protocol.  Even with secure
authentication, a single compromised system may be capable of infecting
numerous other systems also running an active name system.  Maintaining
acceptable levels of system security becomes a more difficult problem
when running active names schemes, especially in the case of a
standardized routing architecture with a standard hardware and software
configuration.  Finally, an active naming system has the same problem as
INS, discussed above.  While Active Names does provide a mechanism to
distribute the routing load for better performance, it does not solve
the basic problem of how to convert traditional routing metrics into
user expectations without explicit outside information.


Semantic Filesystems - A semantic filesystem is a filesystem interface
that modifies the traditional file system tree to create a virtual
filesystem.  This virtual filesystem is composed of files combined with
a collection of transducers, which are special programs that extract a
set of attributes from a given type of file.  Examples presented
include the author of text files, imports and exports from various
types of code files, and {from, to} attributes from mail files.
Accessing the semantic file system involves a query, implicit in the
file reference, that searches the transducer outputs for files matching
the queried attributes.  Multiple conjunctive searches are permitted,
but disjunctive searches had not yet been implemented.  Recent searches
are cached for improved performance, and only partial re-indexing is
performed during operational updates and modifications, with full
re-indexing performed on a schedule.
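To make the transducer/query pipeline concrete, here is a small in-memory Python sketch. It is not the paper's implementation (which sat behind an NFS interface); the two transducers, the suffix-to-transducer table, and the `SemanticIndex` class are hypothetical. It builds a (field, value) -> files index, answers conjunctive queries by set intersection, and caches recent query results.

```python
import re
from collections import defaultdict


def c_transducer(text: str) -> set:
    # Hypothetical transducer for C source: one ("imports", header) pair per #include.
    return {("imports", h) for h in re.findall(r'#include\s*[<"]([^>"]+)[>"]', text)}


def mail_transducer(text: str) -> set:
    # Hypothetical transducer for a single mail message: "from" and "to" headers.
    pairs = set()
    for line in text.splitlines():
        m = re.match(r"(From|To):\s*(.+)", line)
        if m:
            pairs.add((m.group(1).lower(), m.group(2).strip()))
    return pairs


# Assumed mapping from file-name suffix to transducer.
TRANSDUCERS = {".c": c_transducer, ".mail": mail_transducer}


class SemanticIndex:
    def __init__(self, files: dict):
        # (field, value) -> set of file names carrying that attribute
        self.index = defaultdict(set)
        for name, text in files.items():
            for suffix, transducer in TRANSDUCERS.items():
                if name.endswith(suffix):
                    for pair in transducer(text):
                        self.index[pair].add(name)
        self.cache = {}  # frozenset of query pairs -> cached result

    def query(self, *pairs) -> set:
        """Conjunctive query: files that carry ALL of the given (field, value) pairs."""
        key = frozenset(pairs)
        if key not in self.cache:
            sets = [self.index.get(p, set()) for p in key]
            self.cache[key] = set.intersection(*sets) if sets else set()
        return self.cache[key]


files = {
    "a.c": '#include <stdio.h>\n#include "util.h"\n',
    "b.c": "#include <stdio.h>\n",
    "m1.mail": "From: alice\nTo: bob\n",
}
sfs = SemanticIndex(files)
```

Invalidating cached results when re-indexing changes the index is left out for brevity; a disjunctive query would be a set union instead of an intersection.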

Semantic filesystems are an idea that may actually be better suited to
network resource discovery than to filesystems as presented here.  In
networks, the larger latency for data transmission and lower bandwidth
may make a query-based implementation more attractive, especially in
ad-hoc networks where the cost of at-node processing is less than that
of transmission over a wireless link.  This is similar to the
intentional naming system, where the transducers have been replaced
with application-defined values and metrics, and the filesystem has
been replaced by a resource naming scheme.


a) Rapid/Harsh Environment Sensor Networks
    - Concept - Enable the rapid deployment of lightweight sensor nodes,
        potentially in hostile or remote locations that make conventional
        deployments unattractive (some similarities to smart dust systems).
    - Active networks may have some attractive implementations in
        heterogeneous systems by allowing more powerful nodes to take on
        more of the communications processing load.  INS does not appear
        particularly advantageous for this style of network.


b) Appliance Networks
    - Concept - Enable intelligent appliances and industrial systems
        capable of inter-system coordination for better resource use.
    - Intentional naming schemes may be a useful abstraction in these
        types of networks, as the name structure itself is conducive to
        specifying and locating different systems and classes of systems
        with minimal load on the devices themselves.  This concept may
        permit better integration of low-computational-power devices into
        a heterogeneous network (such as a factory floor or home kitchen)
        at the cost of some initial setup or a pre-defined discovery
        script.  Active naming may also be useful, as it provides both a
        mechanism to offload processing from relatively dumb devices and a
        means to distill returned data into a more efficient form based on
        the desired return destination.
    
c) Intelligent Resource Detection and Utilization
    - Concept - Provide a meaningful interface to human users, such as
        'print to closest printer' rather than 'print to device attached
        to node 172.22.5.233'.

    - Naming Systems - The naming systems suggested above can be used to
        support some degree of intelligent resource use.  The critical
        issue in the case of INS is that the quality of the results will
        depend heavily on the intelligence of the application layer
        controlling it.  The command "find the nearest printer" is a
        fairly simple concept to a human, but defining its precise meaning
        in a heterogeneous computing environment is considerably more
        involved.  In a hybrid wired/wireless network, for example, any
        application using hop count as its metric will see considerable
        variation in the number of hops per meter of 'real' distance, and
        may still return a printer that is physically close but
        inconvenient.  Neither scheme provides the translation of simple
        user concepts, or the separation of nodes into useful subdivisions
        (such as the set of nodes present in a room), without external
        sources of information.

        Active Names suffers from a similar problem.  It may be possible
        to do better than INS by delaying more precise determinations until
        closer to the destination (where presumably more and better
        information may be available), but the same issue remains.
        Neither system provides an improved mechanism for translating
        human language concepts into effective algorithms; both merely
        push the issue to higher-level protocols and explicit knowledge.


d) Movement Aware Routing
    - Concept - Use location information to judge the movement rate and
        to estimate when link breakage is probable.  From there, schemes
        such as intelligent route invalidation and link forwarding/handoff
        can be implemented.
    - Pseudo-static routes - From the network information, provide a
        mechanism for nodes to identify and make efficient use of
        relatively stationary, stable routes.

    - More a network connectivity issue below the IP layer, with little
        interplay with these naming schemes.

e) Adaptive Link Adjustment
    - Concept - Use geographic feedback and either directional or
        adjustable power systems to improve link coverage in sparse areas.
    - Again, more an issue of network connectivity (PHY/DLC/MAC), sitting
        well below the area affected by these naming protocols.
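Item c) notes that a hop-count metric can pick a printer that is physically close but inconvenient. The toy Python sketch below, with entirely hypothetical positions and hop counts and an assumed 50 m floor-change penalty, shows the two metrics disagreeing. The floor penalty is exactly the kind of external knowledge that neither naming scheme supplies.

```python
import math

# Toy, hypothetical data: (name, hop count from client, (x, y) in metres, floor).
PRINTERS = [
    ("lobby", 1, (3.0, 4.0), 1),       # one wireless hop, but one floor down
    ("room-212", 4, (10.0, 0.0), 2),   # several hops over the wired backbone
]
CLIENT_POS, CLIENT_FLOOR = (0.0, 0.0), 2
FLOOR_PENALTY_M = 50.0  # assumed walking cost of changing floors


def nearest_by_hops(printers):
    """What a pure network metric sees."""
    return min(printers, key=lambda p: p[1])[0]


def nearest_by_walk(printers):
    """What the user means, given external knowledge of positions and floors."""
    def cost(p):
        _, _, pos, floor = p
        return math.dist(CLIENT_POS, pos) + FLOOR_PENALTY_M * abs(floor - CLIENT_FLOOR)
    return min(printers, key=cost)[0]
```

Here the lobby printer wins on hops (1 versus 4) but loses on walking cost (55 versus 10).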


Caching - Both systems can potentially benefit from caching to improve
overall system performance.  The INS structure makes caching relatively
simple to implement, as the data streams are referenced in a relatively
clear, unambiguous manner.  The basic cache here is not particularly
different from a standard cache with regard to data content, but the
naming structure inherent in INS may well allow more efficient use of
the cache for popular, repeating, or frequently updated data sources.
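One way the INS naming structure could make caching simple is to key the cache on a canonical form of the name specifier, so that equivalent specifiers share an entry. The Python sketch below assumes specifiers are nested attribute-value dicts (not the INS wire format); `canonical`, `NameCache`, and `fetch` are hypothetical names, and expiry of stale entries is omitted.

```python
def canonical(spec: dict) -> tuple:
    """Turn a nested attribute-value dict into an order-independent, hashable key."""
    return tuple(sorted(
        (attr, canonical(val) if isinstance(val, dict) else val)
        for attr, val in spec.items()
    ))


class NameCache:
    """Cache keyed by a canonical name specifier, so that equivalent
    specifiers written in a different order share one entry."""

    def __init__(self):
        self.entries = {}
        self.hits = 0

    def get(self, spec: dict, fetch):
        key = canonical(spec)
        if key in self.entries:
            self.hits += 1
        else:
            self.entries[key] = fetch(spec)
        return self.entries[key]


def fetch(spec):
    # Hypothetical origin lookup, called only on a cache miss.
    return f"stream for {spec['service']}"


cache = NameCache()
cache.get({"service": "camera", "location": {"building": "whitehouse", "room": "oval"}}, fetch)
cache.get({"location": {"room": "oval", "building": "whitehouse"}, "service": "camera"}, fetch)
```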

Active Names can also benefit from caching, and additionally offers the
possibility of using downstream caches as a simple form of dynamic
server.  By caching not only the data but also enough program
functionality to implement some basic functions, the authors suggest
that cache performance may improve significantly over simple static
caches.  They suggest that implementing an active cache through their
active naming system may increase the utility of web caches,
traditionally of limited effectiveness, by also caching and offloading
some limited functional complexity (for instance, a function that
determines which banner ad to show).

From tmroeder@CS.Cornell.EDU Tue Oct  8 11:47:09 2002
Received: from dhcp99-233.cs.cornell.edu (dhcp99-233.cs.cornell.edu [128.84.99.233])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98Fl9h29603
	for <egs@CS.Cornell.EDU>; Tue, 8 Oct 2002 11:47:09 -0400 (EDT)
Received: (from tmroeder@localhost)
	by dhcp99-233.cs.cornell.edu (8.11.6/8.11.6) id g98FjPw02514;
	Tue, 8 Oct 2002 11:45:25 -0400
From: Thomas Roeder <tmroeder@CS.Cornell.EDU>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15778.64917.21515.71918@dhcp99-233.cs.cornell.edu>
Date: Tue, 8 Oct 2002 11:45:25 -0400
To: Emin Gun Sirer <egs@CS.Cornell.EDU>
Subject: 615 PAPER #31
X-Mailer: VM 7.07 under Emacs 21.2.1

The semantic filesystems paper describes an extension of the standard
filesystem protocols that allows properties of files to be searched
using virtual directories, which are computed on the fly by
transducers specific to given file types.  Although this is an old
paper, the ideas apply well to using mobile devices in resource
discovery.  If, for instance, a host fileserver annotated its files
with geographic location from one of the protocols from last week, a
mobile node could query for all files which are geographically near.
In general, since files can be used as an abstraction for processes,
and named pipes could be set up to printers, the filesystem
abstraction allows us to compute geographical nearness using the
Semantic Filesystems.  It seems to be a strange workaround more than a
solution, however, and Active Names seems to use a similar adapter
pattern, this time interpreting data in flight rather than indexing
files, to better effect.

The Active Names protocol allows a chain of programs to be constructed
to and from a service, which programs move between servers to
transform the data according to their whim (and hopefully according to
some metrics to improve performance).  Here, given a localization, we
can specify the "printer nearest me" by a name resolution program for
printing which wanders the network via some geographic search until it
minimizes the distance metric (or gets close enough).  Caching would
indeed help here, so that we would not have to wander more than once.
This would work well for printers, but not particularly well for the
"printer nearest Alice", unless Alice were asleep, or otherwise
relatively immobile.  
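The wandering search described above can be sketched as a greedy walk over resolver nodes, with a cache so that repeat lookups do not wander again. Everything here is hypothetical: the topology, the printer positions, and the names `wander` and `CACHE`. A greedy walk can stop at a local minimum rather than the true nearest printer, and a cache keyed on the target's position goes stale if the target (say, Alice) moves.

```python
import math

# Hypothetical topology: node -> (neighbours, printer positions known at that node).
NODES = {
    "n1": (["n2"], []),
    "n2": (["n1", "n3"], [(6, 1)]),
    "n3": (["n2"], [(9, 0)]),
}
CACHE = {}  # (start node, target position) -> printer found


def best_printer_at(node, target):
    return min(NODES[node][1], key=lambda p: math.dist(p, target), default=None)


def score(printer, target):
    return math.inf if printer is None else math.dist(printer, target)


def wander(start, target):
    """Greedy geographic search: hop to any neighbour that knows a printer
    closer to `target`; stop when no neighbour improves (a local minimum)."""
    key = (start, target)
    if key in CACHE:
        return CACHE[key]
    node, best = start, best_printer_at(start, target)
    improved = True
    while improved:
        improved = False
        for nb in NODES[node][0]:
            cand = best_printer_at(nb, target)
            if score(cand, target) < score(best, target):
                node, best, improved = nb, cand, True
                break
    CACHE[key] = best
    return best
```

Each move strictly lowers the distance, so the walk always terminates.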

The inaccuracy of the searches in Active Names depends greatly on the
relative mobility of the services in the network, which bodes well for
increasing the performance of HTTP, and not so well for IRC.  

From ag75@cornell.edu Tue Oct  8 12:02:13 2002
Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98G2Dh03193
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 12:02:13 -0400 (EDT)
Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13])
	by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id MAA06784
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 12:02:10 -0400 (EDT)
Date: Tue, 8 Oct 2002 12:02:10 -0400 (EDT)
From: ag75@cornell.edu
X-Sender: ag75@travelers.mail.cornell.edu
To: egs@CS.Cornell.EDU
Subject: 615 PAPER 31
Message-ID: <Pine.SOL.3.91.1021008120146.21386B-100000@travelers.mail.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

In this paper we are presented with a Semantic File System. A semantic
file system is an information storage system that provides associative
access to the system's contents by automatically extracting attributes
from files with file-type-specific transducers. Files can thus be
located based upon transducer-generated attributes such as type, title,
author, etc. A transducer is a filter that takes the contents of the
file as input and outputs the attributes. Of course, one has to write
transducers for every type of file that one wants to have interpreted.
Once the files are interpreted, queries based on attributes are used to
access the desired files. It's easy to see how this system can be
extended for our purposes: we can treat printers, cameras, etc. as file
types with specific attributes and go from there. However, SFS suffers
from the same problems as INS. It is good for describing what kind of
service is needed, but it can't express relative positions.
Additionally, it's not clear how SFS would function in an ad hoc
network, with all the challenges that come from working in such an
environment.

From liuhz@CS.Cornell.EDU Tue Oct  8 12:10:02 2002
Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98GA2h05257
	for <egs@popsrv.cs.cornell.edu>; Tue, 8 Oct 2002 12:10:02 -0400 (EDT)
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Subject: 615 PAPER 31
X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
Date: Tue, 8 Oct 2002 12:10:01 -0400
Message-ID: <706871B20764CD449DB0E8E3D81C4D4302CEE65E@opus.cs.cornell.edu>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: 615 PAPER 31
Thread-Index: AcJu5Sggo4/02nPuQ1S9rEHM2HJ/3Q==
From: "Hongzhou Liu" <liuhz@CS.Cornell.EDU>
To: "Gun Sirer" <egs@CS.Cornell.EDU>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id g98GA2h05257

The main contribution of this paper is that it introduces a semantic file system that
can provide associative attribute-based access to the contents of an information
storage system and integrates this access into the existing tree-structured file system
with virtual directories.  Virtual directories also enable unmodified remote hosts to
access the facilities of a semantic file system with existing network file system
protocols.
  In SFS, an attribute is a field-value pair, where a field describes a property of a
file, and a value is a string or an integer that gives the value of the property.
Virtual directories include field virtual directories and value virtual directories. A field
virtual directory is named by a field, and has one entry for each possible value of its
corresponding field. Value virtual directories are contained in field virtual
directories and have one entry for each entity described by the field-value pair. Accessing
a path with virtual directories actually queries the entities which have all the
attributes described by the field-value pairs along the path.
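The field/value virtual directory structure described above can be sketched in Python over a toy in-memory index. The `ls` function and the assumed path layout `/sfs/field:/value/...` are illustrative, not the paper's code: a path ending in a value directory lists matching files, and a path ending in a field directory lists that field's values.

```python
def ls(path: str, index: dict) -> list:
    """List a virtual directory over a toy SFS index.

    `index` maps (field, value) -> set of file names. A path ending in a
    value directory returns the files matching every field:/value pair; a
    path ending in a field directory returns the values of that field
    among the files matching the preceding pairs."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "sfs":
        raise ValueError("paths are assumed to be rooted at /sfs")
    parts = parts[1:]
    pairs = [(f.rstrip(":"), v) for f, v in zip(parts[0::2], parts[1::2])]
    all_files = set().union(*index.values())
    matches = all_files.intersection(*(index.get(p, set()) for p in pairs))
    if len(parts) % 2 == 0:          # ends in a value directory
        return sorted(matches)
    field = parts[-1].rstrip(":")    # ends in a field directory
    return sorted({v for (f, v), names in index.items() if f == field and names & matches})


# Hypothetical index contents.
INDEX = {
    ("owner", "alice"): {"a.txt", "b.txt"},
    ("owner", "bob"): {"c.txt"},
    ("text", "cat"): {"a.txt", "c.txt"},
}
```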
  The mapping between attributes and entities is maintained by transducers. SFS
checks the file system periodically. Once a file is modified, the corresponding transducer
is called to extract updated information from the modified file. Different types of files
have different transducers. Transducers can be programmed by users to perform arbitrary
interpretation of file and directory contents in order to produce a desired set of
field-value pairs for later retrieval. The use of fields allows transducers to describe
many aspects of a file, and thus permits subsequent sophisticated associative access to
computed properties. Transducers are highly flexible. They can identify entities within
files as independent objects for retrieval.
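A minimal sketch of periodic, incremental re-indexing, assuming the server can compare modification times. The `Reindexer` class, the `word_transducer`, and the in-memory file model are hypothetical; the real SFS may detect changes differently.

```python
class Reindexer:
    """Periodic, incremental re-indexing: re-run the transducer only on files
    whose modification time changed since the last scan (toy, in-memory model)."""

    def __init__(self, transducer):
        self.transducer = transducer
        self.seen = {}   # file name -> mtime when last indexed
        self.attrs = {}  # file name -> set of (field, value) pairs
        self.runs = 0    # number of transducer invocations so far

    def scan(self, fs: dict) -> None:
        """`fs` maps file name -> (mtime, contents); call this periodically."""
        for name in list(self.attrs):
            if name not in fs:            # deleted file: drop its attributes
                del self.attrs[name]
                del self.seen[name]
        for name, (mtime, text) in fs.items():
            if self.seen.get(name) != mtime:  # new or modified file
                self.attrs[name] = self.transducer(text)
                self.seen[name] = mtime
                self.runs += 1


def word_transducer(text):
    # Hypothetical transducer: one ("text", word) pair per word.
    return {("text", w) for w in text.split()}


r = Reindexer(word_transducer)
fs = {"a": (1, "cat dog"), "b": (1, "fish")}
r.scan(fs)   # indexes both files
r.scan(fs)   # nothing changed, so no transducer runs
fs["a"] = (2, "cat")
del fs["b"]
r.scan(fs)   # re-indexes "a" only and forgets "b"
```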
  SFS can describe location information by a field-value pair (e.g., Location: room5155).
However, if you want to find "the closest printer", you first need to know where you are
now, with the help of some localization system, and you need to know the distance to the
different locations where there are printers. In other words, SFS by itself cannot support
requests like "print to the closest printer".
  SFS also caches computed results for each query at the SFS server. This cache can
greatly reduce the number of disk accesses, so it is no surprise that it improves the
performance of the file system.

From yao@CS.Cornell.EDU Tue Oct  8 12:45:40 2002
Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98Gjdh13722
	for <egs@popsrv.cs.cornell.edu>; Tue, 8 Oct 2002 12:45:39 -0400 (EDT)
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Subject: 615 PAPER 31
X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
Date: Tue, 8 Oct 2002 12:45:39 -0400
Message-ID: <706871B20764CD449DB0E8E3D81C4D4302ED4C4A@opus.cs.cornell.edu>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: 615 PAPER 31
Thread-Index: AcJu6iI1iWSzu6H6TJqBviAVVzi2xQ==
From: "Yong Yao" <yao@CS.Cornell.EDU>
To: "Gun Sirer" <egs@CS.Cornell.EDU>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id g98Gjdh13722

The paper presents an approach to information storage that permits
both effective information sharing and reductions in programming
complexity. Another desired property of such an approach is easy
incorporation into existing file systems. A semantic file system is a
more effective storage abstraction, which automatically extracts
attributes from files and organizes them in a tree structure. It
provides flexible associative access to the system's contents with
specialized transducers. It performs automatic indexing when files or
directories are created or updated.

An attribute has two components: a field describes a property of the file,
while a value is the corresponding content of the property. A field need
not be unique for a file; a file may have several attributes with the same
field. A transducer is a kind of filter that takes the contents of a file
as input and outputs field-value pairs. The transducer table helps to
determine the exact transducer to use to interpret a given file type.

Queries are used to access a semantic file system. A query is composed
of a set of attributes, where each attribute describes the desired value
of a field. The semantic file system is query consistent. The query result
is a set of files that include the entities described. A query is executed
through the use of virtual directories, which are computed on demand and are
indistinguishable from ordinary directories to a client program.

The authors do not mention how to apply the approach directly in an ad-hoc
network. One possibility is to map the storage at each node as a file.
The whole network then becomes a large distributed file system. Users
access individual nodes through queries over attributes and values. A transducer
can automatically publish the content of a file into the network for future
reference.

Yong
 

From ashieh@CS.Cornell.EDU Tue Oct  8 12:53:05 2002
Received: from zinger.cs.cornell.edu (zinger.cs.cornell.edu [128.84.96.55])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98Gr5h15334
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 12:53:05 -0400 (EDT)
Received: from localhost (ashieh@localhost)
	by zinger.cs.cornell.edu (8.11.3/8.11.3/C-3.2) with ESMTP id g98Gr5027827
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 12:53:05 -0400 (EDT)
Date: Tue, 8 Oct 2002 12:53:05 -0400 (EDT)
From: Alan Shieh <ashieh@CS.Cornell.EDU>
To: <egs@CS.Cornell.EDU>
Subject: 615 PAPER 31
Message-ID: <Pine.GSO.4.33.0210081252450.27732-100000@zinger.cs.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

The semantic file system is a system that transparently filters files
stored in a file system based on domain-specific knowledge. A
transducer is associated with each file type; this transducer extracts
key-value pairs from the files, which are indexed periodically, and
may also generate "virtual files", e.g. represent individual mail
messages as files. To provide transparency with UNIX file system
semantics, queries are encoded (only conjunctions are supported) in
directory paths. Results from a query are returned as files in the
directory.
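As an illustration of a transducer that generates virtual files, this Python sketch splits a simplified mbox-style mailbox into one entity per message with from/to/subject attributes. The parsing rules and the `msgN` naming are assumptions; real mbox files also escape body lines that begin with 'From '.

```python
def mail_transducer(mbox_text: str) -> list:
    """Split a mailbox into messages and return one (virtual file name,
    attributes) entity per message.

    Assumes a simplified mbox: each message starts with a line beginning
    'From ' and its headers end at the first blank line."""
    messages, current = [], []
    for line in mbox_text.splitlines():
        if line.startswith("From ") and current:
            messages.append(current)
            current = []
        current.append(line)
    if current:
        messages.append(current)

    entities = []
    for i, lines in enumerate(messages):
        attrs = set()
        for line in lines[1:]:              # skip the 'From ' separator line
            if not line.strip():
                break                       # end of headers
            field, sep, value = line.partition(":")
            if sep and field.lower() in ("from", "to", "subject"):
                attrs.add((field.lower(), value.strip()))
        entities.append((f"msg{i}", attrs))  # hypothetical virtual file name
    return entities


MBOX = (
    "From alice Tue Oct  8 11:00:00 2002\n"
    "From: alice\nTo: bob\nSubject: lunch\n\nNoon?\n"
    "From bob Tue Oct  8 11:05:00 2002\n"
    "From: bob\nTo: alice\n\nSure.\n"
)
entities = mail_transducer(MBOX)
```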

** Shortcomings

The paper does not describe a mechanism for coordinating collisions
between attributes that are defined by different transducers. The
flattened attribute system pushes much of the semantics of the
attributes into the transducer and interpreter programs. While this is
the case with schemas in most databases, other systems typically
provide ways to impose some structural relationship between the data,
thus allowing more room for generic query optimizations.

** Future work
- For the system to be usable as a service-discovery engine, a
  light-weight index update needs to be added. This doesn't need to be
  particularly fancy, as it would primarily support things such as
  bits for presence, location, and some space for transient state
  (printer queue size). These fields either have a small range, or are
  rarely changing.
- If the UNIX file abstraction is to be retained, then services should
  be expressed as sockets or devices.
- Add a mechanism for performing joins between query results.
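A hedged sketch of the light-weight index update proposed in the first bullet: static attributes stay in the transducer-built index, while small transient fields are patched directly and are ignored once older than an assumed maximum age. `ServiceIndex` and its methods are hypothetical.

```python
import time


class ServiceIndex:
    """Static attributes come from (rarely re-run) transducers; small transient
    fields such as presence or queue length are patched in place."""

    def __init__(self, static: dict):
        self.static = static  # service -> {field: value}
        self.transient = {}   # service -> {field: (value, timestamp)}

    def update(self, service, field, value, now=None):
        # Lightweight update: no transducer run and no re-indexing of static data.
        stamp = time.time() if now is None else now
        self.transient.setdefault(service, {})[field] = (value, stamp)

    def query(self, wanted: dict, max_age=None, now=None) -> list:
        """Services whose merged attributes match every field in `wanted`;
        transient values older than `max_age` seconds are ignored."""
        now = time.time() if now is None else now
        hits = []
        for svc, attrs in self.static.items():
            fresh = {f: v for f, (v, t) in self.transient.get(svc, {}).items()
                     if max_age is None or now - t <= max_age}
            merged = {**attrs, **fresh}
            if all(merged.get(f) == v for f, v in wanted.items()):
                hits.append(svc)
        return sorted(hits)


idx = ServiceIndex({"lp1": {"type": "printer", "room": "5155"},
                    "lp2": {"type": "printer", "room": "4130"}})
idx.update("lp1", "present", True, now=100)
idx.update("lp2", "present", False, now=100)
```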


From mp98@cornell.edu Tue Oct  8 13:24:57 2002
Received: from postoffice.mail.cornell.edu (postoffice.mail.cornell.edu [132.236.56.7])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98HOvh23178
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 13:24:57 -0400 (EDT)
Received: from cornell.edu (r109493.resnet.cornell.edu [128.253.240.252])
	by postoffice.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id NAA27214
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 13:24:56 -0400 (EDT)
From: mp98@cornell.edu
Date: Tue, 8 Oct 2002 13:24:56 -0400
Mime-Version: 1.0 (Apple Message framework v546)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Subject: 615 Paper 31
To: egs@CS.Cornell.EDU
Content-Transfer-Encoding: 7bit
Message-Id: <DD8BA77C-DAE2-11D6-BD49-003065EE5F0A@cornell.edu>
X-Mailer: Apple Mail (2.546)

I sent this to the wrong email address.

Begin forwarded message:

From: mp98@cornell.edu
Date: Tue Oct 8, 2002  11:52:41 AM US/Eastern
To: egs@cornell.edu
Subject: 615 Paper 31

The semantic file system is an attempt to build on top of traditional 
file systems, using virtual directories to allow the user to do 
searches based on file attributes. The file attributes are extracted 
using 'transducers', which are basically large content parsing scripts. 
Obviously the system's usefulness is limited by the ability to write 
good transducers (it is hard, for example, to write a transducer for a 
binary file).

To get all files owned by Alice with the word 'cat' one could
'cd /sfs/owner:/alice/text:/cat' and be in a virtual directory
containing all matching files. The authors of this paper implemented it
on top of an NFS server. The response times are a bit slow (are two
seconds too great a latency for a good interface?), but in a file
system environment, it does provide a good way to quickly find
scattered files.

This system might not be so appropriate for a wireless network. One
could possibly imagine a transducer on a centralized server, accessible
from wireless nodes, that determines the location of networked devices,
and then find the nearby printers by running
'ls -F /sfs/service:/printer/location:/`whatismylocation`'. In this
case we have added a location attribute to the SFS. However, this is
trying to extend a system based around files to a system for services,
which seems a bit awkward. To find your closest neighbor you might have
to do something like
'ls -F /sfs/type:/wirelessnode/location:/`whatismylocation`'. Note,
however, that the system as it stands only allows for attribute
matching. I can find which printer is close, but not which is closest.
To improve this, one would have to implement a set of attributes like
'distance:/' and values like '<5/', which would require the server to
know the location of the client, or attributes like 'distance:/'
coupled with a 'from:/' qualifier. In the end, however, the advantage
of using a file system for such information breaks down when one is no
longer primarily interested in files, but in services, neighbors, and
so forth instead.
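To illustrate the difference between "close" and "closest" raised above, the Python sketch below assumes the server already knows each printer's coordinates. `within` approximates a 'distance:/<limit' directory qualified by a 'from:/' origin, and `closest` is the ranking step that attribute matching lacks. All names and coordinates are hypothetical.

```python
import math

# Hypothetical location attributes the SFS server would have to maintain.
LOCATIONS = {"printer-a": (2.0, 1.0), "printer-b": (4.0, 3.0), "printer-c": (12.0, 0.0)}


def within(origin, limit):
    """Roughly what a 'distance:/<limit' directory qualified by a 'from:/'
    origin could list: every printer closer than `limit`."""
    return sorted(n for n, p in LOCATIONS.items() if math.dist(origin, p) < limit)


def closest(origin):
    """The ranking step that plain attribute matching cannot express."""
    return min(LOCATIONS, key=lambda n: math.dist(origin, LOCATIONS[n]))
```

A threshold query still returns a set the user must choose from; only the ranking picks one answer, and it changes whenever the origin moves.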

From sc329@cornell.edu Tue Oct  8 13:32:30 2002
Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98HWTh24909
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 13:32:29 -0400 (EDT)
Received: from sangeeth.cornell.edu (syr-24-58-36-135.twcny.rr.com [24.58.36.135])
	by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id NAA03629
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 13:32:28 -0400 (EDT)
Message-Id: <5.1.0.14.2.20021008133101.031581f8@postoffice2.mail.cornell.edu>
X-Sender: sc329@postoffice2.mail.cornell.edu (Unverified)
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Tue, 08 Oct 2002 13:32:31 -0400
To: egs@CS.Cornell.EDU
From: Sangeeth Chandrakumar <sc329@cornell.edu>
Subject: 615 PAPER 31
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed

Submitted by - Sangeeth Chandrakumar

This paper presents the semantic file system, an information storage system
which provides flexible access to its contents based on attributes extracted from
files. In the paper, the authors argue that a semantic storage system is a
much better abstraction for information sharing than traditional tree-structured
file systems. SFS also provides associative access to file servers in a
distributed system.

SFS presents the notion of "transducers", which are specific to file types
and can extract a set of attributes that enable later retrieval of the
files. Every file has a set of field-value pairs, where the fields describe
the properties of a file. A transducer is a filter that takes as its input
the contents of a file, and outputs the file's entities and their
corresponding attributes. The associative access interface to a semantic
file system is based upon queries that describe desired attributes of
entities. The result of a query is a set of files and/or directories that
contain the entities described. SFS is query consistent; that is, if
updates to the contents of a semantic file system cease, it will
eventually become consistent.

SFS supports the use of virtual directories to describe a view of file 
system contents. A virtual directory is computed on demand by SFS. Another 
feature of SFS is the compatibility of virtual directories with the 
existing file systems.

The paper says very little about systems with mobile and ad hoc
nodes. The scheme of "printing to the nearest printer" will not work in
this case, as you are not aware of your location. In a mobile
environment, which would require frequent updates, this scheme may not be
effective. Also, having to specify all the field-value pairs would increase
the network overhead of discovering new services.

From pj39@CS.Cornell.EDU Tue Oct  8 14:08:44 2002
Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98I8ih03490
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 14:08:44 -0400 (EDT)
Received: from cornell-yb3go20.cornell.edu (syr-24-59-67-50.twcny.rr.com [24.59.67.50])
	by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id OAA01609
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 14:08:43 -0400 (EDT)
Message-Id: <5.1.0.14.2.20021008140616.02464a50@postoffice2.mail.cornell.edu>
X-Sender: pj39@postoffice2.mail.cornell.edu (Unverified)
X-Mailer: QUALCOMM Windows Eudora Version 5.1
Date: Tue, 08 Oct 2002 14:08:38 -0400
To: egs@CS.Cornell.EDU
From: Piyoosh Jalan <pj39@CS.Cornell.EDU>
Subject: 615 PAPER 31
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed

Semantic File System

This paper describes the Semantic File System (SFS), an information storage system
that provides flexible associative access to system resources. It provides
attribute-based access to system contents based on automatic extraction and
indexing of key properties of system objects. SFS provides two types of
interface to its associative access facilities: a user interface (UI) and an
application programming interface (API). SFS creates virtual directories
based on the user-described desired attributes.

SFS can be integrated with existing tree-structured file systems through the
concept of virtual directories. A transducer table maintains a list of
transducers to be used with different files, before filtering them based on
some given attributes and creating the virtual directories. One way of
implementing the transducer table is by allowing the users to store
subtree-specific transducers in the subtree's parent directory and search
the directory structure for an appropriate transducer during lookup.
Implementing SFS with existing protocols like Network File System (NFS) and
Andrew File System (AFS) would be easy because of its backward
compatibility. SFS can also function as a Distributed File System for
remote access to files.
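The subtree-specific transducer table described above could be looked up roughly as follows: start at the file's directory and walk toward the root, using the nearest table with a matching entry, so a subtree's table overrides the global one. The `TABLES` layout, suffix matching, and `find_transducer` name are assumptions for illustration.

```python
import posixpath

# Hypothetical per-directory transducer tables: directory -> {suffix: transducer}.
TABLES = {
    "/": {".c": "c-transducer", ".txt": "text-transducer"},
    "/home/alice/mail": {"": "mail-transducer"},  # every file in this subtree is mail
}


def find_transducer(path: str):
    """Search from the file's directory up toward '/', returning the
    transducer from the nearest table with a matching suffix, or None."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    directory = posixpath.dirname(path)
    while True:
        for suffix, transducer in TABLES.get(directory, {}).items():
            if path.endswith(suffix):
                return transducer
        if directory == "/":
            return None
        directory = posixpath.dirname(directory)
```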

Associative access to an SFS is based on lookup queries on the desired
attributes. The result of the query is a set of files and/or
directories that contain the entities described. The paper presents a fair
number of experiments to test the authors' base thesis that SFS presents a more
effective storage abstraction than do traditional tree-structured file systems.

One of the major drawbacks of the paper is that the distributed properties of
SFS need to be discussed in more detail for distributed access of files
(DFS). The paper mentions DFS in one or two places but does not go into
sufficient detail.

The queries "nearest printer" or "nearest printer to Alice" would work in
SFS if it also maintained metrics associated with the location of the
files (everything is a file in SFS). SFS is not particularly suited for mobile
wireless networks because it was not designed with MANETs in mind.

From ks238@cornell.edu Tue Oct  8 14:27:32 2002
Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98IRWh08394
	for <egs@cs.cornell.edu>; Tue, 8 Oct 2002 14:27:32 -0400 (EDT)
Received: by travelers.mail.cornell.edu (8.9.3/8.9.3) id OAA00184;
	Tue, 8 Oct 2002 14:27:29 -0400 (EDT)
Date: Tue, 8 Oct 2002 14:27:29 -0400 (EDT)
From: ks238@cornell.edu
Message-Id: <200210081827.OAA00184@travelers.mail.cornell.edu>
To: egs@CS.Cornell.EDU
Errors-To: ks238@cornell.edu
Reply-To: ks238@cornell.edu
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: IMP/PHP3 Imap webMail Program 2.0.9
Sender: ks238@cornell.edu
X-Originating-IP: 128.84.99.8
Subject: 615 Paper #31

Semantic File Systems moves away from file systems 
based on traditional tree structures to one which is 
based on the attributes associated with files. The 
primary contribution of this paper comes from the idea 
of using transducers to glean certain information from 
a file and make corresponding assumptions about its 
attributes and values. This kind of information can be 
attained with associative access to the file in order 
to conduct content based searches of a file. When 
conducting a query within a semantic file system, the 
attributes in demand are searched for in the file 
system and the files matching this attribute are 
returned. Queries are conducted through the use of 
virtual directories which categorize different files 
based on their respective attributes and allow users to 
code specific transducers to search the contents of 
these files. Based on these searches of the virtual 
directories effective attribute based indexes of the 
file system can be created in order to efficiently 
search through the system. This is a genuinely new and
effective structure for information and resource
sharing in file systems.

A lot of the same problems that users see in web
searches can also be seen in implementing transducers
that are able to obtain attribute information from the
web. The primary issue is that denoting a file with a
given attribute after gleaning information such as
authors and title may produce erroneous attributes for
the file.

I think the two biggest issues with finding the
printer “closest to Alice” are as follows. First,
location must be defined as the attribute of choice,
and expressing location and proximity to the point
where the query is initiated is difficult. Second,
the integrity of the attributes is a critical problem
with SFS. The biggest problem with ad hoc networks is
the constant flux in the data on which attribute
searches are based. So, if a printer becomes heavily
loaded or changes location, the corresponding data
needs to be updated. Thus, I don’t see an SFS being
very robust in this setting.
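
The update problem can be sketched as an inverted
index that must be re-indexed whenever a device's
attributes change. This is a hypothetical
illustration (the class `AttributeIndex`, the printer
names, and the `room` attribute are made up), not the
paper's indexing scheme. It only shows that every move
requires removing the stale attributes; otherwise
queries keep returning the old location.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple

Attr = Tuple[str, str]  # (field, value) pair


class AttributeIndex:
    """Hypothetical inverted index: (field, value) -> entities having it."""

    def __init__(self) -> None:
        self._by_attr: DefaultDict[Attr, Set[str]] = defaultdict(set)
        self._by_entity: Dict[str, Set[Attr]] = {}

    def update(self, entity: str, attrs: Set[Attr]) -> None:
        """Re-index one entity, dropping attributes that no longer hold."""
        old = self._by_entity.get(entity, set())
        for a in old - attrs:  # stale attributes, e.g. a previous location
            self._by_attr[a].discard(entity)
        for a in attrs - old:  # newly observed attributes
            self._by_attr[a].add(entity)
        self._by_entity[entity] = set(attrs)

    def lookup(self, *attrs: Attr) -> Set[str]:
        """Entities that have all of the given attributes."""
        if not attrs:
            return set(self._by_entity)
        result = set(self._by_attr.get(attrs[0], set()))
        for a in attrs[1:]:
            result &= self._by_attr.get(a, set())
        return result


idx = AttributeIndex()
idx.update("printer-a", {("type", "printer"), ("room", "4130")})
idx.update("printer-b", {("type", "printer"), ("room", "5152")})
print(idx.lookup(("type", "printer"), ("room", "4130")))  # {'printer-a'}
# printer-a moves; without this update the query above would still find it.
idx.update("printer-a", {("type", "printer"), ("room", "5152")})
print(idx.lookup(("room", "4130")))  # set()
```

In an ad hoc network these updates would be
continuous, which is exactly the cost that makes the
approach look fragile.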

From linga@CS.Cornell.EDU Tue Oct  8 14:43:19 2002
Received: from snoball.cs.cornell.edu (snoball.cs.cornell.edu [128.84.96.54])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98IhIh12205
	for <egs@sundial.cs.cornell.edu>; Tue, 8 Oct 2002 14:43:18 -0400 (EDT)
Received: from localhost (linga@localhost)
	by snoball.cs.cornell.edu (8.11.3/8.11.3/C-3.2) with ESMTP id g98IhIV07182
	for <egs@snoball.cs.cornell.edu>; Tue, 8 Oct 2002 14:43:18 -0400 (EDT)
Date: Tue, 8 Oct 2002 14:43:18 -0400 (EDT)
From: Prakash Linga <linga@CS.Cornell.EDU>
To: Emin Gun Sirer <egs@CS.Cornell.EDU>
Subject: 615 PAPER 31
Message-ID: <Pine.GSO.4.33.0210081442410.7178-100000@snoball.cs.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


Semantic Filesystems
--------------------

This is a new information storage system (like a traditional file
system) that automates the extraction of attributes from files and
hence promotes file sharing through associative access. This automated
indexing of files is "semantic" because the semantics of updated file
system objects are used to extract the properties for indexing.

A file-type-specific transducer is used to interpret files stored in a
semantic file system and produce attributes (an attribute is a
field-value pair) that help in retrieving files. A table is used to
decide which transducer to apply to a given file. The granularity of
associative access is an entity; an entity can be a directory, a file,
or an object within a file. A transducer functions as a filter that
processes a file and outputs the file's entities and their
corresponding attributes. New transducers can be used to extend or
modify the semantics of the file system. The associative access
interface to a semantic file system is based on queries. A query
describes the desired attributes of entities, and the result of a
query is the set of files/directories that contain the entities it
describes. To process queries, a semantic file system provides virtual
directories at each level of the directory tree. Virtual directories
are indistinguishable from normal directories. The authors built a
semantic file system that implements the NFS protocol as its external
interface, and they present a series of results showing that semantic
file systems are a more effective storage abstraction than traditional
hierarchical file systems.
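
A rough Python sketch of file-type-specific transducers chosen through
a table. The extension-keyed table, the regular expressions, and the
function names are assumptions for illustration; the paper's
transducers may work differently. The C transducer emits "imports"
attributes from #include lines, and the TeX transducer emits "author"
attributes from \author{...} commands.

```python
import os
import re
from typing import Callable, Dict, List, Tuple

Attr = Tuple[str, str]  # (field, value) pair
Transducer = Callable[[str], List[Attr]]

# Hypothetical patterns; a real transducer would parse more carefully.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
AUTHOR_RE = re.compile(r"\\author\{([^}]*)\}")


def c_transducer(text: str) -> List[Attr]:
    """One ('imports', header) attribute per #include line."""
    return [("imports", h) for h in INCLUDE_RE.findall(text)]


def tex_transducer(text: str) -> List[Attr]:
    """One ('author', name) attribute per \\author{...} command."""
    return [("author", a.strip()) for a in AUTHOR_RE.findall(text)]


# Hypothetical transducer table keyed by file extension.
TRANSDUCERS: Dict[str, Transducer] = {
    ".c": c_transducer,
    ".h": c_transducer,
    ".tex": tex_transducer,
}


def transduce(filename: str, text: str) -> List[Attr]:
    """Pick a transducer by file type and return the file's attributes."""
    ext = os.path.splitext(filename)[1]
    attrs: List[Attr] = [("ext", ext.lstrip("."))] if ext else []
    transducer = TRANSDUCERS.get(ext)
    if transducer is not None:  # unknown types get only the generic attribute
        attrs += transducer(text)
    return attrs


print(transduce("fft.c", '#include <stdio.h>\n#include "fft.h"\n'))
# [('ext', 'c'), ('imports', 'stdio.h'), ('imports', 'fft.h')]
```

Adding a new file type means registering one more entry in the table,
which matches the review's point that new transducers extend the
semantics of the file system.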

Pros: Semantic file systems promote associative access based on the
content of files, and they can be integrated easily into traditional
file systems using virtual directories.

Cons: The design does not work well in dynamic and mobile
environments. The benefits of virtual directories for application
programmers are not very apparent and need to be investigated. The
results are neither complete nor exhaustive.

Comments:

As for the effectiveness of semantic file systems for users of
mobile devices in the scenarios considered last week, it does not
seem to work well. It is not clear how it would work in a dynamic and
mobile environment. It is also not easy to express queries like
"nearest printer" or "printer nearest to Alice" in this framework.


From vivi@CS.Cornell.EDU Tue Oct  8 14:52:48 2002
Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g98Iqmh14386
	for <egs@popsrv.cs.cornell.edu>; Tue, 8 Oct 2002 14:52:48 -0400 (EDT)
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Subject: 615paper31
X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
Date: Tue, 8 Oct 2002 14:52:48 -0400
Message-ID: <47BCBC2A65D1D5478176F5615EA7976D11AF88@opus.cs.cornell.edu>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: 615paper31
Thread-Index: AcJu+9fmNr63UPzcR2udpXGLptRHiA==
From: "Vivek Vishnumurthy" <vivi@CS.Cornell.EDU>
To: "Gun Sirer" <egs@CS.Cornell.EDU>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id g98Iqmh14386

   Semantic File Systems provide the user with a view conforming to the semantics of the files. The system
builds an associative access interface on top of the existing file system, in which "transducers" extract attributes
from the files. This allows users to look for files with certain properties, such as TeX files authored by a certain
person, or object files importing certain library files.

   Semantic File Systems are applicable to the scenarios considered last week. We could list devices as files
in the file system and have transducers extract the physical location of each device. A mobile user aware of his/her
location can then issue semantic requests to look for required devices in the vicinity. Thus the "nearest printer"
request can be handled. The accuracy of the results depends on the frequency of updates in the system (the more
frequent the updates, the more accurate the results).
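
   A minimal sketch of the "nearest printer" request, assuming each device file carries a "location" attribute as
planar (x, y) coordinates and an "updated" timestamp (both hypothetical; the paper defines no such attributes). The
request reduces to filtering by type and minimizing distance; the optional max_age cutoff illustrates the accuracy
trade-off by skipping stale entries rather than trusting them.

```python
import math
from typing import Any, Dict, Optional, Tuple

Point = Tuple[float, float]


def nearest(devices: Dict[str, Dict[str, Any]], kind: str, here: Point,
            now: float, max_age: float = math.inf) -> Optional[str]:
    """Closest device of type `kind` to `here`, skipping entries whose
    hypothetical 'updated' timestamp is older than `max_age`."""
    best, best_d = None, math.inf
    for name, attrs in devices.items():
        if attrs.get("type") != kind or now - attrs["updated"] > max_age:
            continue
        x, y = attrs["location"]  # assumed planar coordinates
        d = math.hypot(x - here[0], y - here[1])
        if d < best_d:
            best, best_d = name, d
    return best


# Hypothetical device attributes, as transducers might have recorded them.
devices = {
    "lw1": {"type": "printer", "location": (0.0, 10.0), "updated": 100.0},
    "lw2": {"type": "printer", "location": (3.0, 4.0), "updated": 40.0},
    "scan": {"type": "scanner", "location": (0.0, 1.0), "updated": 100.0},
}
print(nearest(devices, "printer", (0.0, 0.0), now=100.0))  # lw2
print(nearest(devices, "printer", (0.0, 0.0), now=100.0, max_age=30.0))  # lw1
```

   With no age limit the closer but older entry (lw2) wins; with a tight limit the answer falls back to a printer
whose location is known to be fresh.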

   The SFS server uses caching to respond to virtual directory queries. This is effective when the file system is
rarely modified and when the same virtual directory queries occur frequently.
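
   A minimal sketch of that trade-off: results are memoized per query and, in this simplified version, the whole
cache is discarded on any modification. The class name and the clear-everything invalidation policy are assumptions,
not the SFS server's actual strategy.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Query = FrozenSet[Tuple[str, str]]  # set of required (field, value) pairs


class VirtualDirCache:
    """Hypothetical memo table for virtual-directory query results."""

    def __init__(self, evaluate: Callable[[Query], List[str]]) -> None:
        self._evaluate = evaluate
        self._cache: Dict[Query, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def query(self, terms: Query) -> List[str]:
        if terms in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[terms] = self._evaluate(terms)
        return self._cache[terms]

    def file_modified(self, name: str) -> None:
        # Coarse policy (an assumption): a changed file may gain or lose
        # attributes, so any cached result could now be wrong.
        self._cache.clear()


files = {"a.c": {("ext", "c")}, "b.tex": {("ext", "tex")}}
cache = VirtualDirCache(lambda q: sorted(f for f, a in files.items() if q <= a))
q = frozenset({("ext", "c")})
cache.query(q)
cache.query(q)               # served from the cache
files["d.c"] = {("ext", "c")}
cache.file_modified("d.c")   # invalidates everything
print(cache.query(q), cache.hits, cache.misses)  # ['a.c', 'd.c'] 1 2
```

   Repeated identical queries hit the cache, while every modification forces re-evaluation, so the cache pays off
only when modifications are rare relative to repeated queries.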