Cornell Computer Science

CS 779 - Seminar of Web Searching and Mining

Fall 2005

This seminar will provide a venue for interaction among different people at Cornell working on Web search and mining and facilitate the exchange of programs, algorithms, and methods developed in different projects.  It will focus on special topics and discuss selected papers from recent Web-related conferences such as the World Wide Web Conference, Hypertext Conference, and the Semantic Web Conference, as well as results from ongoing web-related research projects at Cornell.  Hopefully the seminar will encourage new research at Cornell in this exciting area.

Schedule: Wednesday 3-4

Location: 5126 Upson (NOTE: On September 14 we will meet in 5130 Upson)

Format: Throughout the semester we'll choose a number of special topics depending on the interest of the group.  We'll devote each meeting to a discussion of one or two papers relevant to the topic. Participants will share responsibilities for presenting papers and leading discussion about them.

Credits: 3

Grading: S/U

Organizers: Pavel Dmitriev and Carl Lagoze

Special Topic I : The Semantic Web: In his 1998 Semantic Web Roadmap, Tim Berners-Lee (inventor of the Web) introduced the semantic Web as follows:

"The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the web. Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine processable form."

We'll spend some time examining whether the semantic web is actually necessary, useful, practical, sound, etc.  There is a lot of work on the semantic web in a variety of areas, including databases, logics, and languages, so there are plenty of papers available.

Date Readings Comments
09/07
  • Heflin, J.D. Towards the Semantic Web: Knowledge Representation in a Dynamic, Distributed Environment Department of Computer Science, University of Maryland, College Park, MD, 2001. (Chapters 1,2) http://www.cse.lehigh.edu/~heflin/pubs/heflin-thesis-orig.pdf
  • Hendler, J. Agents and the Semantic Web, IEEE Intelligent Systems, March/April 2001 (find using Google Scholar)
There are a lot of awful introductions to the semantic web.  These two publications have a reasonable amount of integrity and avoid a good bit of the hype.  I (Carl) will give a semi-lecture on what the semantic web is and on some of the research areas within it.
09/14 This is a nice paper detailing the formal foundations and design decisions underlying OWL, the ontology language for the semantic web.  Reading and discussing it should help us understand distinctions between the semantic web and other knowledge representation research.
09/21
  • Reeve, Lawrence and Han, Hyoil, Survey of Semantic Annotation Platforms, SAC'05, Santa Fe, http://www.pages.drexel.edu/~lhr24/pubs/2005SAC-WTA-548.pdf
  • Dill, Stephen, Eiron, Nadav, et al., SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation, WWW2003, Budapest, http://www.almaden.ibm.com/WebFountain/resources/semtag.pdf.
Both of these papers deal with the general theme of populating the semantic web with minimal human annotation.  The first is a recent survey paper.  The second presents research work in automated semantic annotation.
09/28
  • Gruber, T.R. Toward Principles for the Design of Ontologies Used for Knowledge Sharing. International Journal Human-Computer Studies, 43. 907-928 (available from lots of web sources)
Ontologies play a major role in the semantic web.  This is a pre-semantic-web paper that provides an excellent foundation on ontology design and why it is important.
10/05
  • Broekstra, J. Storage, Querying and Inferencing for Semantic Web Languages, Ph.D. Thesis, Vrije Universiteit, 2005 (available on Google Scholar), Chapter 3, Query Languages for the Semantic Web
This chapter in Broekstra's thesis has a nice classification of requirements for storing and querying RDF-based graphs.
10/12
  • Martin, D., Paolucci, M., McIlraith, S., Burstein, M., McDermott, D., McGuinness, D., Parsia, B., Payne, T., Sabou, M., Solanki, M., Srinivasan, N. and Sycara, K., Bringing Semantics to Web Services: The OWL-S Approach. in Semantic Web Services and Web Process Composition, First International Workshop, SWSWPC 2004, (San Diego, 2004) (available on Google Scholar)
There is considerable interest in web services in commercial, scientific, and other areas.  Semantic web technology provides some promise of automatic matchmaking of services to needs.  This paper describes OWL-S, one of the more promising activities in this area.
10/19
  • Aya, S., Lagoze, C. and Joachims, T., Citation Classification and its Applications. in International Conference on Knowledge Management, (Charlotte, 2005). (distributed by email)
Selcuk will give a practice talk in preparation for his conference talk at ICKM 2005.
10/26
  • Golbeck, J. and Hendler, J., Accuracy of Metrics for Inferring Trust and Reputation. in 14th International Conference on Knowledge Engineering and Knowledge Management, (Northamptonshire, UK, 2004). (available on Google Scholar)
Recent research by one of the leading researchers in trust, reputation, and the semantic web (and social networks in general).  This is a shorter version of a journal article at http://trust.mindswap.org/papers/toit.pdf
11/02
  • Kamvar, S.D., Schlosser, M.T. and Garcia-Molina, H., The EigenTrust Algorithm for Reputation Management in P2P Networks. in Twelfth International World Wide Web Conference (WWW), (2003).
We'll stay in the realm of trust.  This is a well-known paper that extends PageRank techniques into the area of global trust.
11/09
  • Drost, I. and Scheffer, T., Thwarting the Nigritude Ultramarine: Learning to Identify Link Spam. in 16th European Conference on Machine Learning, (Porto, 2005).
Any discussion about trust needs to include the issue of intentional deception.  One area where this is prevalent is "link spamming", or trying to influence the ranking of search engines.  This is a recent paper in this area.
11/16
  • Dmitriev, P. and Lagoze, C., Automatically Constructing Descriptive Site Maps. in Eighth Asia Pacific Web Conference, (Harbin, China, 2006).
Pavel will give a talk on a paper that he will present at a conference in January.
11/23 No Seminar - Thanksgiving Break  

 

 Last Update: Carl Lagoze 11/07/05