The database colloquium is the weekly meeting of students and faculty interested in data management, data mining, or related topics at Cornell. The colloquium typically features presentations of seminal or recent papers of general interest. While many of the speakers are from the Cornell community, the colloquium also invites outside speakers to talk about their research. The colloquium is held every Monday from 12:15 to 1 pm in 5130 Upson Hall.
On days when the database colloquium does not have an outside speaker, the colloquium is replaced by a more informal database lunch. This is a short lunch starting at noon, followed by an informal paper discussion on a recent topic of interest.
Higher-order language-integrated query
We present CoreLinks, a call-by-value variant of System F with row polymorphism, row-based effect types, and implicit subkinding, which forms the basis for the Links web programming language. We focus on extensions to CoreLinks for database programming. The effect types support abstraction over database queries, while ensuring that queries are translated predictably to idiomatic and efficient SQL at run-time. Subkinding statically enforces the constraint that queries must return a list of records of base type. Polymorphism over the presence of record labels supports abstraction over database queries, inserts, deletes and updates.
|Sam Lindley||5130 Upson|
SVM Indexing and Processing for Data Retrieval
In this talk, I will first present recent work on SVM indexing and processing, which was published at SIGMOD 2011. I will then introduce several ongoing research projects of our data mining group at POSTECH, such as novel recommendation for digital TV, protecting location privacy for mobile devices, and online advertising for sponsored search. These projects are supported by Samsung Electronics and Microsoft Research Asia.
In applications such as relevance-feedback search systems, a query is a ranking function F learned by a machine learning method such as SVM, and the query result is the set of items ranked highest according to F. Processing the query F to find the top-k items normally requires evaluating F on the entire data set. We developed an indexing method for SVM ranking-function queries that finds the top-k items quickly without evaluating the entire data set. Our indexing method, iKernel, achieves an overall evaluation ratio of 1-5% on large data sets. iKernel is currently the only indexing solution that finds the exact top-k items for SVM functions. This work passed the repeatability test of SIGMOD 2011.
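For reference, the exhaustive baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not iKernel itself: it assumes, for simplicity, that F is a linear function given by a weight vector, and the names `linear_score` and `top_k_exhaustive` and the sample data are hypothetical.

```python
import heapq


def linear_score(weights, item):
    """Score one item with an assumed linear ranking function F(x) = w . x."""
    return sum(w * x for w, x in zip(weights, item))


def top_k_exhaustive(weights, items, k):
    """Return the k highest (score, index) pairs, best first.

    Every item is scored, which is the full-scan cost that an index
    is meant to avoid.
    """
    scored = ((linear_score(weights, item), idx) for idx, item in enumerate(items))
    return heapq.nlargest(k, scored)


# Hypothetical two-dimensional feature vectors and weights.
items = [(1.0, 0.0), (0.5, 0.5), (0.0, 2.0), (3.0, 1.0)]
weights = (1.0, 2.0)
top2 = top_k_exhaustive(weights, items, 2)
print(top2)
```

With n items this baseline performs n evaluations of F per query. The evaluation ratio quoted in the abstract is the fraction of those evaluations that remain when the index is used.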
Speaker bio: Hwanjo Yu received his PhD in Computer Science from the University of Illinois at Urbana-Champaign in June 2004 under the supervision of Prof. Jiawei Han. From July 2004 to January 2008, he was an assistant professor at the University of Iowa. After that, he joined POSTECH (Pohang University of Science and Technology), South Korea. He is now an associate professor and runs the data mining lab at POSTECH.
|Hwanjo Yu||5130 Upson|
Optimizing Top-k Query Processing in Web Search Engines
Large web search engines have to answer thousands of queries per second over tens of billions of documents, while satisfying interactive response times. Because of this tremendous workload, search engines have to spend significant hardware and energy resources to process user queries, and various techniques such as caching, index compression, index pruning, and early termination are used to decrease these costs.
In this talk, I will first give a brief overview of query processing in large search engines. I will then describe a set of new algorithms for top-k query processing based on so-called block-max indexes, and show that they obtain significant speedups over other methods for common classes of simple ranking functions. Finally, I will discuss open problems in this area, including some interesting questions arising from our new algorithms.
[This talk contains joint work with Shuai Ding, Costas Dimopoulos, and Sergey Nepomnyachiy]
Bio: Torsten Suel is an Associate Professor in the Department of Computer Science and Engineering at the Polytechnic Institute of New York University, located in Brooklyn, NY. He received a Diplom degree from the Technical University of Braunschweig (Germany), and a Ph.D. from the University of Texas at Austin. After postdoctoral research at the NEC Research Institute, UC Berkeley, and Bell Labs, he joined NYU Poly in the Fall of 1998. From January to December of 2008 he was also a Principal Research Scientist at Yahoo! Research in Santa Clara, CA. His research interests are in the areas of web search engines and web mining, algorithms, databases, and distributed systems.
|Torsten Suel||5130 Upson|