DBPL 2011
The 13th International Symposium
on Database Programming Languages

August 29th, 2011
Seattle, Washington, USA
Co-located with VLDB 2011

Invited Speakers

Philip Wadler

Philip Wadler (Edinburgh)

Databases and Programming Languages:
Together again for the first time


Abstract: A venerable line of research aims to provide a general-purpose programming language with a well-defined subset that compiles into efficient queries, perhaps by translation into SQL or some other suitable query language. This talk discusses some older and more recent advances in this direction, including the languages Kleisli, LINQ, Ferry, and Links. We stress the point, not widely appreciated, that a source language with higher order functions can support dynamically generated queries in a natural way, even if the target query language is first order. Joint work with Sam Lindley.
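The key point about higher-order functions can be illustrated with a toy sketch (this is not the actual Kleisli/LINQ/Ferry/Links machinery; all names below are invented for illustration). Predicates are ordinary host-language functions, and combinators such as `any_of` are higher order, yet every function is fully applied before the query text is emitted, so nothing higher order ever reaches the first-order target language:

```python
# Toy sketch: dynamically generated first-order queries from a
# higher-order host language. Predicates map a dict of column
# references to SQL text; combinators compose predicates.

def eq(col, val):
    return lambda cols: f"{cols[col]} = {val!r}"

def gt(col, val):
    return lambda cols: f"{cols[col]} > {val!r}"

def any_of(*preds):   # higher-order: takes predicates, returns a predicate
    return lambda cols: "(" + " OR ".join(p(cols) for p in preds) + ")"

def all_of(*preds):
    return lambda cols: "(" + " AND ".join(p(cols) for p in preds) + ")"

def to_sql(table, pred):
    # Column references are resolved here; the result is plain SQL,
    # with no trace of the functions used to build it.
    cols = {name: f"{table}.{name}" for name in ("name", "age", "dept")}
    return f"SELECT * FROM {table} WHERE {pred(cols)}"

# The predicate is assembled at run time, e.g. from user input:
wanted_depts = ["sales", "hr"]
pred = all_of(gt("age", 30), any_of(*[eq("dept", d) for d in wanted_depts]))
print(to_sql("employees", pred))
```

Because the combinators run entirely at query-generation time, the shape of the final WHERE clause can depend on run-time data (here, the list of departments) while the emitted SQL stays first order.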

Bio: Philip Wadler is Professor of Theoretical Computer Science at the University of Edinburgh and Director of the Laboratory for Foundations of Computer Science. He is an ACM Fellow and a Fellow of the Royal Society of Edinburgh, past holder of a Royal Society-Wolfson Research Merit Fellowship, and currently serves as Chair of ACM SIGPLAN. Previously, he worked or studied at Avaya Labs, Bell Labs, Glasgow, Chalmers, Oxford, CMU, Xerox PARC, and Stanford, and visited as a guest professor in Paris, Sydney, and Copenhagen. He appears at position 201 on CiteSeer's list of most-cited authors in computer science and is a winner of the POPL Most Influential Paper Award. He contributed to the designs of Haskell, Java, and XQuery, and is a co-author of XQuery from the Experts (Addison-Wesley, 2004) and Java Generics and Collections (O'Reilly, 2006). He has delivered invited talks in locations ranging from Aizu to Zurich.

Christopher Olston

Christopher Olston (Bionica Human Computing)

Programming and Debugging Large-Scale Data Processing Workflows

Abstract: This talk gives an overview of the work on large-scale data processing I did recently at Yahoo! Research, with many collaborators. The talk begins with overviews of two data processing systems I helped develop: PIG, a dataflow programming environment and Hadoop-based runtime, and NOVA, a workflow manager for Pig/Hadoop. The bulk of the talk focuses on debugging, and looks at what can be done before, during and after execution of a data processing operation:

  • Pig's automatic EXAMPLE DATA GENERATOR is used before running a Pig job to get a feel for what it will do, enabling certain kinds of mistakes to be caught early and cheaply. The algorithm behind the example generator performs a combination of sampling and synthesis to balance several key factors---realism, conciseness and completeness---of the example data it produces.
  • INSPECTOR GADGET is a framework for creating custom tools that monitor Pig job execution. We implemented a dozen user-requested tools, ranging from data integrity checks to crash cause investigation to performance profiling, each in just a few hundred lines of code.
  • IBIS is a system that collects metadata about what happened during data processing, for post-hoc analysis. The metadata is collected from multiple sub-systems (e.g. Nova, Pig, Hadoop) that deal with data and processing elements at different granularities (e.g. tables vs. records; relational operators vs. reduce task attempts) and offer disparate ways of querying it. IBIS integrates this metadata and presents a uniform and powerful query interface to users.
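The sampling-plus-synthesis idea behind the example generator can be sketched in miniature (this is a toy illustration, not Pig's actual algorithm; the function names and the single-filter script are invented for the example). Start from a small sample of real input, which keeps the examples realistic and concise; then, if the sample never exercises some branch of the script (here, one filter), synthesize a record that does, for completeness:

```python
import random

# Toy sketch of sampling + synthesis for example data generation.
# rows: real input records; passes_filter: the script's filter;
# synthesize_passing: fabricates a record satisfying the filter.

def example_data(rows, passes_filter, synthesize_passing, k=3, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(rows, min(k, len(rows)))   # realism + conciseness
    if not any(passes_filter(r) for r in sample):
        # No sampled record exercises the filter's "true" branch,
        # so add a synthetic one for completeness.
        sample.append(synthesize_passing())
    return sample

rows = [{"user": f"u{i}", "clicks": i} for i in range(100)]
examples = example_data(
    rows,
    passes_filter=lambda r: r["clicks"] > 95,   # rarely true in a small sample
    synthesize_passing=lambda: {"user": "synthetic", "clicks": 96},
)
```

The returned example set is tiny, mostly drawn from real data, and guaranteed to show the user at least one record flowing through each side of the filter, which is exactly the realism/conciseness/completeness trade-off the abstract describes.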

Bio: Christopher Olston is a web data researcher/entrepreneur, and co-founder of Bionica Human Computing. His past affiliations include Yahoo! Research (principal research scientist) and Carnegie Mellon (assistant professor). He holds computer science degrees from Stanford (2003 Ph.D., M.S.; funded by NSF and SGF fellowships) and UC Berkeley (B.S. with highest honors). While at Yahoo!, Olston won the 2009 SIGMOD Best Paper Award and co-created Apache Pig, which is used for data processing inside LinkedIn, Twitter, Yahoo!, and others, comes in Cloudera's standard Hadoop bundle, and is offered by Amazon as a cloud service.
