Philip Wadler (Edinburgh)
Databases and Programming Languages: Together again for the first time
Abstract: A venerable line of research aims to provide a general-purpose
programming language with a well-defined subset that compiles into
efficient queries, perhaps by translation into SQL or some other
suitable query language. This talk discusses some older and more
recent advances in this direction, including the languages Kleisli,
LINQ, Ferry, and Links. We stress the point, not widely appreciated,
that a source language with higher-order functions can support
dynamically generated queries in a natural way, even if the target
query language is first order. Joint work with Sam Lindley.
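The point about higher-order functions and first-order targets can be sketched as follows. This is a hypothetical illustration in Python, not Links or Kleisli code: host-language predicates are composed with higher-order combinators, yet because each predicate is ultimately applied to a single row variable, the higher-order structure is eliminated before the (first-order) SQL is emitted.

```python
# Hypothetical sketch of dynamic query generation via higher-order
# functions. Predicates are functions from a row-variable name to an
# SQL condition string; combinators like conj() compose them.

def younger_than(limit):
    # Predicate parameterized at run time by `limit`.
    return lambda row: f"{row}.age < {limit}"

def in_dept(dept):
    return lambda row: f"{row}.dept = '{dept}'"

def conj(p, q):
    # Higher-order combinator: conjunction of two predicates.
    return lambda row: f"({p(row)} AND {q(row)})"

def to_sql(table, pred):
    # The predicate is applied once to the row variable "t", so no
    # function values survive into the generated query text.
    return f"SELECT * FROM {table} t WHERE {pred('t')}"

query = to_sql("employees", conj(younger_than(30), in_dept("R&D")))
```

Here `query` is first-order SQL even though it was assembled from run-time-chosen, composable function values, the situation the abstract describes.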
Bio: Philip Wadler is Professor of Theoretical Computer Science at the
University of Edinburgh and Director of the Laboratory for Foundations
of Computer Science. He is an ACM Fellow and a Fellow of the Royal Society
of Edinburgh, past holder of a Royal Society-Wolfson Research Merit
Fellowship, and currently serves as Chair of ACM SIGPLAN. Previously, he
worked or studied at Avaya Labs, Bell Labs, Glasgow, Chalmers, Oxford, CMU,
Xerox PARC, and Stanford, and visited as a guest professor in Paris, Sydney,
and Copenhagen. He appears at position 201 on CiteSeer's list of most-cited
authors in computer science and is a winner of the POPL Most Influential
Paper Award. He contributed to the designs of Haskell, Java, and XQuery, and
is a co-author of XQuery from the Experts (Addison Wesley, 2004) and Generics
and Collections in Java (O'Reilly, 2006). He has delivered invited talks in
locations ranging from Aizu to Zurich.
Christopher Olston (Bionica Human Computing)
Programming and Debugging Large-Scale Data Processing Workflows
Abstract: This talk gives an overview of the work on large-scale data processing I did recently at Yahoo! Research, with many collaborators. The talk begins with overviews of two data processing systems I helped develop: PIG, a dataflow programming environment and Hadoop-based runtime, and NOVA, a workflow manager for Pig/Hadoop. The bulk of the talk focuses on debugging, and looks at what can be done before, during and after execution of a data processing operation:
- Pig's automatic EXAMPLE DATA GENERATOR is used before running a Pig job to
get a feel for what it will do, enabling certain kinds of mistakes to
be caught early and cheaply. The algorithm behind the example
generator performs a combination of sampling and synthesis to balance
several key factors---realism, conciseness and completeness---of the
example data it produces.
- INSPECTOR GADGET is a framework for creating custom tools that
monitor Pig job execution. We implemented a dozen user-requested
tools, ranging from data integrity checks to crash cause investigation
to performance profiling, each in just a few hundred lines of
code.
- IBIS is a system that collects metadata about what happened during
data processing, for post-hoc analysis. The metadata is collected from
multiple sub-systems (e.g. Nova, Pig, Hadoop) that deal with data and
processing elements at different granularities (e.g. tables
vs. records; relational operators vs. reduce task attempts) and offer
disparate ways of querying it. IBIS integrates this metadata and
presents a uniform and powerful query interface to users.
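The sampling-plus-synthesis idea behind the example generator can be sketched as follows. This is a hypothetical illustration, not Pig's actual algorithm: real rows are sampled for realism and conciseness, and a record is synthesized only when no sampled row would survive a downstream operator, so every operator is still exercised (completeness).

```python
# Hypothetical sketch of example-data generation by sampling + synthesis.
# `passes_filter` stands in for a downstream operator's condition;
# `synthesize_passing` fabricates one record that satisfies it.
import random

def example_data(real_rows, passes_filter, synthesize_passing, k=3):
    # Realism + conciseness: take a small sample of real input rows.
    sample = random.sample(real_rows, min(k, len(real_rows)))
    # Completeness: if no sampled row survives the filter, synthesize
    # one so the operators after the filter still receive an example.
    if not any(passes_filter(r) for r in sample):
        sample.append(synthesize_passing())
    return sample

rows = [{"age": a} for a in (12, 15, 17)]
examples = example_data(rows,
                        lambda r: r["age"] >= 18,
                        lambda: {"age": 21, "synthetic": True})
```

In this run every real row fails the `age >= 18` filter, so one synthetic passing record is appended, trading a little realism for completeness, the balance the abstract describes.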
Bio: Christopher Olston is a web data researcher/entrepreneur, and
co-founder of Bionica Human Computing. His past affiliations include
Yahoo! Research (principal research scientist) and Carnegie Mellon
(assistant professor). He holds computer science degrees from Stanford
(2003 Ph.D., M.S.; funded by NSF and SGF fellowships) and UC Berkeley
(B.S. with highest honors). While at Yahoo, Olston won the 2009 SIGMOD
Best Paper Award and co-created Apache Pig, which is used for data
processing at LinkedIn, Twitter, Yahoo, and others, is included in
Cloudera's standard Hadoop bundle, and is offered by Amazon as a cloud service.