Department of Computer Science at Cornell University
The Salton Series is supported by Amit Singhal, Cornell PhD ’97

Many say that "The Cloud" will be the next game-changing computing platform, and the race is on to define and capture that domain. Historically, new platforms take off when independent developers start to make innovative use of platform-specific features. In the case of Cloud Computing, that means exploiting distributed systems in a datacenter. But there is as yet no widely used programming model that lets a developer easily coordinate the distributed power of a datacenter, which severely limits the population of developers who can exercise these platforms creatively.

Over the last five years, in collaboration with researchers at Intel, Yahoo! and elsewhere, our group at Berkeley has been exploring the use of data-centric programming for distributed systems like overlay networks and wireless sensornets. Most recently, we have been pursuing datacenter-style computing. As a concrete exercise on that front, we developed BOOM: an API-compliant reimplementation of Hadoop and HDFS written in the Overlog declarative language. Developed in a relatively short nine-month design cycle, our Overlog interpreter and Hadoop implementation perform as well as the standard Java-only implementation, with a compact and easily extensible codebase. Within that timeframe we extended BOOM with new features not yet available in Hadoop, including Paxos-driven high availability, parallel scale-out of master nodes, and intrinsic monitoring and debugging facilities.

This talk will give an overview of our ideas for data-centric programming in datacenters, reflect on our experience building and extending BOOM in Overlog, and discuss some of our future plans for language design and cloud system development.
The GERARD SALTON Lecture Series
Thursday
November 19, 2009
4:15 pm
B17 Upson Hall
Reception - 4th Floor Atrium at 3:45 pm

Joseph M. Hellerstein

Professor, EECS Computer Science Division

UC BERKELEY

Joseph M. Hellerstein is a Professor of Computer Science at the University of California, Berkeley, whose research focuses on data management and distributed systems. His work has been recognized with awards including an Alfred P. Sloan Research Fellowship, MIT Technology Review's inaugural TR100 list, and two ACM SIGMOD "Test of Time" awards. Key ideas from his research have been incorporated into commercial and open-source database software released by IBM, Oracle, and PostgreSQL. He has also held industrial posts including Director of Intel Research Berkeley and Chief Scientist of Cohera Corporation, and currently serves as an advisor to a number of technology companies.

The Cloud Goes BOOM:
Data-Centric Programming for Datacenters

Gerard Salton (1927-1995). A towering figure in the field of information retrieval, Gerard Salton synthesized ideas from mathematics, statistics, and natural language processing to create a scientific basis for extracting semantics from word frequency. The impact of his contributions is profound: five textbooks, over 150 research papers, and dozens of Ph.D. students. The modern computer science and information science research scene, with its terabyte databases, Web, and related technologies, owes a great deal to Gerry's pioneering efforts.

This lecture series honors our former colleague with speakers who are likewise innovators in their fields.