Database Colloquium

The database colloquium is the weekly meeting of students and faculty interested in data management, data mining, or related topics at Cornell. The colloquium typically features a presentation of a seminal or recent paper of general interest. While many of the speakers are from the Cornell community, the colloquium also invites outside speakers to talk about their research. The colloquium is held every Monday from 12:15 to 1 pm in 5130 Upson Hall.

On days when the database colloquium does not have an outside speaker, the colloquium is replaced by a more informal database lunch. This is a short lunch starting at noon, followed by an informal paper discussion on a recent topic of interest.





February 25

Laasie: Collaborating through changes

Abstract: Over the past several years, there has been a trend towards fully-featured web applications, such as Office 365, Google Docs, Dropbox, and MyPlex. The state of these applications is often persisted in the cloud, typically in real time as users make changes. This in turn enables "collaborative applications", where application state is not only replicated from the client to the server (for persistence) but also from the server to other clients who are simultaneously operating on the same state.

Unfortunately, even providing just replication for persistence is a nontrivial task. In this talk, I introduce Laasie, a data management system designed to support collaborative web applications. At the heart of Laasie is BarQL, a simple functional state manipulation language that is used to express both queries and updates to replicated state. Updates are maintained in situ at the server in an "intent log", which allows clients to recover from transient disconnections by replaying applicable state updates. Laasie employs an algebra of composition to keep the intent log simple and small by performing small (efficient) local rewrites over it rather than (potentially expensive) evaluation of updates as they arrive.

Speaker Bio: Oliver Kennedy is an Assistant Professor in the Department of Computer Science and Engineering at the University at Buffalo, where he just started, after a PhD at Cornell and a postdoc at EPFL, Switzerland. Oliver's research spans the gamut of database technologies and techniques, and includes incremental computation, probabilistic databases and uncertainty, distributed systems, and query optimization. He has worked on the MayBMS probabilistic database system, and the DBToaster agile view management system. Recently, he has also been exploring data management challenges behind collaborative (web) applications with his work on the Laasie state replication system.

Oliver Kennedy, Upson 5130

April 1st

Adventures in Enterprise Software Startups, and Lessons Learned

Abstract: This talk covers two intertwined topics: an (objective) overview of the mechanics that make an enterprise software company tick, and a (subjective) account of the speaker's experiences and lessons learned over the past five years of working at two database startups: Vertica (a columnar analytic database company acquired by HP in 2011 for $350M), and Hadapt (a 2-year-old startup riding the strong momentum of Hadoop).

Speaker Bio: Mingsheng is Hadapt’s Chief Data Scientist and is responsible for driving Hadapt’s product roadmap and incubating advanced analytic use cases. Prior to Hadapt, Mingsheng was an architect and engineer at Vertica, an HP Company, where he was instrumental in the development of Vertica’s Analytics Platform (specifically, auto-tuning, and time series and pattern matching in-database analytics).

Mingsheng earned a Ph.D. in Computer Science from Cornell University, where he built Cayuga, the world’s first expressive and scalable Complex Event Processing (CEP) engine. He also co-founded the Microsoft CEDR event processing project, which became the Microsoft StreamInsight technology shipped with SQL Server 2008 and 2012. Mingsheng is a frequent speaker on Big Data, regularly participating in panels and delivering keynote addresses at industry and academic events such as Hadoop World, TDWI, the Cube, and Harvard Business School. He also serves as the President of NECINA, a non-profit, non-political organization focused on promoting cutting-edge technologies, entrepreneurship, and leadership.

Mingsheng Hong, Upson 5130

April 29th

Big Learning Systems

Abstract: A new wave of systems is emerging in the space of Big Data Analytics, opening the door to programming models beyond Hadoop MapReduce (HMR). It is well understood that HMR is not ideal for applications in the domain of machine learning and graph processing. This realization is fueling a number of new (Big Data) system efforts: Berkeley Spark, Google Pregel, GraphLab (CMU), and Hyracks (UC Irvine), to name a few. Each of these adds unique capabilities, but together they form islands around key functionalities: fault-tolerance, resource allocation, and data caching.

In this talk, I will provide an overview of Big Data Systems starting with Google's MapReduce, which defined the foundational architecture for processing large data sets. I will then identify a key limitation in this architecture; namely, its inability to efficiently support iterative workflows. I will then describe real-world examples of systems that aim to fill this computational void and argue that all these designs are flawed in some regard. I will conclude with a description of my own work on building a Big Data Application Server that unifies the key runtime functionalities (fault-tolerance, resource allocation, data caching, and more) for workflows (both iterative and acyclic) that process large data sets.

Tyson Condie, Upson 5130

Prior semesters: