Marcos Vaz Salles

Postdoctoral Associate, Department of Computer Science
4105A Upson Hall
Cornell University
Ithaca, NY 14853
vmarcos [at] cs [dot] cornell [dot] edu
Voice: 607-341-7519

Check out the Big Red Data Blog.



My research targets building scalable data-driven systems. I am particularly fond of bringing data management techniques to new domains, such as behavioral simulations, computer games, and personal information management. In terms of approach, I am a systems-oriented, experimental computer scientist. In other words, I love to build novel systems and use experiments to validate their properties.

In August 2011, I will join the Department of Computer Science at the University of Copenhagen (DIKU) as an assistant professor.


 

Projects

 

• Computer Games and Behavioral Simulations: In this project at the Cornell Database Group, we are developing a new scripting platform for games and agent-based simulations. My recent work in this project has focused on efficient checkpoint-recovery techniques for Massively Multiplayer Online Games (MMOs) and automatic parallelization techniques for large-scale behavioral simulations. In the first line of work, we have performed an experimental evaluation of several main-memory checkpoint-recovery techniques, with an eye to the latency spikes that they introduce in game play. No single method wins across all update rates, and we are investigating new techniques to address this problem. In the second line of work, we are building a new scripting engine for behavioral simulations called BRACE, the Big Red Agent-based Computation Engine. BRACE combines ease of programming through a simple scripting language with scalability through database techniques such as data parallelism and indexing.

• Dataspaces and Personal Information Management: During my PhD at the ETH Zurich Systems Group, I worked on the iMeMex Dataspace Management System, a hybrid information integration architecture that allows users to transition from search to data integration in a pay-as-you-go fashion. Unlike a traditional relational DBMS, iMeMex does not take full control of the data, but offers services over one's complex personal dataspace. We explored several interesting themes in the design of iMeMex, such as the definition of a unified data model for personal information, a novel technique based on mapping hints (called trails) to increase the level of integration of personal information over time, and search over graphs of user data created by view definitions.

• Indexing: I have also looked at more traditional problems in data management, in particular the study of index structures for either read-intensive or write-intensive workloads. For the first class of workloads, I have studied experimentally, together with collaborators from Saarland University and ETH Zurich, the performance of one specific index structure, the Dwarf index. For the second class of workloads, I have studied how to answer queries over collections of moving objects, e.g., for vehicle tracking or spatial agent-based simulations. The problem is challenging because these applications have very high update rates that result from continuous movement. Our technique, MOVIES, is based on frequently rebuilding index snapshots in main memory. Using data partitioning over multiple nodes in a small cluster, we have scaled MOVIES up to 100 million moving objects over the road network of Germany, while keeping snapshot latencies below a few seconds.
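
As a rough illustration of the snapshot idea mentioned for MOVIES above, the following Python sketch buffers position updates cheaply and answers queries from a read-only index that is thrown away and rebuilt from scratch. It is not the MOVIES implementation: the class names (SnapshotIndex, ThrowawayIndexer), the one-dimensional positions, and the sorted-array index are all assumptions made to keep the example small.

```python
from bisect import bisect_left, bisect_right


class SnapshotIndex:
    """Read-only index built in one pass from a dict of object_id -> position."""

    def __init__(self, positions):
        # Sort once at build time; the snapshot is never updated afterwards.
        pairs = sorted((pos, oid) for oid, pos in positions.items())
        self._keys = [pos for pos, _ in pairs]
        self._ids = [oid for _, oid in pairs]

    def range_query(self, lo, hi):
        # Return ids of objects whose position lies in [lo, hi].
        start = bisect_left(self._keys, lo)
        end = bisect_right(self._keys, hi)
        return self._ids[start:end]


class ThrowawayIndexer:
    """Buffers updates and serves queries from the last complete snapshot."""

    def __init__(self):
        self._latest = {}                    # most recent position per object
        self._snapshot = SnapshotIndex({})   # starts empty

    def update(self, oid, pos):
        # Constant-time update: no index maintenance happens here.
        self._latest[oid] = pos

    def rebuild(self):
        # Discard the old snapshot and build a fresh one from current positions.
        self._snapshot = SnapshotIndex(self._latest)

    def range_query(self, lo, hi):
        # Queries see data as of the last rebuild, so results may be slightly stale.
        return self._snapshot.range_query(lo, hi)
```

In this sketch, updates cost a single dictionary write, and the price is that queries lag behind by at most one rebuild interval. How often to rebuild, and how to partition the data across nodes, are left out entirely.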

 

 

Selected Publications

 

• Tuan Cao, Marcos Vaz Salles, Benjamin Sowell, Yao Yue, Alan Demers, Johannes Gehrke, Walker White.

Fast Checkpoint Recovery Algorithms for Frequently Consistent Applications.

SIGMOD 2011, Athens, Greece.

At the conference, we will also present the following demo on our recovery library.

• Tuan Cao, Benjamin Sowell, Marcos Vaz Salles, Alan Demers, Johannes Gehrke.

BRRL: A Recovery Library for Main-Memory Applications in the Cloud (Demo Paper).

SIGMOD 2011, Athens, Greece.

• Jens Dittrich, Lukas Blunschi, Marcos Vaz Salles.

MOVIES: Indexing Moving Objects by Shooting Index Images.

GeoInformatica 2011, to appear. This paper is an extended version of the SSTD 2009 conference paper.

• Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke, Walker White.

Behavioral Simulations in MapReduce.

VLDB 2010, Singapore.

• Marcos Antonio Vaz Salles, Jens Dittrich, Lukas Blunschi.

Intensional Associations in Dataspaces. [Full Version].

ICDE 2010, Long Beach, USA.

• Marcos Vaz Salles, Tuan Cao, Benjamin Sowell, Alan Demers, Johannes Gehrke, Christoph Koch, Walker White.

An Evaluation of Checkpoint Recovery for Massively Multiplayer Online Games.

VLDB 2009, Lyon, France.

• Jens Dittrich, Marcos Antonio Vaz Salles, Lukas Blunschi.

iMeMex: From Search to Information Integration and Back.

IEEE Data Engineering Bulletin 2009, Vol. 32 No. 2 (invited paper).

• Jens Dittrich, Lukas Blunschi, Marcos Antonio Vaz Salles.

Indexing Moving Objects using Short-Lived Throwaway Indexes.

SSTD 2009, Aalborg, Denmark.

• Jens Dittrich, Lukas Blunschi, Marcos Antonio Vaz Salles.

Dwarfs in the Rearview Mirror: How Big are they really?

VLDB 2008, Auckland, New Zealand.

• Marcos Antonio Vaz Salles, Jens-Peter Dittrich, Shant Kirakos Karakashian, Olivier René Girard, Lukas Blunschi.

iTrails: Pay-as-you-go Information Integration in Dataspaces. [Slides][Video]

VLDB 2007, Vienna, Austria.

• Lukas Blunschi, Jens-Peter Dittrich, Olivier René Girard, Shant Kirakos Karakashian, Marcos Antonio Vaz Salles.

A Dataspace Odyssey: The iMeMex Personal Dataspace Management System (Demo Paper).

CIDR 2007, Asilomar, USA.

• Jens-Peter Dittrich, Marcos Antonio Vaz Salles.

iDM: A Unified and Versatile Data Model for Personal Dataspace Management.

VLDB 2006, Seoul, South Korea.

• Jens-Peter Dittrich, Marcos Antonio Vaz Salles, Donald Kossmann, Lukas Blunschi.

iMeMex: Escapes from the Personal Information Jungle (Demo Paper). [Poster]

VLDB 2005, Trondheim, Norway.

 

 

Teaching and Mentoring

 

• Last fall, I taught Introduction to Database Systems (CS4320/1) at Cornell University. Previously, I was a teaching assistant for the database implementation and data warehousing courses at ETH Zurich. I also taught extension courses in database tuning while back home in Brazil.

• I am helping Johannes Gehrke advise a group of talented PhD students at Cornell working on data management for games and simulations. While at ETH Zurich, I co-advised seven master's theses and ten semester projects.

• An acknowledgement is due here to my mentors: Johannes Gehrke at Cornell University (postdoc), Jens Dittrich (now at Saarland University) and Donald Kossmann at ETH Zurich (PhD), Sérgio Lifschitz at PUC-Rio (MSc), and Claudia Bauzer Medeiros at UNICAMP (BSc).

 

 

Additional Information

 

[DBLP | CV available upon request]