Postdoctoral Associate, Department of Computer Science
Check out the Big Red Data Blog.
My research focuses on building scalable data-driven systems. I am particularly fond of bringing data management techniques to new domains, such as behavioral simulations, computer games, and personal information management. In terms of approach, I am a systems-oriented, experimental computer scientist: I love to build novel systems and use experiments to validate their properties.
In August 2011, I will join the Department of Computer Science at the University of Copenhagen (DIKU) as an assistant professor.
Projects
·
Computer Games and Behavioral Simulations: In this project at the Cornell Database Group, we are developing a new scripting platform for games and agent-based simulations. My recent work in this project has focused on efficient checkpoint-recovery techniques for Massively Multiplayer Online Games (MMOs) and on automatic parallelization techniques for large-scale behavioral simulations. In the first line of work, we experimentally evaluated several main-memory checkpoint-recovery techniques, paying particular attention to the latency spikes they introduce during game play. No single method wins across all update rates, and we are investigating new techniques to address this problem. In the second line of work, we are building a new scripting engine for behavioral simulations called BRACE, the Big Red Agent-based Computation Engine. BRACE combines ease of programming, through a simple scripting language, with scalability, through database techniques such as data parallelism and indexing.
·
Dataspaces and Personal Information Management: During my PhD at the ETH Zurich Systems Group, I worked on the iMeMex Dataspace Management System, a hybrid information integration architecture that allows users to transition from search to data integration in a pay-as-you-go fashion. Unlike a traditional relational DBMS, iMeMex does not take full control of the data; instead, it offers services over a user's complex personal dataspace. We explored several themes in the design of iMeMex, such as a unified data model for personal information, a novel technique based on mapping hints (called trails) that increases the level of integration of personal information over time, and search over graphs of user data created by view definitions.
·
Indexing: I have also looked at more traditional problems in data management, in particular index structures for read-intensive and for write-intensive workloads. For read-intensive workloads, together with collaborators from Saarland University and ETH Zurich, I experimentally studied the performance of one specific index structure, the Dwarf index. For write-intensive workloads, I have studied how to answer queries over collections of moving objects, e.g., for vehicle tracking or spatial agent-based simulations. The problem is challenging because continuous movement produces very high update rates. Our technique, MOVIES, is based on frequently rebuilding index snapshots in main memory. Using data partitioning over multiple nodes in a small cluster, we have scaled MOVIES up to 100 million moving objects over the road network of Germany, while keeping snapshot latencies below a few seconds.
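The snapshot-rebuilding idea behind MOVIES can be illustrated with a small, single-node sketch. This is not the MOVIES implementation: the uniform grid, the class names `GridSnapshot` and `ShootingIndex`, and the explicit `shoot()` call are hypothetical choices made for illustration, and the multi-node data partitioning described above is omitted. In this sketch, updates only overwrite an object's latest position in a hash table; a read-only index is periodically bulk-built from scratch and the previous one is discarded; queries run against the most recent snapshot and may therefore see slightly stale positions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class GridSnapshot:
    """Hypothetical read-only uniform-grid index: bulk-built once, never updated."""

    def __init__(self, positions: Dict[int, Point], cell_size: float):
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], List[Tuple[int, Point]]] = defaultdict(list)
        for obj_id, (x, y) in positions.items():
            self.cells[self._cell(x, y)].append((obj_id, (x, y)))

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        # Map a coordinate to its grid cell (floor division handles negatives).
        return (int(x // self.cell_size), int(y // self.cell_size))

    def range_query(self, lo: Point, hi: Point) -> List[int]:
        """Return ids of objects inside the axis-aligned box [lo, hi]."""
        (cx0, cy0), (cx1, cy1) = self._cell(*lo), self._cell(*hi)
        result = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids inserting empty cells into the defaultdict.
                for obj_id, (x, y) in self.cells.get((cx, cy), ()):
                    if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
                        result.append(obj_id)
        return result


class ShootingIndex:
    """Hypothetical wrapper: cheap updates, periodic full rebuild of a snapshot."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.latest: Dict[int, Point] = {}  # latest known position per object
        self.snapshot = GridSnapshot({}, cell_size)

    def update(self, obj_id: int, pos: Point) -> None:
        # The write path touches only a hash table; no index maintenance.
        self.latest[obj_id] = pos

    def shoot(self) -> None:
        # Rebuild a fresh snapshot from scratch and throw the old one away.
        self.snapshot = GridSnapshot(dict(self.latest), self.cell_size)

    def query(self, lo: Point, hi: Point) -> List[int]:
        # Queries see the most recent snapshot, not the latest updates.
        return self.snapshot.range_query(lo, hi)


if __name__ == "__main__":
    idx = ShootingIndex(cell_size=10.0)
    idx.update(1, (5.0, 5.0))
    idx.update(2, (25.0, 5.0))
    idx.shoot()
    idx.update(1, (50.0, 50.0))  # not visible until the next shoot()
    print(sorted(idx.query((0.0, 0.0), (30.0, 10.0))))  # [1, 2]
    idx.shoot()
    print(sorted(idx.query((0.0, 0.0), (30.0, 10.0))))  # [2]
```

Rebuilding from scratch trades some query freshness for a write path that never touches the index; in this sketch, the interval between `shoot()` calls bounds how stale query results can be.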
Selected Publications
·
Tuan Cao, Marcos Vaz Salles,
Benjamin Sowell, Yao Yue, Alan Demers, Johannes Gehrke, Walker White.
Fast Checkpoint Recovery Algorithms for
Frequently Consistent Applications.
SIGMOD 2011, Athens, Greece.
At the conference, we will also
present the following demo on our recovery library.
·
Tuan Cao, Benjamin Sowell, Marcos
Vaz Salles, Alan Demers, Johannes Gehrke.
BRRL: A Recovery Library for Main-Memory
Applications in the Cloud (Demo Paper).
SIGMOD 2011, Athens, Greece.
·
Jens Dittrich, Lukas Blunschi,
Marcos Vaz Salles.
MOVIES:
Indexing Moving Objects by Shooting Index Images.
GeoInformatica 2011, to appear. This
paper is an extended version of the SSTD 2009 conference
paper.
·
Guozhang Wang, Marcos Vaz Salles,
Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke, Walker
White.
Behavioral Simulations in MapReduce.
VLDB 2010, Singapore.
·
Marcos Antonio Vaz Salles, Jens
Dittrich, Lukas Blunschi.
Intensional Associations in Dataspaces.
ICDE 2010, Long Beach, USA.
·
Marcos Vaz Salles, Tuan Cao,
Benjamin Sowell, Alan Demers, Johannes Gehrke, Christoph Koch, Walker White.
An Evaluation of Checkpoint Recovery for
Massively Multiplayer Online Games.
VLDB 2009, Lyon, France.
·
Jens Dittrich, Marcos Antonio Vaz Salles,
Lukas Blunschi.
iMeMex: From Search to Information Integration
and Back.
IEEE Data Engineering Bulletin
2009, Vol. 32 No. 2 (invited paper).
·
Jens Dittrich, Lukas Blunschi, Marcos Antonio
Vaz Salles.
Indexing Moving Objects using Short-Lived
Throwaway Indexes.
SSTD 2009, Aalborg, Denmark.
·
Jens Dittrich, Lukas Blunschi, Marcos Antonio
Vaz Salles.
Dwarfs in the Rearview Mirror: How Big are they
really?
VLDB 2008, Auckland, New Zealand.
·
Marcos Antonio Vaz Salles, Jens-Peter
Dittrich, Shant Kirakos Karakashian, Olivier René Girard, Lukas Blunschi.
iTrails: Pay-as-you-go Information Integration
in Dataspaces.
VLDB 2007, Vienna, Austria.
·
Lukas Blunschi, Jens-Peter Dittrich, Olivier
René Girard, Shant Kirakos Karakashian, Marcos Antonio Vaz Salles.
A Dataspace Odyssey: The iMeMex Personal
Dataspace Management System (Demo Paper).
CIDR 2007, Asilomar, USA.
·
Jens-Peter Dittrich, Marcos Antonio Vaz
Salles.
iDM: A Unified and Versatile Data Model for
Personal Dataspace Management.
VLDB 2006, Seoul, South Korea.
·
Jens-Peter Dittrich, Marcos Antonio Vaz
Salles, Donald Kossmann, Lukas Blunschi.
iMeMex: Escapes from the Personal Information
Jungle (Demo Paper).
VLDB 2005, Trondheim, Norway.
Teaching and Mentoring
·
Last fall, I taught Introduction to Database Systems (CS4320/1) at Cornell University. Previously, I was a teaching assistant for the database implementation and data warehousing courses at ETH Zurich. Back home in Brazil, I also taught extension courses in database tuning.
·
I am helping Johannes Gehrke advise a group of talented PhD students at Cornell working on data management for games and simulations. While at ETH Zurich, I co-advised seven master's theses and ten semester projects.
·
I would like to thank my mentors:
Johannes Gehrke at Cornell
University (postdoc), Jens
Dittrich (now at Saarland University) and Donald Kossmann at ETH Zurich
(PhD), Sérgio Lifschitz at
PUC-Rio (MSc), and Claudia Bauzer
Medeiros at UNICAMP (BSc).
Additional Information
[DBLP | CV available upon request]