Recovery for Games and Simulations
Research
Massively multiplayer online games (MMOs) are persistent virtual worlds that allow tens of thousands of users to interact in fictional settings. Users typically select a virtual avatar and collaborate with other users to solve puzzles or complete quests. These games are extremely popular: successful MMOs have millions of subscribers and have generated billions of dollars in revenue. Unlike single-player computer games, MMOs must persist across user sessions. Players can leave the game at any time, and they expect their achievements to be reflected in the world when they rejoin. Similarly, it is unacceptable for the game to lose player data in the event of a crash. These demands make it essential for MMOs to ensure that their state is durable.
Updates and Persistence in MMOs
Typical MMOs follow the architecture outlined below:
Clients join the virtual world through a connection server that connects them to a single shard. Shards are independent versions of the virtual world aimed at improving scalability. Shards are not synchronized, and players on one shard cannot interact with players on another. Current MMOs focus on providing transactional guarantees for a small subset of updates that need to be globally consistent across servers or communicate with external services. For example, MMOs may include financial transactions that require ACID properties. These transactions frequently involve user interaction or communication with an external system, and thus the update rate is fairly low. As a consequence, recovery can be handled by a standard DBMS with an ARIES-style recovery manager. This system is the persistence server in the figure above.
In addition to proper transactions, however, MMOs also include a large number of local updates that change the game state but do not require full transactional behavior. For instance, character movement is the single largest source of updates in most MMOs, but game-specific logic ensures that these updates never conflict. Nevertheless, we would like local updates to be durable so that players do not lose their progress in the event of a server failure.
Our Current Work
We are currently revisiting main-memory database recovery techniques in the context of MMOs. These techniques are well suited to the high update rates that result from local game updates. As a first step, we are working on recovery schemes for a single shard, though our techniques can be extended to shardless architectures. Our long-term goal is to provide a recovery service for more sophisticated distributed server architectures.
As a first result of our investigation, we have performed an experimental evaluation of several checkpoint recovery algorithms using a detailed simulation model. The advantage of using a simulator is that the algorithms can be easily compared over different hardware configurations. You will find a summary of our experimental study in our blog and the full study in our paper.
The main conclusions of our experimental study are:
- Methods that copy only dirty objects, and copy them on update, have clear latency advantages over methods based on eager copies of the game state. They avoid latency peaks by spreading their overhead over a number of game ticks.
- When update rates are so dramatically large and skewed that the entire game state gets updated in a single tick of the game, little can be done to reduce the latency impact of the checkpoint algorithms. In this extreme situation, an algorithm based on an eager copy of the entire game state introduces the minimum pause in the game.
- Methods based on a double-backup organization either match or outperform log-based alternatives in terms of recovery time.
- The best method for a wide range of parameters is copy on update combined with a double backup. This method outperforms alternatives by up to a factor of five in latency without any degradation in recovery time.
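To make the combination of copy on update and double backup concrete, here is a minimal Python sketch. It is not the code of our simulator, and every name in it (`DoubleBackupCheckpointer`, `flush_some`, and so on) is hypothetical. It assumes that two in-memory dictionaries stand in for the two on-disk backups, that object values are immutable, that objects are never deleted, and that the checkpoint writer is driven from the game loop rather than running asynchronously.

```python
class DoubleBackupCheckpointer:
    """Toy copy-on-update checkpointer over two alternating backups."""

    def __init__(self, state):
        self.state = state                         # live game state: id -> value
        self.backups = [dict(state), dict(state)]  # assumption: dicts stand in for disk
        self.backup_tick = [0, 0]                  # tick each backup is consistent with
        self.dirty = [set(), set()]                # ids changed since each backup was written
        self.target = 0                            # backup the next checkpoint overwrites
        self.active = False
        self.checkpoint_tick = 0
        self.pending = set()                       # ids still to be written this checkpoint
        self.saved = {}                            # checkpoint-time values of overwritten ids

    def update(self, obj_id, value):
        # Copy on update: keep the checkpoint-time value the first time an
        # object that has not been flushed yet is overwritten.
        if self.active and obj_id in self.pending and obj_id not in self.saved:
            self.saved[obj_id] = self.state[obj_id]
        self.state[obj_id] = value
        self.dirty[0].add(obj_id)
        self.dirty[1].add(obj_id)

    def begin_checkpoint(self, tick):
        if self.active:
            raise RuntimeError("a checkpoint is already running")
        self.active = True
        self.checkpoint_tick = tick
        # Only objects dirtied since this backup was last written are copied.
        self.pending = set(self.dirty[self.target])
        self.dirty[self.target].clear()
        self.saved = {}

    def flush_some(self, max_objects):
        """Write at most max_objects; call once per tick to spread the cost."""
        if not self.active:
            return
        backup = self.backups[self.target]
        for _ in range(min(max_objects, len(self.pending))):
            obj_id = self.pending.pop()
            if obj_id in self.saved:
                backup[obj_id] = self.saved.pop(obj_id)
            else:
                backup[obj_id] = self.state[obj_id]
        if not self.pending:
            # The target is consistent again; the next checkpoint overwrites
            # the other backup.
            self.backup_tick[self.target] = self.checkpoint_tick
            self.target = 1 - self.target
            self.active = False

    def recover(self):
        """Return (state, tick) from the backup that is not being overwritten."""
        good = 1 - self.target
        return dict(self.backups[good]), self.backup_tick[good]


state = {"a": 1, "b": 2, "c": 3}
cp = DoubleBackupCheckpointer(state)
cp.update("a", 10)
cp.begin_checkpoint(tick=1)
cp.update("a", 11)      # "a" is pending, so its tick-1 value 10 is saved first
snapshot_mid, tick_mid = cp.recover()   # still the initial state, tick 0
cp.flush_some(max_objects=1)
snapshot_done, tick_done = cp.recover()  # state as of tick 1
```

In this sketch the only work done inside a game tick is the single copy in `update` plus a bounded number of writes in `flush_some`, which is how copy on update avoids the pause of an eager full copy. Because a checkpoint only ever overwrites one backup, `recover` can always fall back on the other one, so a crash in the middle of a checkpoint loses at most the updates made since the previous completed checkpoint.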
The source code for our simulator can be downloaded below.
Software
Publications
See All Publications in Games and Simulations
- Marcos Vaz Salles, Tuan Cao, Benjamin Sowell, Alan Demers, Johannes Gehrke, Christoph Koch, and Walker White. An Evaluation of Checkpoint Recovery for Massively Multiplayer Online Games. In Proc. of the 2009 VLDB Conf. on Very Large Databases (VLDB 2009).
Funding
This research has been supported by the National Science Foundation under Grant IIS-0725260, by the Air Force Office of Scientific Research, and by a grant from Microsoft. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.