Architecture

Figure: Workflow of the methodology for auto-tuning distributed and parallel configurations.

The figure above illustrates the basic workflow used in Troika. The methodology uses simulations to build a response surface methodology (RSM) model that characterizes application performance as a function of configuration parameters. It then searches this model for the combination of configuration parameters that best meets the user's goals.
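For readers unfamiliar with RSM, a common choice of model is the second-order polynomial response surface below. Here x_1, ..., x_k are the numeric configuration parameters of a profile, T-hat is the predicted execution time, the beta coefficients are fitted by least squares to simulated execution times, and F is the set of profiles that satisfy the user's constraints. This textbook form, and the assumption that the user's goal is minimum execution time, are illustrative only; the exact model Troika fits is not specified here.

```latex
\hat{T}(\mathbf{x}) = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{1 \le i < j \le k} \beta_{ij} x_i x_j,
\qquad
\mathbf{x}^{*} = \arg\min_{\mathbf{x} \in \mathcal{F}} \hat{T}(\mathbf{x})
```

With k parameters this model has 1 + 2k + k(k-1)/2 coefficients, so it can be fitted from a modest number of simulated profiles rather than an exhaustive sweep of the configuration space.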

We have instantiated this methodology in Troika, a configuration recommendation tool for MapReduce 2.0 applications. Troika makes application-, framework-, and hardware-specific configuration recommendations within user-defined budget constraints. It comprises a simulator for MapReduce 2.0 applications, a recommendation engine consisting of a model builder and an RSM solver, and an Initial Profile Wizard.

  • Simulator: Uses an application- and content-dependent mechanism to quickly estimate the execution time of a profile, that is, a set of parameters describing an application running on a given hardware platform with a given software configuration.
  • Recommendation Engine: Finds a near-optimal profile using RSM models, aiming for a good trade-off between recommendation optimality and the time spent producing the recommendation. It comprises a model builder and an RSM solver.
    • Model Builder: Parses a specification of the profile space, prunes profiles that violate user constraints, and generates profiles for the simulator. The execution times obtained through simulation are used to construct an RSM model that captures the trade-offs between profiles.
    • RSM Solver: Searches the feasible envelope to identify the top few profiles that the RSM model predicts will yield the best performance.
  • Initial Profile Wizard (IPW): Collects the Troika parameters needed for simulation. IPW parses configuration files and measures software and hardware characteristics of the target cluster to produce the initial profile.
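The interplay between the model builder, simulator, and RSM solver can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not Troika's implementation: the parameter names (`nodes`, `reducers`), the per-node price and hourly budget, the every-other-profile training sample, and `toy_simulate` (a closed-form stand-in for the MapReduce 2.0 simulator) are all hypothetical, and the RSM is assumed to be a full second-order polynomial fitted by least squares. `enumerate_profiles` and the `feasible` filter play the model builder's parsing and pruning role, `build_rsm` fits the model to simulated times, and the final sort stands in for the RSM solver returning the top few profiles.

```python
import itertools
from typing import Callable, Dict, List, Sequence

Profile = Dict[str, float]  # hypothetical: parameter name -> value


def enumerate_profiles(space: Dict[str, Sequence[float]]) -> List[Profile]:
    """Expand a profile-space spec (parameter -> candidate values) into every combination."""
    names = sorted(space)
    return [dict(zip(names, vals)) for vals in itertools.product(*(space[n] for n in names))]


def features(p: Profile, names: Sequence[str]) -> List[float]:
    """Second-order RSM terms: 1, each x_i, each x_i^2, and each x_i * x_j with i < j."""
    xs = [float(p[n]) for n in names]
    row = [1.0] + xs + [x * x for x in xs]
    row += [xs[i] * xs[j] for i in range(len(xs)) for j in range(i + 1, len(xs))]
    return row


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("design does not determine all RSM coefficients")
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def build_rsm(samples: List[Profile], times: List[float],
              names: Sequence[str]) -> Callable[[Profile], float]:
    """Fit the quadratic surface to simulated times via the normal equations."""
    rows = [features(p, names) for p in samples]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k)]
    beta = solve(ata, aty)
    return lambda p: sum(b * f for b, f in zip(beta, features(p, names)))


def recommend(space: Dict[str, Sequence[float]], feasible: Callable[[Profile], bool],
              simulate: Callable[[Profile], float], top: int = 3) -> List[Profile]:
    """Prune, simulate a sample, fit the RSM, and return the best predicted profiles."""
    names = sorted(space)
    candidates = [p for p in enumerate_profiles(space) if feasible(p)]  # prune by constraints
    train = candidates[::2]  # hypothetical stand-in for a proper experimental design
    model = build_rsm(train, [simulate(p) for p in train], names)
    return sorted(candidates, key=model)[:top]  # rank the feasible set by predicted time


# Hypothetical usage: two parameters and a per-hour budget that caps cluster size.
PRICE_PER_NODE_HOUR, BUDGET_PER_HOUR = 0.5, 3.0
space = {"reducers": [4, 8, 12, 16, 20, 24], "nodes": [2, 4, 6, 8, 10]}


def toy_simulate(p: Profile) -> float:
    """Closed-form stand-in for the simulator (made-up execution time in seconds)."""
    return 200 + 0.25 * (p["reducers"] - 12) ** 2 + 0.5 * (p["nodes"] - 8) ** 2


best = recommend(space, lambda p: p["nodes"] * PRICE_PER_NODE_HOUR <= BUDGET_PER_HOUR, toy_simulate)
print(best[0])  # {'nodes': 6, 'reducers': 12}
```

Because `toy_simulate` is itself quadratic, the fitted surface reproduces it exactly in this example; with a real simulator the RSM only approximates the response, which is one reason to return several candidate profiles rather than a single one.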