***********************************************************************  Team TAL  Dan Grossman, danieljg@cs.cornell.edu Fred Smith, fms@cs.cornell.edu Dave Walker, walker@cs.cornell.edu Stephanie Weirich, sweirich@cs.cornell.edu Steve Zdancewic, zdance@cs.cornell.edu *********************************************************************** This document describes the TAL solution to the ICFP'99 programming contest solution. It is divided into two parts. The first part describes the functional language (TALx86) that we have adopted to solve the problem. The second part describes our optimal lineartime solution. ***********************************************************************  The Functional Language  Our submission is in TALx86, a strongly typed functional language that encourages an explicit continuationpassing style and supports mutually recursive modules. We were encouraged to use this language when we learned that the competition would allow us to run our program on an interpreter implemented in hardware. We are grateful to the Intel Corporation for developing this interpreter. A basic introduction to the terms and primitive constructors of the language is helpful, so we provide such an introduction below. However, those programmers experienced in using functional languages may prefer to skip this tutorial and jump right into the example code we have provided as a solution to programming contest. We believe the explicit type annotations on distinguished TALx86 continuations make the code essentially selfdocumenting. In addition, the file tal.el is an emacsmode suitable for source editing and viewing. A TAL module is an ordered list of declarations followed by an ordered record of continuations followed by an ordered record of primitive values. In the concrete syntax, these components, or "segments", are delineated by the reserved words, "CODE" and "DATA". We focus on the ordered record of continuations since it is the most useful for the TALx86 programmer. The _simplest_ way to understand how to construct and manipulate these continuations is to view them as syntactic sugar for a stronglytyped, closed, continuationpassing lambda calculus augmented with a number of primitive operations. TALx86 also provides convenient support for traditional imperative operations by encouraging a storepassing style and by providing syntactic support for composing storepassing continuations. In fact, much of the power and expressiveness of our system revolves around the continuation composition operator, or CCO. In the concrete syntax, the CCO is the newline character, generally created by pressing the "Enter" key on a PC keyboard. Given two continuations c1 and c2, the CCO creates a new continuation c3 with arguments s and k. In familiar terms, c3 may be thought of as, \ s k . c1 s (\ s' . (c2 s' k)) The type of a continuation is expressed by stating the types of state and continuations which it may consume. The most important part of a state type is the "register file type" which is a record type with 8 fields. (Programmers often express frustration that this type does not admit wider records, but Intel has remained unresponsive.) The reason we have chosen TALx86 as our source language is that we feel that it contains primitives at an appropriate level of abstraction for completing this task. In particular, we believe its vast collection of arithmetic operations and the efficient mechanisms for composing computations will give our team the edge we need when it comes to computing the complicated cost metrics that we have determined constitute the bottleneck. Although some observers may claim that the TALx86 primitives are somewhat lowlevel, we defend our choice by pointing out that the primitives are asymmetric enough to allow a very compact encoding (so there isn't quite as much typing as you might think, if you use the TALx86 binary format).  Optimal solution in linear time  Our solution uses wellknown properties of the plus operator on the Pentium architecture. Since the contest specified that our program would be run on a Pentium, we assume that the program to test the size of our output will also be run on a Pentium. Therefore, the reasonable semantics to ascribe to the + operator in the definition of program size is 32bit addition on a Pentium processor. Obviously, the smallest programs are those with a size that is a multiple of 2^32. (Since size normally connotes the impossibility of negative values, the sizes we compute should be interpreted as unsigned 32bit numbers.) To create a program with size a multiple of 2^32, we exploit the size of the case statement, in particular the span of the arms. We simply replace the body of the first rule in the program (call it s), with the following equivalent one (where x is a metavariable explained below): (CASE (VAR "state") ((ARM 2147483648 s) (ARM x s)) s) We note that for any int x, this statement is wellformed and equivalent to s. It simply remains to identify an optimal value for x. Our program first computes the size of the whole program (call it SIZE_P) and the size of s (call it SIZE_S). We would like an x such that: 2^32 = SIZE_P + 10 + SIZE_S + SIZE_S + (x + 2^31 + 1) Hence, x = 2147483647  SIZE_P  10  SIZE_S  SIZE_S where the symbol "" designates subtraction mod 2^32 (see Intel Architecture Software Developer's manual, Volume 2, pages 3448 and 3449 under the title "SUB" for a more precise specification). We also handle one exceptional case: When presented the empty program, we output the empty program. Since this program has size 0, we remain optimal even in this case. 
