CYCLONE BENCHMARKS

Each of the subdirectories of this directory (minus bin/ and results/) is a
benchmark program directory.  In each such directory are a number of
subdirectories: c/, cyc/, and potentially others having the prefix cyc-.  In
each of these directories is code for the particular versions of the given
benchmark.  The c/ directory has the original C version of the program, and
the cyc/ directory has the basic Cyclone port of that program.  The
directory cyc-region/, if present, contains a "regionized" Cyclone version
of the program that attempts to eliminate the use of the garbage collector
in favor of Cyclone's dynamic regions.  Other directories may exist as well.

The benchmark infrastructure that we provide here measures both the running
time of the various benchmark programs, and the code differences between
their C and Cyclone versions.  We also provide scripts that will generate
LaTeX tables based on the results.


RUNNING THE BENCHMARKS

To build the tests, you need to have Cyclone installed and available in your
PATH.  You also need to have installed a version of the standard Cyclone
libraries with bounds checks turned off.  Do this by invoking 'make nocheck'
in the toplevel directory of the Cyclone distribution, and then doing a 'make
install'.

To run all the tests, from the toplevel (benchmarks) directory, do

make

This will in turn invoke three targets: build, test, and diff, which we now
explain.

1) make build

This target builds all of the benchmark programs, using a combination of two
top-level Makefiles, called Makefile and Makefile.project, and the Makefiles
in each of the benchmark subdirectories.  The toplevel Makefile is used to
multiplex basic targets; all of the work gets done in Makefile.project.  In
each benchmark subdirectory is a Makefile.inc file with definitions needed
by Makefile.project for that particular benchmark.

When the user does 'make build', the toplevel Makefile reinvokes make, for
each program P specified in the PROGRAMS variable, with the targets P-build
and P-build-nocheck.  By default this includes every benchmark provided with
the distribution.  These targets build the various versions of the program P
by invoking the build and build-nocheck targets, respectively, in
Makefile.project after first changing to the directory P.  The
Makefile.project targets will then change into the appropriate code
directories (stored in c/ and cyc/ and cyc-*/ subdirectories) and build them
using their specific Makefiles.  The P-build targets generate the C and
Cyclone executables in each subdirectory of P, and the P-build-nocheck
target builds the Cyclone versions again but with bounds checks and null
checks turned off.
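The delegation described above might be sketched, very roughly, as the
following hypothetical Makefile fragment.  The program names and the rule
bodies are illustrative only; the actual Makefile and Makefile.project
differ in detail:

```make
# Hypothetical sketch of the toplevel delegation pattern, not the
# actual distribution Makefile.  PROGRAMS defaults to every benchmark.
PROGRAMS = gzip grobner mini_httpd http_load

build: $(PROGRAMS:%=%-build) $(PROGRAMS:%=%-build-nocheck)

# For each program P, change into its directory and invoke the shared
# Makefile.project with the corresponding target.
%-build:
	$(MAKE) -C $* -f ../Makefile.project build

%-build-nocheck:
	$(MAKE) -C $* -f ../Makefile.project build-nocheck
```

Makefile.project, in turn, descends into the c/, cyc/, and cyc-*/
subdirectories and runs their per-version Makefiles.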

NOTE: All of the benchmarks have been verified to build and run properly
under Linux, but we know that some programs will not build on other
supported platforms, like OSX and Cygwin, due to library incompatibilities.
These should port with a small amount of work.

2) make test

This target invokes the running time tests for each program in the PROGRAMS
variable.  The methodology is basically the same as for building: there is a
%-test target in the Makefile, which results in the test target being called
in the appropriate directory in Makefile.project.  Each program is run
NUMTESTS times (a variable in the toplevel Makefile), which is by default
21.  In each benchmark directory is a file called TEST_SPEC that describes
how the tests should be run for the executables in that directory.  The
results of running the tests are stored in the toplevel results/ directory;
this includes the raw output of running the tests, as well as the "cooked"
output that results from processing the raw data.  The cooked output is
generated by the C program bin/hist, and is used to build graphs, tables,
etc., mentioned below.  A description of the TEST_SPEC file format is also
below.

All of the raw results are stored in results/RUN/PROG.raw for each benchmark
directory PROG, where RUN is an indicator of the run; unless specified by
the user as a make variable, this will be the current date and time.  Any
output from the programs is stored in results/RUN/PROG.log.  In addition,
the raw data is collated by the program bin/munge.pl and stored in the file
results/RUN/PROG.cooked.  This format includes the relevant statistical
data, in particular the mean, median, standard deviation, etc.  A comment at
the top of the file indicates which is which.
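To illustrate the kind of statistics the cooked files contain, here is a
short sketch in Python.  This is not the actual bin/hist or bin/munge.pl
code, just the sort of computation they perform on the raw timings:

```python
# Illustrative only: computes the mean, median, and standard deviation
# of a list of raw timings, as found in the cooked results files.
import math

def cook(timings):
    n = len(timings)
    s = sorted(timings)
    mean = sum(s) / n
    # Median: middle element, or average of the two middle elements.
    mid = n // 2
    median = s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2
    # Sample variance (n - 1 denominator).
    variance = sum((t - mean) ** 2 for t in s) / (n - 1) if n > 1 else 0.0
    return {"mean": mean, "median": median, "stddev": math.sqrt(variance)}
```

With the default NUMTESTS of 21 runs, the median in particular guards
against a single outlier run skewing the result.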

The cooked data can be used to generate a LaTeX table of the running times,
in the format of our 2002 PLDI paper.  To create this table, invoke

bin/maketab.sh RUN

where RUN is a directory in results/ that contains the *.cooked data files.
By default, a symbolic link "last" is created to point to the last
successful run. Thus you can do

bin/maketab.sh last

to generate a table for the data just collected.

NOTE: Two of the benchmark program tests rely on other benchmarks having
been built.  In particular, mini_httpd requires http_load to have been
built, and vice versa.  If you attempt to run one of these tests without
having built both programs (by doing make http_load-test for example),
the test will fail.

3) make diff

This target invokes the diff tests for each program in the PROGRAMS
variable.  For each benchmark directory, the bin/diff.pl script is invoked
to compare the differences in the source files between the various
versions.  The diff report for a benchmark P obtained by doing make P-diff
is stored in P/report.diff.  The toplevel target concatenates all of these
files together and stores them in the file results/report.diff.


TEST_SPEC

Finally, we will now explain the format of the TEST_SPEC file.  We consider
it abstractly first.  The file essentially defines the following arrays:

  TESTNAME [T] where T is the number of tests
  VAR [V]      where V is the number of variations
  PROG [P,V]   where P is the number of programs per variation
  ARGS [P,T]   
  FILES [P,T,2]

The TESTNAME array defines the specific tests to be run.  The VAR array
defines the "variations" to be compared.  Each variation consists of the
same number of programs, P, equivalent across variations, whose results
are combined to form a single test.  These program names are stored in
the PROG array.  For example, we might define a variation "Gzip" that
consists of two programs, "gzip" and "gunzip".  The running times of
these two programs are combined to provide one running time for the
variation "Gzip".  Each program is timed separately, so the overhead of
combining the programs (i.e. the shell script, pipes, etc.) is not
measured, for more accurate accounting.  The ARGS and FILES arrays
indicate the argument flags and the input/output files, respectively, to
be given to each program.  These flags and files are defined *per test*
(not per variation), so all variations use the same flags and files.

The file runtest.pl reads in the specfile and runs the tests.  If we want to
perform N iterations of each test, runtest.pl essentially executes the
following pseudocode:

for f = 1 to T          (tests)
  for v = 1 to V        (variations)
    for i = 1 to P      (programs in the variation)
      for x = 1 to N    (iterations)
        timeit PROG[i,v] ARGS[i,f] < FILES[i,f,0] > FILES[i,f,1]
      done
    done
  done
done

The format of the TEST_SPEC file is as follows:

TESTNAME[\t]n1[\t]n2...[\t]nT[\n]
VAR[\t]v1[\t]v2...[\t]vV[\n]
PROG[\t]p11 p12 ...p1P[\t]p21 p22 ...p2P[\t]...pVP[\n]
ARGS[\t]a11 a12 ...a1P[\t]a21 a22 ...a2P[\t]...aTP[\n]
FILES[\t]f11 f12 ...f1P[\t]f21 f22 ...f2P[\t]...fTP[\n]

Furthermore, the ARGS variables (e.g. a11) may contain # characters that
are later replaced by spaces.  The FILES variables also use # characters,
in their case to divide the input file from the output file.  If no args
or files are needed for a particular program, the text {} is used
instead.  Here is an example file:

TESTNAME	test1
VAR	C	Cyclone	Cyclone-nobc
PROG	c/encode c/decode	cyc/encode cyc/decode	cyc/encode-nochk cyc/decode-nochk
ARGS	{} {}
FILES	test1#test1.ar.tmp test1.ar.tmp#test1.new.tmp

This file defines one test, test1, with three variations: "C", "Cyclone" and
"Cyclone-nobc".  Each variation has two programs, one for encoding and one
for decoding.  For the C variation, these two programs are "c/encode" and
"c/decode"; for the Cyclone variation they are "cyc/encode" and "cyc/decode"
(notice the two programs are separated by spaces, while the program groups
are separated by tabs), etc.  Neither program requires any arguments, so the
two argument specifiers are {}.  If arguments were specified, any occurrence
of # is replaced with a space.  Finally, the programs do require an input
and output file; for the encode program, the input file is "test1" and the
output file is "test1.ar.tmp" (notice that these two files are separated by
a #) while the decode program has input file "test1.ar.tmp" and output file
"test1.new.tmp".  In the case that no input file is specified, "/dev/null"
is used instead; if no output file is specified, then the output is logged
to the file results/RUN/log, where RUN is a directory that identifies the
current run (usually the date).
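As a sketch (not the actual runtest.pl code), the ARGS and FILES
conventions just described could be decoded as follows; the helper names
decode_args and decode_files are hypothetical:

```python
# Illustrative sketch of the ARGS/FILES conventions described above;
# this is not the actual runtest.pl implementation.

def decode_args(spec):
    """'{}' means no arguments; '#' stands for a space."""
    if spec == "{}":
        return ""
    return spec.replace("#", " ")

def decode_files(spec):
    """'#' divides input from output; missing pieces get defaults."""
    if spec == "{}":
        # No files at all: read /dev/null, log the output.
        return ("/dev/null", "results/RUN/log")
    infile, sep, outfile = spec.partition("#")
    return (infile or "/dev/null",
            outfile if sep and outfile else "results/RUN/log")
```

For the example above, decode_files("test1#test1.ar.tmp") yields input
"test1" and output "test1.ar.tmp", matching the encode program's files.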

Each program is executed using the script bin/test.sh.  This script uses
/usr/bin/time to time each program execution (it is the "timeit" command
in the pseudocode above).

Alternatively, each test can perform specific timing actions by defining
a test.sh file in the test's directory.  This is useful if you don't
want the overhead of /usr/bin/time but instead want the program to
time itself, and this timing information needs to be extracted in
a program-specific manner.  Look at the grobner directory for an
example.

---last updated: 4/17/02
