OCaml ByteCode Structure and Interpreter
Design And Approach
Source code distribution and installation
References
Summary
Because of the features mentioned above, Caml is often used to implement complex systems such as theorem provers or compilers. The COQ theorem prover is written in Caml. Ensemble(now Horus group), a toolkit for building distributed applications developed at Cornell University is written in Objective Caml. The web browser MMM is also written in Caml.
The objective of the Master of Engineering project
is to enhance the capability of OCaml interpreter. Two goals are
set. The first goal is to make the original interpreter more powerful.
By adding multithreaded feature, ML interpreter can take in and execute
several compiled bytecodes simultaneously. Since threads (light-weight
process) per bytecode are used, this will improve the efficiency of the
memory and CPU usages. The second goal is to port the interpreter
down to kernel space of WindowsNT4.0 operating system. By converting
the interpreter into a device driver, multiple bytecodes can be injected
and executed inside the kernel space at the same time. Most applications
that are written in OCaml and their bytecodes need to be ported down to
kernel can utilize this driver such as Active
Bridge (Active Network) developed by University of Pennsylvania, Computer
and Information Science (CIS) Department.
This project is a part of degree requirements
for Master
of Engineering program ('97-'98). It was guided and supervised
by Dr. Werner Vogels at Cornell
University, Computer Science Department.
The original abstraction can be found at the MEng
project list.
The reason to split the design into two phases
was solely because of the efficiency in development. User space
debugging are much more user-friendly and, more importantly, less time
consuming because any bugs in the programs, such as an access to an illegal
memory address, does not crash the system or DBS (Death Blue Screen) which
requires the memory dump and the whole process of system recovery (rebooting).
Although we can not avoid the kernel space programming, the first phase
design can take away most bugs that might occurred and the implementation
can be used as a guideline for the second phase.
First Phase: User Space Development.
1) The interface to
the user.
The original interpreter (OCamlrun.exe) arguments
were only one bytecode and the bytecode arguments itself. The "main.c"
was modified in order to take in multiple bytecodes. To distinguish
bytecodes and their arguments, the bytecode arguments follow the
respective bytecode with prefix "-".
For example:
# ocamlrun <bytecode1>
-<bytecode1_arg1> -<bytecode1_arg2> <bytecode2> -<bytecode2_arg1>
There is no limit in the number of bytecodes that can be executed at the same time as long as the memory space is available.
2) The interpreter
thread creation.
One thread will be created to execute each bytecode
(CreateThread()) [1] (P.60). The "windows.h"
header file is required to implement thread operations. In
"main.c", the main process will wait for all the created threads
to finish its execution before terminated (WaitForMultipleObjects()).
As far as the memory usage and error handling are concerned, all threads
are completely isolated. Threads are given the same priorities and
thread scheduler is managed solely by operating system. Each
thread has a unique identification number (GetThreadId()) when it
was created. The threadId is output to the screen when thread
is created. The status of each thread can be monitor through WindowsNT
Task Manager.
3) Thread local storage.
Since the original interpreter was designed to
single thread application, there were approximately one hundred global
variables in the codes. To avoid thread from corrupting each other
data, thread-local storage[1] (p.536) is applied.
Thread local storage can be implemented in two different ways: Dynamically
and Statically. Dynamic thread local storage involves a set of four
functions. It is most often used by DLLs design (Dynamic Library
Link). Static thread local storage, which is used here, automatically
allocates a data structure for each thread. Any variables that are
thread specific need to be declared as follow;
__declspec(thread) DWORD gt_dwStartTime = 0;
With "__declspec(thread)"
prefix[1] (p.557), the system creates a new instance of this variable
every time a new thread is created in the process. After figuring
out all global variables using Microsoft Developer, the static thread specific
declaration was inserted in front of all global variables. This assures
the isolation of thread data.
4) Error Exception
handler.
To avoid one thread's premature termination from
affecting the other, the exception handler [1] (p.715)("__try,
__except") is used. Any thread abnormal termination such as wrong
number of arguments or illegal memory access will throw the exception instead.
All thread abnormal terminations invoke "exit()" system call routine.
Any occurence of "exit()" in the codes has been replaced with "mt_exit()"
in which "PREMATURE_EXIT_EXCEPTION" exception is thrown instead.
The "mt_exit()" is defined inside "multithreads.c" module.
5) Mutual exclusion.
Some C runtime library such as errno and
malloc() [1] (p.86) are not designed for multithread applications.
Therefore, any calls to those functions need to be mutually excluded.
The Mutex kernel object [1] (p.311) is implemented.
Any calls to allocate heap memory ("memory.c") were wrapped by Mutex
object.
6) Memory clean-up.
With single thread application, the memory clean-up
is not neccesary because eventually the main process will be terminated.
However, in mulfstithread applications, other threads are still executing
since threads do not finished at the same time. Consequently, the
memory cleanup ("free()") for each thread's data structure need
to be performed to avoid memory leakage and increase the availability of
resource for other threads. The "cleanup()" routine was defined
and added after the call to "interprete()" inside "startup.c"
module.
Second Phase: Kernel Driver Programming.
1) Kernel-User Space
interface design.
The device is residing inside the kernel space.
We, therefore, need to define the procedure in getting the bytecode down
there and invoke the interpreter. The following Major Function Device
Control I/O Request Packets (IRP_MJ_DEVICE_CONTROL) [2] (p.63)
were created to handle the procedures.
They are defined inside "OCamlDrv.h" file.
|
Bytecode size information is need for Device Driver to allocate the memory inside kernel space. |
|
Bytecode is fed down to kernel as chunks of 128 bytes (BYTE_PACKAGE_SIZE) |
|
Invoke the OCaml interpreter procedure after all bytecodes have been transferred to kernel |
|
Clean-up all memory usage such as bytecode allocation once finishing interpreting. |
2) Thread Local Storage
Data Structure.
Unfortunately, there is no such thing like dynamic
or static thread local storage inside kernel. Consequently,
the data structure (struct THREADLOCALSTORAGE) will have
to be defined for each threads. The data structure in "OcamlDrv.h"
module is designed by gathering all variables that required Thread Local
Storage in the first phase. Since some variables have initialized
values, the method (tlsIntialize()) in "OcamlDrv.c" module
is created to handle all necessary initialization. The data
structure size is approximately 2kB per thread.
3) Degree of Multithreading
and Thread Table.
The maximum number of threads to be able to run
simultaneously is defined in "OcamlDrv.h" module under a variable
named MAXTHREAD. We have tested with up to 7 threads simultaneously
in which no error has occurred. The MAXTHREAD variable
limited the number of threads to be executed at the same time. However,
you can feed more thread than MAXTHREAD variable and the exceeding
threads will queue up and wait to be executed. To associate
each thread to its own data structure, the thread table has been invented.
The thread table (peThread[MAXTHREAD]) contains a pair of thread
Handle and threadID. With the knowledge that each kernel thread
has a unique HANDLE (pointer to the thread object), every time a
new thread gets created, it will be put into the thread table and removed
after its task finished. The methods (insertThread(), removeThread(),
getPeThread() and getThreadId()) in "OCamlRun.c" module
are used to manipulate the thread table.
4) C routine call.
Certain C standard library calls, such as malloc()
and free(), are not available with kernel programming. The
over-written and customized methods will need to be separately defined
to handle those cases. The declaration prefix (__cdecl) means
that these functions or methods will be called from C library files (".lib").
As defined in "OCamlRun.c" module, most system calls are over-written
or customized. The memory management routines are overwritten
by driver memory allocation [2] (p.86). As you might
notice, the memory allocation will be allocated 8 more extra bytes.
First 4 bytes keep the blueprint that this memory address was allocated
by this method and last 4 bytes keep the information about the size of
the memory. After many software test, this procedure is found to
be unnecessary and can be removed. The routine write() which
writes the output to console has been modified to redirect the output to
the kernel debugger instead.
5) Floating Points
Elimination.
One limitation in kernel programming is that
no floating point or double variable is allowed. The macro definition
KAMEEL (Camel in Dutch) secludes all the floating point references
inside the codes. If a bytecode uses floating points, it will not
be able to feed and run under kernel space. To test whether or not
a bytecode will be able to feed and run inside kernel, "OCamlTry"
is used to simulate kernel execution. If the bytecode runs
correctly with "OCamlTry", it will be able to feed and executed
into kernel.
6) Interpretion.
The way bytecode gets executed in kernel is exactly
the same way as it is done in user space.
7) Memory Cleanup.
Unlike user space program, kernel driver is supposed
to last as long as the life time of the operating system. Therefore,
the memory cleanup procedure is a very critical one. If kernel driver
fails to return all memory usages, it can exhaust the availability of resources
and lead to the system crash or dead blue screen. The memory cleanup
procedures have been divided into two classes; driver or interpreter allocated
memory. The driver allocated memory class gets cleaned up through
OcamlDrvCleanup() in "OcamlDrv.c" module. The interpreter
allocated memory gets cleaned up thread_cleanup() in "startup.c"
module. Although the interpreter allocated memory could be cleaned
in the driver cleanup, we keep them separate just for structural reason.
8) OcamlFeed.
The OcamlFeed behaves like an interface between
user space program and kernel driver. Its functions are
simply to take bytecode (as argument), establish the communication with
kernel device via IRP(I/O Request Package), inject bytecode into driver
and invoke the interpreter. The status for each function will
be output to console.
9) OcamlTry.
The OcamlTry is used to test the program whether
or not it is executable inside kernel space. It simply detects the
usage of floating or double.
1) Platform and General
Requirement.
The project has 2 sets of source codes distribution;
User Space and Kernel Device Driver source codes. The
interpreter was intended to operate on Windows95 or NT4.0 or higher platform.
The Microsoft Developer Studio VC++ version 5 or higher compiler is required
in compiling and linking all codes. The same requirement hold for
multithreaded version installation. The DDK (Device Driver Developer
Kit) is required for kernel driver installation. The first installation
of driver requires system restart. Any consequence modification require
only driver restart ("net stop ocamldrv" and "net start ocamldrv")
Finally, you will also need to down load GNU.zip
pc-unix commands.
2) To Obtain Codes.
The original OCaml version 1.07 interpreter source
codes are kept under byterun.zip file.
They can also be downloaded from INRIA
ftp distribution site directly.
The multithreaded OCaml interpreter source codes
are kept under byterun-mt.zip
file.
The OCaml interpreter driver source codes are
kept under byterun-drv.zip file.
The OCamlFeed source codes are kept under OcamlFeed.zip
file.
The OCamlTry source codes are kept under OcamlTry.zip
file.
The standard procedure in Building and Installing
Drivers can be found in Device Driver book [2] (p.398).
The OCamlbytecode examples for testing are kept
under bytecodes.zip file.
3) Installation Instruction.
To compile the codes, check the compilation procedure
and instruction in "Makefile.nt" file extracted from the zip file.
4) White Paper.
The white paper
is in Word Format. MS Word is required.
[3] Silberschatz Galvin.
Process Synchronization (pages 163-210). In Operating System Concepts
- Fourth Edition, January, 1995.
[4] Xavier Leroy. The
Introduction to Objective Caml. Institut National de Recherche
en Informatique et Automatique, Paris, France, 1997.
[5] Pierre.Weis.
One hundred
lines of Caml. Institut National de Recherche en Informatique
et Automatique, Paris, France, 1997.
[6] Xavier Leroy. The Objective Caml system, documentation and user's guide. Institut National de Recherche en Informatique et Automatique, Paris, France, 1997.
And some other OCaml
related papers.