CS790 - ML Interpreter

CS790 MEng Project Multithreaded ML Interpreter Suwat Ch.

Introduction

   OCaml ByteCode Structure and Interpreter

   Design And Approach

   Source code distribution and installation

   References

   Summary

Introduction

Objective Caml programming language (OCaml) is developed and distributed by the leading French Institute INRIA (Institut National de Recherche en Informatique et Automatique). It is a safe language with type-check at compiled time and optimizing compiler features. The compiler performs many sanity checks on programs and eliminates many programming errors such as data type confusions, erroneous accesses into compound values. In effect, all these points are carefully checked by the compiler, so that the data accesses can be verified to the compiler code generator, to ensure that data manipulated by programs may never be corrupted. Like other newly developed languages such as Java, Ocaml comes with automatic memory management (garbage collector) and implements the object oriented paradigm.

Because of the features mentioned above, Caml is often used to implement complex systems such as theorem provers or compilers. The COQ theorem prover is written in Caml. Ensemble(now Horus group), a toolkit for building distributed applications developed at Cornell University is written in Objective Caml. The web browser MMM is also written in Caml.

The objective of the Master of Engineering project is to enhance the capability of OCaml interpreter. Two goals are set. The first goal is to make the original interpreter more powerful. By adding multithreaded feature, ML interpreter can take in and execute several compiled bytecodes simultaneously. Since threads (light-weight process) per bytecode are used, this will improve the efficiency of the memory and CPU usages. The second goal is to port the interpreter down to kernel space of WindowsNT4.0 operating system. By converting the interpreter into a device driver, multiple bytecodes can be injected and executed inside the kernel space at the same time. Most applications that are written in OCaml and their bytecodes need to be ported down to kernel can utilize this driver such as Active Bridge (Active Network) developed by University of Pennsylvania, Computer and Information Science (CIS) Department.

This project is a part of degree requirements for Master of Engineering program ('97-'98). It was guided and supervised by Dr. Werner Vogels at Cornell University, Computer Science Department. The original abstraction can be found at the MEng project list.

Design and Approach

The project has 2 design phases; User Space Development and Kernel Space Porting.

The reason to split the design into two phases was solely because of the efficiency in development. User space debugging are much more user-friendly and, more importantly, less time consuming because any bugs in the programs, such as an access to an illegal memory address, does not crash the system or DBS (Death Blue Screen) which requires the memory dump and the whole process of system recovery (rebooting). Although we can not avoid the kernel space programming, the first phase design can take away most bugs that might occurred and the implementation can be used as a guideline for the second phase.

First Phase: User Space Development.

1) The interface to the user.
The original interpreter (OCamlrun.exe) arguments were only one bytecode and the bytecode arguments itself. The "main.c" was modified in order to take in multiple bytecodes. To distinguish bytecodes and their arguments, the bytecode arguments follow the respective bytecode with prefix "-".
For example:
# ocamlrun <bytecode1> -<bytecode1_arg1> -<bytecode1_arg2> <bytecode2> -<bytecode2_arg1>

There is no limit in the number of bytecodes that can be executed at the same time as long as the memory space is available.

2)    The interpreter thread creation.
One thread will be created to execute each bytecode (CreateThread()) [1] (P.60).   The "windows.h" header file is required to implement thread operations.   In "main.c", the main process will wait for all the created threads to finish its execution before terminated (WaitForMultipleObjects()).   As far as the memory usage and error handling are concerned, all threads are completely isolated. Threads are given the same priorities and thread scheduler is managed solely by operating system.   Each thread has a unique identification number (GetThreadId()) when it was created.   The threadId is output to the screen when thread is created. The status of each thread can be monitor through WindowsNT Task Manager.

3)    Thread local storage.
Since the original interpreter was designed to single thread application, there were approximately one hundred global variables in the codes. To avoid thread from corrupting each other data, thread-local storage[1] (p.536) is applied.   Thread local storage can be implemented in two different ways: Dynamically and Statically. Dynamic thread local storage involves a set of four functions. It is most often used by DLLs design (Dynamic Library Link).   Static thread local storage, which is used here, automatically allocates a data structure for each thread. Any variables that are thread specific need to be declared as follow;

__declspec(thread) DWORD gt_dwStartTime = 0;

With "__declspec(thread)" prefix[1] (p.557), the system creates a new instance of this variable every time a new thread is created in the process. After figuring out all global variables using Microsoft Developer, the static thread specific declaration was inserted in front of all global variables. This assures the isolation of thread data.

4)    Error Exception handler.
To avoid one thread's premature termination from affecting the other, the exception handler [1] (p.715)("__try, __except") is used. Any thread abnormal termination such as wrong number of arguments or illegal memory access will throw the exception instead.   All thread abnormal terminations invoke "exit()" system call routine. Any occurence of "exit()" in the codes has been replaced with "mt_exit()" in which "PREMATURE_EXIT_EXCEPTION" exception is thrown instead. The "mt_exit()" is defined inside "multithreads.c" module.

5)    Mutual exclusion.
Some C runtime library such as errno and malloc() [1] (p.86) are not designed for multithread applications. Therefore, any calls to those functions need to be mutually excluded. The Mutex kernel object [1] (p.311) is implemented. Any calls to allocate heap memory ("memory.c") were wrapped by Mutex object.

6)    Memory clean-up.
With single thread application, the memory clean-up is not neccesary because eventually the main process will be terminated. However, in mulfstithread applications, other threads are still executing since threads do not finished at the same time. Consequently, the memory cleanup ("free()") for each thread's data structure need to be performed to avoid memory leakage and increase the availability of resource for other threads. The "cleanup()" routine was defined and added after the call to "interprete()" inside "startup.c" module.

Second Phase: Kernel Driver Programming.

1)    Kernel-User Space interface design.
The device is residing inside the kernel space. We, therefore, need to define the procedure in getting the bytecode down there and invoke the interpreter. The following Major Function Device Control I/O Request Packets (IRP_MJ_DEVICE_CONTROL) [2] (p.63) were created to handle the procedures.   They are defined inside "OCamlDrv.h" file.

IOCTL_OCAML_BYTECODE_SIZE Bytecode size information is need for Device Driver to allocate the memory inside kernel space.

IOCTL_OCAML_BYTECODE_FEED Bytecode is fed down to kernel as chunks of 128 bytes (BYTE_PACKAGE_SIZE)

IOCTL_OCAML_BYTECODE_INTERPRETE Invoke the OCaml interpreter procedure after all bytecodes have been transferred to kernel

IOCTL_OCAML_BYTECODE_CLEAN Clean-up all memory usage such as bytecode allocation once finishing interpreting.

The above four device controls define the interaction between OCaml device driver (OCamlDrv.c) and OcamlFeed (OCamlFeed.c).    The status of each stage will be output to screen when executing OcamlFeed.

2) Thread Local Storage Data Structure.
Unfortunately, there is no such thing like dynamic or static thread local storage inside kernel. Consequently, the data structure (struct THREADLOCALSTORAGE) will have to be defined for each threads. The data structure in "OcamlDrv.h" module is designed by gathering all variables that required Thread Local Storage in the first phase. Since some variables have initialized values, the method (tlsIntialize()) in "OcamlDrv.c" module is created to handle all necessary initialization. The data structure size is approximately 2kB per thread.

3) Degree of Multithreading and Thread Table.
The maximum number of threads to be able to run simultaneously is defined in "OcamlDrv.h" module under a variable named MAXTHREAD. We have tested with up to 7 threads simultaneously in which no error has occurred. The MAXTHREAD variable limited the number of threads to be executed at the same time. However, you can feed more thread than MAXTHREAD variable and the exceeding threads will queue up and wait to be executed. To associate each thread to its own data structure, the thread table has been invented. The thread table (peThread[MAXTHREAD]) contains a pair of thread Handle and threadID. With the knowledge that each kernel thread has a unique HANDLE (pointer to the thread object), every time a new thread gets created, it will be put into the thread table and removed after its task finished. The methods (insertThread(), removeThread(), getPeThread() and getThreadId()) in "OCamlRun.c" module are used to manipulate the thread table.

4)    C routine call.
Certain C standard library calls, such as malloc() and free(), are not available with kernel programming. The over-written and customized methods will need to be separately defined to handle those cases. The declaration prefix (__cdecl) means that these functions or methods will be called from C library files (".lib"). As defined in "OCamlRun.c" module, most system calls are over-written or customized.   The memory management routines are overwritten by driver memory allocation [2] (p.86).   As you might notice, the memory allocation will be allocated 8 more extra bytes. First 4 bytes keep the blueprint that this memory address was allocated by this method and last 4 bytes keep the information about the size of the memory. After many software test, this procedure is found to be unnecessary and can be removed.   The routine write() which writes the output to console has been modified to redirect the output to the kernel debugger instead.

5)    Floating Points Elimination.
One limitation in kernel programming is that no floating point or double variable is allowed. The macro definition KAMEEL (Camel in Dutch) secludes all the floating point references inside the codes. If a bytecode uses floating points, it will not be able to feed and run under kernel space. To test whether or not a bytecode will be able to feed and run inside kernel, "OCamlTry" is used to simulate kernel execution.   If the bytecode runs correctly with "OCamlTry", it will be able to feed and executed into kernel.

6)    Interpretion.
The way bytecode gets executed in kernel is exactly the same way as it is done in user space.

7)    Memory Cleanup.
Unlike user space program, kernel driver is supposed to last as long as the life time of the operating system. Therefore, the memory cleanup procedure is a very critical one. If kernel driver fails to return all memory usages, it can exhaust the availability of resources and lead to the system crash or dead blue screen. The memory cleanup procedures have been divided into two classes; driver or interpreter allocated memory.   The driver allocated memory class gets cleaned up through OcamlDrvCleanup() in "OcamlDrv.c" module. The interpreter allocated memory gets cleaned up thread_cleanup() in "startup.c" module. Although the interpreter allocated memory could be cleaned in the driver cleanup, we keep them separate just for structural reason.

8)    OcamlFeed.
The OcamlFeed behaves like an interface between user space program and kernel driver.    Its functions are simply to take bytecode (as argument), establish the communication with kernel device via IRP(I/O Request Package), inject bytecode into driver and invoke the interpreter.   The status for each function will be output to console.

9)    OcamlTry.
The OcamlTry is used to test the program whether or not it is executable inside kernel space. It simply detects the usage of floating or double.

Souce code Distribution and Installation

1)    Platform and General Requirement.
The project has 2 sets of source codes distribution; User Space and Kernel Device Driver source codes.   The interpreter was intended to operate on Windows95 or NT4.0 or higher platform. The Microsoft Developer Studio VC++ version 5 or higher compiler is required in compiling and linking all codes. The same requirement hold for multithreaded version installation. The DDK (Device Driver Developer Kit) is required for kernel driver installation. The first installation of driver requires system restart. Any consequence modification require only driver restart ("net stop ocamldrv" and "net start ocamldrv")   Finally, you will also need to down load GNU.zip pc-unix commands.

2)    To Obtain Codes.
The original OCaml version 1.07 interpreter source codes are kept under byterun.zip file.   They can also be downloaded from INRIA ftp distribution site directly.

The multithreaded OCaml interpreter source codes are kept under byterun-mt.zip file.

The OCaml interpreter driver source codes are kept under byterun-drv.zip file.

The OCamlFeed source codes are kept under OcamlFeed.zip file.

The OCamlTry source codes are kept under OcamlTry.zip file.

The standard procedure in Building and Installing Drivers can be found in Device Driver book [2] (p.398).

The OCamlbytecode examples for testing are kept under bytecodes.zip file.

3)    Installation Instruction.
To compile the codes, check the compilation procedure and instruction in "Makefile.nt" file extracted from the zip file.

4)    White Paper.
The white paper is in Word Format.   MS Word is required.

References

[1] Jefrey Richter. Thread Synchronization (pages 283-416), Thread-Local Storage (pages 535-568), Structures Exception Handling (pages 683-782). In Advanced Windows, The Developer's Guide to the Win32 API for Windows NT3.5 and Windows 95, Redmond, WA, 1995.

[2] Art Baker. Driver Memory Allocation (pages 86-90), Building And Installing Drivers (pages 398-418), Testing and Debugging Drivers (pages 419-458). In The Windows NT Device Driver Book, A Guide For Programmers, Upper Saddle River, NJ, 1996.

[3]    Silberschatz Galvin. Process Synchronization (pages 163-210). In Operating System Concepts - Fourth Edition, January, 1995.

[4]    Xavier Leroy. The Introduction to Objective Caml. Institut National de Recherche en Informatique et Automatique, Paris, France, 1997.

[5]     Pierre.Weis. One hundred lines of Caml. Institut National de Recherche en Informatique et Automatique, Paris, France, 1997.

[6] Xavier Leroy. The Objective Caml system, documentation and user's guide. Institut National de Recherche en Informatique et Automatique, Paris, France, 1997.

And some other OCaml related papers.

Summary

The project was initiated in December 1997 and completed in April 1998. I would like to thank Dr. Werner Vogels at Cornell University for his technical guidance and valuable advises, Xavier Leroy at INRIA for OCaml clarification and my fellow CS MEng student, Wilson Xun Huang, for his device driver design support.

Although the OCaml driver is working in the laboratory, future research and improvements, such as debugging feature and application integration, will still be needed. Any suggestion for improvement to the project or correction for any mistakes in the paper will certainly be welcome and appreciated. Please feel free to contact the author in case there is any question or more information needed.

Created and maintained by Suwat Chitphakdibodin. This page was last modified on Friday May 1, 1998. Contact the author suwatch@cs.cornell.edu Back to Top

IOCTL_OCAML_BYTECODE_SIZE	Bytecode size information is need for Device Driver to allocate the memory inside kernel space.
IOCTL_OCAML_BYTECODE_FEED	Bytecode is fed down to kernel as chunks of 128 bytes (BYTE_PACKAGE_SIZE)
IOCTL_OCAML_BYTECODE_INTERPRETE	Invoke the OCaml interpreter procedure after all bytecodes have been transferred to kernel
IOCTL_OCAML_BYTECODE_CLEAN	Clean-up all memory usage such as bytecode allocation once finishing interpreting.