Software-based Fault Isolation
Lecturer: Professor Fred B. Schneider
 
Lecture notes by
Lynette I. Millett
We have been discussing protection measures that a single operating
system can provide. One way to think of this is to view the operating
system as a padded cell in which programs operate; we have been
discussing the nature of the padding. Another way to get programs to
behave in a manner consistent with a given security policy is by
"brainwashing." That is, modify the programs so that they behave only
in safe ways. This is embodied by a recent approach to security known
as software-based fault isolation (SFI).
 So far, the environment has been responsible for policy
enforcement, where the environment is either the OS/kernel or the
hardware. Hardware methods include addressing mechanisms (e.g. virtual
memory); OS methods include having two modes (where the supervisor
mode has access to everything).  The new approach we discuss today is
to construct a piece of software that transforms a given program p
into a program p', where p' is guaranteed to satisfy a security policy
of interest.
 
 This SFI SW transformation could be any number of things. It could be
a piece of the compiler or of the loader. It could also involve a
separate pass over machine language code before execution
commences. The point is that we are modifying the program before it is
executed. (One easy realization of SFI SW is to always output a
program that does nothing. However, there are likely to be properties
of the original program that we are interested in preserving, and
these properties might not be satisfied by a program that does
nothing.)
 
When would the SFI approach be useful (as opposed to
the other approaches we have been discussing this semester)?
 
-  If we are trying to run programs on a machine that lacks suitable
protection hardware, then with this method we can achieve the
functionality that the hardware doesn't support.
-  Another application is when hardware doesn't support the security
policy of interest. This is likely to become increasingly important,
especially in situations where security policies relate to
higher-level abstractions (e.g. paragraphs in a document being
assigned security ratings). If the hardware or the OS software doesn't
know about a particular abstraction, we can add a program as described
above to handle it.
-  The original motivation for this approach was performance. We can
use SFI to achieve memory protection at a low cost. Most operating
systems/HW provide security by protecting disjoint address
spaces. However, programs operating in these isolated address spaces
need to communicate with each other. An inter-process communication
(IPC) provided by the operating system is invariably 10-100 times as
costly as a procedure call. An IPC needs to trap, copy arguments,
save/restore registers, change the address space, and then repeat all
of the above on the way out. We would like to avoid this costly
mechanism and can replace it with SFI.
-  Another use might be for programs that support plug-ins. We would
like to protect the main program (e.g. a browser) from buggy
plug-ins. This is a form of "sub-address space" protection, since the
browser and the plug-in execute in the same address space.
-  Finally, this approach is useful for programs comprising many 
"objects." Such  programs would profit from having sub-address space level 
protection. If one object "goes bad," it would be nice if that behavior 
didn't trash everything else. However, an IPC-like operation that would 
accompany operating system address space protection is likely to be much too 
expensive for use with every method invocation. 
 SFI for Logical Fault Domains 
 
We take a single address space and partition it into regions called
logical fault domains (LFDs). We would like to enforce, without
resorting to help from the operating system, that programs in an LFD
are isolated from reading or writing memory outside their LFD. We also
need a cheap way to communicate between LFDs. The policy we want is as
follows (a small sketch expressing it as predicates appears after the
list):
 
-  Programs can read and write memory within their LFD. This ensures secrecy
     and integrity for that memory. 
-  Jumps (jmps, calls and returns) are to be kept within their own LFD.
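As a concrete illustration only, this policy can be phrased as a pair
of predicates over addresses. The C sketch below uses invented names
(lfd_t, may_access, may_jump) and 16-bit addresses to match the
examples that follow; it is not part of any SFI implementation, it
merely states what the rewriter must guarantee.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an LFD, e.g. prefix 0x0300 and mask 0xFF00
   for the region 0300h-03FFh used below. */
typedef struct {
    uint16_t prefix;   /* high-order bits identifying the LFD */
    uint16_t mask;     /* selects those high-order bits */
} lfd_t;

/* Reads and writes must target addresses inside the LFD. */
bool may_access(lfd_t d, uint16_t addr) {
    return (addr & d.mask) == d.prefix;
}

/* Jumps, calls, and returns must also land inside the LFD. */
bool may_jump(lfd_t d, uint16_t target) {
    return (target & d.mask) == d.prefix;
}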
 To start, we will address the read/write constraints. Suppose that
the LFD spans memory addresses '0300'h to '03FF'h. Given a program's code,
the software to modify it should look for instructions equivalent to
'write x' and ensure that x is located in the LFD's region. An
instruction 'write x' can be transformed to
 
cmp x 0300
if less Error
cmp x 03FF
if greater Error
write x
The same can be done for a read.  One instruction is transformed to
five, which is a five-fold code blow-up.  We might be better off just
accepting the cost of a context switch (IPC) rather than inflating the
code (writes occur much more frequently than context switches). 
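For concreteness, here is a C rendering of what the inserted
instructions compute, assuming the LFD 0300h-03FFh of the running
example; memory stands for the address space and abort() stands in
for Error.

#include <stdint.h>
#include <stdlib.h>

/* Sketch: the transformed 'write x' as a guarded store. */
void checked_write(uint8_t *memory, uint16_t x, uint8_t value) {
    if (x < 0x0300) abort();   /* cmp x 0300; if less Error    */
    if (x > 0x03FF) abort();   /* cmp x 03FF; if greater Error */
    memory[x] = value;         /* write x                      */
}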
In order to cut down on the number of additional instructions, we can
assume that LFDs all start on integral boundaries (e.g. 0000-00FF,
0100-01FF, 0200-02FF, etc.). Also, note that if we control the
writes, then we don't really need to worry about the reads: even if a
program could read outside of its LFD, if it can't also write,
secrecy and integrity will be maintained. Suppose we are modifying a
program whose LFD falls between 0300 and 03FF. Then, if we see an
address whose high-order byte is anything other than 03 (04, say), we
know the program is attempting to access something outside of its
LFD. We will use a temporary register and masks to transform an
instruction 'write x' as follows:
tmp := x & FF00
cmp tmp 0300
if not equal Error
write x
 
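In the same C-like terms, the prefix check masks off the low-order
byte and compares what is left against the LFD's prefix (again only a
sketch, with abort() standing in for Error):

#include <stdint.h>
#include <stdlib.h>

void checked_write_prefix(uint8_t *memory, uint16_t x, uint8_t value) {
    uint16_t tmp = x & 0xFF00;    /* tmp := x & FF00                  */
    if (tmp != 0x0300) abort();   /* cmp tmp 0300; if not equal Error */
    memory[x] = value;            /* write x                          */
}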
We have now decreased the code blow-up to 4x.  However, we can still
do better. We make a further assumption: if a program attempts to
write outside of its LFD, the program is malfunctioning anyway, so
instead of trying to detect bad addresses, we can just alter them so
that they fall within the LFD. This allows for a cheaper
implementation. How a bad program dies is immaterial, and we have
effectively only altered the behavior of malfunctioning programs. We
again use a separate temporary register (tmp) to transform an
instruction 'write x' to:
tmp := x & 00FF
tmp := tmp | 0300
write tmp
 
The cost for a write is now just 3 times what it was originally.  Note
that we could rewrite this without using the separate temporary
register.  However, this would allow a malicious programmer to exploit
the fact that SFI was being done. For instance, if we transformed
'write x' to the following SFI sequence: 
x := x & 00FF
x := x | 0300
write x
 
then a programmer could take advantage of the fact that an SFI write
is 3 instructions long and branch to the third (the new 'write x')
instruction, thereby circumventing the SFI transformation. This is
avoided by using the temporary register which the original program
does not read or write and maintaining the invariant: the register
'tmp' always contains a valid address unless the program is in the
midst of an SFI sequence.  This technique is referred to as
sandboxing.
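A C sketch of the sandboxed write makes the invariant visible: the
dedicated temporary is the only value ever used as a store address,
and it is always forced into the LFD before use. (The masks 0x00FF and
0x0300 are the running example; a real rewriter works on machine code,
and 'tmp' would be a reserved machine register rather than a C
variable.)

#include <stdint.h>

static uint16_t tmp;   /* dedicated register: outside an SFI sequence it
                          always holds an address within the LFD */

void sandboxed_write(uint8_t *memory, uint16_t x, uint8_t value) {
    tmp = x & 0x00FF;      /* tmp := x & 00FF  (keep only the offset)  */
    tmp = tmp | 0x0300;    /* tmp := tmp | 0300 (force the LFD prefix) */
    memory[tmp] = value;   /* write tmp: always lands in 0300h-03FFh   */
}

Branching straight to the final store does no harm here, because
whatever 'tmp' last held was itself an in-LFD address.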
We now turn to the question of how to transform jumps. We need to
prevent jumps into regions that are outside of the given program's
LFD. We also need to ensure that there are no jumps into the middle of
an SFI sequence.  There are two types of jump instructions: direct and
indirect. Direct jumps are not a problem, since we can easily inspect
the address to ensure that it is in the correct LFD.
There are three
kinds of indirect jumps: jmp, call, and return. The notation jmp (r)
means: take the address held in register r and jump to it.
Handling indirect jumps is not all that different from the way we
handle write instructions. We could add code that checks the branch
target address against the LFD's bounds, or we could sandbox it as we
did with writes. To pursue the sandboxing approach, we postulate a
new register 'ctmp' and transform an instruction 'jmp (r)' as follows:
 
ctmp := r & SuffixMask
ctmp := ctmp | PrefixMask
jmp (ctmp)
 
where SuffixMask is, for example, 00FF, and PrefixMask might be 0500 (if we
are interested in staying within the LFD 0500-05FF). If 'ctmp' is 
manipulated only in SFI regions, then, as long as the program is 
executing outside an SFI sequence, ctmp has a value within the
appropriate LFD. Thus, even if a malicious program tries to jump to the 
middle of an SFI region, it will still not be able to jump outside of 
the current LFD. 
The transformation for 'call (r)' is similar. 
 For a return (which pops the stack and branches) we need to make
sure that the address at the top of the stack is reasonable. To do
that, an instruction is prefixed to the above jump transformation:
ctmp := PopStack.
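As a rough C analogue (not how a real rewriter is built), the
sandboxed jmp, call, and return all reduce to forcing the branch
target through a dedicated register before it is used; here the target
is merely computed and returned, with SuffixMask and PrefixMask taken
from the 0500h-05FFh example above.

#include <stdint.h>

#define SUFFIX_MASK 0x00FFu   /* offset bits within the LFD 0500h-05FFh */
#define PREFIX_MASK 0x0500u   /* prefix identifying that LFD */

static uint16_t ctmp;   /* dedicated register for sandboxed branch targets */

/* jmp (r) / call (r): mask the target into the LFD, then branch via ctmp. */
uint16_t sandboxed_branch_target(uint16_t r) {
    ctmp = r & SUFFIX_MASK;      /* ctmp := r & SuffixMask    */
    ctmp = ctmp | PREFIX_MASK;   /* ctmp := ctmp | PrefixMask */
    return ctmp;                 /* jmp (ctmp)                */
}

/* return: ctmp := PopStack, then sandbox the popped address as above. */
uint16_t sandboxed_return_target(uint16_t **sp) {
    uint16_t popped = *(*sp)++;  /* pop the return address off the stack */
    return sandboxed_branch_target(popped);
}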
 
Even with these transformations, there remain problems due to jumps.
Complex instructions may contain byte sequences that themselves decode
as unsafe opcodes. For example, the instruction mov edx, 'C23BCB50'h
assembles to BA 50 CB 3B C2. However, the trailing bytes 50 CB 3B C2
can be disassembled to
push eax
return
cmp . . .
 
In this instance, jumping into the middle of the original instruction
means something very different, yet still yields legitimate
instructions (due to compact instruction representations). SFI won't
catch or correct this sort of problem. For example, it could be
possible to branch into the middle of an instruction, which would then
be interpreted as an instruction to jump outside the LFD. A solution
is to enforce our assumptions about where the control points are in a
program. We design predicates that are true for instruction
boundaries: ValidIndirectCallPoint, ValidIndirectRetPoint, and
ValidIndirectJumpPoint. We augment the SFI sequence for a jump with a
check to make sure that the target address is a control point we know
about.  For instance, 'jmp (r)' now becomes:
push r
call predicate to check whether r is a control point
jmp (r)
 
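One way to picture the augmented check is a table of
instruction-boundary addresses consulted before each indirect branch.
The table and the linear lookup below are purely illustrative (a real
implementation would encode valid control points far more compactly);
abort() again stands in for Error.

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table of addresses at which ValidIndirectJumpPoint holds. */
static const uint16_t valid_jump_points[] = { 0x0500, 0x0512, 0x0540 };

bool ValidIndirectJumpPoint(uint16_t target) {
    size_t n = sizeof valid_jump_points / sizeof valid_jump_points[0];
    for (size_t i = 0; i < n; i++)
        if (valid_jump_points[i] == target)
            return true;
    return false;
}

/* 'jmp (r)': push r, call the predicate, and branch only if it holds. */
void checked_indirect_jump(uint16_t r) {
    if (!ValidIndirectJumpPoint(r))
        abort();
    /* ... jmp (r) ... */
}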
 
There are two systems-level issues that should be mentioned. Multiple 
LFDs in a single address space might correspond to multiple programs. 
Then: 
 
-  As with address spaces, it is necessary to support communication between 
     LFDs. Local procedure calls don't work because we need to preserve the 
     sanctity of the 'tmp' and 'ctmp' registers. So, we need to support some 
     sort of inter-LFD call; it will save and restore the 'tmp' and 'ctmp' 
     registers as necessary. 
-  Programs running in an LFD will likely require OS/kernel services. 
     However, we cannot allow direct access to the kernel, because one LFD 
     could affect resources 'owned' by the entire address space. One solution 
     is to modify kernel services to support individual LFDs and their 
     associated resources. This would involve having multiple sets of 
     resources per address space, changing the kernel's resource granularity, 
     and adding kernel operations. The kernel could examine the program 
     counter to determine which LFD is the source of a request. A second 
     solution is a mechanism (known as a veneer or stub) that allows 
     inter-LFD communication. This effectively places wrappers around kernel 
     operations to ensure safety; a rough sketch of such a wrapper follows 
     this list. 
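To make the veneer idea concrete, here is a hypothetical wrapper
around a kernel service; every name and the particular check are
invented for illustration. The point is only that the stub validates a
request against the calling LFD before forwarding it to the kernel.

#include <stdint.h>

/* Hypothetical veneer around a kernel 'write buffer' service: forward
   the call only if the caller's buffer lies entirely inside the
   caller's LFD. */
int lfd_write_veneer(uint16_t lfd_prefix, uint16_t lfd_mask,
                     uint16_t buf, uint16_t len,
                     int (*kernel_write)(uint16_t buf, uint16_t len)) {
    uint16_t last = (uint16_t)(buf + len - 1);
    if (len == 0 ||
        (buf & lfd_mask) != lfd_prefix ||
        (last & lfd_mask) != lfd_prefix)
        return -1;                     /* reject: request escapes the LFD */
    return kernel_write(buf, len);     /* safe to hand to the kernel */
}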
There are some problems with using SFI. Clearly, we would like the SFI
transformation process to preserve some properties of the original
program. The problem is in defining precisely what those properties
are, since the transformed program obviously doesn't preserve
all properties.  For instance, the transformed program will
not have the same number of instructions as the original.  Library
routines can also be used to circumvent SFI. In particular, we must
prevent programs from invoking library routines that have not been
SFI'd. Finally, denial of service attacks remain possible. For
example, one LFD could acquire a shared lock and then never relinquish
it.