Multiple Inheritance
The multiple-inheritance dispatch problem
Java, C++, and other OO languages introduce the challenge of multiple inheritance, in which a class or interface can extend multiple interfaces or classes. In Java, multiple inheritance is restricted to implementing or extending multiple interfaces; C++ has a more general (and more confusing) mechanism. Multiple inheritance poses an implementation challenge; the simple dispatch vector approach we have already seen does not work, because method and field indices can collide.
For example, consider the following Java interface hierarchy:
interface I1 { void a(); }
interface I2 extends I1 { void b(); }
interface I3 extends I1 { void c(); }
interface I4 extends I2, I3 { }
With the simple dispatch vector layout approach, method a is located at index 0 in its dispatch vector, and methods b and c are located at index 1 in their respective dispatch vectors. But then the indices of methods b and c collide in the dispatch vector of interface I4! There is no way to have an object implement I4 yet satisfy both the I2 and I3 interfaces.
Hashing
A straightforward solution to the problem is to give up on simple indices and instead use a hash table to look up methods in contexts where collisions are possible. This is the approach taken in Java for interface methods: if the static type of the receiver object is an interface, the method is looked up by hashing its name. Note that since Java only supports single class inheritance, a regular dispatch vector is used when invoking a method through a receiver whose type is a class. Hashing is also an attractive technique for dynamically typed languages, where the caller cannot be certain which methods the object supports.
For example, suppose that such a call is made to a Java method p.setX(42.0), where p has an interface type. The canonical name of the method will be an encoding of “setX(float)”. Let hsetX be the precomputed hash code of this string (there is no need to wait until run time to compute the hash code). This hash code can then be taken modulo the number of entries in the object's dispatch vector to locate the appropriate method code pointer. We may want to allow the size of objects' dispatch vectors to vary, so the dispatch vector can also record the number of entries it contains. Even faster is to make the number of entries a power of two, so the modulo operation can be implemented as a bitwise AND with a mask stored in the dispatch vector.
Of course, in general we cannot avoid collisions in the hash table. If, unluckily, two methods hash to the same hash table index, a way is needed to resolve the collision. Rather than resolve collisions with conventional hash table chaining or rehashing, a better trick is to instead have the bucket point to a collision resolution trampoline which figures out which method was meant to be invoked. In order for the trampoline to make the correct decision, however, it needs to receive the identity of the method to be invoked. For example, the hash code of the method can be passed as an additional argument.
Assuming the bitmask is stored at the first entry in the dispatch vector, we obtain a dispatch code sequence like the following:
    mov tDV, [p]
    mov t1, hsetX
    mov rdi, p          // implicit receiver argument
    mov rsi, t1         // implicit method identity argument
    and t1, [tDV]       // zero out high bits of hash code
    call [tDV + t1 + 8]
Comparing this dispatch sequence to the original dispatch vector sequence, we can see that 3 instructions have ballooned to 6, and the original 1 memory load has increased to 3. However, note that the two loads from the dispatch vector are likely to be fast because (shared) dispatch vectors will tend to be in cache. Loading the dispatch vector pointer from the object is more likely to be a cache miss, although one that probably must be taken anyway in order to access the object. The loads from the dispatch vector are also reading immutable information that is relatively easy to optimize with common subexpression elimination.
For dynamically typed languages, all method code must be ready to do collision resolution, since there is always the possibility of collision with a method that is not supported by the receiver object. Invoking unsupported methods could also cause the dispatch code to use entries in the hash table that do not correspond to any supported method; these entries can be initialized to point to code that raises an exception.
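To make the scheme concrete, here is a rough C sketch of hashed dispatch with a collision-resolution trampoline. The names (Point_setX, method_missing) and hash values are hypothetical, and a real implementation would generate code like this rather than write it by hand:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Object Object;
    typedef void (*MethodCode)(Object *self, uint32_t method_id);

    typedef struct {
        uint32_t mask;        /* table size - 1; size is a power of two */
        MethodCode entries[]; /* hash table of method code pointers */
    } DispatchVector;

    struct Object {
        DispatchVector *dv;
        /* object fields follow */
    };

    /* Hypothetical precomputed hash codes for two method names. */
    enum { H_SETX = 0x1a2b, H_GETX = 0x3c4d };

    static void Point_setX(Object *self, uint32_t id) { /* ... */ }
    static void Point_getX(Object *self, uint32_t id) { /* ... */ }

    /* Installed in a bucket where setX and getX collide: the method
       identity argument disambiguates which implementation was meant. */
    static void collision_trampoline(Object *self, uint32_t id) {
        if (id == H_SETX) Point_setX(self, id);
        else              Point_getX(self, id);
    }

    /* Installed in buckets that correspond to no supported method. */
    static void method_missing(Object *self, uint32_t id) {
        fprintf(stderr, "unsupported method %u\n", id);
        exit(1);
    }

    /* The dispatch sequence: mask the precomputed hash to index the
       table, then make an indirect call, passing the identity along. */
    static void dispatch(Object *o, uint32_t method_hash) {
        o->dv->entries[method_hash & o->dv->mask](o, method_hash);
    }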
Inline caching
A classic way to speed up slow computations is caching popular results. The original Smalltalk implementation introduced inline caching as a way to accelerate otherwise slow method dispatch. Empirically, it was observed that about 90% of the method calls from a given call site are dispatched to the same code as the previous call from that location. This fact justifies caching the dispatch result per call site.
Assume that we have some slow method dispatch technique—call it slowDispatch(o, m), where o is the receiver object and m is the method identifier—and that at a particular location in the code—call it line 436—there is a call o.m() that we want to accelerate. We allocate two memory cells to help accelerate the line 436 call: one stores the unique identifier of the class of the previously used receiver, and the other stores a pointer to the method code that was last used. The method dispatch code then starts by checking whether the object's class matches the stored identifier; if so, it uses the method code from the cache. If not, it calls the slowDispatch function and uses its results to update the cache.
The resulting code looks like the following:
    L436:    mov rsi, [rdi]
             cmp rsi, [id436]
             jne miss436
             call [code436]
    done436: ...
    miss436: call slowDispatch
             mov [id436], rdx    // extra return value
             mov [code436], r9   // second extra return value
             jmp done436

    .data
    id436:   dq 0
    code436: dq 0
Storing the cache as global variables is not necessarily thread-safe. For multithreaded execution, one possibility is to place the cache in thread-local storage (which in effect consumes another register) or to pack the class id and method code into a single word so that both can be accessed atomically, depending on the target architecture.
Inline caching works well for many call sites, but a fraction of call sites are “polymorphic”, generating calls to multiple classes. An extension of inline caching is polymorphic inline caching, which maintains a small number of cache entries (2–4) instead of just one.
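Here is a rough C rendering of a polymorphic inline cache. The function slowDispatch stands in for whatever slow technique backs the cache, and the structure and names are invented for illustration; a compiler would inline the hit path at each call site rather than call a helper:

    #include <stdint.h>

    typedef struct Object { uint32_t class_id; /* fields follow */ } Object;
    typedef void (*MethodCode)(Object *self);

    /* Stand-in for the slow technique backing the cache (hashing, etc.). */
    MethodCode slowDispatch(Object *o, uint32_t method_id);

    enum { PIC_SIZE = 4 };  /* 2-4 entries is typical */

    typedef struct {
        uint32_t class_id[PIC_SIZE]; /* 0 = empty; assume no class has id 0 */
        MethodCode code[PIC_SIZE];
        unsigned next;               /* round-robin replacement victim */
    } CallSiteCache;

    /* One cache per call site. */
    static void cached_call(Object *o, uint32_t method_id, CallSiteCache *c) {
        for (unsigned i = 0; i < PIC_SIZE; i++) {
            if (c->class_id[i] == o->class_id) {
                c->code[i](o);            /* hit: cheap indirect call */
                return;
            }
        }
        MethodCode m = slowDispatch(o, method_id);  /* miss: slow path */
        c->class_id[c->next] = o->class_id;         /* update the cache */
        c->code[c->next] = m;
        c->next = (c->next + 1) % PIC_SIZE;
        m(o);
    }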
Sparse dispatch vectors
Hash tables are naturally sparse: they are arrays only a fraction of whose elements are occupied. Rather than choosing indices randomly by hashing, an alternative is to have sparse dispatch vectors in which indices are deliberately chosen not to collide.
Consider the following example class hierarchy, in which the class Graphic inherits from both Shape and Color (argument and return types are omitted):
    class Shape {
        bounds()
        int x, y, z
    }
    class Point extends Shape {
        getX()
        getY()
    }
    class Color {
        rgb()
        hsv()
        int r, g, b
    }
    class Graphic extends Shape, Color {
        draw()
        bounds() { ... }
    }
Since Shape and Color have a common descendant and no shared methods, their method indices must be disjoint. We could achieve this goal by assigning the method bounds index 0 and giving the two methods of Color indices 1 and 2. The result will be that one entry in the dispatch vector of Color is unused, but other classes can have densely packed dispatch vectors.
A trivial way to avoid collisions between indices would be to assign every method a distinct index, but the dispatch vectors would become unnecessarily sparse. How do we automatically achieve a mostly packed layout like the one above?
The insight is that if two methods are present in the same class, they cannot be assigned the same method index. We then say that these methods conflict, which we can represent as an interference graph. The problem of assigning nonconflicting method indices then becomes simply a problem of graph coloring. As with register allocation, although graph coloring is an NP-complete problem, it can be solved reasonably efficiently and well using heuristics. In this case, the problem is actually easier than standard graph coloring because there is no bound on the number of colors used. Our example code above results in the interference graph shown below, with assigned method indices in blue:
It is also not necessary to assign all methods small integer indices in order to achieve dense packing! As long as the method indices for each class are within a fairly tight range of indices, the base address of the dispatch vector can be offset to avoid sparsity with larger indices.
The main downside of this technique is that to construct the interference graph, the compiler or run-time system needs to see the whole program. Therefore method indices cannot be assigned during separate compilation. Instead the run-time system must generate method indices (and possibly dispatch code) when the program is loaded, and even regenerate indices and code if new code is dynamically loaded into the program.
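As a sketch of how index assignment might work, the following C code greedily colors the interference graph. The dense conflict matrix is chosen for simplicity rather than realism:

    #include <stdbool.h>

    enum { MAX_METHODS = 64 };

    /* conflict[i][j] is true iff methods i and j appear in the same class. */
    bool conflict[MAX_METHODS][MAX_METHODS];
    int  index_of[MAX_METHODS];   /* assigned dispatch-vector index */

    /* Greedy coloring: give each method the smallest index not already
       taken by a conflicting method. Because there is no bound on the
       number of colors, this always succeeds; visiting the most
       constrained methods first tends to pack the vectors more densely. */
    void assign_indices(int n_methods) {
        for (int m = 0; m < n_methods; m++) {
            bool used[MAX_METHODS] = { false };
            for (int prev = 0; prev < m; prev++)
                if (conflict[m][prev])
                    used[index_of[prev]] = true;
            int idx = 0;
            while (used[idx])
                idx++;
            index_of[m] = idx;
        }
    }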
Decision trees
Another approach to dispatch that is both general and potentially efficient is to construct decision trees. It is a fundamentally different approach from the dispatch mechanisms we have been discussing so far, which directly look up the address of method code to jump to. A call of this form is an indirect jump—a jump to a computed address—which stalls the processor pipeline unless the hardware can predict where the jump is going. To do this prediction, modern processors use a branch target buffer (BTB), which records the target addresses of indirect jumps. Since the BTB stores a whole target address, an entry in the BTB is significantly more expensive than the hardware tables used to predict conditional branches.
The idea of dispatching via decision trees is to handle the dispatch entirely with conditional branches. Since there can be more than two targets for a given method call, in general the compiler needs to generate a decision tree. A simple form of decision tree relies on the first word of an object storing a class identifier—perhaps a small integer. The decision tree can then branch on the class identifier to find the right method code.

A numbered class hierarchy
For example, consider the class hierarchy shown above, where each of the classes has been assigned an identifier rather arbitrarily, based on a traversal of the class hierarchy (some coherence in the numbering of classes will help keep the decision tree small). Suppose that RGBColor inherits its implementation of a method from Color and that Color and Square inherit it from Shape. Then the decision tree for dispatching might look as shown below. In this case, the indirect jump is replaced by two conditional branches.

Decision tree for dispatching in the example class hierarchy
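Since the figure's exact numbering isn't reproduced here, the following C sketch assumes a hypothetical traversal-based numbering in which the classes sharing one implementation receive contiguous identifiers; the decision tree then needs only a range test:

    #include <stdint.h>

    typedef struct Object { uint32_t class_id; /* fields follow */ } Object;

    void Shape_m(Object *self);   /* implementation inherited from Shape */
    void Other_m(Object *self);   /* some other class's implementation */

    /* Hypothetical numbering: suppose the classes sharing Shape's
       implementation have the contiguous ids 2..5. */
    void dispatch_m(Object *o) {
        uint32_t id = o->class_id;   /* class identifier in first word */
        if (id >= 2 && id <= 5)      /* two conditional branches */
            Shape_m(o);
        else
            Other_m(o);
    }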
Notice that this approach is quite general—it can handle complex hierarchies. It is also probably the best approach for more complex OO features like multimethods, where dispatch tables tend to blow up in size. However, dispatch code depends on knowing the entire class hierarchy, so this approach is more challenging to use with separate compilation. If new classes are dynamically loaded into a running program, it may be necessary to regenerate class indices and dispatch code.
Using multiple dispatch vectors for separate compilation
To deal with method index conflicts among superclasses, C++ may use multiple dispatch tables per object, and multiple references into the object. Which dispatch table is used depends on which reference to the object is used.
Different C++ implementations use different object layouts, but here is one possibility. Consider the following three classes:
    class Shape {
        bounds()
        x, y, z: num
    }
    class Color {
        rgb()
        hsv()
        r, g, b: num
    }
    class Graphic extends Shape, Color {
        draw()
        location: int
    }
For separate compilation, the method indices for Shape and Color both have to be assigned independently. So both classes start their methods at index zero in their dispatch tables:

We can merge both of these layouts into a single object, but we need separate dispatch tables because bounds() and rgb() use the same method index:

There are two distinct “views” of the object: one as either a Graphic or a Shape, and one as a Color. To switch between these views, some computation is required. For example, we might use subsumption to view a Graphic as a Color:
    Graphic g = new Graphic();
    Color c = g;
We might expect that the assignment c = g involves no computation, but in fact it is necessary to add 40 to the address of g.
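In C-like terms, the subsumption might compile to something like the following sketch, where the offset 40 comes from the assumed layout above. Note that a null reference must convert to a null reference, so the adjustment has to be guarded:

    #include <stddef.h>

    typedef struct Graphic Graphic;     /* layout as described above */
    typedef struct ColorView ColorView; /* the Color view of the object */

    enum { COLOR_VIEW_OFFSET = 40 };    /* hypothetical, as in the text */

    /* The assignment "Color c = g" compiles to an address adjustment: */
    static ColorView *graphic_to_color(Graphic *g) {
        if (g == NULL) return NULL;     /* null must stay null */
        return (ColorView *)((char *)g + COLOR_VIEW_OFFSET);
    }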
The result is fast dispatch in the usual case, but high per-object overhead, since we have two dispatch table pointers per object rather than just one. Supporting pointers to the interior of objects makes garbage collectors more complex and probably a little slower.
It's possible to put the methods of Color into the Graphic dispatch table as well, but since different class code expects different views of the receiver object, a trampoline is needed to bump the receiver pointer to the correct view.
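For example, a trampoline for rgb in Graphic's merged dispatch table might look like this sketch (reusing the hypothetical offset 40 from above; C++ implementations call such trampolines thunks):

    /* The real method code expects the Color view of the receiver. */
    void Color_rgb(void *color_view_self);

    /* Stored in Graphic's merged dispatch table under rgb's index: it
       bumps the receiver from the Graphic view to the Color view, then
       hands off to the real code. */
    void Graphic_rgb_trampoline(void *graphic_self) {
        Color_rgb((char *)graphic_self + 40);
    }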
A layout like this merges the dispatch tables for Graphic and Shape. In general, one can merge the dispatch table of a class with that of one of its superclasses or implemented interfaces. There are more complex schemes for merging multiple dispatch tables more effectively, such as bidirectional dispatch tables. With a bidirectional layout, a class hierarchy that uses multiple inheritance only to allow a class to extend one other class and to implement one interface requires only a single dispatch table. This is possible by having the dispatch table grow in two opposite directions.
Fields and multiple inheritance
Even the offsets to fields can conflict with multiple inheritance. For example, consider this inheritance hierarchy:

The code of both Shape and Color might need access to the fields of Object. But in a Graphic object, those fields can't be located at the same offset from the Shape and Color fields as in the Shape and Color object layouts. (Actually, C++ offers a version of “non-virtual” inheritance in which the fields are located at the same offset, but at the cost of duplicating the Object fields, which has strange semantics.)
One way to solve the problem is to introduce internal pointers within the object between different views of the same object. This gives fast access to the fields of the current class view, and imposes no space or time overhead when inheritance is not being used. However, it has high per-object overhead even when only single inheritance is being used. And internal pointers are a demanding feature that probably makes the garbage collector slower.
A probably better idea is to store the offsets to fields in the dispatch table. For example, each field can be assigned a dispatch table index that is consulted to find the field. Dispatch table indices can be assigned using graph coloring or by using multiple dispatch tables. For example, the following figure shows how the object layout might look assuming that dispatch table indices are assigned using graph coloring, so that there is a single dispatch table. As the figure suggests, we don't actually need a distinct offset per field, since fields cannot be overridden by subclasses. It is enough to record an offset per class or superclass of the object; all of the fields of each such subobject can be found relative to that offset.

The sequence to access a field is more expensive than the usual indexed load. Before multiple inheritance, an access like o.f could be implemented as a memory operand [to + kf], where kf is a compile-time constant offset for field f, and temporary to holds the address of the object. With the multiple-inheritance object layout, accesses are more complex:
    mov tDV, [to]
    mov toff, [tDV + mf]
    mov t, [to + toff + kf]
Here, the offset mf is the location in the dispatch vector of the offset to a given subobject; the offset kf is the offset within the subobject of the particular field. Since the values tDV and toff are constants, CSE can help avoid fetching them more than once.
This approach has much lower space overhead than using internal pointers, and access to fields from other class views is faster. However, access to fields of the current class view is slower, and there is a performance penalty even when inheritance is not being used.
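The following C sketch mirrors the assembly sequence above; the index M_F and offset K_F are hypothetical compile-time constants:

    #include <stdint.h>

    typedef struct {
        int32_t subobject_offset[4]; /* per-superclass offsets, at
                                        indices assigned by coloring */
        /* method code pointers follow */
    } DispatchVector;

    typedef struct {
        DispatchVector *dv;
        /* subobjects with their fields follow */
    } Object;

    enum { M_F = 1 };  /* dispatch-vector index of f's subobject offset */
    enum { K_F = 8 };  /* f's fixed offset within that subobject */

    /* o.f under the multiple-inheritance layout: one extra dependent
       load to find the subobject, then the usual indexed load. */
    int32_t load_f(Object *o) {
        int32_t off = o->dv->subobject_offset[M_F]; /* mov toff,[tDV+mf] */
        return *(int32_t *)((char *)o + off + K_F); /* mov t,[to+toff+kf] */
    }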
Avoiding dispatch
We've talked about layouts and algorithms to speed up method dispatch. But the fastest way to do method dispatch is not to do it at all. If we can determine that there is only one possible implementation of the method being invoked, the generated code can simply jump directly to the method code. Or the method code can be inlined at the call site, possibly enabling other optimizations. Similarly, if we know that a given field can only be at a particular fixed offset, more efficient access code can be generated.
Given a call o.m() where o has type C, if no subtype of C overrides the method, then C's implementation of m() is the only possible code that dispatch could reach, and the call can be compiled as a direct jump. This optimization does require knowledge of the whole class hierarchy, so it is not compatible with separate compilation. It is also not compatible with dynamic linking, which might cause new implementations of m() to be loaded at run time.
A more sophisticated way to avoid dispatch is to acquire more precise information about the class of an object than is present in the declared type. A variable has exact type C if it is guaranteed to hold objects whose class is exactly C, and not some proper subclass of C. For example, consider this code:

    x: C = new C()
    x.m()

Because of the constructor call, we know x has exact type C, sometimes written as x: C!.
An exact type analysis finds exact types for expressions in the program by propagating information from new expressions to possible uses. This can be done by building directly on an inclusion-based pointer analysis, since each new allocation is a distinct “object” in pointer analysis. If the objects that a given pointer can point to all have the same class, the exact type of the pointer is known. Even if not, we may be able to determine that there is only one implementation for a given method. The analysis probably needs to be interprocedural to be effective.
Specialization
Inheritance is usually implemented by having the code for an inherited method shared across all classes that inherit it. Sometimes it is better, however, to specialize the method code for particular inheriting classes. For example, consider two classes A and B:
    class A {
        f() { ... g() ... }
        g() { A.g code }
    }
    class B extends A {
        g() { B.g code }
    }
Ordinarily we'd share the code for A.f with B. However, consider what happens if we instead specialize the method f to both A and B. Assuming there are no other implementations of g() in other subclasses, the version of A.f specialized to A knows that the exact type of this is A!. Therefore the call to g() must go to the “A.g code”. Similarly, the version specialized to B can call the “B.g code” directly. The code for g can even be inlined inside f, possibly enabling further optimizations.
This optimization is a space–time tradeoff. If f is called infrequently, we don't want to waste memory and cache space on storing multiple versions of it. If f is frequently used and its code is not large, then it makes sense to specialize it. It is a good idea to couple this optimization to some method for determining which methods are “hot”—either a program analysis or, even better, run-time profiling.
Multimethods
In most object-oriented languages, method code is chosen according to the class of the receiver object. But the receiver is just one argument to the method; why not choose method code based on the other arguments as well? This is the idea behind multimethods, also known as generic functions. Multimethods are a feature of Common Lisp (CLOS), MultiJava, Dylan, Cecil, and other languages. CLOS, in particular, is quite widely used in industry.
Multimethods are helpful for so-called “binary” methods, in which there is an explicit argument with the same type as the class. For example, suppose we want to implement a class Shape with a method intersect(s: Shape): bool, where Shape has various subclasses: Box, Circle, Triangle, and so on. With multimethods, we can think of this method as a generic function of two arguments: intersect(Shape receiver, Shape s): bool.
We can imagine wanting to implement different algorithms for different combinations of shapes. For example, when intersecting two boxes, we can use this test:
intersect(b1: Box, b2: Box) : bool {
return b1.x0 <= b2.x1 && b2.x0 <= b1.x1
&& b1.y0 <= b2.y1 && b2.y0 <= b1.y1
}
Another place where multimethods are handy is, in fact, for compilers. Recall that visitors are an answer to the problem of how to write the code for a compiler pass in a modular way. With multimethods, we can define a function visit(Node, Pass) that specifies the boilerplate traversal behavior in the base implementation. We then override it in a modular way for particular (Node, Pass) pairs where there is something interesting going on. In fact, MultiJava has been used to build compilers in this way.
A related idea to multimethods is predicate dispatch, in which methods are chosen based on arbitrary properties of objects, rather than just their run-time class. The two ideas can be naturally combined to select on arbitrary properties of multiple arguments. For example, we might override intersect to give code that just works on squares:
    intersect(b1: Box, b2: Box) : bool
        where b1.width == b1.height && b2.width == b2.height { ... }
If the method is called on two boxes that don't satisfy the where clause, the ordinary intersect(Box, Box) implementation is called instead.
Predicate dispatch allows dispatching to acquire the power of pattern matching, though arguably in a more modular way, since the code for handling the different pattern cases can be implemented separately.
One problem with multimethods is implementing them efficiently. If there are several dispatched arguments, a dispatch table indexed by the run-time classes of all of them grows exponentially in the number of arguments, as noted earlier.
The usual approach to implementing multimethods, though, is to implement the generic function as a decision tree (or DAG). Building the decision tree requires knowing about all the possible implementations. A decision tree also enables testing on conditions other than the run-time class, so it works for predicate dispatch too. For reasonable programs, the space and time overhead of a decision tree is reasonable, no worse than implementing the dispatch in other ways.
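As an illustration, here is a C sketch of the intersect generic function compiled to a decision tree over the run-time classes of both arguments; the tag representation and helper names are invented for the example:

    #include <stdbool.h>

    typedef enum { BOX, CIRCLE, TRIANGLE } ShapeTag;
    typedef struct { ShapeTag tag; /* shape fields follow */ } Shape;

    /* Specialized implementations and a general fallback. */
    bool intersect_box_box(Shape *a, Shape *b);
    bool intersect_box_circle(Shape *a, Shape *b);
    bool intersect_general(Shape *a, Shape *b);

    /* The generic function intersect(Shape, Shape) as a decision tree
       over both arguments' tags. A predicate such as the "square"
       where-clause would appear as one more test in the tree. */
    bool intersect(Shape *a, Shape *b) {
        if (a->tag == BOX) {
            if (b->tag == BOX)
                return intersect_box_box(a, b);
            if (b->tag == CIRCLE)
                return intersect_box_circle(a, b);
        }
        return intersect_general(a, b);
    }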