Multiple Inheritance
The multiple-inheritance dispatch problem
Java, C++, and other OO languages introduce the challenge of multiple inheritance, in which a class or interface can extend multiple interfaces or classes. In Java, multiple inheritance is restricted to implementing or extending multiple interfaces; C++ has a more general (and more confusing) mechanism. Multiple inheritance poses an implementation challenge; the simple dispatch vector approach we have already seen does not work, because method and field indices can collide.
For example, consider the following Java interface hierarchy:
interface I1 { void a(); }
interface I2 extends I1 { void b(); }
interface I3 extends I1 { void c(); }
interface I4 extends I2, I3 { }
With the simple dispatch vector layout approach, method a is located at index 0 in its dispatch vector, and methods b and c are located at index 1 in their respective dispatch vectors. But then the indices of methods b and c collide in the dispatch vector of interface I4! There is no way to have an object implement I4 yet satisfy both the I2 and I3 interfaces.
Hashing
A straightforward solution to the problem is to give up on simple indices and instead use a hash table to look up methods in contexts where collisions are possible. This is the approach taken in Java for interface methods: if the static type of the receiver object is an interface, the method is looked up by hashing its name. Note that since Java only supports single class inheritance, a regular dispatch vector is used when invoking a method through a receiver whose type is a class. Hashing is also an attractive technique for dynamically typed languages, where the caller cannot be certain which methods the object supports.
For example, suppose that such a call is made to a Java method p.setX(42.0), where p has an interface type. The canonical name of the method will be an encoding of “setX(float)”. Let hsetX be the precomputed hash code of this string (there is no need to wait until run time to compute the hash code). This hash code can then be taken modulo the number of entries in the object's dispatch vector to locate the appropriate method code pointer. We may want to allow the size of objects' dispatch vectors to vary, so the dispatch vector can also record the number of entries it contains. Even faster is to make the number of entries a power of two, so the modulo operation can be implemented as a bitwise AND with a mask stored in the dispatch vector.
Of course, in general we cannot avoid collisions in the hash table. If, unluckily, two methods hash to the same hash table index, a way is needed to resolve the collision. Rather than resolve collisions with conventional hash table chaining or rehashing, a better trick is to instead have the bucket point to a collision resolution trampoline which figures out which method was meant to be invoked. In order for the trampoline to make the correct decision, however, it needs to receive the identity of the method to be invoked. For example, the hash code of the method can be passed as an additional argument.
Assuming the bitmask is stored at the first entry in the dispatch vector, we obtain a dispatch code sequence like the following:
    mov tDV, [p]
    mov t1, hsetX
    mov rdi, p          // implicit receiver argument
    mov rsi, t1         // implicit method identity argument
    and t1, [tDV]       // zero out high bits of hash code
    call [tDV + t1 + 8]
Comparing this dispatch sequence to the original dispatch vector sequence, we can see that 3 instructions have ballooned to 6, and the original 1 memory load has increased to 3. However, note that the two loads from the dispatch vector are likely to be fast because (shared) dispatch vectors will tend to be in cache. Loading the dispatch vector pointer from the object is more likely to be a cache miss, although one that probably must be taken anyway in order to access the object. The loads from the dispatch vector are also reading immutable information that is relatively easy to optimize with common subexpression elimination.
For dynamically typed languages, all method code must be ready to do collision resolution, since there is always the possibility of collision with a method that is not supported by the receiver object. Invoking unsupported methods could also cause the dispatch code to use entries in the hash table that do not correspond to any supported method; these entries can be initialized to point to code that raises an exception.
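To make the scheme concrete, here is a rough C sketch of hashed dispatch with a collision-resolution trampoline. The names (Point_setX, method_missing) and hash values are hypothetical, and a real implementation would generate code like this rather than write it by hand:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Object Object;
    typedef void (*MethodCode)(Object *self, uint32_t method_id);

    typedef struct {
        uint32_t mask;        /* table size - 1; size is a power of two */
        MethodCode entries[]; /* hash table of method code pointers */
    } DispatchVector;

    struct Object {
        DispatchVector *dv;
        /* object fields follow */
    };

    /* Hypothetical precomputed hash codes for two method names. */
    enum { H_SETX = 0x1a2b, H_GETX = 0x3c4d };

    static void Point_setX(Object *self, uint32_t id) { /* ... */ }
    static void Point_getX(Object *self, uint32_t id) { /* ... */ }

    /* Installed in a bucket where setX and getX collide: the method
       identity argument disambiguates which implementation was meant. */
    static void collision_trampoline(Object *self, uint32_t id) {
        if (id == H_SETX) Point_setX(self, id);
        else              Point_getX(self, id);
    }

    /* Installed in buckets that correspond to no supported method. */
    static void method_missing(Object *self, uint32_t id) {
        fprintf(stderr, "unsupported method %u\n", id);
        exit(1);
    }

    /* The dispatch sequence: mask the precomputed hash to index the
       table, then make an indirect call, passing the identity along. */
    static void dispatch(Object *o, uint32_t method_hash) {
        o->dv->entries[method_hash & o->dv->mask](o, method_hash);
    }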
Inline caching
A classic way to speed up slow computations is caching popular results. The original Smalltalk implementation introduced inline caching as a way to accelerate otherwise slow method dispatch. Empirically, it was observed that about 90% of the method calls from a given call site are dispatched to the same code as the previous call from that location. This fact justifies caching the dispatch result per call site.
Assume that we have some slow method dispatch technique—call it slowDispatch(o, m), where o is the receiver object and m is the method identifier—and that at a particular location in the code—call it line 436—there is a call o.m() that we want to accelerate. We allocate two memory cells to help accelerate the line 436 call: one stores the unique identifier of the class of the previously used receiver, and the other stores a pointer to the method code that was last used. The method dispatch code then starts by checking whether the object's class matches the stored identifier; if so, it uses the method code from the cache. If not, it calls the slowDispatch function and uses its results to update the cache.
The resulting code looks like the following:
    L436:    mov rsi, [rdi]
             cmp rsi, [id436]
             jne miss436
             call [code436]
    done436: ...
    miss436: call slowDispatch
             mov [id436], rdx    // extra return value
             mov [code436], r9   // second extra return value
             jmp done436

    .data
    id436:   dq 0
    code436: dq 0
Storing the cache as global variables is not necessarily thread-safe. For multithreaded execution, one possibility is to place the cache in thread-local storage (which in effect consumes another register) or to pack the class id and method code into a single word so that both can be accessed atomically, depending on the target architecture.
Inline caching works well for many call sites, but a fraction of call sites are “polymorphic”, generating calls to multiple classes. An extension of inline caching is polymorphic inline caching, which maintains a small number of cache entries (2–4) instead of just one.
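Here is a rough C rendering of a polymorphic inline cache. The function slowDispatch stands in for whatever slow technique backs the cache, and the structure and names are invented for illustration; a compiler would inline the hit path at each call site rather than call a helper:

    #include <stdint.h>

    typedef struct Object { uint32_t class_id; /* fields follow */ } Object;
    typedef void (*MethodCode)(Object *self);

    /* Stand-in for the slow technique backing the cache (hashing, etc.). */
    MethodCode slowDispatch(Object *o, uint32_t method_id);

    enum { PIC_SIZE = 4 };  /* 2-4 entries is typical */

    typedef struct {
        uint32_t class_id[PIC_SIZE]; /* 0 = empty; assume no class has id 0 */
        MethodCode code[PIC_SIZE];
        unsigned next;               /* round-robin replacement victim */
    } CallSiteCache;

    /* One cache per call site. */
    static void cached_call(Object *o, uint32_t method_id, CallSiteCache *c) {
        for (unsigned i = 0; i < PIC_SIZE; i++) {
            if (c->class_id[i] == o->class_id) {
                c->code[i](o);            /* hit: cheap indirect call */
                return;
            }
        }
        MethodCode m = slowDispatch(o, method_id);  /* miss: slow path */
        c->class_id[c->next] = o->class_id;         /* update the cache */
        c->code[c->next] = m;
        c->next = (c->next + 1) % PIC_SIZE;
        m(o);
    }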
Sparse dispatch vectors
Hash tables are naturally sparse: they are arrays only a fraction of whose elements are occupied. Rather than choosing indices randomly by hashing, an alternative is to have sparse dispatch vectors in which indices are deliberately chosen not to collide.
Consider the following example class hierarchy, in which the class Graphic inherits from both Shape and Color (argument and return types are omitted):
    class Shape {
        bounds()
        int x, y, z
    }
    class Point extends Shape {
        getX()
        getY()
    }
    class Color {
        rgb()
        hsv()
        int r, g, b
    }
    class Graphic extends Shape, Color {
        draw()
        bounds() { ... }
    }
Since Shape and Color have a common descendant and no shared methods, their method indices must be disjoint. We could achieve this goal by assigning the method bounds index 0 and giving the two methods of Color indices 1 and 2. The result will be that one entry in the dispatch vector of Color is unused, but other classes can have densely packed dispatch vectors.
A trivial way to avoid collisions between indices would be to assign every method a distinct index, but the dispatch vectors would become unnecessarily sparse. How do we automatically achieve a mostly packed layout like the one above?
The insight is that if two methods are present in the same class, they cannot be assigned the same method index. We then say that these methods conflict, which we can represent as an interference graph. The problem of assigning nonconflicting method indices then becomes simply a problem of graph coloring. As with register allocation, although graph coloring is an NP-complete problem, it can be solved reasonably efficiently and well using heuristics. In this case, the problem is actually easier than standard graph coloring because there is no bound on the number of colors used. Our example code above results in the interference graph shown below, with assigned method indices in blue:
It is also not necessary to assign all methods small integer indices in order to achieve dense packing! As long as the method indices for each class are within a fairly tight range of indices, the base address of the dispatch vector can be offset to avoid sparsity with larger indices.
The main downside of this technique is that to construct the interference graph, the compiler or run-time system needs to see the whole program. Therefore method indices cannot be assigned during separate compilation. Instead the run-time system must generate method indices (and possibly dispatch code) when the program is loaded, and even regenerate indices and code if new code is dynamically loaded into the program.
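As a sketch of how index assignment might work, the following C code greedily colors the interference graph. The dense conflict matrix is chosen for simplicity rather than realism:

    #include <stdbool.h>

    enum { MAX_METHODS = 64 };

    /* conflict[i][j] is true iff methods i and j appear in the same class. */
    bool conflict[MAX_METHODS][MAX_METHODS];
    int  index_of[MAX_METHODS];   /* assigned dispatch-vector index */

    /* Greedy coloring: give each method the smallest index not already
       taken by a conflicting method. Because there is no bound on the
       number of colors, this always succeeds; visiting the most
       constrained methods first tends to pack the vectors more densely. */
    void assign_indices(int n_methods) {
        for (int m = 0; m < n_methods; m++) {
            bool used[MAX_METHODS] = { false };
            for (int prev = 0; prev < m; prev++)
                if (conflict[m][prev])
                    used[index_of[prev]] = true;
            int idx = 0;
            while (used[idx])
                idx++;
            index_of[m] = idx;
        }
    }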
Decision trees
Another approach to dispatch that is both general and potentially efficient is to construct decision trees. It is a fundamentally different approach from the dispatch mechanisms we have been discussing so far, which directly look up the address of method code to jump to. A call of this form is an indirect jump—a jump to a computed address—which stalls the processor pipeline unless the hardware can predict where the jump is going. To do this prediction, modern processors use a branch target buffer (BTB), which records the target addresses of indirect jumps. Since the BTB stores a whole target address, an entry in the BTB is significantly more expensive than the hardware tables used to predict conditional branches.
The idea of dispatching via decision trees is to handle the dispatch entirely with conditional branches. Since there can be more than two targets for a given method call, in general the compiler needs to generate a decision tree. A simple form of decision tree relies on the first word of an object storing a class identifier—perhaps a small integer. The decision tree can then branch on the class identifier to find the right method code.

A numbered class hierarchy
For example, consider the class hierarchy shown above, where each of the classes has been assigned an identifier rather arbitrarily, based on a traversal of the class hierarchy (some coherence in the numbering of classes will help keep the decision tree small). Suppose that RGBColor inherits its implementation of a method from Color and that Color and Square inherit it from Shape. Then the decision tree for dispatching might look as shown below. In this case, the indirect jump is replaced by two conditional branches.

Decision tree for dispatching in the example class hierarchy
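Since the figure's exact numbering isn't reproduced here, the following C sketch assumes a hypothetical traversal-based numbering in which the classes sharing one implementation receive contiguous identifiers; the decision tree then needs only a range test:

    #include <stdint.h>

    typedef struct Object { uint32_t class_id; /* fields follow */ } Object;

    void Shape_m(Object *self);   /* implementation inherited from Shape */
    void Other_m(Object *self);   /* some other class's implementation */

    /* Hypothetical numbering: suppose the classes sharing Shape's
       implementation have the contiguous ids 2..5. */
    void dispatch_m(Object *o) {
        uint32_t id = o->class_id;   /* class identifier in first word */
        if (id >= 2 && id <= 5)      /* two conditional branches */
            Shape_m(o);
        else
            Other_m(o);
    }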
Notice that this approach is quite general—it can handle complex hierarchies. It is also probably the best approach for more complex OO features like multimethods, where dispatch tables tend to blow up in size. However, dispatch code depends on knowing the entire class hierarchy, so this approach is more challenging to use with separate compilation. If new classes are dynamically loaded into a running program, it may be necessary to regenerate class indices and dispatch code.
Using multiple dispatch vectors for separate compilation
To deal with method index conflicts among superclasses, C++ may use multiple dispatch tables per object, and multiple references into the object. Which dispatch table is used depends on which reference to the object is used.
Different C++ implementations use different object layouts, but here is one possibility. Consider the following three classes:
    class Shape {
        bounds()
        x, y, z: num
    }
    class Color {
        rgb()
        hsv()
        r, g, b: num
    }
    class Graphic extends Shape, Color {
        draw()
        location: int
    }
For separate compilation, the method indices for Shape and Color both have to be assigned independently. So both classes start their methods at index zero in their dispatch tables:

We can merge both of these layouts into a single object, but we need separate dispatch tables because bounds() and rgb() use the same method index:

There are two distinct “views” of the object: one as either a Graphic or a Shape, and one as a Color. To switch between these views, some computation is required. For example, we might use subsumption to view a Graphic as a Color:
    Graphic g = new Graphic();
    Color c = g;
We might expect that the assignment c = g involves no computation, but in fact it is necessary to add 40 to the address of g.
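In C-like terms, the subsumption might compile to something like the following sketch, where the offset 40 comes from the assumed layout above. Note that a null reference must convert to a null reference, so the adjustment has to be guarded:

    #include <stddef.h>

    typedef struct Graphic Graphic;     /* layout as described above */
    typedef struct ColorView ColorView; /* the Color view of the object */

    enum { COLOR_VIEW_OFFSET = 40 };    /* hypothetical, as in the text */

    /* The assignment "Color c = g" compiles to an address adjustment: */
    static ColorView *graphic_to_color(Graphic *g) {
        if (g == NULL) return NULL;     /* null must stay null */
        return (ColorView *)((char *)g + COLOR_VIEW_OFFSET);
    }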
The result is fast dispatch in the usual case, but high per-object overhead, since we have two dispatch table pointers per object rather than just one. Supporting pointers to the interior of objects makes garbage collectors more complex and probably a little slower.
It's possible to put the methods of Color into the Graphic dispatch table as well, but since different class code expects different views of the receiver object, a trampoline is needed to bump the receiver pointer to the correct view.
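For example, a trampoline for rgb in Graphic's merged dispatch table might look like this sketch (reusing the hypothetical offset 40 from above; C++ implementations call such trampolines thunks):

    /* The real method code expects the Color view of the receiver. */
    void Color_rgb(void *color_view_self);

    /* Stored in Graphic's merged dispatch table under rgb's index: it
       bumps the receiver from the Graphic view to the Color view, then
       hands off to the real code. */
    void Graphic_rgb_trampoline(void *graphic_self) {
        Color_rgb((char *)graphic_self + 40);
    }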
A layout like this merges the dispatch tables for Graphic and Shape. In general, one can merge the dispatch table of a class with that of one of its superclasses or implemented interfaces. There are more complex schemes for merging multiple dispatch tables more effectively, such as bidirectional dispatch tables. With a bidirectional layout, a class hierarchy that uses multiple inheritance only to allow a class to extend one other class and to implement one interface requires only a single dispatch table. This is possible by having the dispatch table grow in two opposite directions.
Fields and multiple inheritance
Even the offsets to fields can conflict with multiple inheritance. For example, consider this inheritance hierarchy:

The code of both Shape and Color might need access to the fields of Object. But in a Graphic object, those fields can't be located at the same offset from the Shape and Color fields as in the Shape and Color object layouts. (Actually, C++ offers a version of “non-virtual” inheritance in which the fields are located at the same offset, but at the cost of duplicating the Object fields, which has strange semantics.)
One way to solve the problem is to introduce internal pointers within the object between different views of the same object. This gives fast access to the fields of the current class view, and imposes no space or time overhead when inheritance is not being used. However, it has high per-object overhead even when only single inheritance is being used. And internal pointers are a demanding feature that probably makes the garbage collector slower.
A probably better idea is to store the offsets to fields in the dispatch table. For example, each field can be assigned a dispatch table index that is consulted to find the field. Dispatch table indices can be assigned using graph coloring or by using multiple dispatch tables. For example, the following figure shows how the object layout might look assuming that dispatch table indices are assigned using graph coloring, so that there is a single dispatch table. As the figure suggests, we don't actually need a distinct offset per field, since fields cannot be overridden by subclasses. It is enough to record an offset per class or superclass of the object; all of the fields of each such subobject can be found relative to that offset.

The sequence to access a field is more expensive than the usual indexed load. Before multiple inheritance, an access like o.f could be implemented as a memory operand [to + kf], where kf is a compile-time constant offset for field f, and temporary to holds the address of the object. With the multiple-inheritance object layout, accesses are more complex:
    mov tDV, [to]
    mov toff, [tDV + mf]
    mov t, [to + toff + kf]
Here, the offset mf is the location in the dispatch vector of the offset to a given subobject; the offset kf is the offset within the subobject of the particular field. Since the values tDV and toff are constants, CSE can help avoid fetching them more than once.
This approach has much lower space overhead than using internal pointers, and access to fields from other class views is faster. However, access to fields of the current class view is slower, and there is a performance penalty even when inheritance is not being used.
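The following C sketch mirrors the assembly sequence above; the index M_F and offset K_F are hypothetical compile-time constants:

    #include <stdint.h>

    typedef struct {
        int32_t subobject_offset[4]; /* per-superclass offsets, at
                                        indices assigned by coloring */
        /* method code pointers follow */
    } DispatchVector;

    typedef struct {
        DispatchVector *dv;
        /* subobjects with their fields follow */
    } Object;

    enum { M_F = 1 };  /* dispatch-vector index of f's subobject offset */
    enum { K_F = 8 };  /* f's fixed offset within that subobject */

    /* o.f under the multiple-inheritance layout: one extra dependent
       load to find the subobject, then the usual indexed load. */
    int32_t load_f(Object *o) {
        int32_t off = o->dv->subobject_offset[M_F]; /* mov toff,[tDV+mf] */
        return *(int32_t *)((char *)o + off + K_F); /* mov t,[to+toff+kf] */
    }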
Avoiding dispatch
We've talked about layouts and algorithms to speed up method dispatch. But the fastest way to do method dispatch is not to do it at all. If we can determine that there is only one possible implementation of the method being invoked, the generated code can simply jump directly to the method code. Or the method code can be inlined at the call site, possibly enabling other optimizations. Similarly, if we know that a given field can only be at a particular fixed offset, more efficient access code can be generated.
Given a call o.m() where o has type C, if no subtype of C overrides the method, then C's implementation of m() is the only possible code that dispatch could reach, and the call can be compiled as a direct jump. This optimization does require knowledge of the whole class hierarchy, so it is not compatible with separate compilation. It is also not compatible with dynamic linking, which might cause new implementations of m() to be loaded at run time.
A more sophisticated way to avoid dispatch is to acquire more precise information about the class of an object than is present in the declared type. A variable has exact type C if it is guaranteed to hold objects whose class is exactly C, and not some proper subclass of C. For example, consider this code:

    x: C = new C()
    x.m()

Because of the constructor call, we know x has exact type C, sometimes written as x: C!.
An exact type analysis finds exact types for expressions in the program by propagating information from new expressions to possible uses. This can be done by building directly on an inclusion-based pointer analysis, since each new allocation is a distinct “object” in pointer analysis. If the objects that a given pointer can point to all have the same class, the exact type of the pointer is known. Even if not, we may be able to determine that there is only one implementation for a given method. The analysis probably needs to be interprocedural to be effective.
Specialization
Inheritance is usually implemented by having the code for an inherited method shared across all classes that inherit it. Sometimes it is better, however, to specialize the method code for particular inheriting classes. For example, consider two classes A and B:
    class A {
        f() { ... g() ... }
        g() { A.g code }
    }
    class B extends A {
        g() { B.g code }
    }
Ordinarily we'd share the code for A.f with B. However, consider what happens if we instead specialize the method f to both A and B. Assuming there are no other implementations of g() in other subclasses, the version of A.f specialized to A knows that the exact type of this is A!. Therefore the call to g() must go to the “A.g code”. Similarly, the version specialized to B can call the “B.g code” directly. The code for g can even be inlined inside f, possibly enabling further optimizations.
This optimization is a space–time tradeoff. If f is called infrequently, we don't want to waste memory and cache space on storing multiple versions of it. If f is frequently used and its code is not large, then it makes sense to specialize it. It is a good idea to couple this optimization to some method for determining which methods are “hot”—either a program analysis or, even better, run-time profiling.
Multimethods
In most object-oriented languages, method code is chosen according to the class of the receiver object. But the receiver is just one argument to the method; why not choose method code based on the other arguments as well? This is the idea behind multimethods, also known as generic functions. Multimethods are a feature of Common Lisp (CLOS), MultiJava, Dylan, Cecil, and other languages. CLOS, in particular, is quite widely used in industry.
Multimethods are helpful for so-called “binary” methods, in which there is an explicit argument with the same type as the class. For example, suppose we want to implement a class Shape with a method intersect(s: Shape): bool, where Shape has various subclasses: Box, Circle, Triangle, and so on. With multimethods, we can think of this method as a generic function of two arguments: intersect(Shape receiver, Shape s): bool.
We can imagine wanting to implement different algorithms for different combinations of shapes. For example, when intersecting two boxes, we can use this test:
intersect(b1: Box, b2: Box) : bool {
return b1.x0 <= b2.x1 && b2.x0 <= b1.x1
&& b1.y0 <= b2.y1 && b2.y0 <= b1.y1
}
Another place where multimethods are handy is, in fact, for compilers. Recall that visitors are an answer to the problem of how to write the code for a compiler pass in a modular way. With multimethods, we can define a function visit(Node, Pass) that specifies the boilerplate traversal behavior in the base implementation. We then override it in a modular way for particular (Node, Pass) pairs where there is something interesting going on. In fact, MultiJava has been used to build compilers in this way.
A related idea to multimethods is predicate dispatch, in which methods are chosen based on arbitrary properties of objects, rather than just their run-time class. The two ideas can be naturally combined to select on arbitrary properties of multiple arguments. For example, we might override intersect to give code that just works on squares:
    intersect(b1: Box, b2: Box) : bool
        where b1.width == b1.height && b2.width == b2.height { ... }
If the method is called on two boxes that don't satisfy the where clause, the ordinary intersect(Box, Box) implementation is called instead.
Predicate dispatch allows dispatching to acquire the power of pattern matching, though arguably in a more modular way, since the code for handling the different pattern cases can be implemented separately.
One problem with multimethods is implementing them efficiently. If there are several dispatched arguments, a dispatch table indexed by the run-time classes of all of them grows exponentially in the number of arguments, as noted earlier.
The usual approach to implementing multimethods, though, is to implement the generic function as a decision tree (or DAG). Building the decision tree requires knowing about all the possible implementations. A decision tree also enables testing on conditions other than the run-time class, so it works for predicate dispatch too. For reasonable programs, the space and time overhead of a decision tree is reasonable, no worse than implementing the dispatch in other ways.
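As an illustration, here is a C sketch of the intersect generic function compiled to a decision tree over the run-time classes of both arguments; the tag representation and helper names are invented for the example:

    #include <stdbool.h>

    typedef enum { BOX, CIRCLE, TRIANGLE } ShapeTag;
    typedef struct { ShapeTag tag; /* shape fields follow */ } Shape;

    /* Specialized implementations and a general fallback. */
    bool intersect_box_box(Shape *a, Shape *b);
    bool intersect_box_circle(Shape *a, Shape *b);
    bool intersect_general(Shape *a, Shape *b);

    /* The generic function intersect(Shape, Shape) as a decision tree
       over both arguments' tags. A predicate such as the "square"
       where-clause would appear as one more test in the tree. */
    bool intersect(Shape *a, Shape *b) {
        if (a->tag == BOX) {
            if (b->tag == BOX)
                return intersect_box_box(a, b);
            if (b->tag == CIRCLE)
                return intersect_box_circle(a, b);
        }
        return intersect_general(a, b);
    }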