Previous Up Next

4  Tagged Unions

In addition to struct, enum, and union, Cyclone has tunion (for ``tagged union'') and xtunion (for ``extensible tagged union'') as ways to construct new aggregate types. Like a union type, each tunion and xtunion has a number of variants (or members). Unlike with union, an object of a tunion or xtunion type is exactly one variant, we can detect (or discriminate) that variant at run-time, and the language prevents using an object as though it had a different variant.

The difference between tunion and xtunion is that tunion is closed---a definition lists all possible variants. It is like the algebraic datatypes in ML. With xtunion, separately compiled files can add variants, so no code can be sure that it knows all the variants. There is a rough analogy with not knowing all the subclasses of a class in an object-oriented language.

For sake of specificity, we first explain how to create and use tunion types. We then explain xtunion by way of contrast with tunion. Because the only way to read parts of tunion and xtunion types is pattern-matching, it is hard to understand tunion without pattern-matching, but for sake of motivation and completeness, some of the examples in the explanation of pattern-matching use tunion! To resolve this circular dependency, we will informally explain pattern-matching as we use it here and we stick to its simplest uses.

4.1  tunion



Basic Type Declarations and Subtyping
[Warning: For expository purposes, this section contains a white lie that is exposed in the later section called ``regions for tunion''.]

A tunion type declaration lists all of its variants. At its simplest, it looks just like an enum declaration. For example, we could say:
  tunion Color { Red, Green, Blue };
As with enum, the declaration creates a type (called tunion Color) and three constants Red, Green, and Blue. Unlike enum, these constants do not have type tunion Color. Instead, each variant has its own type, namely tunion Color.Red, tunion Color.Green, and tunion Color.Blue. Fortunately these are all subtypes of tunion Color and no explicit cast is necessary. So you can write, as expected:
  tunion Color c = Red;
In this simple example, we are splitting hairs, but we will soon find all these distinctions useful. Unlike enum, tunion variants may carry any fixed number of values, as in this example:
  tunion Shape {
    Point,
    Circle(float),
    Ellipse(float,float),
    Polygon(int,float),
  };
A Point has no accompanying information, a Circle has a radius, an Ellipse has two axis lengths, and a (regular) Polygon has a number of sides and a radius. (The value fields do not have names, so it is often better style to have a variant carry one value of a struct type, which of course has named members.) This example creates five types: tunion Shape, tunion Shape.Point, tunion Shape.Circle, tunion Shape.Ellipse, and tunion Shape.Polygon. Like in our previous example, tunion Shape.Point is a subtype of tunion Shape and Point is a constant of type tunion Shape.Point.

Variants that carry one or more values are treated differently. Circle becomes a constructor; given a float it produces an object of type tunion Shape.Circle, for example Circle(3.0). Similarly, Ellipse(0,0) has type tunion Shape.Ellipse (thanks to implicit casts from int to float for 0) and Polygon(7,4.0) has type tunion Shape.Polygon. The arguments to a constructor can be arbitrary expressions of the correct type, for example, Ellipse(rand(), sqrt(rand())).

The second difference is that value-carrying variant types (e.g., tunion Shape.Circle) are not subtypes of the tunion type (e.g., tunion Shape). Rather non-null pointers to the value-carrying variant types are (e.g., tunion Shape.Circle @`H is a subtype of tunion Shape). So the following are correct initializations that use implicit subtyping:
  tunion Shape s1 = Point;
  tunion Shape s2 = new Circle(3.0);
tunion types are particularly useful for building recursive structures. For example, a small language of arithmetic expressions might look like this:
  enum Unops { Negate, Invert};
  enum Binops { Add, Subtract, Multiply, Divide };
  tunion Exp {
    Int(int),
    Float(float),
    Unop(enum Unops, tunion Exp),
    Binop(enum Binops, tunion Exp, tunion Exp)
  };
A function returning an expression representing the multiplication of its parameter by two could like this:
  tunion Exp double_exp(tunion Exp e) {
    return new Binop(Multiply, e, new Int(2));
  }
Accessing tunion Variants
Given a value of a tunion type, such as tunion Shape, we do not know which variant it is.

For non-value variants, we can use a standard comparison. Continuing the example from above, ``s1 == Point'' would be true whereas ``s2 == Point'' would be false.

Analogous comparisons would not work for value-carrying variants because these variants are pointers. Rather than provide predicates (perhaps of the form isCircle(s1)), Cyclone requires pattern-matching. For example, here is how you could define isCircle:
  bool isCircle(tunion Shape s) {
    switch(s) {
    case &Circle(r): return true;
    default: return false;
    }
  }
When a switch statement's argument has a tunion type, the cases describe variants. One variant of tunion Shape is a pointer to a Circle, which carries one value. The corresponding pattern has & for the pointer, Circle for the constructor name, and one identifier for each value carried by Circle. The identifiers are binding occurrences (declarations, if you will), and the initial values are the values of the fields of the Circle at which s points. The scope is the extent of the case clause. Pattern-matching works for non-value variants too, but there is no & because they are not pointers.

Here is another example:

[The reader is asked to indulge compiler-writers who have forgotten basic geometry.]
  extern area_of_ellipse(float,float);
  extern area_of_poly(int,float);
  float area(tunion Shape s) {
    float ans;
    switch(s) {
    case Point:
      ans = 0;
      break;
    case &Circle(r):
      ans = 3.14*r*r;
      break;
    case &Ellipse(r1,r2):
      ans = area_of_ellipse(r1,r2);
      break;
    case &Polygon(sides,r):
      ans = area_of_poly(sides,r);
      break;
    }
    return ans;
  }
The cases are compared in order against s. The following are compile-time errors: We emphasize that Cyclone has much richer pattern-matching support than we have used here.

Implementation
Non-value variants are translated to distinct small integers. Because they are small, they cannot be confused with pointers to value-carrying variants. Value-carrying variants have a distinct integer tag field followed by fields for the values carried. Hence all values of a tunion type occupy one word, either with a small number or with a pointer.

Regions for tunion
We have seen that non-null pointers to value-carrying variants are subtypes of the tunion type. For example, tunion Shape.Circle @`H is a subtype of tunion Shape. Because tunion Shape.Circle @`H is a pointer into the heap, it would seem that all values of type tunion Shape are either non-value variants or pointers into the heap, but this is true only when tunion Shape is itself shorthand for tunion `H Shape.

In other words, tunion types are region-polymorphic over the region into which the value-carrying variants point. An explicit region annotation goes after tunion, just like an explicit region annotation goes after * or @. Here is an example using a stack region:
  tunion Shape.Circle c = Circle(3.0);
  tunion _ Shape s = &c;
We use _ because we did not give an explicit name to the stack region. In fact, we can leave out the _ because region annotations for tunion types are filled in using the same rules as for pointer types.

We can now correct the white lie from the ``basic type declarations and subtyping'' section. A declaration tunion Foo {...} creates a type constructor which given a region creates a type. For any region `r, tunion `r Foo is a subtype of tunion Foo.Bar @`r if tunion Foo.Bar carries values. If tunion Foo.Bar does not carry values, then it is a subtype of tunion `r Foo for all `r.

Polymorphism and tunion
A tunion declaration may be polymorphic over types and regions just like a struct definition (see the section on polymorphism). For example, here is a declaration for binary trees where the leaves can hold some BoxKind `a:
  tunion Tree<`a>  {
    Leaf(`a);
    Node(tunion Tree<`a>, tunion Tree<`a>);
  };
In the above example, the root may be in any region, but all children will be in the heap. This version allows the children to be in any region, but they must all be in the same region. (The root can still be in a different region.)
  tunion Tree<`a,`r::R>  {
    Leaf(`a);
    Node(tunion `r Tree<`a,`r>, tunion `r Tree<`a,`r>);
  };
Future

4.2  xtunion



We now explain how an xtunion type differs from a tunion type. The main difference is that later declarations may continue to add variants. Extensible datatypes are useful for allowing clients to extend data structures in unforeseen ways. For example:
  xtunion Food;
  xtunion Food { Banana; Grape; Pizza(list_t<xtunion Food>) };
  xtunion Food { Candy; Broccoli };
After these declarations, Pizza(new List(Broccoli, null)) is a well-typed expression.

If multiple declarations include the same variants, the variants must have the same declaration (the number of values, types for the values, and the same existential type variables).

Because different files may add different variants and Cyclone compiles files separately, no code can know (for sure) all the variants of an xtunion. Hence all pattern-matches against a value of an xtunion type must end with a case that matches everything, typically default.

There is one built-in xtunion type: xtunion exn is the type of exceptions. Therefore, you declare new xtunion exn types like this:
  xtunion exn {BadFilename(string)};
The implementation of xtunion types is very similar to that of tunion types, but non-value variants cannot be represented as small integers because of separate compilation. Instead, these variants are represented as pointers to unique locations in static data. Creating a non-value variant still does not cause allocation.


Previous Up Next