Previous Up Next

4  Tagged Unions and Datatypes

In addition to struct, enum, and union, Cyclone provides @tagged union and datatype declarations as ways to construct new aggregate types. Like a union type, each @tagged union and datatype has a number of variants (or members). Unlike conventional unions, an object of a @tagged union or datatype type is exactly one variant, we can detect (or discriminate) that variant at run-time, and the language prevents using an object as though it had a different variant.

The difference between @tagged unions and datatypes is that the former look and behave much like traditional unions, whereas the latter look and behave more like the algebraic datatypes found in functional languages such as ML. Furthermore, datatypes can be either closed or @extensible. A closed datatype's members are specified all together when the datatype is declared, whereas an @extensible datatype supports adding new members after the fact (much like adding a new sub-class to a class-based OO language.)

In this section, we first discuss @tagged unions, then closed datatypes, and finally @extensible datatypes.

4.1  Tagged Unions

A tagged union declaration looks just like a C union, except that it you must specify the @tagged qualifier when declaring it. For example:
   @tagged union Foo {
      int i;
      double d;
      char *@fat s;
   };
The primary difference with C unions is that a tagged union includes a hidden tag. The tag indicates which member was last written. So, for example:
  union Foo x;
  x.i = 3;
  x.s = "hello";
causes the hidden tag to first indicate that the i member was written, and then is updated to record that the s member was written.

When you attempt to read a member of a tagged union, a check is done on the hidden tag to ensure that this was the last member written, and thus the union contains a valid object of that member's type. If some other member was last updated, then a Match_Exception will be thrown.

You can test the hidden tag of a tagged union by using the tagcheck operation. For example:
  void printFoo(union Foo x) {
    if (tagcheck(x.i))
      printf("%d",x.i);
    else if (tagcheck(x.d))
      printf("%g",x.d);
    else if (tagcheck(x.s))
      printf("%s",x.s);
  }
Alternatively, you can use pattern matching (described in the next section) which will ensure that you cover all of the cases properly. For instance, the function above may be rewritten as:
  void printFoo(union Foo x) {
    switch (x) {
    case {.i = i}: printf("%d",i); break;
    case {.d = d}: printf("%g",d); break;
    case {.s = s}: printf("%s",s); break;
    }
  }
If we failed to leave out one of the cases in the pattern match, then the compiler would warn us. This is particularly helpful when you add new variants to a tagged union, for then the compiler pinpoints the spots that you need to update in your code. Therefore, we encourage the use of pattern matching where possible.

4.2  Datatypes



At its simplest, a datatype looks just like an enum declaration. For example, we could say:
  datatype Color { Red, Green, Blue };
As with enum, the declaration creates a type (called datatype Color) and three constants Red, Green, and Blue. Unlike enum, these constants do not have type datatype Color. Instead, each variant has its own type, namely datatype Color.Red, datatype Color.Green, and datatype Color.Blue. However, a pointer to one of these values can be treated as a sub-type of a pointer to a datatype Color. So you can write:
  datatype Color.Red red = Red;
  datatype Color *c = &red;
In this simple example, we are splitting hairs, but we will soon find all these distinctions useful.

Unlike enum, datatype variants may carry any fixed number of values, as in this example:
  datatype Shape {
    Point,
    Circle(float),
    Ellipse(float,float),
    Polygon(int,float),
  };
A Point has no accompanying information, a Circle has a radius, an Ellipse has two axis lengths, and a (regular) Polygon has a number of sides and a radius. (The value fields do not have names, so it is often better style to have a variant carry one value of a struct type, which of course has named members.) This example creates five types: datatype Shape, datatype Shape.Point, datatype Shape.Circle, datatype Shape.Ellipse, and datatype Shape.Polygon. Like in our previous example, datatype Shape.Point* is a subtype of datatype Shape* and Point is a constant of type datatype Shape.Point.

Variants that carry one or more values are treated differently. Circle becomes a constructor; given a float it produces an object of type datatype Shape.Circle, for example Circle(3.0). Similarly, Ellipse(0,0) has type datatype Shape.Ellipse (thanks to implicit casts from int to float for 0) and Polygon(7,4.0) has type datatype Shape.Polygon. The arguments to a constructor can be arbitrary expressions of the correct type, for example, Ellipse(rand(), sqrt(rand())).

Here are some examples which allocate a Point and Circle respectively, but then use subtyping to treat the resulting values as if they are Shape pointers:
  datatype Shape *s1 = new Point;
  datatype Shape *s2 = new Circle(3.0);
Datatypes are particularly useful for building recursive structures. For example, a small language of arithmetic expressions might look like this:
  enum Unops { Negate, Invert};
  enum Binops { Add, Subtract, Multiply, Divide };
  typedef datatype Exp *@notnull exp_t;
  datatype Exp {
    Int(int),
    Float(float),
    Unop(enum Unops, exp_t),
    Binop(enum Binops, exp_t, exp_t)
  };
A function returning an expression representing the multiplication of its parameter by two can be written like this:
  exp_t double_exp(datatype Exp e) {
    return new Binop(Multiply, e, new Int(2));
  }
Accessing Datatype Variants
Given a value of a datatype type, such as datatype Shape, we do not know which variant it is. It could be a Circle or Shape, etc. In Cyclone, we use pattern matching to determine which variant a given datatype value actually is, and to extract the arguments that were used to build the datatype value. For example, here is how you could define isCircle:
  bool isCircle(datatype Shape *s) {
    switch(s) {
    case &Circle(r): return true;
    default: return false;
    }
  }
When a switch statement's argument is a pointer to a datatype, the cases describe variants. One variant of datatype Shape * is a pointer to a Circle, which carries one value. The corresponding pattern has & for the pointer, Circle for the constructor name, and one identifier for each value carried by Circle. The identifiers are binding occurrences (declarations, if you will), and the initial values are the values of the fields of the Circle at which s points. The scope is the extent of the case clause.

Here is another example:

[The reader is asked to indulge compiler-writers who have forgotten basic geometry.]
  extern area_of_ellipse(float,float);
  extern area_of_poly(int,float);
  float area(datatype Shape *s) {
    float ans;
    switch(s) {
    case &Point:
      ans = 0;
      break;
    case &Circle(r):
      ans = 3.14*r*r;
      break;
    case &Ellipse(r1,r2):
      ans = area_of_ellipse(r1,r2);
      break;
    case &Polygon(sides,r):
      ans = area_of_poly(sides,r);
      break;
    }
    return ans;
  }
The cases are compared in order against s. The following are compile-time errors: As you can discover in Section 575Pattern Matchingsection.5, Cyclone has much richer pattern-matching support than we have used here.

Polymorphism and Datatypes
A datatype declaration may be polymorphic over types and regions just like a struct or union definition (see the section on polymorphism). For example, here is a declaration for binary trees where the leaves can hold some BoxKind `a:
  datatype Tree<`a>  {
    Leaf(`a);
    Node(datatype Tree<`a>*, datatype Tree<`a>*);
  };
In the above example, the root may be in any region, but all children will be in the heap. The following version allows the children to be in any region, but they must all be in the same region. (The root can still be in a different region.)
  datatype Tree<`a,`r>  {
    Leaf(`a);
    Node(datatype Tree<`a,`r> *`r, 
         datatype Tree<`a,`r> *`r);
  };
Future

4.3  Extensible Datatypes



We now explain how an @extensible datatype type differs from a datatype. The main difference is that later declarations may continue to add variants. Extensible datatypes are useful for allowing clients to extend data structures in unforeseen ways. For example:
  @extensible datatype Food;
  datatype Food { Banana; Grape; 
                  Pizza(list_t<datatype Food*>) };
  datatype Food { Candy; Broccoli };
After these declarations, Pizza(new List(new Broccoli, NULL)) is a well-typed expression.

If multiple declarations include the same variants, the variants must have the same declaration (the number of values, types for the values, and the same existential type variables).

Because different files may add different variants and Cyclone compiles files separately, no code can know (for sure) all the variants of an @extensible datatype. Hence all pattern-matches against a value of an @extensible datatype type must end with a case that matches everything, typically default.

There is one built-in @extensible datatype type: @extensible datatype exn is the type of exceptions. Therefore, you declare new exn constructors like this:
  datatype exn {BadFilename(string)};
The implementation of @extensible datatype types is very similar to that of datatype types, but variant tags cannot be represented as small integers because of separate compilation. Instead, these tags are represented as pointers to unique locations in static data.


Previous Up Next