Previous Up Next

5  Pattern Matching

Pattern matching provides a concise, convenient way to bind parts of large objects to new local variables. Two Cyclone constructs use pattern matching, let declarations and switch statements. Although the latter are more common, we first explain patterns with let declarations because they have fewer complications. Then we describe all the pattern forms. Then we describe switch statements.

You must use patterns to access values carried by tagged unions, including exceptions. In other situations, patterns make code more readable and less verbose.

Note that this section does not include rules for matching against unique pointers; this is explained in Section 8.4.3113Pattern Matching on Unique Pointerssubsubsection.8.4.3.

5.1  Let Declarations



In Cyclone, you can write
  let x = e;
as a local declaration. The meaning is the same as t x = e; where t is the type of e. In other words, x is bound to the new variable. Patterns are much more powerful because they can bind several variables to different parts of an aggregate object. Here is an example:
  struct Pair {  int x; int y; };
  void f(struct Pair pr) {
    let Pair(fst,snd) = pr;
    ...
  }
The pattern has the same structure as a struct Pair with parts being variables. Hence the pattern is a match for pr and the variables are initialized with the appropriate parts of pr. Hence ``let Pair(fst,snd) = pr'' is equivalent to ``int fst =pr.x; int snd = pr.y''. A let-declaration's initializer is evaluated only once.

Patterns may be as structured as the expressions against which they match. For example, given type
  struct Quad { struct Pair p1; struct Pair p2; };
patterns for matching against an expression of type struct Quad could be any of the following (and many more because of constants and wildcards---see below): In general, a let-declaration has the form ``let p = e;'' where p is a pattern and e is an expression. In our example, the match always succeeds, but in general patterns can have compile-time errors or run-time errors.

At compile-time, the type-checker ensures that the pattern makes sense for the expression. For example, it rejects ``let Pair(fst,snd) = 0'' because 0 has type int but the pattern only makes sense for type struct Pair.

Certain patterns are type-correct, but they may not match run-time values. For example, constants can appear in patterns, so ``let Pair(17,snd) = pr;'' would match only when pr.x is 17. Otherwise the exception Match_Exception is thrown. Patterns that may fail are rarely useful and poor style in let-declarations; the compiler emits a warning when you use them. In switch statements, possibly-failing patterns are the norm---as we explain below, the whole point is that one of the cases' patterns should match.

5.2  Pattern Forms



So far, we have seen three pattern forms: variables patterns, struct patterns, and constant patterns. We now describe all the pattern forms.1 For each form, you need to know: There is one compile-time rule that is the same for all forms: All variables (and type variables) in a pattern must be distinct. For example, ``let Pair(fst,fst) = pr;'' is not allowed.

You may want to read the descriptions for variable and struct patterns first because we have already explained their use informally.

5.3  Switch Statements



In Cyclone, you can switch on a value of any type and the case ``labels'' (the part between case and the colon) are patterns. The switch expression is evaluated and then matched against each pattern in turn. The first matching case statement is executed. Except for some restrictions, Cyclone's switch statement is therefore a powerful extension of C's switch statement.

Restrictions
An Extension of C
Except for the above restrictions, we can see Cyclone's switch is an extension of C's switch. For example, consider this code (which has the same meaning in C and Cyclone):
  int f(int i) {
    switch(i) {
    case 0:  f(34); return 17;
    case 1:  return 17;
    default: return i;
    }
  }
In Cyclone terms, the code tries to match against the constant 0. If it does not match (i is not 0), it tries to match against the pattern 1. Everything matches against default; in fact, default is just alternate notation for ``case _'', i.e., a case with a wildcard pattern. For performance reasons, switch statements that are legal C switch statements are translated to C switch statements. Other switch statements are translated to, ``a mess of tests and gotos''.

We now discuss some of the restrictions in terms of the above example. Because there is no ``implicit fallthrough'' in non-empty cases, the return statement in case 0 cannot be omitted. However, we can replace the ``return 17;'' with ``fallthru;'' a special Cyclone statement that immediately transfers control to the next case. fallthru does not have to appear at the end of a case body, so it acts more like a goto than a fallthrough. As in our example, any case that matches all values of the type switched upon (e.g., default:, case _:, case x:) must appear last, otherwise later cases would be unreachable. (Note that other types may have even more such patterns. For example Pair(x,y) matches all values of type struct Pair int x; int y;).

Much More Powerful
Because Cyclone case labels are patterns, a switch statement can match against any expression and bind parts of the expression to variables. Also, fallthru can (in fact, must) bind values to the next case's pattern variables. This silly example demonstrates all of these features:
  extern int f(int);}
  int g(int x, int y) {
    // return f(x)*f(y), but try to avoid using multiplication
    switch($(f(x),f(y))) {
    case $(0,_): fallthru;
    case $(_,0): return 0;
    case $(1,b): fallthru(b+1-1);
    case $(a,1): return a;
    case $(a,b): return a*b;
    }
  }
The only part of this example using a still-unexplained feature is ``fallthru(b)'', but we explain the full example anyway. The switch expression has type $(int,int), so all of the cases must have patterns that match this type. Legal case forms for this type not used in the example include ``case $(_,id):'', ``case $(id,_):'', ``case id:'', ``case _:'', and ``default:''.

The code does the following: Note that the switch expression is evaluated only once. Implementation-wise, the result is stored in a compiler-generated local variable and the value of this variable is used for the ensuring pattern matches.

The general form of fallthrus is as follows: If the next case has no bindings (i.e., identifiers in its pattern), then you must write fallthru;. If the next case has n bindings, then you must write fallthru(e1,...,en) where each ei is an expression with the appropriate type for the ith binding in the next case's pattern, reading from left to right. (By appropriate type, we mean the type of the expression that would be bound to the ith binding were the next case to match.) The effect is to evaluate e1 through en, bind them to the identifiers, and then goto the body of the next case. fallthru is not allowed in the last case of a switch, not even if there is an enclosing switch.

We repeat that fallthru may appear anywhere in a case body, but it is usually used at the end, where its name makes the most sense. ML programmers may notice that fallthru with bindings is strictly more expressive than or-patterns, but more verbose.

Case Guards
We have withheld the full form of Cyclone case labels. In addition to case p: where p is a pattern, you may write case p && e: where p is a pattern and e is an expression of type int. (And since e1 && e2 is an expression, you can write case p && e1 && e2: and so on.) Let's call e the case's guard.

The case matches if p matches the expression in the switch and e evaluates to a non-zero value. e is evaluated only if p matches and only after the bindings caused by the match have been properly initialized. Here is a silly example:
extern int f(int);
int g(int a, int b) {
  switch ($(a,b-1)) {
  case $(0,y) && y > 1: return 1;
  case $(3,y) && f(x+y) == 7 : return 2;
  case $(4,72): return 3;
  default: return 3;
  }
}
The function g returns 1 if a is 0 and b is greater than 2. Else if x is 3, it calls the function f (which of course may do arbitrary things) with the sum of a and b. If the result is 7, then 2 is returned. In all other cases (x is not 3 or the call to f does not return 7), 3 is returned.

Case guards make constant patterns unnecessary (we can replace case 3: with case x && x==3:, for example), but constant patterns are better style and easier to use.

Case guards are not interpreted by the compiler when doing exhaustiveness and overlap checks, as explained below.

Exhaustiveness and Useless-Case Checking
As mentioned before, it is a compile-time error for the type of the switch expression to have values that none of the case patterns match or for a pattern not to match any values that earlier patterns do not already match. Rather than explain the precise rules, we currently rely on your intuition. But there are two rules to guide your intuition: We emphasize that checking does not just involve the ``top-level'' of patterns. For example, the compiler rejects the switch below because the third case is redundant:
  enum Color { Red, Green };
  void f(enum Color c1, enum Color c2) {
    switch ($(c1,c2)) {
    case $(Red,x): return;
    case $(x,Green): return;
    case $(Red,Green): return;
    default: return;
    }
  }
Rules for No Implicit Fall-Through
As mentioned several times now, Cyclone differs from C in that a case body may not implicitly fall-through to the next case. It is a compile-time error if such a fall-through might occur. Because the compiler cannot determine exactly if an implicit fall-through could occur, it uses a precise set of rules, which we only sketch here. The exact same rules are used to ensure that a function (with return type other than void) does not ``fall off the bottom.'' The rules are very similar to the rules for ensuring that Java methods do not ``fall off the bottom.''

The general intuition is that there must be a break, continue, goto, return, or throw along all control-flow paths. The value of expressions is not considered except for numeric constants and logical combinations (using &&, ||, and ? :) of such constants. The statement try s catch ... is checked as though an exception might be thrown at any point while s executes.


Previous Up Next