Previous Up Next

2  Cyclone for C Programmers

We begin with a quick overview of Cyclone, suitable for those who already know how to program in C. We'll explain some of the ways that Cyclone differs from C and some of the reasons why; you should come away with enough knowledge to start writing, compiling, and running your own Cyclone programs. We assume that the Cyclone compiler is already installed on your system (see Appendix E293Installing Cyclonesection.E if you need to install the compiler).

2.1  Getting Started

Here's a Cyclone program that prints the string ``hello, world.''
    #include <stdio.h>

    int main() {
      printf("hello, world\n");
      return 0;
    }
It looks rather like a C program---in fact, a C compiler will happily compile it. The program uses #include to tell the preprocessor to import some standard definitions, it defines a distinguished function main that serves as the entry point of the program, and it uses the familiar printf function to handle the printing; all of this is just as in C.

To compile the program, put it into a file hello.cyc, and run the command
    cyclone -o hello hello.cyc 
This tells the Cyclone compiler (cyclone) to compile the file hello.cyc; the -o flag tells the compiler to leave the executable output in the file hello (or, in Windows, hello.exe). If all goes well you can execute the program by typing
    hello
and it will print
    hello, world
It's interesting to compare our program with a version that omits the return statement:
    #include <stdio.h>

    int main() {
      printf("hello, world\n");
    }
A C compiler will compile and run this version without warning. In contrast, Cyclone will warn that you have failed to return an int. Cyclone only warns you when you fail to return an integral type (char, short, int, etc.) but it gives an error if you fail to return other types (e.g., pointer types). This requirement of definite return ensures type safety while imposing minimal constraints on a programmer porting C code to Cyclone.

Definite return reflects Cyclone's concern with safety. The caller of the function expects to receive a value of the return type; if the function does not execute a return statement, the caller will receive some incorrect value instead. If the returned value is supposed to be a pointer, the caller might try to dereference it, and dereferencing an arbitrary address can cause the program to crash. So, Cyclone requires a return statement with a value of the return type whenever type safety can be compromised.

2.2  Pointers

Programs that use pointers properly in C can be both fast and elegant. But when pointers are used improperly in C, they cause core dumps and buffer overflows. To prevent this, Cyclone introduces different kinds of pointers and either (a) puts some restrictions on how you can use pointers of a given kind or (b) places no restrictions but may insert additional run-time checks.

Nullable Pointers

The first kind of pointer is indicated with a *, as in C. For example, if we declare
    int x = 3;
    int *y = &x;
then y is a pointer to the integer 3 (the contents of x). The pointer, y, is represented by a memory address, namely, the address of x. To refer to the contents of y, you use *y, so, for example, you can increment the value of x with an assignment like
    *y = *y + 1;
This much is just as in C. However, there are some differences in Cyclone: These are drastic differences from C, particularly the restriction on pointer arithmetic. The benefit is that you can't cause a crash using * pointers in Cyclone.

Fat Pointers

If you need to do pointer arithmetic in Cyclone, you can use a second kind of pointer, called a fat pointer and indicated by writing the qualifier @fat after the *. For example, here is a program that echoes its command-line arguments:
    #include <stdio.h>

    int main(int argc, char *@fat *@fat argv) {
      argc--; argv++; /* skip command name */
      if (argc > 0) {
        /* print first arg without a preceding space */
        printf("%s",*argv);
        argc--; argv++;
      }
      while (argc > 0) {
        /* print other args with a preceding space */
        printf(" %s",*argv);
        argc--; argv++;
      }
      printf("\n");
      return 0;
    }
Except for the declaration of argv, which holds the command-line arguments, the program looks just like you would write it in C: pointer arithmetic (argv++) is used to move argv to point to each argument in turn, so it can be printed.

In C, argv would typically be declared with type char **, a pointer to a pointer to a character, which is thought of as an array of an array of characters. In Cyclone, argv is instead declared with type char *@fat*@fat, which is thought of in the same way: it is a (fat) pointer to a (fat) pointer to characters. The difference between an unqualified pointer and a @fat pointer is that a @fat pointer comes with bounds information and is thus ``fatter'' than a traditional pointer. Each time a fat pointer is dereferenced or its contents are assigned to, Cyclone inserts both a null check and a bounds check. This guarantees that a @fat pointer can never cause a buffer overflow.

Because of the bounds information contained in @fat pointers, argc is superfluous: you can get the size of argv by writing numelts(argv). We've kept argc as an argument of main for backwards compatibility.

It's worth remarking that you can always cast a * pointer to a @fat pointer (and vice-versa). So, it is possible to do pointer arithmetic on a value of type *, but only when you insert the appropriate casts to convert from one pointer type to another. Note that some of these casts can fail at run-time. For instance, if you try to cast a fat pointer that points to an empty sequence of characters to char *, then the cast will fail since the sequence doesn't contain at least one character.

Finally, @fat pointers are used so frequently in Cyclone, that there is special character, ? (question mark) that you can use as an abbreviation for *@fat. For instance, we could write the prototype for main as:
  int main(int argc, char ?? argv);
instead of the more verbose:
  int main(int argc, char *@fat *@fat argv);

Non-NULL Pointers

Another kind of pointer in Cyclone is the non-NULL pointer. A non-NULL pointer is indicated by the qualifier @notnull. A @notnull pointer is like an unqualified pointer, except that it is guaranteed not to be NULL. This means that when you dereference a @notnull pointer or assign to its contents, a null check is sometimes unnecessary.

@notnull pointers are useful in Cyclone both for efficiency and as documentation. This can be seen at work in the standard library, where many functions take @notnull pointers as arguments, or return @notnull pointers as results. For example, the getc function that reads a character from a file is declared,
    int getc(FILE *@notnull);
This says that getc expects to be called with a non-NULL pointer to a FILE. Cyclone guarantees that, in fact, when the getc function is entered, its argument is not NULL. This means that getc does not have to test whether it is NULL, or decide what to do if it is in fact NULL.

In C, the argument of getc is declared to have type FILE *, and programmers can call getc with NULL. So for safety, C's getc ought to check for NULL. In practice, many C implementations omit the check; getc(NULL) is an easy way to crash a C program.

In Cyclone, you can still call getc with a possibly-NULL FILE pointer (a FILE *). However, Cyclone insists that you insert a check before the actual call:
    FILE *f = fopen("/etc/passwd","r");
    int c = getc((FILE *@notnull)f);
Here f will be NULL if the file /etc/passwd doesn't exist or can't be read. So, in Cyclone f must be cast to FILE *@notnull before the call to getc. The cast causes a null check. If you try to call getc without the cast, Cyclone will insert one for you automatically, and warn you that it is doing so.

These warnings do not mean that your program is unsafe---after all, Cyclone has inserted the check for you. However, you should pay attention to the warnings because they indicate a place where your program could suddenly halt (if the check fails), and because the inserted checks can slow down your program. It's worth rewriting your code to handle the error case better, or even eliminate the null check. For instance, if we rewrite the code above so that we explicitly test whether or not fopen succeeds in returning a non-NULL file descriptor:
    FILE *f = fopen("/etc/passwd","r");
    if (f == NULL) {
      fprintf(stderr,"cannot open passwd file!");
      exit(-1);
    }
    int c = getc(f);
then Cyclone no longer issues a warning at the call to getc and the resulting code does not have to do a null check.

If you call getc with a FILE *@notnull, of course, no check is required. For example, stdin is a FILE *@notnull in Cyclone, so you can simply call getc(stdin). In Cyclone you will find that many functions return *@notnull pointers, so many of the pointers you deal with will already be *@notnull pointers, and neither the caller nor the called function needs to do null checks---and this is perfectly safe.

Like @fat pointers, @notnull pointers are so useful, Cyclone provides an abbreviation. Instead of writing FILE *@notnull, you can simply write FILE @ when you want to write the type of a non-NULL pointer to a FILE.

Zero-Terminated Pointers

Fat pointers support arbitrary pointer arithmetic and subscripting, but they don't have the same representation as pointers in C. This is because we need extra information to determine the bounds and ensure that a subscript or dereference is in bounds. Unfortunately, this change in representations can make it difficult to interface with legacy C code where the representations might not be easily changed.

Fortunately, Cyclone supports one more pointer type where the representation matches C's and yet supports a limited form of pointer arithmetic and subscripting: the zero-terminated pointer. A zero-terminated pointer is a pointer to a sequence of elements that are guaranteed to be terminated with a zero. C's strings are a good example. In Cyclone, the type of C's strings can be written as char *@zeroterm. The @zeroterm qualifier indicates that the pointer points to a zero-terminated sequence. The qualifier is orthogonal to other qualifiers, such as @fat or @notnull, so you can freely combine them.

Because C strings arise so frequently, the types char *, char *@notnull, and char *@fat are by default qualified with @zeroterm. You can override the @zeroterm qualifier on char pointers by putting in an explicit @nozeroterm qualifier (e.g., char *@nozeroterm). Pointers to other types (e.g., int *) have a default qualifier of @nozeroterm.

If x is a * @zeroterm pointer, you can use pointer arithmetic on it, as in x+i. However, the compiler inserts checks to ensure that (a) i is non-negative and (b) there is no zero between x[0] and x[i-1] inclusive. This ensures that you can't read past the terminating zero. In addition, when writing to a zero-terminated pointer, the compiler inserts checks to ensure that you don't replace the final zero with some other value. This is crucial for ensuring that a buffer overrun cannot occur. As in C, x[i] is equivalent to x+i, so subscripts come with the same checks.

Because of these checks, subscripts and pointer arithmetic on * @zeroterm can be fairly expensive. In particular, if you are not careful, you can turn what appears to be an O(n) algorithm into an O(n-squared) one. You can avoid this overhead by casting the pointer to a @fat zero-terminated pointer. This computes the length of the sequence once and then uses the bounds information associated with the fat pointer to do any bounds checks.

Cyclone's constraints on zero-terminated pointers mean that you have to be careful when porting code from C. For instance, consider the following function:
  void foo(char *s, int offset) {
     unsigned int len = strlen(s);
     for (unsigned int i = 0; offset+i < len; i++)
        s[offset+i] = 'a';
  }
This code can be quite expensive when offset is large because the compiler must check that there is no intervening zero between s[0] and s[offset+i] for each iteration of the loop. You can get rid of this overhead by rewriting the code as follows:
  void foo(char *s, int offset) {
     unsigned int len = strlen(s);
     s += offset;
     for (unsigned int i = 0; offset+i < len; i++, s++)
        *s = 'a';
  }
Now the compiler is only checking that *s is not zero when it does the increment s++. In addition, however, the compiler is checking each time you do *s = 'a' that *s is not zero, because then you could overwrite the zero with an 'a' and potentially step outside the bounds of the buffer.

One way to get rid of all of these checks is to cast s to a non-zero-terminated fat pointer before entering the loop. When you cast a zero-terminated pointer to a non-zero-terminated fat pointer, the compiler calculates the length of the sequence once, decrements it by one, and then builds an appropriate fat pointer with this bounds information. When you write using the fat pointer, bounds checks (not zero checks) keep you from writing any value over the zero. Furthermore, if you write the code in a straightforward fashion using subscripting, the compiler is more likely to eliminate the bounds checks. Here is an example:
  void foo(char *s, int offset) {
    char *@fat @nozeroterm fat_s = (char *@fat @nozeroterm)s;
    unsigned int len; 
    fat_s += offset;
    len = numelts(fat_s);    
    for (unsigned int i = 0; i < len; i++)
      fat_s[i] = 'a';
  }
The Cyclone compiler generates code that works like the following C code:
struct _tagged_arr { 
  char *base;
  char *curr;
  char *last;
};

void Cyc_foo(char *s,int offset){
  struct _tagged_arr fat_s = {s, s, s+strlen(s)};
  unsigned int len;
  fat_s.curr += offset;
  if (fat_s.curr < fat_s.base || fat_s.curr >= fat_s.last) 
    len = 0;
  else 
    len = fat_s.last - fat_s.curr;
  { unsigned int i = 0;
    for(0; i < len; i++)
      fat_s.curr[i] = 'a';
  }
}
Notice that here, the compiler is able to eliminate all bounds checks within the loop and still ensure safety.

Initializing Pointers

Pointers must be initialized before they are used to ensure that unknown bits do not get used as a pointer. This requirement goes for variables that have pointer type, as well for arrays, elements of arrays, and for fields in structures. Conversely, data that does not have pointer type need not be initialized before it is used, since doing so cannot result in a violation of safety. This decision adheres to the philosophy of C, but diverges from that of traditional type-safe languages like Java and ML.

Other features of pointers

There's much more to Cyclone pointers than we've described here.

In particular, a pointer type can also specify that it points to a sequence of a particular (statically known) length using the @numelts qualifier. For instance, we can write:
void foo(int *@numelts(4) arr);
Here, the parameter arr is a pointer to a sequence of four integer values. Both the non-null and nullable pointers support explicit sequence bounds that are tracked statically. Indeed, both pointer kinds always have length information and when you write ``int *'' this is just short-hand for ``int *@numelts(1)''.

We explain pointers in more detail in Section 357Pointerssection.3.

2.3  Regions

Another potential way to crash a program or violate security is to dereference a dangling pointer: a pointer to storage that has been deallocated. These are particularly insidious bugs because the error might not manifest itself immediately. For example, consider the following C code:
struct Point {int x; int y;};

struct Point *newPoint(int x,int y) {
  struct Point result = {x,y};
  return &result;
}

void foo(struct Point *p) {
  p->y = 1234;
  return;
}

void bar() {
  struct Point *p = newPoint(1,2);
  foo(p);
}
The code has an obvious bug: the function newPoint returns a pointer to a locally-defined variable (result), even though the storage for that variable is deallocated upon exit from the function. That storage may be re-used (e.g., by a subsequent procedure call) leading to subtle bugs or security problems. For instance, in the code above, after bar calls newPoint, the storage for the point is reused to store information for the activation record of the call to foo. This includes a copy of the pointer p and the return address of foo. Therefore, it may be that p->y actually points to the return address of foo. The assignment of the integer 1234 to that location could then result in foo ``returning'' to an arbitrary hunk of code in memory. Nevertheless, the C type-checker readily admits the code.

In Cyclone, this code would be rejected by the type-checker to avoid the kind of problems mentioned above. The reason the code is rejected is that the Cyclone compiler tracks object lifetimes and ensures that a pointer to an object can only be dereferenced if that object has not been deallocated.

Cyclone achieves this by assigning each object a symbolic region that corresponds to the lexical block in which the object is declared. Cyclone also tracks, for every pointer, what region it points into. The region pointed to can be written as part of the pointer type, but usually the region can be omitted---the compiler is smart enough to discover the region automatically in most cases.

For example, the variable result in our code above lives within a region that corresponds to the invocation of the function newPoint. We write the name of the region explicitly using a back-quote, as in `newPoint. Because result lives in region `newPoint, the expression &result is a pointer into region `newPoint. The full Cyclone type of &result, with the explicit region, is struct Point * @region(`newPoint).

When control flow exits a block, the storage (i.e., the region) for that block is deallocated. Cyclone keeps track of the set of regions that are allocated and deallocated at every control-flow point and ensures that you only dereference pointers to allocated regions. For example, consider the following fragment of (bad) Cyclone code:
1 int f() {
2    int x = 0;
3    int *@region(`f) y = &x;
4    L:{ int a = 0;
5        y = &a;
6      }
7    return *y;
8 }
In the function f above, the variables x and y live within the region `f because they are declared in the outermost block of the function, and because the default region name for the block of a function is `<function name>. The storage for those variables will live as long as the invocation of the function. Note that since y is a pointer to x, the type of y is int *@region(`f), indicating that y points into region `f.

The variable a does not live in region `f, because it is declared in an inner block, which we have labeled with L. The storage for the inner block L may be deallocated upon exit of the block, before the function itself returns. To be more precise, the storage for a is deallocated at line 7 in the code. Thus, it is an error to try to access this storage in the rest of the computation, as is done on line 7.

Cyclone detects the error because it gives the expression &a the type int *@region(`L), meaning that the value is a pointer into region `L. So, the assignment y = &a fails to type-check because y expects to hold a pointer into region `f, not region `L. The restriction, compared to C, is that a pointer's type indicates one region instead of all regions.

Region Inference

If you had to write a @region qualifier on every pointer type, then writing code would be far from pleasant. Fortunately, Cyclone provides a number of mechanisms to cut down on the region annotations you have to write.

First off, you can omit the @region qualifier keyword and simply write the region name (e.g., `r) as long as you put the region name after any other qualifiers. For instance, instead of writing ``int *@notnull @region(`r)'' we can simply write ``int @`r''. In this document, we will use an explicit @region qualifier, but you'll find that the libraries and other example programs tend to use the abbreviations.

In addition, Cyclone often figures out the region of a pointer without the programmer providing the information. This is called region inference. For instance, we can rewrite the function f above without any region annotations, and without labelling the blocks:
1 int f() {
2    int x = 0;
3    int *y = &x;
4    { int a = 0;
5      y = &a;
6    }
7    return *y;
8 }
Cyclone can still figure out that y is a pointer into region `f, and &a is a pointer into a different (now anonymous) region, so the code should be rejected.

As we will show below, occasionally you will need to put explicit region annotations into the code to convince the type-checker that something points into a particular region, or that two things point into the same region. In addition, it is sometimes useful to put in the region annotations for documentation purposes, or to make type errors a little less cryptic.

You need to understand a few more details about regions to be an effective Cyclone programmer: the heap region, growable regions, region polymorphism, dynamic regions, and default region annotations for function parameters. The following sections give a brief overview of these details.

The Heap Region

There is a special region for the heap, written `H, that holds all of the storage for top-level variables, and for data allocated via new or malloc. For instance, if we write the following declarations at the top-level:
struct Point p = {0,1};
struct Point *ptr = &p;
then Cyclone figures out that ptr points into the heap region. To reflect this explicitly, we can put the region in the type of ptr if we like:
struct Point p = {0,1};
struct Point *@region(`H) ptr = &p;
As another example, the following function heap-allocates a point and returns it to the caller. We put the regions in here to be explicit:
struct Point *@region(`H) good_newPoint(int x,int y) {
  struct Point *@region(`H) p = 
    malloc(sizeof(struct Point));
  p->x = x;
  p->y = y;
  return p;
}
Alternatively, we can use new to heap-allocate and initialize the result:
struct Point *@region(`H) good_newPoint(int x,int y) {
  return new Point{x,y};
}

Growable Regions

Storage on the stack is implicitly allocated and recycled when you enter and leave a block. Storage in the heap is explicitly allocated via new or malloc, but there is no support in Cyclone for explicitly freeing an object in the heap. The reason is that Cyclone cannot accurately track the lifetimes of individual objects within the heap, so it can't be sure whether dereferencing a pointer into the heap would cause problems. Instead, a conservative garbage collector is used to reclaim the data allocated in the heap.

Using a garbage collector to recycle memory is the right thing to do for most applications. For instance, the Cyclone compiler uses heap-allocated data and relies upon the collector to recycle most objects it creates when compiling a program. But a garbage collector can introduce pauses in the program, and as a general purpose memory manager, might not be as space- or time-efficient as routines tailored to an application.

To address these applications, Cyclone provides support for growable regions and dynamic regions. A growable region is similar to the region associated with a code block. In particular, when you execute:
{ region<`r> h;
   ...
}
this declares a new region `r along with a region handle h. The handle can be used for dynamically allocating objects within the region `r. All of the storage for the region is deallocated at the point of the closing brace. Unlike block regions, the number (and size) of objects that you allocate into the region is not fixed at compile time. In this respect, growable regions are more like the heap. You can use the rnew(h) and rmalloc(h,...) operations to allocate objects within a growable region, where h is the handle for the region.

For instance, the following code takes an integer n, creates a new dynamic region and allocates an array of size n within the region using rnew.
int k(int n) {
  int result;
  { region<`r> h;
    int ?arr = rnew(h) {for i < n : i};
    result = process(h, arr);
  }
  return result;
}
It then passes the handle for the region and the array to some processing function. Note that the processing function is free to allocate objects into the region `r using the supplied handle. After processing the array, we exit the region which deallocates the array, and then return the calculated result.

It is worth remarking that the heap is really just a growable region with global scope, and you can use the global variable Core::heap_region as a handle on the heap. Indeed, new and malloc(...) are just abbreviations for rnew(Core::heap_region) and rmalloc(Core::heap_region,...) respectively.

Dynamic Regions

Block regions and growable regions allow you to create arenas that are lexically scoped. The lifetimes of these regions begin and end in a last-in-first-out (LIFO) manner that follows the block structure of the code. They are particularly good for allocating temporary data during a computation. But they are not so good when we cannot statically bound the lifetime of an object that we wish to allocate.

For example, consider an event loop where each event contains some data that should be passed to the event handler when the event is triggered. And assume that the event handler wishes to deallocate the data when it is invoked. Since we cannot determine when (or even if) the event handler will be invoked, we cannot statically determine the lifetime of its associated data.

In these situations, you can use a dynamic region. A dynamic region can be deallocated at (almost) any point within a program. However, before the data within a dynamic region can be accessed and before you can allocate new data into a dynamic region, the region must be opened. When you open a dynamic region, this checks that the region has not been deallocated, and also prevents someone from deallocating the region as long as it is open. In other words, once open, you are free to access or allocate data within the region, but you cannot free the region.

If you attempt to open a dynamic region that has already been freed, then an exception will be thrown. Dually, if you attempt to free a dynamic region that is currently open, then an exception will be thrown.

A dynamic region is created as follows:
  let NewRegion{<`d> dh} = Core::rnew_dynregion(rh);
where rh is a handle for a region `r, `d is the name of the new dynamic region, and dh is the dynamic region handle. The dynamic region handle dh has type dynregion_t<`d,`r>. We say that it is a handle for `d and is contained in the region `r.

To access or allocate data within a dynamic region, you must open the region as follows:
  { region sh = open(dh);
    ...
  }
In this case, the dynamic region handle dh is opened and we are given a static region handle sh. The region remains opened throughout the scope of the static region handle. That is, the code in the curly braces (...) can access the region, but outside the curly braces, the data within the region may not be accessed. The static region handle sh can be used to allocate data within the dynamic region `d.

If the region `d has already been freed, then the open will fail by throwing the exception Core::Open_Region.

You can free a dynamic region (as long as it is not open) by calling the following function and passing in the dynamic region handle:
  Core::free_dynregion(dh);
If the dynamic region has already been freed or if it is currently open, the free_dynregion will throw the exception Core::Free_Region.

The dynamic region handle dh is contained in the region `r. When `r is deallocated, then dh becomes inaccessible and thus `d is deallocated as well.

Region Polymorphism

Another key concept you need to understand is called region polymorphism. This is just a fancy way of saying that you can write functions in Cyclone that don't care which specific region a given object lives in, as long as it's still alive. For example, the function foo from the beginning of this section is a region-polymorphic function. To make this clear, let us re-write the function making the regions explicit:
void foo(struct Point *@region(`r) p) {
  p->y = 1234;
  return;
}
The function is parameterized by a region variable `r, and accepts a pointer to a Point that lives in region `r. Note that `r can be instantiated with any region you like, including the heap, or a region local to a function. So, for instance, we can write the following:
void g() {
  struct Point p = {0,1};
  struct Point *@region(`g) ptr1 = &p;
  struct Point *@region(`H) ptr2 = new Point{2,3};
  foo(ptr1);
  foo(ptr2);
}
Note that in the first call to foo, we are passing a pointer into region `g, and in the second call to foo, we are passing in a pointer into the heap. In the first call, `r is implicitly instantiated with `g, and in the second call, with `H.

Cyclone automatically inserts region parameters for function arguments, so you rarely have to write them. For instance, foo can be written simply as:
void foo(struct Point * p) {
  p->y = 1234;
  return;
}
As another example, if you write the following:
void h(struct Point * p1, struct Point * p2) {
  p1->x += p2->x;
  p2->x += p2->y;
}
then Cyclone fills in the region parameters for you by assuming that the points p1 and p2 can live in any two regions `r1 and `r2. To make this explicit, we would write:
void h(struct Point *@region(`r1) p1, 
       struct Point *@region(`r2) p2) {
  p1->x += p2->x;
  p2->x += p2->y;
}
Now we can call h with pointers into any two regions, or even two pointers into the same region. This is because the code is type-correct for all regions `r1 and `r2.

Occasionally, you will have to put region parameters in explicitly. This happens when you need to assert that two pointers point into the same region. Consider for instance the following function:
void j(struct Point * p1, struct Point * p2) {
  p1 = p2;
}
Cyclone will reject the code because it assumes that in general, p1 and p2 might point into different regions. That is, Cyclone fills in the missing regions as follows:
void j(struct Point *@region(`r1) p1, 
       struct Point *@region(`r2) p2) {
  p1 = p2;
}
Now it is clear that the assignment does not type-check because the types of p1 and p2 differ. In other words, `r1 and `r2 might be instantiated with different regions, in which case the code would be incorrect. But you can make them the same by putting in the same explicit region for each pointer. Thus, the following code does type-check:
void j(struct Point *@region(`r) p1, 
       struct Point *@region(`r) p2) {
  p1 = p2;
}
So, Cyclone assumes that each pointer argument to a function is in a (potentially) different region unless you specify otherwise. The reason we chose this as the default is that (a) it is often the right choice for code, (b) it is the most general type in the sense that if it does work out, clients will have the most lattitude in passing arguments from different regions or the same region to the function.

What about the results? Here, there is no good answer because the region of the result of a function cannot be easily determined without looking at the body of the function, which defeats separate compilation of function definitions from their prototypes. Therefore, we have arbitrarily chosen the heap as the default region for function results. Consequently, the following code:
struct Point * good_newPoint(int x,int y) {
  return new Point{x,y};
}
type-checks since the new operator returns a pointer to the heap, and the default region for the return type is the heap.

This explains why the original bad code for allocating a new point does not type-check:
struct Point *newPoint(int x,int y) {
  struct Point result = {x,y};
  return &result;
}
The value &result is a pointer into region `newPoint but the result type of the function needs to be a pointer into the heap (region `H).

If you want to return a pointer that is not in the heap region, then you need to put the region in explicitly. For instance, the following code:
int * id(int *x) {
  return x;
}
will not type-check. To see why, let us rewrite the code with the default region annotations filled in. The argument is assumed to be in a region `r, and the result is assumed to be in the heap, so the fully elaborated code is:
int *@region(`H) id(int *@region(`r) x) {
  return x;
}
Now the type-error is manifest. To fix the code, we must put in explicit regions to connect the argument type with the result type. For instance, we might write:
int *@region(`r) id(int *@region(`r) x) {
  return x;
}
or using the abbreviation:
int *`r id(int *`r x) {
  return x;
}

Region Summary

In summary, each pointer in Cyclone points into a given region and this region is reflected in the type of the pointer. Cyclone won't let you dereference a pointer into a deallocated region. The lexical blocks declared in functions correspond to one type of region, and simply declaring a variable within that block allocates storage within the region. The storage is deallocated upon exit of the block. Growable regions are similar, except that a dynamic number of objects can be allocated within the region using the region's handle. Both block and growable regions have structured lifetimes. Dynamic regions, in contrast, support arbitrary lifetimes, but must be opened to be accessed. Finally, the heap is a special growable region that is garbage collected.

Region polymorphism and region inference make it possible to omit many region annotations on types. Cyclone assumes that pointers passed to functions may live in distinct regions, and assumes that result pointers are in the heap. These assumptions are not perfect, but (a) programmers can fix the assumptions by providing explicit region annotations, (b) it permits Cyclone files to be separately compiled.

The region-based type system of Cyclone is perhaps the most complicated aspect of the language. In large part, this is because memory management is a difficult and tricky business. We have attempted to make stack allocation and region polymorphic functions simple to use without sacrificing programmer control over the lifetimes of objects and without having to resort to garbage collection. As more advanced features, we also provide even finer control over object lifetimes, but at the expense of more work from the programmer, using the unique region and reference-counted regions. For more information on these, and regions in general, see Section 894Memory Management Via Regionssection.8.

2.4  Tagged Unions and Datatypes

It's often necessary to write a function that accepts an argument with more than one possible type. For example, in
    printf("%d",x);
x should be an integer, but in
    printf("%s",x);
x should be a pointer to a sequence of characters.

If we call printf("%s",x) with an integer x, instead of a pointer x, the program will likely crash. To prevent this, most C compilers treat printf specially: they examine the first argument and require that the remaining arguments have the appropriate types. However, a compiler can't check this if printf isn't called with a literal format string:
    printf(s,x);
where s is a string variable. This means that in C, programs that use printf (or scanf, or a number of related functions) are vulnerable to crashes and corrupted memory. In fact, it's possible for someone else to crash your program by causing it to call printf with arguments that don't match the format string. This is called a format string attack, and it's an increasingly common exploit.

Cyclone provides tagged unions so that you can safely write functions that accept an argument with more than one possible type. Like a C union, a Cyclone @tagged union is a type that has several possible cases. Here's a simple example:
    @tagged union T {
      int Integer;
      const char *@fat String;
    };
    union T x = {.Integer = 3};
    union T y = {.String = "hello, world"};
This declares a new tagged union type T, that can hold either an integer or a string (remember, a string is a char *@fat in Cyclone). It also declares to union T values x and y and initializes them with an integer and string respectively.

Just as with C unions, you can read and write any member of a tagged union. However, to prevent security holes, Cyclone enforces the property that you can only read the last member written. This prevents you from accidentally treating an integer as if it's a string or some other kind of pointer.

Cyclone enforces this safety property by inserting a hidden tag into the union (hence the @tagged qualifier.) You can test the tag by using the built-in tagcheck function. For instance, here is a function that uses the real printf to safely print out the contents of a union T value, regardless of its contents:
    bool printT(union T w) {
      if (tagcheck(w.Integer))
        printf("%d",w);
      else 
        printf("%s",w);
    }
Note that tagcheck(w.Integer) does not return the value of the Integer member, but rather returns true if and only if the Integer member was the last member written (and is thus safe to read.)

Each write to a tagged union member causes the hidden tag to be updated, and each read is preceded by a check to ensure that the member was the last one written. If you attempt to read some member other than the last one written, than the Match exception is thrown. For example, the following code writes the String member and then attempts to read the Integer member, so it will throw a Match exception:
    union T a;
    int x;
    a.String = "hello, world";
    /* Next line fails */
    x = a.Integer + 3;
When you have a big union, it can be awkward to use tagcheck to test the hidden tag. You might accidentally test the wrong member or forget to cover a member. In these cases, its probably best to use pattern matching to determine the tag and to extract the underlying value. For example, here is the function printT coded with pattern matching:
    void printT(union T a) {
      switch (a) {
      case {.Integer = i}: printf("%d",i); return;
      case {.String = s}: printf("%s",s); return;
      }
    }
The argument a has type union T, so it is either an Integer or String. Cyclone extends switch statements with patterns that distinguish between the cases. The first case,
   case {.Integer = i}: printf("%d",i); return;
contains a pattern, {Integer = i}, that will match only T values where the Integer member was the last one written. The variable i is bound to the underlying integer, and it can be used in the body of the case. For example, printT(x) will print 3, since x holds {.Integer = 3}, and printT(y) will print hello, world. You can find out more about patterns in Section 575Pattern Matchingsection.5;

Cyclone also supports untagged unions, but there are restrictions on how they may be used to ensure safety. In particular, you can write any value you like into a union, but you can only read out values that do not contain pointers. This ensures that you don't ``spoof'' a pointer with an integer or some other bogus value. So, the general rule is that you can use a normal C union if you aren't using pointers, but you must use a @tagged union if you are using pointers.

Cyclone provides another alternative to tagged unions for supporting hetrogenous values called a datatype. Tagged unions require space proportional to the largest member (plus room for the tag.) In contrast, a datatype only requires space for the member being used. However, datatypes cannot be updated with a different member and require a level of indirection.

Here is our example type re-coded using a datatype declaration:
    datatype T {
      Integer(int);
      String(const char *@fat);
    };

    datatype T.Integer x = Integer(3);
    datatype T.String y = String("hello, world");

    void printT(datatype T@ a) {
      switch (a) {
      case &Integer(i): printf("%d",i); return;
      case &String(s): printf("%s",s); return;
      }
    }
In general, a datatype declaration includes a set of constructors which can be used to build datatype values. In this case, the constructors are Integer and String. The Integer constructor takes an int and returns a value of type datatype T.Integer. The String constructor takes a string and returns a datatype T.String value.

Note that the types of x and y are not the same so we can't interchange them, nor can we pass them directly to the printT function. In particular, their types reveal which constructor was used to build them. However, we can manipulate pointers to these values in an abstract way. In particular, we can pass a pointer to a datatype T.Integer value or a pointer to a datatype T.String value anywhere that expects a datatype T. For instance, we can write printT(&x) to print out the integer value in x, and we can write printT(&y) to print out the "hello, world" string in y.

For more on datatypes, see Section 465Tagged Unions and Datatypessection.4.

2.5  Exceptions

So far we've glossed over what happens when you try to dereference a null pointer, or assign to an out-of-bounds @fat pointer. We've said that Cyclone inserts checks to make sure the operation is safe, but what if the checks fail? For safety, it would be sufficient to halt the program and print an error message---a big improvement over a core dump, or, worse, a program with corrupted data that keeps running.

In fact, Cyclone does something a bit more general than halting with an error message: it throws an exception. The advantage of exceptions is that they can be caught by the programmer, who can then take corrective action and perhaps continue with the program. If the exception is not caught, the program halts and prints an error message. Consider our earlier example:
    FILE *f = fopen("/etc/passwd","r");
    int c = getc((FILE @)f);
Suppose that there is no file /etc/passwd; then fopen will return NULL, and when f is cast to FILE *@notnull, the implied null check will fail. The program will halt with an error message,
    Uncaught exception Null_Exception
Null_Exception is one of a handful of standard exceptions used in Cyclone. Each exception is like a case of a datatype: it can carry along some values with it. For example, the standard exception InvalidArg carries a string. Exceptions can be handled in try-catch statements, using pattern matching:
    FILE *f = fopen("/etc/passwd","r");
    int c;
    try {
      c = getc((FILE *@notnull)f);
    }
    catch {
    case &Null_Exception:
      printf("Error: can't open /etc/passwd\n");
      exit(1);
    case &InvalidArg(s):
      printf("Error: InvalidArg(%s)\n",s);
      exit(1);
    }
Here we've ``wrapped'' the call to getc in a try-catch statement. If f isn't NULL and the getc succeeds, then execution just continues, ignoring the catch. But if f is NULL, then the null check will fail and the exception Null_Exception will be thrown; execution immediately continues with the catch (the call to getc never happens). In the catch, the thrown exception is pattern matched against the cases. Since the thrown exception is Null_Exception, the first case is executed here.

There is one important difference between an exception and a case of a datatype: with a datatype, all of the cases have to be declared at once, while a new exception can be declared at any time. So, exceptions are an extensible datatype. You can specify that a datatype is extensible when you declare it, using the @extensible qualifier. For example, here's how to declare a new exception:
    @extensible datatype exn {
      My_Exception(char *@fat);
    };
The type @extensible datatype exn is the type of exceptions, and this declaration introduces a new case for the @extensible datatype exn type: My_Exception, which carries a single value (a string). Exception values are created just like datatype values, and are thrown with a throw statement. For example,
    throw new My_Exception("some kind of error");
or
    throw new Null_Exception;
In practice, ``@extensible datatype'' is quite a mouthful. So, Cyclone allows you abbreviate it with just datatype, as long as you've declared a datatype as @extensible once. So a more typical way to declare a new exception in Cyclone is
    datatype exn {
      My_Exception(char ?);
    };

2.6  Additional Features of Cyclone

So far we've mentioned a number of advanced features of Cyclone that provide facilities needed to avoid common bugs or security holes in C. But there are many other features in Cyclone that are aimed at making it easier to write code, ranging from convenient expression forms, to advanced typing constructs. For instance, like GCC and C99, Cyclone allows you declare variables just about anywhere, instead of at the top of a block. As another example, like Java, Cyclone lets you declare variables within the initializer of a for-statement.

In addition, Cyclone adds advanced typing support in the form of (a) parametric polymorphism, (b) structural subtyping, and (c) some unification-based, local-type inference. These features are necessary to type-check or port a number of (potentially) unsafe C idioms, usually involving ``void *'' or the like. Similarly, @tagged union types and datatypes can be used to code around many of the uses for C's union types -- another potential source of unsoundness. The rest of this section is a brief overview of these added features.

2.7  GCC and C99 Additions

GCC and the ISO C99 standard have some useful new features that we have adopted for Cyclone. Some of the ones that we currently support are: We expect to follow the C99 standard fairly closely.

2.8  Tuples

Tuples are like lightweight structs. They need not be declared in advance, and have member or field names that are implicitly 0, 1, 2, 3, etc. For example, the following code declares x to be a 3-tuple of an integer, a character, and a boolean, initialized with the values 42, 'z', and true respectively. It then checks to see whether the third component in the tuple is true (it is) and if so, increments the first component in the tuple.
  $(int,char,bool) x = $(42,'z',true)

  if (x[2])
    x[0]++;
The above code would be roughly equivalent to writing:
  struct {int f0; char f1; bool f2;} x = {42,'z',true};
  if (x.f2)
    x.f1++;
Thus, tuple types are written $(type1,...,typen), tuple constructor expressions are written $(exp1,...,expn), and extracting the ith component of a tuple is written using subscript notation exp[i-1]. Note that, consistent with the rest of C, the members start with 0, not 1.

Unlike structs, tuple types are treated equivalent as long as they are structurally equivalent. As in C, struct types are equivalent only if they have the same tag or name. (Note that in C, all struct declarations have a tag, even if the compiler has to gensym one.)

2.9  Creating Arrays

There are about four ways to create arrays in Cyclone. One can always declare an array and provide an initializer as in C. For instance:
  int foo[8] = {1,2,3,4,5,6,7,8};
  char s[4] = "bar";
are both examples from C for creating arrays. Note that Cyclone follows C's conventions here, so that if you declare arrays as above within a function, then the lifetime of the array coincides with the activation record of the enclosing scope. In other words, such arrays will be stack allocated.

To create heap-allocated arrays (or strings) within a Cyclone function, you should either use ``new'' operator with either an array initializer or an array comprehension. The following code demonstrates this:
  // foo is a pointer to a heap-allocated array
  int *{8}foo = new {1,2,3,4,5,6,7,8};

  // s is a checked pointer to a heap-allocated string
  char ?s = new {'b','a','r',0};

  // a non-null pointer to the first 100 even numbers
  int @{100}evens = new {for i < 100 : 2*i};

2.10  Subtyping

Cyclone supports ``extension on the right'' and ``covariant depth on const'' subtyping for pointers. This simply means that you can cast a value x from having a type ``pointer to a struct with 10 fields,'' to ``pointer to a struct having only the first 5 fields.'' For example, if we have the following definitions:
  typedef struct Point {float x,y;} *point;

  typedef struct CPoint {float x,y; int color;} *cpoint;

  float xcoord(point p) {
    return p->x;
  }
then you can call xcoord with either a point or cpoint object. You can also cast a pointer to a tuple having 3 fields (e.g., $(int,bool,double)*) to a pointer to a tuple having only 2 fields (e.g., $(int,bool)*). In other words, you can forget about the ``tail'' of the object. This allows a degree of polymorphism that is useful when porting C code. In addition, you can do ``deep'' casts on pointer fields that are const. (It is unsafe to allow deep casts on non-const fields.) Also, you can cast a field from being non-const to being const. You can also cast a constant-sized array to an equivalent pointer to a struct or tuple. In short, Cyclone attempts to allow you to cast one type to another as long as it is safe. Note, however, that these casts must be explicit.

We expect to add more support for subtyping in the future (e.g., subtyping on function pointers, bounded subtyping, etc.)

2.11  Let Declarations

Sometimes, it's painful to declare a variable because you have to write down its type, and Cyclone types can be big when compared to their C counterparts since they may include bounds information, regions, etc. Therefore, Cyclone includes additional support for type inference using let declarations. In particular, you can write:
  int foo(int x) {
    let y = x+3;
    let z = 3.14159;
    return (int)(y*z);
  }
Here, we declared two variables y and z using ``let.'' When you use let, you don't have to write down the type of the variable. Rather, the compiler infers the type from the expression that initializes the variable. More generally, you can write ``let pattern = exp;'' to destructure a value into a bunch of variables. For instance, if you pass a tuple to a function, then you can extract the components as follows:
  int sum($(int,int,int) args) {
    let $(x,y,z) = args;
    return (x+y+z);
  }

2.12  Polymorphic Functions

As mentioned above, Cyclone supports a limited amount of subtyping polymorphism. It also supports a fairly powerful form of parametric polymorphism. Those of you coming from ML or Haskell will find this familiar. Those of you coming from C++ will also find it somewhat familiar. The basic idea is that you can write one function that abstracts the types of some of the values it manipulates. For instance, consider the following two functions:
  $(char*,int) swap1($(int,char*) x) {
     return $(x[1], x[0]);
  }
  $(int,int) swap2($(int,int) x) {
     return $(x[1], x[0]);
  }
The two functions are quite similar: They both take in a pair (i.e., a 2-tuple) and return a pair with the components swapped. At the machine-level, the code for these two functions will be exactly the same, assuming that ints and char *s are represented the same way. So it seems silly to write the code twice. Normally, a C programmer would replace the definition with simply:
  $(void *,void *) swap1($(void *,void *) x) {
     return $(x[1], x[0]);
  }
(assuming you added tuples to C). But of course, this isn't type-safe because once I cast the values to void *, then I can't be sure what type I'm getting out. In Cyclone, you can instead write something like this:
  $(`b,`a) swap($(`a,`b) x) {
     return $(x[1],x[0]);
  }
The code is the same, but it abstracts what the types are. The types `a and `b are type variables that can be instantiated with any word-sized, general-purpose register type. So, for instance, you can call swap on pairs of integers, pairs of pointers, pairs of an integer and a pointer, etc.:
  let $(x,y) = swap($("hello",3));  // x is 3, y is hello
  let $(w,z) = swap($(4,3));        // w is 3, z is 4
Note that when calling a polymorphic function, you need not tell it what types you're using to instantiate the type variables. Rather, Cyclone figures this out through unification.

C++ supports similar functionality with templates. However, C++ and Cyclone differ considerably in their implementation strategies. First, Cyclone only produces one copy of the code, whereas a C++ template is specialized and duplicated at each type that it is used. This approach requires that you include definitions of templates in interfaces and thus defeats separate compilation. However, the approach used by Cyclone does have its drawbacks: in particular, the only types that can instantiate type variables are those that can be treated uniformly. This ensures that we can use the same code for different types. The general rule is that values of the types that instantiate a type variable must fit into a machine word and must be passed in general-purpose (as opposed to floating-point) registers. Examples of such types include int, pointers, datatype, and xdatatype types. Other types, including char, short, long long, float, double, struct, and tuple types violate this rule and thus values of these types cannot be passed to a function like swap in place of the type variables. In practice, this means that you tend to manipulate a lot of pointers in Cyclone code.

The combination of parametric polymorphism and sub-typing means that you can cover a lot of C idioms where void* or unsafe casts were used without sacrificing type-safety. We use polymorphism a lot when coding in Cyclone. For instance, the standard library includes many container abstractions (lists, sets, queues, etc.) that are all polymorphic in the element type. This allows us to re-use a lot of code. In addition, unlike C++, those libraries can be compiled once and need not be specialized. On the downside, this style of polymorphism does not allow you to do any type-specific things (e.g., overloading or ad-hoc polymorphism.) Someday, we may add support for this, but in the short run, we're happy not to have it.

2.13  Polymorphic Data Structures

Just as function definitions can be parameterized by types, so can struct definitions, datatype definitions, and even typedefs. For instance, the following struct definition is similar to the one used in the standard library for lists:
  struct List<`a> {`a hd; struct List<`a> * tl; };
  typedef struct List<`a> *list_t<`a>;
Here, we've declared a struct List parameterized by a type `a. The hd field contains an element of type `a and the tl field contains a possibly-null pointer to a struct List with elements of type `a. We then define list_t<`a> as an abbreviation for struct List<`a>*. So, for instance, we can declare both integer and string lists like this:
  list_t<int> ilist = new List{1,new List{2,null}};
  list_t<string_t> slist = new List{.hd = "foo",
                                  .tl = new List{"bar",null}};
Note that we use ``new'' as in C++ to allocate a new struct List on the heap and return a pointer to the resulting (initialized) List object. Note also that the field designator (``.hd'', ``.tl'') are optional.

Once you have polymorphic data structures, you can write lots of useful polymorphic code and use it over and over again. For instance, the standard list library (see lib/list.h) includes functions for mapping over a list, looking up items in a list, concatenating two lists, copying lists, sorting lists, etc.

2.14  Abstract and Existential Types

Suppose you want to declare an abstract type for implementing stacks. In Cyclone, the way this is accomplished is by declaring a struct that encapsulates the implementation type, and by adding the ``abstract'' qualifier to the struct definition. For instance, if we write:
  abstract struct Queue<`a> { list_t<`a> front, rear; };
then this declares a polymorphic Queue implementation that is abstract. The definition of the struct is available within the unit that declares the Queue, but will not be made available to the outside world. (This will be enforced by a link-time type-checker that we are currently putting together.) Typically, the provider of the Queue abstraction would write in an interface file:
  extern struct Queue<`a>;
The abstract keyword in the implementation ensures that the definition does not leak to a client.

Typedefs can be made abstract by writing:
typedef _ foo_t;
However, our current implementation does not support later redefining foo_t as a non-abstract typedef. The default kind for the type is B; you can write an explicit kind like this:
typedef _::A bar_t;
Generally abstract structs are sufficient. An abstract typedef is useful in some cases, though, such as when the abstracted type is actually int.

It's also possible to code up ``first-class'' abstract data types using structs with existentially quantified type variables. Existential types are described in Section 12150Advanced Featuressection.12.

For an example of the use of existential types, see the fn.h and fn.cyc files in the standard library, which implement first-class closures.

2.15  Restrictions

Though Cyclone adds many new features to C, there are also a number of restrictions that it must enforce to ensure code does not crash. Here is a list of the major restrictions: In addition, there are a number of shortcomings of the current implementation that we hope to correct in the near future. For instance:
Previous Up Next