This directory contains sources for a compiler from a small, safe subset

of C (called popcorn) to x86 typed asssembly.  Popcorn is more like Java

in that all structured values (i.e., arrays, structs, unions, and tuples)

are heap-allocated and the language is strongly typed.  Popcorn is like

C in that functions can only be defined at the top-level, and programmers

have fairly good control over data layout.  Furthermore, compilation units,

scoping, and linking happen just as in C.  Finally, popcorn makes it very

easy to link against C code or libraries.



The command line program popcorn.exe is a compiler in the spirit of

cc.  So, for example, cd'ing to the test directory and typing:



	popcorn -o test_list.exe list.pop test_list.pop



will compile the two files list.pop and test_list.pop, producing

TAL and object files for these two modules, and then link the two

together (with the runtime routines) to generate an executable.



Use popcorn -help to find out about more options.  Currently, the

compiler is very simple-minded but we hope to remedy this soon.



The runtime is minimal but it's easy to add new functions to it.

(See ..\runtime\pop_runtime.tali and pop_runtime.c).  The runtime

includes the Boehm-Demers-Weiser conservative collector.  



LANGUAGE FEATURES:

  - base types:  void, int, char, boolean, string

	- most operations are as in Java

	- strings are immutable but are treated much like arrays,

		so if x is a string, x[i] returns the i^th character of x.

	- strings only support ascii for now.

  - control constructs:

	- if:  optional else -- as in C/Java

	- switch:  on either ints, chars, (or unions -- see below)

		- note that cases on a switch do not fall through so there is

		  no need to use break to get to the end of the switch.

	- while,do:  as in C/Java

	- break/continue:  as in C for loops

	- for:  close to C/Java but you're restricted in the number

		of expressions you can put in.  

	- function call: as in C

  - arrays:  

	- the syntax for arrays is close to C.  For example:

		int x[];   /* array of integers */

		int y[][]; /* array of array of integers */

	- functions that return arrays put the "[]" at the end:

		int sort(int x[])[]  /* takes and returns an integer array */

	- subscript and update syntax are as in Java (0-based):

		z = x[i];  /* extract i^th element of x */

		x[i+j] = 3;  /* set i^th element of x to 3 */

		z = size(x) /* returns the # of elements in x */

	- subscript and update indices are checked to make sure they're in

	  bounds.

	- array literals are created by:

		new {1,2,3,4,5}   /* creates an array whose elements are 

				   * 1,2,3,4, and 5 */

		new {:int}	  /* creates an empty integer array */

  - structs:

	- structs must be declared (no anonymous structs) and behave much

	  like (method-less and subtyping-less) Java objects or ML-style

	  records.

	- structs can only be declared at top-level.

	- structs can have constant (i.e., immutable) fields.  By default,

		fields are mutable.

	- structs can be optionally null (by using the "?" qualifier).  

		(see below).

	- by default, struct declarations have public scope.  The static

		qualifier restricts their scope to the current module.

		The extern qualifier imports the definition from another

		module.  (At most one module can define the struct publically.)

	- structs are created by using "new <sid>(e1,...,en)"

		where <sid> is the name of the struct and e1,...,en are

		the values of the fields.

	- you must explicitly say which struct type when using "null"

	- struct fields are accessed using "." notation (e.g., s.f)

	- if you attempt to access a field of a "?" struct, then the

		compiler automatically inserts a null check to ensure

		that the struct is not null.

	- struct definitions can be recursive

	- example definitions:



		struct int_pair {     // pair of integers with integer fields 

		  int first;	      // first and second 

		  int second;

		}



		struct int_pair {     // equivalent to above

		  int first, second;

		}

		

		static struct int_pair {

		  int first, second;     // only accessible in this module

		}

		

		extern struct int_pair {

		  int first, second;     // declared elsewhere

		}

	

		struct int_pair {

		  const int first;       // first field immutable

		  int second;		 // second field mutable

		}



		?struct int_list {      // a value of type int_list can be

		  int i;		// either "null int_list" or a struct

		  next int_list;	// containing an int and a pointer to

	        }			// an int_list.  



		?struct my_struct {

		  int i[];		// field is an integer array

		  your_struct y;	// pointer to a your_struct value

		}			// (never null)



		struct your_struct {

		  string z;		// field is a string

		  my_struct m;		// points to a my_struct value

		}			// (possibly null)



	- example uses:



		/* declares and initializes x to be an int_pair */

		int_pair x = new int_pair(3,4);



		x.first++;   // increments value in first field of x

		x.first += 1; // ditto

		x.first = x.first + 1;  // ditto



		/* prints the sum of the components of x */

		print_int(x.first + x.second)	



		/* non-destructive append */

		int_list append(int_list x,int_list y) 

		{

		  if (x == null int_list) 

		     return(y);

		  else

		     return(new int_list(x.i,append(x.next,y)));

		}

  - tuples:

	- like anonymous structs with immutable fields.  

		*(int,int) x = new(3,4);  // x is a pointer to a pair of ints

		

		print_int(x.1 + x.2);  // fields have names 1,2,3,etc.

  - unions:

	- more like ML datatypes (i.e., tagged unions or tagged variant 

	  records) than C unions.

	- unions can be recursive.

	- types are declared the same way as structs (except no ? option).

	- "members" or fields cannot be directly accessed -- must do a

		switch to determine which case.

	- same scope qualifiers (static, extern) as structs.

	- fields with void type don't carry values:

	- example type definitions:



		// like an enumeration -- represented as integers

		union weekday {

		  void MON; void TUE; void WED; void THU; 

		  void FRI; void SAT; void SUN;

		}



		// like:  datatype exp = Var of string | App of exp*exp |

		//		 Lambda of string * exp

		union exp {

		  string        var;

		  *(exp,exp)    app;

		  *(string,exp) lambda;

		}



	- there's no need for the "?" qualifier since you can declare

	  a null field having void type:



		union nexp {

		  void           null_exp;

		  string         var;

		  *(nexp,nexp)   app;

		  *(string,nexp) lambda;

		}

		  

	- To create a union value, you must give the union name, the field 

	  name, and any arguments (none only if the type of the field is void):



		weekday today = new weekday.SUN;



		exp e1 = new exp.var("x");

		exp e2 = new exp.app(e1,e1);

		exp e3 = new exp.lambda("x",e2);



	- To deconstruct a union value, you must use a switch.  The cases

	  of the switch define a variable which will be bound to the contents

	  of the field (if any).  For instance, the following function will

	  print out an exp value:



		void print_exp(exp e) 

		{

		   switch e {

		     case var(x): 

			print_string("x");   // x has type string in this block

		     case app(x): 

			print_string("(");   // x has type *(exp,exp)

			print_exp(x.1);      // in this block

			print_string(" ");

			print_exp(x.2);

			print_string(")");

		     case lambda(l):           // could call the variable

			print_string("(fn ");  // something else -- say l

			print_string(l.1);

			print_string(" => ");

			print_exp(l.2);

			print_string(")");

		     }

		}



	- A default: case can be used in the switch.  The switch must cover

	  all of the cases (unless there's a default) and no case can be 

	  duplicated.  

	- Note that unlike C switches, no break is needed to transfer control

	  to the end of the switch.  (That is, cases do not fall through to

	  the next case.)

	- A case should not declare a variable if the field has no type.

	  For instance, the same code for nexp:



		void print_nexp(nexp e) 

		{

		   switch e {

		     case null_exp:   // notice no variable declared here

			print_string("ERROR -- null expression")

		     case var(x): 

			print_string("x");   

		     case app(x): 

			print_string("(");   

			print_nexp(x.1);      

			print_string(" ");

			print_nexp(x.2);

			print_string(")");

		     case lambda(l):         

			print_string("(fn ");

			print_string(l.1);

			print_string(" => ");

			print_nexp(l.2);

			print_string(")");

		     }

		}



	- it's more space efficient to use ?structs when possible, but

	  more time efficient (i.e., fewer null checks) to use unions

	  and explicit switches.  Both inefficiencies will be remedied

	  at some point.



  - functions:

	For the most part, functions are declared and used the same way

	as in C.  The syntax for function types is quite different as

	we wanted to support first-class functions in a bit more clean

	manner.  See the grammar or the example tests\map.pop for an

	example.



	Like structs and unions, function declarations can be modified

	by "extern" or "static".  Extern declarations should have no

	body.



	The order of function declarations does not matter.



	The compiler is pretty anal about making sure you return something.

	So, for instance, it will reject the following code:



		int foo(int x) {



		  while (true) {

		    x++;

		  }

		}



	There is a distinction between function labels and function variables.

	The former are what you declare (as in foo above) and the latter

	are what you assign or have as parameters to functions, structs,

	etc.  (This is not currently enforced by the type checker at the

	popcorn level but is as the TAL level.)



  - abstract types:



	You can declare an external, abstract type:



		extern hidden_type;



	As long as you link against something that provides a definition

	for this type, things are cool.  As the type is abstract, you

	cannot directly manipulate values of this type.  



	(Hmmm.  How to declare that a type is meant to be abstract?)



PLANNED LANGUAGE FEATURES:

	- top-level variables

	- Trevor Jim and Luke Hornoff at UPenn are working on runtime

	  code generation extensions to both TAL and popcorn.  Some

	  of those features have already started to creep in.

	- exceptions (a la ML)

	- polymorphism:

		- first-class Forall and Exist types

		- polymorphic type constructors (e.g., <T>list)

		- minimal support for local type inference

	- subtyping:

		- width on structs, covariant immutable fields, invariant

			on mutable fields

		- usual contra/co on functions, etc.

		- perhaps some form of F-bounded quantifiers

	- floats

	- threads

	- perhaps some form of objects



PLANNED IMPLEMENTATION FEATURES (all in the works):

	- real register allocator

	- some optimization (e.g., constant propagation, loop invariant removal

		especially of null checks.)

	- array bound check elimination

	- our own assembler so we can chuck MASM

		- NB: MASM dies when we feed it types that are too large

		  This goes away when we integrate our own assembler.

		- Also, then we can provide support for Linux/BSD as well

		  as Win32.

