In the second and third lectures, we started a deep dive into OCaml's syntax, semantics, and types.

Additional references:
  - last semester's slides (WARNING: the evaluation rules have changed! See the note below.)
    http://www.cs.cornell.edu/Courses/cs3110/2014fa/lectures/2/lec02.pdf
    http://www.cs.cornell.edu/Courses/cs3110/2014fa/lectures/3/lec03.pdf
    http://www.cs.cornell.edu/Courses/cs3110/2014fa/lectures/4/lec04.pdf
    http://www.cs.cornell.edu/Courses/cs3110/2014fa/lectures/5/lec05.pdf

Syntax
======

Syntax is kind of boring; each language has its own, and as part of learning the language, you have to learn its syntax. We will present syntax using BNF, which stands for "Backus-Naur Form".

A value is one of the following:

    v ::= n | fpn | str | chr | b

Here the vertical bar can be read "or"; this is a list of possible values.

n stands for any integer:

    n ::= ... | -2 | -1 | 0 | 1 | 2 | ...

fpn stands for a floating-point number, which must contain a decimal point. str stands for a string, which must be surrounded by double quotes, and chr stands for a character, which must be surrounded by single quotes. b is a boolean:

    b ::= true | false

OCaml programs are given by expressions. Here are the expressions we have presented so far:

    e ::= v
        | e1 + e2
        | if e1 then e2 else e3
        | let x = e1 in e2

e1 + e2 is a stand-in for many different operators. There are the typical arithmetic operators +, -, *, and /; these only operate on integers. There are separate operators for floating-point numbers: "+.", "-.", "*.", and "/.". The ^ operator concatenates strings. The @ operator appends lists (we'll see lists later).

An if expression evaluates to one of two subexpressions depending on its condition. Unlike in Java, if is an expression; it is analogous to the ?: operator in Java or C++.

let expressions introduce local names for values. You may also see "let"s without "in"s:

    let x = 5;;

These are only allowed at the top level, and they are shorthand for

    let x = 5 in (... everything that follows in my program ...)
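To make the grammar above concrete, here is a small sketch of top-level definitions that exercise each kind of value and operator just described. The binding names (an_int, int_sum, local, and so on) are illustrative only, and the list append uses lists before they are formally introduced.

```ocaml
(* One value of each base type from v ::= n | fpn | str | chr | b *)
let an_int = 42
let a_float = 3.14          (* float literals contain a decimal point *)
let a_string = "hello"      (* strings use double quotes *)
let a_char = 'c'            (* characters use single quotes *)
let a_bool = true

(* Integers and floats have separate operators;
   1 + 2.0 would be rejected by the type checker. *)
let int_sum = 7 + 3                 (* 10 *)
let int_quotient = 7 / 2            (* integer division truncates: 3 *)
let float_product = 2.5 *. 4.0      (* 10. *)
let greeting = a_string ^ " world"  (* string concatenation: "hello world" *)
let joined = [1; 2] @ [3]           (* list append: [1; 2; 3] *)

(* if is an expression that produces a value, like ?: in Java or C++ *)
let sign = if an_int > 0 then "positive" else "non-positive"

(* let ... in binds x only within the body after in *)
let local = let x = 5 in x * x      (* 25 *)
```

Each top-level let here is the "let without in" form, so every later definition can refer to the earlier ones.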
Evaluation rules
================

Every OCaml program is an expression. Executing the program consists of repeatedly simplifying (or "reducing") the expression according to evaluation rules. Once the program reduces to a value (an expression that can't be reduced any further), the program terminates and outputs that value. You can start this process in the ocaml interpreter (the toplevel) by typing an expression followed by ;;.

We write down the simplification rules using -->, which can be read "steps to". Here are the rules we have covered in class:

    e1 + e2  -->  e1' + e2                if e1 --> e1'
    v1 + e2  -->  v1 + e2'                if e2 --> e2'
    n1 + n2  -->  n3                      where n3 is the sum of n1 and n2

    if e1 then e2 else e3     -->  if e1' then e2 else e3    if e1 --> e1'
    if true then e2 else e3   -->  e2
    if false then e2 else e3  -->  e3

    let x = e1 in e2  -->  let x = e1' in e2    if e1 --> e1'
    let x = v in e2   -->  e2{v/x}

e{v/x} means "e with v plugged in for x", and is defined below.

    e1 e2            -->  e1' e2      if e1 --> e1'
    v e2             -->  v e2'       if e2 --> e2'
    (fun x -> e) v2  -->  e{v2/x}

    match e with | p1 -> e1 | p2 -> e2 | ...
      -->  match e' with | p1 -> e1 | p2 -> e2 | ...
      if e --> e'

    match v with | p1 -> e1 | p2 -> e2 | ...
      -->  e1{v1/x1}{v2/x2}...{vn/xn}
      if v = p1{v1/x1}{v2/x2}...{vn/xn}

    match v with | p1 -> e1 | p2 -> e2 | ...
      -->  match v with | p2 -> e2 | ...
      if v does not match p1

!!!
WARNING: the evaluation rules we're using this semester are DIFFERENT from those of past semesters. In the past, the --> symbol gave the final value after an expression was evaluated; this semester, the --> symbol only gives you the very next step. For example, last semester we would write

    let x = (3 + 5) in (x + x)  -->  16

but this semester we instead write

    let x = (3 + 5) in (x + x)  -->  let x = 8 in (x + x)

The former style is called "big-step semantics"; this semester's model is called "small-step semantics". We hope that our style this semester is a little more natural.
!!!
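The small-step rules can be turned almost line for line into an interpreter. The sketch below is a minimal illustration, assuming a hypothetical expr type that covers only integers, booleans, +, if, let, and variables (no functions or match). The names expr, subst, step, and eval are our own; subst follows the substitution rules given in the Substitution section, and step applies exactly one --> step, returning None when no rule applies.

```ocaml
(* Hypothetical representation of a subset of the expression grammar. *)
type expr =
  | Int of int
  | Bool of bool
  | Var of string
  | Plus of expr * expr
  | If of expr * expr * expr
  | Let of string * expr * expr

(* Values are the expressions that cannot step any further. *)
let is_value (e : expr) : bool =
  match e with
  | Int _ | Bool _ -> true
  | Var _ | Plus _ | If _ | Let _ -> false

(* subst e v x computes e{v/x}. *)
let rec subst (e : expr) (v : expr) (x : string) : expr =
  match e with
  | Int _ | Bool _ -> e
  | Var y -> if y = x then v else e
  | Plus (e1, e2) -> Plus (subst e1 v x, subst e2 v x)
  | If (e1, e2, e3) -> If (subst e1 v x, subst e2 v x, subst e3 v x)
  | Let (y, e1, e2) ->
      (* An inner let that rebinds x shadows it, so its body is untouched. *)
      if y = x then Let (y, subst e1 v x, e2)
      else Let (y, subst e1 v x, subst e2 v x)

(* step e performs exactly one small step e --> e'.
   It returns None if e is a value or is stuck (e.g. true + 1). *)
let rec step (e : expr) : expr option =
  match e with
  | Int _ | Bool _ | Var _ -> None
  | Plus (Int n1, Int n2) -> Some (Int (n1 + n2))
  | Plus (e1, e2) when not (is_value e1) ->
      Option.map (fun e1' -> Plus (e1', e2)) (step e1)
  | Plus (v1, e2) -> Option.map (fun e2' -> Plus (v1, e2')) (step e2)
  | If (Bool true, e2, _) -> Some e2
  | If (Bool false, _, e3) -> Some e3
  | If (e1, e2, e3) -> Option.map (fun e1' -> If (e1', e2, e3)) (step e1)
  | Let (x, e1, e2) when is_value e1 -> Some (subst e2 e1 x)
  | Let (x, e1, e2) -> Option.map (fun e1' -> Let (x, e1', e2)) (step e1)

(* Keep stepping until no rule applies. *)
let rec eval (e : expr) : expr =
  match step e with
  | None -> e
  | Some e' -> eval e'
```

For example, one call to step on the representation of let x = (3 + 5) in (x + x) yields let x = 8 in (x + x), exactly as in the warning above, while eval keeps going through 8 + 8 to 16. Option.map requires OCaml 4.08 or later.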
Substitution
============

The rules we are presenting here are called the "substitution model", because variables are handled by plugging in their values as the expression is simplified. For example, we have

    (let x = 3 in x + x)  -->  (3 + 3)

Here we've plugged in 3 for every occurrence of x.

We have to be a little careful when substituting values into expressions that themselves define new variables. Here are the rules:

    n{v/x}                   = n
    x{v/x}                   = v
    y{v/x}                   = y                             if y is not equal to x
    (e1 + e2){v/x}           = e1{v/x} + e2{v/x}
    (let x = e1 in e2){v/x}  = let x = e1{v/x} in e2
    (let y = e1 in e2){v/x}  = let y = e1{v/x} in e2{v/x}    if y is not equal to x

In the first let rule, the inner let rebinds x, so every x in e2 refers to the new binding; that is why the substitution does not go into e2.

It is a good exercise to write these down without looking at the notes. For practice, you can work out (fun x -> e){v/y} both in the case where x = y and in the case where x <> y.

Type checking
=============

In OCaml, every expression has a type. Before evaluating any expression, OCaml first checks that it can be given a type. Unlike many languages with static types (types that are assigned before the program runs), OCaml usually lets you leave out type annotations, because it infers the types for you.

When talking about types, we will write e : t to indicate "e has type t". You can also write this in an OCaml program:

    e ::= ... | (e : t)

Writing (e : t) tells the compiler to check that the inferred type for e is compatible with the type you wrote down.

There are the usual base types:

    t ::= int | char | bool | float | string

with rules for type checking the values of those types:

    n      : int
    fpn    : float
    true   : bool
    false  : bool
    "..."  : string
    'x'    : char

Other expressions are type checked by looking at their subexpressions:

    e1 *. e2 : float             if e1 : float and e2 : float
    if e1 then e2 else e3 : t    if e1 : bool and e2 : t and e3 : t
    let x = e1 in e2 : t         if e1 : t1 and (under the assumption that x : t1) e2 : t

Functions
=========

OCaml is a functional language, which means that functions are values:

    v ::= ...
        | (fun x -> e)

The value (fun x -> e) is a function that takes an argument and, when applied, evaluates to e with the argument plugged in for x. For example, here is a function that squares its argument:

    fun x -> x * x

Note that this function has no name; functions are anonymous. However, we can give them a name using a let expression:

    let square = fun x -> x * x

This syntax is a little cumbersome, so OCaml has a much nicer shorthand for defining functions; the above can be written as

    let square x = x * x

You can read this as "whenever you see square x, replace it with x * x", but keep in mind that this is just shorthand for the expression above using the fun keyword.

Functions are applied by separating them from their arguments with whitespace:

    e ::= ... | e1 e2

This is different from the way you are used to writing function calls, with parentheses; in Java you would write e1 e2 as e1(e2).

Functions are evaluated using substitution:

    (fun x -> e) v  -->  e{v/x}

As with the other kinds of expressions, there are also substitution-model rules for evaluating the function and its argument when they are not yet values:

    e1 e2  -->  e1' e2    if e1 --> e1'
    v1 e2  -->  v1 e2'    if e2 --> e2'

Function types are written using ->. t1 -> t2 is the type of functions that take an argument of type t1 and produce a value of type t2. For example:

    (fun x -> x + 3) : int -> int

because the function can only take an int as an argument (since + only operates on ints), and if passed an int, it evaluates to an int. Similarly,

    float_of_int : int -> float

In general, the rule for type checking a function is

    (fun x -> e) : t1 -> t2    if e : t2 under the assumption that x : t1

Variant types
=============

OCaml has a very nice feature for defining your own types. You can define a new type name by writing "type name = ..." at the top level (note that this is not an expression). For example:

    type foo = int
    type bar = char

This defines a type synonym; from now on you can use "foo" and "int" interchangeably.
This helps with documentation.

This feature can also be used to define new types whose values are one of a list of options. For example:

    type suit = Hearts | Spades | Diamonds | Clubs

This defines a new type suit that has 4 values (Hearts, Spades, Diamonds, and Clubs). These types are called variant types, algebraic datatypes, or sum types.

You can also associate data with the different values of a variant type using the "of" keyword:

    type x = Foo of float | Int of int

This type now has the following values:

    Foo 3.0, Foo 0.1, ..., Int 0, Int 1, ...

The labels that are used to construct variant values are called constructors. They must start with a capital letter.

    v ::= ... | C v
    e ::= ... | C e

Pattern matching
================

Variants (and other values) can be pulled apart using a match expression. A match expression compares a value with a list of patterns; the first pattern that matches is selected, and the whole match expression evaluates to the associated branch. For example, consider the following match expression:

    match x with
      | Foo y -> y
      | Int z -> float_of_int z

If x is actually Foo 3.5, then it matches the first pattern (Foo y) if we plug in 3.5 for y. Therefore, the whole expression steps to the right-hand side of the Foo y pattern, with 3.5 substituted for y:

    match Foo 3.5 with Foo y -> y | Int z -> float_of_int z  -->  3.5

However, if x were Int 9, then there is no choice of y that would make Int 9 match Foo y, so we drop the first branch and proceed to the second pattern. Here, z must be 9 to make the match work, so the expression then steps to float_of_int z with 9 substituted for z:

    match Int 9 with Foo y -> y | Int z -> float_of_int z
      -->  match Int 9 with Int z -> float_of_int z
      -->  float_of_int 9

In general, a pattern is a value with some parts replaced by variables. The pattern matches a value if there is some assignment of values to those variables that makes the pattern equal to the value. That is, p matches v if p{v1/x1}{v2/x2}...{vn/xn} = v.
Using this definition, the rules described above can be formalized:

    match e with | p1 -> e1 | p2 -> e2 | ...
      -->  match e' with | p1 -> e1 | p2 -> e2 | ...
      if e --> e'

    match v with | p1 -> e1 | p2 -> e2 | ...
      -->  e1{v1/x1}{v2/x2}...{vn/xn}
      if v = p1{v1/x1}{v2/x2}...{vn/xn}

    match v with | p1 -> e1 | p2 -> e2 | ...
      -->  match v with | p2 -> e2 | ...
      if v does not match p1
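As a sketch of how these pieces fit together in real code, the block below defines the two variant types from this section and wraps the example match expression in a function. The function names to_float, is_hearts, and square' are hypothetical, and the wildcard pattern _ (which matches any value) has not been covered in these notes yet.

```ocaml
(* The variant types defined in these notes. *)
type suit = Hearts | Spades | Diamonds | Clubs
type x = Foo of float | Int of int

(* The example match expression, wrapped in a function so that it can be
   applied to different values of type x. *)
let to_float (v : x) : float =
  match v with
  | Foo y -> y                 (* Foo 3.5 matches this pattern with y = 3.5 *)
  | Int z -> float_of_int z    (* Int 9 fails Foo y, then matches with z = 9 *)

(* Patterns are tried top to bottom; the wildcard _ matches any value,
   so that branch is reached only when Hearts did not match. *)
let is_hearts (s : suit) : bool =
  match s with
  | Hearts -> true
  | _ -> false

(* A function value bound with let and annotated with its type int -> int,
   followed by the equivalent shorthand definition. *)
let square : int -> int = fun n -> n * n
let square' n = n * n
```

Evaluating to_float (Int 9) follows the three match rules above: the first branch is dropped because Int 9 does not match Foo y, and the second branch then yields float_of_int 9, which is 9.0.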