
# Introduction to 3110

* * *

<i>
Topics:

* what 3110 is and is not about
* functional programming and why we study it
* the features and history of OCaml

</i>

* * *

You might think this course is about OCaml. It's not.

You might think this course is about data structures. It's not.

You might think this course is about "weeding out" from the CS major. It's not.

> Nan-in, a Japanese master during the Meiji era (1868-1912),
> received a university professor who came to inquire about Zen.
> Nan-in served tea. He poured his visitor's cup full, and then
> kept on pouring. The professor watched the overflow until he
> no longer could restrain himself. "It is overfull. No more will go in!"
> "Like this cup," Nan-in said, "you are full of your own opinions
> and speculations. How can I show you Zen unless you first empty your cup?"

**This course is about making you a better programmer.**

It's been observed that there is a [10x difference][10x] in productivity between professional programmers. Programming isn't hard. Programming well is very hard.

[10x]: http://www.construx.com/10x_Software_Development/Productivity_Variations_Among_Software_Developers_and_Teams__The_Origin_of_10x/

## Programming languages

A great general-purpose programming language...

- lets you say things concisely and understandably at the right level of abstraction
- lets you extend the language with new features that are specific to a domain but blend in well with the rest of the language
- makes it easy to write *correct* code, with good performance
- makes it easy to change the code when you find out the specification has changed
- makes it easy to re-use code
- is easy to learn

There are probably thousands of general-purpose languages. But there are no universally great programming languages.

General-purpose languages come and go. In your life you'll likely learn a handful. Today, it's Java and C++. Yesterday, it was Pascal and C. Before that, it was Fortran and Lisp.
Who knows what it will be tomorrow? And you'll likely use dozens of special-purpose languages for particular projects. In this fast-changing field you need to be able to adapt rapidly.

**A good programmer has to learn *how to learn* new languages.**

We use a zillion different programming languages to communicate with machines and one another:

- general-purpose and scripting: Fortran, Lisp, Basic, C, Pascal, Scheme, C++, Java, C#, Visual Basic, Perl, Python, Ruby, PHP, JavaScript, Clojure, Scala, Erlang, Swift, ...
- tools: awk, sed, tcl, sh, csh, bash, ...
- search: regular expressions, browser queries, SQL, ...
- display and rendering: PostScript, PDF, HTML, XML, ...
- hardware: CCS, VHDL, Verilog, ...
- theorem proving and mathematics: Mathematica, Maple, Matlab, R, NuPRL, Isabelle/HOL, ACL2, Coq

It's crucial that you understand the *principles* behind programming that transcend the specifics of any one language. There's no better way to get at these principles than to approach programming from a completely different perspective.

## OCaml

We begin this course by studying OCaml for that very reason: it offers a vastly different perspective from what most of you will have seen in previous programming courses. Since you've already taken 1110 and 2110, you have learned how to program. This course gives you the opportunity to learn a new language from scratch and to reflect along the way on the difference between *programming* and *programming in a language.*

> "A language that doesn't affect the way you think about
> programming is not worth knowing."
> —Alan J. Perlis (1922-1990), first recipient of the Turing Award

**OCaml will change the way you think about programming.**

OCaml is a *functional* programming language. The key linguistic abstraction of functional languages is the mathematical function. A function maps an input to an output; for the same input, it always produces the same output.
That is, mathematical functions are *stateless*: they do not maintain any extra information or *state* that persists between uses of the function. Functions are *first-class*: you can pass them as inputs to other functions and return them as outputs. Expressing everything in terms of functions enables a uniform and simple programming model that is easier to reason about than the procedures and methods found in other families of languages.

OCaml supports a number of advanced features, some of which you will have encountered before, and some of which are likely to be new:

- **Algebraic datatypes:** You can build sophisticated data structures in OCaml easily, without fussing with pointers and memory management. *Pattern matching* makes them even more convenient.

- **Type inference:** You do not have to write type information down everywhere. The compiler automatically figures out most types. This can make the code easier to read and maintain.

- **Parametric polymorphism:** Functions and data structures can be parameterized over types. This is crucial for being able to re-use code.

- **Garbage collection:** Automatic memory management relieves you of the burden of memory allocation and deallocation, a common source of bugs in languages such as C.

- **Modules:** OCaml makes it easy to structure large systems through the use of modules. Modules (called *structures*) are used to encapsulate implementations behind interfaces (called *signatures*). OCaml goes well beyond the functionality of most languages with modules by providing functions that manipulate modules (called *functors*).

OCaml is a *statically typed* and *type-safe* programming language. A statically typed language detects type errors at compile time, so that programs with type errors cannot be executed. A type-safe language ensures that you don't apply operations to the wrong data.
In practice, this prevents a lot of silly errors (e.g., treating an integer as a function) and also prevents a lot of security problems: over half of the break-ins reported to the Computer Emergency Response Team (CERT, a US government agency tasked with cybersecurity) were due to buffer overflows, something that's impossible in a type-safe language.

Some languages, like Scheme and Lisp, are type-safe but *dynamically typed*. That is, type errors are caught only at run time. Other languages, like C and C++, are statically typed but not type-safe: there's no guarantee that a type error won't occur.

Genealogically, OCaml comes from the line of programming languages whose grandfather is Lisp, a line that includes modern languages such as Clojure, F#, Haskell, and Racket.

Functional languages have a surprising tendency to predict the future of more mainstream languages. Java brought garbage collection into the mainstream in 1995; Lisp had it in 1958. Java didn't have generics until version 5 in 2004; the ML family had them in 1990. First-class functions and type inference have been incorporated into mainstream languages like Java, C#, and C++ over the last 10 years, long after functional languages introduced them. By studying functional programming, you get a taste of what might be coming down the pipe next. Who knows what it might be? (My bet would be pattern matching.)

* * *

<i>

**A digression on the history of OCaml.** Robin Milner and others at the Edinburgh Laboratory for Computer Science in Scotland were working on theorem provers in the late '70s and early '80s. Traditionally, theorem provers were implemented in languages such as Lisp. Milner kept running into the problem that the theorem provers would sometimes put incorrect "proofs" (i.e., non-proofs) together and claim that they were valid. So he tried to develop a language that only allowed you to construct valid proofs. ML, which stands for "Meta Language", was the result of that work.
The type system of ML was carefully constructed so that you could only construct valid proofs in the language. A theorem prover was then written as a program that constructed a proof. Eventually, this "Classic ML" evolved into a full-fledged programming language.

In the early '80s, there was a schism in the ML community with the French on one side and the British and US on the other. The French went on to develop CAML and later Objective CAML (OCaml), while the Brits and Americans developed Standard ML. The two dialects are quite similar. Microsoft introduced its own variant of OCaml called F# in 2005.

Milner received the Turing Award in 1991 in large part for his work on ML. Later commentary on his award includes this praise: "ML was way ahead of its time. It is built on clean and well-articulated mathematical ideas, teased apart so that they can be studied independently and relatively easily remixed and reused. ML has influenced many practical languages, including Java, Scala, and Microsoft's F#. Indeed, no serious language designer should ignore this example of good design."

</i>

* * *

## Mutability

*Imperative* programming languages such as C and Java involve *mutable* state that changes throughout execution. *Commands* specify how to compute by destructively changing that state. Procedures (or methods) can have *side effects* that update state in addition to producing a return value.

The **fantasy of mutability** is that it's easy to reason about: the machine does this, then this, etc.

The **reality of mutability** is that whereas machines are good at complicated manipulation of state, humans are not good at understanding it. The essence of why that's true is that mutability breaks *referential transparency*: the ability to replace an expression with its value without affecting the result of a computation. In math, if f(x)=y, then you can substitute y anywhere you see f(x).
In imperative languages, you cannot: f might have side effects, so computing f(x) at time t might result in a different value than computing it at time t'.

It's tempting to believe that there's a single state that the machine manipulates, and that the machine does one thing at a time. Computer systems go to great lengths in attempting to provide that illusion. But it's just that: an illusion. In reality, there are many states, spread across threads, cores, processors, and networked computers. And the machine does many things concurrently. Mutability makes reasoning about distributed state and concurrent execution immensely difficult.

*Immutability*, however, frees the programmer from many of these concerns. It provides powerful ways to build correct and concurrent programs. OCaml is primarily an immutable language, like most functional languages. It does support imperative programming with mutable state, but we won't use those features until about two months into the course, in part because we simply won't need them, and in part to get you to quit "cold turkey" from a dependence you might not have known that you had. This freedom from mutability is one of the biggest changes in perspective that 3110 can give you.

## Industry

OCaml and other functional languages are nowhere near as popular as C, C++, and Java. OCaml's real strength lies in language manipulation (e.g., compilers, analyzers, verifiers, and provers). This is not surprising, because OCaml evolved from the domain of theorem proving.

That's not to say that functional languages aren't used in industry. There are many [industry projects using OCaml][ocaml-industry] and [Haskell][haskell-industry], among other languages. A Cornellian, Yaron Minsky (PhD '02), wrote a paper about [using OCaml in the financial industry][minsky] (that link must be accessed from inside Cornell's network). It explains how the features of OCaml make it a good choice for quickly building complex software that works.
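
To give a rough feel for the "language manipulation" that OCaml is good at, here is a minimal sketch of a toy arithmetic language: an algebraic datatype for expressions, plus an evaluator, a simplifier, and a printer, all written with pattern matching and without mutable state. The names `expr`, `eval`, `simplify`, and `to_string` are hypothetical, invented only for this illustration; this is not code from the course or from any real compiler. Run as a program, it should print `((1 * 6) + (0 + -2))`, then `(6 + -2)`, then `4`. Notice that no type annotations are needed, and that `simplify` builds a new tree rather than modifying the old one.

```ocaml
(* A tiny arithmetic language as an algebraic datatype.
   The type [expr] and its constructors are hypothetical,
   invented only for this illustration. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Mul of expr * expr
  | Neg of expr

(* Evaluate an expression by pattern matching on its shape.
   No annotations: the compiler infers eval : expr -> int. *)
let rec eval e =
  match e with
  | Int n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b
  | Neg a -> -(eval a)

(* A toy "optimizer" pass: drop additions of 0 and multiplications
   by 1. It returns a new tree; the input tree is never modified. *)
let rec simplify e =
  match e with
  | Int _ -> e
  | Add (a, b) ->
    (match simplify a, simplify b with
     | Int 0, x | x, Int 0 -> x
     | x, y -> Add (x, y))
  | Mul (a, b) ->
    (match simplify a, simplify b with
     | Int 1, x | x, Int 1 -> x
     | x, y -> Mul (x, y))
  | Neg a -> Neg (simplify a)

(* Print an expression back as fully parenthesized text. *)
let rec to_string e =
  match e with
  | Int n -> string_of_int n
  | Add (a, b) -> "(" ^ to_string a ^ " + " ^ to_string b ^ ")"
  | Mul (a, b) -> "(" ^ to_string a ^ " * " ^ to_string b ^ ")"
  | Neg a -> "-" ^ to_string a

let () =
  (* The expression (1 * 6) + (0 + -2) *)
  let e = Add (Mul (Int 1, Int 6), Add (Int 0, Neg (Int 2))) in
  print_endline (to_string e);
  print_endline (to_string (simplify e));
  print_endline (string_of_int (eval e))
```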

But ultimately this course is about your education as a programmer, not about finding you a job.

[minsky]: http://dx.doi.org/10.1017/S095679680800676X
[ocaml-industry]: https://ocaml.org/learn/companies.html
[haskell-industry]: https://wiki.haskell.org/Haskell_in_industry

> "Education is what remains after one has forgotten everything one learned
> in school."
> —Albert Einstein

OCaml does a great job of clarifying and simplifying the essence of functional programming in a way that other languages that blend functional and imperative programming (like Scala) or take functional programming to the extreme (like Haskell) do not. Having learned OCaml, you'll be well equipped to teach yourself any other functional(-inspired) language.

## Beauty

A final, non-scientific, subjective reason to study OCaml that I will put forth as my own opinion: OCaml is beautiful.

> "Beauty is our Business"
> —title of a book in honor of Edsger W. Dijkstra (Dijkstra was the recipient of the Turing Award in 1972 for "fundamental contributions to programming." David Gries was an editor of the book.)

OCaml is elegant, simple, and graceful. The code you write can be stylish and tasteful.

At first, this might not be apparent. You are learning a new language, after all; you wouldn't expect to appreciate Sanskrit poetry on day 1 of [SANSK 1131][sansk1131]. In fact, you'll likely feel frustrated for a while as you struggle to express yourself in a new language. So give it some time. I've lost track of how many students have come back in later semesters to tell me how "ugly" other languages felt when they went back to writing in them after 3110.

[sansk1131]: http://lrc.cornell.edu/asian/courses/sa/sansk131

Aesthetics do matter. Code isn't written just to be executed by machines. It's also written to communicate with humans. Elegant code is easier to read and maintain. It isn't necessarily easier to write.

## What about data structures?
That phrase is in the course title for historical reasons, as I understand it. There was once a 4000-level course titled Data Structures. It got split up, with some of it moved into CS 2110 (Object-oriented Programming and Data Structures) and some of it into CS 3110 (Data Structures and Functional Programming). Over time, some of the data structures content has migrated into 2110.

We'll use data structures as guiding examples in this course, especially lists and trees. But they aren't the primary content.

## Summary

This course is about becoming a better programmer. Studying functional programming will help with that. The biggest obstacle in our way is the frustration of speaking a new language, particularly letting go of mutable state. But the benefits will be great: a discovery that programming transcends programming in any particular language or family of languages, an exposure to advanced language features, and an appreciation of beauty.

## Terms and concepts

* dynamic typing
* first-class functions
* functional programming languages
* immutability
* Lisp
* ML
* OCaml
* referential transparency
* side effects
* state
* static typing
* type safety

## Further reading

* [Introduction to Objective Caml](http://courses.cms.caltech.edu/cs134/cs134b/book.pdf), chapters 1 and 2, a freely available textbook that is recommended for this course
* [OCaml from the Very Beginning](http://ocaml-book.com/), chapter 1, a relatively inexpensive PDF textbook that is very gentle and recommended for this course
* [A guided tour [of OCaml]](https://realworldocaml.org/v1/en/html/a-guided-tour.html): chapter 1 of *Real World OCaml*, a more aggressive book written by some Cornellians that some students might enjoy reading
* [The history of Standard ML](http://sml-family.org/history/): though it focuses on the SML variant of the ML language, it's relevant to OCaml
* [The value of values](https://www.infoq.com/presentations/Value-Values): a lecture by the designer of Clojure (a modern dialect of Lisp) on how the time of imperative programming has passed