
Introduction to 3110


Topics:


You might think this course is about OCaml. It's not.

You might think this course is about data structures. It's not.

You might think this course is about "weeding out" from the CS major. It's not.

Nan-in, a Japanese master during the Meiji era (1868-1912), received a university professor who came to inquire about Zen. Nan-in served tea. He poured his visitor's cup full, and then kept on pouring. The professor watched the overflow until he no longer could restrain himself. "It is overfull. No more will go in!" "Like this cup," Nan-in said, "you are full of your own opinions and speculations. How can I show you Zen unless you first empty your cup?"

This course is about making you a better programmer.

It's been observed that productivity among professional programmers can differ by a factor of 10. Programming isn't hard. Programming well is very hard.

Programming languages

A great general-purpose programming language...

There are probably thousands of general-purpose languages. But there are no universally great programming languages.

General-purpose languages come and go. In your life you'll likely learn a handful. Today, it's Java and C++. Yesterday, it was Pascal and C. Before that, it was Fortran and Lisp. Who knows what it will be tomorrow? And you'll likely use dozens of special-purpose languages for particular projects. In this fast-changing field you need to be able to adapt rapidly.

A good programmer has to learn how to learn new languages.

We use a zillion different programming languages to communicate with machines and one another:

It's crucial that you understand the principles behind programming that transcend the details of any particular language. There's no better way to get at these principles than to approach programming from a completely different perspective.

OCaml

We begin this course by studying OCaml for that very reason: it's a vastly different perspective from what most of you will have seen in previous programming courses. Since you've already taken 1110 and 2110, you have learned how to program. This course gives you the opportunity to now learn a new language from scratch and reflect along the way about the difference between programming and programming in a language.

"A language that doesn't affect the way you think about programming is not worth knowing." —Alan J. Perlis (1922-1990), first recipient of the Turing Award

OCaml will change the way you think about programming.

OCaml is a functional programming language. The key linguistic abstraction of functional languages is the mathematical function. A function maps an input to an output; for the same input, it always produces the same output. That is, mathematical functions are stateless: they do not maintain any extra information or state that persists between usages of the function. Functions are first-class: you can use them as input to other functions, and produce functions as output. Expressing everything in terms of functions enables a uniform and simple programming model that is easier to reason about than the procedures and methods found in other families of languages.
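To make "first-class" concrete, here is a minimal sketch. The names square, twice, add, fourth_power, and add_ten are made up for illustration and are not part of any library.

```ocaml
(* A pure function: the same input always yields the same output. *)
let square x = x * x

(* A function that takes another function as input and applies it twice. *)
let twice f x = f (f x)

(* A function that produces a function as output. *)
let add n = fun x -> x + n

(* New functions built purely by combining existing ones. *)
let fourth_power = twice square
let add_ten = twice (add 5)

let () =
  assert (fourth_power 3 = 81);
  assert (add_ten 1 = 11)
```

Nothing here stores or updates any state: each result depends only on the arguments passed in.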

OCaml supports a number of advanced features, some of which you will have encountered before, and some of which are likely to be new:

OCaml is a statically-typed and type-safe programming language. A statically-typed language detects type errors at compile time, so that programs with type errors cannot be executed. A type-safe language limits which kinds of operations can be performed on which kinds of data. In practice, this prevents a lot of silly errors (e.g., treating an integer as a function) and also prevents a lot of security problems: over half of the reported break-ins at the Computer Emergency Response Team (CERT, a US government agency tasked with cybersecurity) were due to buffer overflows, something that's impossible in a type-safe language.

Some languages, like Scheme and Lisp, are type-safe but dynamically typed. That is, type errors are caught only at run time. Other languages, like C and C++, are statically typed but not type safe. There's no guarantee that a type error won't occur.
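A small sketch of what these two properties look like in practice. The helper names double and safe_get are hypothetical, and the example assumes the compiler's default bounds checking is left on.

```ocaml
(* Static typing: the compiler infers the type int -> int for [double]
   and would reject a call such as [double "hello"] before the program
   ever runs. *)
let double x = 2 * x

(* Type safety: an out-of-bounds read cannot silently touch neighboring
   memory. With bounds checking enabled (the default), the access raises
   Invalid_argument, which we turn into [None] here. *)
let safe_get (a : int array) (i : int) : int option =
  try Some a.(i) with Invalid_argument _ -> None

let () =
  let a = [| 10; 20; 30 |] in
  assert (double 21 = 42);
  assert (safe_get a 1 = Some 20);
  assert (safe_get a 7 = None)
```

The equivalent out-of-bounds read in C would compile and then read whatever happens to sit past the end of the array.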

Genealogically, OCaml comes from the line of programming languages whose grandfather is Lisp and includes modern languages such as Clojure, F#, Haskell, and Racket. Functional languages have a surprising tendency to predict the future of more mainstream languages. Java brought garbage collection into the mainstream in 1995; Lisp had it in 1958. Java didn't have generics until version 5 in 2004; the ML family had them in 1990. First-class functions and type inference have been incorporated into mainstream languages like Java, C#, and C++ over the last 10 years, long after functional languages introduced them. By studying functional programming, you get a taste of what might be coming down the pike next. Who knows what it might be? (My bet would be pattern matching.)


A digression on the history of OCaml.

Robin Milner and others at the Edinburgh Laboratory for Computer Science in Scotland were working on theorem provers in the late '70s and early '80s. Traditionally, theorem provers were implemented in languages such as Lisp. Milner kept running into the problem that the theorem provers would sometimes put incorrect "proofs" (i.e., non-proofs) together and claim that they were valid. So he tried to develop a language that only allowed you to construct valid proofs. ML, which stands for "Meta Language", was the result of that work. The type system of ML was carefully constructed so that you could only construct valid proofs in the language. A theorem prover was then written as a program that constructed a proof. Eventually, this "Classic ML" evolved into a full-fledged programming language.

In the early '80s, there was a schism in the ML community with the French on one side and the British and US on another. The French went on to develop CAML and later Objective CAML (OCaml) while the Brits and Americans developed Standard ML. The two dialects are quite similar. Microsoft introduced its own variant of OCaml called F# in 2005.

Milner received the Turing Award in 1991 in large part for his work on ML. The award citation includes this praise: "ML was way ahead of its time. It is built on clean and well-articulated mathematical ideas, teased apart so that they can be studied independently and relatively easily remixed and reused. ML has influenced many practical languages, including Java, Scala, and Microsoft's F#. Indeed, no serious language designer should ignore this example of good design."


Mutability

Imperative programming languages such as C and Java involve mutable state that changes throughout execution. Commands specify how to compute by destructively changing that state. Procedures (or methods) can have side effects that update state in addition to producing a return value.

The fantasy of mutability is that it's easy to reason about: the machine does this, then this, etc.

The reality of mutability is that whereas machines are good at complicated manipulation of state, humans are not good at understanding it. The essence of why that's true is that mutability breaks referential transparency: the ability to replace an expression with its value without affecting the result of a computation. In math, if f(x)=y, then you can substitute y anywhere you see f(x). In imperative languages, you cannot: f might have side effects, so computing f(x) at time t might result in a different value than at time t'.
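The loss of referential transparency can be sketched in a few lines, using OCaml's mutable references. The names f, g, and counter are chosen only for this illustration.

```ocaml
(* Pure: [f 3] can always be replaced by its value, 4. *)
let f x = x + 1

(* Impure: [g] reads and updates hidden mutable state on every call. *)
let counter = ref 0

let g x =
  counter := !counter + 1;
  x + !counter

let () =
  assert (f 3 = f 3);
  let first = g 3 in
  let second = g 3 in
  (* Same argument, different results: [g 3] has no single value that
     could be substituted for it. *)
  assert (first <> second)
```

To predict what g returns, a reader has to know how many times it has already been called anywhere in the program.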

It's tempting to believe that there's a single state that the machine manipulates, and that the machine does one thing at a time. Computer systems go to great lengths in attempting to provide that illusion. But it's just that: an illusion. In reality, there are many states, spread across threads, cores, processors, and networked computers. And the machine does many things concurrently. Mutability makes reasoning about distributed state and concurrent execution immensely difficult.

Immutability, however, frees the programmer from these concerns. It provides powerful ways to build correct and concurrent programs. OCaml is primarily an immutable language, like most functional languages. It does support imperative programming with mutable state, but we won't use those features until about two months into the course, in part because we simply won't need them, and in part to get you to quit "cold turkey" from a dependence you might not have known that you had. This freedom from mutability is one of the biggest changes in perspective that 3110 can give you.
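As a rough sketch of what programming without mutation looks like, "changing" a value means building a new one while the old one stays valid. The type point and the names p, q, xs, and ys are invented for this example.

```ocaml
type point = { x : int; y : int }

let p = { x = 1; y = 2 }

(* "Updating" a record builds a new record; [p] is untouched. *)
let q = { p with x = 10 }

let xs = [1; 2; 3]

(* Consing onto a list returns a new list that shares [xs] as its tail. *)
let ys = 0 :: xs

let () =
  assert (p.x = 1 && q.x = 10 && q.y = 2);
  assert (xs = [1; 2; 3]);
  assert (ys = [0; 1; 2; 3])
```

Because p and xs can never change, any code holding them can rely on their contents without worrying about what other code does.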

Industry

OCaml and other functional languages are nowhere near as popular as C, C++, and Java. OCaml's real strength lies in language manipulation (i.e., compilers, analyzers, verifiers, provers, etc.). This is not surprising, because OCaml evolved from the domain of theorem proving.

That's not to say that functional languages aren't used in industry. There are many industry projects using OCaml and Haskell, among other languages. A Cornellian, Yaron Minsky (PhD '02), wrote a paper about using OCaml in the financial industry (that link must be accessed from inside Cornell's network). It explains how the features of OCaml make it a good choice for quickly building complex software that works.

But ultimately this course is about your education as a programmer, not about finding you a job.

"Education is what remains after one has forgotten everything one learned in school." —Albert Einstein

OCaml does a great job of clarifying and simplifying the essence of functional programming in a way that other languages that blend functional and imperative programming (like Scala) or take functional programming to the extreme (like Haskell) do not. Having learned OCaml, you'll be well equipped to teach yourself any other functional(-inspired) language.

Beauty

A final, non-scientific, subjective reason to study OCaml that I will put forth as my own opinion: OCaml is beautiful.

"Beauty is our Business" —title of a book in honor of Edsger W. Dijkstra

(Dijkstra was the recipient of the Turing Award in 1972 for "fundamental contributions to programming." David Gries was an editor of the book.)

OCaml is elegant, simple, and graceful. The code you write can be stylish and tasteful. At first, this might not be apparent. You are learning a new language, after all: you wouldn't expect to appreciate Sanskrit poetry on day 1 of SANSK 1131. In fact, you'll likely feel frustrated for a while as you struggle to express yourself in a new language. So give it some time. I've lost track of how many students have come back to tell me in later semesters how "ugly" other languages felt when they returned to them after 3110.

Aesthetics do matter. Code isn't written just to be executed by machines. It's also written to communicate to humans. Elegant code is easier to read and maintain. It isn't necessarily easier to write.

What about data structures?

That phrase is in the course title for historical reasons, as I understand it. There was once a 4000-level course titled Data Structures. It got split up and some of it injected into CS 2110 (Object-oriented Programming and Data Structures) and some of it into CS 3110 (Data Structures and Functional Programming). Over time more of the data structures content has migrated into 2110. We'll use data structures as guiding examples in this course, especially lists, trees, and dictionaries. But they aren't the primary content.

Summary

This course is about becoming a better programmer. Studying functional programming will help with that. The biggest obstacle in our way is the frustration of speaking a new language, particularly letting go of mutable state. But the benefits will be great: a discovery that programming transcends programming in any particular language or family of languages, an exposure to advanced language features, and an appreciation of beauty.

Terms and concepts

Further reading