# Introduction to 3110
* * *
Topics:
* what 3110 is and is not about
* functional programming and why we study it
* the features and history of OCaml
* * *
You might think this course is about OCaml. It's not.
You might think this course is about data structures. It's not.
You might think this course is about "weeding out" students from the CS major. It's not.
> Nan-in, a Japanese master during the Meiji era (1868-1912),
> received a university professor who came to inquire about Zen.
> Nan-in served tea. He poured his visitor's cup full, and then
> kept on pouring. The professor watched the overflow until he
> no longer could restrain himself. "It is overfull. No more will go in!"
> "Like this cup," Nan-in said, "you are full of your own opinions
> and speculations. How can I show you Zen unless you first empty your cup?"
**This course is about making you a better programmer.**
It's been observed that there is a [10x difference][10x] in productivity
among professional programmers. Programming isn't hard.
Programming well is very hard.
[10x]: http://www.construx.com/10x_Software_Development/Productivity_Variations_Among_Software_Developers_and_Teams__The_Origin_of_10x/
## Programming languages
A great general-purpose programming language...
- lets you say things concisely and understandably at the right level
of abstraction
- lets you extend the language with new features that are specific to
a domain but blend in well with the rest of the language.
- makes it easy to write *correct* code, with good performance
- makes it easy to change the code when you find out the specification
has changed
- makes it easy to re-use code
- is easy to learn.
There are probably thousands of general-purpose languages.
But there are no universally great programming languages.
General-purpose languages come and go. In your life you'll likely learn
a handful. Today, it's Java and C++. Yesterday, it was Pascal and C.
Before that, it was Fortran and Lisp. Who knows what it will be tomorrow?
And you'll likely use dozens of special-purpose languages for particular
projects. In this fast-changing field you need to be able to
rapidly adapt.
**A good programmer has to learn *how to learn* new languages.**
We use a zillion different programming languages to communicate with
machines and one another:
- general-purpose and scripting: Fortran, Lisp, Basic, C, Pascal, Scheme, C++,
Java, C#, Visual Basic, Perl, Python, Ruby, PHP, JavaScript, Clojure, Scala,
Erlang, Swift, ...
- tools: awk, sed, tcl, sh, csh, bash, ...
- search: regular expressions, browser queries, SQL, ...
- display and rendering: PostScript, PDF, HTML, XML, ...
- hardware: CCS, VHDL, Verilog, ...
- theorem proving and mathematics: Mathematica, Maple, Matlab, R, NuPRL,
Isabelle/HOL, ACL2, Coq
It's crucial that you understand the *principles* behind programming
that transcend the specifics of any one language. There's no better
way to get at these principles than to approach programming from a
completely different perspective.
## OCaml
We begin this course by studying OCaml for that very reason:
it's a vastly different perspective from what most of you will have seen
in previous programming courses. Since you've already taken 1110 and 2110, you
have learned how to program. This course gives you the opportunity to
now learn a new language from scratch and reflect along the way about
the difference between *programming* and *programming in a language.*
> "A language that doesn't affect the way you think about
> programming is not worth knowing."
> —Alan J. Perlis (1922-1990), first recipient of the Turing Award
**OCaml will change the way you think about programming.**
OCaml is a *functional* programming language. The key linguistic
abstraction of functional languages is the mathematical function. A
function maps an input to an output; for the same input, it always
produces the same output. That is, mathematical functions are
*stateless*: they do not maintain any extra information or *state* that
persists between usages of the function. Functions are *first-class*:
you can use them as input to other functions, and produce functions as
output. Expressing everything in terms of functions enables a uniform
and simple programming model that is easier to reason about than the
procedures and methods found in other families of languages.
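Here is a minimal sketch of those ideas in OCaml. The names `square`, `twice`, and `add` are made up for illustration; only `List.map`, `List.iter`, and `Printf.printf` come from the standard library.

```ocaml
(* A pure function: the same input always yields the same output. *)
let square x = x * x

(* A function as input: [twice] takes a function [f] and applies it two times. *)
let twice f x = f (f x)

(* A function as output: [add n] returns a new function that adds [n]. *)
let add n = fun x -> x + n

let () =
  let add3 = add 3 in
  Printf.printf "%d\n" (twice square 3);   (* square (square 3) = 81 *)
  Printf.printf "%d\n" (twice add3 10);    (* (10 + 3) + 3 = 16 *)
  (* Functions can also be handed to library functions such as List.map. *)
  List.iter (Printf.printf "%d ") (List.map square [1; 2; 3]);
  print_newline ()
```

Notice that `twice` works with any function whose output type matches its input type; nothing about it is specific to integers.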
OCaml supports a number of advanced features, some of which you will
have encountered before, and some of which are likely to be new:
- **Algebraic datatypes:** You can build sophisticated data
structures in OCaml easily, without fussing with pointers and
memory management. *Pattern matching* makes them even more
convenient.
- **Type inference:** You do not have to write type information
down everywhere. The compiler automatically figures out most
types. This can make the code easier to read and maintain.
- **Parametric polymorphism:** Functions and data
structures can be parameterized over types. This is crucial for
being able to re-use code.
- **Garbage collection:** Automatic memory
management relieves you from the burden of memory allocation and deallocation,
a common source of bugs in languages such as C.
- **Modules:** OCaml makes it easy to structure large
systems through the use of modules. Modules (called *structures*)
are used to encapsulate implementations behind interfaces (called
*signatures*). OCaml goes well beyond the functionality of most
languages with modules by providing functions that manipulate
modules (called *functors*).
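To make a few of these features concrete, here is a small sketch rather than a definitive design. The type `tree`, the functions `size` and `to_list`, and the module `Counter` with its signature `COUNTER` are all hypothetical names invented for this example. Garbage collection is at work throughout (nothing is ever freed by hand), and functors are left out to keep the sketch short.

```ocaml
(* Algebraic datatype, parameterized over the element type 'a
   (parametric polymorphism). *)
type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* Pattern matching on the two constructors. No type annotations are
   needed: the compiler infers size : 'a tree -> int. *)
let rec size t =
  match t with
  | Leaf -> 0
  | Node (l, _, r) -> size l + 1 + size r

(* In-order traversal; the inferred type is 'a tree -> 'a list. *)
let rec to_list t =
  match t with
  | Leaf -> []
  | Node (l, v, r) -> to_list l @ (v :: to_list r)

(* A signature: the interface that clients are allowed to see. *)
module type COUNTER = sig
  type t
  val zero : t
  val next : t -> t
  val value : t -> int
end

(* A structure: the implementation, hidden behind the signature.
   Outside the module, nobody can tell that t is really int. *)
module Counter : COUNTER = struct
  type t = int
  let zero = 0
  let next n = n + 1
  let value n = n
end

(* The same tree type and functions work for ints and for strings. *)
let ints = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
let strs = Node (Leaf, "a", Node (Leaf, "b", Leaf))

let () =
  Printf.printf "%d %d\n" (size ints) (size strs);          (* 3 2 *)
  Printf.printf "%d\n" Counter.(value (next (next zero)))   (* 2 *)
```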
OCaml is a *statically typed* and *type-safe* programming language. A
statically typed language detects type errors at compile time, so that
programs with type errors cannot be executed. A type-safe language
ensures that you don't apply operations to the wrong data. In practice,
this prevents a lot of silly errors (e.g., treating an integer as a
function) and also prevents a lot of security problems: over half of
the reported break-ins at the Computer Emergency Response Team (CERT, a
US government agency tasked with cybersecurity) were due to buffer
overflows, something that's impossible in a type-safe language.
Some languages, like Scheme and Lisp, are type-safe but *dynamically
typed*. That is, type errors are caught only at run-time. Other
languages, like C and C++, are statically typed but not type-safe.
There's no guarantee that a type error won't occur at run time.
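As a rough illustration of both properties, consider the sketch below. The helper `safe_get` is a hypothetical name, and the example assumes the default compiler settings, which keep array bounds checking on (the `-unsafe` compiler flag turns it off).

```ocaml
(* Static typing: the definition below would be rejected at compile time,
   because 5 is an int and cannot be applied as a function:

     let oops = 5 3
*)

(* Type safety at run time: reading past the end of an array raises the
   Invalid_argument exception instead of silently returning whatever
   memory happens to lie beyond the array. *)
let safe_get arr i =
  try Some arr.(i) with Invalid_argument _ -> None

let () =
  let a = [| 10; 20; 30 |] in
  (match safe_get a 1 with
   | Some v -> Printf.printf "a.(1) = %d\n" v
   | None -> print_endline "index 1 is out of bounds");
  (match safe_get a 7 with
   | Some v -> Printf.printf "a.(7) = %d\n" v
   | None -> print_endline "index 7 is out of bounds")
```

The out-of-bounds read is caught and reported; it never becomes a buffer overflow.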
Genealogically, OCaml comes from the line of programming languages whose
grandfather is Lisp and that includes modern languages such as Clojure, F#, Haskell,
and Racket. Functional languages have a surprising tendency to
predict the future of more mainstream languages. Java brought garbage
collection into the mainstream in 1995; Lisp had it in 1958. Java didn't
have generics until version 5 in 2004; the ML family had them in 1990.
First-class functions and type inference have been incorporated into
mainstream languages like Java, C#, and C++ over the last 10 years, long
after functional languages introduced them. By studying functional
programming, you get a taste of what might be coming down the pipe next.
Who knows what it might be? (My bet would be pattern matching.)
* * *
**A digression on the history of OCaml.**
Robin Milner and others at the Edinburgh Laboratory for Computer Science
in Scotland were working on theorem provers in the late '70s and early
'80s. Traditionally, theorem provers were implemented in languages such
as Lisp. Milner kept running into the problem that the theorem provers
would sometimes put incorrect "proofs" (i.e., non-proofs) together and
claim that they were valid. So he tried to develop a language that only
allowed you to construct valid proofs. ML, which stands for "Meta
Language", was the result of that work. The type system of ML was
carefully constructed so that you could only construct valid proofs in
the language. A theorem prover was then written as a program that
constructed a proof. Eventually, this "Classic ML" evolved into a
full-fledged programming language.
In the early '80s, there was a schism in the ML community with the
French on one side and the British and US on another. The French went
on to develop CAML and later Objective CAML (OCaml) while the Brits and
Americans developed Standard ML. The two dialects are quite similar.
Microsoft introduced its own variant of OCaml called F# in 2005.
Milner received the Turing Award in 1991 in large part for his work on ML.
The award citation includes this praise: "ML was way ahead of its time.
It is built on clean and well-articulated mathematical ideas, teased apart
so that they can be studied independently and relatively easily remixed and
reused. ML has influenced many practical languages, including Java, Scala,
and Microsoft's F#. Indeed, no serious language designer should ignore
this example of good design."
* * *
## Mutability
*Imperative* programming languages such as C and Java involve *mutable*
state that changes throughout execution. *Commands* specify how to
compute by destructively changing that state. Procedures (or methods)
can have *side effects* that update state in addition to producing a
return value.
The **fantasy of mutability** is that it's easy to reason about: the
machine does this, then this, etc.
The **reality of mutability** is that whereas machines are good at
complicated manipulation of state, humans are not good at understanding
it. The essence of why that's true is that mutability breaks
*referential transparency*: the ability to replace an expression with its
value without affecting the result of a computation. In math, if f(x)=y,
then you can substitute y anywhere you see f(x). In imperative
languages, you cannot: f might have side effects, so computing f(x) at
time t might result in a different value than at time t'.
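The difference can be sketched in OCaml itself, using one of the imperative features (a mutable reference) to play the villain. The names `f`, `g`, and `counter` are invented for this example.

```ocaml
(* Pure: f x can always be replaced by its value. *)
let f x = x + 1

(* Impure: g reads and updates hidden mutable state on every call. *)
let counter = ref 0
let g x =
  counter := !counter + 1;
  x + !counter

let () =
  (* Substituting the value of f 10 for the call changes nothing. *)
  let y = f 10 in
  Printf.printf "%b\n" (f 10 + f 10 = y + y);   (* true *)
  (* The same substitution changes the result for g. *)
  let direct = g 10 + g 10 in   (* two calls: 11 and 12, so 23 *)
  let z = g 10 in               (* third call: 13 *)
  Printf.printf "%d %d\n" direct (z + z)        (* 23 26 *)
```

With `f`, you can reason about one call in isolation. With `g`, you have to know how many times it has been called before.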
It's tempting to believe that there's a single state that the machine
manipulates, and that the machine does one thing at a time. Computer
systems go to great lengths in attempting to provide that illusion. But
it's just that: an illusion. In reality, there are many states, spread
across threads, cores, processors, and networked computers. And the
machine does many things concurrently. Mutability makes reasoning about
distributed state and concurrent execution immensely difficult.
*Immutability*, however, frees the programmer from these concerns. It provides
powerful ways to build correct and concurrent programs. OCaml is primarily
an immutable language, like most functional languages. It does support
imperative programming with mutable state, but we won't use those features
until about two months into the course—in part because we simply won't need
them, and in part to get you to quit "cold turkey" from a dependence you might
not have known that you had. This freedom from mutability is one of the biggest
changes in perspective that 3110 can give you.
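For a taste of what programming without mutation looks like, here is a small sketch. The record type `point` and the function `move_right` are hypothetical names made up for this example.

```ocaml
(* Record fields are immutable unless explicitly declared mutable. *)
type point = { x : int; y : int }

(* "Updating" a point builds a new record; the argument is left untouched. *)
let move_right p = { p with x = p.x + 1 }

let () =
  let p = { x = 0; y = 0 } in
  let q = move_right p in
  Printf.printf "p.x = %d, q.x = %d\n" p.x q.x;   (* p.x = 0, q.x = 1 *)
  (* Lists behave the same way: consing onto xs creates ys and leaves xs alone. *)
  let xs = [2; 3] in
  let ys = 1 :: xs in
  Printf.printf "%d %d\n" (List.length xs) (List.length ys)   (* 2 3 *)
```

Because `p` and `xs` can never change, any code holding on to them, on any thread, sees the same values forever.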
## Industry
OCaml and other functional languages are nowhere near as popular
as C, C++, and Java. OCaml's real strength lies in language manipulation
(i.e., compilers, analyzers, verifiers, provers, etc.). This is not
surprising, because OCaml evolved from the domain of theorem proving.
That's not to say that functional languages aren't used in industry.
There are many [industry projects using OCaml][ocaml-industry]
and [Haskell][haskell-industry], among other languages. A Cornellian,
Yaron Minsky (PhD '02), wrote a paper about [using OCaml in the financial
industry][minsky] (that link must be accessed from inside Cornell's network).
It explains how the features of OCaml make it a good choice for quickly
building complex software that works.
[minsky]: http://dx.doi.org/10.1017/S095679680800676X
[ocaml-industry]: https://ocaml.org/learn/companies.html
[haskell-industry]: https://wiki.haskell.org/Haskell_in_industry
But ultimately this course is about your education as a programmer, not
about finding you a job.
> "Education is what remains after one has forgotten everything one learned
> in school."
> —Albert Einstein
OCaml does a great job of clarifying and simplifying the essence of
functional programming in a way that other languages that blend
functional and imperative programming (like Scala) or take functional
programming to the extreme (like Haskell) do not. Having learned OCaml,
you'll be well equipped to teach yourself any other
functional(-inspired) language.
## Beauty
A final, non-scientific, subjective reason to study OCaml that I will put forth as
my own opinion: OCaml is beautiful.
> "Beauty is our Business"
> —title of a book in honor of Edsger W. Dijkstra
(Dijkstra was the recipient of the Turing award in 1972 for "fundamental
contributions to programming." David Gries was an editor of the book.)
OCaml is elegant, simple, and graceful. The code you write can be stylish and
tasteful. At first, this might not be apparent. You are
learning a new language after all—you wouldn't expect to appreciate
Sanskrit poetry on day 1 of [SANSK 1131][sansk1131]. In fact, you'll likely
feel frustrated for a while as you struggle to express yourself in a new language.
So give it some time. I've lost track of how many students have come back in
later semesters to tell me how "ugly" other languages felt once they returned
to writing in them after 3110.
[sansk1131]: http://lrc.cornell.edu/asian/courses/sa/sansk131
Aesthetics do matter. Code isn't written just to be executed by machines.
It's also written to communicate to humans. Elegant code is easier to
read and maintain. It isn't necessarily easier to write.
## What about data structures?
That phrase is in the course title for historical reasons, as I understand it.
There was once a 4000-level course titled Data Structures. It got split up
and some of it injected into CS 2110 (Object-oriented Programming and
Data Structures) and some of it into CS 3110 (Data Structures and Functional
Programming). Over time some of the data structures content has migrated
into 2110. We'll use data structures as guiding examples in this course,
especially lists and trees. But they aren't the primary content.
## Summary
This course is about becoming a better programmer. Studying functional
programming will help with that. The biggest obstacle in our way is
the frustration of speaking a new language, particularly letting go of
mutable state. But the benefits will be great: a discovery that
programming transcends programming in any particular language or family
of languages, an exposure to advanced language features, and an appreciation
of beauty.
## Terms and concepts
* dynamic typing
* first-class functions
* functional programming languages
* immutability
* Lisp
* ML
* OCaml
* referential transparency
* side effects
* state
* static typing
* type safety
## Further reading
* [Introduction to Objective Caml](http://courses.cms.caltech.edu/cs134/cs134b/book.pdf),
chapters 1 and 2, a freely available textbook that is recommended for this course
* [OCaml from the Very Beginning](http://ocaml-book.com/), chapter 1, a relatively
inexpensive PDF textbook that is very gentle and recommended for this course
* [A guided tour [of OCaml]](https://realworldocaml.org/v1/en/html/a-guided-tour.html):
chapter 1 of *Real World OCaml*, a more aggressive book written by some Cornellians
that some students might enjoy reading
* [The history of Standard ML](http://sml-family.org/history/): though it focuses
on the SML variant of the ML language, it's relevant to OCaml
* [The value of values](https://www.infoq.com/presentations/Value-Values): a lecture
by the designer of Clojure (a modern dialect of Lisp) on how the time of
imperative programming has passed