Lecture 1: Introduction to CS 3110
What is CS 3110 About?
Course overview information can be found here.
Background on OCaml
Our first order of business in this course is to learn how to use OCaml. Why
learn another language?
We use a zillion different programming languages to communicate with machines
and each other:
- general-purpose programming: Fortran, Lisp, Basic, C, Pascal, C++, Java, etc.
- scripting: Visual Basic, awk, sed, Perl, Tcl, sh, csh, bash, Rexx, Scheme, etc.
- search: regular expressions, browser queries, SQL, etc.
- display and rendering: PostScript/PDF, HTML, XML, VRML, etc.
- hardware: CCS, VHDL, Esterel
- mathematics: Mathematica, Maple, Matlab
- others?
Though there are only a handful of general-purpose languages that you will
learn and use, you'll be learning and using special-purpose languages for the
rest of your life. Even general-purpose languages come and go.
Today, it's Java and C++. Yesterday, it was Pascal and C, before that
Fortran and Lisp. Who knows what it will be like tomorrow? You have to
learn how to learn new languages.
In addition, some projects will require that you build "little"
languages for gluing things together.
- JavaScript grew out of a little language to make web pages interactive
- protocols, like HTTP or TCP, are little languages that allow devices to talk to one another
- the command prompt of DOS handles a little shell language
- search engines on the web accept queries in a little language
- others?
We gain a lot of leverage by having good notation and good language support
for a given domain.
- Perl is extremely useful for searching through documents because of its built-in support for regular expressions
- SQL is a very high-level language that makes it easy to do database transactions in a scalable way.
So it's important to understand programming models and programming paradigms, because in this fast-changing field you need to be able to adapt rapidly.
It's crucial that you understand the principles behind programming that transcend the specifics of today.
There's no better way to get at these principles than to approach programming
from a completely different perspective.
This is one reason why we're using ML -- it's different from what most
of you will have seen.
A great general-purpose programming language:
- lets you say things concisely and understandably at the right level of abstraction
- has good support for functional-style programming -- programs without the use of state or assignment
- supports paradigms that are widely used in concurrent and massively parallel programming, such as map-reduce
- lets you extend the language with new features that are specific to a domain but blend in well with the rest of the language
- makes it easy to write correct code with good performance
- makes it easy to change the code when you find out the specification has changed
- makes it easy to re-use code
- is easy to learn
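To make the map-reduce item concrete, here is a minimal OCaml sketch of the pattern on a single machine: every element is transformed independently with `List.map` (the "map" phase), and the partial results are combined with an associative operation via `List.fold_left` (the "reduce" phase). The name `sum_of_squares` is purely illustrative; a real map-reduce system would also spread the map phase across many machines, which this sketch does not attempt.

```ocaml
(* Map-reduce in miniature: transform each element independently,
   then combine the results with an associative operator. *)
let sum_of_squares (xs : int list) : int =
  xs
  |> List.map (fun x -> x * x)   (* map phase: square each element *)
  |> List.fold_left ( + ) 0      (* reduce phase: add the squares up *)

let () =
  Printf.printf "%d\n" (sum_of_squares [1; 2; 3; 4])  (* prints 30 *)
```

Because each squaring is independent of the others, the map phase could in principle be split across workers without changing the result.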
Fact: there are hundreds of general purpose languages.
Corollary: there are no great programming languages.
But there are some pretty good ones. Java and OCaml are pretty good
general-purpose languages (at least when compared to their predecessors).
OCaml is a functional programming language.
- genealogically, it fits into the Lisp, Scheme, Standard ML, Miranda, Hope, Haskell, etc. line of programming languages.
- Lisp vs. FORTRAN: functional vs. imperative
- the key linguistic abstraction of this family: programmers can build new functions
- functions form the core of almost any general-purpose language
- all computation is done with functions, and there is no state (locations holding mutable values); this makes the model uniform and simple
- functions are first-class: you can pass them to other functions, return "new" functions from functions, put functions in data structures, compose new functions out of old ones, etc.
- you don't need many special language constructs for iteration (e.g., while-loops, for-loops, do-loops, iterators), because these can be coded easily using functions (uniformity).
- constructing models of and reasoning about functional languages is generally easier than for other languages
- OCaml does support imperative programming, but doesn't encourage it. Initially we will use a subset that is (nearly) purely functional.
- OCaml has support for object-oriented programming (that's the "O" in OCaml).
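The points about first-class functions and iteration can be illustrated with a short OCaml sketch. The names `twice`, `add`, `compose`, `pipeline`, and `sum_to` are hypothetical helpers chosen for this example, not library functions. `sum_to` shows how a tail-recursive function can stand in for a while-loop.

```ocaml
(* Functions are values: they can be passed as arguments... *)
let twice (f : int -> int) (x : int) : int = f (f x)

(* ...returned as results ("new" functions built at run time)... *)
let add (n : int) : int -> int = fun x -> x + n

(* ...combined into new functions... *)
let compose f g = fun x -> f (g x)

(* ...and stored in data structures. *)
let pipeline : (int -> int) list =
  [ add 1; twice (add 10); compose (add 2) (fun x -> x * 3) ]

(* Iteration without a loop construct: the recursive call plays the
   role of the next loop iteration, and acc is the running total. *)
let sum_to (n : int) : int =
  let rec loop i acc = if i > n then acc else loop (i + 1) (acc + i) in
  loop 1 0

let () =
  List.iter (fun f -> Printf.printf "%d\n" (f 5)) pipeline;
  Printf.printf "%d\n" (sum_to 100)
```

Running it prints 6, 25, and 17 (each function in `pipeline` applied to 5), followed by 5050.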
OCaml is a statically typed, type-safe programming language.
- a type-safe language ensures that you don't apply the wrong operations to the wrong data (e.g., dividing two strings).
- In practice, this prevents a lot of silly errors (e.g., treating an integer as a function) and also prevents a lot of security problems -- over half of the reported break-ins at CERT were due to buffer overflows, which are impossible in a type-safe language.
- Functional languages like Scheme and Lisp are type-safe, but dynamically typed. That is, type errors are caught only at run time.
- C and C++ are statically typed but not type-safe. There's no guarantee that a type error won't occur.
- Java and OCaml are type-safe and statically typed. This means that most type errors are caught before the program runs.
- Fact: deciding statically whether an arbitrary program will encounter a type error at run time is impossible (the problem is undecidable).
- Corollary: all statically-typed languages are conservative and may reject some programs that are perfectly okay.
- A good statically-typed language rules out lots of bad code, while admitting lots of good code.
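A small sketch of what static type checking means in practice, echoing the dividing-two-strings example. The ill-typed lines appear only inside a comment, because the compiler refuses to build a program that contains them. The second one is rejected even though its offending branch could never run, which is the conservativeness described above. `divide_strings` is a hypothetical name used for illustration.

```ocaml
(* The type checker runs before the program does, so the following
   lines are kept inside a comment: a program containing either one
   would be rejected at compile time.

     let bad = "6" / "3"
     (rejected: the / operator expects int operands, not strings)

     let never_wrong x = if true then x + 1 else "unreachable"
     (rejected: both branches of an if must have the same type,
      even though the else-branch can never be taken)
*)

(* Well-typed code makes the conversion from string to int explicit. *)
let divide_strings (a : string) (b : string) : int =
  int_of_string a / int_of_string b

let () = Printf.printf "%d\n" (divide_strings "6" "3")
```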
OCaml supports a number of advanced features.
- garbage collection: as in Java, the automatic memory management of OCaml lifts the burden of having to worry about memory management -- a common source of bugs in languages such as C or C++.
- type inference: you do not have to write type information down everywhere. The compiler automatically figures out most types. This makes the code more terse, which can make it easier to read and maintain. (But this is a double-edged sword: too little type information can make code harder to read.)
- parametric polymorphism: OCaml lets you write functions and data structures that can be used with any type. This is crucial for being able to re-use code. Java provides a form of subtype polymorphism, which also lets you re-use code. We'll learn more about parametric and subtype polymorphism and their relative strengths and weaknesses in class.
- algebraic datatypes: you can build sophisticated data structures in OCaml very easily, without fussing with pointers and memory management. Pattern matching makes them even more convenient.
- exceptions and threads: as in Java, OCaml supports exceptions and threads, which are crucial for building real systems.
- advanced modules: OCaml makes it easy to structure large systems through the use of modules. OCaml has a module language that is used to encapsulate implementations behind interfaces. OCaml goes well beyond the functionality of many languages with modules by providing functions that manipulate modules (functors).
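The sketch below touches several of these features in one place: an inferred polymorphic function, a polymorphic algebraic datatype taken apart by pattern matching, a user-defined exception, and a functor. All names (`length`, `tree`, `Empty_tree`, `MakeSet`, and so on) are illustrative. The set is a deliberately naive unbalanced search tree, not the standard library's `Set.Make`, and the code assumes OCaml 4.08 or later, where the standard library provides the `Int` module used as the functor argument. Threads are not shown.

```ocaml
(* Type inference + parametric polymorphism: no annotations are written,
   yet the compiler infers  length : 'a list -> int , so the same
   function works on lists of any element type. *)
let rec length = function
  | [] -> 0
  | _ :: rest -> 1 + length rest

(* An algebraic datatype: a binary tree holding values of any type 'a,
   built without pointers or manual memory management. *)
type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* Pattern matching takes the tree apart case by case. *)
let rec size = function
  | Leaf -> 0
  | Node (l, _, r) -> size l + 1 + size r

(* Exceptions: declare one, raise it, and handle it. *)
exception Empty_tree

let root = function
  | Leaf -> raise Empty_tree
  | Node (_, v, _) -> v

let root_or_default default t =
  try root t with Empty_tree -> default

(* Modules and functors: MakeSet is a function from modules to modules.
   Given any module with a comparison function, it builds a set
   implementation (an unbalanced search tree) for that element type. *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

module MakeSet (Elt : ORDERED) = struct
  type set = Elt.t tree

  let empty : set = Leaf

  (* Insert x, leaving the tree unchanged if x is already present. *)
  let rec add x = function
    | Leaf -> Node (Leaf, x, Leaf)
    | Node (l, v, r) as t ->
      let c = Elt.compare x v in
      if c < 0 then Node (add x l, v, r)
      else if c > 0 then Node (l, v, add x r)
      else t

  let rec mem x = function
    | Leaf -> false
    | Node (l, v, r) ->
      let c = Elt.compare x v in
      c = 0 || (if c < 0 then mem x l else mem x r)
end

(* Apply the functor to the standard library's Int module. *)
module IntSet = MakeSet (Int)

let () =
  let s = List.fold_left (fun acc x -> IntSet.add x acc) IntSet.empty [3; 1; 4; 1; 5] in
  Printf.printf "%d %d %b %b\n"
    (length ["a"; "b"]) (size s) (IntSet.mem 4 s) (IntSet.mem 2 s);
  Printf.printf "%d\n" (root_or_default 0 Leaf)
```

The duplicate 1 is ignored by `add`, so the set built from `[3; 1; 4; 1; 5]` has four elements.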
Some history
(See Paulson's book for more info)
Robin Milner and others at the Edinburgh (Scotland) Laboratory for Computer
Science were working on theorem provers in the late '70s and early '80s.
Traditionally, theorem provers were implemented in languages such as Lisp.
Milner kept running into the problem that the theorem provers would sometimes
put incorrect "proofs" (i.e., non-proofs) together and claim that they
were valid.
So he tried to develop a language that only allowed you to construct valid
proofs.
"ML", which stands for "Meta Language", was the result of
his (and others') work. The type system of ML was carefully designed so
that you could only construct valid proofs in the language. A theorem
prover was then written as a program that constructed a proof.
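The idea of a type system that only lets you build valid proofs can be sketched with an abstract type. This is a toy reconstruction in modern OCaml, not Milner's original LCF or ML code: formulas are limited to implication, and the rule names `assume`, `imp_intro`, and `mp` are hypothetical. Because the signature of `Kernel` hides how `thm` is represented, code outside the module can obtain a theorem only by applying the rules, so every `thm` value in the program was derived by them.

```ocaml
(* Formulas of a tiny logic: propositional variables and implication. *)
type form =
  | Var of string
  | Imp of form * form

(* The signature makes thm abstract: outside Kernel, the only way to
   obtain a thm is through the inference rules listed here. *)
module Kernel : sig
  type thm
  val hyps : thm -> form list        (* assumptions the theorem rests on *)
  val concl : thm -> form            (* what the theorem asserts *)
  val assume : form -> thm           (* A |- A *)
  val imp_intro : form -> thm -> thm (* from G |- B, infer (G without A) |- A -> B *)
  val mp : thm -> thm -> thm         (* from G1 |- A -> B and G2 |- A, infer G1, G2 |- B *)
end = struct
  type thm = form list * form
  let hyps (h, _) = h
  let concl (_, c) = c
  let assume a = ([a], a)
  let imp_intro a (h, b) = (List.filter (fun x -> x <> a) h, Imp (a, b))
  let mp (h1, ab) (h2, a) =
    match ab with
    | Imp (a', b) when a' = a -> (h1 @ h2, b)
    | _ -> failwith "mp: rule does not apply"
end

(* A proof is just a program that calls the rules: here, |- p -> p. *)
let p = Var "p"
let identity = Kernel.imp_intro p (Kernel.assume p)
```

Misapplying a rule, such as calling `Kernel.mp` on two theorems that do not fit modus ponens, raises an exception instead of producing a bogus theorem.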
Milner also formulated the type-inference system of ML, and proved its
soundness.
(It should be noted that Milner also worked on models of concurrency, such as
the process calculi CCS and the pi-calculus, and later went on to receive the
Turing Award -- the computer science equivalent of a Nobel Prize -- in large
part for his work on ML.)
Eventually, this Classic ML evolved into a full-fledged programming language.
In the early '80s, there was a schism in the ML community with the French on
one side and the British and US on the other. The French went on to develop
Caml and later Objective Caml (OCaml), while the Brits and Americans developed
Standard ML (SML). The two languages are actually quite similar.
What is OCaml used for today?
- theorem provers (e.g., NuPRL, HOL, Coq)
- compilers (e.g., SML/NJ, OCaml, C-kit, Twelf, Lambda-Prolog, Pict)
- mathematics
- hardware verification
- advanced protocols (Ensemble, Fox, PLAN)
- financial systems
- genealogical databases
- signal processing
- bioinformatics
- scripting
- LaTeX-to-HTML translation
- smart cards
There's a nice paper by Minsky et al. about using OCaml in the financial industry
(it must be accessed from inside Cornell). It explains how the features of OCaml
make it a good choice for quickly building complex software that works.
OCaml is used for a variety of purposes, but it's nowhere near as popular as
C, C++, and Java.
OCaml's real strength lies in language manipulation (e.g., compilers, analyzers,
verifiers, provers). This is not surprising, since OCaml evolved from the
domain of theorem proving.
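To give a flavor of that language-manipulation style, here is a minimal sketch of a tiny arithmetic language: an algebraic datatype for its syntax, an interpreter `eval`, and a constant-folding pass `fold` of the kind a compiler might run. The language and all names are hypothetical, chosen only for illustration.

```ocaml
(* Abstract syntax of a tiny expression language. *)
type expr =
  | Num of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr
  | Let of string * expr * expr   (* let x = e1 in e2 *)

(* Interpreter: env maps variable names to their integer values. *)
let rec eval (env : (string * int) list) (e : expr) : int =
  match e with
  | Num n -> n
  | Var x -> List.assoc x env   (* raises Not_found if x is unbound *)
  | Add (a, b) -> eval env a + eval env b
  | Mul (a, b) -> eval env a * eval env b
  | Let (x, e1, e2) -> eval ((x, eval env e1) :: env) e2

(* Constant folding: replace operations on two literals by their result,
   leaving anything that involves a variable in place. *)
let rec fold (e : expr) : expr =
  match e with
  | Num _ | Var _ -> e
  | Add (a, b) ->
    (match fold a, fold b with
     | Num x, Num y -> Num (x + y)
     | a', b' -> Add (a', b'))
  | Mul (a, b) ->
    (match fold a, fold b with
     | Num x, Num y -> Num (x * y)
     | a', b' -> Mul (a', b'))
  | Let (x, e1, e2) -> Let (x, fold e1, fold e2)
```

For example, `eval [] (Let ("x", Add (Num 2, Num 3), Mul (Var "x", Num 4)))` evaluates the program `let x = 2 + 3 in x * 4` to 20, and `fold` rewrites its `Add (Num 2, Num 3)` to `Num 5` while leaving `Mul (Var "x", Num 4)` alone.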