CS312 Lecture 1: Course Overview, Background on ML

Who are we?

Prof. Zabih and a staff of about 14. See web page for details: http://www.cs.cornell.edu/courses/cs312 (single most important piece of info in today's lecture). You will meet the rest of the staff in section and in consulting.

What is CS312 About?

CS312 is the third programming course in the Computer Science curriculum, following CS100 and CS211. The primary goal of the course is to give students a firm foundation in the fundamental principles of programming and computer science. Consequently, CS312 covers a broad set of topics including

alternative programming paradigms (beyond imperative and object-oriented programming)
key data structures and algorithms
reasoning about program behavior and complexity
type systems and data abstraction
the design and implementation of programming languages

A major goal in CS312 is to teach you how to program well. Just about anyone can learn how to throw code together and get simple programs running, but it takes a deep understanding of the principles of computer science to write truly elegant and efficient programs with lasting value. We will try to give you that understanding and teach you some of the craft of programming as well. And practice makes perfect.

Some notes on programming and programming languages

Lots of people vastly overstate the importance of knowing one computer language versus another. In particular, students tend to want to know "What language is the course in?" This is actually a fairly dull question. It's like worrying about what book you use when you first learn to read (Dick and Jane? Dr. Seuss?). In fact, it's actually fairly silly to even list the computer languages you know on your CV; most professors can't answer this question!

There is an important reason for this: computer languages have a lot in common. If you know almost any language really well, you can pick up any other language in a few days. If only foreign languages were so easy (based upon my expertise in Portuguese, I will learn Chinese in under a week...).

The key words here, though, are really well. This means having a good mental model of what the computer does with a program. How can you tell if you don't have a good model? Suppose your program isn't quite working (it gives the number you want, but is often off by 1, i.e., there is a fencepost error somewhere). If you don't have a good mental model, you tend to change some <'s to <='s. Why is this bad? There are LOTS of wrong programs out there, and very few right ones. What are the chances that you will stumble onto the right one? The lottery has MUCH better odds.

If you have a good mental model, you can just look at the program and think. It's harder, but much more likely to succeed.

VERY important piece of advice: if you find yourself typing instead of thinking, stop. This is probably your first course with just CS majors, and therefore the assignments will be a lot more work. You can trash your life (i.e., pull multiple all-nighters) if you try typing instead of thinking.

CS312 slogan #1 (first of many): Thinking is better than typing.

Many of the problem sets, and all of the exams (very important!) have short, elegant answers. It's in your interest to find them.

Given this, you should learn one language well. Ideally, that language should have a simple and elegant model!

Our choice: SML

We use the Standard ML (SML) programming language throughout the course. SML is a modern functional programming language with an advanced type and module system. The course is not about programming in SML. Rather, SML provides a convenient framework in which we can achieve the objectives of the course. Like the object-oriented model of Java, the functional paradigm of SML is an important programming model with which all students should be familiar, as it underlies the core of almost any high-level programming language. In addition the SML type and module systems provide frameworks for ensuring code is modular, correct, reusable, and elegant. The lessons you learn in programming with SML will be applicable to other programming languages such as Java. By studying alternative ways to write programs, you will be better equipped to use, implement or even design future programming environments.

Another important reason we use SML is that it has a relatively clean and simple model that makes it easier to reason about the correctness of programs. Indeed, SML was one of the first major programming languages to have a formal semantic definition. In our studies, we will see that we can reason formally about the functional correctness of code, and also about the space, time, and other resources used in a computation.

Lectures and Recitations

Lectures are Tuesday and Thursday, 10:10 to 11am in Kimball B11. Recitations are Monday and Wednesday at four times (see web page). You are expected to attend both lectures and recitations. You may attend any recitation you want to, but it's probably in your interest to stick with one. Feel free to load-balance.

Course Materials

There is no official textbook for the course. The following books are useful and on reserve at the Engineering Library:

The Little MLer, Matthias Felleisen and Daniel P. Friedman, MIT Press, 1998. ISBN 0 262 56114 X
ML for the Working Programmer, L. C. Paulson, 2nd ed., Cambridge Univ. Press, 2000. ISBN 0 521 56543 X
Elements of ML Programming, ML97 Edition, Jeffrey D. Ullman, Prentice Hall, 1998. ISBN 0 13 790387 1

Two convenient online sources that we will be using from time to time are:

Programming in Standard ML, Robert Harper
Notes on Programming in SML/NJ, Riccardo Pucella

Communication

Course web site

The course web site is at http://www.cs.cornell.edu/courses/cs312. You should keep a close eye on this web page. We will post announcements about the course there. The programming assignments will all be posted there too.

Newsgroup

The best way to reach the course staff is by posting questions or comments to the course newsgroup, cornell.class.cs312. Many members of the course staff read the newsgroup and can answer your questions. Read the guidelines on the web page for some tips about newsgroup etiquette.

Email

For questions that would be inappropriate to post to the newsgroup, you can also reach the course staff by sending mail to cs312@cs.cornell.edu. The newsgroup is preferred, however.

Consulting hours

The TAs have regular office hours during the day; consultants have evening consulting hours. Office hours are on the web. Consulting hours are 7-10pm Sunday through Wednesday in Upson 304A, unless otherwise announced. The night before every project is due (not the night that it is due), we will hold extended consulting hours from 7pm-12 midnight. Consulting hours will not be held the day after a problem set is due.

Coursework

Problem Sets

The work in this class will consist of five problem sets. The first of these problem sets will be available on the course web site Monday. It is due in one week: 11:59PM the next Tuesday 9/10. Some problem sets will have written exercises as well as programs to write. The written exercises will in general be due at 4pm on the due date.

Software

You can download a copy of Standard ML of New Jersey (SML/NJ) from the course web site. This includes the Emacs editing environment that you will use to interact with SML and do your programming and debugging.

We will have four sessions demoing this environment next week. Keep your eye on the course web site for updates about the demos.

Prelims & Final

There will be two prelims, October 17 and November 19, held in the evenings. The location is on the web site.

The final is December 13.

Make-up exams are oral; let's try not to have them.

Grading

Last year: 30% A, 40% B, 30% C or less (mostly C). Past performance is no guarantee of future outcomes.

Everything counts, but exams count more. Especially the final, since I have it in front of me when I assign grades.

Background on ML

Our first order of business in this course is to learn how to use ML. Why learn another language?

We use a zillion different programming languages to communicate with machines and each other:

general purpose programming: Fortran, Lisp, Basic, C, Pascal, C++, Java, etc.
scripting: Visual Basic, awk, sed, perl, tcl, sh, csh, bash, REXX, Scheme, etc.
search: regular expressions, browser queries, SQL, etc.
display and rendering: PostScript, HTML, XML, VRML, etc.
hardware: CCS, VHDL, Esterel
theorem proving and mathematics: Mathematica, Maple, Matlab, NuPRL, Coq
others?

Though there are only a handful of general-purpose languages that you will learn and use, you'll be learning and using special-purpose languages for the rest of your life. Even general-purpose languages come and go. Today, it's Java and C++. Yesterday, it was Pascal and C, before that Fortran and Lisp. Who knows what it will be like tomorrow? You have to learn how to learn new languages.

In addition, many projects will require that you build "little" languages for gluing things together.

Javascript grew out of a little language to make web pages interactive
protocols like HTTP or TCP are little languages that allow devices to talk to one another
the command prompt of DOS handles a little shell language
search engines on the web accept queries in a little language
others?

We gain a lot of leverage by having good notation and good language support for a given domain.

perl is extremely useful for searching through documents because of its built-in support for regular expressions
SQL is a very high-level language that makes it easy to do database transactions in a scalable way.

So it's important to understand programming models and programming paradigms because in this fast-changing field, you need to be able to adapt rapidly.

It's crucial that you understand the principles behind programming that transcend the specifics of today.

There's no better way to get at these principles than to approach programming from a completely different perspective.

This is one reason why we're using ML -- it's very different from what most of you will have seen.

A great general-purpose programming language:

lets you say things concisely and understandably at the right level of abstraction
lets you extend the language with new features that are specific to a domain but blend in well with the rest of the language.
makes it easy to write correct code, with good performance
makes it easy to change the code when you find out the specification has changed
makes it easy to re-use code
is easy to learn

Fact: there are thousands of general purpose languages.

Corollary: there are no great programming languages.

But there are some pretty good ones. Java and ML are pretty good general-purpose languages (at least when compared to their predecessors.)

SML is a functional programming language.

genealogically, it fits into the Lisp, Scheme, Miranda, Hope, Haskell, etc. line of programming languages.
Lisp vs. FORTRAN: functional vs. imperative
the key linguistic abstraction of this family: programmers can build new functions
forms the core of almost any general-purpose language
casting everything in terms of functions has its benefits: uniform, simple
functions are first-class: you can pass them to other functions, return "new" functions from functions, put functions in data structures, compose new functions out of old ones, etc.
you don't need to build in loops (e.g., while-loops, for-loops, do-loops, iterators, etc.) because these can be coded easily using functions.
constructing models of and reasoning about functional languages is generally easier than for other languages (since you have to at least model the functional subset)
SML does support imperative programming, but doesn't encourage it.
SML is not object-oriented, although there are versions of ML that are (e.g., Objective CAML).
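The points above about first-class functions and loops can be made concrete with a small sketch. The names twice, adder, steps, addThenDouble, and forLoop are made up for this illustration; only foldl and the composition operator o come from the standard SML Basis.

```sml
(* A function that takes another function as an argument. *)
fun twice f x = f (f x)

(* A function that returns a "new" function. *)
fun adder n = fn x => x + n

(* Functions stored in a data structure (here, a list). *)
val steps = [adder 1, twice (adder 10), fn x => x * 2]

(* Run each function in the list in turn, starting from 0:
   0 -> 1 -> 21 -> 42 *)
val result = foldl (fn (f, x) => f x) 0 steps

(* Composing a new function out of old ones with the built-in o operator:
   first add 3, then double. *)
val addThenDouble = (fn x => x * 2) o adder 3

(* A "for-loop" coded as an ordinary recursive function (hypothetical helper).
   It calls body on (i, acc) for i = lo, lo+1, ..., hi and returns the
   final accumulator. *)
fun forLoop (lo, hi) body acc =
  if lo > hi then acc
  else forLoop (lo + 1, hi) body (body (lo, acc))

(* Sum of 1..10, which is 55. *)
val sumTo10 = forLoop (1, 10) (fn (i, acc) => acc + i) 0
```

Nothing here is built into the language as a loop construct; forLoop is just one possible encoding, and the same pattern could express while-loops or iterators.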

SML is a statically typed, type-safe programming language.

a type-safe language ensures that you don't apply the wrong operations to the wrong data.
In practice, this prevents a lot of silly errors (e.g., treating an integer as a function) and also prevents a lot of security problems -- over half of the reported break-ins at CERT were due to buffer overflows -- something that's impossible in a type-safe language.
Functional languages like Scheme and Lisp are type-safe, but dynamically typed. That is, type-errors are caught only at run-time.
C and C++ are statically typed but not type-safe. There's no guarantee that a type-error won't occur.
Java and SML are type-safe and statically typed. This means that most errors are caught before running the program.
Fact: statically determining exactly which programs will have a type-error at run time is impossible in general (the problem is undecidable).
Corollary: all statically-typed languages are conservative and may reject some programs that are perfectly okay.
A good statically-typed language rules out lots of bad code, while admitting lots of good code.
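As a rough illustration of the points above, the following sketch shows code the SML type checker accepts, code it rejects even though it could never go wrong at run time (shown only in comments, so that the block still compiles), and a run-time bounds check in place of a buffer overflow. The names pick, intOrString, n, a, and r are made up for this example.

```sml
(* Accepted: both branches of the conditional have type int. *)
fun pick b = if b then 1 else 2

(* Rejected at compile time, although the else-branch can never run:

     val m = if true then 1 else "one"

   The checker is conservative: it requires both branches to have the
   same type, whatever the condition is. *)

(* Also rejected, and rightly so: treating an integer as a function.

     val bad = 3 4
*)

(* One way to express "an int or a string" that the checker does accept:
   say so explicitly with a datatype. *)
datatype intOrString = I of int | S of string
val n = if true then I 1 else S "one"

(* Out-of-bounds array access is caught at run time: Array.sub raises
   the Subscript exception instead of reading arbitrary memory. *)
val a = Array.array (3, 0)
val r = Array.sub (a, 10) handle Subscript => ~1
```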

SML (and SML/NJ in particular) supports a number of advanced features.

garbage collection: as in Java, the automatic memory management of SML lifts the burden of having to worry about memory management -- a common source of bugs in languages such as C or C++.
type inference: you do not have to write type information down everywhere. The compiler automatically figures out most types. This makes the code a bit more terse which can make it easier to read and maintain. (But this is a double-edged sword. Too little type information can make code harder to read.)
parametric polymorphism: ML lets you write functions and data structures that can be used with any type. This is crucial for being able to re-use code. Java provides a form of subtype polymorphism which also lets you re-use code. We'll learn more about parametric and subtype polymorphism and their relative strengths and weaknesses in class.
algebraic datatypes: you can build sophisticated data structures in ML very easily, without fussing with pointers and memory management. Pattern matching makes them even more convenient.
exceptions, threads, and continuations: as in Java, SML/NJ supports exceptions and threads, which are crucial for building real systems. The thread model of SML/NJ is radically different from that of Java, however. In addition, SML/NJ supports continuations, which are an advanced control construct out of which you can build things like loops, exceptions, and threads.
advanced modules: SML makes it easy to structure large systems through the use of modules. Modules (called structures) are used to encapsulate implementations behind interfaces (called signatures). SML goes well beyond the functionality of most languages with modules by providing functions that manipulate modules (functors), module variables, multiple interfaces per module, and nested modules.
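Several of the features listed above fit into one short sketch: an algebraic datatype, pattern matching, inferred polymorphic types, an exception, and a functor that builds a set module from an ordering. All names here (tree, size, toList, leftmost, EmptyTree, ORDERED, SET, TreeSet, IntSet) are made up for the illustration and are not taken from the course code.

```sml
(* Algebraic datatype: a binary tree, polymorphic in its element type 'a. *)
datatype 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* Type inference: no annotations are written, yet the compiler infers
   size : 'a tree -> int. Pattern matching handles each constructor. *)
fun size Leaf = 0
  | size (Node (l, _, r)) = size l + 1 + size r

(* Parametric polymorphism: toList works for trees of any element type. *)
fun toList Leaf = []
  | toList (Node (l, x, r)) = toList l @ (x :: toList r)

(* Exceptions: there is no leftmost element in an empty tree. *)
exception EmptyTree
fun leftmost Leaf = raise EmptyTree
  | leftmost (Node (Leaf, x, _)) = x
  | leftmost (Node (l, _, _)) = leftmost l

(* Modules: two signatures (interfaces) ... *)
signature ORDERED =
sig
  type t
  val less : t * t -> bool
end

signature SET =
sig
  type elem
  type set
  val empty : set
  val insert : elem * set -> set
  val member : elem * set -> bool
end

(* ... and a functor: a function from modules to modules. The opaque
   ascription :> hides the fact that sets are represented as trees. *)
functor TreeSet (O : ORDERED) :> SET where type elem = O.t =
struct
  type elem = O.t
  type set = elem tree
  val empty = Leaf
  fun insert (x, Leaf) = Node (Leaf, x, Leaf)
    | insert (x, s as Node (l, y, r)) =
        if O.less (x, y) then Node (insert (x, l), y, r)
        else if O.less (y, x) then Node (l, y, insert (x, r))
        else s
  fun member (_, Leaf) = false
    | member (x, Node (l, y, r)) =
        if O.less (x, y) then member (x, l)
        else if O.less (y, x) then member (x, r)
        else true
end

(* Applying the functor to an ordering on integers. *)
structure IntSet = TreeSet (struct
  type t = int
  fun less (a : int, b) = a < b
end)

val t = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
val s = IntSet.insert (3, IntSet.insert (1, IntSet.empty))
```

With these definitions, size t is 3, toList t is [1, 2, 3], and IntSet.member (3, s) is true. Because of the opaque ascription, code outside the functor cannot apply size or toList to an IntSet.set.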

Some history

(see Paulson's book for more info):

Robin Milner and others at the Edinburgh (Scotland) Laboratory for Computer Science were working on theorem provers in the late '70s and early '80s.

Traditionally, theorem provers were implemented in languages such as Lisp.

Milner kept running into the problem that the theorem provers would sometimes put incorrect "proofs" (i.e., non-proofs) together and claim that they were valid.

So he tried to develop a language that only allowed you to construct valid proofs.

"ML" which stands for "Meta Language" was the result of his (and others') work. The type system of ML was carefully constructed so that you could only construct valid proofs in the language. A theorem prover was then written as a program that constructed a proof.

Milner also formulated the type-inference system of ML, and proved its soundness.
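The idea that the type system lets you construct only valid proofs can be sketched in modern SML. This is an illustration written with today's module system, not Milner's original code: the logic (atoms and implication, one axiom scheme, modus ponens) and all names (form, KERNEL, Kernel, thm, axiomK, mp, concl, Rule) are assumptions chosen to keep the example small.

```sml
(* Formulas of a tiny logic: atoms and implication. *)
datatype form = Atom of string | Imp of form * form

signature KERNEL =
sig
  type thm                          (* abstract: clients cannot forge one *)
  exception Rule of string
  val axiomK : form * form -> thm   (* |- A -> (B -> A) *)
  val mp     : thm * thm -> thm     (* from |- A -> B and |- A, infer |- B *)
  val concl  : thm -> form          (* look at what a theorem states *)
end

structure Kernel :> KERNEL =
struct
  datatype thm = Thm of form
  exception Rule of string

  fun axiomK (a, b) = Thm (Imp (a, Imp (b, a)))

  (* Modus ponens checks that the antecedent really matches. *)
  fun mp (Thm (Imp (a, b)), Thm a') =
        if a = a' then Thm b
        else raise Rule "mp: antecedent mismatch"
    | mp _ = raise Rule "mp: first theorem is not an implication"

  fun concl (Thm f) = f
end

val p = Atom "p"
val q = Atom "q"
val r = Atom "r"

(* |- p -> (q -> p) *)
val k1 = Kernel.axiomK (p, q)
val x = Kernel.concl k1

(* |- x -> (r -> x), then by modus ponens with k1: |- r -> x *)
val k2 = Kernel.axiomK (x, r)
val t = Kernel.mp (k2, k1)
```

Outside the Kernel structure, the only ways to obtain a value of type thm are axiomK and mp, so any value of that type was built by valid inference steps. An invalid step such as Kernel.mp (k1, k1) raises Rule instead of producing a theorem.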

(It should be noted that Milner also worked on models of concurrency, such as CCS and the pi-calculus, and later went on to receive the Turing Award -- the computer science equivalent of a Nobel Prize -- in large part for his work on ML.)

Eventually, this Classic ML evolved into a full-fledged programming language.

In the early '80s, there was a schism in the ML community with the French on one side and the British and US on another. The French went on to develop CAML and later Objective CAML (O'caml) while the Brits and Americans developed Standard ML. The two languages are actually quite similar.

What is ML used for today?

theorem provers (e.g., NuPRL, HOL, Coq, etc.)
compilers (e.g., SML/NJ, O'caml, C-kit, Twelf, Lambda-Prolog, Pict, etc.)
mathematics
hardware verification
advanced protocols (Ensemble, Fox, PLAN)
financial systems
genealogical database
signal processing
bioinformatics
scripting
latex to HTML translation
smartcards

In truth, not a lot when compared to something like C, C++, or Java. ML's real strength lies in language manipulation (e.g., compilers, analyzers, verifiers, provers). This is not surprising since ML evolved from the domain of theorem proving.