\documentclass{article}
\usepackage[instructions]{hw}
\newcommand{\hw}{{\lang} Type System Specification}
\newcommand{\topic}{Cornell University}
\usepackage{import}
\subimport{../hw/}{hw-header}

\usepackage{mathpazo}
\usepackage{amssymb}

\usepackage{xr}
\externaldocument[:language:]{language}

\newcommand\hmul{\texttt{*\char62\char62}}

\newcommand{\kw}[1]{\ensuremath{\mathtt{#1}}}
\newcommand{\pr}{\ensuremath{^{\prime}}}
\newcommand{\dpr}{\ensuremath{^{\prime\prime}}}
\newcommand\unit{()}
\newcommand\Grho{Γ(ρ)}
\newcommand\Gbeta{Γ(β)}

\newcommand{\PC}{\ensuremath{Γ ⊢}}
\newcommand{\PCs}[1]{\ensuremath{Γ#1\rho\beta ⊢_s}}
\newcommand{\PCloop}{\ensuremath{Γ\rho~\kw{true} ⊢_s}}

\newcommand\typeof{\mathit{typeof}}
\newcommand\varsof{\mathit{varsof}}

\newcommand\fd{\mathit{gd}}
\newcommand\rulesep{1em}

\allowdisplaybreaks

\begin{document}
\headerBase{\\Version of \today}

\section{Changes}
\begin{itemize}
\item 2/20: Updated rules for checking multiple assignment and
simplified handling of function and procedure types.
% \item None yet; watch this space.  % Do not delete this line; comment instead.
\end{itemize}

\section{Types}
The {\lang} type system uses a somewhat bigger set of types than can
be expressed explicitly in the source language:
\[
\begin{aligned}[t]
t &::= \kw{int} \\
& \bnf  \kw{bool} \\
& \bnf \arrayof{t}
\end{aligned}
\qquad\qquad
\begin{aligned}[t]
T &::= (t_1,t_2,\ldots,t_n) ~~ ^{n≥0}
\end{aligned}
\qquad\qquad
\begin{aligned}[t]
R &::= \kw{unit} \\
& \bnf \kw{void}
\end{aligned}
\qquad\qquad
\begin{aligned}[t]
\sigma & ::= "var"~t \\
& \bnf "ret"~T \\
& \bnf "fn"~T \to T' 
\end{aligned}
\]
%
Ordinary types expressible in the language are denoted by the metavariable $t$,
which can be \kw{int}, \kw{bool}, or an array type.

The metavariable $T$ denotes a possibly empty sequence of types, which
is useful for procedures, functions, and multiple assignments.

The metavariable $R$ represents the outcome of evaluating a statement,
which can be either \kw{unit} or \kw{void}.
The \kw{unit} type is the
the type of statements that _might_ complete normally and permit the
following statement to execute.  The type \kw{void} is the
type of statements such as \kw{return} that _never_ pass control to the following
statement. The \kw{void} type should not be confused with the C/Java type \kw{void}, which
is actually closer to \kw{unit}.

The set $\sigma$ is used to represent typing-environment entries,
which can either be normal variables (bound to $"var"~t$ for some type $t$),
return types (bound to $"ret"~T$),
functions (bound to $"fn"~T\to T'$ where $T'≠()$),
or procedures (bound to $"fn"~T\to ()$),
where the ``result type'' $()$ indicates that the procedure
result contains no information other than that the procedure call
terminated.

\section{Type-checking expressions}

To type-check expressions, we need to know what bound variables and functions
are in scope; this is represented by the typing context $Γ$,
which maps names $x$ to types $\sigma$.

The judgment $Γ ⊢ e:t$ is the rule for the type of an expression;
it states that with bindings $Γ$ we can conclude that $e$ has the type $t$.

We use the metavariable symbols $x$ or $f$ to represent arbitrary
identifiers, $\nm{n}$ to represent an integer literal constant,
$\nm{string}$ to represent a string literal constant, and
$\nm{char}$ to represent a character literal constant.
Using these conventions, the expression typing rules are:
\begin{gather*}
  \infer{Γ ⊢ \nm{n}: \kw{int}}{} \qquad
  \infer{Γ ⊢ \kw{true}: \kw{bool}}{} \qquad
  \infer{Γ ⊢ \kw{false}: \kw{bool}}{} \qquad
  \infer{Γ ⊢ \nm{string}: \arrayof{\kw{int}}}{} \qquad
  \infer{Γ ⊢ \nm{char}: \kw{int}}{}\\[\rulesep]
  \infer{Γ ⊢ x: t}{Γ(x) = "var"~t} \qquad
  \infer{Γ ⊢ e_1 \oplus e_2: \kw{int}}
            {Γ ⊢ e_1: \kw{int} & Γ ⊢ e_2: \kw{int} &
                    \oplus \in \set{"+","-","*",\hmul,"/","\%" }}
                    \\[\rulesep]
  \infer{Γ ⊢ "-$e$":\kw{int}}{Γ ⊢ e: \kw{int}} \qquad
  \infer{Γ ⊢ e_1 \ominus e_2: \kw{bool}}
            {Γ ⊢ e_1: \kw{int} & Γ ⊢ e_2: \kw{int} &
                    \ominus \in \{"==","!=","<","<=",">",">="\}}
                    \\[\rulesep]
% boolean ops.
  \infer{Γ ⊢ "!$e$":\kw{bool}}{Γ ⊢ e: \kw{bool}} \qquad
  \infer{Γ ⊢ e_1 \ominus e_2: \kw{bool}}
            {Γ ⊢ e_1: \kw{bool} & Γ ⊢ e_2: \kw{bool} &
                    \ominus \in \{"==","!=","\&","|"\}} \\[\rulesep]
% array ops.
  \infer{Γ⊢ "\kw{length}($e$)":\kw{int}}{Γ ⊢ e:\arrayof{t}} \qquad
  \infer{Γ ⊢ e_1 \ominus e_2: \kw{bool}}
            {Γ ⊢ e_1: \arrayof{t} &
                    Γ ⊢ e_2: \arrayof{t} & \ominus \in \{"==","!="\}}
                    \\[\rulesep]
  \infer{Γ⊢"\{$e_1$,$\ldots$,$e_n$\}": \arrayof{t}}
            {Γ⊢  e_1:t & \ldots & Γ⊢  e_n:t & n ≥ 0} \qquad
  \infer{Γ⊢ "$e_1$[$e_2$]":t}{Γ ⊢ e_1:\arrayof{t} & Γ ⊢ e_2:\kw{int}} \qquad
  \infer{Γ⊢ "$e_1$ + $e_2$":\arrayof{t}}{Γ ⊢ e_1:\arrayof{t} & Γ ⊢
  e_2:\arrayof{t}} \\[\rulesep]
  \infer{Γ⊢"$f$($e_1$,$\ldots$,$e_n$)":t'}
        {Γ(f) = \kw{fn}~(t_1,\ldots,t_n)\to (t') & Γ⊢ e_i:t_i ~~
            ^{(∀i∈1..n)} & n ≥ 0}
\end{gather*}

\section{Type-checking statements}

To type-check statements, we need all the information used to type-check
expressions, plus the types of procedures, which are included in $Γ$.
In addition, we extend the domain of
$Γ$ a little to include a special symbol $ρ$.
To check the $\kw{return}$ statement we need to know what the return type of the
current function is or if it is a procedure.
Let this be denoted by $\Grho="ret"~T$,
where $T≠\unit$ if the statement is part of a function, 
or $T=\unit$ if the statement is part of a procedure.  Since
statements include declarations, they can also produce new variable bindings,
resulting in an updated typing context which we will denote as $Γ'$.
To update typing contexts, we write $Γ[x\mapsto "var"~t]$,
which is an environment exactly like $Γ$ except that it maps $x$ to $"var"~t$.
We use the metavariable $s$ to denote a statement, so
the main typing judgment for statements has the form
$Γ ⊢ s : R ⊣ Γ'$.

Most of the statements are fairly straightforward and do not change $Γ$:
\begin{gather*}
  \infer[\rnm{Seq}]
            {\PC "\{$s_1$ $s_2$ $\ldots$ $s_n$\}" : R⊣Γ}
            {\PC s_1 : \kw{unit}⊣Γ_1 & Γ_1⊢s_2:\kw{unit}⊣Γ_2 &\ldots&
            Γ_{n-1}⊢ s_n : R⊣Γ_{n}}
  \qquad
  \infer[\rnm{Empty}]
            {\PC "\{\}" : \kw{unit}⊣Γ}{}
            \\[\rulesep]
  \infer[\rnm{If}]{\PC "\kw{if} ($e$) $s$" : \kw{unit}⊣Γ}{Γ ⊢
  e:\kw{bool} & \PC s : R⊣Γ'} \qquad
  \infer[\rnm{IfElse}]
            {\PC "\kw{if} ($e$) $s_1$ \kw{else} $s_2$" :
            \nm{lub}(R_1,R_2)⊣Γ}
            {Γ ⊢ e:\kw{bool} & \PC s_1 : R_1⊣Γ' & \PC s_2 : R_2⊣Γ''}
            \\[\rulesep]
  \infer[\rnm{While}]{\PC "\kw{while} ($e$) $s$" : \kw{unit}⊣Γ}
            {Γ ⊢ e:\kw{bool} & Γ⊢ s : R⊣Γ'} \\[\rulesep]
  \infer[\rnm{PrCall}]{\PC f"("e_1","\ldots","e_n")" :
  \kw{unit}⊣Γ}
        {Γ(f) = \kw{fn}~(t_1,\ldots,t_n)\to() & Γ⊢ e_i:t_i ~~
        ^{(∀i∈1..n)} & n≥0} \\[\rulesep]
  \infer[\rnm{Return}]
        {\PC "\kw{return} $e_1$, $e_2$, $\ldots$, $e_n$" : \kw{void}⊣Γ}
        {\Grho = "ret"~(t_1,t_2,\ldots,t_n) & Γ⊢ e_i:t_i ~~
        ^{(∀i∈1..n)} & n≥0}
\end{gather*}
The function \nm{lub} is defined as follows:
\[
\nm{lub}(R,R)=R \qquad
\nm{lub}(\kw{unit},R)=\nm{lub}(R,\kw{unit})=\kw{unit}
\]
Therefore, the type of an "if" is \kw{void} only if
all branches have that type.

Assignments require checking the left-hand side to make sure it
is assignable:
\begin{gather*}
  \infer[\rnm{Assign}]{Γ⊢"$x$ = $e$" : \kw{unit}⊣Γ}{Γ(x) = "var"~t & Γ⊢ e:t} \qquad
  \infer[\rnm{ArrAssign}]{Γ⊢"$e_1$[$e_2$] = $e_3$" : \kw{unit}⊣Γ}
            {Γ⊢ e_1 : \arrayof{t} & Γ ⊢ e_2:\kw{int} & Γ ⊢ e_3 : t}
\end{gather*}

Declarations are the source of new bindings. Three kinds of declarations
can appear in the source language: variable declarations,
multiple assignments, and function/procedure declarations. We are only
concerned with the first two kinds within a function body.

\begin{gather*}
  \infer[\rnm{VarDecl}]{Γ⊢"$x$:$t$" : \kw{unit}⊣Γ[x \mapsto "var"~t]}{x∉\dom{Γ}} \qquad
  \infer[\rnm{VarInit}]{Γ⊢"$x$:$t$ = $e$" : \kw{unit}⊣Γ[x \mapsto
  "var"~t]}{x∉\dom{Γ} & Γ⊢ e: t} \\[\rulesep]
  %\infer[\rnm{ExprStmt}]{Γ⊢"_ = $e$" : \kw{unit}⊣Γ}{Γ⊢ e: t} \\[\rulesep]
  \infer[\rnm{ArrayDecl}]
        {Γ⊢"$x$:$t$[$e_1$]$\ldots$[$e_n$]$\underbrace{"[]"\ldots"[]"}_{m}$" : 
            \kw{unit}⊣Γ[x \mapsto "var"~t\underbrace{"[]"\ldots"[]"}_{n+m}]}
        {x∉\dom{Γ} & Γ⊢ e_i:\kw{int} ~~ ^{(∀i∈1..n)} & n ≥ 1 & m ≥ 0 &
        t∈\{"int", "bool"\}}
\end{gather*}
With respect to \rnm{ArrayDecl}, note
that the case of declaring an array with no dimension sizes specified
($n=0$) is already covered by \rnm{VarDecl}.

To handle
multiple assignments, we define an assignable expression $d$
(destination), which may be a variable declaration:
%
\[ d ::= x":"t \bnf x \bnf e_1 "[" e_2 "]" \bnf "_" \]
%
We define a new ``helper'' judgment $Γ, Γ' ⊢ d :: t \dashv Γ''$ that
determines the type of an assignable expression and also
extends the context as necessary. Here, $Γ$ represents the original
context before a set of declarations is processed, and $Γ'$ represents
the context including the declarations previously brought into
scope; $Γ''$ represents the context after the destination is
taken into account, possibly extending $Γ'$ with another binding.

\begin{gather*}
\infer{Γ, Γ' ⊢ x\ty t ~~ :: t \dashv Γ'[x ↦ "var"~t]}{x ∉ \dom{Γ'}} \quad
\infer{Γ, Γ' ⊢ x ~~:: t \dashv Γ'}{Γ(x) = "var"~t} \quad
\infer{Γ, Γ' ⊢ "_" ~~:: t \dashv Γ'}{} \quad
\infer{Γ, Γ' ⊢ e_1"["e_2"]" ~~:: t \dashv Γ'}{Γ⊢e_1 : t"[]" & Γ⊢e_2 : "int"}
\end{gather*}

Note that in the rule for "_", any type $t$ can be selected. This
makes the type system slightly non-syntax-directed, but a type
checker can represent this as a special symbol that can be equated
with any possible type.

Using this judgment, we have the following rules for multiple assignment:
\begin{gather*}
  \infer[\rnm{MultiAssign}]
        {Γ⊢d_1","\ldots","d_n = e_1","\ldots","e_n : 
            \kw{unit}⊣Γ_{n+1}}
            {
      \begin{gathered}
        Γ⊢e_i : t_i  ~~ ^{(∀i∈1..n)} \quad
        Γ_1 = Γ \quad
        Γ, Γ_i ⊢ d_i :: t_i \dashv Γ_{i+1} ~~ ^{(∀i∈1..n)} \quad
      \end{gathered}
      }
      \\[\rulesep]
  \infer[\rnm{MultiAssignCall}]
        {Γ⊢d_1","\ldots","d_n = f(e_1","\ldots","e_m) : 
            \kw{unit}⊣Γ_{n+1}}
            {
      \begin{gathered}
        Γ⊢e_i : t_i  ~~ ^{(∀i∈1..m)} \quad
        Γ_1 = Γ \quad
        Γ, Γ_i ⊢ d_i :: t'_i \dashv Γ_{i+1} ~~ ^{(∀i∈1..n)} \quad \\
        Γ(f) = "fn"~(t_1,\dots,t_m) → (t'_1,\dots,t'_n) \quad
      \end{gathered}
      }
\end{gather*}
%

These rules actually subsume the earlier \rnm{Assign},
\rnm{ArrAssign}, and \rnm{VarInit} rules in the case where $n=1$, so
it is redundant to implement those rules directly.

\section{Checking top-level declarations}
\label{sec:top-decl}

At the top level of the program, we need to figure out the types of
procedures and functions, and make sure their bodies are well-typed. Since
mutual recursion is supported, this needs to be done in two passes. First, we
use the judgment $Γ ⊢ \fd ⊣ Γ'$ to state that the top-level (global)
declaration $\fd$ extends top-level bindings $Γ$ to $Γ'$:
\begin{gather*}
  \infer{Γ ⊢ x ":" t ⊣ Γ'}{x ∉ Γ & Γ' = Γ[x \mapsto "var"~t]} \qquad
  \infer{Γ ⊢ x ":" t = e ⊣ Γ'}{x ∉ Γ & Γ' = Γ[x \mapsto "var"~t] }  \\[\rulesep]
  \infer{Γ ⊢ "$f$($x_1$:$t_1$,$\ldots$,$x_n$:$t_n$) $s$" ⊣ Γ'}
        { f∉ \dom{Γ} & Γ' = Γ[f\mapsto\kw{fn}~(t_1,\ldots,t_n)\to ()]} \\[\rulesep]
  \infer{Γ ⊢ "$f$($x_1$:$t_1$,$\ldots$,$x_n$:$t_n$):$t'_1$,$\ldots$,$t'_m$ $s$" ⊣ Γ'}
        { f∉ \dom{Γ} & Γ' = Γ[f\mapsto\kw{fn}~(t_1,\ldots,t_n)\to (t'_1,\ldots,t'_m)]}
\end{gather*}

The second pass over the program is captured by the judgment $Γ ⊢
\fd~~"def"$, which defines how to check well-formedness of each
global definition against the top-level environment $Γ$, ensuring that
parameters do not shadow anything and that the body is well-typed.
We treat procedures just like functions that return no values.
The body of a procedure definition may have any type, but
the body of a function definition must have type \kw{void}, which ensures
that the function body does not fall off the end without returning.
\begin{gather*}
  \infer{Γ⊢  "$f$($x_1$:$t_1$,$\ldots$,$x_n$:$t_n$):$t'_1$,$\ldots$,$t'_m$ $s$"~~"def"}
    {
      \begin{gathered}
        |\dom{Γ} ∪ \{x_1,\dots,x_n\}| = |\dom{Γ}| + n \\
        Γ[x_1↦"var"~t_1, \dots, x_n↦"var"~t_n,
        ρ↦"ret"~(t'_1,\ldots,t'_m)] ⊢  s : \kw{void}⊣Γ'
      \end{gathered}
    } \\[\rulesep]
  \infer{Γ⊢  "$f$($x_1$:$t_1$,$\ldots$,$x_n$:$t_n$) $s$"~~"def"}
    {
      \begin{gathered}
        |\dom{Γ} ∪ \{x_1,\dots,x_n\}| = |\dom{Γ}| + n \\
        Γ[x_1↦"var"~t_1, \dots, x_n↦"var"~t_n,
        ρ↦"ret"~()] ⊢  s : R⊣Γ'
      \end{gathered}
    } \\[\rulesep]
  \infer{Γ⊢ x : t~~"def"}{}
  \qquad
  \infer{Γ⊢ x : t = e~~"def"}{Γ⊢e:t & \text{$e$ is a numeric, boolean, or character literal}}
\end{gather*}

\section{Checking a program}

Using the previous judgments, we can define when an entire program
$\fd_1~\fd_2~\ldots~\fd_n$ that does not contain a "use" declaration
is well-formed, written $⊢ \fd_1~\fd_2~\ldots~\fd_n~~"prog"$:

\[
\infer{⊢ \fd_1~\fd_2~\ldots~\fd_n~~"prog"}
{
     ∅ ⊢ \fd_1 ⊣ Γ_1 &
     Γ_1 ⊢\fd_2 ⊣ Γ_2 &
     \ldots &
     Γ_{n-1} ⊢ \fd_n ⊣ Γ &
     Γ ⊢ \fd_i~~"def" ~~ ^{(∀i∈1..n)}
}
\]

For brevity, the rules for adding declarations appearing in interfaces are
omitted.  These rules are slightly different from those of the form
$Γ ⊢ \fd ⊣ Γ'$ in Section~\ref{sec:top-decl}, where $f∉ \dom{Γ}$ is replaced
with appropriate conditions.
Once added, these declarations also permits declarations in the source file
of identical signature.  See Section~\extref{:language:sec:interfaces} of
the {\lang} Language Specification.
\end{document}