
Type Checking

Announcements

Programming Project 2 due tonight at 11:59PM.

Office hours from 1:00PM to 3:00PM in Gates 160.

Programming Project 3 out.

Scope checkpoint due Saturday, July 23 at 11:59PM. This is a hard deadline; no late days allowed.
Final submission due Wednesday, July 27 at 11:59PM.
Start early; this assignment is significantly larger than the previous two assignments.


More Announcements

Programming Assignment 1 graded and returned on paperless.stanford.edu.


Mean: 52.9 / 60 Stdev: 8

Written Assignment 1 graded. Hard copies returned after class, electronic copies will be emailed later today.

Mean: 20.2 / 24 Stdev: 3

Let us know ASAP if you haven't heard back from us by tomorrow morning.

Where We Are

Source Code → Lexical Analysis → Syntax Analysis → Semantic Analysis → IR Generation → IR Optimization → Code Generation → Optimization → Machine Code

Review from Last Time

    class MyClass implements MyInterface {
        string myInteger;

        void doSomething() {
            int[] x;
            x = new string;
            x[5] = myInteger * y;
        }

        void doSomething() {
        }

        int fibonacci(int n) {
            return doSomething() + fibonacci(n - 1);
        }
    }

Errors in this program:
- MyInterface: interface not declared.
- x = new string;: wrong type.
- myInteger * y: can't multiply strings.
- y: variable not declared.
- The second doSomething: can't redefine functions.
- doSomething() + fibonacci(n - 1): can't add void.
- The program as a whole: no main function.

Review from Last Time

Last time's checks catch the undeclared interface, the redefined function, the undeclared variable, and the missing main function. The errors that remain are type errors:
- x = new string;: wrong type.
- myInteger * y: can't multiply strings.
- doSomething() + fibonacci(n - 1): can't add void.

What Remains to Check?

Type errors.

Today:
- What are types?
- What is type-checking?
- A type system for Decaf.

What is a Type?

This is the subject of some debate. To quote Alex Aiken: "The notion varies from language to language."

The consensus: a type is
- a set of values, and
- a set of operations on those values.

Type errors arise when operations are performed on values that do not support that operation.

Types of Type-Checking

Static type checking.
- Analyze the program during compile-time to prove the absence of type errors.
- Never let bad things happen at runtime.

Dynamic type checking.
- Check operations at runtime before performing them.
- More precise than static type checking, but usually less efficient. (Why?)

No type checking.
- Throw caution to the wind!
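To make the dynamic option concrete, here is a minimal Python sketch of dynamic type checking, assuming a toy runtime in which every value carries a type tag. Value, checked_add, and RuntimeTypeError are hypothetical names chosen for illustration, not part of Decaf or any real runtime.

```python
from dataclasses import dataclass


class RuntimeTypeError(Exception):
    """Raised when an operation is applied to values that do not support it."""


@dataclass(frozen=True)
class Value:
    tag: str         # runtime type tag carried by every value, e.g. "int"
    payload: object  # the underlying data


def checked_add(a: Value, b: Value) -> Value:
    # Dynamic checking: inspect the operands' tags *before* performing the add.
    if a.tag == b.tag and a.tag in ("int", "double"):
        return Value(a.tag, a.payload + b.payload)
    raise RuntimeTypeError(f"cannot add {a.tag} and {b.tag}")


result = checked_add(Value("int", 137), Value("int", 42))
```

In this sketch the tag comparison runs on every call to checked_add; a static checker would instead establish the operand types once, before the program runs.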

Type Systems

The rules governing permissible operations on types form a type system.

Strong type systems are systems that never allow a type error.
- Java, Python, JavaScript, LISP, Haskell, etc.

Weak type systems can allow type errors at runtime.
- C, C++

Type Wars

Endless debate about what the right system is. Dynamic type systems make it easier to prototype; static type systems have fewer bugs. Strongly-typed languages are more robust, weakly-typed systems are often faster. I'm staying out of this!

Our Focus

Decaf is typed statically and weakly:
- Type-checking occurs at compile-time.
- Runtime errors like dereferencing null or an invalid object are allowed.

Decaf uses class-based inheritance.

Decaf distinguishes primitive types and classes.

Typing in Decaf

Static Typing in Decaf

Static type checking in Decaf consists of two separate processes:
- Inferring the type of each expression from the types of its components.
- Confirming that the types of expressions in certain contexts match what is expected.

Logically these are two steps, but you will probably combine them into one pass.

An Example

    while (numBitsSet(x + 5) <= 10) {
        if (1.0 + 4.0) {
            /* ... */
        }
        while (5 == null) {
            /* ... */
        }
    }

- 1.0 + 4.0 is a well-typed expression with the wrong type: it has type double, but an if condition must be bool.
- 5 == null is an expression with a type error.

Inferring Expression Types

How do we determine the type of an expression? Think of the process as logical inference: type the leaves first, then type each operator from the types of its operands.

Example: 137 + 42

    +                        : int
        IntConstant 137      : int
        IntConstant 42       : int

Example: x = y = true, where x and y are declared bool

    =                        : bool
        Identifier x         : bool
        =                    : bool
            Identifier y     : bool
            BoolConstant true : bool

Sample Inference Rules

If x is an identifier that refers to an object of type t, the expression x has type t. If e is an integer constant, e has type int. If the operands e1 and e2 of e1 + e2 are known to have types int and int, then e1 + e2 has type int.
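These three rules can be sketched as a recursive, bottom-up type inferencer. This is a minimal Python sketch under stated assumptions: the AST classes, the flat env dictionary standing in for "what x refers to", and TypeCheckError are hypothetical, and scopes are deliberately ignored.

```python
from dataclasses import dataclass
from typing import Dict, Union


class TypeCheckError(Exception):
    """Raised when no inference rule assigns a type to an expression."""


@dataclass
class IntConstant:
    value: int


@dataclass
class Identifier:
    name: str


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[IntConstant, Identifier, Add]


def infer(e: Expr, env: Dict[str, str]) -> str:
    """Infer the type of e bottom-up; env maps identifiers to their types."""
    if isinstance(e, IntConstant):
        # If e is an integer constant, e has type int.
        return "int"
    if isinstance(e, Identifier):
        # If x refers to an object of type t, the expression x has type t.
        if e.name not in env:
            raise TypeCheckError(f"undeclared identifier {e.name}")
        return env[e.name]
    if isinstance(e, Add):
        # Type both operands first, then apply the rule for +.
        left, right = infer(e.left, env), infer(e.right, env)
        if left == "int" and right == "int":
            return "int"
        raise TypeCheckError(f"no rule types {left} + {right}")
    raise TypeCheckError(f"unknown expression {e!r}")


print(infer(Add(IntConstant(137), IntConstant(42)), {}))  # int
```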

Type Checking as Proofs

We can think of type checking as proving claims about the types of expressions. We begin with a set of axioms, then apply our inference rules to determine the types of expressions. Many type systems can be thought of as proof systems.

Formalizing our Notation

We will encode our axioms and inference rules using this syntax:

$$\frac{\text{Preconditions}}{\text{Postconditions}}$$

This is read "if the preconditions are true, we can infer the postconditions."

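Examples of Formal Notation

$$\frac{A \to B\omega \text{ is a production.} \qquad t \in \text{FIRST}(B)}{t \in \text{FIRST}(A)} \qquad \frac{A \to \varepsilon \text{ is a production.}}{\varepsilon \in \text{FIRST}(A)}$$

$$\frac{A \to B_1 B_2 \dots B_n t \omega \text{ is a production.} \qquad \varepsilon \in \text{FIRST}(B_i) \text{ for } 1 \le i \le n}{t \in \text{FIRST}(A)}$$

$$\frac{A \to B_1 \dots B_n \text{ is a production.} \qquad \varepsilon \in \text{FIRST}(B_i) \text{ for } 1 \le i \le n}{\varepsilon \in \text{FIRST}(A)}$$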

Formal Notation for Type Systems

We write

$$\vdash e : T$$

if the expression e has type T.

The symbol ⊢ means "we can infer...".

Our Starting Axioms

$$\frac{}{\vdash \text{true} : \text{bool}} \qquad \frac{}{\vdash \text{false} : \text{bool}}$$

Some Simple Inference Rules

$$\frac{i \text{ is an integer constant}}{\vdash i : \text{int}} \qquad \frac{s \text{ is a string constant}}{\vdash s : \text{string}} \qquad \frac{d \text{ is a double constant}}{\vdash d : \text{double}}$$

More Complex Inference Rules

$$\frac{\vdash e_1 : \text{int} \qquad \vdash e_2 : \text{int}}{\vdash e_1 + e_2 : \text{int}} \qquad \frac{\vdash e_1 : \text{double} \qquad \vdash e_2 : \text{double}}{\vdash e_1 + e_2 : \text{double}}$$

Read the first rule as: if we can show that e1 and e2 have type int, then we can show that e1 + e2 has type int as well.

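Even More Complex Inference Rules

$$\frac{\vdash e_1 : T \qquad \vdash e_2 : T \qquad T \text{ is a primitive type}}{\vdash e_1 == e_2 : \text{bool}}$$

$$\frac{\vdash e_1 : T \qquad \vdash e_2 : T \qquad T \text{ is a primitive type}}{\vdash e_1 \mathrel{!=} e_2 : \text{bool}}$$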

Why Specify Types this Way?

- Gives a rigorous definition of types independent of any particular implementation.
  - No need to say "you should have the same type rules as my reference compiler."
- Gives maximum flexibility in implementation.
  - Can implement type-checking however you want, as long as you obey the rules.
- Allows formal verification of program properties.
  - Can do inductive proofs on the structure of the program.
- This is what's used in the literature.
  - Good practice if you want to study types.

A Problem

$$\frac{x \text{ is an identifier.}}{\vdash x : \;??}$$

How do we know the type of x if we don't know what it refers to?

An Incorrect Solution

$$\frac{x \text{ is an identifier.} \qquad x \text{ is in scope with type } T.}{\vdash x : T}$$

    int MyFunction(int x) {
        {
            double x;
        }
        if (x == 1.5) {
            /* ... */
        }
    }

Facts we can derive:
- $\vdash x : \text{double}$ (from the declaration double x in the inner block)
- $\vdash x : \text{int}$ (from the parameter int x)
- $\vdash 1.5 : \text{double}$ (by the rule for double constants)
- $\vdash x == 1.5 : \text{bool}$ (by the rule for ==, using $x : \text{double}$)

Problem? The fact $x : \text{double}$ holds only inside the inner block, but nothing in the rule records that. So we "proved" that x == 1.5 has type bool in the if statement, where x actually refers to the int parameter and comparing an int with a double should be rejected.

Strengthening our Inference Rules


The facts we're proving have no context. We need to strengthen our inference rules to remember under what circumstances the results are valid.

Adding Scope

We write

$$S \vdash e : T$$

if in scope S, the expression e has type T.

Types are now proven relative to the scope they are in.

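Old Rules Revisited

$$\frac{}{S \vdash \text{true} : \text{bool}} \qquad \frac{}{S \vdash \text{false} : \text{bool}}$$

$$\frac{i \text{ is an integer constant}}{S \vdash i : \text{int}} \qquad \frac{s \text{ is a string constant}}{S \vdash s : \text{string}} \qquad \frac{d \text{ is a double constant}}{S \vdash d : \text{double}}$$

$$\frac{S \vdash e_1 : \text{int} \qquad S \vdash e_2 : \text{int}}{S \vdash e_1 + e_2 : \text{int}} \qquad \frac{S \vdash e_1 : \text{double} \qquad S \vdash e_2 : \text{double}}{S \vdash e_1 + e_2 : \text{double}}$$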

A Correct Rule

$$\frac{x \text{ is an identifier.} \qquad x \text{ is a variable in scope } S \text{ with type } T.}{S \vdash x : T}$$
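One common way to realize "x is a variable in scope S" is a chain of nested scopes searched from the innermost outward. A minimal Python sketch, assuming each scope is a dictionary from names to types with a link to its parent; the Scope class is a hypothetical helper, not part of any particular compiler. It replays the MyFunction example from the incorrect solution.

```python
from typing import Dict, Optional


class Scope:
    """One scope in a chain of nested scopes (hypothetical helper)."""

    def __init__(self, parent: Optional["Scope"] = None) -> None:
        self.parent = parent
        self.variables: Dict[str, str] = {}

    def declare(self, name: str, type_name: str) -> None:
        self.variables[name] = type_name

    def lookup(self, name: str) -> Optional[str]:
        # Search from this scope outward; the nearest declaration wins.
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.variables:
                return scope.variables[name]
            scope = scope.parent
        return None


# Scopes for MyFunction(int x) { { double x; } if (x == 1.5) { ... } }
function_scope = Scope()
function_scope.declare("x", "int")        # the parameter int x
inner_block = Scope(parent=function_scope)
inner_block.declare("x", "double")        # the inner declaration double x

# The if condition is checked in function_scope, where x is the int parameter.
print(function_scope.lookup("x"))  # int
print(inner_block.lookup("x"))     # double
```

Because x is looked up in the scope where the if statement appears, x == 1.5 compares an int with a double, and no rule for == applies.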

Rules for Functions

$$\frac{f \text{ is an identifier.} \qquad f \text{ is a non-member function in scope } S. \qquad f \text{ has type } (T_1, \dots, T_n) \to U \qquad S \vdash e_i : T_i \text{ for } 1 \le i \le n}{S \vdash f(e_1, \dots, e_n) : U}$$

Read rules like this one top to bottom: if every premise above the line holds, we can infer the conclusion below it.
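A minimal Python sketch of this rule, assuming function signatures are recorded as (parameter types, return type) pairs and that the argument types have already been inferred. check_call, Signature, and TypeCheckError are hypothetical names.

```python
from typing import Dict, List, Tuple

# ((T1, ..., Tn), U): a hypothetical representation of a function type.
Signature = Tuple[List[str], str]


class TypeCheckError(Exception):
    """Raised when the function-call rule does not apply."""


def check_call(functions: Dict[str, Signature], f: str, arg_types: List[str]) -> str:
    """Type f(e1, ..., en) given the already-inferred types of e1..en."""
    if f not in functions:
        raise TypeCheckError(f"{f} is not a non-member function in scope")
    param_types, return_type = functions[f]
    # The same n appears in (T1, ..., Tn) and in e1, ..., en, so arities must match.
    if len(arg_types) != len(param_types):
        raise TypeCheckError(f"{f} expects {len(param_types)} arguments, got {len(arg_types)}")
    for i, (actual, expected) in enumerate(zip(arg_types, param_types), start=1):
        # Premise S ⊢ ei : Ti, checked as exact equality in this version of the rule.
        if actual != expected:
            raise TypeCheckError(f"argument {i} of {f} has type {actual}, expected {expected}")
    return return_type  # conclusion: S ⊢ f(e1, ..., en) : U


signatures = {"fibonacci": (["int"], "int"), "doSomething": ([], "void")}
print(check_call(signatures, "fibonacci", ["int"]))  # int
```

With these signatures, doSomething() gets type void, so the + rule later rejects doSomething() + fibonacci(n - 1).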

Rules for Arrays

$$\frac{S \vdash e_1 : T[\,] \qquad S \vdash e_2 : \text{int}}{S \vdash e_1[e_2] : T}$$
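A minimal Python sketch of the array-subscript rule, assuming types are represented as strings and array types are spelled with a trailing "[]" (for example "int[]"). subscript_type and TypeCheckError are hypothetical names.

```python
class TypeCheckError(Exception):
    """Raised when the array-subscript rule does not apply."""


def subscript_type(array_type: str, index_type: str) -> str:
    # Premise S ⊢ e1 : T[]  (assumption: array types end in "[]").
    if not array_type.endswith("[]"):
        raise TypeCheckError(f"cannot subscript a value of type {array_type}")
    # Premise S ⊢ e2 : int.
    if index_type != "int":
        raise TypeCheckError(f"array index must be int, not {index_type}")
    # Conclusion S ⊢ e1[e2] : T, i.e. strip one level of "[]".
    return array_type[: -len("[]")]


print(subscript_type("double[][]", "int"))  # double[]
```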

Rule for Assignment

$$\frac{S \vdash e_1 : T \qquad S \vdash e_2 : T}{S \vdash e_1 = e_2 : T}$$

Why isn't this rule a problem for this statement?

    5 = x;

If Derived extends Base, will this rule work for this code?

    Base myBase;
    Derived myDerived;
    myBase = myDerived;

Typing with Classes

How do we factor inheritance into our inference rules? We need to consider the shape of class hierarchies.

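Single Inheritance

[Figure: a single-inheritance class hierarchy with two roots. Instructor has subclasses Professor, Lecturer, and TA; Animal has subclasses Man, Bear, and Pig; the classes AlexAiken, Keith, Hrysoula, and Riddhi appear as leaves below the Instructor subclasses.]

Multiple Inheritance

[Figure: the same hierarchy plus a class ManBearPig that inherits from Man, Bear, and Pig.]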

Properties of Inheritance Structures


- Any class is convertible to itself. (Reflexivity)
- If A is convertible to B and B is convertible to C, then A is convertible to C. (Transitivity)
- If A is convertible to B and B is convertible to A, then A and B are the same type. (Antisymmetry)

This defines a partial order over types.

Types and Partial Orders


We say that $A \le B$ if A is convertible to B. We have that

$$A \le A$$

$$A \le B \text{ and } B \le C \text{ implies } A \le C$$

$$A \le B \text{ and } B \le A \text{ implies } A = B$$
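In a single-inheritance hierarchy, $A \le B$ can be decided by walking superclass links upward from A. A minimal Python sketch, assuming each class maps to its direct superclass (or None for a root) in a dictionary; is_convertible is a hypothetical name, and the hierarchy below uses the Instructor and Animal classes from the single-inheritance example.

```python
from typing import Dict, Optional


def is_convertible(a: str, b: str, superclass: Dict[str, Optional[str]]) -> bool:
    """Return True if a ≤ b, i.e. a is b or a (transitively) extends b."""
    current: Optional[str] = a  # starting at a itself gives reflexivity
    while current is not None:
        if current == b:
            return True
        current = superclass.get(current)  # following links gives transitivity
    return False


superclass = {
    "Instructor": None, "Professor": "Instructor", "Lecturer": "Instructor", "TA": "Instructor",
    "Animal": None, "Man": "Animal", "Bear": "Animal", "Pig": "Animal",
}
print(is_convertible("Man", "Animal", superclass))  # True
```

Antisymmetry holds here as long as the superclass links contain no cycles, which a well-formed class hierarchy guarantees.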

Updated Rule for Assignment

$$\frac{S \vdash e_1 : T_1 \qquad S \vdash e_2 : T_2 \qquad T_2 \le T_1}{S \vdash e_1 = e_2 : T_1}$$

Can we do better than this?

$$\frac{S \vdash e_1 : T_1 \qquad S \vdash e_2 : T_2 \qquad T_2 \le T_1}{S \vdash e_1 = e_2 : T_2}$$

Not required in your semantic analyzer, but easy extra credit!
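A minimal Python sketch of the sharper assignment rule, assuming class types map to their direct superclass (or None) in a dictionary and that types outside the dictionary (such as int) have no superclass. assignment_type, is_convertible, and TypeCheckError are hypothetical names.

```python
from typing import Dict, Optional


class TypeCheckError(Exception):
    """Raised when the assignment rule does not apply."""


def is_convertible(a: str, b: str, superclass: Dict[str, Optional[str]]) -> bool:
    # a ≤ b: walk superclass links upward from a, looking for b.
    current: Optional[str] = a
    while current is not None:
        if current == b:
            return True
        current = superclass.get(current)
    return False


def assignment_type(lhs: str, rhs: str, superclass: Dict[str, Optional[str]]) -> str:
    # Premise T2 ≤ T1: the right-hand side must be convertible to the left-hand side.
    if not is_convertible(rhs, lhs, superclass):
        raise TypeCheckError(f"cannot assign {rhs} to {lhs}")
    # The sharper rule concludes T2; the basic rule would return lhs (T1) instead.
    return rhs


superclass = {"Base": None, "Derived": "Base"}
print(assignment_type("Base", "Derived", superclass))  # Derived
```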

Updated Rule for Comparisons

$$\frac{S \vdash e_1 : T \qquad S \vdash e_2 : T \qquad T \text{ is a primitive type}}{S \vdash e_1 == e_2 : \text{bool}}$$

$$\frac{S \vdash e_1 : T_1 \qquad S \vdash e_2 : T_2 \qquad T_1 \text{ and } T_2 \text{ are of class type.} \qquad T_1 \le T_2 \text{ or } T_2 \le T_1}{S \vdash e_1 == e_2 : \text{bool}}$$

Can we unify these rules?

The Shape of Types

[Figure: the class types Engine, CarEngine, DieselEngine, and DieselCarEngine form a hierarchy, while the primitive types bool, string, int, and double and the array types stand alone, unconnected to any other type.]

Extending Convertibility

If A is a primitive or array type, A is only convertible to itself. More formally, if A and B are types and A is a primitive or array type:

$$A \le B \text{ implies } A = B \qquad B \le A \text{ implies } A = B$$

Updated Rule for Comparisons

The two comparison rules can now be merged. For primitive types, "$T_1 \le T_2$ or $T_2 \le T_1$" holds exactly when $T_1 = T_2$, so a single rule covers both the primitive case and the class case:

$$\frac{S \vdash e_1 : T_1 \qquad S \vdash e_2 : T_2 \qquad T_1 \le T_2 \text{ or } T_2 \le T_1}{S \vdash e_1 == e_2 : \text{bool}}$$
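A minimal Python sketch of the unified rule, assuming class types map to their direct superclass (or None) in a dictionary. Primitive and array types never appear in that dictionary, so the upward walk finds no superclass for them and they end up convertible only to themselves, matching the extended convertibility relation. equality_type, is_convertible, and TypeCheckError are hypothetical names.

```python
from typing import Dict, Optional


class TypeCheckError(Exception):
    """Raised when the comparison rule does not apply."""


def is_convertible(a: str, b: str, superclass: Dict[str, Optional[str]]) -> bool:
    # Types absent from `superclass` (primitives, arrays) reach only themselves.
    current: Optional[str] = a
    while current is not None:
        if current == b:
            return True
        current = superclass.get(current)
    return False


def equality_type(t1: str, t2: str, superclass: Dict[str, Optional[str]]) -> str:
    # Premise: T1 ≤ T2 or T2 ≤ T1. Conclusion: e1 == e2 : bool.
    if is_convertible(t1, t2, superclass) or is_convertible(t2, t1, superclass):
        return "bool"
    raise TypeCheckError(f"cannot compare {t1} with {t2}")


superclass = {"Engine": None, "CarEngine": "Engine", "DieselEngine": "Engine"}
print(equality_type("CarEngine", "Engine", superclass))  # bool
```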

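Updated Rule for Function Calls

$$\frac{f \text{ is an identifier.} \qquad f \text{ is a non-member function in scope } S. \qquad f \text{ has type } (T_1, \dots, T_n) \to U \qquad S \vdash e_i : R_i \text{ for } 1 \le i \le n \qquad R_i \le T_i \text{ for } 1 \le i \le n}{S \vdash f(e_1, \dots, e_n) : U}$$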

A Tricky Case

$$S \vdash \text{null} : \;??$$

Back to the Drawing Board

[Figure: the type diagram from "The Shape of Types" (class types Engine, CarEngine, DieselEngine, and DieselCarEngine; the standalone primitive types bool, string, int, and double; and the array types), with a new null type added below every class type.]

Handling null

Define a new type corresponding to the type of the literal null; call it "null type". Define $\text{null type} \le A$ for any class type A. The null type is not accessible to programmers; it's only used internally inside the compiler. Many programming languages have types like these.
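A minimal Python sketch of how a checker might add the null type to convertibility, assuming every class type appears as a key in a superclass dictionary and the null type is represented by an internal sentinel string. NULL_TYPE, is_convertible, equality_type, and TypeCheckError are hypothetical names. With the unified comparison rule, this also rejects 5 == null from the earlier example.

```python
from typing import Dict, Optional

NULL_TYPE = "<null type>"  # internal to the compiler; programmers cannot write it


class TypeCheckError(Exception):
    """Raised when the comparison rule does not apply."""


def is_convertible(a: str, b: str, superclass: Dict[str, Optional[str]]) -> bool:
    if a == NULL_TYPE:
        # null type ≤ A for every class type A (and ≤ itself, by reflexivity).
        return b == NULL_TYPE or b in superclass
    current: Optional[str] = a
    while current is not None:
        if current == b:
            return True
        current = superclass.get(current)
    return False


def equality_type(t1: str, t2: str, superclass: Dict[str, Optional[str]]) -> str:
    if is_convertible(t1, t2, superclass) or is_convertible(t2, t1, superclass):
        return "bool"
    raise TypeCheckError(f"cannot compare {t1} with {t2}")


# Every class type is a key; roots map to None.
superclass = {"Engine": None, "CarEngine": "Engine", "DieselEngine": "Engine"}
print(equality_type("CarEngine", NULL_TYPE, superclass))  # bool
```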

A Tricky Case

$$S \vdash \text{null} : \text{null type}$$

Object-Oriented Considerations

$$\frac{S \text{ is in scope of class } T.}{S \vdash \text{this} : T} \qquad \frac{T \text{ is a class type.}}{S \vdash \text{new } T : T} \qquad \frac{S \vdash e : \text{int}}{S \vdash \text{NewArray}(e, T) : T[\,]}$$

Why don't we need to check if T is void?
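These three rules translate almost directly into code. A minimal Python sketch, assuming the checker knows the innermost enclosing class (None outside any class) and the set of declared class names, and that array types are spelled with a trailing "[]". All function names here are hypothetical.

```python
from typing import Optional, Set


class TypeCheckError(Exception):
    """Raised when none of the object-oriented rules applies."""


def this_type(enclosing_class: Optional[str]) -> str:
    # S is in scope of class T  =>  S ⊢ this : T
    if enclosing_class is None:
        raise TypeCheckError("'this' used outside of a class")
    return enclosing_class


def new_type(name: str, class_types: Set[str]) -> str:
    # T is a class type  =>  S ⊢ new T : T
    if name not in class_types:
        raise TypeCheckError(f"{name} is not a class type")
    return name


def new_array_type(size_type: str, element_type: str) -> str:
    # S ⊢ e : int  =>  S ⊢ NewArray(e, T) : T[]
    if size_type != "int":
        raise TypeCheckError(f"array size must be int, not {size_type}")
    return element_type + "[]"


print(new_array_type("int", "double"))  # double[]
```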

What's Left?

We're missing a few language constructs:


- Member functions.
- Field accesses.
- Miscellaneous operators.

Good practice to fill these in on your own.

Typing is Nuanced

The ternary conditional operator ? : evaluates an expression, then produces one of two values.

Works for primitive types:

    int x = random()? 137 : 42;

Works with inheritance:

    Base b = isB? new Base : new Derived;

What might the typing rules look like?

A Proposed Rule

$$\frac{S \vdash \text{cond} : \text{bool} \qquad S \vdash e_1 : T_1 \qquad S \vdash e_2 : T_2 \qquad T_1 \le T_2 \text{ or } T_2 \le T_1}{S \vdash (\text{cond} \mathbin{?} e_1 : e_2) : \max(T_1, T_2)}$$

Here $\max(T_1, T_2)$ is whichever of $T_1$ and $T_2$ is greater under $\le$.

Is this really what we want?

A Small Problem

[Figure: a class hierarchy in which Base extends Super, and Derived1 and Derived2 both extend Base.]

    Base b = random()? new Derived1 : new Derived2;

Derived1 and Derived2 are unrelated (neither is convertible to the other), so the proposed rule rejects this code, even though both branches are convertible to Base.

Least Upper Bounds

An upper bound of two types A and B is a type C such that $A \le C$ and $B \le C$. The least upper bound of two types A and B is a type C such that:
- C is an upper bound of A and B.
- If C' is an upper bound of A and B, then $C \le C'$.

When the least upper bound of A and B exists, we denote it $A \vee B$.

(When might it not exist?)
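Under single inheritance, the upper bounds of a class form a chain up to its root, so the least upper bound can be found by collecting A's ancestors and walking up from B until one of them appears. A minimal Python sketch, assuming each class maps to its direct superclass (or None) in a dictionary; least_upper_bound is a hypothetical name, and the example hierarchy is the Super/Base/Derived1/Derived2 one from the small problem above.

```python
from typing import Dict, Optional, Set


def least_upper_bound(a: str, b: str, superclass: Dict[str, Optional[str]]) -> Optional[str]:
    """Return A ∨ B in a single-inheritance hierarchy, or None if no upper bound exists."""
    # Every upper bound of a lies on the chain from a up to its root.
    ancestors_of_a: Set[str] = set()
    current: Optional[str] = a
    while current is not None:
        ancestors_of_a.add(current)
        current = superclass.get(current)
    # Walking up from b, the first type that is also above a is the least one,
    # because the upper bounds of b form a chain ordered by ≤.
    current = b
    while current is not None:
        if current in ancestors_of_a:
            return current
        current = superclass.get(current)
    return None


superclass = {"Super": None, "Base": "Super", "Derived1": "Base", "Derived2": "Base"}
print(least_upper_bound("Derived1", "Derived2", superclass))  # Base
```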

A Better Rule

$$\frac{S \vdash \text{cond} : \text{bool} \qquad S \vdash e_1 : T_1 \qquad S \vdash e_2 : T_2 \qquad T = T_1 \vee T_2}{S \vdash (\text{cond} \mathbin{?} e_1 : e_2) : T}$$

With the Super/Base/Derived1/Derived2 hierarchy, Derived1 ∨ Derived2 = Base, so this code now type-checks:

    Base b = random()? new Derived1 : new Derived2;

A Better Rule that still has problems

[Figure: a multiple-inheritance hierarchy in which Derived1 and Derived2 each extend both Base1 and Base2.]

    Base1 b = random()? new Derived1 : new Derived2;

Base1 and Base2 are both upper bounds of Derived1 and Derived2, and neither is convertible to the other, so Derived1 ∨ Derived2 does not exist and the rule cannot type this expression.

Multiple Inheritance is Messy


- Type hierarchy is no longer a tree.
- Two classes might not have a least upper bound.
- Occurs in Java due to interfaces.
- Not a problem in Decaf; there is no ternary conditional operator.
- How to fix?

Minimal Upper Bounds

An upper bound of two types A and B is a type C such that $A \le C$ and $B \le C$. A minimal upper bound of two types A and B is a type C such that:
- C is an upper bound of A and B.
- If C' is an upper bound of A and B, then it is not true that $C' < C$.

Minimal upper bounds are not necessarily unique. A least upper bound must be a minimal upper bound, but not the other way around.
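With multiple inheritance, the upper bounds of two types form a set rather than a chain. A minimal Python sketch that computes all minimal upper bounds by brute force, assuming each type maps to a list of its direct supertypes in a dictionary (types absent from it have none). supertypes and minimal_upper_bounds are hypothetical names; the example is the Base1/Base2/Derived1/Derived2 hierarchy.

```python
from typing import Dict, List, Set


def supertypes(t: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All types C with t ≤ C, including t itself."""
    seen: Set[str] = set()
    stack = [t]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def minimal_upper_bounds(a: str, b: str, parents: Dict[str, List[str]]) -> Set[str]:
    upper = supertypes(a, parents) & supertypes(b, parents)
    # C is minimal if no other upper bound C' satisfies C' < C,
    # i.e. C' ≤ C (C is a supertype of C') with C' ≠ C.
    return {
        c for c in upper
        if not any(other != c and c in supertypes(other, parents) for other in upper)
    }


parents = {"Derived1": ["Base1", "Base2"], "Derived2": ["Base1", "Base2"]}
print(sorted(minimal_upper_bounds("Derived1", "Derived2", parents)))  # ['Base1', 'Base2']
```

Because the result can contain more than one type, a rule that accepts any minimal upper bound can assign the conditional expression either Base1 or Base2.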

A Correct Rule

[Figure: Derived1 and Derived2 each extend both Base1 and Base2.]

$$\frac{S \vdash \text{cond} : \text{bool} \qquad S \vdash e_1 : T_1 \qquad S \vdash e_2 : T_2 \qquad T \text{ is a minimal upper bound of } T_1 \text{ and } T_2}{S \vdash (\text{cond} \mathbin{?} e_1 : e_2) : T}$$

    Base1 b = random()? new Derived1 : new Derived2;

We can prove both that the expression has type Base1 and that the expression has type Base2.

So What?

- Type-checking can be tricky.
- Strongly influenced by the choice of operators in the language.
- Strongly influenced by the legal type conversions in a language.
- In C++, the previous example doesn't compile.
- In Java, the previous example does compile, but the language spec is enormously complicated.
  - See §15.12.2.7 of the Java Language Specification.