Some Thoughts on Types for Python-3000

May 1, 2006

Title: Initial Thoughts on Types for Python-3000
Author: Bill Birch
Email: birchb at-sign tpg dot com dot au
Version: 0.5
Date: May 1st 2006
Status: Draft,
1 INTRODUCTIONThis paper describes a proposed type system for Python-3000. The typesystem supports meta-programs like compilers and type checkers, itsupports run-time data checking in production code and test suites.
The type system provides structural subtyping predicates and
operations and abstract interface which allows programmers to define
their own types and typing semantics.

This PEP refers to BDFL's blog "Optional Static Type Checking - Stop
the Flames" as [blog3]

Blog3 []


This PEP explores language enhancements as well as ways the core
language features can be leveraged.  Eventually some of the items will
be marked as "user-space" features, implementable in .py files. Right
now they're unsorted.


Syntax used in this document is derived from BDFL's proposed syntax
but syntax is not central to the proposal. Many languages have
separate syntax for type expressions. Python has a flexible syntax
which can be used without much change to support types. [blog3] has a
proposed syntax for type annotations on function declarations such as:

def foo(a: int) -> int
        return 2*a

and for paramterised collections:


However there are pros and cons to having  a special syntax just for
types. Syntax is embedded in the compiler and is out of reach of
users. However expressing the type of complex interfaces with just
existing list, tuple and dict syntax is not easy on the eye.  

We propose the following syntax in place of assignment to
__metaclass__ with the expectation that interfaces can be implemented
as meta-classes.

class[interface] myInterface:


Python has a long history of discussion about type systems. The
motivations for adding a type system to Python are different to the
reasons for type systems in other, static languages. Static languages
have a greater need for strict compile-time checks since there are no
checks at runtime. Incorrect type usage would produce hard to debug
errors. Dynamic, typed languages such as Python or LISP protect the
runtime environment from the damage caused by invalid operations.
Despite the success of dynamic type checks, there are still good
reasons for proposing a formally defined type system as part of the


Without type annotations of variables and arguments, writers of
libraries must documentation must describe the domain and range of
inputs and outputs in natural language. Most libraries also include a
kind of type annotation in the documentation of functions arguments.
Having a standard form for specification of types will make
documentation simpler.


Many programmers would like to be able to annotate classes and
functions with type information and have this checked during
development and optionally at runtime. This has been discussed at
length resulting in [blog3]. Type checking is seen as one way to
reduce errors in programs. 

See also Colin Winter's typecheck module as an example of typechecking
implemented in today's Python.

Winter []


At runtime many Python programs receive 'untrusted' data for
processing either from clients or servers on network connections or
from data sources such as files or databases. A normal step of
execution is to perform input validation to weed out badly formed
structures form these external sources. A type system which supports
declarative specification of the expected input and functions for
checking input data will reduce the programming required. 


As systems become large, and the number of parties involved grows,
teams of developers prefer to formally define interfaces between
subsystems. A type system should be supportive of the specification
and checking of interfaces.


Both unit tests and integration tests routinely need code which checks
the invariants of the code under test including the structure of data
generated by code. Whilst production code may not be able to afford
the costs of full type checking, the same does not apply to testing.
A type system should support declaration of deep types runtime
checking of data structures in test harnesses. 


A dynamic type system can also support the generation of objects given
a type specification. This has a variety of uses, including generation
of random or exhaustive test data. The type system should  provide
easy introspection of type trees so that generators can be developed. 

In general the type system must be open to allow meta-programs to be
developed which work with the type objects. 


This section provides a recap of terminology and basic concepts.

A type is meta-data, it is data about data.
A type defines a set of possible objects.
A type describes an interpretation of unstructured information.
An object or collection of objects may have more than one
interpretation at a time, hence more than one type.

Unlike static languages (such as 'C') a Python type does not always
define a physical structure. 


An atomic type cannot contain another type without a conversion
operation. Atomic types include:

int, float, str


Non-atomic types can contain other objects. Examples are:
lists, tuples, dictionaries, class instances (aka objects), modules,
classes etc.


There are many varieties of type system, this section lists some of
the major kinds of typing and subtyping and relevance to py3k.


Static type systems are usually implemented within the language
compiler which has little or no access to values at runtime.
Variables have a fixed type which is determined at compile time. 'C'
is a classic example of a statically typed language. 

However a programmer may choose to use more than one interpretation of
a variable at run-time, communicated by a cast at compile-time.
Object oriented languages which are thought of as 'statically' typed
are forced to use run-time type information to implement polymorphism.
C++ and Java are examples of hybrid static/dynamic type systems.

Academic discussions of type systems are frequently targeted a
statically typed languages. Because static compiled type systems have
little or no access to values and the runtime has equally little
access to the type system, these discussions of type systems are
incomplete when applied to py3k.


Dynamic typing relies on every object being marked with
self-describing type information. The focus on type information moves
from the compiler to the run-time. Typically the language variables
are unconstrained and can contain any type. Often the compiler has no
access to the type information.

LISP and Python are dynamically typed. 

Some dynamically typed languages (such as Common Lisp, Visual Basic)
allow programmers to optionally specify types of variables. This may
result in smaller data or faster execution since the compiler has more

This proposal covers similar ideas however the motivations do not
include performance optimisation by specification of type information.


Duck typing is form of dynamic structural subtyping. Knowledge of the
actual type of an object is unknown, rather expected methods are
called and if present all is well. Duck typing is not really a type
system since the type of an object is never a consideration during
either compilation or execution.

WikiPedia []


Structural subtyping (aka structural equivalence) examines the actual
type of objects and if the objects are constructed the same way they
are deemed equivalent. Depending on the type system the depth of
examination of the objects can vary. 

Type  systems with structural subtyping have been widely accepted by
the academic community in the form of languages such as Haskell and ML
since they provide superior levels of re-use than nominal subtyping

This proposal recommends structural subtyping for py3k since that is
the current behaviour.

Wikepedia []
Cardelli85 []


In a nominative type system the subtype relationship must be
explicitly declared by the programmer by including the names of
supertypes in a type definition. Even if two types are identical in
structure they may never be substituted for each other if they do not
share names. In this case programmers are rrequiredto write 'wrappers'
which coerce the types.

Traditional programming languages such as 'C++' and Java use nominal

Python has an element of nominal subtyping since class ddefinitions
include the names of superclasses, and the interpreter uses these
names to locate methods. However the dynamic nature of Python means
that there is no guarantee that a nominated subtype is structurally a
subtype. As this example demonstrates:

class Base(object):
	   def aMethod(self): print "Base.aMethod()"

class Other(object): pass

class Derived(Base):
	   def __init__(self):
		  self.__class__ = Other

d = Derived()
    d.aMethod() # error

Wikepedia []


Liskov subtyping requires that the subtype not only be structurally
compatible, but all of it's properties must conform to the supertype.
The properties to be matched are anything 'provable' including history
(pre- and post- conditions). 

This approach seems particularly strict and difficult to implement,
since the type checking is required to check the before and after
states of objects. 

The py3k approach should not hinder advanced users implementing such

LiskovWing []


A shallow type system looks only at the type of the first object
referenced. For example we have:
    >>> type([1,2,3,4])

Which is equivalent to 



A 'deep type' includes all the objects reference by a non-atomic type.
It is a transitive closure of the types of all the objects referenced
directly or indirectly by a specified object. Languages such as
Haskell understand deep typing. Example:

>>> type([1,2,3,4])


The expectation of Python programmers is that, at least, the type
system should support optional type checking which is a predictor for
future type errors. Thus the built-in py3k type objects define the
isSubType() and conforms() predicates so that structural subtypes
are recognised.

The discussion of sub-typing also applies to whether a value is an
instance of a type. The py3k conforms() predicate also addresses the
actual structure of the object, not just it's class. It also examines
the deep type of the object and it's children.

Most statically compiled languages  allow the user no control or
influence on the type system beyond declaration of user-defined types.
Python offers more opportunities.

In py3k types are dynamically constructed hence the possibility exists
to allow the advanced user to influence fundamental typing concepts
within their own types. Provided that users comply with the abstract
type interface they are free to define their own kinds of subtyping.


Unlike type expressions in static languages such as C++ or Java, type
expressions in py3k are constructed at runtime just like an other
value expression and consist of Python new-style objects.
Type expressions are best introduced by examples.

list[int]     #  all lists with every element an int

list[list[int]]    # all lists of the above

tuple[str]    # tuples of strings

tuple[str, int, int]   # all 3-tuples with string then two ints

dict[{str: float}]   # dictionary with str keys and float values.

dict[{just('x-coord'): Number, just('y-coord'):Number}]
 # dictionaries with two keys 'x-coord' and 'y-coord' 

All type objects share some common methods which must be present,
these make up the abstract Type interface discussed later in this
paper. Types are constructed implicitly by the compiler/parser or
explicitly by invoking constructors.


There are some constants which are required to capture key concepts in
the type system. These constants are immutable singletons. Constants


This stands for all types. This is different from 'object' since
'object' requires subtypes to be class instances.


Which stands for the empty set. ie no objects are of this type.
The py3k also includes the types defined in the types module. 


Type constructors are functions which create type objects of various
kinds. These can be called explicitly or with the assistance of the
compiler through its expression syntax.
Users can define their own type classes and constructors.

4.2.1 just(object: any) -> JustType

The py3k type system can have direct access to values available at
run-time.  We could use values (constants) directly in type
expressions for example:

list['table', str]

the type of all 2-lists with 'table' as their first element and a
string as their second.

However this would require that the string class provide type
operations such as isSubType(). Instead we wrap LISTOF(T: TYPE)values
using the just() operator: 

list[just('table'), str]

The types resulting from just() provide the necessary type methods.

4.2.2 listof(t: Type)

Declares a homogeneous parameterised type. example: 


creates an object of this class.

4.2.3 tupleof(t: Type)

Homogeneous tuple. Example:


4.2.4 listtype(*args)

A list with a particular non-repeating sequence of types given by
*args. Example:

list[int, int, float, str]

If the last type specified is the ellipses object, the previous
element type is allowed zero or more times. Example:

list[str, int, ...]


['a', ]   ['b', 1] and ['c', 1,2,3]

4.2.5 tupletype(*args)

similar to listType().

4.2.6 functiontype(fargs, vargs, kargs, returntype) -> type

This constructs an object which captures the signature of a callable.
Parameters fargs, vargs and kargs specify the types of the fixed,
variable and keyword arguments as dictionaries. (Note: Function
signatures are subject of another proposal). Examples:

functionType()     # anything callable

functionType([int, float] , [str,...], {'a':long, 'b':bool}, int)

functionType([int, float] , , , int)

4.2.7 instancetype()

The instanceType describes new-style objects created with new(). The
names of the members are given in a dict e.g.

instanceType({  'x': float,
                    'y': float,
                    'move': functionType({  'dx': float,
                                            'dy': float})})

4.2.8 union(t1: Type, t2: Type, ...)-> type

This constructs an object which denotes that the type is a union of
all the types included. Example:

Number = union(int, long, float, decimal)

4.2.9 intersection(t1: Type, t2 : Type, ...) -> Type

An intersection of types is described by this object. An object must
be a subtype of all the included types to be a subtype or instance.
Example dicts with both a save and a draw function:

intersection(   dict[{'save':functionType()}],

4.2.10 deeptype(x: any) -> Type

Given a value, return a quite specific type expression. This operation
allows objects to be used as 'exemplars'. Example:

>>> deepType( [1,2,3,4,5] ) 

list[int, int, int, int, int]


Interfaces are composed of an instanceType with method signatures.

instanceType({'moveUp': functionType({'dx': float}, bool)})

The compiler may provide a nicer syntax more redolent of class
definitions which that would include a name binding. Example:

interface IMoveable(IWindow):
        def move(dx: float, dy: float) -> bool

or if implemented as a meta-class:

class[interface] IMoveable(IWindow):
        def move(dx: float, dy: float) -> bool

If an interface included a parent interface as above, then this would
wrap the interface under an intersection object. ie

IMoveable = intersection(IWindow, instanceType( ...


A naked type expression is un-named and is a simple value like any
other in Python. Further, two type expressions may result in different

assert list[float] != list[float]


Simple assignment binds a type expression to a variable subject to all
the usual scoping rules.

var =  


Here we address the use of Class objects in type expressions.
Consider the following code:

class Point(object):
        x_coord: int
        y_coord: int
        def __init__(self, x, y):
            self.x_coord = x
            self.y_coord = y
        def getX(self): return self.x_coord
        def getY(self): return self.y_coord

def drawLine(start: Point, end: Point) :

It's very natural for a programmer to use class names as types.
However the downside is the Point class does not provide much type
information. A structural subtype operation would only be able to
typecheck that 'start' refers to an object with an __init__ member
which takes two parameters and the two int coordinate variables.
Worse, a structural subtype check would examine _all_ features of the
class, even the internals (such as x_coord) which are of no interest
to the client. 

The user is expecting that 'start' can only refer to an instance of
the Point class or one of it's subclasses. The expectation is that a
Nominative subtype check applies when classes are used as types.
This is not a good program in any case, since it mandates a concrete
class. A better program would avoid nominative sub-typing as follows:

class[interface]  Point:
        def __init__(self, x: Number, y: Number)
        def getX() -> Number
        def getY() -> Number

def drawLine(start: Point, end: Point) :

We propose that Structural Subtyping in py3k would be available from
interfaces. The conforms() and isSubType() predicates of the
Interface type will perform deep, structural type checking. In this
example an optionally type-checked drawLine() method would ensure
'start' and 'end' conform to the interface regardless of the object's

The isinstance(), conforms() and isSubType() predicates of the class
meta-class (type) will not use structural subtyping, rather they are
limited to simple nominal subtyping. 

This is an important resolution which cannot be understated. 

* Class types provide implementation and nominative type checking.

* Interfaces are purely structural type declarations with no

The above class could incorporate the interface as a hint to type
checkers with special syntax:

class PointImpl(object) implements[Point]:
        x_coord: int
        y_coord: int
        def __init__(self, x, y):
            self.x_coord = x
            self.y_coord = y
        def getX(self): return self.x_coord
        def getY(self): return self.y_coord

Note the above contrary syntax is underlines the distinction between
implementation inheritance and structural subtypes. [blog3] expresses
this in the bases list since the compiler knows which classes are

class PointImpl(object, Point):

If the compiler could guarantee that the class conforms to the
interfaces, then the conforms() and isSubType() predicates could use
the __implements__ attribute of the class and avoid deep type checks.
However this is not the case, therefore these predicates function the
same as if there were no interfaces implemented.  

The compiler or a type checking program could be developed to enforce
interface constraints. For example the following code would raise
errors since it uses floats, not ints:

class PointImpl(object) implements[Point]:
        x_coord: float
        y_coord: float
        def __init__(self, x: float, y: float):
            self.x_coord = x
            self.y_coord = y
        def getX(self) -> float: return self.x_coord
        def getY(self) -> float: return self.y_coord

The user of meta-classes is free to change the rules for classes,
since the meta-classes will implement the isinstance() and isSubType()
functions according to the tastes of the developer. So one could
implement a meta-class which uses structural subtyping instead of
nominative subtyping. 


A self-referential type needs some special magic to prevent the
interpreter from evaluating the type symbol at definition-time. Some
languages have special syntax and processing e.g.

typedef linkedList := data, None | linkedList

We propose recursive types are created within a closure using a def
or lambda, since they suspend evaluation of variable names during the
interpretation. Example:

linkedList = recursive(lambda : [any, union(linkedList,

Predicates in the recursive type must deal with the possible infinite
regress. Cardelli and others have described algorithms for checking
conformance of recursive types.

An explicit 'typedef' keyword may make recursive types easier to read.

Amadio, Cardelli []
LinguaFranca []


To be useful, a type system must have operations on types and on types
and instances.


8.1.1 isinstance(obj any, t) -> bool

This is already part of Python, and is a nominal subtype operator.

8.1.2 issubtype(t1:type, t2:type) -> boolean

A structural sub-type predicate. This predicate is used by type
checking programs, compilers and user programs.
All builtin types implement this predicate.
Should this be optional or mandatory?

8.1.3 conforms(obj: any, t: type) -> bool

Structural is-instance predicate. Returns True if the object conforms
to the type specified in all its detail. Ignores the objects class
name. Example:

class A(object):
        def aMethod(arg: int) -> int : pass
        def anotherMethod(arg: list) -> int : pass

class[interface] myInterface:
        def aMethod(arg: int) -> int : pass

a  = A()
    assert isconforming(a, myInterface)

8.1.4 issibling(obj1: any, obj2: any) -> bool

True if the objects have the same deep type. 

8.2 type(obj: any) -> Type

Existing shallow type operator. Returns the type of the object

8.3 convert(obj: t1, t2: Type) -> t2

Given an object of type t1, this function attempts to return a new
object which conforms to type t2. The new object is a deep copy of the
old object. Failure results in a exception.
This function relies on known and default type conversions:

int -> long
    int -> str
    tuple -> list
    list-> tuple

8.4 adapt(obj:any, protocol: t2, alternate: any) -> t2

Adapt was described in PEP 246.

This is a function which can return a wrapper object which effectively
is a view of the obj passed in but conforming to the type of the

We add to the adapt requirements that if the obj is a structural
subtype of the protocol it is returned. ie

if isinstance(obj, protocol) or conforms(obj, protocol):
        return obj
        ....other adapt logic...

PEP246 []


Py3k has a new abstract type interface to which all type objects
conform (and without which recursive descent type operators will be
dumbfounded). This convention is what allows users to create their own
kinds of type relations and subtype relations. This section lists
methods in addition to the usual (equality, hash etc).

9.1 conforms(self: any, obj: any) -> boolean

True if obj structurally conforms to this Type.

9.2 issubtype(self: any, t Type) -> boolean

True if t is a structural subtype.

9.3 isinstance(self: any, obj: any) -> bool 

True if obj is an instance by name.


Users can define their own Type classes which conform to the Type
abstraction. This section demonstrates extensibility.


A function to construct a type for a list of three with the same type
in each:

def myType(t: Type)-> Type :
        return listType(t, t, t) 


Example: All functions which return a list of the same type as its
input (whatever the type):

exists( ('t'), functionType(object: varType(t)) -> varType(t) ) 

This is very different from: 

functionType(object:any) -> any

To make this work the user has defined the following type

exists() - creates a dynamic scope in which type variables can be

varType() - creates a type object which stores the first type it
                sees when its conforms() method is called. All
                subsequent calls to conform() must match the stored


A regular expression specifies a set of strings and can be used as
a type specifier. For example:

creditCardNumberType = regexType("[0-9][0-9][0-9][0-9]
+[0-9][0-9][0-9][0-9] +[0-9][0-9][0-9][0-9] +[0-9][0-9][0-9][0-9] +")

Here the user has defined a regexType() type constructor which uses a
regular expression match within its conforms() predicate.

The regexType.isSubType() operator would be rather hard to code. 


We need structural subtyping NOW!

March 9, 2006

I’m a developer working in industry. From many years in the business I can tell you that perhaps 30% of application code is devoted to interpreting, validating input data and outputting it to the next program. Especially now we have middleware messaging and a proliferation of layering.

The average programming language (e.g. Java) has zero ability to automatically classify input data so we are left to churn out code which looks at data and figures out what class it belongs to. Add to this is the long list of programming languages used per system (JavaScript, HTML, XML, XSLT, Java, JSF, SQL) and the large number of class libraries which require interfacing glue code to hand data between them. In a typical J2EE application information passes through tens if not hundreds of forms as it traverses the system from user to database. At each hop it must pass from one encapsulated form to another.

Nominal typing systems are holding us back.

If programming languages could allow us to simply declare the data structures and associate classifications we could eliminate billions of lines of code. Preferrably the same declarations would work in all the different languages. Nirvana would be a VM which has a (read) method which automatically classifies the data input.

Structural Subtyping can help because it allows programs to operate on unfamiliar data without adapters. There are a handful of languages past and present which attempt to use it. Xen/Cowmega, Mozzie, MotionTypes. Apparently LOOM, Modular-3, OCaml, Brew and others can do it also but I have yet to explore these languages. And I am not the only one looking for this.
The challenge is to build this into languages we use every day.