Some Thoughts on Types for Python-3000

##

Title: Initial Thoughts on Types for Python-3000
Author: Bill Birch
Email: birchb at-sign tpg dot com dot au
Version: 0.5
Date: May 1st 2006
Status: Draft,
1 INTRODUCTIONThis paper describes a proposed type system for Python-3000. The typesystem supports meta-programs like compilers and type checkers, itsupports run-time data checking in production code and test suites.
The type system provides structural subtyping predicates and
operations and abstract interface which allows programmers to define
their own types and typing semantics.

This PEP refers to BDFL's blog "Optional Static Type Checking - Stop
the Flames" as [blog3]

Blog3 [http://www.artima.com/weblogs/viewpost.jsp?thread=87182]

1.1 DOCUMENTATION CONVENTIONS USED HERE.

This PEP explores language enhancements as well as ways the core
language features can be leveraged.  Eventually some of the items will
be marked as "user-space" features, implementable in .py files. Right
now they're unsorted.

1.2 SYNTAX

Syntax used in this document is derived from BDFL's proposed syntax
but syntax is not central to the proposal. Many languages have
separate syntax for type expressions. Python has a flexible syntax
which can be used without much change to support types. [blog3] has a
proposed syntax for type annotations on function declarations such as:

def foo(a: int) -> int
        return 2*a

and for paramterised collections:

list[int]

However there are pros and cons to having  a special syntax just for
types. Syntax is embedded in the compiler and is out of reach of
users. However expressing the type of complex interfaces with just
existing list, tuple and dict syntax is not easy on the eye.  

We propose the following syntax in place of assignment to
__metaclass__ with the expectation that interfaces can be implemented
as meta-classes.

class[interface] myInterface:
        pass

2 MOTIVATIONS

Python has a long history of discussion about type systems. The
motivations for adding a type system to Python are different to the
reasons for type systems in other, static languages. Static languages
have a greater need for strict compile-time checks since there are no
checks at runtime. Incorrect type usage would produce hard to debug
errors. Dynamic, typed languages such as Python or LISP protect the
runtime environment from the damage caused by invalid operations.
Despite the success of dynamic type checks, there are still good
reasons for proposing a formally defined type system as part of the
language. 

2.1 DOCUMENTATION

Without type annotations of variables and arguments, writers of
libraries must documentation must describe the domain and range of
inputs and outputs in natural language. Most libraries also include a
kind of type annotation in the documentation of functions arguments.
Having a standard form for specification of types will make
documentation simpler.

2.2 TYPE CHECKING

Many programmers would like to be able to annotate classes and
functions with type information and have this checked during
development and optionally at runtime. This has been discussed at
length resulting in [blog3]. Type checking is seen as one way to
reduce errors in programs. 

See also Colin Winter's typecheck module as an example of typechecking
implemented in today's Python.

Winter [http://oakwinter.com/code/typecheck/]

2.3 INPUT VALIDATION

At runtime many Python programs receive 'untrusted' data for
processing either from clients or servers on network connections or
from data sources such as files or databases. A normal step of
execution is to perform input validation to weed out badly formed
structures form these external sources. A type system which supports
declarative specification of the expected input and functions for
checking input data will reduce the programming required. 

2.4 INTERFACES

As systems become large, and the number of parties involved grows,
teams of developers prefer to formally define interfaces between
subsystems. A type system should be supportive of the specification
and checking of interfaces.

2.5 SUPPORTING TESTING

Both unit tests and integration tests routinely need code which checks
the invariants of the code under test including the structure of data
generated by code. Whilst production code may not be able to afford
the costs of full type checking, the same does not apply to testing.
A type system should support declaration of deep types runtime
checking of data structures in test harnesses. 

2.6 INTROSPECTION

A dynamic type system can also support the generation of objects given
a type specification. This has a variety of uses, including generation
of random or exhaustive test data. The type system should  provide
easy introspection of type trees so that generators can be developed. 

In general the type system must be open to allow meta-programs to be
developed which work with the type objects. 

3 WHAT IS A TYPE?

This section provides a recap of terminology and basic concepts.

A type is meta-data, it is data about data.
A type defines a set of possible objects.
A type describes an interpretation of unstructured information.
An object or collection of objects may have more than one
interpretation at a time, hence more than one type.

Unlike static languages (such as 'C') a Python type does not always
define a physical structure. 

3.1 ATOMIC TYPES ARE INDIVISIBLE.

An atomic type cannot contain another type without a conversion
operation. Atomic types include:

int, float, str

3.2 NON-ATOMIC TYPES ARE CONTAINERS FOR ATOMIC TYPES.

Non-atomic types can contain other objects. Examples are:
lists, tuples, dictionaries, class instances (aka objects), modules,
classes etc.

3.3 KINDS OF TYPING AND SUB-TYPING

There are many varieties of type system, this section lists some of
the major kinds of typing and subtyping and relevance to py3k.

3.3.1 STATIC TYPING

Static type systems are usually implemented within the language
compiler which has little or no access to values at runtime.
Variables have a fixed type which is determined at compile time. 'C'
is a classic example of a statically typed language. 

However a programmer may choose to use more than one interpretation of
a variable at run-time, communicated by a cast at compile-time.
Object oriented languages which are thought of as 'statically' typed
are forced to use run-time type information to implement polymorphism.
C++ and Java are examples of hybrid static/dynamic type systems.

Academic discussions of type systems are frequently targeted a
statically typed languages. Because static compiled type systems have
little or no access to values and the runtime has equally little
access to the type system, these discussions of type systems are
incomplete when applied to py3k.

3.3.2 DYNAMIC TYPING

Dynamic typing relies on every object being marked with
self-describing type information. The focus on type information moves
from the compiler to the run-time. Typically the language variables
are unconstrained and can contain any type. Often the compiler has no
access to the type information.

LISP and Python are dynamically typed. 

Some dynamically typed languages (such as Common Lisp, Visual Basic)
allow programmers to optionally specify types of variables. This may
result in smaller data or faster execution since the compiler has more
information. 

This proposal covers similar ideas however the motivations do not
include performance optimisation by specification of type information.

3.3.3 DUCK TYPING

Duck typing is form of dynamic structural subtyping. Knowledge of the
actual type of an object is unknown, rather expected methods are
called and if present all is well. Duck typing is not really a type
system since the type of an object is never a consideration during
either compilation or execution.

WikiPedia [http://en.wikipedia.org/wiki/Duck_typing]

3.3.4 STRUCTURAL SUB-TYPING

Structural subtyping (aka structural equivalence) examines the actual
type of objects and if the objects are constructed the same way they
are deemed equivalent. Depending on the type system the depth of
examination of the objects can vary. 

Type  systems with structural subtyping have been widely accepted by
the academic community in the form of languages such as Haskell and ML
since they provide superior levels of re-use than nominal subtyping
systems.

This proposal recommends structural subtyping for py3k since that is
the current behaviour.

Wikepedia [http://en.wikipedia.org/wiki/Structural_type_system]
Cardelli85 [http://citeseer.ist.psu.edu/cardelli85understanding.html]

3.3.5 NOMINATIVE SUBTYPING (AKA INHERITANCE SUBTYPING)

In a nominative type system the subtype relationship must be
explicitly declared by the programmer by including the names of
supertypes in a type definition. Even if two types are identical in
structure they may never be substituted for each other if they do not
share names. In this case programmers are rrequiredto write 'wrappers'
which coerce the types.

Traditional programming languages such as 'C++' and Java use nominal
subtyping. 

Python has an element of nominal subtyping since class ddefinitions
include the names of superclasses, and the interpreter uses these
names to locate methods. However the dynamic nature of Python means
that there is no guarantee that a nominated subtype is structurally a
subtype. As this example demonstrates:

class Base(object):
	   def aMethod(self): print "Base.aMethod()"

class Other(object): pass

class Derived(Base):
	   def __init__(self):
		  self.__class__ = Other

d = Derived()
    d.aMethod() # error

Wikepedia [http://en.wikipedia.org/wiki/Nominative_type_system]

3.3.6 LISKOV SUBSTITUTION

Liskov subtyping requires that the subtype not only be structurally
compatible, but all of it's properties must conform to the supertype.
The properties to be matched are anything 'provable' including history
(pre- and post- conditions). 

This approach seems particularly strict and difficult to implement,
since the type checking is required to check the before and after
states of objects. 

The py3k approach should not hinder advanced users implementing such
schemes.

LiskovWing [http://citeseer.ist.psu.edu/liskov94behavioral.html]

3.3.7 SHALLOW TYPING

A shallow type system looks only at the type of the first object
referenced. For example we have:
    >>> type([1,2,3,4])

Which is equivalent to 

list[any]

3.3.8 DEEP TYPING

A 'deep type' includes all the objects reference by a non-atomic type.
It is a transitive closure of the types of all the objects referenced
directly or indirectly by a specified object. Languages such as
Haskell understand deep typing. Example:

>>> type([1,2,3,4])

3.4 SUBTYPING FOR PYTHON-3000

The expectation of Python programmers is that, at least, the type
system should support optional type checking which is a predictor for
future type errors. Thus the built-in py3k type objects define the
isSubType() and conforms() predicates so that structural subtypes
are recognised.

The discussion of sub-typing also applies to whether a value is an
instance of a type. The py3k conforms() predicate also addresses the
actual structure of the object, not just it's class. It also examines
the deep type of the object and it's children.

Most statically compiled languages  allow the user no control or
influence on the type system beyond declaration of user-defined types.
Python offers more opportunities.

In py3k types are dynamically constructed hence the possibility exists
to allow the advanced user to influence fundamental typing concepts
within their own types. Provided that users comply with the abstract
type interface they are free to define their own kinds of subtyping.

4 TYPE EXPRESSIONS - A WAY OF DESCRIBING TYPES IN PYTHON.

Unlike type expressions in static languages such as C++ or Java, type
expressions in py3k are constructed at runtime just like an other
value expression and consist of Python new-style objects.
Type expressions are best introduced by examples.

list[int]     #  all lists with every element an int

list[list[int]]    # all lists of the above

tuple[str]    # tuples of strings

tuple[str, int, int]   # all 3-tuples with string then two ints

dict[{str: float}]   # dictionary with str keys and float values.

dict[{just('x-coord'): Number, just('y-coord'):Number}]
 # dictionaries with two keys 'x-coord' and 'y-coord' 

All type objects share some common methods which must be present,
these make up the abstract Type interface discussed later in this
paper. Types are constructed implicitly by the compiler/parser or
explicitly by invoking constructors.

4.1 TYPE CONSTANTS

There are some constants which are required to capture key concepts in
the type system. These constants are immutable singletons. Constants
include:

any 

This stands for all types. This is different from 'object' since
'object' requires subtypes to be class instances.

nothing

Which stands for the empty set. ie no objects are of this type.
The py3k also includes the types defined in the types module. 

4.2 TYPE CONSTRUCTORS

Type constructors are functions which create type objects of various
kinds. These can be called explicitly or with the assistance of the
compiler through its expression syntax.
Users can define their own type classes and constructors.

4.2.1 just(object: any) -> JustType

The py3k type system can have direct access to values available at
run-time.  We could use values (constants) directly in type
expressions for example:

list['table', str]

the type of all 2-lists with 'table' as their first element and a
string as their second.

However this would require that the string class provide type
operations such as isSubType(). Instead we wrap LISTOF(T: TYPE)values
using the just() operator: 

list[just('table'), str]

The types resulting from just() provide the necessary type methods.

4.2.2 listof(t: Type)

Declares a homogeneous parameterised type. example: 

list[int]  

creates an object of this class.

4.2.3 tupleof(t: Type)

Homogeneous tuple. Example:

tuple[float]

4.2.4 listtype(*args)

A list with a particular non-repeating sequence of types given by
*args. Example:

list[int, int, float, str]

If the last type specified is the ellipses object, the previous
element type is allowed zero or more times. Example:

list[str, int, ...]

matches:

['a', ]   ['b', 1] and ['c', 1,2,3]

4.2.5 tupletype(*args)

similar to listType().

4.2.6 functiontype(fargs, vargs, kargs, returntype) -> type

This constructs an object which captures the signature of a callable.
Parameters fargs, vargs and kargs specify the types of the fixed,
variable and keyword arguments as dictionaries. (Note: Function
signatures are subject of another proposal). Examples:

functionType()     # anything callable

functionType([int, float] , [str,...], {'a':long, 'b':bool}, int)

functionType([int, float] , , , int)

4.2.7 instancetype()

The instanceType describes new-style objects created with new(). The
names of the members are given in a dict e.g.

instanceType({  'x': float,
                    'y': float,
                    'move': functionType({  'dx': float,
                                            'dy': float})})

4.2.8 union(t1: Type, t2: Type, ...)-> type

This constructs an object which denotes that the type is a union of
all the types included. Example:

Number = union(int, long, float, decimal)

4.2.9 intersection(t1: Type, t2 : Type, ...) -> Type

An intersection of types is described by this object. An object must
be a subtype of all the included types to be a subtype or instance.
Example dicts with both a save and a draw function:

intersection(   dict[{'save':functionType()}],
                    dict[{'draw':functionType()}]) 

4.2.10 deeptype(x: any) -> Type

Given a value, return a quite specific type expression. This operation
allows objects to be used as 'exemplars'. Example:

>>> deepType( [1,2,3,4,5] ) 

list[int, int, int, int, int]

4.3 INTERFACES

Interfaces are composed of an instanceType with method signatures.
Example:

instanceType({'moveUp': functionType({'dx': float}, bool)})

The compiler may provide a nicer syntax more redolent of class
definitions which that would include a name binding. Example:

interface IMoveable(IWindow):
        def move(dx: float, dy: float) -> bool

or if implemented as a meta-class:

class[interface] IMoveable(IWindow):
        def move(dx: float, dy: float) -> bool

If an interface included a parent interface as above, then this would
wrap the interface under an intersection object. ie

IMoveable = intersection(IWindow, instanceType( ...

5 ANONYMOUS TYPES.

A naked type expression is un-named and is a simple value like any
other in Python. Further, two type expressions may result in different
objects.

assert list[float] != list[float]

6 NAMING OF TYPES

Simple assignment binds a type expression to a variable subject to all
the usual scoping rules.

var =  

6.1 CLASSES AND INTERFACES

Here we address the use of Class objects in type expressions.
Consider the following code:

class Point(object):
        x_coord: int
        y_coord: int
        def __init__(self, x, y):
            self.x_coord = x
            self.y_coord = y
        def getX(self): return self.x_coord
        def getY(self): return self.y_coord

def drawLine(start: Point, end: Point) :
        pass

It's very natural for a programmer to use class names as types.
However the downside is the Point class does not provide much type
information. A structural subtype operation would only be able to
typecheck that 'start' refers to an object with an __init__ member
which takes two parameters and the two int coordinate variables.
Worse, a structural subtype check would examine _all_ features of the
class, even the internals (such as x_coord) which are of no interest
to the client. 

The user is expecting that 'start' can only refer to an instance of
the Point class or one of it's subclasses. The expectation is that a
Nominative subtype check applies when classes are used as types.
This is not a good program in any case, since it mandates a concrete
class. A better program would avoid nominative sub-typing as follows:

class[interface]  Point:
        def __init__(self, x: Number, y: Number)
        def getX() -> Number
        def getY() -> Number

def drawLine(start: Point, end: Point) :
        pass

We propose that Structural Subtyping in py3k would be available from
interfaces. The conforms() and isSubType() predicates of the
Interface type will perform deep, structural type checking. In this
example an optionally type-checked drawLine() method would ensure
'start' and 'end' conform to the interface regardless of the object's
classnames.

The isinstance(), conforms() and isSubType() predicates of the class
meta-class (type) will not use structural subtyping, rather they are
limited to simple nominal subtyping. 

This is an important resolution which cannot be understated. 

* Class types provide implementation and nominative type checking.

* Interfaces are purely structural type declarations with no
  implementation.

The above class could incorporate the interface as a hint to type
checkers with special syntax:

class PointImpl(object) implements[Point]:
        x_coord: int
        y_coord: int
        def __init__(self, x, y):
            self.x_coord = x
            self.y_coord = y
        def getX(self): return self.x_coord
        def getY(self): return self.y_coord

Note the above contrary syntax is underlines the distinction between
implementation inheritance and structural subtypes. [blog3] expresses
this in the bases list since the compiler knows which classes are
interfaces:

class PointImpl(object, Point):

If the compiler could guarantee that the class conforms to the
interfaces, then the conforms() and isSubType() predicates could use
the __implements__ attribute of the class and avoid deep type checks.
However this is not the case, therefore these predicates function the
same as if there were no interfaces implemented.  

The compiler or a type checking program could be developed to enforce
interface constraints. For example the following code would raise
errors since it uses floats, not ints:

class PointImpl(object) implements[Point]:
        x_coord: float
        y_coord: float
        def __init__(self, x: float, y: float):
            self.x_coord = x
            self.y_coord = y
        def getX(self) -> float: return self.x_coord
        def getY(self) -> float: return self.y_coord

The user of meta-classes is free to change the rules for classes,
since the meta-classes will implement the isinstance() and isSubType()
functions according to the tastes of the developer. So one could
implement a meta-class which uses structural subtyping instead of
nominative subtyping. 

7 RECURSIVE TYPES

A self-referential type needs some special magic to prevent the
interpreter from evaluating the type symbol at definition-time. Some
languages have special syntax and processing e.g.

typedef linkedList := data, None | linkedList

We propose recursive types are created within a closure using a def
or lambda, since they suspend evaluation of variable names during the
interpretation. Example:

linkedList = recursive(lambda : [any, union(linkedList,
                                                      just(None))])

Predicates in the recursive type must deal with the possible infinite
regress. Cardelli and others have described algorithms for checking
conformance of recursive types.

An explicit 'typedef' keyword may make recursive types easier to read.

Amadio, Cardelli [http://citeseer.ist.psu.edu/amadio93subtyping.html]
LinguaFranca [http://citeseer.ist.psu.edu/27902.html]

8 TYPE OPERATORS. 

To be useful, a type system must have operations on types and on types
and instances.

8.1 PREDICATES

8.1.1 isinstance(obj any, t) -> bool

This is already part of Python, and is a nominal subtype operator.

8.1.2 issubtype(t1:type, t2:type) -> boolean

A structural sub-type predicate. This predicate is used by type
checking programs, compilers and user programs.
All builtin types implement this predicate.
Should this be optional or mandatory?

8.1.3 conforms(obj: any, t: type) -> bool

Structural is-instance predicate. Returns True if the object conforms
to the type specified in all its detail. Ignores the objects class
name. Example:

class A(object):
        def aMethod(arg: int) -> int : pass
        def anotherMethod(arg: list) -> int : pass

class[interface] myInterface:
        def aMethod(arg: int) -> int : pass

a  = A()
    assert isconforming(a, myInterface)

8.1.4 issibling(obj1: any, obj2: any) -> bool

True if the objects have the same deep type. 

8.2 type(obj: any) -> Type

Existing shallow type operator. Returns the type of the object
referenced.

8.3 convert(obj: t1, t2: Type) -> t2

Given an object of type t1, this function attempts to return a new
object which conforms to type t2. The new object is a deep copy of the
old object. Failure results in a exception.
This function relies on known and default type conversions:

int -> long
    int -> str
    tuple -> list
    list-> tuple
    etc...

8.4 adapt(obj:any, protocol: t2, alternate: any) -> t2

Adapt was described in PEP 246.

This is a function which can return a wrapper object which effectively
is a view of the obj passed in but conforming to the type of the
protocol.

We add to the adapt requirements that if the obj is a structural
subtype of the protocol it is returned. ie

if isinstance(obj, protocol) or conforms(obj, protocol):
        return obj
    else:
        ....other adapt logic...

PEP246 [http://www.python.org/dev/peps/pep-0246/]

9 THE ABSTRACT META-TYPE

Py3k has a new abstract type interface to which all type objects
conform (and without which recursive descent type operators will be
dumbfounded). This convention is what allows users to create their own
kinds of type relations and subtype relations. This section lists
methods in addition to the usual (equality, hash etc).

9.1 conforms(self: any, obj: any) -> boolean

True if obj structurally conforms to this Type.

9.2 issubtype(self: any, t Type) -> boolean

True if t is a structural subtype.

9.3 isinstance(self: any, obj: any) -> bool 

True if obj is an instance by name.

10 FUN STUFF (THAT USERS CAN DO IN PYTHON SCRIPT)

Users can define their own Type classes which conform to the Type
abstraction. This section demonstrates extensibility.

10.1 PARAMETRIC TYPES

A function to construct a type for a list of three with the same type
in each:

def myType(t: Type)-> Type :
        return listType(t, t, t) 

10.2 EXISTENTIAL QUANTIFICATION

Example: All functions which return a list of the same type as its
input (whatever the type):

exists( ('t'), functionType(object: varType(t)) -> varType(t) ) 

This is very different from: 

functionType(object:any) -> any

To make this work the user has defined the following type
constructors:

exists() - creates a dynamic scope in which type variables can be
               used.

varType() - creates a type object which stores the first type it
                sees when its conforms() method is called. All
                subsequent calls to conform() must match the stored
                type.

10.3 REGEX TYPES

A regular expression specifies a set of strings and can be used as
a type specifier. For example:

creditCardNumberType = regexType("[0-9][0-9][0-9][0-9]
+[0-9][0-9][0-9][0-9] +[0-9][0-9][0-9][0-9] +[0-9][0-9][0-9][0-9] +")

Here the user has defined a regexType() type constructor which uses a
regular expression match within its conforms() predicate.

The regexType.isSubType() operator would be rather hard to code. 

END
About these ads

One Response to Some Thoughts on Types for Python-3000

  1. Tony Lownds says:

    The operators | and & have been proposed to represent union and intersection respectively.

    For completeness, how about a type specifier that represents an attribute access?

    Eg, attrType(attrName, resultType) or reflect.attr == resultType

    An object would conform to the type if getattr(object, attrName) returns a value conforming to resultType.

    This allows *expressions* to represent so-called duck-typing interfaces:

    Point = (attrType(‘x_coord’, float)
    & attrType(‘y_coord’, float)
    & attrType(‘y_coord’, float)
    & attrType(‘getX’, functionType(returns=float))
    & attrType(‘getY’, functionType(returns=float))
    )

    Can None stand for itself, ie functionType(returns=union(float, None)), or is just(None) needed?

    What is the point of “nothing”? Is it useful anywhere?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: