Graph Types
Nils Klarlund* & Michael I. Schwartzbach†
Aarhus University, Department of Computer Science, DK-8000 Århus, Denmark
Abstract

Recursive data structures are abstractions of simple records and pointers. They impose a shape invariant, which is verified at compile-time and exploited to automatically generate code for building, copying, comparing, and traversing values without loss of efficiency. However, such values are always tree shaped, which is a major obstacle to practical use. We propose a notion of graph types, which allow common shapes, such as doubly-linked lists or threaded trees, to be expressed concisely and efficiently. We define regular languages of routing expressions to specify relative addresses of extra pointers in a canonical spanning tree. An efficient algorithm for computing such addresses is developed. We employ a second-order monadic logic to decide well-formedness of graph type specifications. This logic can also be used for automated reasoning about pointer structures.

1 Introduction

Recursive data types are abstractions of structures built from simple records and pointers. The values of a recursive data type are structures that all obey a shape invariant. The advantage is that the validity of this invariant is verified at compile-time.

Recursive data types [7] are ubiquitous in modern functional languages, such as ML [8] and Miranda [10], and they are also found in imperative languages. Their benefits are substantial, but they also impose limitations; in particular, the values of recursive data types will always be tree shaped. In this paper we present a natural generalization, graph types, which allows a large variety of graph shaped values, including (doubly-chained) cyclic lists, leaf-to-root-linked trees, leaf-linked trees, and threaded trees.

The key idea is to allow only graphs with a backbone, which is a canonical spanning tree. All extra edges must depend functionally on this backbone. The extra edges are specified by a language of regular routing expressions, which give relative addresses within the backbone. We show that construction of such graph values, along with all relevant manipulations, can happen efficiently in linear time. We introduce a decidable monadic logic, which allows automatic derivation of properties of graph types.
* The author is supported by a fellowship from the Danish Research Council.
† The author is partially supported by the Danish Research Council, DART Project (5.21.08.03).
There have been other attempts to describe graph shaped values. Our proposal, however, allows specifications of a more general class of types, using an intuitive notation that is very close to existing concepts in programming languages. This summary is kept in an informal style; the formal definitions and algorithms are given in the appendix.
ACM POPL, January 1993, S.C., USA
2 Data Types

For this presentation, a (recursive) data type D is a special kind of tree grammar. The non-terminals are called types. There is a distinguished main type, which in examples is always the one mentioned first; the others are merely auxiliary. A production

  T → v(a1: T1, ..., an: Tn)

of D, where T and the Ti's are types, declares a variant v of type T containing data fields named a1, ..., an; we say that the production declares a type-variant (T: v).

For each type, the possible variants must be mutually distinct; thus (T: v) uniquely determines the production. Moreover, for each type-variant, the data fields must be mutually distinct.

The values of a data type are essentially the derivation trees of the underlying context-free grammar, starting with the main type. They are implemented as pointer trees, but the programmer will never directly manipulate these pointers. Each node of such a pointer tree is an instance of a variant of a type. A formal definition of the values of a data type is given in section A1 of the appendix.

As a simple example, consider the following data type, which specifies a type of simple integer lists

  L → nonempty(head: Int, tail: L)
    → empty()

We can think of the type Int as being a data type specified by

  Int → 0 | 1 | 2 | ...

We allow implicit variants as a form of syntactic sugar. If the sets of data fields are distinct for all variants, then the explicit variants are not needed; we may think of the variant names as being a concatenation of the field names. Thus, we may write

  L → (head: Int, tail: L)
    → ()

Programming with Data Types

When a data type has been specified, it gives rise to a number of operations in the programming language. First of all, there is a language for denoting constant values. For the above lists, one may write down

  L(head: 11, tail: (head: 12, tail: (head: 13, tail: ())))

for the list of type L with elements 11, 12, and 13. If x is a variable containing a value of type L, then x.tail.tail.head specifies the address of a subtree, in this case of type Int. In a functional language this will always denote a value; in an imperative language it may also denote a location. The comparison x = y is also present, and the test is(x, v) yields whether x is of variant v.

In an imperative language, the value assignment x := y is present, possibly accompanied by the swap x :=: y, which exchanges two subtrees without copying. Values of data types are traversed by recursive functions or procedures. Thus, explicit pointers are never used.

There is no intrinsic loss of efficiency in this approach. Constants can be built, copied, compared, and traversed in optimal linear time, and addresses are accessed in constant time. Thus, if one really wants tree shaped values, then only advantages of data types are to be seen.

Shortcomings

The main drawback of data types is the limited shapes of values that they allow. For the above simple lists, values always have the same shape: a chain of records linked by tail pointers and ending in an empty record (pictured as a ground symbol).

However, it is a common optimization to want an extra pointer to gain constant time access to the last element of the list. Thus, the values should instead have a shape in which a header record points both to the first and to the last element.

These are not trees and, hence, cannot be specified by data types. Until now, there has been no good answer to this problem. The only possibility has been to resort to the often perilous use of explicit pointers.

3 Graph Types

We introduce the notion of graph types, which is a conceptually simple extension of data types. A value of a graph type has a backbone in the form of a canonical spanning tree; the remaining edges are functionally determined by this backbone.
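As a point of reference for what follows, the list type L of section 2 can be mimicked with immutable Python records. This is only an illustrative sketch under stated assumptions: the class names `Empty` and `Nonempty` and the helpers `from_items` and `last_cell` are hypothetical names chosen for the example, not notation from the paper. It shows that values are trees denoted by constants, that an address such as x.tail.tail.head is a path of field names, and that the extra last pointer discussed under Shortcomings is a function of the backbone alone.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """Variant () of type L: a record with no data fields."""


@dataclass(frozen=True)
class Nonempty:
    """Variant (head: Int, tail: L) of type L."""
    head: int
    tail: "L"


L = Union[Nonempty, Empty]


def from_items(items):
    """Build (head: i1, tail: (head: i2, ... tail: ())) in linear time."""
    value = Empty()
    for item in reversed(items):
        value = Nonempty(item, value)
    return value


def last_cell(x):
    """Cell that a 'last' pointer would reach: follow tail to the leaf,
    then back up once. Returns None for the empty list (an assumption of
    this sketch)."""
    prev = None
    while isinstance(x, Nonempty):
        prev, x = x, x.tail
    return prev


x = from_items([11, 12, 13])
print(x.tail.tail.head)                # the address x.tail.tail.head
print(x == from_items([11, 12, 13]))   # structural comparison of two values
print(last_cell(x).head)               # element reached by the extra pointer
```

The frozen dataclasses give structural comparison for free, which mirrors the remark that comparison is generated from the type; the `last_cell` helper recomputes the pointer on demand, whereas a graph type would store it as a field.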
Many, but not all, sets of graphs fit the mold of a backbone with functionally determined extra edges; we give examples of both kinds.

A graph type extends a data type by having routing fields as well as data fields. Productions now look like

  T → v(... ai: Ti ... aj: Tj[R] ...)

Here ai is a normal data field but aj is a routing field. It is distinguished by having an associated routing expression R. A graph type has an underlying data type, which is obtained by removing the routing fields. The backbones are the values of this data type. Routing expressions describe relative addresses within the backbone. The complete graph type value is obtained by using the routing expressions to evaluate the destinations of the routing fields.

Routing expressions are regular expressions over a language of directives, which describe navigation within a backbone. Directives include move up to the parent (from a specific child) (↑ or ↑a), move down to a specific child (↓a), and verify a property of the current node, where properties include this is the root (^), this is a leaf ($), and this is a specific variant (T or (T: v)). A routing expression defines the indicated destination if its regular language contains precisely one sequence of successful directives leading to a node in the tree. A graph type is well-formed if every routing expression always defines a unique destination. Section A2 of the appendix gives formal definitions of these concepts.

To make a convincing case for this new mechanism, we need to demonstrate the following facts:

• many useful families of structures can be easily specified;
• values can be manipulated at run-time similarly to values of data types, and without loss of efficiency.

We now show that many common pointer structures have simple specifications as graph types. The examples are all well-formed, which can be easily seen in each case. In pictures of values, we use the convention that pointers from data fields are solid, whereas those from routing fields are dashed. The root of the underlying spanning tree, or backbone, is indicated by a solid pointer with no origin.

A list with first and last pointers looks like

  H → (first: L, last: L[↓first ↓tail* $ ↑])
  L → (head: Int, tail: L)
    → ()

[Figure: a typical value of H]

The routing expression ↓first ↓tail* $ ↑ for the last field contains the following directives: move down along the first pointer (↓first); follow the tail pointers until a leaf is reached (↓tail* $); then back up once (↑). This is the destination of the last pointer.

A cyclic list looks like

  C → (next: C)
    → (next: C[↑* ^])

[Figure: a typical value of C]

The routing expression contains the following directives: move up to the root.

A doubly-linked cyclic list looks like

  D → (next: D, prev: D[↑ + ^ ↓next* $])
    → (next: D[↑* ^], prev: D[↑ + ^])
[Figure: a typical value of D]

Directives are more complicated here; they use the nondeterministic union operator on regular expressions (+) to express context-dependent choices. For example, consider the prev field of the first variant. According to the routing expression ↑ + ^ ↓next* $ of this field, we must either move up, or, if we are at the root, follow the next pointers to the leaf.

A binary tree with pointers from the leaves to the root looks like

  R → (left, right: R)
    → (root: R[↑* ^])

[Figure: a typical value of R]

A binary tree in which all leaves are linked looks like

  J → (left, right: J)
    → (next: J[STEP])

where STEP abbreviates the route from a leaf to the next leaf.

[Figure: a typical value of J]

A red-black leaf-linked tree looks like

  K → (left, right: K)
    → red(next: K[BLACK* RED])
    → black(next: K[RED* BLACK])

where RED abbreviates STEP (K: red) and BLACK abbreviates STEP (K: black). We shall abstain from showing a typical value.

A binary tree in which the nodes are linked in a post-order traversal looks like

  T → (left, right: T, post: T[POST])
    → (post: T[POST])

where POST abbreviates ↑right + ↑left ↓right ↓left* $ + ^ ↓left* $.

[Figure: a typical value of T]
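The routing expressions in the examples above can be evaluated by a direct walk over the backbone. The following Python sketch is an illustration only, not the paper's algorithm (the appendix gives a linear-time, automaton-based method): the `Node` class, the tuple encoding of expressions, and the names `step` and `dests` are assumptions of this example, and the variant tests T and (T: v) are left out. It computes the set of destinations of an expression, so the unique destination requirement can be checked by testing that the set has exactly one element.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based hashing, so nodes can be stored in sets
class Node:
    children: dict = field(default_factory=dict)
    parent: Optional["Node"] = None
    via: Optional[str] = None  # name of the data field leading to this node


def node(**children):
    """Create a backbone node whose data fields are the given children."""
    n = Node()
    for name, child in children.items():
        child.parent, child.via = n, name
        n.children[name] = child
    return n


def step(directive, n):
    """Apply one directive; return the node reached, or None on failure."""
    kind, arg = directive
    if kind == "down":
        return n.children.get(arg)
    if kind == "up":
        ok = n.parent is not None and arg in (None, n.via)
        return n.parent if ok else None
    if kind == "root":
        return n if n.parent is None else None
    if kind == "leaf":
        return n if not n.children else None
    raise ValueError(f"unknown directive {kind!r}")


def dests(expr, starts):
    """Nodes reachable from `starts` along some route in the language of expr."""
    op = expr[0]
    if op == "seq":
        for sub in expr[1:]:
            starts = dests(sub, starts)
        return set(starts)
    if op == "alt":
        return set().union(*(dests(sub, starts) for sub in expr[1:]))
    if op == "star":
        seen = set(starts)
        frontier = set(starts)
        while frontier:
            frontier = dests(expr[1], frontier) - seen
            seen |= frontier
        return seen
    return {m for m in (step(expr, n) for n in starts) if m is not None}


UP, ROOT, LEAF = ("up", None), ("root", None), ("leaf", None)


def down(a):
    return ("down", a)


# H: the last field, routed by  down first, (down tail)*, leaf, up
empty = node()
c3 = node(head=node(), tail=empty)
c2 = node(head=node(), tail=c3)
c1 = node(head=node(), tail=c2)
h = node(first=c1)
LAST = ("seq", down("first"), ("star", down("tail")), LEAF, UP)

# D: the prev field of the first variant, routed by  up + root (down next)* leaf
d4 = node()
d3 = node(next=d4)
d2 = node(next=d3)
d1 = node(next=d2)
PREV = ("alt", UP, ("seq", ROOT, ("star", down("next")), LEAF))

print(len(dests(LAST, {h})))   # size of the destination set of last at h
print(dests(PREV, {d1}) == {d4}, dests(PREV, {d3}) == {d2})
```

Working with sets of nodes keeps the nondeterministic union (+) and iteration (*) simple, at the cost of revisiting nodes; it is meant for checking small examples rather than for efficient evaluation.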
At first glance such specifications may seem daunting, but at least to the authors they quickly became familiar. The use of abbreviations, such as STEP and POST, improves legibility and promotes reuse of expressions. Complicated pointer structures may give rise to complicated graph type specifications. However, it is fair to say that the complexity of the graph type specification correlates well with this inherent complexity, in the same way that a verbal or pictorial description would.

Not all sets of graph shaped values can be specified by graph types. First of all, they must have some underlying spanning tree. Furthermore, all extra edges must be functionally determined by this tree, which precludes some families of graphs. Consider a generalized tableau structure on a grid, in which there must be an edge from a point to the one immediately below, if they are both present. A graph type cannot represent such graphs, since the variant at a given node is dependent on whether there is a downward pointing edge. Thus the variant is dependent on the rest of the graph, something we cannot specify in a context-free grammar.

5 Programming with Graph Types

So far, we have seen that many families of pointer structures can be captured as the values of graph types. We must also demonstrate that they can be used for programming in a manner similar to that for data types. An obvious problem with having graph shaped values is that the recursive traversal may be problematic; how can we avoid cycles? However, for graph types we have the canonical spanning tree of the underlying data value. Thus, many of the simple techniques can be inherited in a straightforward manner. For example, the algorithm for comparing two graph values is exactly the same as for the underlying data type.

The language for constants is also the same as for the underlying data type; the routing fields are just ignored. They are then computed automatically. The example values of the previous section are specified as constants as follows:

  H(first: (head: 11, tail: (head: 12, tail: (head: 13, tail: ()))))
  C(next: (next: (next: ())))
  D(next: (next: (next: ())))
  R(left: (left: (), right: ()), right: ())
  J(left: (left: (), right: ()), right: ())
  T(left: (left: (), right: ()), right: ())

Note that some of the expressions are identical apart from the name of the type.

Routing fields can be read just like data fields; they also point to subtrees of the canonical spanning tree. It is, of course, not possible to assign directly to a routing field.

Copying (sub)values happens in two steps. First, the underlying spanning tree is copied; second, the values of the routing fields must be reevaluated. Consider for example the leaf-to-root-linked tree. If a subtree is copied, then the leaves must now point to the new root of that spanning tree.

Consider now assignment, where a subtree is replaced by a new graft; then several routing fields in both the surrounding tree and the new graft may have to change. Consider for example the red-black leaf-linked tree. If a leaf is changed from red to black, then it must be removed from one cyclic list and inserted in another. A simple way of handling this is to reevaluate all routing fields, but that is undesirable since the surrounding tree may be large and the graft may be small. A similar problem exists for the swapping of subtrees. We must develop an algorithm for detecting the routing fields that are required to be updated.

In summary, many of the required algorithms are inherited from the underlying data structure. However, we must be able to evaluate all routing fields in only combined linear time, and for assignment we need to detect those routing fields that must be updated.

Evaluating Routing Fields

Backbones can clearly be constructed in linear time. We now show how to evaluate all routing fields in linear time.
First, each routing expression in the graph type is translated into an equivalent nondeterministic automaton. This translation is linear. Next, a table is constructed that for each node α and for each automaton state q of each automaton A contains a pointer. Intuitively, if this pointer is not nil, it indicates a node β reachable from α by a sequence w of directives such that, upon reading w, automaton A may end up in a final state at node β. This table is calculated in linear time by an algorithm described in the appendix. When the table has been constructed, the destination of a routing field at α is given as the pointer found in an entry (α, q0) of the table, where q0 is an initial state of the automaton representing the routing expression.

Detecting Required Updates

Sometimes it is not necessary to update all routing fields of the value. For example, this happens when swapping subtrees of values of type J, the type of leaf-linked binary trees. Consider the situation after the subtrees rooted at addresses α and β have been swapped:

[Figure: a leaf-linked binary tree after the subtrees rooted at α and β have been swapped, with the affected next fields]

Here only four routing fields need to be updated, and it would often be less costly to locate the four nodes {α', α'', β', β''} after the change and reevaluate their next fields than evaluating all routing expressions in the backbone from scratch. In fact, with this approach we can guarantee that the time to locate fields in need of updating is proportional to the total length of the paths that lead to these fields, in this case of the paths from α to α', from α to α'', from β to β', and from β to β''.

To generate these paths, we consider each node incident on a backbone edge that changes (above, it would be α, β, and their parents). Each automaton state at such a node can be followed backwards (towards possible origins, that is, routing fields whose routes go through the node) and forwards (towards a possible destination). Above, this involves finding four destinations and four origins. For example, when considering α, we obtain two origins, the next fields of α' and α'', and their corresponding destinations.

We shall shortly see how further optimizations are possible. Note, however, that for some types the number of paths to follow may be proportional to n. This happens for example for the root-linked trees of type R described earlier when a new root is added to an existing tree. In this case there is no gain in using the techniques in this section compared to the algorithm for evaluating all routing fields.

Monadic Logic

The monadic second-order logic of graph types is a formalism that allows several important properties to be expressed. Our logic permits quantification over types, addresses, and sets of addresses. In the appendix we define the logic formally and show that it is decidable.

In this logic we can formulate questions such as "What is the type-variant of a node α in a value x?" or "Is there a walk in a value x from node α to node β according to a routing expression R?" The question of whether a graph type is well-formed can also be expressed in the logic, as it is shown in the appendix. Thus this question is decidable. Similarly, other questions about types, such as those that involve comparing values, are decidable.

Although much can be expressed in the monadic second-order logic on graph types, there are simple operations that cannot. For example, one cannot represent the result of replacing a subtree with another subtree (although certain properties of the result may be expressible).

Access Optimization

In the example of updating routing fields in leaf-linked trees we saw that only four fields needed to be updated. However, calculating the destination of each such routing field is not necessary. For example, the new value of the next field at α' is the old value of the next field at β'. Thus, when the four routing fields have been located, the updates can take place in constant time by properly permuting the values of known next pointers. Such use of the values of routing fields is called access optimization.
The formal reasoning behind access optimization can be formulated in monadic logic. For example, the question "Is the value of the next field at α' in the new graph the same as the value of the next field at β'?" can be expressed, and the answer yes can be computed.

In general, a strategy for access optimization is to compare values contained in nodes already located to the destination of paths that arise in the detection of required updates. This involves trying out different combinations of paths that are followed explicitly and testing whether other needed destinations or origins can be found in constant time. Thus one can formulate a minimization problem for finding the least number of paths needed in order to carry out an update. This problem is decidable. For doubly-linked lists, this allows the automatic generation of constant-time code without having the programmer write any pointer operations.

6 Related Work

Decidability of logics of graphs has been studied extensively; see [4] for references to the classical results that the monadic second-order logic is decidable for trees and to the similar extensions to more general graphs.

The hyperedge-replacement context-free graph rewriting formalisms of [4] describe much larger classes of graphs than our graph types. An important property is that the monadic second-order logic on such graphs is decidable for context-free graph grammars. We could have used this result to derive our decidability result; but the translation into context-free graph grammars appears to be more complex than in our approach. Although mathematically interesting, context-free graph grammars tend to be hard to understand; this is likely the reason why, to our knowledge, they have not been used for describing types in programming languages.

Closer in spirit to our approach are the feature grammars and algebras; see [5] for references. These formalisms are built on the view that features (corresponding to our record fields) are partial functions that identify attributes. Not being based on tree structures, features allow the description of self-referential data structures. As opposed to our approach, the values designated are not guided by any expressions.

The programming languages in [1, 2] and [3] use similar ideas and permit circular data structures. A restriction of this work is that such circular references may only point to nodes labeled syntactically with a marker. Since the number of markers is finite, this language precludes the modeling of e.g. doubly-linked lists or leaf-linked trees, but allows root-linked trees.

The ADDS notation in [6] allows abstract properties of pointer structures to be specified through the concepts of dimensions and directions. The main motivation is to make program analysis feasible. With the ADDS notation one cannot specify the exact shape of values, and manipulations are carried out through explicit pointer operations.

Our algorithms for evaluating and reevaluating routing fields are similar to those for attributed grammars [9], but to our knowledge algorithms for updating a tree of a grammar whose attributes are nodes in the tree have not been described before.

Acknowledgments

Thanks to the anonymous referees for their helpful comments.

References

[1] ... Logic ...

[2] ... 19th ACM ...

[3] H. Aït-Kaci and A. Podelski. Towards a meaning of LIFE. In Jan Maluszyński and Martin Wirsing, editors, Proceedings of the 3rd International Symposium on Programming Language Implementation and Logic Programming (Passau, Germany), pages 255-274. Springer-Verlag, LNCS 528, August 1991.

[4] B. Courcelle. The monadic second-order logic of graphs I. Recognizable sets of finite graphs. Information and Computation, 85:12-75, 1990.

[5] J. Dörre and W.C. Rounds. On subsumption and semiunification in feature algebras. In Proc. IEEE Symp. on Logic in Computer Science, pages 300-310, 1990.

[6] L. Hendren, J. Hummel, and A. Nicolau. Abstractions for recursive pointer data structures: Improving the analysis and transformation of imperative programs. In Proc. SIGPLAN'92 Conference on Programming Language Design and Implementation, pages 249-260. ACM, 1992.
[7] C.A.R. Hoare. Recursive data structures. International Journal of Computer and Information Sciences, 4:2:105-132, 1975.

[8] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, 1990.

[9] T. Reps. Incremental evaluation for attribute grammars with unrestricted movement between tree modifications. Acta Informatica, 25, 1986.

[10] D.A. Turner. Miranda: A non-strict functional language with polymorphic types. In Proc. Conference on Functional Programming Languages and Computer Architecture, pages 1-16. Springer-Verlag (LNCS 201), 1985.

Appendix

This appendix contains the formal definitions of the concepts introduced. They may be used to elucidate and substantiate the contents of the preceding summary.

A1: Data Types

Associated with a data type D we have some notation. The main type is denoted Main D. By T_D we denote the set of types. By V_D we denote the set of all variants in D; by V_D T we denote the set of variants of type T. By F_D we denote the set of all data fields in D; by F_D(T: v) we denote the set of data fields of type T and variant v, i.e., for the type-variant declaration above, F_D(T: v) = {a1, ..., an}. By T_D(T: v) a we denote the type of the data field a in variant v of type T, i.e., for the type-variant above, T_D(T: v) ai = Ti.

An address α is an element of F_D*. The values of D form the set Val D of partial functions x: F_D* → T_D × V_D such that

• dom x is finite and prefix closed;
• x(ε) = (Main D: v) for some v;
• if x(α) = (T: v), then α.a ∈ dom x ⇔ a ∈ F_D(T: v), and T_D(T: v) a = T' where x(α.a) = (T': v') for some v'.

Intuitively, the addresses in dom x serve as pointer values.

A2: Graph Types and Routing Expressions

While F_G still denotes all fields, we use F_G^d to denote the data fields, and F_G^r to denote the routing fields. We use the notation R_G(T: v) a to denote the routing expression associated with the routing field a in variant v of type T.

The graph type has an underlying data type Data G, which is obtained by removing all the routing fields. The routing expressions must all be defined on Data G, as described below.

Given a data type D, define the alphabet Δ that consists of directives (letters) ^; $; ↑; ↑a and ↓a, where a ∈ F_D; T and (T: v), where T ∈ T_D and v ∈ V_D T. Given x ∈ Val D we define the step relation ⇒_x on dom x by the following transitions:

  ε ⇒_x^^ ε
  α ⇒_x^$ α        if α is a leaf of x
  α.a ⇒_x^↑ α
  α.a ⇒_x^↑a α
  α ⇒_x^↓a α.a
  α ⇒_x^T α        if x(α) = (T: v) for some v
  α ⇒_x^(T: v) α   if x(α) = (T: v)

When α ⇒_x^d β, we say that β is reached from α by directive d. Note that β such that α ⇒_x^d β is uniquely defined, if it exists. A route ρ is a sequence d1 ... dn of directives; β is reached from α along ρ, written α ⇒_x^ρ β, if β is the last node of the step sequence α = α0 ⇒_x^d1 α1 ... ⇒_x^dn αn = β. Again β is uniquely defined, if it exists.

A routing expression R on D is a regular expression over Δ, built with the usual operators of regular expressions, including + (union) and * (iteration). The regular language defined by R is denoted L(R). Given x, a destination of R at α is a β ∈ dom x such that α ⇒_x^ρ β for some route ρ ∈ L(R). The set of all destinations is denoted Dest_x(R, α). If this set is a singleton we say that R at α in x has the unique destination property.

Intuitively, the routing expressions specify where the pointers in the routing fields should lead to. A graph type is only well-formed when all such expressions always have the unique destination property and always lead to subtrees of the specified types.
The values of a well-formed graph type G form the set Val G of finite graphs. Given x ∈ Val Data G we construct a graph whose nodes are the addresses of x in the underlying data type. Note that edges are labeled by field names, and that they come in two flavors: data edges and routing edges. The data edges provide the canonical spanning tree, the backbone, and are defined as

  {α →a α.a | α.a ∈ dom x}.

The routing edges are defined as

  {α →a β | a ∈ F_G^r x(α), R_G x(α) a = R, Dest_x(R, α) = {β}}.

There is a graph for every value x.

A3: Evaluating Routing Fields

For an automaton A with transition relation →_A and a word w = d0 ... dn ∈ Δ*, we write q →_A^w q' if there exist states q0, ..., q(n+1) such that q0 = q, q(n+1) = q', and qi →_A^di q(i+1) for all i.

Our goal is to build a table Tbl such that for each node α in x and for each automaton A and each state q of A, the value of Tbl(α, q) is a node β, if it exists, such that for some w ∈ Δ*, α ⇒_x^w β and q →_A^w qF, where qF is a final state of A; if no such node exists then Tbl(α, q) = nil. The algorithm below employs a queue Q to calculate Tbl.

1. Tbl(α, q) := nil, for all nodes α in x and all automata states q
2. make Q empty
3. for all (α, q), where q is a final state:
   (a) Tbl(α, q) := α
   (b) insert (α, q) in Q
4. while Q is not empty:
   (a) remove some (β, q') from Q
   (b) for all (α, q) such that Tbl(α, q) = nil, and q →_A^d q' and α ⇒_x^d β for some directive d:
       i. Tbl(α, q) := Tbl(β, q')
       ii. insert (α, q) in Q

Each pair is inserted in Q at most once, and the pairs considered in step 4(b) involve only the node β and its immediate neighbors, thus a number of nodes that depends on the grammar only. We conclude that the algorithm runs in linear time as a function of the size of x. With the well-formedness criterion it is not hard to see that the destination of a routing field at α is the node β if and only if there exists an initial state q of the corresponding automaton such that Tbl(α, q) = β.

A4: The Monadic Logic

The monadic second-order logic of graph types, denoted M2L-GT, is used to express certain properties of graph types. We first introduce a simpler logic, monadic second-order logic of data types, denoted M2L-DT.

Fix a data type D. We define the M2L-DT on D as follows. There are two kinds of second-order variables, value variables and address set variables. A value variable x denotes a value of D. An address set variable M denotes a set of addresses of D. Such variables can be combined with ∪, ∩ and ∅ to form address set expressions. The set of addresses of x is denoted dom x, which is also a set expression. A first-order variable α, also called an address variable, denotes an address of D. That α is an address in M is expressed as the formula α ∈ M.

A value variable x of type D is introduced by an existential quantification ∃_D x or a universal quantification ∀_D x. Variables that denote addresses or sets of addresses are introduced by usual existential (∃) or universal (∀) quantification. The formulas are built by quantification and by ∧ (and), ∨ (or), and ¬ (negation) from the following basic formulas:

  is^(α)
  is$(α)
  is_x(T: v)(α)
  is_x T(α)
  is_x walk(α, β, R)
  α = β
  E1 = E2
  E1 ⊆ E2
  α ∈ E
  α = β.a
  α = β ↓_x a

where E and the Ei's are address set expressions. These formulas have the obvious meaning; for example, is_x(T: v)(α) is true iff the type-variant of x at α is (T: v); is_x walk(α, β, R) is true iff α ⇒_x^ρ β for some ρ ∈ L(R); and α = β ↓_x a is true iff α = β.a and a ∈ F_D x(β).
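The queue-based construction of Tbl in section A3 can be sketched in Python as follows. This is a hedged illustration, not the paper's implementation: the function `build_table`, the encoding of an automaton as a list of (state, directive, state) triples, and the toy backbone `list_backbone` (list cells numbered 0 to n, with n the empty leaf) are assumptions of this example. A missing dictionary entry plays the role of nil, and the table stores one reachable destination, which is the unique one when the graph type is well-formed.

```python
from collections import deque


def build_table(nodes, step, transitions, finals):
    """Backward propagation from final states over (node, state) pairs.

    step(d, a) returns the node reached from a by directive d, or None.
    transitions is a list of triples (q, d, q2) meaning q -d-> q2.
    """
    directives = {d for (_, d, _) in transitions}
    # preds[(b, d)]: nodes a such that directive d leads from a to b
    preds = {}
    for a in nodes:
        for d in directives:
            b = step(d, a)
            if b is not None:
                preds.setdefault((b, d), []).append(a)
    # into[q2]: pairs (q, d) such that q -d-> q2
    into = {}
    for q, d, q2 in transitions:
        into.setdefault(q2, []).append((q, d))

    tbl = {}         # steps 1 and 2: absent key means nil, queue is empty
    queue = deque()
    for a in nodes:  # step 3: a final state at a node reaches that node
        for q in finals:
            tbl[(a, q)] = a
            queue.append((a, q))
    while queue:     # step 4: propagate destinations to predecessors
        b, q2 = queue.popleft()
        for q, d in into.get(q2, ()):
            for a in preds.get((b, d), ()):
                if (a, q) not in tbl:
                    tbl[(a, q)] = tbl[(b, q2)]
                    queue.append((a, q))
    return tbl


def list_backbone(n):
    """Toy backbone: cells 0..n-1 linked by tail, node n is the empty leaf."""
    nodes = range(n + 1)

    def step(d, a):
        if d == "down_tail":
            return a + 1 if a < n else None
        if d == "leaf":
            return a if a == n else None
        if d == "up":
            return a - 1 if a > 0 else None
        raise ValueError(d)

    return nodes, step


# Automaton for  (down tail)* leaf up : state 0 is initial, state 2 is final
nodes, step = list_backbone(3)
transitions = [(0, "down_tail", 0), (0, "leaf", 1), (1, "up", 2)]
tbl = build_table(nodes, step, transitions, finals={2})
print(tbl[(0, 0)])  # destination of the route starting at cell 0
```

Every (node, state) pair enters the queue at most once, and the work per pair is bounded by the number of transitions and of backbone neighbors, which matches the linear-time argument given for the algorithm.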
Decidability

Theorem 1 M2L-DT is decidable.

Proof M2L-DT is decidable by an easy reduction to M2L-kSFT, the monadic second-order logic of k successors on finite trees. The latter logic has set variables, such as X, denoting subsets of {1, ..., k}* and first-order variables, such as α, denoting elements of {1, ..., k}*. In addition there is a successor function .j for each j ∈ {1, ..., k}, and connectives and quantifiers as above.

We will indicate how formulas of M2L-DT involving a data type D can be translated into M2L-kSFT. We let k be |F_D|, the number of different field names in D, and we rename field names into 1, ..., k. A formula ∃_D x: φ is translated into a formula that existentially quantifies set variables X^d, X^T_1, ..., X^T_n1, X^V_1, ..., X^V_n2, where

• X^d expresses dom x;
• X^T_1, ..., X^T_n1 express the type at position α by the bit pattern (α ∈ X^T_1, ..., α ∈ X^T_n1) (here n1 = log |T_D|);
• X^V_1, ..., X^V_n2 express the variant at position α by the bit pattern (α ∈ X^V_1, ..., α ∈ X^V_n2) (here n2 = log |V_D|);

and that includes the translation of the formula expressing that x is a value of D according to the conditions on derivation trees given in section A1. Address set variables are just translated into set variables and address variables into first-order variables.

Most of the basic formulas are now easy to express. For example, α = β ↓_x a is translated into α = β.a ∧ α ∈ X^d; this formula is equivalent to α = β.a ∧ a ∈ F_D x(β) since x ∈ Val D.

The basic formula is_x walk(α, β, R) is more difficult. Here we encode the working of A_R, the automaton equivalent to R, on x by a formula that guesses the subsets of states at each node that are accessible from a partial run (which is like a run except that the last state need not be final) starting at α. This collection of subsets can be coded using |A_R| set variables. We must then write an M2L-kSFT formula expressing that all states in a subset have a predecessor under the transition relation in the subset at a neighboring node for some directive (unless they are initial states at α). This alone is not sufficient. We must also write down a condition stating that the collection of subsets is minimal, in order to ensure that all states in it are in fact reached; technically, we are calculating a minimal such collection. The details of this translation are omitted.

Logic of graph types

The monadic second-order logic of graph types, M2L-GT, has the same syntax as M2L-DT.

Theorem 2 M2L-GT is decidable.

Proof The translation into M2L-kSFT is as above; only the formula α = β ↓_x a needs a new treatment. If a ∈ F_G^r x(β), the translation for this formula is is_x walk(β, α, R), where R = R_G x(β) a. We omit the details.

Well-formedness of a graph type G can be expressed by a formula of the logic, in which D = Data G is the underlying data type; ∃! is an abbreviation for "there exists a unique"; and AND is an abbreviation for the conjunction obtained by expanding over the corresponding indices.