Object-Oriented
Databases:
Modeling and Applications
Zongmin Ma
Université de Sherbrooke, Canada
Acquisitions Editor: Mehdi Khosrow-Pour
Senior Managing Editor: Jan Travers
Managing Editor: Amanda Appicello
Development Editor: Michele Rossi
Copy Editor: Lori Eby
Typesetter: Jennifer Wetzel
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.
Advances in Fuzzy
Object-Oriented Databases:
Modeling and Applications
Table of Contents
Preface
SECTION I
Chapter I. A Constraint Based Fuzzy Object Oriented Database Model
G. de Tré, Ghent University, Belgium
R. de Caluwe, Ghent University, Belgium
Chapter II. Fuzzy and Probabilistic Object Bases
T. H. Cao, Ho Chi Minh City University of Technology, Vietnam
H. Nguyen, Ho Chi Minh City Open University, Vietnam
Chapter III. Generalization Data Mining in Fuzzy Object-Oriented Databases
Rafal Angryk, Tulane University, USA
Roy Ladner, Naval Research Laboratory, USA
Frederick E. Petry, Tulane University & Naval Research Laboratory, USA
Chapter IV. FRIL++ and Its Applications
J. M. Rossiter, University of Bristol, UK & Bio-Mimetic Control Research Center, The Institute of Physical and Chemical Research (RIKEN), Japan
T. H. Cao, Ho Chi Minh City University of Technology, Vietnam
SECTION II
Chapter V. Fuzzy Information Modeling with the UML
Zongmin Ma, Université de Sherbrooke, Canada
SECTION III
Chapter VI. A Framework to Build Fuzzy Object-Oriented Capabilities Over an Existing Database System
Fernando Berzal, University of Granada, Spain
Nicolás Marín, University of Granada, Spain
Olga Pons, University of Granada, Spain
M. Amparo Vila, University of Granada, Spain
Chapter VII. Index Structures for Fuzzy Object-Oriented Database Systems
Sven Helmer, Universität Mannheim, Germany
Chapter VIII. Introducing Fuzziness in Existing Orthogonal Persistence Interfaces and Systems
Miguel Ángel Sicilia, University of Alcalá, Spain
Elena García-Barriocanal, University of Alcalá, Spain
José A. Gutiérrez, University of Alcalá, Spain
SECTION IV
Chapter IX. An Object-Oriented Approach to Managing Fuzziness in Spatially Explicit Ecological Models Coupled to a Geographic Database
Vincent B. Robinson, University of Toronto at Mississauga, Canada
Phil A. Graniero, University of Windsor, Canada
Chapter X. Object-Oriented Publish/Subscribe for Modeling and Processing Imperfect Information
Haifeng Liu, University of Toronto, Canada
Hans Arno Jacobsen, University of Toronto, Canada
About the Authors
Index
Preface
A major goal for database research has been the incorporation of additional
semantics into the data model. Classical data models often suffer from their
incapability to represent and manipulate imprecise and uncertain information
that may occur in many real-world applications. Since the early 1980s, Zadeh's
fuzzy logic has been used to extend various data models. The purpose of introducing fuzzy logic in data modeling is to enhance the classical models so that
uncertain and imprecise information can be represented and manipulated. This
resulted in numerous contributions, mainly with respect to the popular relational
model or to some related form of it.
However, rapid advances in computing power brought opportunities for databases in emerging applications in CAD/CAM, multimedia, geographic information systems, knowledge management, etc. These applications characteristically require the modeling and manipulation of complex objects and semantic
relationships. The advances of object-oriented databases are acknowledged
outside the research and academic worlds as well, which shows that the object-oriented
paradigm lends itself extremely well to these requirements. Because the classical
relational database model and its extension of fuzziness do not satisfy the need
of modeling complex objects with imprecision and uncertainty, currently, much
research has concentrated on fuzzy object-oriented database models in order
to deal with complex objects and uncertain data together.
This book focuses on an important extension of the object-oriented paradigm
that allows for the inclusion of fuzzy information in this paradigm and presents
the latest research and application results in fuzzy object-oriented databases.
Some major issues on concepts, semantics, models, design, implementation, and
applications of fuzzy object-oriented databases will be investigated in the book.
The different chapters in the book were contributed by different authors and
provide possible solutions for the different types of technological problems concerning fuzzy object-oriented databases. Each of the contributors to the book is
a leading researcher in the field of fuzzy object-oriented databases who has
made numerous contributions to fuzzy information engineering.
Introduction
This book is organized into four major sections. The first section discusses the
issues of the representation, semantics, and models of fuzzy object-oriented
databases in the first four chapters. Chapter V describes fuzzy object-oriented
conceptual data modeling and comprises the second part. The next three chapters covering the implementation issues in fuzzy object-oriented databases comprise the third part. Finally, the last two chapters, which comprise the fourth
part, contain applications of fuzzy object-oriented information modeling and fuzzy
databases in geographic information systems and publish/subscribe systems, respectively.
First, we will look at the problem of the representation, semantics, and models
of fuzzy object-oriented databases.
The authors of Chapter I, de Tré and de Caluwe, define a fuzzy object-oriented formal database model that allows us to model and manipulate information in a (true to nature) natural way. The presented model was built upon an
object-oriented-type system and an elaborated constraint system, which, respectively, support the definitions of types and constraints. Types and constraints
are the basic building blocks of object schemes, which, in turn, are used for
defining database schemes. Finally, the definition of the database model was
obtained by providing adequate data definition operators and data manipulation
operators. Novelties in the approach are the incorporation of generalized constraints and of extended possibilistic truth values, which allow for a better representation of data(base) semantics.
Cao and Nguyen introduce an extension of the probabilistic object base model.
Their model is not the same as the probabilistic object base model that was
investigated in the literature. Their model uses fuzzy sets for representing and
handling vague and imprecise values of object attributes. A probabilistic interpretation of relations on fuzzy set values is proposed to integrate them into that
probability-based framework. Then, the definitions of fuzzy-probabilistic object
base schemas, instances, and algebraic operations are presented.
Angryk, Ladner, and Petry extend the attribute generalization algorithms that
were most commonly applied to relational databases and consider the application of generalization-based data mining to fuzzy similarity based object-oriented databases. A key aspect of generalization data mining is the use of a
concept hierarchy. The objects of the database are generalized by replacing
specific attribute values with the next higher-level term in the hierarchy. This
will eventually result in generalizations that represent a summarization of the
information in the database. The authors focus on the generalization of similarity-based simple fuzzy attributes for an object-oriented database (OODB) using approaches to the fuzzy concept hierarchy developed from the given similarity relation of the database. They then consider application of this approach
to complex structure-valued data in the fuzzy OODB.
ticular fuzzy object-oriented data model but is kept general enough to be used in
different FOODBS contexts. Second, the author presents the index structures
for each query pattern, which support the efficient evaluation of these queries.
An explanation of the basic techniques from standard index structures (like
B-trees) to sophisticated access methods (like Join Index Hierarchies) is given
in the chapter rather than an exhaustive description.
Sicilia, García-Barriocanal, and Gutiérrez focus on how to integrate the models
and techniques that can deal with imprecise and uncertain information in the
facets of object data stores with current database design and programming
practices, so that the benefits of fuzzy extensions can be easily adopted and
seamlessly integrated in current applications. The authors try to provide some
criteria to use to select the fuzzy extensions that more seamlessly integrate into
the current object storage paradigm known as orthogonal persistence, in which
programming language object models are directly stored, so that database design becomes mainly a matter of object design. They provide concrete examples and case studies as practical illustrations of the introduction of fuzziness,
both at the conceptual and the physical levels of this kind of persistent system.
In the fourth section, we see the applications of fuzzy object-oriented information modeling and fuzzy databases.
Robinson and Graniero use a spatially explicit, individual-based ecological modeling problem to illustrate an approach to managing fuzziness in spatial databases that accommodates the use of nonfuzzy as well as fuzzy representations
of geographic databases. The approach taken in the chapter uses the Extensible Component Objects for Constructing Observable Simulation Models (ECOCOSM) system loosely coupled with geographic information systems. The ecological modeling problem described in the chapter is used to illustrate how combining Probes and ProbeWrappers with Agent objects affords a flexible means
of handling semantic variation and serves as an effective approach to utilize
heterogeneous sources of spatial data.
Publish/subscribe systems follow a paradigm in which information providers disseminate publications to all consumers who have expressed interest by registering subscriptions with the publish/subscribe system. Liu and Jacobsen observe that in all existing publish/subscribe systems, neither subscriptions nor publications can capture the uncertainty inherent in the information underlying the application domain. However, in many situations, exact knowledge of either specific subscriptions or publications is not available. To address this problem, the
authors propose a new object-oriented publish/subscribe model based on possibility theory and fuzzy set theory to process imperfect information for either
expressing subscriptions or publications or both combined. Furthermore, the
authors define the approximate publish/subscribe matching problem and develop and evaluate the algorithms for solving it.
Acknowledgments
The editor would like to acknowledge the help of all involved in the collation
and review process of the book, without whose support the project could not
have been satisfactorily completed.
Most of the authors of chapters included in this book also served as referees
for papers written by other authors. Thanks go to all those who provided constructive and comprehensive reviews.
A special note of thanks goes to all the staff at Idea Group Publishing, whose
contributions throughout the whole process, from inception of the initial idea to
final publication, have been invaluable.
Special thanks go to the publishing team at Idea Group Publishing, in particular
to Mehdi Khosrow-Pour, whose enthusiasm motivated me to initially accept his
invitation for taking on this project, and to Michele Rossi, who continuously
prodded via e-mail to keep the project on schedule.
In closing, I wish to thank all of the authors for their insights and excellent
contributions to this book. I also want to thank all of the people who assisted in
the reviewing process. In addition, this book would not have been possible without the ongoing professional support from Mehdi Khosrow-Pour and Jan Travers
at Idea Group Publishing.
Zongmin Ma, Ph.D.
Sherbrooke, Canada
April 2004
SECTION I
Chapter I
A Constraint Based
Fuzzy Object Oriented
Database Model
G. de Tré
Department of Telecommunications and Information Processing,
Ghent University, Belgium
R. de Caluwe
Department of Telecommunications and Information Processing,
Ghent University, Belgium
Abstract
The objective of this chapter is to define a fuzzy object-oriented formal
database model that allows us to model and manipulate information in a
(true to nature) natural way. Not all the elements (data) that occur in the
real world are fully known or defined in a perfect way. Classical database
models only allow the manipulation of accurately defined data in an
adequate way. The presented model was built upon an object-oriented type
system and an elaborated constraint system, which, respectively, support
the definitions of types and constraints. Types and constraints are the basic
building blocks of object schemes, which, in turn, are used for defining
database schemes. Finally, the definition of the database model was
obtained by providing adequate data definition operators and data
manipulation operators. Novelties in the approach are the incorporation of
generalized constraints and of extended possibilistic truth values, which
allow for a better representation of data(base) semantics.
Copyright 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
Introduction
In this chapter, a formal object-oriented database model that is suited to model
both perfect and imperfect information is built. This model distinguishes itself
from existing fuzzy object-oriented models by integrating (generalized) constraints (Zadeh, 1997). These constraints are used to define the semantics and
integrity of the data and to define query criteria. Another novelty is its underlying
logical framework of extended possibilistic truth values (de Tré, 2002). Moreover, the model is built upon the Object Data Management Group (ODMG) data
model (Cattell & Barry, 2000), as far as its crisp components are considered.
The starting point for the formalism is an algebraic foundation, in which sets of
objects, operators on these sets, and constraints that are defined for these sets
are central (de Tré, de Caluwe, & Van der Cruyssen, 2000). Special domain-specific elements, which are represented by the symbol $\bot$, are used to formalize
undefined (or inapplicable) data. This foundation is formally defined on the
basis of a type system and a constraint system. Starting from this basis, object
schemes and database schemes are defined, which allow for databases to be
defined rather easily. Furthermore, querying is generalized to a manageable
closed set of operators.
Contrary to existing proposals that extend a crisp model, an approach based on
generalization allows databases to be defined that handle perfect data as special
cases of imperfect data. For the generalization, fuzzy set theory and possibility
theory are used. Moreover, with the presented work, it is shown how Zadeh's
theory on fuzzy information granulation and generalized constraints (Zadeh,
1996, 1997) can be applied within the context of a database model.
The underlying logic of the database model is many valued and uses so-called
extended possibilistic truth values (de Tré, 2002), which are obtained by
considering the three truth values "true," "false," and "undefined" and
adding possibilistic uncertainty. This logic allows for a more epistemological
modeling of truth and, moreover, can explicitly handle those cases where some
of the data are not applicable.
The remainder of the chapter is organized as follows. In the next section, an
overview of different approaches in fuzzy object-oriented database modeling is
given. Furthermore, some preliminary concepts and definitions are introduced.
In the section entitled "Types and Type System," a type system that supports
the formal definition of all data types defined in the database model is presented.
These data types are compliant with the ODMG data model, as far as their crisp
counterparts are considered. In "Constraints and Constraint System," a constraint system supporting the formalization of constraints is defined. Constraints
are important for defining database semantics and query criteria. In Object
Some Preliminaries
Simultaneously with the maturation of object-oriented database models, research on fuzzy object-oriented databases is getting more attention. Nowadays, several fuzzy object-oriented database models exist. Based on some of
them, prototypes were already implemented.
Related Work
Among the existing fuzzy object-oriented database models are the following:
the object-centered model of Rossazza et al. (1990, 1997); the object-oriented
model of Tanaka et al. (1991); the similarity-based model of George et al. (1992,
1997); the fuzzy object-oriented data (FOOD) model of Bordogna et al. (1994,
1999, 2000); the fuzzy algebra of Rocacher et al. (1996); the UFO model of Van
Gyseghem (1998); the fuzzy association algebra of Na and Park (1997); the
FIRMS model of Mouaddib et al. (1997); the FOODM model of Marín et al.
(2000, 2001, 2003); and the rough object-oriented database of Beaubouef and
Petry (2002).
semantics (Kim, 1994; Alagić, 1997) and its limited ability to deal with constraints, despite the fact that a thorough support of constraints is the most obvious
way to define the semantics of a database (Kuper, Libkin, & Paredaens, 2000;
de Tré & de Caluwe, 2000).
The presented fuzzy object-oriented database model is consistent with the
ODMG data model (as far as its crisp components are considered) and,
moreover, deals with constraints. Zadeh's generalized constraints (Zadeh, 1997)
were integrated in the framework and allow for a general, extensible definition
of the semantics and integrity of the data and of the query criteria. Furthermore,
a logic based on extended possibilistic truth values is used to be able to explicitly
cope with missing information.
Generalized Constraints
The concept of generalized constraint was introduced by L. A. Zadeh (Zadeh,
1986, 1997) as the basis for a computational approach to meaning and knowledge
representation. The introduction of this concept was motivated by the fact that
conventional crisp constraints of the form $X \in C$, where $X$ is a variable and $C$ is
a set, are insufficient to represent the meaning of perceptions.
A generalized constraint is, in effect, a family of constraints and can be seen as
a generalization of an assignment statement (Zadeh, 1997).
Definition 1 (Generalized constraint): An unconditional generalized constraint on a variable X is defined by:
X isr R
where R is the constraining relation, and isr is a variable copula in
which the discrete-valued variable r defines the way in which R constrains X.
As specified in (Zadeh, 2002), the principal constraints are the following:
$$\pi_{t^*(p)}(x) = \mu_{\tilde{t}^*(p)}(x)$$
where $\pi_{t^*(p)}(x)$ denotes the possibility that the value of $t^*(p)$ conforms to $x$,
and $\mu_{\tilde{t}^*(p)}(x)$ is the membership grade of $x$ within the fuzzy set $\tilde{t}^*(p)$.
Special cases of EPTVs are as follows:
$\tilde{t}^*(p)$                      Interpretation
$\{(T,1)\}$                           p is true
$\{(F,1)\}$                           p is false
$\{(T,1), (F,1)\}$                    p is unknown
$\{(\bot,1)\}$                        p is undefined
$\{(T,1), (F,1), (\bot,1)\}$          p is unknown or undefined
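The special cases listed above can be made concrete with a small sketch. It assumes, for illustration only, that an EPTV is stored as a mapping from the three truth values to possibility degrees in [0, 1]; the names `make_eptv`, `interpret`, and the labels `T`, `F`, `UNDEF` are hypothetical and not part of the chapter's formalism. The entry for "unknown or undefined" is taken to assign possibility 1 to all three values.

```python
from typing import Dict

# Labels for the three truth values (names chosen here, not by the chapter).
T, F, UNDEF = "T", "F", "⊥"


def make_eptv(pairs: Dict[str, float]) -> Dict[str, float]:
    """Build a full possibility distribution over {T, F, ⊥}; missing values get 0."""
    for value, degree in pairs.items():
        if value not in (T, F, UNDEF):
            raise ValueError(f"unknown truth value: {value!r}")
        if not 0.0 <= degree <= 1.0:
            raise ValueError("possibility degrees must lie in [0, 1]")
    return {v: float(pairs.get(v, 0.0)) for v in (T, F, UNDEF)}


# The special cases from the table, keyed by their interpretation.
SPECIAL_CASES = {
    "p is true": make_eptv({T: 1}),
    "p is false": make_eptv({F: 1}),
    "p is unknown": make_eptv({T: 1, F: 1}),
    "p is undefined": make_eptv({UNDEF: 1}),
    "p is unknown or undefined": make_eptv({T: 1, F: 1, UNDEF: 1}),
}


def interpret(value: Dict[str, float]) -> str:
    """Return the matching special case, or list the possible truth values."""
    for label, case in SPECIAL_CASES.items():
        if case == value:
            return label
    return "p is partially possible: " + ", ".join(
        f"({v},{d:g})" for v, d in value.items() if d > 0
    )
```

For instance, `interpret(make_eptv({T: 1, F: 0.3}))` falls outside the special cases and simply lists the truth values that have a nonzero possibility.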
Definition of Types
In order to give a complete definition of the concept of type, it is necessary to
provide the rules that define its syntax, as well as the rules that define its
semantics.
Definition 3 (Type): Each type supported by the type system is defined by
its syntax and its semantics.
The syntax of a type. The syntax rules for a type can be formally
described by means of some mathematical expressions.
A set of domains $D_t$
A designated domain $dom_t \in D_t$
A set of operators $O_t$
A set of axioms $A_t$
The designated domain $dom_t$ defines the set of valid values for the type and is
called the domain of the type. In order to deal with cases where a regular domain
value does not apply, the assumption was made that every domain $dom_t$ contains
a special, domain-specific value $\bot_t$, which is used to represent undefined
domain values. The set of operators $O_t$ contains the operators that are defined
on the domain $dom_t$. The set of domains $D_t$ consists of the domains that are
involved in the definition of the operators of $O_t$, whereas the set of axioms $A_t$
consists of the axioms that are involved in the definition of the semantics of the
operators of $O_t$.
Type System
In order to define the types supported by the presented database model, a type
system (Lausen & Vossen, 1998) was built. The presented type system is
consistent with the specifications of the ODMG object model (Cattell & Barry,
2000). To guarantee this consistency, a distinction was made between a so-called void type (which is the most primitive type of the system), literal types,
object types, and reference types (which are new with respect to the ODMG
model). Reference types enable us to refer to the instances of object types and
are used to formalize the binary relationships between the object types in a
database scheme.
Each type supported by the type system is formally defined as prescribed by
Definition 3. The syntax rules for the types of the presented type system are
defined as in Definition 4.
Definition 4 (Types: syntax rules): Let $ID$ denote the set of valid identifiers, and let the sets of type expressions that satisfy the syntax of a reference
type, a literal type, and an object type be denoted, respectively, as $T_{reference}$,
$T_{literal}$, and $T_{object}$, where:
Basic types:
$T_{basic} = \{Integer, Real, Boolean, Octet, String\} \subset T_{literal}$
Collection types:
$T_{collect} = \{Set(t), Bag(t), List(t), Array(t), Dict(t,t') \mid t,t' \in T_{literal}\} \subset T_{literal}$
Type $t$ is called the significant type of the collection type. In the case of
nested collection types, the significant type of the innermost collection type
is called the most significant type of the collection type.
Enumeration types:
$T_{enum} = \{Enum\ id\ (id_1, id_2, \ldots, id_n) \mid \{id, id_1, id_2, \ldots, id_n\} \subseteq ID\} \subset T_{literal}$
Structured types:
$T_{struct} = \{Struct\ id\ (id_1\ isr_1\ t_1;\ id_2\ isr_2\ t_2;\ \ldots;\ id_n\ isr_n\ t_n) \mid (\{id, id_1, id_2, \ldots, id_n\} \subseteq ID) \wedge [\forall\, 1 \le i \le n: (isr_i \in \{ise, is, isv\}) \wedge (t_i \in T_{literal} \cup T_{reference})]\} \subset T_{literal}$
Hereby, $id$ identifies the structured type, whereas ($id_1\ isr_1\ t_1$; $id_2\ isr_2\ t_2$; ...;
$id_n\ isr_n\ t_n$) represents the components of the structured type. Each component $id_i\ isr_i\ t_i$, $1 \le i \le n$, is a (generic) generalized constraint on a variable
$id_i$ with associated type $t_i \in T_{literal} \cup T_{reference}$.
If $isr_i = ise$, the valid values of $id_i$ are restricted to the values of the
domain $dom_{t_i}$ of the associated type $t_i$.
Let $V_{signat}$ denote the set of all valid operator signatures, which is defined
as follows:
Hereby, Void denotes the void type, which is used in situations where a
further type specification could not be given (Cattell & Barry, 2000).
Furthermore, $t'$ is the type of the returned value(s) of the operator, and $id'_i\ isr_i\ t'_i$, $1 \le i \le p$, are the input parameters of the operator. Each input
parameter is a (generic) generalized constraint on a variable $id'_i$ with
associated type $t'_i \in T_{literal} \cup T_{reference}$. These generalized constraints are
interpreted as specified previously.
If $id \in ID$, $\{\widehat{id}_1, \widehat{id}_2, \ldots, \widehat{id}_m\} \subseteq ID \setminus \{id\}$, $\{id_1, id_2, \ldots, id_n\} \subseteq ID$, $isr_i \in \{ise, is, isv\}$,
and $s_i \in T_{literal} \cup T_{reference} \cup V_{signat}$, $1 \le i \le n$, then:
The identifier $id$ identifies the object type. Like many object models, the
ODMG Object Model includes an inheritance-based type-subtype hierarchy. The identifiers $\widehat{id}_i$, $1 \le i \le m$, denote the supertypes of the object type
(if existent). The characteristics of the object type are represented by ($id_1\ isr_1\ s_1$;
$id_2\ isr_2\ s_2$; ...; $id_n\ isr_n\ s_n$).
Each characteristic $id_i\ isr_i\ s_i$, $1 \le i \le n$, is a (generic) generalized constraint
on a variable $id_i$ with associated specification $s_i \in T_{literal} \cup T_{reference} \cup V_{signat}$.
The semantics of the generalized constraints are the same as specified
previously. If $s_i \in T_{literal}$, the characteristic is called an attribute; if $s_i \in T_{reference}$,
the characteristic is called a binary relationship; whereas if $s_i \in V_{signat}$, the
characteristic is a method. In the case of a method, the generalized constraint puts a restriction on
the return values of the operator. In addition to the characteristics stated
in its type specification, an object type inherits the characteristics of its
supertypes (if existent).
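The classification of characteristics and the inheritance rule just described can be sketched as follows. This is only an illustration under stated assumptions: the class names `Characteristic` and `ObjectType`, the string tags standing in for membership in the literal, reference, and signature sets, and the example types `TPerson` and `TEmployee` are all hypothetical, and letting a type's own characteristics override inherited ones of the same name is a choice made here, not a rule stated in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical tags: which set the specification s_i belongs to.
LITERAL, REFERENCE, SIGNATURE = "literal", "reference", "signature"


@dataclass(frozen=True)
class Characteristic:
    name: str       # the identifier id_i
    copula: str     # isr_i, one of "ise", "is", "isv"
    spec: str       # textual form of the specification s_i
    spec_kind: str  # LITERAL, REFERENCE, or SIGNATURE

    def kind(self) -> str:
        """Attribute, binary relationship, or method, depending on s_i."""
        return {
            LITERAL: "attribute",
            REFERENCE: "binary relationship",
            SIGNATURE: "method",
        }[self.spec_kind]


@dataclass
class ObjectType:
    name: str
    supertypes: List["ObjectType"] = field(default_factory=list)
    own: List[Characteristic] = field(default_factory=list)

    def characteristics(self) -> Dict[str, Characteristic]:
        """Own characteristics plus everything inherited from the supertypes."""
        result: Dict[str, Characteristic] = {}
        for parent in self.supertypes:
            result.update(parent.characteristics())
        for c in self.own:
            result[c.name] = c  # assumption: own definitions take precedence
        return result


person = ObjectType("TPerson", own=[
    Characteristic("Name", "ise", "String", LITERAL),
])
employee = ObjectType("TEmployee", [person], [
    Characteristic("Salary", "is", "Real", LITERAL),
    Characteristic("Works_for", "ise", "SetRef(TCompany)", REFERENCE),
    Characteristic("Raise", "ise", "<operator signature>", SIGNATURE),
])
```

Calling `employee.characteristics()` returns the inherited `Name` attribute together with the three characteristics declared on `TEmployee` itself.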
Void type. The domain of the Void type is, by definition, $\{\bot_{Void}\}$. Its
corresponding set of operators is the singleton $\{\bot: \rightarrow dom_{Void}\}$ consisting
of the bottom operator $\bot$, which always results in an undefined domain
value (represented by the symbol $\bot_{Void}$).
Reference types. The reference types are all generic types, designated
by a type generator and an object type parameter. Reference types were
introduced in order to formalize binary association relationships between
object types. An association relationship between two object types has a
one-to-one, a one-to-many, or a many-to-many cardinality, which
denotes the maximum number of participating domain values of both types.
To support the notion of cardinality, a distinction was made between single-valued and multivalued reference types. Multivalued reference types are
subdivided into set-of-references, bag-of-references, and list-of-references, in order to formalize the different ODMG definitions of one-to-many and many-to-many relationships (Cattell & Barry, 2000).
These multivalued reference types are denoted, respectively,
by the type generators SetRef, BagRef, and ListRef and by an object-type
parameter $t \in T_{object}$. The domain of type SetRef(t) [resp. BagRef(t) and
ListRef(t)] consists of the undefined domain value $\bot_{SetRef(t)}$ [resp.
$\bot_{BagRef(t)}$ and $\bot_{ListRef(t)}$] and of sets (resp. bags and lists) of references
to regular elements of $dom_t$. Furthermore, the types SetRef(t) [resp.
Collection types. The collection types are all generic types, designated by
a type generator and one or two type parameters, e.g., the bag types are
denoted by the type generator Bag and a type parameter $t$. The domain of
the bag type Bag(t) consists of the undefined domain value $\bot_{Bag(t)}$ and of
unordered collections of elements of the domain of type $t$, in which
duplicates are allowed. The associated set of operators consists of $=$, $\neq$,
cardinality, is_empty, count, $+$, $\cup$, $\cap$, $\setminus$, is_element, and $\bot$. For example,
the collection type Set(Integer) is used to model sets of integer numbers,
whereas the collection type Bag(Real) is used to model bags of real
numbers.
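As a rough illustration of the crisp bag operators named above, the following sketch uses Python's `collections.Counter` as a stand-in for Bag(Real). The helper names mirror the operator names in the text, but the mapping onto `Counter` is an assumption made here; in particular, reading bag union as the element-wise maximum and bag intersection as the element-wise minimum of multiplicities is the usual multiset convention and is not stated in the source. The undefined domain value is not modeled.

```python
from collections import Counter


def cardinality(bag: Counter) -> int:
    """Total number of elements, counting duplicates."""
    return sum(bag.values())


def is_empty(bag: Counter) -> bool:
    return cardinality(bag) == 0


def count(bag: Counter, x) -> int:
    """Multiplicity of x in the bag."""
    return bag[x]


def is_element(bag: Counter, x) -> bool:
    return bag[x] > 0


a = Counter([1.5, 1.5, 2.0])  # a bag of reals with a duplicate
b = Counter([1.5, 3.0])

bag_sum = a + b        # multiplicities are added
bag_union = a | b      # element-wise maximum (assumed reading of union)
bag_inter = a & b      # element-wise minimum (assumed reading of intersection)
bag_diff = a - b       # multiplicities are subtracted, never below zero
```

Here `bag_sum` contains 1.5 three times, whereas `bag_union` contains it only twice, which is the difference between the + operator and union on bags.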
Structured types. The domain of a structured type Struct id ($id_1\ isr_1\ t_1$; $id_2\ isr_2\ t_2$; ...; $id_n\ isr_n\ t_n$)
contains the undefined domain value $\bot_{id}$. All other domain
values are composite and consist of $n$ values $id_i\ isr'_i\ v_i$, with $isr'_i \in \{ise, is\}$,
$i = 1, 2, \ldots, n$. Each value in the composition is, in turn, described by a
generalized constraint, for which the semantics are as follows:
$isr_i = ise$
If $isr'_i = ise$, then $v_i \in dom_{t_i}$. The value of $id_i$ is crisply described.
For example, the value Name ise "My_company" is a valid
value for the Name ise String component of TCompany and
denotes that the name of the represented company is certain and
equals "My_company".
$isr_i = is$
$isr'_i = ise$
$isr'_i = is$
$isr_i = isv$
$isr'_i = ise$
$isr'_i = is$
$R(x_1, x_2, \ldots, x_n) = y$
In the type system, fuzzified operators are defined using polymorphism and operator overloading, which allows a different meaning to
be assigned to operators in different contexts. Operators then vary
depending on whether their parameters are ordinary values, fuzzy sets,
or Level 2 fuzzy sets.
Operators intended for the handling of fuzzy sets and of Level 2 fuzzy
sets. Examples include the operators $=$, $\cup$, $\cap$, co, normalize, support,
core, $\alpha$-cut, $\bar{\alpha}$-cut, and $\mu$ (where $\mu(F,x)$ returns the membership
grade of element $x$ within fuzzy set $F$). Each other operator preserves
its usual semantics.
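To make the fuzzy set operators named above more tangible, here is a minimal sketch over finite fuzzy sets. It assumes a fuzzy set is stored as a dictionary from elements to membership grades, uses the standard max/min definitions for union and intersection and 1 minus the grade for the complement, and treats the second cut operator as the strict cut (grades strictly above the threshold). All function names are chosen here for illustration; Level 2 fuzzy sets are not covered.

```python
from typing import Dict, Hashable, Iterable, Set

FuzzySet = Dict[Hashable, float]


def mu(F: FuzzySet, x: Hashable) -> float:
    """Membership grade of x within F (0 for elements not listed)."""
    return F.get(x, 0.0)


def support(F: FuzzySet) -> Set[Hashable]:
    return {x for x, g in F.items() if g > 0}


def core(F: FuzzySet) -> Set[Hashable]:
    return {x for x, g in F.items() if g == 1.0}


def alpha_cut(F: FuzzySet, alpha: float) -> Set[Hashable]:
    return {x for x, g in F.items() if g >= alpha}


def strict_alpha_cut(F: FuzzySet, alpha: float) -> Set[Hashable]:
    return {x for x, g in F.items() if g > alpha}


def normalize(F: FuzzySet) -> FuzzySet:
    """Rescale so that the highest grade becomes 1 (no-op for an empty set)."""
    height = max(F.values(), default=0.0)
    if height == 0:
        return dict(F)
    return {x: g / height for x, g in F.items()}


def union(F: FuzzySet, G: FuzzySet) -> FuzzySet:
    return {x: max(mu(F, x), mu(G, x)) for x in set(F) | set(G)}


def intersection(F: FuzzySet, G: FuzzySet) -> FuzzySet:
    grades = {x: min(mu(F, x), mu(G, x)) for x in set(F) | set(G)}
    return {x: g for x, g in grades.items() if g > 0}


def complement(F: FuzzySet, universe: Iterable[Hashable]) -> FuzzySet:
    """Complement relative to an explicitly given finite universe."""
    return {x: 1.0 - mu(F, x) for x in universe}


young = {20: 1.0, 30: 0.6, 40: 0.2}  # hypothetical fuzzy set of "young" ages
```

With this representation, `alpha_cut(young, 0.6)` keeps the ages 20 and 30, while `strict_alpha_cut(young, 0.6)` keeps only 20.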
Object types. The object types are the most elaborated types of the type
system. Each object type is characterized by a number of properties (which
describe its structure) and a number of explicitly defined operators, also
called methods (which describe its behavior).
As specified in Definition 4, a property is either an attribute or a binary
relationship. In order to define the binary relationships between object
types, a partial association relation is defined over the set Tobject. (id1
id2 denotes that object type id 1 is binary related to object type id2.)
An object type can inherit properties and methods from its parent types
(Taivalsaari, 1996). In order to define the inheritance-based type-subtype
relationships between object types, a partial ordering relation < is defined
over the set Tobject. (id < id denotes that object type id inherits all
characteristics of object type id.)
The domain of an object type id contains the undefined domain value id
and the undefined domain values id of the parent types id of type id. Each
other domain value is composite and contains a value id i isr' i v i , with
isr' i {ise,is}, for each of the (inherited) properties id i isri si, si Tliteral
Treference of the type. Each value in the composition is, in turn, described
by a generalized constraint, for which the semantics are the same as that
explained with the structured types. The set of operators associated with
a given object type is the union of a set of implicitly defined operators and
a set of explicitly defined operators. The implicitly defined operators are =,
, . (period member operator), set_property, get_property, and . The
explicitly defined operators are the (inherited) methods id i isri si, si Vsignat
of the object type.
The type system TS, which defines all the valid types supported by the presented
database model, is defined by the following definition.
Example 1: The type system allows for definitions like the following, which are
intended to describe a (simplified) type representing employees. With the
structured types
Instances of Types
The instances of a reference type, a literal type, and an object type are,
respectively, called reference instances, literals, and objects, whereas the Void
type cannot have instances.
Definition 6 (Reference instance): Every reference instance r is defined as
a pair: [t,v] where t ∈ Treference and v ∈ domt.
Definition 7 (Literal): Every literal l is defined as a pair: [t,v] where
t ∈ Tliteral and v ∈ domt.
Depending on its lifetime, an object can be either transient or persistent.
Definition 8 (Transient object): A transient object o is defined as a triple
[t,v, t ~* (o is an instance of t)] in which:
The uniqueness of the object identifier has to be guaranteed over the whole database.
The object identifier oid is used to refer to the (state of the) object. The set of
and
[oid 2, { }, TPerson,
(
expression (resulting in an EPTV), without aggregation operators, that is defined over the properties and components of t and
its associated types and expresses a restriction for the domain
values of the property or component denoted by id, then
c{id}value [e] ∈ Cis
The set Cim consists of multitype dependent constraints that are not
defined with respect to the entire extent of an object type, and it is
defined as follows:
c{id}reference [ ] ∈ Cem
If there exists an inverse association relationship in the referenced object type t' and id' ∈ ID is the path expression, which
denotes this relationship, then
c{id,id'}reference [ ] ∈ Cem
Then the set C of all constraint expressions is defined by:
C = Cis ∪ Ces ∪ Cim ∪ Cem
The full semantics of the constraints c ∈ C are defined by providing an
appropriate definition for their corresponding logical function (cf. Definition 10).
Below, informal descriptions are given:
values of these properties over the extent Vtextent of type t. Furthermore, the
constraint guarantees that none of these values is undefined.
The constraint system CS, which defines all the valid constraints supported by
the presented database model, is defined by the following:
Definition 12 (Constraint system): The constraint system CS is formally
defined by the triple CS = [ID,E,C] where:
Example 3: With respect to the object types TPerson and TEmployee presented in Example 1, the following constraints can be considered:
c1 = c{TEmployee.EmployeeID}not_null [ ]
c2 = c{TPerson.Age}value [0 ≤ TPerson.Age ≤ around_120]
c3 = c{TEmployee.Works_for.Percentage}value [0 ≤ TEmployee.Works_for.Percentage ≤ 100]
c4 = c{TPerson}key [TPerson.Name]
c5 = c{TPerson,{TPerson,TEmployee}}oid [ ]
c6 = c{TPerson,{TPerson,TEmployee}}name [ ]
c7 = c{TEmployee,{TPerson,TEmployee}}oid [ ]
c8 = c{TEmployee,{TPerson,TEmployee}}name [ ]
c9 = c{TPerson.Children}reference [ ]
The set of all existing object schemes is denoted as OS and is defined as the union
of the set of all the quadruples that satisfy Definition 13 and the singleton {OS},
with an element that represents an undefined object scheme.
An instance o of the object type t is defined to be an instance of the object scheme
os = [id,t,M,Ct], if and only if it satisfies [with an EPTV that differs from
{(F,1)}] all constraints in Ct and all constraints in the fuzzy sets Ct of the object
schemes [id,t,M,Ct] that were defined for the supertypes t of t. In this way,
inheritance has an impact on the specific constraints that have to be satisfied.
The set of all the instances of an object scheme os is denoted as Vosinstance,
whereas the set of all the persistent instances of os is written as Vosextent.
Obviously, Vosinstance ⊆ Vtinstance and Vosextent ⊆ Vtextent.
Example 4: With the object types TPerson and TEmployee presented in
Example 1 and the constraints c1, c2, ..., c9 presented in Example 3, the following
object schemes can be constructed:
OSPerson = [OSPerson, TPerson, "scheme to represent persons", {(c2,1)}]
and
OSEmployee = [OSEmployee, TEmployee, "scheme to represent employees", {(c1,1),(c3,0.7)}]
Database Model
The database model is finally obtained by extending the formalism with data
definition (DDL) and data manipulation operators (DML).
The operators create_DB and drop_DB are meant to create and remove
a database and its database scheme.
The operators add_Char and drop_Char are meant to add and drop a
characteristic, i.e., a property or a method, in the object type of a given
object scheme in a given database scheme.
The operators add_OSC and drop_OSC are used to add and remove a
weighted constraint to or from a given object scheme in a given database
scheme.
The operators add_DBC and drop_DBC are meant to add and remove a
weighted constraint to or from a given database scheme.
The object type t' inherits all common characteristics of the types t1 and
t2, i.e., t' inherits from the supertype or from the common ancestor
type, and has no specific characteristics of its own.
The set of all instances Vos' instance of os' is constructed by preserving the objects
for which the state v is in the union (resp. intersection and difference) of the sets
of states of the instances of os1 and os2 and by calculating the associated EPTVs
by applying the corresponding logical operators for EPTVs (as presented in de
Tré, 2002).
The set of all the persistent instances of os' is defined to be empty, i.e.,
Vos' extent = ∅.
(Cartesian) product (×): With the object schemes os1 = [id1,t1,M1,Ct1]
and os2 = [id2,t2,M2,Ct2], the binary (Cartesian) product operation ×(os1,os2)
returns a new object scheme:
×(os1,os2) = os' = [id',t',M',Ct']
where
The object type t' is constructed by merging the (inherited) characteristics of the types t1 and t2 of the given object schemes.
The fuzzy set of specific constraints C t' consists of all the single-type
dependent constraints (with associated membership grades) that were
defined for the characteristics of type t' and necessarily have to be an
element of Ct1, Ct2, or Ct, with t being an ancestor type of t1 or t2.
The set of all instances Vos' instance is constructed by calculating the Cartesian
product Vos1instance × Vos2instance and merging the states of the objects of the
resulting pairs. The associated EPTVs are calculated by applying the logical
conjunction operator for EPTVs.
Vos' extent = ∅
Projection (π): This operator is intended to select a number of characteristics from the (inherited) characteristics of the type of an object scheme
and the (inherited) characteristics of the object types that are binary related
to this type (via the partial association relation). If {id1,id2,...,idn} ⊆ ID
is the set of the identifiers of the selected characteristics of the type t of a
given object scheme os = [id,t,M,Ct], then the operation
π(os,{id1,id2,...,idn}) results in a new object scheme:
π(os,{id1,id2,...,idn}) = os' = [id',t',M',Ct']
where
Because values for derived properties are not stored in the database, the set
of all instances Vos' instance equals Vosinstance.
Vos' extent = ∅
The set of all instances Vos' instance consists of all instances of Vosinstance for which
the extra condition that is imposed by constraint c is satisfied [with an EPTV that
differs from {(F,1)}].
Vos' extent = ∅
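The retention rule used here, keeping every instance whose EPTV differs from {(F,1)}, can be sketched in Python. The sketch assumes that an EPTV is stored as a dictionary from the truth values T (true), F (false), and U (undefined) to possibility degrees, with zero-degree entries left out. The helper names and the sample condition are hypothetical and are not part of the model's definition.

```python
def eptv(true=0.0, false=0.0, undefined=0.0):
    """Build an EPTV as a dict, omitting truth values with possibility 0."""
    degrees = {"T": true, "F": false, "U": undefined}
    return {value: degree for value, degree in degrees.items() if degree > 0}

CERTAINLY_FALSE = eptv(false=1.0)  # the EPTV {(F,1)}

def is_retained(truth):
    """An instance is kept unless its condition is false for certain."""
    return truth != CERTAINLY_FALSE

def select(states, condition):
    """Return (state, EPTV) pairs whose EPTV differs from {(F,1)}."""
    result = []
    for state in states:
        truth = condition(state)
        if is_retained(truth):
            result.append((state, truth))
    return result

# Hypothetical condition: only the state named "Joe" is possibly satisfied.
def sample_condition(state):
    if state["Name"] == "Joe":
        return eptv(true=0.4, false=0.6)
    return eptv(false=1.0)

print(select([{"Name": "Joe"}, {"Name": "Ann"}], sample_condition))
```

How the EPTV of a composite condition is computed from its parts is defined by the logical operators for EPTVs and is deliberately left out of this sketch.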
The set of all instances Vos' instance consists of all instances o of Vosinstance for which
the threshold restriction:
Illustrative Example
As an illustration of the flexible querying facilities of the presented database
model, consider the database scheme DSEmpl as presented in Example 5.
Example 6: With the employee database with database scheme DSEmpl,
consider the query:
Find the names and employee IDs of all young employees that are fluent in
Dutch and French (the criterion young is less important than the criterion
fluent in Dutch and French).
if 0 ≤ x ≤ 30
if 30 < x < 50
young(x) = 0
if x ≥ 50
if 0 ≤ x ≤ 1
the set of all instances VOSResult instance consists of all instances that satisfy the
query conditions [with an EPTV that differs from {(F,1)}] and equals:
VOSResult instance = {[TResult,(EmployeeID ise ID25; Name ise Joe), {(T,0.4), (F,0.6)}]}
The EPTV {(T,0.4), (F,0.6)} was calculated as follows:
References
Alagić, S. (1997). The ODMG object model: Does it make sense? ACM
SIGPLAN Notices, 32(10), 253–270.
Beaubouef, T., & Petry, F. E. (2002). Uncertainty in OODB modeled by rough
sets. In Proceedings of the IPMU 2002 conference (Vol. III, pp. 1697–1703).
Annecy, France.
Berzal, F., Marín, N., Pons, O., & Vila, M. A. (2003). FoodBi: Managing fuzzy
object-oriented data on top of the Java platform. In Proceedings of the
10th IFSA World Congress (pp. 384–387). Istanbul, Turkey.
Blanco, I., Marín, N., Pons, O., & Vila, M. A. (2001). Softening the
object-oriented database model: Imprecision, uncertainty and fuzzy types. In
Proceedings of the IFSA/NAFIPS World Congress (pp. 2323–2328).
Vancouver, Canada.
Bordogna, G., & Pasi, G. (Eds.). (2000). Recent issues on fuzzy databases.
Heidelberg, Germany: Physica-Verlag.
Bordogna, G., Lucarella, D., & Pasi, G. (1994). A fuzzy object oriented data
model. In Proceedings of the Third IEEE International Conference on
Fuzzy Systems, FUZZ-IEEE'94 (pp. 313–318). Orlando, FL.
Bordogna, G., Pasi, G., & Lucarella, D. (1999). A fuzzy object-oriented data
model for managing vague and uncertain information. International
Journal of Intelligent Systems, 14(7), 623–651.
Bordogna, G., Leporati, A., Lucarella, D., & Pasi, G. (2000). The fuzzy
object-oriented database management system. In G. Bordogna, & G. Pasi (Eds.),
Recent issues on fuzzy databases (pp. 209–236). Heidelberg, Germany:
Physica-Verlag.
Cattell, R. G. G., & Barry, D. (Eds.). (2000). The object data standard: ODMG
3.0. San Francisco, CA: Morgan Kaufmann Publishers.
de Cooman, G. (1995). Towards a possibilistic logic. In D. Ruan (Ed.), Fuzzy set
theory and advanced mathematical applications (pp. 89–133). Boston,
MA: Kluwer Academic Publishers.
de Cooman, G. (1999). From possibilistic information to Kleene's strong
multivalued logics. In D. Dubois, E. P. Klement, & H. Prade (Eds.), Fuzzy sets,
logics and reasoning about knowledge (pp. 315–323). Boston, MA:
Kluwer Academic Publishers.
de Tré, G. (2002). Extended possibilistic truth values. International Journal of
Intelligent Systems, 17, 427–446.
de Tré, G., & de Baets, B. (2003). Aggregating constraint satisfaction degrees
expressed by possibilistic truth values. IEEE Transactions on Fuzzy
Systems, 11(3), 361–368.
de Tré, G., & de Caluwe, R. (2000). The application of generalized constraints
to object-oriented database models. Mathware and Soft Computing,
VII(2–3), 245–255.
de Tré, G., & de Caluwe, R. (2003). Modelling uncertainty in multimedia
database systems: An extended possibilistic approach. International
Journal of Uncertainty, Fuzziness and Knowledge-Based Systems,
11(1), 5–22.
de Tré, G., & de Caluwe, R. (2003a). Level-2 fuzzy sets and their usefulness in
object-oriented database modelling. Fuzzy Sets and Systems, 140, 29–49.
de Tré, G., de Caluwe, R., & Van der Cruyssen, B. (2000). A generalised
object-oriented database model. In G. Bordogna, & G. Pasi (Eds.), Recent issues
on fuzzy databases (pp. 155–182). Heidelberg, Germany: Physica-Verlag.
de Tré, G., de Caluwe, R., Hallez, A., & Verstraete, J. (2002). Fuzzy and
uncertain spatio-temporal database models: A constraint-based approach.
In Proceedings of the Ninth International Conference on Information
Processing and Management of Uncertainty in Knowledge-Based
Systems IPMU 2002 (pp. 1713–1720). Annecy, France.
Dubois, D., & Prade, H. (1988). Possibility theory. New York: Plenum Press.
Dubois, D., & Prade, H. (1997). The three semantics of fuzzy sets. Fuzzy Sets
and Systems, 90(2), 141–150.
George, R. (1992). Uncertainty management issues in the object-oriented
database model. Ph.D. thesis, Tulane University, New Orleans, LA.
George, R., Yazici, A., Petry, F. E., & Buckles, B. P. (1997). Modeling
impreciseness and uncertainty in the object-oriented data model: A
similarity-based approach. In R. de Caluwe (Ed.), Fuzzy and uncertain
object-oriented databases: Concepts and models (pp. 63–95). Singapore:
World Scientific.
Gottwald, S. (1979). Set theory for fuzzy sets of higher level. Fuzzy Sets and
Systems, 2(2), 125–151.
Kim, W. (1994). Observations on the ODMG-93 proposal for an object-oriented
database language. ACM SIGMOD Record, 23(1), 4–9.
Kuper, G., Libkin, L., & Paredaens, J. (Eds.). (2000). Constraint databases.
Berlin, Germany: Springer-Verlag.
Lausen, G., & Vossen, G. (1998). Models and languages of object-oriented
databases. Harlow, UK: Addison-Wesley.
Marín, N., Pons, O., & Vila, M. A. (2000). Fuzzy types: A new concept of type
for managing vague structures. International Journal of Intelligent
Systems, 15(11), 1061–1085.
Mouaddib, N., & Subtil, P. (1997). Management of uncertainty and vagueness
in databases: The FIRMS point of view. International Journal of
Uncertainty, Fuzziness and Knowledge-Based Systems, 5(4), 437–457.
Na, S., & Park, S. (1997). Fuzzy object-oriented data model and fuzzy
association algebra. In R. de Caluwe (Ed.), Fuzzy and uncertain object-oriented
databases: Concepts and models (pp. 187–206). Singapore: World
Scientific.
Prade, H. (1982). Possibility sets, fuzzy sets and their relation to Lukasiewicz
logic. In Proceedings of the 12th International Symposium on
Multiple-Valued Logic (pp. 223–227).
Rescher, N. (1969). Many-valued logic. New York: McGraw-Hill.
Rocacher, D., & Connan, F. (1996). A fuzzy algebra for object oriented
databases. In Proceedings of the Fourth European Congress on
Intelligent Techniques and Soft Computing, EUFIT'96 (Vol. 2, pp. 871–876).
Aachen, Germany.
Rossazza, J.-P. (1990). Utilisation de hiérarchies de classes floues pour la
représentation de connaissances imprécises et sujettes à exception: le
système SORCIER. Ph.D. thesis, Université Paul Sabatier, Toulouse,
France.
Rossazza, J.-P., Dubois, D., & Prade, H. (1997). A hierarchical model of fuzzy
classes. In R. de Caluwe (Ed.), Fuzzy and uncertain object-oriented
databases: Concepts and models (pp. 21–61). Singapore: World Scientific.
Shaw, G. M., & Zdonik, S. B. (1990). A query algebra for object-oriented
databases. In Proceedings of the Sixth International Conference on
Data Engineering, ICDE'90 (pp. 154–162). Los Angeles, CA.
Taivalsaari, A. (1996). On the notion of inheritance. ACM Computing Surveys,
28(3), 438–479.
Tanaka, K., Kobayashi, S., & Sakanoue, T. (1991). Uncertainty management in
object-oriented database systems. In D. Karagiannis (Ed.), Proceedings
of the International Conference on Database and Expert System
Applications, DEXA 1991 (pp. 251–256). Berlin, Germany: Springer-Verlag.
Van Gyseghem, N. (1998). Imprecision and uncertainty in the UFO database
model. Journal of the American Society for Information Science, 49(3),
236–252.
Zadeh, L. A. (1968). Probability measures of fuzzy events. Journal of
Mathematical Analysis and Applications, 23, 421–427.
Zadeh, L. A. (1975). The concept of linguistic variable and its application to
approximate reasoning (Parts I, II, and III). Information Sciences, 8,
199–251, 301–357; 9, 43–80.
Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets
and Systems, 1(1), 3–28.
Zadeh, L. A. (1986). Outline of a computational approach to meaning and
knowledge representation based on a concept of a generalized assignment
statement. In M. Thoma, & A. Wyner (Eds.), Proceedings of the
International Seminar on Artificial Intelligence and Man–Machine
Systems (pp. 198–211). Heidelberg, Germany: Springer.
Zadeh, L. A. (1996). Fuzzy logic = Computing with words. IEEE Transactions
on Fuzzy Systems, 4(2), 103–111.
Endnotes
1
Chapter II
Fuzzy and Probabilistic Object Bases
Abstract
Database systems have evolved from relational databases to those integrating
different modeling and computing paradigms, in particular, object
orientation and probabilistic reasoning. This chapter introduces an
extension of the probabilistic object base model by Eiter et al. (2001), using
fuzzy sets for representing and handling vague and imprecise values of
object attributes. A probabilistic interpretation of relations on fuzzy set
values is proposed to integrate them into that probability-based framework.
Then, the definitions of fuzzy-probabilistic object base schemas, instances,
and selection operation are presented. Other algebraic operations, namely,
projection, renaming, Cartesian product, join, intersection, union, and
difference of the probabilistic object base model are also adapted for its
fuzzy extension.
Introduction
For modeling real-world problems and constructing intelligent systems, the
integration of different methodologies and techniques has been the quest and
focus of significant interdisciplinary research effort. The advantages of such a
hybrid system are that the strengths of its partners are combined and are
complementary to each other's weaknesses.
In particular, object orientation provides a hierarchical data abstraction scheme
and an information hiding and inheritance mechanism. Meanwhile, probability
theory and fuzzy logic provide measures and rules for representing and reasoning
with uncertainty and imprecision in the real world. Many uncertain and fuzzy
object-oriented models (e.g., George, Buckles, & Petry, 1993; Itzkovich &
Hawkes, 1994; Rossazza, Dubois, & Prade, 1997; Van Gyseghem & De Caluwe,
1997; Bordogna, Pasi, & Lucarella, 1999; Dubitzky et al., 1999; Yazici & George,
1999; Blanco et al., 2001; Cross, 2003) were proposed and developed. However,
only a few of them combine probability theory and fuzzy logic, in order to deal
with both uncertainty and imprecision.
Early works on fuzzy extension of object-oriented models were done by George,
Buckles, and Petry (1993) and Itzkovich and Hawkes (1994), which introduced
inclusion degrees between classes in a hierarchy. An inclusion degree of one
class to another could be computed on the basis of the fuzzy ranges of their
common attributes. For example, Rossazza, Dubois, and Prade (1997) defined
four inclusion degrees, depending on whether necessary ranges or typical ranges
were used for each of the two classes.
Arguing for flexible modeling, Van Gyseghem and De Caluwe (1997) introduced
the notion of fuzzy property as an intermediate between the two extreme notions
of required property and optional property. Each fuzzy property of a class was
associated with possibility degrees of applicability of the property to the class.
Meanwhile, Yazici and George (1999) presented a deductive fuzzy object-oriented model but did not address uncertain applicability of properties. A general
data model including fuzzy attribute values as well as uncertain properties was
proposed by Bordogna, Pasi, and Lucarella (1999), where the treatment of
uncertainty was, however, based on possibility theory rather than on probability
theory.
As a first attempt to integrate both probabilistic and fuzzy measures into an
object-oriented model, Dubitzky et al. (1999) assumed that each property of a
concept had a probability degree of occurring in exemplars of that concept.
However, the method therein for computing a membership degree of an object
to a concept, based on matching the object's properties with the uncertainly
applicable properties of the concept, is in our view not justifiable. Also, the work
did not address the problem of how inheritance is performed under the membership and applicability uncertainty.
Recently, Blanco et al. (2001) and De Tré (2001) sketched out general models
to manage different sources of imprecision and uncertainty, including probabilistic ones, on various levels of an object-oriented database model. However, no
foundation was laid to integrate probability theory and fuzzy logic, in case
probability was used to represent uncertainty. Later, Cross (2003) reviewed
existing proposals and presented recommendations for the application of fuzzy
set theory in a flexible generalized object model.
Meanwhile, Cao (2001), Cao et al. (2002), and Cao and Rossiter (2003)
introduced a logic-based fuzzy and probabilistic object-oriented model, which
could represent and handle fuzzy attribute values as well as uncertain class
properties. Mass assignment theory (Baldwin, Martin, & Pilsworth, 1995;
Baldwin, Lawry, & Martin, 1996) was employed to compute with fuzzy sets and
probabilities in an integrated framework. Nevertheless, the definition of class
hierarchies in that model was crisp, that is, no uncertainty was considered on
class links.
In another direction, Eiter et al. (2001) developed an algebra to handle object bases
with uncertainty, called POBs, where the conditional probability for an object of
a class belonging to one of its subclasses was specified in the class hierarchy of
discourse. Also, for each attribute of an object, uncertainty about its value was
represented by lower-bound and upper-bound probability distribution functions
over a set of values.
However, the major shortcoming of the POB model is that it does not allow vague
and imprecise attribute values. For instance, in the Plant example therein, the
values of the attribute sun are chosen to be only enumerated symbols, such as
mild, medium, and heavy, without any interpretation. Meanwhile, in practice,
those values are inherently vague and imprecise over degrees of sunlight.
Moreover, without an interpretation, they cannot be measured, and their probability distributions cannot be calculated.
Because fuzzy set theory and fuzzy logic provide a basis for defining the
semantics of, and computing with, linguistic terms (Zadeh, 1978), we apply them
to extend the POB model to allow vague and imprecise attribute values. For
instance, the values mild, medium, and heavy of the attribute sun in the
aforementioned Plant example can be defined by fuzzy sets. Primary results of
this extension were presented by Cao and Nguyen (2002).
In this chapter, the second section presents fundamentals of probability and fuzzy
set theories and, in particular, introduces a probabilistic interpretation of relations
on fuzzy sets to integrate them into the probability-based framework of POBs.
Then, the third, fourth, fifth, and sixth sections present a fuzzy extension and
generalization of the definitions of POB schemas, instances, and algebraic
operations for fuzzy POBs (FPOBs). Finally, the last section concludes the
chapter and suggests further work.
[Table: votes of the ten voters P1 to P10 on which of the scores 1 to 6 count as a high score.]
That is, all voters, P1 to P10, vote for value 6 as a high score, while only two of
them, P1 and P2, vote for 3 as a high score, and so on. In other words, the crisp
definition of P10 for the high score is {6}, while that of P1 and P2 is {3, 4, 5, 6},
for instance. An assumption made in this voting model is that any person who
accepts a value as a high score also accepts all values that have higher
membership grades in the fuzzy set high.
This model defines the following mass assignment (i.e., probability distribution)
on the power set of {1, 2, 3, 4, 5, 6}:
{6}:0.1 {5, 6}:0.4 {4, 5, 6}:0.3 {3, 4, 5, 6}:0.2
where the mass (i.e., probability value) assigned to a subset of {1, 2, 3, 4, 5, 6}
[e.g., m high({5, 6}) = 0.4] is the proportion of voters who have that subset as a
crisp definition for the fuzzy concept high score. This mass assignment
corresponds to a family of probability distributions on {1, 2, 3, 4, 5, 6}.
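The step from a discrete fuzzy set to its mass assignment can be sketched in Python. The membership grades used below (0.2, 0.5, 0.9, and 1.0 for the scores 3, 4, 5, and 6) are an assumption: they are the grades implied by the vote proportions and by the masses listed above. The function name is illustrative, and the fuzzy set is assumed to be normalized.

```python
from fractions import Fraction

def mass_assignment(fuzzy_set):
    """Map each nested focal set to its mass.

    The focal set at a grade g holds all elements with grade >= g; its mass is
    the gap between g and the next lower grade. Assumes the largest grade is 1.
    """
    grades = sorted({g for g in fuzzy_set.values() if g > 0}, reverse=True)
    masses = {}
    for i, grade in enumerate(grades):
        lower = grades[i + 1] if i + 1 < len(grades) else 0
        focal = frozenset(x for x, g in fuzzy_set.items() if g >= grade)
        masses[focal] = grade - lower
    return masses

# Assumed membership grades of the fuzzy concept "high score".
high = {
    1: Fraction(0), 2: Fraction(0), 3: Fraction(2, 10),
    4: Fraction(5, 10), 5: Fraction(9, 10), 6: Fraction(1),
}
m_high = mass_assignment(high)
for focal, mass in m_high.items():
    print(sorted(focal), float(mass))
```

Exact fractions are used instead of floats so that the masses add up to exactly 1 and can be compared with the values in the text without rounding concerns.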
0.0 × 0.7 × 0.1 + 1/2 × 0.7 × 0.4 + 1/3 × 0.7 × 0.3 + 1/4 × 0.7 × 0.2 +
1/3 × 0.3 × 0.1 + 1/3 × 0.3 × 0.4 + 1/3 × 0.3 × 0.3 + 1/4 × 0.3 × 0.2
= 0.34
Definition 2. Let A and B be two fuzzy sets on a domain U. The probabilistic
interpretation of the relation A → B, denoted by prob(A → B), is a value in
[0, 1] that is defined by Σ_{S,T ⊆ U} Pr(u ∈ T | u ∈ S)·mA(S)·mB(T).
The intuitive meaning of prob(A → B) is that it is the probability for x ∈ B being
true given x ∈ A being true. In other words, it is the fuzzy conditional probability
of x ∈ B given x ∈ A as defined by Baldwin, Martin, and Pilsworth (1995). We
note that the above probabilistic interpretation can also be adapted for fuzzy sets
on continuous domains, using integration instead of addition, as in the definition
of fuzzy conditional probability (Baldwin, Lawry, & Martin, 1996) as follows:

\[
\mathrm{prob}(A \to B)
  = \int_0^1\!\!\int_0^1 \frac{\Pr({}^{x}\!A \cap {}^{y}\!B)}{\Pr({}^{x}\!A)}\,dx\,dy
  = \int_0^1\!\!\int_0^1 \frac{|{}^{x}\!A \cap {}^{y}\!B|}{|{}^{x}\!A|}\,dx\,dy
\]

where ^xA and ^yB are α-cuts of the fuzzy sets A and B with α = x and α = y,
respectively. We also define prob(A ← B) = prob(B → A).
0.0 × 0.1 × 0.7 + 1/2 × 0.4 × 0.7 + 1/3 × 0.3 × 0.7 + 1/4 × 0.2 × 0.7 +
1.0 × 0.1 × 0.3 + 1.0 × 0.4 × 0.3 + 1.0 × 0.3 × 0.3 + 3/4 × 0.2 × 0.3
= 0.53
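The sum above can be reproduced with a short Python sketch of the discrete formula in Definition 2. Two assumptions are made and should be read as such: the conditional probability Pr(u ∈ T | u ∈ S) is taken as |S ∩ T| / |S|, that is, a uniform distribution over the domain, and the second mass assignment ({5}: 0.7 and {4, 5, 6}: 0.3) is inferred from the coefficients of the sum rather than quoted from the text. The function name is hypothetical.

```python
from fractions import Fraction as F

def prob_relation(m_a, m_b):
    """Sum of Pr(u in T | u in S) * m_a(S) * m_b(T) over all focal sets S, T.

    Pr(u in T | u in S) is computed as |S & T| / |S| (uniform assumption).
    """
    total = F(0)
    for s, mass_s in m_a.items():
        for t, mass_t in m_b.items():
            total += F(len(s & t), len(s)) * mass_s * mass_t
    return total

# Mass assignment of the fuzzy concept "high score" from the voting model.
m_high = {
    frozenset({6}): F(1, 10),
    frozenset({5, 6}): F(4, 10),
    frozenset({4, 5, 6}): F(3, 10),
    frozenset({3, 4, 5, 6}): F(2, 10),
}
# Inferred mass assignment of the second fuzzy set (an assumption).
m_other = {
    frozenset({5}): F(7, 10),
    frozenset({4, 5, 6}): F(3, 10),
}

print(float(prob_relation(m_high, m_other)))  # 0.53
```

Swapping the two arguments gives the value for the reverse direction of the relation, which in general differs because the conditioning set changes.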
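[Table: probabilistic combination operators under the strategies ignorance, independence, implication between the events (e1 implies e2 or e2 implies e1), and mutual exclusion (e1 and e2 are mutually exclusive).]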
A set of functions for the user to specify how probabilities are distributed
over the domain of values of attributes.
A set of fuzzy sets for the user to express vague and imprecise values of
attributes.
[Figure: system architecture with the components GUI, FPOB-Algebra Query Manager, and FPOB-Algebra Execution Engine; further labels: FPOB calculus query, FPOB algebra query, probabilistic distributions, FPOB, fuzzy sets.]
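[Figure 2: An example POB class hierarchy of plants (Eiter et al., 2001), with the classes PLANTS, ANNUALS, PERENNIALS, VEGETABLES, HERBS, FLOWERS, ANNUALS_HERBS, and PERENNIALS_FLOWERS, clusters of disjoint subclasses joined by d nodes, and a probability on each link.]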
For FPOBs, we use the same definition of class hierarchy as that used for POBs.
Figure 2 shows an example POB hierarchy of plants given by Eiter et al. (2001),
which are classified as being either perennials or annuals and, alternatively, as
being vegetables, herbs, or flowers. Those subclasses of a class that are
connected to a d node are mutually disjoint (i.e., an object cannot belong to any
two of them at the same time), and they form a cluster of that class. In this
example, the class PLANTS has two clusters, namely, {ANNUALS, PERENNIALS} and
{VEGETABLES, HERBS, FLOWERS}.
The value in [0, 1] associated with the link between a class and one of its
immediate subclasses represents the probability that an object of the class
belongs to that subclass. For instance, the hierarchy says 60% of plants are
annuals, while the rest (40%) are perennials. Also, ANNUALS_HERBS is a common
subclass of ANNUALS and HERBS, where ANNUALS_HERBS constitutes 40% and 80%
of annuals and herbs, respectively.
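The hierarchy just described can be sketched as a small table of edge probabilities. The values 0.6, 0.4, and the two ANNUALS_HERBS values are stated in the text; the values for VEGETABLES, HERBS, and FLOWERS are assumptions read off Figure 2, and all identifiers (`EDGE_PROB`, `CLUSTERS`, and the helper functions) are hypothetical. The sketch checks that the disjoint subclasses in each cluster never absorb more than 100% of their superclass, and that the two paths from PLANTS to ANNUALS_HERBS yield the same probability.

```python
# Edge probabilities p(subclass, superclass) for the Plant hierarchy.
EDGE_PROB = {
    ("ANNUALS", "PLANTS"): 0.6,
    ("PERENNIALS", "PLANTS"): 0.4,
    ("VEGETABLES", "PLANTS"): 0.2,   # assumed, read off Figure 2
    ("HERBS", "PLANTS"): 0.3,        # assumed, read off Figure 2
    ("FLOWERS", "PLANTS"): 0.4,      # assumed, read off Figure 2
    ("ANNUALS_HERBS", "ANNUALS"): 0.4,
    ("ANNUALS_HERBS", "HERBS"): 0.8,
}

# Clusters of mutually disjoint immediate subclasses, per superclass.
CLUSTERS = {
    "PLANTS": [["ANNUALS", "PERENNIALS"], ["VEGETABLES", "HERBS", "FLOWERS"]],
}


def cluster_mass(superclass, cluster):
    """Total probability that an object of `superclass` falls in `cluster`."""
    return sum(EDGE_PROB[(sub, superclass)] for sub in cluster)


def clusters_are_consistent(tolerance=1e-9):
    """Disjoint subclasses cannot cover more than 100% of their superclass."""
    return all(
        cluster_mass(sup, cluster) <= 1.0 + tolerance
        for sup, clusters in CLUSTERS.items()
        for cluster in clusters
    )


def path_probability(path):
    """Multiply edge probabilities along a subclass-to-superclass path."""
    prob = 1.0
    for sub, sup in zip(path, path[1:]):
        prob *= EDGE_PROB[(sub, sup)]
    return prob


via_annuals = path_probability(["ANNUALS_HERBS", "ANNUALS", "PLANTS"])
via_herbs = path_probability(["ANNUALS_HERBS", "HERBS", "PLANTS"])
print(clusters_are_consistent(), round(via_annuals, 2), round(via_herbs, 2))
```

Both paths give 0.24, which is the kind of agreement a consistent schema needs when a class has more than one immediate superclass.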
2.
3.
Example 3: In the Plant example above, the attributes can be soil, sun, water,
which describe the conditions for a plant to grow, and name, size, width, and
height. Some atomic types can be integer, real, string, and soil-type. Some
fuzzy set and tuple types can be {real}, [soil: soil-type, sun: {real}, water:
integer], and [name: string, size: [height: integer, width: integer]].
Each type has a domain of its values as defined below (cf. Eiter et al., 2001).
Definition 4. Let every atomic type τ ∈ T be associated with a domain dom(τ).
Then values are defined by induction as follows:
1.
2.
3.
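$$\mathit{mild}(x) = \begin{cases} 1 & \text{if } x \in [0, 5] \\ 0.2(5 - x) + 1 & \text{if } x \in [5, 10] \\ 0 & \text{otherwise} \end{cases}$$

$$\mathit{medium}(x) = \begin{cases} 0.2(x - 10) + 1 & \text{if } x \in [5, 10] \\ 1 & \text{if } x \in [10, 15] \\ 0.2(15 - x) + 1 & \text{if } x \in [15, 20] \\ 0 & \text{otherwise} \end{cases}$$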
Then, [soil: swampy, sun: mild, water: 3] is a value of the type [soil: soil-type,
sun: {real}, water: integer].
In POBs, for each attribute of an object there can be uncertainty about its value
measured by lower-bound and upper-bound probability distribution functions
over a set of values. For FPOBs, we adapt the definition of probabilistic tuple
values for POBs to represent that uncertain information for fuzzy set values as
well.
Definition 5. Let A1, A2, …, Ak be pairwise different attributes from A and, for
each i from 1 to k, Vi be a finite set of values of type τi, and αi, βi be probability
distribution functions over Vi. Then ptv = [A1: ⟨V1, α1, β1⟩, A2: ⟨V2, α2, β2⟩, …, Ak:
⟨Vk, αk, βk⟩] is a fuzzy-probabilistic tuple value of type [A1: τ1, A2: τ2, …, Ak: τk]
over {A1, A2, …, Ak}. One writes ptv.Ai to denote ⟨Vi, αi, βi⟩.
Example 5: Assume we know that the soil type of a thyme plant is loamy.
However, we are not sure whether the plant is French thyme, Silver thyme, or
Wooly thyme, with the same probability between 0.2 and 0.6 for each category.
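A fuzzy-probabilistic tuple value of this kind can be sketched as a mapping from attribute names to triples of a value set, a lower-bound function, and an upper-bound function. The class `ProbTriple`, the helper `same_bounds`, and the consistency check are hypothetical names introduced for illustration. The check itself (the lower bounds sum to at most 1 and the upper bounds to at least 1) is a standard necessary and sufficient condition for some probability distribution to fit between the bounds; it is added here and is not quoted from the chapter.

```python
from dataclasses import dataclass


@dataclass
class ProbTriple:
    """A set of values with lower and upper probability bounds per value."""
    values: list
    lower: dict
    upper: dict

    def is_consistent(self, tolerance=1e-9):
        # Each value needs 0 <= lower <= upper, and the totals must bracket 1
        # so that at least one genuine distribution lies within the bounds.
        pointwise = all(0.0 <= self.lower[v] <= self.upper[v] for v in self.values)
        total_lower = sum(self.lower[v] for v in self.values)
        total_upper = sum(self.upper[v] for v in self.values)
        return (pointwise
                and total_lower <= 1.0 + tolerance
                and total_upper >= 1.0 - tolerance)


def same_bounds(values, low, high):
    """Build a triple that gives every value the same [low, high] interval."""
    values = list(values)
    return ProbTriple(values, {v: low for v in values}, {v: high for v in values})


# Example 5: the soil is certainly loamy; each thyme category has a
# probability between 0.2 and 0.6.
thyme = {
    "soil": same_bounds(["loamy"], 1.0, 1.0),
    "category": same_bounds(["french", "silver", "wooly"], 0.2, 0.6),
}
print(all(triple.is_consistent() for triple in thyme.values()))
```

Keeping the bounds per value, rather than one interval per attribute, lets later operations combine triples value by value.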
[Figure: membership functions of the fuzzy sets mild and medium over sunlight degrees (axis ticks at 10, 15, and 20).]
FPOB Schemas
FPOB schemas are defined in the same way as POB schemas, as follows:
Definition 6. An FPOB schema is a quintuple (C, τ, ⇒, me, p), where:
1.
2. τ maps each class c to a tuple type τ(c) representing the attributes and their
types of that class.
3.
4.
5.
[Table: τ(c) for the classes PLANTS, ANNUALS, PERENNIALS, VEGETABLES, HERBS, FLOWERS, ANNUALS_HERBS, and PERENNIALS_FLOWERS; the type entries are missing from the source.]
2.
3.
4.
c                     ε(c)                        |ε(c)|
PLANTS                O1 ∪ O2 ∪ … ∪ O10           800
ANNUALS               O1 ∪ O2 ∪ O3 ∪ O4 ∪ O5      480
PERENNIALS            O6 ∪ O7 ∪ O8 ∪ O9 ∪ O10     320
VEGETABLES            O1 ∪ O9                     160
HERBS                 O2 ∪ O5 ∪ O6                240
FLOWERS               O3 ∪ O7 ∪ O10               320
ANNUALS_HERBS         O5                          192
PERENNIALS_FLOWERS    O10                         96
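The class sizes in the table above can be checked against the edge probabilities of the Plant hierarchy. The sketch below divides the size of each subclass by the size of its immediate superclass. The dictionary `COUNTS`, the edge list `EDGES`, and the function name are hypothetical, and the edges are taken from the hierarchy as described in the text and Figure 2.

```python
from fractions import Fraction

# Number of objects per class, as listed in the table above.
COUNTS = {
    "PLANTS": 800, "ANNUALS": 480, "PERENNIALS": 320,
    "VEGETABLES": 160, "HERBS": 240, "FLOWERS": 320,
    "ANNUALS_HERBS": 192, "PERENNIALS_FLOWERS": 96,
}

# (subclass, immediate superclass) pairs of the Plant hierarchy.
EDGES = [
    ("ANNUALS", "PLANTS"), ("PERENNIALS", "PLANTS"),
    ("VEGETABLES", "PLANTS"), ("HERBS", "PLANTS"), ("FLOWERS", "PLANTS"),
    ("ANNUALS_HERBS", "ANNUALS"), ("ANNUALS_HERBS", "HERBS"),
    ("PERENNIALS_FLOWERS", "PERENNIALS"), ("PERENNIALS_FLOWERS", "FLOWERS"),
]


def edge_probability(sub, sup):
    """Fraction of the superclass's objects that belong to the subclass."""
    return Fraction(COUNTS[sub], COUNTS[sup])


for sub, sup in EDGES:
    print(f"{sub} in {sup}: {float(edge_probability(sub, sup)):.1f}")
```

The ratios reproduce the 60%/40% split of plants and the 40% and 80% shares of ANNUALS_HERBS mentioned in the description of the hierarchy.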
FPOB Instances
As for POBs, given an FPOB schema, an FPOB instance is defined as a base
of objects associated with their classes and fuzzy-probabilistic tuple values in
accordance with the schema. The following definition is adapted from that of
POBs.
Definition 8. Let S = (C, τ, ⇒, me, p) be an FPOB schema and O be a set of
object identifiers. An FPOB instance over S is a pair (π, ν) where:
1.
2.
We note that, in the definition above, π(c) denotes only the set of the identifiers
of the objects that are defined in the class c. Meanwhile, the set of the identifiers
of all the objects that belong to c (i.e., those that are defined in c or its proper
subclasses) is denoted by π*(c) = ∪{π(d) | d ∈ C and d ⇒* c}. Also, one writes
π(C) to denote ∪{π(c) | c ∈ C}.
Example 8: An FPOB instance over the FPOB schema in Example 6 can be (π, ν),
where π and π* are shown in Table 5 and ν in Table 6 (cf. Eiter et al., 2001).
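The step from the objects defined directly in a class to all objects belonging to it can be sketched as a walk over the subclass relation. The dictionaries `PARENTS` and `PI` and both function names are hypothetical, and the object assignment is illustrative data in the spirit of the Plant example, not a transcription of Table 5.

```python
# Immediate superclasses of each class in the Plant hierarchy.
PARENTS = {
    "PLANTS": [],
    "ANNUALS": ["PLANTS"], "PERENNIALS": ["PLANTS"],
    "VEGETABLES": ["PLANTS"], "HERBS": ["PLANTS"], "FLOWERS": ["PLANTS"],
    "ANNUALS_HERBS": ["ANNUALS", "HERBS"],
    "PERENNIALS_FLOWERS": ["PERENNIALS", "FLOWERS"],
}

# Objects defined directly in each class (illustrative assignment).
PI = {c: set() for c in PARENTS}
PI["PLANTS"] = {"o1"}
PI["ANNUALS_HERBS"] = {"o2", "o3", "o5", "o6", "o7"}
PI["PERENNIALS_FLOWERS"] = {"o4"}


def is_subclass_or_same(d, c):
    """True if d equals c or d reaches c through immediate-superclass links."""
    return d == c or any(is_subclass_or_same(p, c) for p in PARENTS[d])


def pi_star(c):
    """All objects that belong to c: those defined in c or in any subclass."""
    members = set()
    for d in PARENTS:
        if is_subclass_or_same(d, c):
            members |= PI[d]
    return members


print(sorted(pi_star("PERENNIALS")), sorted(pi_star("VEGETABLES")))
```

With this assignment every object belongs to PLANTS, while a class such as VEGETABLES, which has no objects of its own and no populated subclass, stays empty.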
2.
3.
Table 5. π and π* (cells missing from the source are left blank)
c                     π(c)     π*(c)
PLANTS                {o1}
ANNUALS               {}
PERENNIALS            {}       {o4}
VEGETABLES            {}       {}
HERBS                 {}
FLOWERS               {}       {o4}
ANNUALS_HERBS
PERENNIALS_FLOWERS    {o4}     {o4}
[Table 6: ν(oid) for o1, …, o7; the tuple values are missing from the source.]
an FPOB schema S is another FPOB instance I' over S such that the objects of
the classes in I' and their attribute values satisfy the selection condition of the
query.
Before defining the FPOB selection operation, we present the formal syntax and
semantics of selection conditions. We start with the syntax of path expressions
and selection expressions. The following definition of path expressions is given
by Eiter et al. (2001).
Definition 10. Given a type τ = [A1: τ1, A2: τ2, …, Ak: τk], path expressions are
inductively defined for every i from 1 to k as follows:
1.
2.
Example 10: Given the types in Example 3, name, size.height, and size.width are
path expressions for the type [name: string, size: [height: integer, width:
integer]].
For selection expressions on FPOBs, we generalize the binary relations in
selection expressions on POBs to the fuzzy ones, and add in the implication
relation on fuzzy set values, as in the following definition.
Definition 11. Let S = (C, τ, ⇒, me, p) be an FPOB schema and X be a set of
object variables. Then fuzzy selection expressions are inductively defined as
having one of the following forms:
1. x ∈ c, where x ∈ X and c ∈ C.
2. x.P θ v, where x ∈ X, P is a path expression, θ is a binary relation such as
=, <, >, or the implication relation → on fuzzy set values, and v is a value.
3.
4.
5.
Those of the first three forms are called atomic fuzzy selection expressions.
Different probabilistic conjunction and disjunction strategies are given by Eiter
et al. (2001).
Example 11: In the Plant example above, the selection of all objects that require
a very mild sun can be done using the atomic expression:
x.sun → very mild
where very mild is also a linguistic label of a fuzzy set on dom(real).
Meanwhile, the selection of all objects that require a very mild sun or over 21
units of daily water can be expressed by the query:
x.sun → very mild ∨ x.water > 21
Selection conditions are now defined as selection expressions to be satisfied with
a probability in a given interval, as for POBs.
Definition 12. Fuzzy selection conditions are inductively defined as follows:
1.
2.
Example 12: In the Plant example, the selection of all objects that require a very
mild sun with a probability of at least 0.4 and over 21 units of daily water with
a probability of at least 0.8 can be done using the following selection condition:
(x.sun → very mild)[0.4, 1] ∧ (x.water > 21)[0.8, 1]
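How such a condition might be checked against computed probability intervals can be sketched as follows. The sketch assumes the usual POB reading, in which an expression with interval [l, u] is satisfied when the computed probability interval lies entirely within [l, u]; that reading is an assumption here, and the function names are hypothetical. The two input intervals are the ones this chapter reports for object o2 with the label mild (not very mild) and for x.water > 21.

```python
def satisfies(prob_interval, required):
    """True if the computed interval lies inside the required interval."""
    low, high = prob_interval
    required_low, required_high = required
    return required_low <= low and high <= required_high


def satisfies_all(checks):
    """Conjunction of conditions: every (interval, requirement) pair must hold."""
    return all(satisfies(interval, required) for interval, required in checks)


sun_mild = (0.39, 0.59)     # reported for o2 and the label mild
water_gt_21 = (0.82, 0.82)  # reported for o2

print(satisfies(sun_mild, (0.4, 1.0)))      # lower bound 0.39 is below 0.4
print(satisfies(water_gt_21, (0.8, 1.0)))
print(satisfies_all([(sun_mild, (0.4, 1.0)), (water_gt_21, (0.8, 1.0))]))
```

Under this reading the water requirement holds for o2, but the sun requirement does not, because the lower bound 0.39 falls just short of 0.4.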
Definition 13. Given a type τ = [A1: τ1, A2: τ2, …, Ak: τk] and a value v = [A1: v1,
A2: v2, …, Ak: vk], the interpretation of a path expression P for τ under v, denoted
by v.P, is inductively defined as follows:
1.
2.
Example 13: In the Plant example, the interpretations of the path expressions
name, size.height, and size.width under the value [name: Thyme, size: [height: 4,
width: 12]] are the values Thyme, 4, and 12, respectively.
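The interpretation of a path expression under a tuple value can be sketched with nested dictionaries standing in for tuple values. The function name `interpret` and the dictionary encoding are assumptions made for illustration.

```python
def interpret(value, path):
    """Follow a dotted path expression through a nested tuple value."""
    current = value
    for attribute in path.split("."):
        current = current[attribute]  # descend one attribute per path step
    return current


# The tuple value from Example 13.
thyme = {"name": "Thyme", "size": {"height": 4, "width": 12}}

for path in ("name", "size.height", "size.width"):
    print(path, "->", interpret(thyme, path))
```

A path with a single attribute returns the attribute's value directly, and each further step descends into the nested tuple, mirroring the inductive definition.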
Definition 14. Let S = (C, τ, ⇒, me, p) be an FPOB schema, I = (π, ν) be an
FPOB instance over S, and o ∈ π(C). The probabilistic interpretation with
respect to S, I, and o, denoted by prob_{S,I,o}, is the partial mapping from the set of
all fuzzy selection expressions to the set of all closed subintervals of [0, 1] that
is inductively defined as follows:
1.
2.
3. prob_{S,I,o}(x.P1 = x.P2)
= [Σ_{u∈V} α(u)·prob(u1.P1' = u2.P2'), min(1, Σ_{u∈V} β(u)·prob(u1.P1' = u2.P2'))],
where P1 = A1.P1', ν(o).A1 = ⟨V1, α1, β1⟩, P2 = A2.P2', ν(o).A2 = ⟨V2, α2, β2⟩,
and [α(u), β(u)] = [α1(u1), β1(u1)] ⊗ [α2(u2), β2(u2)] for all u = (u1, u2) ∈
V = V1 × V2.
4. prob_{S,I,o}(φ ∧⊗ ψ) = prob_{S,I,o}(φ) ⊗ prob_{S,I,o}(ψ).
5.
Example 14: For the FPOB instance in Example 8 and the fuzzy sets defining
mild and medium as in Example 4, one has:
prob_{S,I,o2}(x ∈ ANNUALS_HERBS) = [1, 1]
prob_{S,I,o2}(x.water > 21) = [9/11, 9/11] = [0.82, 0.82]
Meanwhile:
prob_{S,I,o2}(x.sun → mild)
= [0.8 × 1/2 × 0.903 + 0.8 × 1/2 × 0.068, min(1, 1.2 × 1/2 × 0.903 + 1.2 × 1/2 × 0.068)]
because:

$$\int_0^1\!\!\int_0^1 \frac{|{}^{x}\mathit{mild} \cap {}^{y}\mathit{mild}|}{|{}^{x}\mathit{mild}|}\,dx\,dy = \int_0^1\!\!\int_0^1 \frac{|[0, 10 - 5x] \cap [0, 10 - 5y]|}{|[0, 10 - 5x]|}\,dx\,dy = 0.903$$

$$\int_0^1\!\!\int_0^1 \frac{|{}^{x}\mathit{medium} \cap {}^{y}\mathit{mild}|}{|{}^{x}\mathit{medium}|}\,dx\,dy = \int_0^1\!\!\int_0^1 \frac{|[5 + 5x, 20 - 5x] \cap [0, 10 - 5y]|}{|[5 + 5x, 20 - 5x]|}\,dx\,dy = 0.068$$
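The two double integrals can be approximated numerically as a sanity check. The sketch below uses a plain midpoint rule on a 400 × 400 grid. The α-cut intervals [0, 10 − 5t] for mild and [5 + 5t, 20 − 5t] for medium are taken from the integrands above, while the function names and the grid size are assumptions made for illustration.

```python
def overlap(a, b):
    """Length of the intersection of two closed intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def fuzzy_conditional_probability(cut_a, cut_b, n=400):
    """Midpoint-rule estimate of the double integral over [0, 1] x [0, 1] of
    |A_x intersect B_y| / |A_x|, where cut_a(x) and cut_b(y) return alpha-cuts."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        a = cut_a((i + 0.5) * h)
        width = a[1] - a[0]
        for j in range(n):
            total += overlap(a, cut_b((j + 0.5) * h)) / width
    return total * h * h


def mild_cut(t):
    """Alpha-cut of mild at level t."""
    return (0.0, 10.0 - 5.0 * t)


def medium_cut(t):
    """Alpha-cut of medium at level t."""
    return (5.0 + 5.0 * t, 20.0 - 5.0 * t)


p_mild_mild = fuzzy_conditional_probability(mild_cut, mild_cut)
p_medium_mild = fuzzy_conditional_probability(medium_cut, mild_cut)
print(round(p_mild_mild, 3), round(p_medium_mild, 3))
```

The estimates agree with the values 0.903 and 0.068 used above to within the accuracy of the grid.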
oid   prob_{S,I,o}(x ∈ ANNUALS_HERBS)   prob_{S,I,o}(x.sun → mild)   prob_{S,I,o}(x.water > 21)
o1    [0.24, 0.24]                      Undefined                    [1.00, 1.00]
o2    [1.00, 1.00]                      [0.39, 0.59]                 [0.82, 0.82]
o3    [1.00, 1.00]                      [0.90, 0.90]                 [0.00, 0.00]
o4    [0.00, 0.00]                      [0.90, 0.90]                 [0.67, 0.67]
o5    [1.00, 1.00]                      [0.39, 0.59]                 [0.67, 0.67]
o6    [1.00, 1.00]                      [0.90, 0.90]                 [0.00, 0.00]
o7    [1.00, 1.00]                      [0.90, 0.90]                 [0.00, 0.00]
The following definitions are adapted from Eiter et al. (2001) for fuzzy selection
conditions in FPOBs.
Definition 15. Let S = (C, τ, ⇒, me, p) be an FPOB schema, I = (π, ν) be an
FPOB instance over S, and o ∈ π(C). The satisfaction of fuzzy selection
conditions under prob_{S,I,o} is defined as follows:
1.
2.
3.
4.
Example 15: In the Plant example above, using the independence probabilistic
conjunction strategy, one has:
prob_{S,I,o2}(x ∈ ANNUALS_HERBS ∧⊗in x.sun → mild)
= [1, 1] ⊗in [0.39, 0.59] = [0.39, 0.59]
and
prob_{S,I,o2}(x.sun → mild ∧⊗in x.water > 21)
= [0.39, 0.59] ⊗in [0.82, 0.82] = [0.32, 0.48]
oid   prob_{S,I,o}(x ∈ ANNUALS_HERBS ∧⊗in x.sun → mild)   prob_{S,I,o}(x.sun → mild ∧⊗in x.water > 21)
o1    Undefined                                            Undefined
o2    [0.39, 0.59]                                         [0.32, 0.48]
o3    [0.90, 0.90]                                         [0.00, 0.00]
o4    [0.00, 0.00]                                         [0.61, 0.61]
o5    [0.39, 0.59]                                         [0.26, 0.40]
o6    [0.90, 0.90]                                         [0.00, 0.00]
o7    [0.90, 0.90]                                         [0.00, 0.00]
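The interval arithmetic behind a probabilistic conjunction can be sketched for two strategies. The formulas below are the ones usually associated with the names independence and ignorance in the probabilistic database literature; they are stated here as an assumption, since the chapter's own table of operators is not reproduced in this section. The function names are hypothetical.

```python
def conj_independence(a, b):
    """Conjunction of two probability intervals for independent events."""
    return (a[0] * b[0], a[1] * b[1])


def conj_ignorance(a, b):
    """Conjunction when nothing is known about how the events depend on
    each other: the widest interval compatible with both inputs."""
    return (max(0.0, a[0] + b[0] - 1.0), min(a[1], b[1]))


def rounded(interval, digits=2):
    """Round both bounds for display."""
    return tuple(round(bound, digits) for bound in interval)


in_annuals_herbs = (1.0, 1.0)   # o2: x in ANNUALS_HERBS
sun_mild = (0.39, 0.59)         # o2: x.sun -> mild
water_gt_21 = (0.82, 0.82)      # o2: x.water > 21

print(rounded(conj_independence(in_annuals_herbs, sun_mild)))
print(rounded(conj_independence(sun_mild, water_gt_21)))
print(rounded(conj_ignorance(sun_mild, water_gt_21)))
```

With the o2 intervals, the independence strategy reproduces the two intervals listed for o2 in the table above, whereas the ignorance strategy yields a wider interval for the same inputs.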
therefore:
prob S, I,o2 | (x
prob
S, I,o2
ANNUALS_ HERBS
2.
3. ν'(o) = Π_A(ν(o)), obtained from ν(o) = [B1: ⟨V1, α1, β1⟩, …, Bk: ⟨Vk, αk, βk⟩]
by deleting all Bj: ⟨Vj, αj, βj⟩ with Bj ∉ A, for all o ∈ π(C).
oid   ν'(oid)
o1
o2    [name: ⟨{Cuban-Basil, Lemon-Basil}, u, u⟩, water: ⟨{20, …, 30}, u, u⟩]
o3
o4
o5
o6
o7
where B = B1, B2, …, Bm is a list of distinct attributes from A, and C = C1, C2, …, Cm
is a list of distinct attributes from 𝒜 − A.
Definition 19. Let I = (π, ν) be an FPOB instance over an FPOB schema
S = (C, τ, ⇒, me, p) and N be a renaming expression. The renaming in I with
respect to N, denoted by δ_N(I), is I' = (π', ν') over the FPOB schema δ_N(S)
where:
1. δ_N(S) = (C, τ', ⇒, me, p) such that, for all c ∈ C, τ'(c) is obtained from τ(c)
= [A1: τ1, …, Ak: τk] by replacing each attribute Aj = Bi for some i ∈ {1, 2, ..., m}
by the new attribute Ci.
2.
3. ν'(o) = δ_N(ν(o)), obtained from ν(o) = [A1: ⟨V1, α1, β1⟩, …, Ak: ⟨Vk, αk, βk⟩]
by replacing each attribute Aj = Bi for some i ∈ {1, 2, ..., m} by the new
attribute Ci, for all o ∈ π(C).
[Table 10: ν'(oid) for o1, …, o7; the tuple values are missing from the source.]
Example 18: Let I be the FPOB instance computed in Example 17. Then the
renaming in I with respect to the renaming expression name, water ← name2,
water2 is the FPOB instance I' = (π', ν'), where π' = π, and ν' is given in
Table 10.
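Projection and renaming both act attribute by attribute on each object's tuple value, which can be sketched with dictionaries. The functions `project` and `rename` are hypothetical names, triples are encoded as plain Python tuples with the string "u" standing for the uniform distribution, and the `soil` attribute of the sample object is an assumed extra attribute added only so that the projection has something to drop.

```python
def project(tuple_value, keep):
    """Keep only the top-level attributes listed in `keep`."""
    return {attr: triple for attr, triple in tuple_value.items() if attr in keep}


def rename(tuple_value, mapping):
    """Replace attribute names according to `mapping` (old name -> new name)."""
    untouched = set(tuple_value) - set(mapping)
    clashes = untouched & set(mapping.values())
    if clashes:
        raise ValueError(f"new names already in use: {sorted(clashes)}")
    return {mapping.get(attr, attr): triple for attr, triple in tuple_value.items()}


# Tuple value of an object such as o2; "soil" is an assumed extra attribute.
o2 = {
    "name": ({"Cuban-Basil", "Lemon-Basil"}, "u", "u"),
    "soil": ({"loamy"}, "u", "u"),
    "water": (set(range(20, 31)), "u", "u"),
}

projected = project(o2, {"name", "water"})
renamed = rename(projected, {"name": "name2", "water": "water2"})
print(sorted(projected), sorted(renamed))
```

Renaming changes only attribute names and leaves the triples untouched, which is why the classes and objects of the instance stay the same.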
Cartesian Product
We recall that, in relational databases, the Cartesian product of two relations is
a new relation consisting of all tuples that are obtained by concatenating a tuple
in the first relation with a tuple in the second relation. Similarly, the Cartesian
product of two FPOBs should be a new one such that the property list of each
object is obtained by concatenating the property list of an object in the first FPOB
instance with the property list of an object in the second FPOB instance.
Meanwhile, in relational algebra, the Cartesian product of two relational schemas
is defined only if their sets of attributes are disjoint. Thus, in FPOB algebra, we
define the Cartesian product only for two FPOB schemas that do not have any
common top-level attribute.
Also, the Cartesian product operation on both schemas and relations is commutative.
For FPOB algebra, given two FPOB schemas S1 = (C1, τ1, ⇒1, me1, p1)
and S2 = (C2, τ2, ⇒2, me2, p2), that should mean S1 × S2 = S2 × S1, which implies
C2 × C1 = C1 × C2. The latter is achieved by using the following assumption.
Assumption 1. It is assumed that for each FPOB schema S = (C, τ, ⇒, me, p),
the set of classes C is a classical relation over a classical relation schema R(S)
= {A1, A2, …, Am} associated with S. That is, each class c ∈ C is considered as
a tuple over R(S).
Definition 20. The FPOB schemas S1 = (C1, τ1, ⇒1, me1, p1) and S2 = (C2, τ2, ⇒2,
me2, p2) are Cartesian product-compatible if and only if R(S1) and R(S2) are
disjoint.
Definition 21. Let S1 = (C1, τ1, ⇒1, me1, p1) and S2 = (C2, τ2, ⇒2, me2, p2) be two
Cartesian product-compatible FPOB schemas, and R1 = R(S1) and R2 = R(S2).
The Cartesian product of S1 and S2, denoted by S1 × S2, is the FPOB schema
S = (C, τ, ⇒, me, p) such that:
1. C = C1 × C2.
2. For all classes c ∈ C, τ(c[R1], c[R2]) = [A1: τ1, …, Ak: τk, Ak+1: τk+1, …, Ak+m:
τk+m], where τ1(c[R1]) = [A1: τ1, …, Ak: τk] and τ2(c[R2]) = [Ak+1: τk+1, …, Ak+m:
τk+m].
3.
4.
5.
Example 19: Let S1 and S2 be the FPOB schemas of the FPOB instances
computed in Examples 17 and 18, respectively. Then the Cartesian product S1 × S2
= (C, τ, ⇒, me, p) is given as follows:
A partial view on C, ⇒, me, and p is illustrated in Figure 4.
τ(c) = [name: string, water: integer, name2: string, water2: integer] for every c ∈ C.
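The class and type parts of a schema product can be sketched with `itertools.product`. The function `product_types` and the abbreviated class names are hypothetical, only two classes per schema are listed, and the sketch covers the class set and the types only (not the hierarchy, the clusters, or the probabilities).

```python
from itertools import product


def product_types(tau1, tau2):
    """Pair every class of one schema with every class of the other and
    concatenate their tuple types; shared top-level attributes are rejected."""
    result = {}
    for (c1, type1), (c2, type2) in product(tau1.items(), tau2.items()):
        common = set(type1) & set(type2)
        if common:
            raise ValueError(f"common top-level attributes: {sorted(common)}")
        result[(c1, c2)] = {**type1, **type2}
    return result


# Abbreviated class names: pl = PLANTS, ah = ANNUALS_HERBS.
tau1 = {
    "pl": {"name": "string", "water": "integer"},
    "ah": {"name": "string", "water": "integer"},
}
tau2 = {
    "pl": {"name2": "string", "water2": "integer"},
    "ah": {"name2": "string", "water2": "integer"},
}

tau = product_types(tau1, tau2)
print(len(tau), tau[("ah", "pl")])
```

Every product class ends up with the four attributes name, water, name2, and water2, as in Example 19; without the renaming step the two schemas would share attributes and the product would be rejected.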
[Figure 4: partial view of the class hierarchy of S1 × S2. Nodes shown include (pl, an), (pl, pe), (pl, ve), (pl, he), (pl, fl), (an, pl), (pe, pl), (ve, pl), (he, pl), (fl, pl), (an, he), (pe, he), (ve, he), (he, he), (fl, he), (ah, pl), and (pf, pl); edges carry probabilities and are grouped by disjointness (d) nodes.]
Definition 22. Let I1 = (π1, ν1) and I2 = (π2, ν2) be two FPOB instances over
the Cartesian product-compatible FPOB schemas S1 = (C1, τ1, ⇒1, me1, p1) and
S2 = (C2, τ2, ⇒2, me2, p2), respectively, and let R1 = R(S1) and R2 = R(S2). The
Cartesian product of I1 and I2, denoted by I1 × I2, is defined as the FPOB
instance (π, ν) over the FPOB schema S = S1 × S2, where:
1.
2.
Example 20: Let I1 and I2 be the FPOB instances computed in Examples 17 and
18, respectively. Then the Cartesian product I1 × I2 = (π, ν), where π and ν are given
in Tables 11 and 12.
Table 11. π resulting from Cartesian product (partial view)
c           π(c)
(pl, pl)    {(o1, o1)}
(an, pl)    {}
(ah, pl)    {(o2, o1), (o3, o1), (o5, o1), (o6, o1), (o7, o1)}
(pf, pl)    {(o4, o1)}
[Table 12: ν(oid) for (o1, o1), (o2, o1), (o3, o1), …; the tuple values are missing from the source.]
Join
In relational databases, the join operation is a generalization of the Cartesian
product operation. That is, in the join of two relations, the value of an attribute
of a tuple in the first relation and the value of the same attribute, if any, in the
second relation are combined. For that combination, the types of such a common
attribute name in both relations must be identical as defined below for FPOBs.
Definition 23. The FPOB schemas S1 = (C1, τ1, ⇒1, me1, p1) and S2 = (C2, τ2, ⇒2,
me2, p2) are join-compatible iff R(S1) and R(S2) are disjoint and, for all classes
c1 ∈ C1 and c2 ∈ C2, if an attribute A is defined for both τ1(c1) and τ2(c2) then
τ1(c1).A = τ2(c2).A.
Definition 24. Let S1 = (C1, τ1, ⇒1, me1, p1) and S2 = (C2, τ2, ⇒2, me2, p2) be two
join-compatible FPOB schemas, and R1 = R(S1) and R2 = R(S2). The join of S1 and
[…]
probabilistic conjunction strategy ⊗, denoted by ptv1 ⋈⊗ ptv2, is the fuzzy-probabilistic
tuple value ptv over A1 ∪ A2 defined by the following:
ptv.A = ptv1.A for all attributes A ∈ A1 − A2
ptv.A = ptv2.A for all attributes A ∈ A2 − A1
ptv.A = ptv1.A ∩⊗ ptv2.A for all attributes A ∈ A1 ∩ A2
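The three cases for joining two tuple values can be sketched as a dictionary merge. A triple is encoded here as a mapping from each value to its (lower, upper) pair. Combining the triples of a common attribute value by value on the shared values, under the independence strategy, is an assumption made for illustration, since the chapter's definition of that step is not reproduced in this section; all names and sample data are hypothetical.

```python
def combine_independent(triple1, triple2):
    """Assumed combination of two triples: keep the shared values and
    multiply their bounds, as for independent events."""
    return {
        value: (triple1[value][0] * triple2[value][0],
                triple1[value][1] * triple2[value][1])
        for value in triple1
        if value in triple2
    }


def join_tuple_values(ptv1, ptv2, combine=combine_independent):
    """Join two tuple values attribute by attribute."""
    joined = {}
    for attr in ptv1.keys() | ptv2.keys():
        if attr in ptv1 and attr in ptv2:
            joined[attr] = combine(ptv1[attr], ptv2[attr])  # common attribute
        elif attr in ptv1:
            joined[attr] = ptv1[attr]                       # only in the first
        else:
            joined[attr] = ptv2[attr]                       # only in the second
    return joined


# Hypothetical tuple values that share the attribute "water".
ptv1 = {
    "name": {"Aster": (0.5, 0.5), "Salvia": (0.5, 0.5)},
    "water": {20: (0.5, 0.5), 21: (0.5, 0.5)},
}
ptv2 = {
    "water": {21: (0.5, 0.5), 22: (0.5, 0.5)},
    "soil": {"loamy": (1.0, 1.0)},
}

joined = join_tuple_values(ptv1, ptv2)
print(joined["water"])
```

Only the common attribute is affected by the conjunction strategy; attributes that occur on one side are copied unchanged.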
We are now ready to define the join of two FPOB instances as follows.
Definition 27. Let I1 = (π1, ν1) and I2 = (π2, ν2) be two FPOB instances over
the join-compatible FPOB schemas S1 = (C1, τ1, ⇒1, me1, p1) and S2 = (C2, τ2, ⇒2,
me2, p2), and A1 and A2 be the sets of top-level attributes of S1 and S2,
respectively, and let R1 = R(S1) and R2 = R(S2). The join of I1 and I2 under a
probabilistic conjunction strategy ⊗, denoted by I1 ⋈⊗ I2, is defined as the
[…]
Example 21: Let I1 be the FPOB instance in Example 17 and I2 be the renaming
in I with respect to the renaming expression water2 ← water. Then the join of I1
[…]
c           π(c)
(pl, pl)    {(o1, o1)}
(an, pl)    {}
(ah, pl)
(pf, pl)    {(o4, o1)}

[Table: ν(oid) for (o1, o1), (o2, o1), (o5, o1), and (o4, o1); the tuple values are missing from the source.]
object in the two instances. First, the intersection of two fuzzy-probabilistic tuple
values is defined as follows.
Definition 28. Let ptv1 and ptv2 be two fuzzy-probabilistic tuple values over the
same set of attributes A. The intersection of ptv1 and ptv2 under a probabilistic
conjunction strategy ⊗, denoted by ptv1 ∩⊗ ptv2, is the fuzzy-probabilistic tuple
value ptv over A defined by ptv.A = ptv1.A ∩⊗ ptv2.A for all attributes A ∈ A.
Definition 29. Let I1 = (π1, ν1) and I2 = (π2, ν2) be two FPOB instances over
the same FPOB schema S = (C, τ, ⇒, me, p). The intersection of I1 and I2 under
a probabilistic conjunction strategy ⊗, denoted by I1 ∩⊗ I2, is the FPOB instance
(π, ν) over S, where:
1.
2.
c                     π1(c)       π2(c)    π(c)
PLANTS                {o1}        {o1}     {o1}
ANNUALS               {}          {}       {}
PERENNIALS            {}          {}       {}
VEGETABLES            {}          {}       {}
HERBS                 {}          {}       {}
FLOWERS               {}          {}       {}
ANNUALS_HERBS         {o2, o3}    {o5}     {}
PERENNIALS_FLOWERS    {o4}        {o4}     {o4}
[Tables: ν1(oid) for o1, o2, o3, o4 and ν2(oid) for o1, o4, o5; the tuple values are missing from the source.]

oid   ν(oid)
o1
o4    [name: ⟨{Aster, Salvia}, 0.5u, 0.5u⟩, soil: ⟨{loamy, sandy}, 0.18u, 0.98u⟩,
      water: ⟨{20, …, 25}, u/6, u/6⟩, sun: ⟨{mild}, u, u⟩, expyears: ⟨{2, 3, 4},
      0.12u, 1.08u⟩, category: ⟨{french, silver, wooly}, 0.12u, 1.08u⟩]
The union and difference operations are then defined similarly, on the basis of
the union and difference operations on fuzzy-probabilistic triples and tuple
values.
Definition 30. Let pt1 = ⟨V1, α1, β1⟩ and pt2 = ⟨V2, α2, β2⟩ be two fuzzy-probabilistic
triples, and ⊕ be a probabilistic disjunction strategy. Then pt1 ∪⊕ pt2
is the fuzzy-probabilistic triple ⟨V, α, β⟩ defined as follows:
V = V1 ∪ V2.

$$[\alpha(v), \beta(v)] = \begin{cases} [\alpha_1(v), \beta_1(v)] & \text{if } v \in V_1 - V_2 \\ [\alpha_2(v), \beta_2(v)] & \text{if } v \in V_2 - V_1 \\ [\alpha_1(v), \beta_1(v)] \oplus [\alpha_2(v), \beta_2(v)] & \text{if } v \in V_1 \cap V_2. \end{cases}$$
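The three-case definition of the union of two triples can be sketched directly. A triple is encoded as a mapping from each value to its (lower, upper) pair. The ignorance disjunction formula used below, [max(l1, l2), min(1, u1 + u2)], is the standard bound when nothing is known about the dependence between two events; the function names and the sample intervals are assumptions made for illustration.

```python
def disj_ignorance(a, b):
    """Disjunction of two probability intervals under ignorance."""
    return (max(a[0], b[0]), min(1.0, a[1] + b[1]))


def union_triples(triple1, triple2, disj=disj_ignorance):
    """Union of two triples, following the three cases of the definition."""
    result = {}
    for value in triple1.keys() | triple2.keys():
        if value in triple1 and value in triple2:
            result[value] = disj(triple1[value], triple2[value])  # in both
        elif value in triple1:
            result[value] = triple1[value]                        # only in V1
        else:
            result[value] = triple2[value]                        # only in V2
    return result


# Hypothetical soil triples; "clay" occurs only in the second one.
soil1 = {"loamy": (0.3, 0.7), "sandy": (0.3, 0.7)}
soil2 = {"loamy": (0.3, 0.7), "sandy": (0.3, 0.7), "clay": (0.1, 0.2)}

print(union_triples(soil1, soil2))
```

Values that occur in only one triple keep their bounds, while shared values get the lower bound of the stronger input and an upper bound capped at 1.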
Definition 31. Let ptv1 and ptv2 be two fuzzy-probabilistic tuple values over the
same set of attributes A. The union of ptv1 and ptv2 under a probabilistic
disjunction strategy ⊕, denoted by ptv1 ∪⊕ ptv2, is the fuzzy-probabilistic tuple
value ptv over A defined by ptv.A = ptv1.A ∪⊕ ptv2.A for all attributes A ∈ A.
Definition 32. Let I1 = (π1, ν1) and I2 = (π2, ν2) be two FPOB instances over
the same FPOB schema S = (C, τ, ⇒, me, p). The union of I1 and I2 under a
probabilistic disjunction strategy ⊕, denoted by I1 ∪⊕ I2, is the FPOB instance (π,
ν) over S, where:
1.
2.

$$\nu(o) = \begin{cases} \nu_1(o) & \text{if } o \in \pi_1(C) - \pi_2(C) \\ \nu_2(o) & \text{if } o \in \pi_2(C) - \pi_1(C) \\ \nu_1(o) \cup_{\oplus} \nu_2(o) & \text{if } o \in \pi_1(C) \cap \pi_2(C) \end{cases}$$

for every o ∈ π(C).
Example 23: Let S be the FPOB schema in Example 6, and I1 = (π1, ν1) and I2
= (π2, ν2) be the FPOB instances on S in Example 22. Then the union of I1 and
I2 under the ignorance probabilistic disjunction strategy is I1 ∪⊕ig I2 = (π, ν),
where π is given in Table 19 and ν in Table 20.
Table 19. π resulting from union
c                     π(c)
PLANTS                {o1}
ANNUALS               {}
PERENNIALS            {}
VEGETABLES            {}
HERBS                 {}
FLOWERS               {}
ANNUALS_HERBS
PERENNIALS_FLOWERS    {o4}
Table 20. ν resulting from union
oid   ν(oid)
o1
o2
o3
o4    [name: ⟨{Aster, Salvia}, u, 2u⟩, soil: ⟨{loamy, sandy}, 0.6u, 2u⟩, water:
      ⟨{20, …, 25}, u, 2u⟩, sun: ⟨{mild}, u, u⟩, expyears: ⟨{2, 3, 4}, 0.6u, 3u⟩,
      category: ⟨{french, silver, wooly}, 0.6u, 3u⟩]
o5
Definition 33. Let pt1 = ⟨V1, α1, β1⟩ and pt2 = ⟨V2, α2, β2⟩ be two fuzzy-probabilistic
triples, and ⊖ be a probabilistic difference strategy. Then pt1 −⊖ pt2
is the fuzzy-probabilistic triple ⟨V, α, β⟩ defined as follows:
V = V1 − {v ∈ V1 ∩ V2 | [α1(v), β1(v)] ⊖ [α2(v), β2(v)] = [0, 0]}.

$$[\alpha(v), \beta(v)] = \begin{cases} [\alpha_1(v), \beta_1(v)] & \text{if } v \in V - V_2 \\ [\alpha_1(v), \beta_1(v)] \ominus [\alpha_2(v), \beta_2(v)] & \text{if } v \in V \cap V_2. \end{cases}$$
Definition 34. Let ptv1 and ptv2 be two fuzzy-probabilistic tuple values over the
same set of attributes A. The difference of ptv1 and ptv2 under a probabilistic
difference strategy ⊖, denoted by ptv1 −⊖ ptv2, is the fuzzy-probabilistic tuple value
ptv over A defined by ptv.A = ptv1.A −⊖ ptv2.A for all attributes A ∈ A.
Definition 35. Let I1 = (π1, ν1) and I2 = (π2, ν2) be two FPOB instances over
the same FPOB schema S = (C, τ, ⇒, me, p), and A be the set of top-level
attributes of S. The difference of I1 and I2 under a probabilistic difference
strategy ⊖, denoted by I1 −⊖ I2, is the FPOB instance (π, ν) over S, where:
1. π(c) = π1(c) − {o ∈ π1(C) ∩ π2(C) | (ν1(o) −⊖ ν2(o)).A = ⟨∅, _, _⟩ for some
A ∈ A}, for every c ∈ C.
2.

$$\nu(o) = \begin{cases} \nu_1(o) & \text{if } o \in \pi_1(C) - \pi_2(C) \\ \nu_1(o) -_{\ominus} \nu_2(o) & \text{if } o \in \pi_1(C) \cap \pi_2(C) \end{cases}$$

for every o ∈ π(C).
Example 24: Let S be the FPOB schema in Example 6, and I1 = (π1, ν1) and
I2 = (π2, ν2) be the FPOB instances on S in Example 22. Consider I2' = (π2, ν2')
that is different from I2 only in ν2'(o1).soil = ⟨{loamy, sandy}, u, u⟩. Then the
difference of I1 and I2' under the independence probabilistic difference strategy
is I1 −⊖in I2' = (π, ν), where π is given in Table 21 and ν in Table 22.
We note that o4 ∉ π(PERENNIALS_FLOWERS) because ν(o4).sun = ⟨∅, _, _⟩.
Table 21. π resulting from difference
c                     π(c)
PLANTS                {o1}
ANNUALS               {}
PERENNIALS            {}
VEGETABLES            {}
HERBS                 {}
FLOWERS               {}
ANNUALS_HERBS         {o2, o3}
PERENNIALS_FLOWERS    {}
[Table 22: ν(oid) for o1, o2, o3; the tuple values are missing from the source.]
Conclusion
We presented an extension of the POB model with vague and imprecise values.
In order to integrate fuzzy set values into the probabilistic framework of POBs,
we employed a probability-based voting model of fuzzy sets and introduced a
probabilistic interpretation of relations on them. The definitions of FPOB
schemas, instances, and algebraic operations were then presented, generalizing
those of POBs. The obtained algebra provides a formal basis for development
of fuzzy and probabilistic object bases, as relational algebra does for relational
databases. A prototype of this model was demonstrated, and we are investigating
its full-scale implementation to be applied to build object bases for real-world
problems.
References
Baldwin, J. M., Lawry, J., & Martin, T. P. (1996). A note on probability/possibility consistency for fuzzy events. In Proceedings of the Sixth International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 521–525). Granada, Spain.
Baldwin, J. F., Martin, T. P., & Pilsworth, B. W. (1995). Fril: Fuzzy and evidential reasoning in artificial intelligence. Taunton: Research Studies Press/John Wiley.
Bertino, E., & Martino, L. (1993). Object-oriented database systems: Concepts and architectures. Reading, MA: Addison-Wesley.
Blanco, I., Marín, N., Pons, O., & Vila, M. A. (2001). Softening the object-oriented database model: Imprecision, uncertainty and fuzzy types. In Proceedings of the First International Joint Conference of the International Fuzzy Systems Association and the North American Fuzzy Information Processing Society (pp. 2323–2328). Vancouver, Canada.
Bordogna, G., Pasi, G., & Lucarella, D. (1999). A fuzzy object-oriented data model managing vague and uncertain information. International Journal of Intelligent Systems, 14, 623–651.
Cao, T. H. (2001). Uncertain inheritance and recognition as probabilistic default reasoning. International Journal of Intelligent Systems, 16, 781–803.
Cao, T. H., & Nguyen, H. (2002). Towards fuzzy and probabilistic object bases. In Proceedings of the Third International Conference on Intelligent Technologies and the Third Vietnam–Japan Symposium on Fuzzy Systems and Application (pp. 35–41). Hanoi, Vietnam.
Cao, T. H., & Rossiter, J. M. (2003). A deductive probabilistic and fuzzy object-oriented database language. Fuzzy Sets and Systems, 140, 129–150.
Cao, T. H., Rossiter, J. M., Martin, T. P., & Baldwin, J. F. (2002). On the implementation of Fril++ for object-oriented logic programming with uncertainty and fuzziness. In Bouchon-Meunier, B. et al. (Eds.), Technologies for Constructing Intelligent Systems, Studies in Fuzziness and Soft Computing (vol. 90, pp. 393–406). Heidelberg: Physica-Verlag.
Cross, V. V. (2003). Defining fuzzy relationships in object models: Abstraction and interpretation. International Journal of Fuzzy Sets and Systems, 140, 5–27.
De Tré, G. (2001). An algebra for querying a constraint defined fuzzy and uncertain object-oriented database model. In Proceedings of the First International Joint Conference of the International Fuzzy Systems Association and the North American Fuzzy Information Processing Society (pp. 2138–2143). Vancouver, Canada.
Dubitzky, W., Büchner, A. G., Hughes, J. G., & Bell, D. A. (1999). Towards concept-oriented databases. Data & Knowledge Engineering, 30, 23–55.
Eiter, T., Lu, J. J., Lukasiewicz, T., & Subrahmanian, V. S. (2001). Probabilistic object bases. ACM Transactions on Database Systems, 26, 264–312.
Chapter III
Generalization Data Mining in Fuzzy Object-Oriented Databases
Rafal Angryk
Tulane University, USA
Roy Ladner
Naval Research Laboratory, USA
Frederick E. Petry
Tulane University & Naval Research Laboratory, USA
Abstract
In this chapter, we consider the application of generalization-based data
mining to fuzzy similarity-based object-oriented databases (OODBs).
Attribute generalization algorithms have been most commonly applied to
relational databases, and we extend these approaches. A key aspect of
generalization data mining is the use of a concept hierarchy. The objects
of the database are generalized by replacing specific attribute values by
the next higher-level term in the hierarchy. This will then eventually result
in generalizations that represent a summarization of the information in the
database. We focus on the generalization of similarity-based simple fuzzy
attributes for an OODB using approaches to the fuzzy concept hierarchy
Introduction
Data mining and knowledge discovery have increasing importance as the amount
of data from various sources has rapidly increased. Awash in such volumes of
data, data mining techniques attempt to make sense of this data by formulating
information of value for decision making. This can vary from deciding on
commercial sales promotions to environmental planning to national security
decisions. Much of the current work is in the context of conventional relational
databases. In this chapter, we will discuss how to apply one valuable data mining
approach, attribute-oriented generalization, to a similarity-based fuzzy
OODB.
Background
In this section, we survey the general area of data mining, discuss some of the
relevant work in fuzzy data mining, and then describe the specific technique of
attribute-oriented induction for generalization, which is the focus of this chapter.
Additionally, we describe the fuzzy object-oriented model based on similarity
relationships that is the context in which we investigate data generalization.
Data Mining
Data mining or knowledge discovery generally refers to a variety of techniques
that have developed in the fields of databases, machine learning, and pattern
recognition. The intent is to uncover useful patterns and associations from large
databases.
Although we are primarily interested here in specific algorithms for knowledge
discovery, we will first review the overall process of data mining (Feelders,
Daniels, & Holsheimer, 2000). The initial steps of data mining are concerned
with preparation of data, including data cleaning intended to resolve errors and
missing data and integration of data from multiple heterogeneous sources. Next
are the steps needed to prepare for actual data mining. These include selection
of the specific data relevant to the task and transformation of this data into a
format required by the data mining approach. These steps are sometimes
considered to be those in the development of a data warehouse, i.e., an organized
format of data available for various data mining tools. There is a wide variety of
specific knowledge discovery algorithms that were developed (Han & Kamber,
2000). These discover patterns that can then be evaluated based on some
interestingness measure used to prune the huge number of available patterns.
Finally, as true for any decision aid system, an effective user interface with
visualization and alternative representations must be developed for presentation
of the discovered knowledge.
Specific data mining algorithms can be considered as belonging to two categories: descriptive and predictive data mining. In the descriptive category are class
description, association rules, and classification. Class description can provide
characterization or generalization of data or comparisons between data classes
to provide class discriminations. Data generalization is a process of grouping
data, enabling transformation of similar item sets, stored originally in a database
at the low (primitive) level, into more abstract conceptual representations. This
process is a fundamental element of attribute-oriented induction, a descriptive
database mining technique, allowing compression of the original data set into a
generalized relation, which provides concise summary information about
the massive set of task-relevant data.
Association rules correspond to correlations among the data items (Agrawal,
Imielinski, & Swami, 1993). They are often expressed in rule form, showing
attribute-value conditions that commonly occur at the same time in some set of
data. An association rule of the form X → Y can be interpreted as meaning that
the tuples in the database that satisfy the condition X are also likely to satisfy
Y; the word "likely" signals that this is not a functional dependency in the formal
database sense. Finally, a classification approach analyzes the training data (data
with known class membership) and constructs a model for each class based on
the features in the data. Commonly, the outputs generated are decision trees or
sets of classification rules. These can be used for the characterization of the
classes of existing data and to allow the classification of data in the future, and
so can also be considered predictive.
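The reading of X → Y given above can be illustrated with a minimal sketch. The measures `support` and `confidence` are the usual ones from the association-rule literature and are not defined in this chapter; the attribute names and tuples are hypothetical.

```python
from typing import Callable, Mapping, Sequence, Tuple

Row = Mapping[str, str]
Condition = Callable[[Row], bool]


def rule_stats(rows: Sequence[Row], x: Condition, y: Condition) -> Tuple[float, float]:
    """Return (support, confidence) of the rule X -> Y over the given tuples."""
    n_x = sum(1 for r in rows if x(r))            # tuples satisfying X
    n_xy = sum(1 for r in rows if x(r) and y(r))  # tuples satisfying X and Y
    support = n_xy / len(rows) if rows else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical tuples, for illustration only.
rows = [
    {"hair": "black", "eyes": "brown"},
    {"hair": "black", "eyes": "brown"},
    {"hair": "black", "eyes": "blue"},
    {"hair": "blond", "eyes": "blue"},
]
support, confidence = rule_stats(
    rows,
    lambda r: r["hair"] == "black",   # condition X
    lambda r: r["eyes"] == "brown",   # condition Y
)
```

A confidence below 1.0 is what separates such a rule from a functional dependency: here one tuple satisfies X without satisfying Y.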
Predictive analysis is also a well-developed area of data mining. One common
approach is clustering. Clustering analysis identifies collections of data
objects that are similar to each other. The similarity metric is often a distance
function given by experts or appropriate users. A good clustering method
produces high-quality clusters that yield low intercluster similarity and high
intracluster similarity. Prediction techniques are used to predict possible missing
data values or distributions of values of some attributes in a set of objects. First,
one must find the set of attributes relevant to the attribute of interest and then
predict a distribution of values based on the set of data similar to the selected
objects. A large variety of techniques is used, including regression analysis,
correlation analysis, genetic algorithms, and neural networks, to mention a few.
Finally, a particular case of predictive analysis is time-series analysis. This
technique considers a large set of time-based data to discover regularities and
interesting characteristics. One can search for similar sequences or subsequences, then mine sequential patterns, periodicities, trends, and deviations.
2. Generalization should be performed on the smallest decomposable components (or attributes) of the data objects in each generalization class $G_i$.
3. If there is a large set of distinct values for an attribute but there is no higher-level concept provided for the attribute, the attribute should be removed in the generalization process.
4.
5. Two generalized objects may become similar enough to be merged (see the next section for merging of objects in a fuzzy OODB). So we include an added attribute, count, to keep track of how many objects were merged to form the current generalized object. The value of the count of an object should be carried to its generalized object, and the counts should be accumulated when merging identical objects in generalization.
6. The generalization is controlled by providing levels that specify how far the process should proceed. If the number of distinct values of an attribute in the given class is larger than the generalization threshold value, further generalization on this attribute should be performed. If the number of objects of a generalized class is larger than the generalization threshold value, the generalization should proceed further.
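The count-carrying and threshold rules above can be sketched for a single attribute as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the example counts, and the representation of a concept hierarchy as one child-to-parent dictionary per level are all hypothetical, and the rule about removing attributes that lack a hierarchy is not modeled.

```python
from collections import Counter
from typing import Dict, List, Mapping


def ascend(counts: Mapping[str, int], parent: Dict[str, str]) -> Counter:
    """Replace each value by its direct abstract and accumulate the counts."""
    out: Counter = Counter()
    for value, n in counts.items():
        out[parent.get(value, value)] += n  # values without a parent stay as they are
    return out


def generalize(counts: Mapping[str, int], levels: List[Dict[str, str]], threshold: int) -> Counter:
    """Climb the hierarchy while the number of distinct values exceeds the threshold."""
    current = Counter(counts)
    for parent in levels:
        if len(current) <= threshold:
            break
        current = ascend(current, parent)
    return current


# Hypothetical counts for a student-status attribute.
status_counts = {"freshman": 3, "sophomore": 2, "junior": 4, "senior": 1,
                 "M.A.": 2, "M.S.": 3, "Ph.D.": 1}
levels = [
    {"freshman": "undergraduate", "sophomore": "undergraduate",
     "junior": "undergraduate", "senior": "undergraduate",
     "M.A.": "graduate", "M.S.": "graduate", "Ph.D.": "graduate"},
    {"undergraduate": "ANY", "graduate": "ANY"},
]
result = generalize(status_counts, levels, threshold=2)
```

With a threshold of 2 the process stops after one ascension, and the total count is the same before and after generalization.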
1. Identity (=): The identity predicate corresponds to the equality of references or pointers in conventional languages.
2. Shallow equality (se): Two objects are shallow equal if their states or contents are identical, i.e., corresponding instance variables need not be the same object, but their contents must be identical objects.
3. Deep equality (de): This ignores object identities and checks whether two objects are instances of the same class (i.e., same structure or type) and whether the values of the corresponding base objects are the same.
It is clear that identity is stronger than shallow equality, and shallow equality is
stronger than deep equality. If identity holds, the same can be said of shallow and
deep equality; if shallow equality holds, so does deep equality.
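The three predicates and their ordering can be sketched in Python, where `is` plays the role of reference equality. The class `Obj` and the helper names are hypothetical and serve only to illustrate the definitions.

```python
class Obj:
    """A minimal object: its state is a dictionary of instance variables."""

    def __init__(self, **attrs):
        self.attrs = attrs


def identical(a, b) -> bool:
    # Identity: both names refer to the very same object.
    return a is b


def shallow_equal(a: Obj, b: Obj) -> bool:
    # Shallow equality: same class, and corresponding instance variables
    # refer to identical objects.
    return (type(a) is type(b)
            and a.attrs.keys() == b.attrs.keys()
            and all(a.attrs[k] is b.attrs[k] for k in a.attrs))


def deep_equal(a, b) -> bool:
    # Deep equality: same structure, compared recursively down to base values,
    # ignoring object identities.
    if isinstance(a, Obj) and isinstance(b, Obj):
        return (type(a) is type(b)
                and a.attrs.keys() == b.attrs.keys()
                and all(deep_equal(a.attrs[k], b.attrs[k]) for k in a.attrs))
    return a == b


addr = Obj(city="Paris")
p1 = Obj(address=addr)
p2 = Obj(address=addr)               # shares the same address object as p1
p3 = Obj(address=Obj(city="Paris"))  # equal content, different address object
```

Here `p1` and `p2` are shallow equal but not identical, while `p1` and `p3` are only deep equal, which matches the ordering identity, then shallow equality, then deep equality.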
The most powerful aspect of an OODM is its ability to model inheritance. A class
may inherit all the methods and attributes of its superclass. When a class inherits
from one superclass, this is known as single inheritance. The situation in which
a class inherits from more than one superclass is called multiple inheritance, and
the inheritance structure forms a lattice. The class-subclass relationships form
a class hierarchy similar to a generalization-specialization relationship. Another
hierarchy that may originate at an attribute is the class composition hierarchy
(Kim, 1989). The class composition hierarchy is distinct and orthogonal to the
class hierarchy.
1. Specialization subclasses (also referred to as partial subclasses or object-oriented subclasses), where the subclass is a specialization of its immediate superclass, e.g., computer science is a specialization of engineering.
2. Subclasses that are subsets of their immediate superclass, e.g., the class of employees is a subset subclass of the class of persons.
1.
2. For two objects $o$ and $o'$ such that $o, o' \in \mathrm{ext}(C_i)$, if $o$ de $o'$ or $o$ se $o'$, then $\mu_o(C_i) = \mu_{o'}(C_i)$. In other words, two objects have the same membership in a class (and all its superclasses) if they are value equal.
combines two object instances of a class into a single object instance, provided
predefined level values are achieved. The merge operator at the same time
maintains the membership relationship existing between the object/class and its
class/superclass.
Assume for generality two object members of a given class $C_i$ with list-directed
class/superclass memberships:
$$o = (i, \langle a_{k1}: i_{k1}, a_{k2}: i_{k2}, \ldots, a_{km}: i_{km} \rangle, \langle \mu_o(C_i), \mu_o(C_{i+1}), \ldots, \mu_o(C_n) \rangle)$$
$$o' = (i', \langle a_{k1}: i'_{k1}, a_{k2}: i'_{k2}, \ldots, a_{km}: i'_{km} \rangle, \langle \mu_{o'}(C_i), \mu_{o'}(C_{i+1}), \ldots, \mu_{o'}(C_n) \rangle)$$
So $o$ is a fuzzy object in $C_i$ if $o \in \mathrm{ext}(C_i)$ and $\mu_o(C_i)$ takes values in the range [0,1].
Now we must consider how the data values as described by similarity relations
behave (Petry, 1996). Assume attribute $a_{kj}$ of class $C_i$ with a noncomposite
domain $D_j$. By definition of fuzzy object, the domain of $a_{kj}$ is $d_{kj} \subseteq D_j$. So the
similarity threshold of $D_j$ is:
$$\mathrm{Thresh}(D_j) = \min_{o} \{ \min_{x, y \in d_{kj}} [ s(x, y) ] \}$$
where $o \in \mathrm{ext}(C_i)$ and $x, y$ are atomic elements.
The threshold of a composite object is undefined. A composite domain is
constituted of simple domains (at some level), each of which has a threshold
value, i.e., the threshold for a composite object is a vector. The threshold value
represents the minimum similarity of the values an object attribute may take. If
the attribute domain is strictly atomic for all objects of the class (i.e., cardinality
of aij is 1), then the threshold = 1. As the threshold value ranges toward 0, larger
chunks of information are grouped together, and the information conveyed about
the particular attribute of the class decreases. A level value given a priori
determines the objects that may be combined by the set union of the respective
domains. Note that the level value may be specified via the query language with
the constraint that it may never exceed the threshold value.
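The threshold definition can be illustrated with a short sketch. It assumes the HAIR COLOR similarity values used later in the chapter (read as a symmetric relation); the function names and the example value sets are hypothetical.

```python
from itertools import product
from typing import Iterable, Set

# Similarity relation for hair colors, assuming the values of the chapter's
# similarity table: 0.8 within a pair, 0.7 among the dark shades, 0.5 otherwise.
PAIRS = [{"black", "d.brown"}, {"auburn", "red"}, {"blond", "bleached"}]
DARK_SHADES = {"black", "d.brown", "auburn", "red"}


def s(x: str, y: str) -> float:
    if x == y:
        return 1.0
    if any(x in p and y in p for p in PAIRS):
        return 0.8
    if x in DARK_SHADES and y in DARK_SHADES:
        return 0.7
    return 0.5


def thresh(value_sets: Iterable[Set[str]]) -> float:
    """Minimum similarity between any two values stored together in one object.

    value_sets holds, for each object of the class, the set of values its
    attribute takes.
    """
    return min(min(s(x, y) for x, y in product(d, d)) for d in value_sets)


atomic = [{"black"}, {"red"}, {"blond"}]                             # one value per object
mixed = [{"black"}, {"black", "d.brown"}, {"black", "auburn", "red"}]
```

For strictly atomic attribute values the threshold is 1, as stated above; it drops as objects hold larger, less similar groups of values.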
Merging Objects
For objects $o_i$ and $o_i'$, assume that $\forall a_{kj}$, domain($a_{kj}$) is noncomposite:
$$o_i'' = \mathrm{Merge}(o_i, o_i') = (i'', \langle a_{k1}: i''_{k1}, a_{k2}: i''_{k2}, \ldots, a_{kj}: i''_{kj}, \ldots, a_{km}: i''_{km} \rangle, \langle \mu_{o''}(C_i), \mu_{o''}(C_{i+1}), \ldots, \mu_{o''}(C_n) \rangle)$$
where $o''_{kj} = (i''_{kj}, \{i_{kj}, i'_{kj}\})$ and $\mu_{o''}(C_m) = f(\mu_o(C_m), \mu_{o'}(C_m))$ $\forall m$, $m = 1, \ldots, n$, such that
$$\forall\, \mathrm{val}(i_{kj}) \in d_{ij},\ \mathrm{val}(i'_{kj}) \in d'_{ij}: \min[s(\mathrm{val}(i_{kj}), \mathrm{val}(i'_{kj}))] > \mathrm{Level}(D_j) \quad \text{and} \quad \mathrm{Level}(D_j) \le \mathrm{Thresh}(D_j).$$
The merge operator permits a reorganization of the objects belonging to a class
scheme by grouping them according to the similarity of an attribute object to
another. As in the definition of threshold, the definition can be extended to
composite objects.
Two objects in an OODBMS can be nonredundant even if they are shallow
equal. By introducing fuzziness into the model, however, we weaken this
property. Two objects that are shallow equal are redundant, as are objects
exhibiting deep equality. But equality alone does not determine redundancy, and
the following is the characteristic of redundancy:
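Two objects $o_i$ and $o_i'$ are redundant iff $\forall j$, $j = 1, 2, \ldots, m$, and Level($D_j$) given
a priori:
$$\forall\, \mathrm{val}(i_{kj}) \in d_{ij},\ \mathrm{val}(i'_{kj}) \in d'_{ij}: \min[s(\mathrm{val}(i_{kj}), \mathrm{val}(i'_{kj}))] > \mathrm{Level}(D_j)$$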
This property of redundancy (Buckles & Petry, 1982) is directly responsible for
the property of value abstraction exhibited by the fuzzy database. It also ensures
that the results of database operations are unique.
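The redundancy test and the merge operator can be sketched as follows. This is an illustration under assumptions rather than the model's definition: objects are plain dictionaries, a single attribute `hair` is used with the HAIR COLOR similarity values (read symmetrically), and the membership-combining function `f` is left as a parameter because the text does not fix it (`min` is only a placeholder choice). The constraint that the level may not exceed the threshold is not enforced here.

```python
from typing import Callable, Dict

PAIRS = [{"black", "d.brown"}, {"auburn", "red"}, {"blond", "bleached"}]
DARK_SHADES = {"black", "d.brown", "auburn", "red"}


def s(x: str, y: str) -> float:
    """Hair-color similarity, assuming the chapter's similarity table."""
    if x == y:
        return 1.0
    if any(x in p and y in p for p in PAIRS):
        return 0.8
    if x in DARK_SHADES and y in DARK_SHADES:
        return 0.7
    return 0.5


def redundant(o1: dict, o2: dict, level: Dict[str, float]) -> bool:
    """True if, for every attribute, all cross-pairs of values exceed the level."""
    return all(
        min(s(x, y) for x in o1["values"][a] for y in o2["values"][a]) > level[a]
        for a in o1["values"]
    )


def merge(o1: dict, o2: dict, f: Callable[[float, float], float] = min) -> dict:
    """Union the attribute domains and combine class memberships with f."""
    values = {a: o1["values"][a] | o2["values"][a] for a in o1["values"]}
    member = {c: f(o1["member"][c], o2["member"][c]) for c in o1["member"]}
    return {"values": values, "member": member}


a = {"values": {"hair": {"black"}}, "member": {"Person": 1.0}}
b = {"values": {"hair": {"d.brown"}}, "member": {"Person": 0.9}}
c = {"values": {"hair": {"blond"}}, "member": {"Person": 1.0}}
level = {"hair": 0.7}

merged = merge(a, b) if redundant(a, b, level) else None
```

At level 0.7, the black-haired and dark-brown-haired objects are redundant and merge into one object whose hair value is the set of both colors, while the blond-haired object remains separate.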
1. The set of concepts at each level of hierarchy should cover all of the attribute values that occurred in the original database (so we are guaranteed not to lose the number of objects when generalizing their values).
2. Never allow any attribute value (or its abstract) to be counted more or less than once at each level of the generalization hierarchy. (When we allow a concept to partially belong to more than one of its direct abstracts, we have to check each time that the sum of fractional memberships is equal to 1.0.) This aspect is especially important when we plan to apply attribute-oriented generalization as a pre-analysis tool, to compress the initial data set to a form more appropriate for the application of computationally complex data mining algorithms (e.g., association rules mining).
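The second requirement can be checked mechanically. The sketch below assumes a hierarchy level is stored as a mapping from each concept to its direct abstracts with fractional memberships; the function names, edge weights, and counts are hypothetical.

```python
import math
from typing import Dict

Edges = Dict[str, Dict[str, float]]  # concept -> {direct abstract: membership}


def is_consistent(edges: Edges) -> bool:
    """Every concept's fractional memberships must sum to 1.0."""
    return all(math.isclose(sum(parents.values()), 1.0) for parents in edges.values())


def propagate(counts: Dict[str, float], edges: Edges) -> Dict[str, float]:
    """Carry counts one level up, splitting them by fractional membership."""
    out: Dict[str, float] = {}
    for value, n in counts.items():
        for parent, membership in edges[value].items():
            out[parent] = out.get(parent, 0.0) + n * membership
    return out


# Hypothetical memberships, for illustration only.
edges = {
    "black": {"DARK": 1.0},
    "blond": {"DARK": 0.3, "LIGHT": 0.7},
    "bleached": {"LIGHT": 1.0},
}
counts = {"black": 4, "blond": 10, "bleached": 2}
generalized = propagate(counts, edges)
```

When the check passes, the total count after propagation equals the total before it, which is the property the requirement is meant to guarantee.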
For the purpose of further analysis, we distinguish three basic types of generalization hierarchies:
1. Crisp concept hierarchy (Han, 1995; Hilderman et al., 1999): Here each attribute variable (concept) at each level of the hierarchy can have only one direct abstract (its direct generalization) to which it fully belongs. (There is no consideration of the degree of relationship, e.g., {master of art, master of science, doctorate} ⊂ graduate, {freshman, sophomore, junior, senior} ⊂ undergraduate.) This is as shown in the tree in Figure 1.
2. Fuzzy concept hierarchy (Lee & Kim, 1997; Raschia & Mouaddib, 2002): The hierarchy of concepts here reflects the degree with which one concept belongs to its direct abstract, and more than one direct abstract of a single concept is allowed. Because of the lack of guarantee of exact count propagation, such a hierarchy seems to be more appropriate for simplified data summarization, or for the cases when subjective results are to be emphasized (when we purposely want to modify the roles or influences of certain objects). Utilization of the four popular text editors could be
[Figure 1. Crisp concept hierarchy: undergraduate (freshman, sophomore, junior, senior); graduate (M.A., M.S., Ph.D.)]
3.
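Table 1. Similarity relation for the HAIR COLOR domain

             Black   Dark Brown   Auburn   Red   Blond   Bleached
Black         1.0       0.8        0.7     0.7    0.5      0.5
Dark Brown    0.8       1.0        0.7     0.7    0.5      0.5
Auburn        0.7       0.7        1.0     0.8    0.5      0.5
Red           0.7       0.7        0.8     1.0    0.5      0.5
Blond         0.5       0.5        0.5     0.5    1.0      0.8
Bleached      0.5       0.5        0.5     0.5    0.8      1.0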
data mining task or personal preferences of the analyst. Each of these relations
among concepts can be reflected in a similarity relation, because the user or data-mining analyst can be allowed to modify the values in the similarity table in the
individual's user view of the database to represent the similarity between the
concepts (attribute values) in the context of interest.
The existence of a similarity relation modeled for a particular domain can lead
to the extraction of a crisp concept hierarchy, allowing attribute-oriented
generalization. Let $S_\alpha$ be the $\alpha$-cut of the similarity relation $S$, presented in Table 1.
It can be shown (Zadeh, 1970) that if $S$ is a similarity relation on a given domain
$D_j$ (which is a single attribute in our case), then $\forall \alpha \in (0,1]$ each $S_\alpha$ creates
equivalence classes in the domain $D_j$. Now, let $\Pi_\alpha$ denote the equivalence class
partition induced on domain $D_j$ by $S_\alpha$. Clearly, $\Pi_{\alpha'}$ is a refinement of $\Pi_\alpha$ if $\alpha' \ge \alpha$.
A nested sequence of partitions $\Pi_{\alpha_1}, \Pi_{\alpha_2}, \ldots, \Pi_{\alpha_k}$ may be represented diagrammatically in the form of a partition tree.
The nested sequence of partitions in the form of a tree has a structure identical
to the crisp concept hierarchy for data mining generalization purposes (Figure 2).
The increase of abstraction in the partition tree is denoted by decreasing values
of $\alpha$; lack of abstraction during generalization (0-abstraction level at the bottom
of generalization hierarchy) complies with the 1-cut of the similarity relation
($\alpha = 1.0$), and can be denoted as $S_{1.0}$.
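The extraction of nested partitions from α-cuts can be sketched directly. The matrix below assumes the HAIR COLOR similarity values of Table 1, read as a symmetric relation; the function names `partition` and `refines` are hypothetical.

```python
from typing import FrozenSet, List

VALUES = ["black", "d.brown", "auburn", "red", "blond", "bleached"]
S = [
    [1.0, 0.8, 0.7, 0.7, 0.5, 0.5],
    [0.8, 1.0, 0.7, 0.7, 0.5, 0.5],
    [0.7, 0.7, 1.0, 0.8, 0.5, 0.5],
    [0.7, 0.7, 0.8, 1.0, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5, 1.0, 0.8],
    [0.5, 0.5, 0.5, 0.5, 0.8, 1.0],
]


def partition(alpha: float) -> List[FrozenSet[str]]:
    """Equivalence classes of the alpha-cut: values similar to each other at >= alpha.

    Because a similarity relation is max-min transitive, the row of any member
    of a class yields the whole class.
    """
    classes: List[FrozenSet[str]] = []
    for i in range(len(VALUES)):
        cls = frozenset(VALUES[j] for j in range(len(VALUES)) if S[i][j] >= alpha)
        if cls not in classes:
            classes.append(cls)
    return classes


def refines(fine: List[FrozenSet[str]], coarse: List[FrozenSet[str]]) -> bool:
    """True if every class of `fine` lies inside some class of `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)


# One partition per abstraction level, from no abstraction (1.0) to full (0.5).
tree = {alpha: partition(alpha) for alpha in (1.0, 0.8, 0.7, 0.5)}
```

Reading `tree` from α = 1.0 down to α = 0.5 gives six, three, two, and one class, which is the shape of a crisp concept hierarchy without labels.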
An advantage of attribute-oriented generalization with OODBs using similarity
relations is that such a hierarchy is implicit in the object-oriented fuzzy model
and can be extracted automatically, even by a user who has no background
knowledge about the particular domain. Experienced analysts not satisfied with
an existing similarity relation may then define their own similarity tables in user
views to better reflect their knowledge about the attribute values.
[Figure 2. Partition tree of the HAIR COLOR domain built from the similarity relation in Table 1 (vertical axis: ABSTRACTION LEVEL). α = 1.0: {BLACK}, {D.BROWN}, {AUBURN}, {RED}, {BLOND}, {BLEACHED}; α = 0.8: {BLACK, D.BROWN}, {AUBURN, RED}, {BLOND, BLEACHED}; α = 0.7: {BLACK, D.BROWN, AUBURN, RED}, {BLOND, BLEACHED}; α = 0.5: {BLACK, D.BROWN, AUBURN, RED, BLOND, BLEACHED}]
The only difference in Figure 2 from crisp concept hierarchies is their lack of
abstract concepts used as labels characterizing the sets of generalized (grouped)
concepts. In our example, we could generalize the values blond and bleached
to one common descriptor BLONDISH, auburn and red to REDDISH, and
black and dark brown to DARKISH (to maintain consistency of the naming
convention at the first level of abstraction). At the next level of the generalization
hierarchy, we can keep the concept BLONDISH, because there is no change in
its components; however, according to the taxonomy presented in Figure 2, the
concepts DARKISH and REDDISH should be generalized and should have a
new descriptor, which we call DARK to emphasize the change. A term ANY is
usually placed at the highest level of concept hierarchy, to emphasize that the
name describes all values possibly occurring in the particular domain. When
defining abstract names for generalized sets of attribute values, we need to
remember that the lower cut of the similarity relation (smaller values of $\alpha$)
represents a higher abstraction of generalization descriptors.
Due to the nested character of partitions as a result of $\alpha$-cuts of a similarity
relation, to specify a complete set of abstract descriptors it is sufficient to choose
one value of the attribute per equivalence class partition at each level of the
hierarchy, represented by $\alpha$ in Table 2. This is sufficient to build the generalization hierarchy in Figure 3.
Table 2. Abstraction levels and abstract descriptors

Abstraction Level    Abstract Descriptor
0.8                  DARKISH
0.8                  REDDISH
0.8                  BLONDISH
0.7                  DARK
0.7                  BLONDISH
0.5                  ANY

[Figure 3. Generalization hierarchy for HAIR COLOR (vertical axis: ABSTRACTION LEVEL). α = 1.0: BLACK, D.BROWN, AUBURN, RED, BLOND, BLEACHED; α = 0.8: DARKISH, REDDISH, BLONDISH; α = 0.7: DARK, BLONDISH; α = 0.5: ANY]
Because the similarity relation can generate only a nested sequence of equivalence partitions via a decrease in similarity level, we cannot extract a fuzzy
concept hierarchy from the similarity table. The disjoint character of equivalence
classes generated from the similarity relation does not allow any concept in the
hierarchy to have more than one direct abstract at every level of the generalization hierarchy. A similarity table can be utilized to form a crisp generalization
hierarchy. Such a hierarchy can be successfully applied as a foundation to the
development of a fuzzy concept hierarchy. Data-mining analysts can extend the
crisp hierarchy with additional edges to represent partial membership of the
lower-level concepts in their direct abstract descriptors. Depending on the
assigned memberships, reflecting preferences of the user, they can create
consistent or inconsistent fuzzy concept hierarchies.
a range (actually occurring values), and a typical range (most common values),
and we apply this classification to the generalization process. With an abstract
concept we can usually identify its typical direct specializers, the elements
clearly belonging to it (e.g., we all would probably agree here that black hair can
be generalized to the descriptor DARK with 100% accuracy). This can be
represented as a core of the fuzzy set (abstract concept). However, there are
also lower-level concepts that cannot be definitely assigned to only one of their
direct abstracts (e.g., assigning blond fully to the abstract concept LIGHT hair
is problematic because there are many people with dark blond hair). We term
such cases possible direct specializers, concepts in the group of lower-level
concepts characterized by the given abstract descriptor (fuzzy set) with membership 1. These are the support of a fuzzy set and are interpreted as the
range of the abstract concept.
Now we define each abstract concept as a set of its typical original attribute
values with the level of doubt about its other possible specializers reflected by
the value of $\alpha$. Then we select the fuzzy similarity class created from the $\alpha$-cut
of similarity relation for these predefined typical specializers and analyze if this
fits our needs. For instance, define the abstract concept LIGHT hair by the
attribute variable bleached with the level of similarity $\alpha = 0.8$ to spread the
range of this abstract descriptor (LIGHT is predefined as the similarity class
$\mathrm{BLEACHED}_{0.8}$). From the similarity relation presented in Table 1, we can derive:
$$\mathrm{LIGHT} = \mathrm{BLEACHED}_{0.8} = \{\text{bleached}\,|\,1.0;\ \text{blond}\,|\,0.8\}$$
Of course, each of the abstract concepts can be defined by more than one typical
representative element (in such a case we may also choose an intersection
operator, as best fits our preferences). Assume the descriptor DARK to be
principally defined by the following original values of the HAIR COLOR domain:
black, d.brown, and auburn. Assuming the similarity level to be 0.7, we would
obtain:
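$$\mathrm{DARK} = \mathrm{MAX}(\mathrm{BLACK}_{0.7};\ \mathrm{D.BROWN}_{0.7};\ \mathrm{AUBURN}_{0.7}) = \{\text{black}\,|\,1.0;\ \text{d.brown}\,|\,1.0;\ \text{auburn}\,|\,1.0;\ \text{red}\,|\,0.8\}$$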
Using both of these abstract concepts, with assumption that only DARK and
LIGHT colors occur at the given level of HAIR COLOR generalization, we
construct the fuzzy generalization hierarchy (Figure 4).
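The derivation of LIGHT and DARK can be reproduced with a short sketch. It assumes the HAIR COLOR similarity values of Table 1 (read symmetrically); `similarity_class` and `fuzzy_max` are hypothetical helper names, the latter standing for the MAX operator used above.

```python
from typing import Dict

VALUES = ["black", "d.brown", "auburn", "red", "blond", "bleached"]
PAIRS = [{"black", "d.brown"}, {"auburn", "red"}, {"blond", "bleached"}]
DARK_SHADES = {"black", "d.brown", "auburn", "red"}


def s(x: str, y: str) -> float:
    """Hair-color similarity, assuming the values of Table 1."""
    if x == y:
        return 1.0
    if any(x in p and y in p for p in PAIRS):
        return 0.8
    if x in DARK_SHADES and y in DARK_SHADES:
        return 0.7
    return 0.5


def similarity_class(typical: str, alpha: float) -> Dict[str, float]:
    """Fuzzy similarity class of a typical value: all values similar to it at >= alpha."""
    return {v: s(typical, v) for v in VALUES if s(typical, v) >= alpha}


def fuzzy_max(*classes: Dict[str, float]) -> Dict[str, float]:
    """Combine several similarity classes, keeping the largest membership per value."""
    out: Dict[str, float] = {}
    for cls in classes:
        for v, m in cls.items():
            out[v] = max(out.get(v, 0.0), m)
    return out


LIGHT = similarity_class("bleached", 0.8)
DARK = fuzzy_max(*(similarity_class(v, 0.7) for v in ("black", "d.brown", "auburn")))
```

Each typical value receives membership 1.0 in its own concept, and the remaining values inherit their similarity to the closest typical value.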
[Figure 4. Simplified fuzzy concept hierarchy for HAIR COLOR (vertical axis: ABSTRACTION LEVEL): ANY at the top; DARK with black (1.0), d.brown (1.0), auburn (1.0), red (0.8); LIGHT with blond (0.8), bleached (1.0)]
The hierarchy in Figure 4 is called a simplified fuzzy concept hierarchy, because
the fractional memberships of low-level concepts to their abstract descriptors
make it similar to the fuzzy concept hierarchy described previously. Each of the
original attribute values belongs to only one direct abstract, creating a simplified
(crisp-hierarchy-like) structure. For instance, define an abstract concept BLACKISH as the $\alpha$-cut from the similarity table for black at the level 0.7:
$$\mathrm{BLACKISH} = \mathrm{BLACK}_{0.7} = \{\text{black}\,|\,1.0;\ \text{d.brown}\,|\,0.8;\ \text{auburn}\,|\,0.7;\ \text{red}\,|\,0.7\}$$
Simultaneously introduce the abstract class BROWNISH at the same $\alpha$-level:
$$\mathrm{BROWNISH} = \mathrm{D.BROWN}_{0.7} = \{\text{black}\,|\,0.8;\ \text{d.brown}\,|\,1.0;\ \text{auburn}\,|\,0.7;\ \text{red}\,|\,0.7\}$$
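We can derive the fuzzy concept hierarchy and even modify the generalization
model to become consistent through the normalization of derived memberships:
$$\mathrm{BLACKISH} = \mathrm{BLACK}_{0.7} = \left\{\text{black}\,\Big|\,\tfrac{1.0}{1.8};\ \text{d.brown}\,\Big|\,\tfrac{0.8}{1.8};\ \text{auburn}\,\Big|\,\tfrac{0.7}{1.4};\ \text{red}\,\Big|\,\tfrac{0.7}{1.4}\right\} = \{\text{black}\,|\,0.6;\ \text{d.brown}\,|\,0.4;\ \text{auburn}\,|\,0.5;\ \text{red}\,|\,0.5\}$$
$$\mathrm{BROWNISH} = \mathrm{D.BROWN}_{0.7} = \left\{\text{black}\,\Big|\,\tfrac{0.8}{1.8};\ \text{d.brown}\,\Big|\,\tfrac{1.0}{1.8};\ \text{auburn}\,\Big|\,\tfrac{0.7}{1.4};\ \text{red}\,\Big|\,\tfrac{0.7}{1.4}\right\} = \{\text{black}\,|\,0.4;\ \text{d.brown}\,|\,0.6;\ \text{auburn}\,|\,0.5;\ \text{red}\,|\,0.5\}$$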
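The normalization step divides each membership by the total membership that the same attribute value has across all abstract concepts at that level. A minimal sketch, with the hypothetical function name `normalize` and memberships taken from the two similarity classes above:

```python
from typing import Dict

Concepts = Dict[str, Dict[str, float]]  # abstract concept -> {value: membership}


def normalize(concepts: Concepts) -> Concepts:
    """Rescale memberships so that each value's memberships sum to 1.0."""
    totals: Dict[str, float] = {}
    for members in concepts.values():
        for value, m in members.items():
            totals[value] = totals.get(value, 0.0) + m
    return {
        name: {value: m / totals[value] for value, m in members.items()}
        for name, members in concepts.items()
    }


raw = {
    "BLACKISH": {"black": 1.0, "d.brown": 0.8, "auburn": 0.7, "red": 0.7},
    "BROWNISH": {"black": 0.8, "d.brown": 1.0, "auburn": 0.7, "red": 0.7},
}
consistent = normalize(raw)

# Rounded to one decimal place, as the memberships are reported in the text.
rounded = {name: {v: round(m, 1) for v, m in members.items()}
           for name, members in consistent.items()}
```

The unrounded memberships for black are 1.0/1.8 and 0.8/1.8, so the one-decimal figures 0.6 and 0.4 are approximations that still sum to 1.0.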
Despite the formally correct appearance, this mechanism may be inappropriate.
We characterized two new generalization concepts (BLACKISH and BROWNISH) with a low level of imprecision (each had only one typical direct specializer),
simultaneously choosing a relatively high degree of abstraction ($\alpha = 0.7$) when
extracting $\alpha$-cuts from the similarity relation. This resulted in two fuzzy similarity
classes ($\mathrm{BLACK}_{0.7}$ and $\mathrm{D.BROWN}_{0.7}$) that were overlapping and led to the
consistent fuzzy concept hierarchy in Figure 5 (derived through the normalization
of membership degrees). Extraction of two fuzzy classes from the similarity
table at the similarity level where they were considered to be equivalent (black
and d.brown belong to the same equivalence class partition at the similarity level
0.7), despite being formally possible, often may not be semantically meaningful.
This situation may occur when the abstract concepts are characterized incorrectly at the particular level of generalization (which is the case here) or the
similarity relation represents the similarity between these concepts in the
perspective not compatible with the context represented in the particular
generalization hierarchy. It makes no sense to define two or more general
concepts at a level of abstraction so high that they are interpreted as identical.
This rationale found its natural reflection in the distribution of memberships
presented in the consistent fuzzy concept hierarchy (Figure 5), where both of the
introduced abstract concepts have almost identical compositions of their direct
specializers.
Some guidelines are needed when characterizing abstract concepts via their
typical direct specializers and trying to extract their full definition (range of
possible direct specializers) using a similarity table:
1. We need to assure that the intuitively assumed value of $\alpha$ extracts the cut (subset) of attribute values that corresponds closely to the definition of the abstract descriptor for which we were looking. The strategy for choosing the most appropriate level of $\alpha$-cut when extracting the abstract concept definitions arises from the guideline of minimal generalization (the minimal concept tree ascension strategy described in the second section). Based on this strategy, we would recommend always choosing a definition extracted at the highest possible level of similarity (biggest $\alpha$), where all predefined typical components of the desired abstract descriptor are already embraced (where they occur for the first time).
Figure 5. Consistent fuzzy concept hierarchy for the attribute HAIR COLOR
2.
3.
4.
The approach described here seems to allow us to form only flat (one-level)
generalization hierarchies or to derive the generalized concepts at the first level
of abstraction in the concept hierarchy. Each abstract concept defined with this
method is a generalization of original attribute values, and therefore cannot be
placed at the higher level of the concept hierarchy. However, there is no obstacle
preventing these concepts from being further generalized.
The lack of ability to derive multilevel hierarchical structures does not prevent
this approach from being appropriate, and actually convenient, for rapid data
summarization or something we term selective attribute-oriented generalization. To summarize the given data set, we may prefer to not perform gradual
(hierarchical) generalization but replace it with a one-level hierarchy covering a
whole domain of attribute values. Such an appropriately built flat hierarchy
would represent the majority of dependencies between the original low-level
concepts, which are to be generalized, by the propagation of fractions of counts
coming from each attribute value, instead of having to perform detailed hierarchical generalization.
In selective generalization, we generalize all attribute values from a specific point
of view, which is dictated by the character of the data mining task. Assume that
we are interested in association rules regarding only people who have dark hair.
Using the similarity relation, we derive the following:
$$\mathrm{DARKISH} = \mathrm{MAX}(\mathrm{BLACK}_{0.7};\ \mathrm{D.BROWN}_{0.7}) = \{\text{black}\,|\,1.0;\ \text{d.brown}\,|\,1.0;\ \text{auburn}\,|\,0.7;\ \text{red}\,|\,0.7\}$$
This reflects the following interpretation: All people who have black or dark
brown hair are considered to have DARKISH hair, and 70% of redheads and
people with auburn hair have it in a dark shade. This is sufficient to explain the
difference between selective generalization and the application of data selection
when building the initial data-mining class $G_0$. In both cases, we omit all objects
with blond or bleached hair; however, in the case of selective generalization, 70% of each count
represented by each object with red or auburn hair color remains. This is
obviously not equivalent to the extraction of all objects with values red and
auburn and then randomly choosing 70% of them for further generalization.
With selective generalization, we do not omit the objects but decrease their
influence to an appropriate representation of their importance for the given data-mining problem.
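The effect of selective generalization on counts can be sketched as follows. The DARKISH memberships are those derived above; the function name `selective_counts` and the object counts are hypothetical.

```python
from typing import Dict

DARKISH = {"black": 1.0, "d.brown": 1.0, "auburn": 0.7, "red": 0.7}


def selective_counts(counts: Dict[str, int], concept: Dict[str, float]) -> Dict[str, float]:
    """Weight each value's count by its membership in the selected concept.

    Values with no membership in the concept are omitted altogether.
    """
    return {
        value: n * concept[value]
        for value, n in counts.items()
        if concept.get(value, 0.0) > 0.0
    }


# Hypothetical counts of objects per hair color.
counts = {"black": 10, "red": 10, "blond": 5}
weighted = selective_counts(counts, DARKISH)
```

Every red-haired object still contributes, but with weight 0.7, which differs from keeping a random 70% of those objects at full weight.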
We should finally point out that consistent fuzzy hierarchies are not appropriate
tools for selective attribute-oriented generalization. In this case, we do not want
to normalize the count values to resolve the exact-count dilemma; instead, we
want to preserve the unbalanced relation between the objects, as this
reflects dependencies occurring in real-life data. The ordinary fuzzy hierarchies
seem to be the most appropriate for such purposes.
Although we focused on nonnumeric data in this discussion of fuzzy concept
hierarchies, the generalization of numeric attributes can be performed in a similar
manner. Of course, the numeric hierarchy can be based on similarity relationships for fuzzy numbers, such as was already developed for fuzzy databases
(Buckles & Petry, 1984; Petry, 1996), and used as described above for nonnumeric
data. In the case of numeric data, it is possible to analyze the data distribution
characteristics. It may then not be necessary to have predefined concept
hierarchies. For example, consider an income range study in which the incomes
can be clustered into several groups, {<20K, 20–35K, 35–45K, 45–50K, >50K},
based on some statistical clustering tool. Obviously, further clustering can be
Conclusions
We considered in detail the issues relative to concept hierarchies for attribute
generalization, as the use of a concept hierarchy is the essential component of
the generalization process. As we have seen, there are several approaches that
can be taken depending on the exact intention of the data-mining application. This
allows one to be more flexible in dealing with fuzzy objects in the similarity-based
fuzzy OODB model we described, in particular, due to the ability to create
hierarchies from the given similarity relationships for the data domains.
There are several directions that can be profitably followed in this area for
OODBs that we have not considered to date. Two of particular interest that we
are currently studying are the issues of generalization of methods and the use of
aggregation as a structuring mechanism. As an application area, the problem of
generalization of multimedia data, especially spatial data (Ladner, Petry, &
Cobb, 2003), in a fuzzy OODB is of particular interest. Also, we have been
considering the extension of fuzzy hierarchy development in a database utilizing
proximity relationships (Angryk & Petry, 2003) and plan on extending the fuzzy
OODM to accommodate generalization via proximity relations.
ACKNOWLEDGMENTS
We would like to thank the Naval Research Laboratory's Base Program,
Program Element No. 0602435N for sponsoring this research.
References
Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules
between sets of items in large databases. In Proceedings of the 1993
ACM-SIGMOD International Conference on Management of Data
(pp. 207–216). New York: ACM Press.
Angryk, R., & Petry, F. (2003). Consistent fuzzy concept hierarchies for
attribute generalization. In Proceedings IASTED International Conference on Information and Knowledge Sharing (IKS 2003) (pp. 158–193).
Angryk, R., & Petry, F. (2003). Data mining fuzzy databases using attribute-oriented generalization. In Proceedings of the IEEE International Con-
Han, J., Nishio, S., & Kawano, W. (1994). Knowledge discovery in object-oriented and active databases. In F. Fuchi, & T. Yokoi (Eds.), Knowledge
building and knowledge sharing (pp. 221–230). Singapore: IOS Press.
Han, J., Nishio, S., Kawano, H., & Wang, W. (1998). Generalization-based data
mining in object-oriented databases using an object-cube model. Data and
Knowledge Engineering, 25(1–2), 55–97.
Hilderman, R., Hamilton, H., & Cercone, N. (1999). Data mining in large
databases using domain generalization graphs. Journal of Intelligent
Information Systems, 13(3), 195–234.
Hirota, K., & Pedrycz, W. (1999). Fuzzy computing for data mining. In
Proceedings of the IEEE, 87, 1575–1599.
Kacprzyk, J. (1999). Fuzzy logic for linguistic summarization of databases. In
Proceedings of the Eighth International Conference on Fuzzy Systems
(pp. 813–818). Seoul, Korea.
Kacprzyk, J., & Zadrozny, S. (2000). On combining intelligent querying and data
mining using fuzzy logic concepts. In G. Bordogna, & G. Pasi (Eds.),
Recent issues on fuzzy databases (pp. 67–81). Heidelberg: Physica-Verlag.
Khoshafian, S., & Copeland, G. (1986). Object identity. In Proceedings of the
OOPSLA '86 Conference (pp. 406–416). New York: ACM Press.
Kim, W. (1989). A model of queries for object-oriented databases. In Proceedings of 15th International Conference on Very Large Databases (pp.
45–54).
Koyuncu, M., & Yazici, A. (2003). IFOOD: An intelligent fuzzy object-oriented
database architecture. IEEE Transactions on Knowledge and Data Engineering, 15(5), 1137–1154.
Kuok, C., Fu, A., & Wong, H. (1998). Mining fuzzy association rules in
databases. ACM SIGMOD Record, 27, 41–46.
Ladner, R., Petry, F., & Cobb, M. (2003). Fuzzy set approaches to spatial data
mining of association rules. Transactions on GIS, 7(1), 123–138.
Laurent, A., Bouchon-Meunier, B., Doucet, A., Gancarski, S., & Marasal, C.
(2000). Fuzzy data mining from multidimensional databases. Studies in
Fuzziness and Soft Computing, 54, Proceedings of ISCI (pp. 245–256).
Lee, D., & Kim, M. (1997). Database summarization using fuzzy ISA hierarchies. IEEE Transactions on Systems, Man, and Cybernetics Part B,
27(1), 68–78.
Lee, J., Xue, N., Hsu, K., & Yang, J. (1999). Modeling imprecise requirements
with fuzzy objects. Inf. Sci., 118, 101–119.
Chapter IV
FRIL++ and
Its Applications
J. M. Rossiter
University of Bristol, UK &
Bio-Mimetic Control Research Center, The Institute of Physical and
Chemical Research (RIKEN), Japan
T. H. Cao
Ho Chi Minh City University of Technology, Vietnam
Abstract
We introduce a deductive probabilistic and fuzzy object-oriented database
language, called FRIL++, which can deal with both probability and
fuzziness. Its foundation is a logic-based probabilistic and fuzzy object-oriented model where a class property (i.e., an attribute or a method) can
contain fuzzy set values, and uncertain class membership and property
applicability are measured by lower and upper bounds on probability.
Each uncertainly applicable property is interpreted as a default probabilistic
logic rule, which is defeasible, and probabilistic default reasoning on fuzzy
events is proposed for uncertain property inheritance and class recognition.
The design, implementation, and basic features of FRIL++ are presented.
FRIL++ can be used as both a modeling and a programming language, as
demonstrated by its applications to machine learning, user modeling, and
modeling with words herein.
Introduction
For modeling real-world problems and constructing intelligent systems, the
integration of different methodologies and techniques has been the quest and
focus of significant interdisciplinary research effort. The advantage of such a
hybrid system is that the strengths of its partners are combined and
compensate for each other's weaknesses.
In particular, object orientation provides a hierarchical data abstraction scheme
and an information hiding and inheritance mechanism; probabilistic/fuzzy reasoning provides measures and rules for representing and reasoning with uncertainty and imprecision in the real world; logic programming provides a declarative
way for problem specification and well-founded semantics for formal reasoning.
However, research on combining all three modeling and computing paradigms
appears to be sporadic.
In Eiter et al. (2001), the authors developed an algebra to handle object bases with
uncertainty, where conditional probabilities for an object of a class being a
member of its subclasses are given, and membership of an object to a class is
expressed by a probability value, but fuzzy values are not allowed in class
properties. Meanwhile, there have been many fuzzy object-oriented models
developed, such as those of Bordogna et al. (1999), George et al. (1993),
Itzkovich and Hawkes (1994), Rossazza et al. (1997), and Van Gyseghem and
De Caluwe (1997), but they are not deductive. Yazici and George (1999) present
a deductive fuzzy object-oriented model that, however, does not address
uncertain applicability of properties.
In Dubitzky et al. (1999), each property of a concept is assumed to have a
probability degree for it occurring in exemplars of that concept. However, the
method therein for computing a membership degree of an object to a concept,
based on matching the object's properties with the uncertainly applicable
properties of the concept, is in our view not justifiable. Also, the work does not
address the problem of how inheritance is performed under the membership and
applicability uncertainty.
Recently, Blanco et al. (2001) and De Tré (2001) sketched general models to
manage different sources of imprecision and uncertainty, including probabilistic
ones, on various levels of an object-oriented database model. However, no
foundation was laid to integrate probability theory and fuzzy logic in the case where
probability is used to represent uncertainty. In Cross (2003), the author
reviewed existing proposals and presented recommendations for the application
of fuzzy set theory in a flexible generalized object model.
In this chapter, we summarize the main features of a logic-based probabilistic
and fuzzy object-oriented model where a class property can contain fuzzy sets
interpreted as families of probability distributions, and uncertain class membership and property applicability are measured by lower and upper bounds on
probability. On the basis of this model, we present the development of FRIL++,
which extends FRIL (Baldwin et al., 1995) with object-oriented features, as a
modeling and programming language for probabilistic and fuzzy object-oriented
deductive databases and knowledge bases, in the same way as predicate logic
programming languages (e.g., Datalog) have been used for classical deductive
databases and knowledge bases. Various applications of FRIL++ are then
demonstrated.
The next section presents the logic-based probabilistic and fuzzy object-oriented
model. In the following section, we introduce probabilistic default reasoning and
its application to fuzzy events as a suitable approach to uncertain property
inheritance and class recognition. We then present our solutions for uncertain
inheritance of attributes, uncertain inheritance of methods, and uncertain recognition of classes. Subsequent sections present the implementation and the basic
features of FRIL++. In the final two sections, we present our application of
FRIL++ to machine learning, user modeling, and modeling with words. Finally,
we conclude the chapter and suggest future research.
who stated that the membership degree of an object to a class is at least equal
to its membership degree to a subclass of that class. In fact, if $C_1$ is a subclass
of $C_2$, then $\Pr(C_1) \le \Pr(C_2)$.
The practical significance of this latter definition is that one needs to consider only
the preferred default subsets and deduction on them in order to obtain default
consequences. Specifically, if $P_1, P_2, \ldots, P_n$ are all the preferred default subsets
of $D$ and, for every $i$ from 1 to $n$, $F_i$ is a logical consequence of $T \cup P_i \cup E$, then
$F_1 \vee F_2 \vee \ldots \vee F_n$ is a default consequence of $E$. In particular, with $F_i = \psi[l_i, u_i]$
for every $i$ from 1 to $n$, one has $\psi[l, u]$ is a default consequence of $E$ where
$[l, u] = \bigcup_{i=1}^{n}[l_i, u_i]$, that is, $l = \min_{i=1,\ldots,n}\{l_i\}$ and $u = \max_{i=1,\ldots,n}\{u_i\}$.
For fuzzy events characterized by fuzzy sets, in this work, we apply the voting
model interpretation of fuzzy sets (Baldwin et al., 1995; Gaines, 1978), whereby,
given a fuzzy set A on a domain U, each voter has a subset of U as his or her own
crisp definition of the concept that A represents. The membership function value
$\mu_A(u)$ is then the proportion of voters whose crisp definitions include u. As such,
A defines a probability distribution on the power set of U across the voters, and
thus a fuzzy proposition "x is A" defines a family of probability distributions of the
variable x on U. Fuzzy events are said to be consistent with each other iff the
intersection of their characterizing fuzzy sets is a normal fuzzy set (i.e., one with
a maximal membership function value of 1). Baldwin et al. (1995, 1996) describe
the conditioning operations over fuzzy sets and the tractable calculation of the
expected fuzzy set used in this default reasoning framework.
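The voting model interpretation can be illustrated with a small sketch. The Python below is a hypothetical illustration and not part of FRIL or FRIL++: the voter sets and the helper names (`membership`, `fuzzy_set`, `intersection`, `is_normal`) are assumptions, and the pointwise minimum is assumed as the intersection operator.

```python
def membership(voters, u):
    """Proportion of voters whose crisp definition contains u."""
    return sum(1 for crisp_set in voters if u in crisp_set) / len(voters)


def fuzzy_set(voters, domain):
    """Membership function induced by the voters over the whole domain."""
    return {u: membership(voters, u) for u in domain}


def intersection(a, b):
    """Pointwise minimum (an assumed choice of intersection operator)."""
    return {u: min(a[u], b[u]) for u in a}


def is_normal(fs):
    """A fuzzy set is normal when its maximal membership value is 1."""
    return max(fs.values()) == 1.0


heights = [1.6, 1.7, 1.8, 1.9]
# Four hypothetical voters, each with a crisp notion of "tall".
tall_voters = [{1.8, 1.9}, {1.9}, {1.7, 1.8, 1.9}, {1.8, 1.9}]
tall = fuzzy_set(tall_voters, heights)
# Four hypothetical voters for "medium height".
medium_voters = [{1.7}, {1.7, 1.8}, {1.6, 1.7}, {1.7, 1.8}]
medium = fuzzy_set(medium_voters, heights)
# Two fuzzy events are consistent iff their intersection is normal.
consistent = is_normal(intersection(tall, medium))
```

With these made-up voters, the two events are each normal on their own, but their intersection peaks below 1, so they would not count as consistent with each other.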
2.
3.
As shown in Lukasiewicz (2000), all three steps are intractable in the probabilistic case. The computational complexity is mainly due to checking consistency
and performing global inference on a probabilistic knowledge base. In applying
that framework to uncertain inheritance for the uncertain object-oriented model,
we propose an approximation for default consequences correspondingly as
follows:
1. Consider only one priority ordering based on the class specificity ordering.
2.
3. Apply local inference using Jeffrey's rule for deriving logical consequences. Details are explained below.
Let $D$ be partitioned into $D_0, D_1, \ldots, D_k$ such that, for every $i$ and $j$ from 1 to $n$,
if $C_j$ is a subclass of $C_i$, $(\psi(A_i) \mid C_i)[l_i, u_i] \in D_s$ and $(\psi(A_j) \mid C_j)[l_j, u_j] \in D_t$, then
$s < t$. Intuitively, $D_0$ comprises the defaults for $\psi$ of the classes that are not
subclasses of any other; $D_1$ comprises the defaults for $\psi$ of the classes that are
the immediate subclasses of those classes; and so on. The priority ordering $\prec$ is
then defined such that $d \prec d^*$ iff $d \in D_s$, $d^* \in D_t$, and $s < t$.
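The partition by class specificity can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `partition_by_specificity` and the example hierarchy are hypothetical, and a class's level is taken to be the length of its longest chain of superclasses, which is one way to guarantee that a subclass always lands in a later level than each of its superclasses.

```python
from functools import lru_cache


def partition_by_specificity(superclasses):
    """Group classes into levels [D_0, D_1, ...].

    `superclasses` maps each class name to a list of its direct
    superclasses. Every class is placed in a strictly later level
    than all of its superclasses.
    """

    @lru_cache(maxsize=None)
    def level(cls):
        parents = superclasses.get(cls, ())
        if not parents:
            return 0  # not a subclass of any other class
        return 1 + max(level(parent) for parent in parents)

    levels = {}
    for cls in superclasses:
        levels.setdefault(level(cls), []).append(cls)
    return [levels[s] for s in sorted(levels)]


# Hypothetical hierarchy for illustration only.
hierarchy = {
    "Person": [],
    "TallMan": ["Person"],
    "TallNotSlimMan": ["TallMan"],
    "TallNotFatMan": ["TallMan"],
}
partition = partition_by_specificity(hierarchy)
```

Defaults attached to a class in a later level would then take priority over defaults attached to classes in earlier levels.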
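For every $i$ from 1 to $n$, Jeffrey's rule gives:
$$\Pr(\psi(A_i)) = \Pr(\psi(A_i) \mid C_i) \cdot \Pr(C_i) + \Pr(\psi(A_i) \mid \neg C_i) \cdot \Pr(\neg C_i)$$
with $l_i \le \Pr(\psi(A_i) \mid C_i) \le u_i$, $\alpha_i \le \Pr(C_i) \le \beta_i$, and $\Pr(\neg C_i) = 1 - \Pr(C_i)$. On the
assumption that only $0 \le \Pr(\psi(A_i) \mid \neg C_i) \le 1$ is known, one obtains:
$$l_i \alpha_i \le \Pr(\psi(A_i)) \le u_i \alpha_i + (1 - \alpha_i)$$
That is, $O$ inherits $\psi(A_i)[l_i \alpha_i, u_i \alpha_i + (1 - \alpha_i)]$ from each $C_i$, which can be transformed
into $\psi(B_i)$, where $B_i$ is the expected fuzzy set of $A_i[l_i \alpha_i, u_i \alpha_i + (1 - \alpha_i)]$. We note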
that, in general, lower and upper bounds of $\Pr(\psi(A_i))$ also depend on $\beta_i$, but not
in this case when $\Pr(\psi(A_i) \mid \neg C_i)$ is unknown.
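The bounds obtained from Jeffrey's rule can be checked numerically. The sketch below is a hypothetical Python illustration (the name `inherited_support` is an assumption): given the property's support pair [l, u] in a class and the lower bound of the object's membership support in that class, it returns the support pair that the object inherits when nothing is known about the property outside the class.

```python
def inherited_support(l, u, membership_lower):
    """Support pair inherited for a property with support [l, u] in a class.

    `membership_lower` is the lower bound of the support for the object
    being a member of the class. The probability of the property given
    non-membership is assumed to be completely unknown, i.e. in [0, 1].
    """
    lower = l * membership_lower
    upper = u * membership_lower + (1.0 - membership_lower)
    return lower, upper


# A property with support [0.9, 1] in a class to which the object
# belongs with a lower membership bound of 0.833.
pair = inherited_support(0.9, 1.0, 0.833)
```

In this example the inherited pair is approximately [0.75, 1]. When the membership lower bound is 0, the result degenerates to [0, 1], meaning that nothing is inherited.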
Let $B_0$ be the expected fuzzy set of $A_0[l_0, u_0]$, and, for every $i$ from 1 to $n$, $A_i^* =
B_i \cap B_0$. Our notion of weak consistency is now introduced as follows. Let $D^*$ be
a subset of $D$. Without loss of generality, assume that $D^* = \{(\psi(A_i) \mid C_i)[l_i, u_i]
\mid 1 \le i \le m \le n\}$. Then $T \cup D^* \cup E$ is said to be w-consistent wrt $\psi$ (i.e., with respect
to $\psi$) iff $\bigcap_{i=1}^{m} A_i^*$ is a normal fuzzy set. For computing the preferred default
subsets of $D$, instead of considering the subsets of $D$ that are consistent with $T$
and $E$, one now considers those that are w-consistent wrt $\psi$ with $T$ and $E$.
As such, the preferred default subsets of D can be obtained in the two following
steps:
1.
2. Compare those consistent subsets to select the ones to which none of the others
is preferred, based on the priority ordering on $D$ defined above.
The multiple-inherited attribute for $O$ is then $\psi(A)$, with $A$ being the union of those
intersection fuzzy sets obtained from the preferred default subsets.
The reason for taking only the largest consistent subsets in Step 1 is that, as noted
previously, a consistent set of defaults is always preferred to its proper subsets.
For this step, we employ the algorithm in Dubois et al. (2000), which has the
computational complexity $O(n^2)$, and shows that the maximal number of the
consistent subsets is n. For the second step, as shown in Cao (2001), each
comparison takes time proportional to the sizes of the two involved subsets, while
the number of the comparisons is of the square order of the number of the
consistent subsets. Because the maximal size and the maximal number of the
consistent subsets are n, the computational complexity of this step is $O(n^3)$. Thus,
the overall computational complexity of the above multiple inheritance procedure
is $O(n^3)$.
The proposal for uncertain inheritance of attributes presented above can be
extended for uncertain inheritance of methods as follows. Let $C_1, C_2, \ldots, C_n$ be
the classes that contain methods with the same head predicate $\psi$. For each $i$ from
1 to $n$, let the set of those methods in $C_i$ be $\{\psi(A_{iq}) \leftarrow \varphi_{iq}\ [l_{iq1}, u_{iq1}][l_{iq2}, u_{iq2}]
\mid 1 \le q \le m_i\}$, and denote $\bigcup_{q=1}^{m_i}\{(\psi(A_{iq}) \mid \varphi_{iq}, C_i)[l_{iq1}, u_{iq1}], (\psi(A_{iq}) \mid \neg\varphi_{iq}, C_i)[l_{iq2},
u_{iq2}]\}$ by $S_i$.
We now consider each $S_i$ as an elementary default. Then one has the default
theory $(T, D)$, where $T = \{(C_i \mid C_j) \mid C_j$ is a subclass of $C_i$, $1 \le i, j \le n\}$, and
$D = \{S_i \mid 1 \le i \le n\}$. Also, suppose an evidence set $E = \{C_i[\alpha_i, \beta_i] \mid 1 \le i \le n\} \cup S_0$,
where $S_0 = \bigcup_{q=1}^{m_0}\{(\psi(A_{0q}) \mid \varphi_{0q})[l_{0q1}, u_{0q1}], (\psi(A_{0q}) \mid \neg\varphi_{0q})[l_{0q2}, u_{0q2}]\}$. Here
each $[\alpha_i, \beta_i]$ is a support for an object of discourse $O$ being a member of $C_i$, while
$S_0$ gives prior methods to $O$.
For a priority ordering $\prec$, $D$ is also partitioned into $D_0, D_1, \ldots, D_k$ in a similar way
as in the case of uncertain inheritance of attributes. That is, for every $i$ and $j$ from
1 to $n$, if $C_j$ is a subclass of $C_i$, $S_i \in D_s$, and $S_j \in D_t$, then $s < t$; $S \prec S^*$ iff $S \in D_s$,
$S^* \in D_t$, and $s < t$.
Suppose that $\psi(A) \leftarrow \varphi\ [l_1, u_1][l_2, u_2]$ is a method in class $C$ and $[\alpha, \beta]$ is a support
pair for an object of discourse $O$ being a member of $C$. Jeffrey's rule gives:
$$\Pr(\psi(A)) = \Pr(\psi(A) \mid \varphi, C) \cdot \Pr(\varphi, C) + \Pr(\psi(A) \mid \neg\varphi, C) \cdot \Pr(\neg\varphi, C) +
\Pr(\psi(A) \mid \varphi, \neg C) \cdot \Pr(\varphi, \neg C) + \Pr(\psi(A) \mid \neg\varphi, \neg C) \cdot \Pr(\neg\varphi, \neg C)$$
And, one obtains the lower bound $x$ and the upper bound $y$ for $\Pr(\psi(A))$ as proved
in Cao (2001) as follows:
1.
2.
Implementation of FRIL++
The probabilistic and fuzzy object-oriented model presented above provides a
formal basis for the design and implementation of FRIL++ (Baldwin et al., 2000;
Cao et al., 2002; Cao et al., 2001; Rossiter et al., 2000), the object-oriented
extension of FRIL (Baldwin et al., 1995), a PROLOG-like logic programming
language dealing with both probability and fuzziness. Like any other object-oriented system, a FRIL++ system is associated with a class hierarchy. Besides
particular classes for the domain of the system, there is a special class, namely,
FRIL++, which is common to all FRIL++ systems. The class FRIL++ is at the
top of a class hierarchy, containing all FRIL++ built-in predicates, which can be
inherited by all classes in a FRIL++ system.
As in Moss (1994), objects are also treated as classes situated at the bottom of
a FRIL++ class hierarchy, so that they can have their own properties, which may
not be defined in any class. The reason for this is that in reality, a class can
describe only a finite set of common properties of a group of objects, which may
have other properties. Furthermore, in FRIL++, objects can be changed not only
in the values of their properties, but also in their properties themselves, i.e., properties
can be added or deleted, as happens in the real world.
In McCabe (1992), object-oriented logic programs were translated into normal
logic programs of a logic programming system, such as Prolog, to be executed
by the theorem prover of the system. In order to employ FRIL's probabilistic and
fuzzy theorem prover, we follow this approach in the implementation of FRIL++
by writing a compiler, using FRIL to translate a FRIL++ source program into a
FRIL target program to be executed by FRIL.
Following McCabe (1992), the execution of an object-oriented logic program is
considered as having two phases, namely, the label phase and the body phase.
In the label phase, the system determines the actual classes with definitions for
the currently called property that are to be executed. Then, once those classes
are determined, the system enters the body phase to execute the property as
defined in the bodies of the classes.
Corresponding to these label phase and body phase are label clauses and body
clauses of a target program, which is a normal logic program, translated from an
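The definitions of the classes are written in the following FRIL++ program:
((public class Person extends (Universal))
  (constants
    /* Fuzzy sets given as value:membership breakpoints */
    (tall [0:0 1.5:0 1.8:1 2.5:1])
    (notSlim [0:1 16:1 22:0 28:1 45:1])
    (notFat [0:1 22:1 28:0 45:0]) )
  (properties
    ((height _ ))
    ((weight _ ))
    /* The body mass index B satisfies B * H * H = W */
    ((bodyMassIndex B)
      (height H)
      (times H H H2)
      (weight W)
      (times B H2 W))
    /* Constructor: sets the height and weight of a new person */
    ((Person H W)
      (setprop ((height H)) )
      (setprop ((weight W)) )) ))
((public class TallMan extends (Person))
  (properties
    /* A tall man is handsome with support pair [.9, 1] */
    ((handsome)) : (.9 1)
    /* Class recognition: a person whose height matches tall */
    ((isa TallMan)
      (height H)
      (match tall H)) ))
((public class TallNotSlimMan extends (TallMan))
  (properties
    /* A tall, not slim man is handsome with support pair [0, .5] */
    ((handsome)) : (0 .5)
    /* Class recognition: a tall man whose body mass index matches notSlim */
    ((isa TallNotSlimMan)
      (isa TallMan)
      (bodyMassIndex B)
      (match notSlim B)) ))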
The property main in the class MAINCLASS provides the entry point for executing
a FRIL++ program. In this example, John is created as a person of height 1.75
and weight 70, and a support pair for him being a handsome man is computed by
the FRIL++ built-in support query qs. Similarly, Bill is created as a person of
height 1.75 and weight 85, and a support pair for him being a handsome man is
computed.
As such, John is a member of TALLMAN and TALLNOTSLIMMAN with the support
pairs [.833, 1] and [.119, 1], respectively, and thus inherits handsome[.75, 1]
from TALLMAN and handsome[0, .941] from TALLNOTSLIMMAN. So the support
pair for John being handsome is $[.75, 1] \cap [0, .941] = [.75, .941]$.
Meanwhile, Bill is a member of TALLMAN and TALLNOTSLIMMAN with the support
pairs [.833, 1] and [.799, 1], respectively, and thus inherits handsome[.75, 1]
from TALLMAN and handsome[0, .601] from TALLNOTSLIMMAN. In this case,
because $[.75, 1] \cap [0, .601] = \emptyset$ and ((handsome)) : (0 .5) in TALLNOTSLIMMAN
is assumed to have a higher priority than ((handsome)) : (.9 1) in TALLMAN, the
support pair for Bill being handsome is [0, .601], using default reasoning.
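The figures in this example can be reproduced with a short calculation. The Python below is a hypothetical re-computation and not how FRIL++ itself evaluates the query: the function names are assumptions, fuzzy sets are assumed to be piecewise linear between their breakpoints, and the conjunction in the TallNotSlimMan recognition rule is assumed to multiply the lower bounds, which is the reading that matches the support pairs quoted above.

```python
def membership(points, x):
    """Linear interpolation over sorted (value, degree) breakpoints."""
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


TALL = [(0, 0), (1.5, 0), (1.8, 1), (2.5, 1)]
NOT_SLIM = [(0, 1), (16, 1), (22, 0), (28, 1), (45, 1)]


def inherit(l, u, alpha):
    """Inherited support pair, given membership lower bound alpha."""
    return l * alpha, u * alpha + (1 - alpha)


def handsome_support(height, weight):
    tall_man = membership(TALL, height)  # lower bound for isa TallMan
    bmi = weight / height ** 2
    # Assumed product conjunction of (isa TallMan) and (match notSlim B).
    tall_not_slim = tall_man * membership(NOT_SLIM, bmi)
    general = inherit(0.9, 1.0, tall_man)        # from TallMan
    specific = inherit(0.0, 0.5, tall_not_slim)  # from TallNotSlimMan
    low = max(general[0], specific[0])
    high = min(general[1], specific[1])
    if low <= high:
        return low, high  # the two inherited pairs are compatible
    return specific       # conflict: the more specific class wins


john = handsome_support(1.75, 70)
bill = handsome_support(1.75, 85)
```

Under these assumptions John's pair comes out close to [.75, .941] and Bill's close to [0, .601]; the small differences in the third decimal place arise because the text rounds the intermediate membership supports before combining them.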
[Figure: class diagram; recoverable labels: KNOWLEDGEBASE (Content, Input/Output, Edit, Query), TESTER (Testing Parameters, Testing Method), RELATIONALKB, DATATABLE, RULEBASE, DEDUCTIVEKB, DECISIONTREE, GRAPHKB, CONCEPTUALGRAPH, DATABROWSER, BNBASED, DEDUCER, NNBASED, ABDUCER, SVMBASED, FUZZYID3]
The face problem is to learn which faces are male and which faces are female,
based on measurement of 18 attributes of human faces. In this example, there
are 138 training instances and 30 testing instances. The attribute domains are
partitioned into 20 equal triangular fuzzy sets with an overlapping degree of 0.5.
The obtained accuracy is 83.33%. The following FRIL++ code shows the
structures and main properties of the above-mentioned classes of the data
browser, and the main class for running the ellipse and face examples:
((public class DataTable extends (RelationalKB))
  (public (parts
    /* A data table is associated with an attribute schema of class
       AttributeSchema */
    (schema AttributeSchema) ))
  (private (properties
    /* The content of a data table is a list of instances, each of which
       corresponds to a row in the table. Each instance is an object of class
       Instance */
    ((instance _rowIndex _instObj)) ))
  (public (properties
    /* The number of rows of a data table */
    ((num_row _naturalNumber))
    /* To get an instance of a data table */
    ((get_instance INSTANCE)
      . )
    /* To display a data table */
    ((display)
      . )
    /* Constructor constructs a data table from a data file */
    ((DataTable DATA_FILE)
      . ) )))
((public class RuleBase extends (DeductiveKB))
  (public (parts
    /* A rule base is associated with an attribute schema of class
       AttributeSchema */
    (schema AttributeSchema) ))
  (private (properties
    /* The content of a rule base is a list of rules. Each rule is an object of
       class Rule */
    ((rule _index _ruleObj)) ))
  (public (properties
    /* The number of rules in a rule base */
    ((num_rule _naturalNumber))
    /* To get a rule in a rule base */
    ((get_rule RULE)
      . )
    /* To display a rule base */
    ((display)
      . ) )))
((public class DataBrowser extends (FlBased))
  (public (properties
    /* To induce a rule base from a data table, given one output (or
       categorizing) attribute and a list of input attributes */
    ((induce DATA_TABLE (OUT_ATTR | IN_ATTR_LIST)
             RULE_BASE)
      . ) )))
((public class Tester extends (Universal))
  (public (properties
    /* To test a rule base on a data table with respect to a given output
       attribute */
    ((test RULE_BASE DATA_TABLE OUT_ATTR)
      . ) )))
((public class MainClass extends (Universal))
  (public (properties
    ((main)
      /* To create a data browser */
evidence to the case where belief and evidence are imprecise, expressed by
subintervals of [0, 1].
The problem of user recognition centers on the temporal aspect of user behavior.
We have some set of known user types $\{U_1, \ldots, U_n\}$, the behaviors of which we
know and to which we provide a corresponding set of services. An unknown user
$u$ at time $t$ behaves in the fashion $b_t$, where behavior is commonly the outcome
of some crisp or fuzzy choice, such as whether or not to buy expensive wine. We
wish to determine the similarity of $u$ to each of $\{U_1, \ldots, U_n\}$ in order to provide the
appropriate service to $u$ at time $t$. We must repeat this process as $t$ increases.
In an object-oriented environment, we construct a hierarchy of $n$ user classes,
$\{C_1, \ldots, C_n\}$, and we try to determine the support $S_t(u \in C_m)$ for user $u$ belonging
to user class $C_m$ at time $t$. This support is some function $f$ of the current behavior
$b_t$ and the history of behaviors $\{b_1, \ldots, b_{t-1}\}$. This is shown more generally in
Equation 1.
$$S_t(u \in C_m) = f(\{b_1, \ldots, b_t\}) \qquad (1)$$
We can solve this problem at time $t$ if we have the whole behavior series up to
$t$. Unfortunately, at time $t + 1$, we will have to do the whole calculation again.
Where $t$ is very large, the storage of the whole behavior series and the cost of
the support calculation may be too expensive. An alternative approach is to view
the support $S_t(u \in C_m)$ as some belief in the statement "user $u$ belongs to class
$C_m$"; this belief is updated whenever a new behavior is encountered. This belief
updating approach is more economical in space, because the whole behavior
series no longer needs to be stored. In computation, this approach is more
efficient, because we now must calculate some function $g$ of just the previous
support $S_{t-1}(u \in C_m)$ and the latest behavior $b_t$. This belief updating approach is shown
more generally in Equation 2.
$$S_t(u \in C_m) = g(S_{t-1}(u \in C_m), b_t) \qquad (2)$$
In this section, we examine the case where belief is represented by a support pair,
which is a subinterval of [0, 1].
[Figure: class diagram; recoverable labels: COOKIEEATER, CAKEEATER, SWEETFOOD, CANDY, COOKIE, CAKE]
lots of candy most of the time is true for eight or more cases out of every 10
candy-eaters, then we can assign an interval [0.8, 1] to the conditional probability
Pr(eats | lots of candy). This approach gives us the following FRIL++ class
definition for the class CandyEater:
((public class CandyEater extends (Consumer) )
(public (properties
((eats X)
(X.isa Candy)
(X.quantity lots )) : (0.8 1) )))
Now consider a new food consumer u who makes a decision whether or not to
eat food x and we wish to determine u's membership to the classes CANDYEATER,
COOKIEEATER, and CAKEEATER. The only information we have is the decision that
u made with respect to eating food x. We can determine memberships by
comparing u's decision with the decision that would be made by a prototypical
member of each of the classes CANDYEATER, COOKIEEATER, and CAKEEATER,
given food x. Food x may be an uncertain member of any or all of the classes
CANDY, COOKIE, and CAKE. For example, if x is a sweet iced biscuit, then x is
clearly a member of the class COOKIE but may also have nonzero membership to
the class CANDY.
The remainder of this section is concerned with the case where we wish to
update u's membership to the classes CANDYEATER, COOKIEEATER, and CAKEEATER,
as u chooses whether or not to eat each item of food in the ordered stream
$x_1, \ldots, x_n$.
$$S_{n+1} = \frac{n S_n + s(x_{n+1})}{n + 1} \qquad (3)$$
This approach is notable in its inflexibility with regard to the weight of impact of
new evidence. That is, new evidence always has a weight 1/(n + 1), and current
belief has a weight n/(n + 1). A more flexible generalization that can be used to
give a higher or lower weighting to new evidence is shown in Equation 4:
$$S_{n+1} = \frac{n^{\lambda} S_n + n^{1-\lambda} s(x_{n+1})}{n^{\lambda} + n^{1-\lambda}} \qquad (4)$$
where $\lambda$ would typically lie in the interval [0, 1]. If $\lambda = 1$, we have Equation 3,
where current belief is n times as important as new evidence. If $\lambda = 0$, we have
an expression that weights new evidence n times as important as current belief.
This flexibility may be important in cases where we know that users change their
behavior often and must therefore be reclassified quickly.
The advantage of the FILUM approach is its simplicity. It also updates support
where evidence is presented as either a support pair or a point value. Disadvantages include the inflexibility of the model and the large primacy bias.
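The weighted update can be sketched in a few lines. The Python below is a hypothetical illustration for point-valued belief only (the name `filum_update` and the evidence stream are assumptions); for a support pair, the same update would presumably be applied to each bound in turn.

```python
def filum_update(belief, evidence, n, lam=1.0):
    """One update step after n earlier pieces of evidence.

    lam = 1 gives the running mean: current belief weighs n times as
    much as the new evidence. lam = 0 reverses the weighting.
    """
    weight_old = n ** lam
    weight_new = n ** (1.0 - lam)
    return (weight_old * belief + weight_new * evidence) / (weight_old + weight_new)


# With lam = 1 the incremental update reproduces the batch mean
# without storing the whole evidence series.
stream = [1.0, 0.0, 1.0, 1.0]  # hypothetical evidence values
belief = 0.0
for n, evidence in enumerate(stream):
    belief = filum_update(belief, evidence, n)

slow_to_change = filum_update(0.5, 1.0, 3, lam=1.0)
quick_to_change = filum_update(0.5, 1.0, 3, lam=0.0)
```

After three earlier observations, the same new evidence moves a belief of 0.5 to 0.625 when the weighting exponent is 1 and to 0.875 when it is 0, which illustrates how a smaller exponent lets a user be reclassified more quickly.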
the negative evidence e- has two differing effects depending on how large the
belief was before e- was presented. Likewise, there are two differing effects
from the same positive evidence e+.
The anchor and adjustment belief revision model by Hogarth and Einhorn (1992)
updates a belief given new evidence through two processes. Equation 5a shows
how belief Sk is updated given new negative evidence. Equation 5b shows how
the same belief Sk is updated given new positive evidence.
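$$S_k = S_{k-1} + \alpha S_{k-1}(s(x_k) - R) \quad \text{for } s(x_k) \le R \qquad (5a)$$
$$S_k = S_{k-1} + \beta (1 - S_{k-1})(s(x_k) - R) \quad \text{for } s(x_k) > R \qquad (5b)$$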
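$$S^-_k = S^-_{k-1} + \alpha S^-_{k-1}(s^-(x_k) - R^-) \quad \text{for } s^-(x_k) \le R^- \qquad (6a)$$
$$S^-_k = S^-_{k-1} + \beta (1 - S^-_{k-1})(s^-(x_k) - R^-) \quad \text{for } s^-(x_k) > R^- \qquad (6b)$$
$$S^+_k = S^+_{k-1} + \alpha S^+_{k-1}(s^+(x_k) - R^+) \quad \text{for } s^+(x_k) \le R^+ \qquad (6c)$$
$$S^+_k = S^+_{k-1} + \beta (1 - S^+_{k-1})(s^+(x_k) - R^+) \quad \text{for } s^+(x_k) > R^+ \qquad (6d)$$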
Note that $R^-$ is a reference point for determining if the lower bound of the
presented evidence is positive or negative with respect to the lower bound of
belief, and $R^+$ is the corresponding reference point for the upper bound of belief.
Here, we choose $R^- = S^-_{k-1}$ and $R^+ = S^+_{k-1}$, where $0 \le \alpha \le 1$ and $0 \le \beta \le 1$.
Figure 7 shows the order effects of this interval belief updating model. The
precise effects of negative evidence $e^-$ and positive evidence $e^+$ are determined
by $\alpha$ and $\beta$, respectively. The effect of new evidence is dependent on the most
recent belief only, and not on t. This is a known characteristic of the anchor and
adjustment model. This recency behavior contrasts with the primacy bias of the
FILUM approach.
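A minimal sketch of the interval update follows, assuming the standard anchor and adjustment form applied separately to each bound, with the reference points set to the previous lower and upper bounds of belief. The function names `adjust` and `update_interval` are hypothetical, and the sketch makes no attempt to repair an interval whose bounds cross after an update.

```python
def adjust(previous, evidence, reference, alpha, beta):
    """Anchor and adjustment step for a single bound.

    Evidence at or below the reference point counts as negative and is
    scaled by alpha; evidence above it counts as positive and is scaled
    by beta.
    """
    if evidence <= reference:
        return previous + alpha * previous * (evidence - reference)
    return previous + beta * (1.0 - previous) * (evidence - reference)


def update_interval(belief, evidence, alpha=0.3, beta=0.3):
    """Update a support pair (lower, upper) with interval evidence."""
    lower, upper = belief
    ev_lower, ev_upper = evidence
    # Reference points are assumed to be the previous bounds of belief.
    new_lower = adjust(lower, ev_lower, lower, alpha, beta)
    new_upper = adjust(upper, ev_upper, upper, alpha, beta)
    return new_lower, new_upper


after_negative = update_interval((0.5, 0.8), (0.2, 0.4))
after_positive = update_interval((0.5, 0.8), (0.9, 1.0))
```

Because each step depends only on the most recent belief and the new evidence, no history of behaviors has to be stored.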
This new interval version of Hogarth and Einhorn's belief updating model has
a number of advantages over the FILUM method. Recency characteristics allow
the anchor and adjustment model to reclassify users quickly. The order effects
of this model are related to human behavior, and this seems to be an important
consideration when we are recognizing human users. In addition, this method
allows us to control the effects of positive and negative evidence separately. This
last feature may be especially important in medical user modeling applications,
[Figure: class diagram; recoverable labels: UNCOOPERATIVE, TITFORTAT, RANDOM, RESPD]
[Table: behavior classes with a Description column (descriptions not recoverable): Cooperative, Uncooperative, Tit-for-tat, Random, Respd]
cooperate, etc). From the past history of each player, and using the techniques
described earlier (with $\alpha = \beta = 0.3$), they are classified into the five behavior
classes. The winning class is taken as the class in which minimum membership
(i.e., the lower bound of the membership interval) is greatest. If the winning class
matches the actual class in Table 2, then the classification is recorded as a
success.
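The selection of the winning class can be sketched as follows. This is a hypothetical Python illustration: the function name `winning_class` and the membership intervals are made up for the example.

```python
def winning_class(supports):
    """Return the class whose membership interval has the greatest lower bound.

    `supports` maps each class name to a (lower, upper) support pair.
    """
    return max(supports, key=lambda name: supports[name][0])


# Hypothetical membership intervals for one player.
supports = {
    "Cooperative": (0.10, 0.60),
    "Uncooperative": (0.05, 0.90),
    "TitForTat": (0.35, 0.70),
    "Random": (0.30, 0.95),
    "Respd": (0.20, 0.40),
}
winner = winning_class(supports)
success = winner == "TitForTat"  # compare with the player's actual class
```

Ranking by the lower bound is a cautious choice: a class with a wide interval, such as Random in this example, does not win merely because its upper bound is high.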
To recreate the situation where user behavior changes, after 60 rounds, the
behaviors of all 10 prisoners are changed, as shown in the third column of
Table 2. After this point, the game is continued for 15 rounds. We compare
classification results using the interval anchor and adjustment belief updating
method with the FILUM method described in Martin (2000). The whole process
is repeated five times, and the mean of the results is taken.

Table 2
Prisoner   Behavior          Behavior after 60 rounds
1          Random            Cooperative
2          Random            Uncooperative
3          Cooperative       Tit-for-tat
4          Cooperative       Respd
5          Uncooperative     Random
6          Uncooperative     Respd
7          Tit-for-tat       Cooperative
8          Tit-for-tat       Random
9          Respd             Tit-for-tat
10         Respd             Uncooperative

[Table 3: classification results of the two methods before and after the 60th round; recoverable values: 63.6%, 57.3%, 63.3%, 22.2%]
As can be seen from Table 3, classification results before the 60th round (the
point of behavior change) are similar between the two methods. After the 60th
round, however, there is a marked difference in the results, with a large fall in
the performance of the FILUM approach. These results show the primacy
effects present in the FILUM method and the recency effects characteristic of
the interval anchor and adjustment approach. We highlight these effects as
important points to consider when implementing user recognition in any specific
user modeling application.
Results from the iterated prisoner's dilemma test bed suggest that the recency
bias of the anchor and adjustment approach is better suited to the problem of
object-oriented user modeling, where the behaviors of users change over time.
Future work in this area will consider cases where user behavior is
represented by fuzzy sets, for example, "a user buys a large number of
inexpensive items." More investigation is also needed to determine ranges for
the values of R- and R+ in the interval anchor and adjustment approach.
process as well as the final model can be based upon a calculus of linguistic
labels.
The goal of modeling with words is the generation of linguistic models from a
combination of data and background information. Commonly, the background
information can be elicited from domain-specific experts in the form of linguistic
rules. An important feature of modeling with words is that the models generated
must in some way be insightful. By insightful, we mean that some useful
information can be gained by examining the model without having to apply the
model to any classification or prediction problem.
Modeling with words uses simple linguistic variables and sentences to build
models that can be interpreted by all, including those with no technical training.
This is in contrast with many conventional machine-learning paradigms, where
insight into the learned model is restricted by the representation, which is
typically numeric (e.g., x = 0.98), comparative (e.g., n < p), or algebraic
(e.g., z = an + bn²). These representations are comprehensible to those experts
trained to understand them, but are frequently incomprehensible to nonexperts.
We might say that numeric, comparative, and algebraic representations result in
"black box" models, which require some degree of technical skill to interpret.
With linguistic models, on the other hand, some of the blackness of the model is
cleared, and the goal is to produce "glass box," or transparent, models.
A typical approach to modeling with words involves modeling individual words
as information granules, as proposed by Zadeh (1996). The modeling of granular
information can also be modulated by studies into computing with perceptions
(Zadeh, 1999). The resulting granules correspond to a vocabulary of words that
can then be used for modeling with words. Unfortunately, the restrictions of
granular computation and computing with perceptions result in a vocabulary that
is also restricted.
As a result of this restricted vocabulary, the models generated are not, in fact,
perfectly transparent. Rather, the models are "grey," "murky," or "foggy" in
nature. Clearly, this is less than ideal and may result in some reduction in model
comprehension. Even so, we can say that a murky insight into a model is better
than no insight. In other words, we will accept the restriction imposed by this
restricted vocabulary in order to at least gain some insight into the linguistic
model, and hence, the problem domain.
The restricted vocabulary described above enables us to create simple linguistic
sentences such as "the tree is tall." In the real world, however, humans find it
natural to classify real-world concepts into taxonomical hierarchies, or at least
into a set of related ontological specifications. We therefore propose extending
modeling with words with taxonomical (and, hence, ontological) information.
Consider the simple hierarchy of trees in Figure 9. In our extended modeling with
words framework, we can now create slightly more sophisticated linguistic
[Figure 9: simple hierarchy of trees, including the class EVERGREEN]
Clear Representation
A hierarchical representation of classes reflects our natural taxonomic view of
the real world. Take, for example, the scientific classification of all living
organisms. The top-most superclass is called ORGANISM; the next level in the
hierarchy defines the domains EUKARYA, EUBACTERIA, and ARCHAEA; the next
level defines the kingdoms (e.g., ANIMALIA); the next defines the phyla (e.g.,
VERTEBRATE); and so on until we reach the species MAN. We apply class
hierarchies to all parts of our lives, even when we do not have specific scientific
knowledge such as in the previous example. For example, we may classify trees
into LARGETREE and SMALLTREE. We may then split LARGETREE into QUITELARGETREE
and VERYLARGETREE. The important thing to see here is that the linguistic terms
commonly used in computing with words ("large," "small," "very large," etc.) may
also be integral to class descriptions.
Conclusions
We introduced a logic-based probabilistic and fuzzy object-oriented model in
which each class property is represented by a fuzzy rule weighted by probability
lower and upper bounds. We then proposed probabilistic default reasoning on
fuzzy events as a suitable approach to uncertain property inheritance and class
recognition problems. The intractable steps of general probabilistic default
reasoning are reduced to polynomial-time ones, using Jeffrey's rule and its
inverse for a weaker notion of consistency and for local inference.
On the formal basis of this model, we designed and implemented FRIL++ as the
object-oriented extension of FRIL, a logic programming language dealing with
both probability and fuzziness. We presented the basic features of FRIL++ with
References
Axelrod, R. (1985). The evolution of cooperation. New York: Basic Books.
Baldwin, J. F., & Martin, T. P. (1995). Refining knowledge from uncertain relations: A fuzzy data browser based on fuzzy object-oriented programming in FRIL. In Proceedings of the Fourth IEEE International Conference on Fuzzy Systems (pp. 27–34).
Baldwin, J. F., Lawry, J., & Martin, T. P. (1996). A note on probability/possibility consistency for fuzzy events. In Proceedings of the Sixth International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 521–526).
Baldwin, J. F., Lawry, J., & Martin, T. P. (1996). Efficient algorithms for semantic unification. In Proceedings of the Sixth International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 527–532).
Baldwin, J. F., Lawry, J., & Martin, T. P. (1998). The application of generalised fuzzy rules to machine learning and automated knowledge discovery. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6, 459–487.
Baldwin, J. F., Martin, T. P., & Pilsworth, B. W. (1995). FRIL: Fuzzy and evidential reasoning in artificial intelligence. Hertfordshire, United Kingdom: Research Studies Press.
Baldwin, J. F., Cao, T. H., Martin, T. P., & Rossiter, J. M. (2000). Towards soft computing object-oriented logic programming. In Proceedings of the Ninth IEEE International Conference on Fuzzy Systems (pp. 768–773).
Blanco, I., Marín, N., Pons, O., & Vila, M. A. (2001). Softening the object-oriented database model: Imprecision, uncertainty & fuzzy types. In Proceedings of the First International Joint Conference of the International Fuzzy Systems Association and the North American Fuzzy Information Processing Society (pp. 2323–2328).
Bordogna, G., Pasi, G., & Lucarella, D. (1999). A fuzzy object-oriented data model managing vague and uncertain information. International Journal of Intelligent Systems, 14, 623–651.
Cao, T. H. (2001). Uncertain inheritance and recognition as probabilistic default reasoning. International Journal of Intelligent Systems, 16, 781–803.
Cao, T. H., & Creasy, P. N. (2000). Fuzzy types: A framework for handling uncertainty about types of objects. International Journal of Approximate Reasoning, 25, 217–253.
Cao, T. H., Rossiter, J. M., Martin, T. P., & Baldwin, J. F. (2002). On the implementation of FRIL++ for object-oriented logic programming with uncertainty and fuzziness. In B. Bouchon-Meunier et al. (Eds.), Technologies for constructing intelligent systems, studies in fuzziness and soft computing (Vol. 90, pp. 393–406). Heidelberg: Physica-Verlag.
Cao, T. H., Rossiter, J. M., Martin, T. P., & Baldwin, J. F. (2001). Inheritance and recognition in uncertain and fuzzy object-oriented models. In Proceedings of the First International Joint Conference of the International Fuzzy Systems Association and the North American Fuzzy Information Processing Society (pp. 2317–2322).
Cross, V. V. (2003). Defining fuzzy relationships in object models: Abstraction and interpretation. International Journal of Fuzzy Sets and Systems, 140, 5–27.
De Tré, G. (2001). An algebra for querying a constraint defined fuzzy and uncertain object-oriented database model. In Proceedings of the First International Joint Conference of the International Fuzzy Systems Association and the North American Fuzzy Information Processing Society (pp. 2138–2143).
Dubitzky, W., Büchner, A. G., Hughes, J. G., & Bell, D. A. (1999). Towards concept-oriented databases. Data & Knowledge Engineering, 30, 23–55.
Dubois, D., Fargier, H., & Prade, H. (2000). Multiple-sources information fusion: A practical inconsistency-tolerant approach. In Proceedings of the Eighth International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 1047–1054).
Einhorn, H. J., & Hogarth, R. M. (1985). Ambiguity and uncertainty in probabilistic inference. Psychological Review, 93, 433–461.
Eiter, T., Lu, J. J., Lukasiewicz, T., & Subrahmanian, V. S. (2001). Probabilistic object bases. ACM Transactions on Database Systems, 26, 264–312.
Gaines, B. R. (1978). Fuzzy and probability uncertainty logics. Journal of Information and Control, 38, 154–169.
Geffner, H., & Pearl, J. (1992). Conditional entailment: Bridging two approaches to default reasoning. Artificial Intelligence, 53, 209–244.
George, R., Buckles, B. P., & Petry, F. E. (1993). Modelling class hierarchies in the fuzzy object-oriented data model. International Journal for Fuzzy Sets and Systems, 60, 259–272.
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24, 1–55.
Itzkovich, I., & Hawkes, L. W. (1994). Fuzzy extension of inheritance hierarchies. International Journal for Fuzzy Sets and Systems, 62, 143–153.
Jeffrey, R. (1965). The logic of decision. New York: McGraw-Hill.
Kohavi, R., Sommerfield, D., & Dougherty, J. (1996). Data mining using MLC++: A machine learning library in C++. In Tools with Artificial Intelligence (pp. 234–245). Washington: IEEE Computer Society Press.
Lukasiewicz, T. (2000). Probabilistic default reasoning with conditional constraints. In Proceedings of the Eighth International Workshop on Non-Monotonic Reasoning, Special Session on Uncertainty Frameworks in Non-Monotonic Reasoning.
Martin, T. P. (2000). Incremental learning of user models: An experimental testbed. In Proceedings of the Eighth International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 1419–1426).
McCabe, F. G. (1992). Logic and objects. New York: Prentice Hall.
Mitchell, T. M. (1997). Machine learning. New York: McGraw-Hill.
Moss, C. (1994). Prolog++: The power of object-oriented and logic programming. Reading, MA: Addison-Wesley.
Rossazza, J.-P., Dubois, D., & Prade, H. (1997). A hierarchical model of fuzzy classes. In R. De Caluwe (Ed.), Fuzzy and uncertain object-oriented databases: Concepts and models (pp. 21–61). Singapore: World Scientific.
Rossiter, J. M., Cao, T. H., Martin, T. P., & Baldwin, J. F. (2000). A FRIL++ compiler for soft computing object-oriented logic programming. In Proceedings of the Sixth International Conference on Soft Computing (pp. 340–345).
Rossiter, J. M., Cao, T. H., Martin, T. P., & Baldwin, J. F. (2001a). User recognition in uncertain object-oriented user modelling. In Proceedings of the 10th IEEE International Conference on Fuzzy Systems.
Rossiter, J. M., Cao, T. H., Martin, T. P., & Baldwin, J. F. (2001b). Object-oriented modelling with words. In Proceedings of the 10th IEEE International Conference on Fuzzy Systems, Workshop on Modelling with Words.
Shastri, L. (1989). Default reasoning in semantic networks: A formalization of recognition and inheritance. Artificial Intelligence, 39, 283–355.
Stroustrup, B. (1997). The C++ programming language (3rd ed.). Reading, MA: Addison-Wesley.
Van Gyseghem, N., & De Caluwe, R. (1997). The UFO database model: Dealing with imperfect information. In R. De Caluwe (Ed.), Fuzzy and uncertain object-oriented databases: Concepts and models (pp. 123–185). Singapore: World Scientific.
Yazici, A., & George, R. (1999). Fuzzy database modelling. Studies in fuzziness and soft computing (Vol. 26). Heidelberg: Physica-Verlag.
Zadeh, L. A. (1996). Fuzzy logic = computing with words. IEEE Transactions on Fuzzy Systems, 4, 103–111.
Zadeh, L. A. (1999). From computing with numbers to computing with words: From manipulation of measurements to manipulation of perceptions. IEEE Transactions on Circuits and Systems, 45, 105–119.
SECTION II
Chapter V
Fuzzy Information Modeling with the UML
Zongmin Ma
Université de Sherbrooke, Canada
Abstract
Computer applications in nontraditional areas have placed new requirements on
conceptual data modeling. Several conceptual data models have been proposed
as tools for database design. However, information in real-world
applications is often vague or ambiguous, and little research has so far
been done on modeling imprecision and uncertainty in conceptual data
models. The UML (Unified Modeling Language) is a set of object-oriented
modeling notations and is a standard of the Object Management
Group (OMG). It can be applied in many areas of software engineering
and knowledge engineering. Increasingly, the UML is being applied to
data modeling. In this chapter, different levels of fuzziness are introduced
into the class of the UML, and the corresponding graphical representations
are given. The class diagrams of the UML can thereby model fuzzy
information.
Introduction
One of the major areas of research in databases has been the continuous effort
to enrich existing database models with a more extensive collection of semantic
concepts. Databases have developed from hierarchical and
network databases to relational databases. As computer technology moves into
nontraditional applications such as CAD/CAM, knowledge-based systems,
multimedia, and Internet systems, many feel the limitations of relational
databases in these data-intensive application systems. Therefore, some
nontraditional data models, such as the entity-relationship (ER) data model
(Chen, 1976), the object-oriented data model, and the logic data model, have
been proposed as tools for modeling databases.
One of the semantic needs not adequately addressed by traditional models is that
of uncertainty. Traditional models assume the database model to be a correct
reflection of the world being captured and assume that the data stored is known,
accurate, and complete. It is rarely the case in real life that all or most of these
assumptions are met. Different models have been proposed to handle different
categories of data quality (or lack thereof). Five basic kinds of imperfection have
been identified: inconsistency, imprecision, vagueness, uncertainty, and ambiguity (Bosc & Prade, 1993). Inconsistency is a kind of semantic conflict when some
aspect of the real world is irreconcilably represented more than once in a
database or in several different databases. Inconsistency has traditionally been
applied to data. In the context of multidatabases, where multiple sources are
integrated, attention was given to inconsistency at the modeling level. Imprecision and vagueness are two closely related qualities. They both relate to the
context in which the value attributed to an attribute (or the interpretation assigned
to a concept) is known to come from a given interval (or set of values) but we
do not know exactly which one to choose at present. In general, vague
information is represented by linguistic values. Uncertainty refers to those
situations in which we can apportion some, but not all, of our belief to the fact
that an attribute takes a given value or a group of values. Random uncertainty,
which is described using probability theory, is not considered in this chapter. Finally,
ambiguity means that some elements of the model lack complete semantics,
leading to several possible interpretations. Generally, several different kinds of
imperfection coexist with respect to the same piece of information. A large
number of models have been proposed to handle uncertainty and vagueness.
Most of these models are based on the same paradigms. Vagueness and
uncertainty are generally modeled with fuzzy sets and possibility theory (Zadeh,
1965, 1978). Many of the existing approaches dealing with imprecision and
uncertainty are based on the theory of fuzzy sets. Fuzzy information has been
extensively investigated in the context of the relational model (Buckles & Petry,
1982; Ma, Zhang, & Ma, 1999; Prade & Testemale, 1984; Raju & Majumdar,
1988). Recent efforts have extended these results to object-oriented databases
by introducing the related notions of classes, generalization/specialization, and
inheritance (Bordogna, Pasi, & Lucarella, 1999; Cross, Caluwe, & Vangyseghem,
1997; Cross & Firat, 2000; Dubois, Prade, & Rossazza, 1991; George, Srikanth,
Petry, & Buckles, 1996; Gyseghem & Caluwe, 1998; Lee et al., 1999; Ma,
Zhang, & Ma, 2004; Marín, Vila, & Pons, 2000; Marín et al., 2003). However,
most of this research focuses on modeling uncertainty at the data level; fewer
results exist for uncertainty at the conceptual model level. This is
especially true for modeling uncertain information in object-oriented data
models.
The UML (Booch, Rumbaugh, & Jacobson, 1998; OMG, 2001) is a set of
object-oriented modeling notations that was standardized by the Object
Management Group (OMG). The power of
the UML can be applied to many areas of software engineering and knowledge
engineering (Mili, Shen, et al., 2001). The complete development of relational and
object-relational databases from business requirements can be described by the
UML. Databases have traditionally been described by notations called
entity-relationship (ER) diagrams, using a graphic representation that is similar
but not identical to that of the UML. Using the UML for database design has many
advantages over the traditional ER notations (Naiburg, 2000). The UML is based
largely upon the ER notations and can capture all information
that is captured in a traditional data model. The additional compartment in the
UML for methods or operations makes it possible to capture items such as triggers,
indexes, and the various types of constraints directly as part of the diagram.
Because this information is modeled rather than stored in tagged values, it is
visible on the modeling surface and is more easily communicated to everyone
involved. So, increasingly, the UML is being applied to data modeling (Ambler,
2000a, 2000b; Blaha & Premerlani, 1999; Naiburg, 2000). More recently, the
UML was used to model XML conceptually (Conrad, Scheffiner, & Freytag,
2000).
Note that while the UML reflects some of the best object-oriented modeling
experience available, it lacks some necessary semantics. One
such gap is the need to handle imprecise and uncertain
information. To our knowledge, fuzzy UML data modeling has not
been addressed in the literature, although imprecise and uncertain information
exists in knowledge engineering and database systems and has been studied
extensively. In this chapter, different levels of fuzziness are introduced into
the class in the UML, and the corresponding graphical representations are given.
The class diagrams of the UML can thereby model fuzzy information. The
contribution of this chapter is an object-oriented conceptual modeling
methodology for fuzzy information modeling.
The remainder of this chapter is organized as follows. The second section gives
basic knowledge concerning fuzzy set and possibility distribution theories as well
as knowledge of the UML class model. The fuzzy extension to class model in the
UML is presented in the third section. The fourth section discusses related work,
and the last section concludes this chapter.
Basic Knowledge
Fuzzy Set and Possibility Distribution
The concept of fuzzy sets was originally introduced by Zadeh (1965). Let $U$ be
a universe of discourse. A fuzzy value on $U$ can be characterised by a fuzzy set
$F$ in $U$. A membership function $\mu_F: U \to [0, 1]$ is defined for the fuzzy
set $F$, where $\mu_F(u)$, for each $u \in U$, denotes the degree of membership
of $u$ in the fuzzy set $F$. Thus, the fuzzy set $F$ is described as follows:
$$F = \{\mu_F(u_1)/u_1, \mu_F(u_2)/u_2, \ldots, \mu_F(u_n)/u_n\}$$
where the pair $\mu_F(u_i)/u_i$ represents the value $u_i$ and its membership
degree $\mu_F(u_i)$.
The membership function $\mu_F(u)$ can be interpreted as a measure of the
possibility that the value of variable $X$ is $u$. A fuzzy set is equivalently
represented by its associated possibility distribution $\pi_X$ (Zadeh, 1978):
$$\pi_X = \{\pi_X(u_1)/u_1, \pi_X(u_2)/u_2, \ldots, \pi_X(u_n)/u_n\}$$
Here, $\pi_X(u_i)$, $u_i \in U$, denotes the possibility that $u_i$ is true. Let
$\pi_X$ and $F$ be the possibility distribution representation and the fuzzy set
representation for a fuzzy value, respectively. It is apparent that
$\pi_X = \mu_F$ is true (Raju & Majumdar, 1988).
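As a rough illustration of these definitions, a discrete fuzzy set can be held in Python as a mapping from each value u to its membership degree, and the same mapping can be read as a possibility distribution. The helper names (`make_fuzzy_set`, `possibility`, `show`) and the example set "young" are assumptions made for this sketch only.

```python
def make_fuzzy_set(pairs):
    """Build a discrete fuzzy set {u: membership degree}, checking each degree is in [0, 1]."""
    fuzzy_set = dict(pairs)
    for u, degree in fuzzy_set.items():
        if not 0.0 <= degree <= 1.0:
            raise ValueError(f"membership degree of {u!r} must lie in [0, 1]")
    return fuzzy_set


def possibility(fuzzy_set, u):
    """Possibility that the variable takes value u; values not listed have degree 0."""
    return fuzzy_set.get(u, 0.0)


def show(fuzzy_set):
    """Render the set in the degree/value notation used in the text."""
    return "{" + ", ".join(f"{d}/{u}" for u, d in fuzzy_set.items()) + "}"


# Hypothetical fuzzy value "young" over a universe of ages.
young = make_fuzzy_set({20: 1.0, 25: 0.8, 30: 0.5, 35: 0.2})
print(show(young))  # {1.0/20, 0.8/25, 0.5/30, 0.2/35}
```

Reading the same dictionary through `possibility` reflects the statement that the possibility distribution and the membership function coincide.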
Classes
Being the descriptor for a set of objects with similar structure, behavior, and
relationships, a class represents a concept within the system being modeled.
Classes have data structure and behavior and relationships to other elements. A
class is drawn as a solid-outline rectangle with three compartments separated by
horizontal lines. The top name compartment holds the class name and other
general properties of the class (including stereotype); the middle list compartment holds a list of attributes; the bottom list compartment holds a list of
operations. Either or both of the attribute and operation compartments may be
suppressed. A separator line is not drawn for a missing compartment. If a
compartment is suppressed, no inference can be drawn about the presence or
absence of elements in it. Figure 1 shows a class.
Relationships
Another main structural component in the class diagram of the UML is the
relationship, which represents a connection between classes or class
instances. The UML supports a variety of relationships:
[Figures: relationship examples with the classes Engine, Interior, Chassis, Car, Truck, and Employee]
Fuzzy Class
Objects with the same properties are gathered into classes that are organized into
hierarchies. Theoretically, a class can be considered from two different viewpoints:
1. An extensional class, where the class is defined by the list of its object
instances
2.
A class can therefore be fuzzy for several reasons. First, some
objects are fuzzy and have similar properties; a class defined by these
objects may be fuzzy, and the objects belong to the class with a membership degree
in [0, 1]. Second, when a class is intensionally defined, the domain of an attribute
may be fuzzy, and a fuzzy class is formed. Third, a subclass produced from a
fuzzy class by specialization, and a superclass produced by generalization from
classes of which at least one is fuzzy, are also fuzzy.
Following in the footsteps of Zvieli and Chen (1986), we define three levels of
fuzziness. In the context of classes, they are defined as follows:
1. Fuzziness in the extent to which the class belongs to the data model, as well
as fuzziness in the content (in terms of attributes) of the class
2.
3.
In order to model the first level of fuzziness, i.e., an attribute or a class with
a degree of membership, the attribute or class name should be followed by the
words WITH mem DEGREE, where $0 \le mem \le 1$ indicates the
degree to which the attribute belongs to the class or the class belongs to the data
model (Gyseghem & Caluwe, 1998; Marín, Vila, & Pons, 2000). For example,
"Employee WITH 0.6 DEGREE" and "Office Number WITH 0.8 DEGREE" are a
class and an attribute with the first level of fuzziness, respectively. Generally, an
attribute or a class is not declared when its degree is 0. In addition, "WITH
1.0 DEGREE" can be omitted when the degree of an attribute or a class is 1.
Attribute values may also be fuzzy. In order to model this third
level of fuzziness, the keyword FUZZY is introduced and placed in front of the
attribute. At the second level of fuzziness, we must indicate the degree of
membership to which an instance of the class belongs to the class. For this
purpose, an additional attribute with domain [0, 1] is introduced into the class
to represent the instance membership degree. We denote
this special attribute by μ. In order to distinguish a class with the second
level of fuzziness, we draw it as a dashed-outline rectangle.
Figure 6 shows a fuzzy class Ph.D. student. Here, the attribute Age may take fuzzy
values; that is, its domain is fuzzy. Ph.D. students may or may not have
offices, so it is not known for sure whether the class Ph.D. student has the
attribute Office. We do know, however, that Ph.D. students have offices with high
possibility, say 0.8. The attribute Office therefore belongs to the class Ph.D.
student with uncertainty. This class has
fuzziness at the first level, and we use "WITH 0.8 DEGREE" to
describe this fuzziness in the class definition. In addition, we may not be able to
determine whether an object is an instance of the class, because the class is fuzzy.
An additional attribute is therefore introduced into the class for this purpose.
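The notation described above can be mimicked in a small Python sketch that renders the text labels of a fuzzy class. This is only an illustration: the names `Attribute`, `FuzzyClass`, `with_degree`, `instance_level`, and `outline` are hypothetical, and only the attributes Age and Office mentioned in the text are used.

```python
from dataclasses import dataclass, field


def with_degree(name, degree):
    """First level: append 'WITH mem DEGREE' unless the degree is 1."""
    return name if degree >= 1.0 else f"{name} WITH {degree} DEGREE"


@dataclass
class Attribute:
    name: str
    degree: float = 1.0          # first level: degree to which it belongs to the class
    fuzzy_domain: bool = False   # third level: the attribute may take fuzzy values

    def label(self):
        prefix = "FUZZY " if self.fuzzy_domain else ""
        return with_degree(prefix + self.name, self.degree)


@dataclass
class FuzzyClass:
    name: str
    degree: float = 1.0            # first level: degree to which the class belongs to the model
    instance_level: bool = False   # second level: instances carry a membership degree
    attributes: list = field(default_factory=list)

    @property
    def outline(self):
        """Classes with the second level of fuzziness are drawn with a dashed outline."""
        return "dashed" if self.instance_level else "solid"

    def labels(self):
        lines = [with_degree(self.name, self.degree)]
        # Attributes with degree 0 are not declared at all.
        lines += [a.label() for a in self.attributes if a.degree > 0]
        if self.instance_level:
            lines.append("μ")  # extra attribute with domain [0, 1]
        return lines


phd = FuzzyClass(
    name="Ph.D. student",
    instance_level=True,
    attributes=[Attribute("Age", fuzzy_domain=True), Attribute("Office", degree=0.8)],
)
print(phd.outline, phd.labels())
```

Keeping the three levels as separate fields makes it explicit that they are independent: a class may carry any combination of them.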
Fuzzy Generalization
The concept of subclassing is one of the basic building blocks of the object model.
A new class, called a subclass, is produced from another class, called a superclass,
by inheriting some attributes and methods of the superclass, overriding
some attributes and methods of the superclass, and defining some new attributes
and methods. Because a subclass is a specialization of the superclass, any
object belonging to the subclass must belong to the superclass. This
characteristic can be used to determine whether two classes have a
subclass-superclass relationship.
However, classes may be fuzzy. A class produced from a fuzzy class must be
fuzzy. If the former is still called the subclass and the latter the superclass, the
subclass-superclass relationship is fuzzy. In other words, a class is then a
subclass of another class with a membership degree in [0, 1]. Correspondingly, we
have the following method for determining a subclass-superclass relationship:
1. For any (fuzzy) object, the membership degree to which it belongs to the
subclass is less than or equal to the membership degree to which it belongs to
the superclass.
2.
The subclass is then a subclass of the superclass with a membership degree
that is the minimum of the membership degrees to which these objects belong
to the subclass.
Formally, let $A$ and $B$ be (fuzzy) classes and $\beta$ be a given threshold. We
say $B$ is a subclass of $A$ if
$$(\forall e)\,(\beta \le \mu_B(e) \le \mu_A(e))$$
The membership degree to which $B$ is a subclass of $A$ should be
$\min_{\mu_B(e) \ge \beta} \mu_B(e)$.
Here, $e$ is an object instance of $A$ and $B$ in the universe of discourse, and
$\mu_A(e)$ and $\mu_B(e)$ are the membership degrees of $e$ to $A$ and $B$,
respectively.
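A minimal Python sketch of this test follows. It rests on one interpretation that should be read as an assumption: objects whose membership degree in the subclass falls below the threshold are ignored, and the remaining objects must not belong to the subclass more strongly than to the superclass. The function name `subclass_degree`, the object names, and the membership values are hypothetical.

```python
def subclass_degree(mu_super, mu_sub, threshold):
    """Degree to which the subclass is a fuzzy subclass of the superclass, or None.

    mu_super and mu_sub map each object to its membership degree in the
    superclass and the subclass. Objects below the threshold in the subclass
    are ignored (an assumption of this sketch).
    """
    members = {e: d for e, d in mu_sub.items() if d >= threshold}
    if not members:
        return None
    for e, d in members.items():
        # The subclass degree must not exceed the superclass degree.
        if d > mu_super.get(e, 0.0):
            return None
    return min(members.values())


# Hypothetical membership degrees of three objects.
mu_youth = {"ann": 0.9, "bob": 0.7, "eve": 0.95}
mu_young_student = {"ann": 0.9, "bob": 0.6, "eve": 0.2}

print(subclass_degree(mu_youth, mu_young_student, threshold=0.5))  # 0.6
```

Here "eve" is dropped by the threshold, the two remaining objects satisfy the inequality, and the smallest remaining subclass degree, 0.6, is returned.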
Note, however, that the fuzzy generalization relationship above assumes that
classes A and B have only the second level of
fuzziness. It is possible that classes A and B are classes with a membership
degree, namely, with the first level of fuzziness. Assume that we have two
classes A and B as follows:
[Figure: fuzzy generalization example with the classes Young Student and Young Faculty]
classes with membership degree. These two classes can be generalized into
class Youth, a class with the second level of fuzziness.
Fuzzy Aggregation
An aggregation captures a whole-part relationship between an aggregate and a
constituent part. These constituent parts can exist independently. Therefore,
every instance of an aggregate can be projected into a set of instances of
constituent parts. Let $A$ be an aggregation of constituent parts $B_1, B_2, \ldots, B_n$.
For $e \in A$, the projection of $e$ to $B_i$ is denoted by $e{\downarrow}B_i$. Then we have
$(e{\downarrow}B_1) \in B_1$, $(e{\downarrow}B_2) \in B_2$, ..., $(e{\downarrow}B_n) \in B_n$.
A class aggregated from fuzzy constituent parts must be fuzzy. If the former is
still called aggregate, the aggregation is fuzzy. At this point, a class is an
aggregation of constituent parts with membership degree of [0, 1]. Correspondingly, we have the following method for determining a fuzzy aggregation
relationship:
For any (fuzzy) object, if the membership degree to which it belongs to the
aggregate is less than or equal to the membership degree to which its
projection to each constituent part belongs to the corresponding constituent
part, then the aggregate is an aggregation of the constituent parts. Its
membership degree is the minimum of the membership degrees to which the
projections of these objects to the constituent parts belong to the corresponding
constituent parts.
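The method above can be sketched in Java as follows. This is a minimal illustration, not the author's implementation: the names `PartMembership` and `FuzzyAggregationCheck` are hypothetical, each object is reduced to its membership degree in the aggregate plus the membership degrees of its projections, and no threshold is applied.

```java
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical record of one object: its degree in the aggregate and the degrees of its projections. */
final class PartMembership {
    final double muAggregate;   // degree to which the object belongs to the aggregate
    final double[] muParts;     // degrees to which its projections belong to each constituent part

    PartMembership(double muAggregate, double... muParts) {
        this.muAggregate = muAggregate;
        this.muParts = muParts;
    }
}

final class FuzzyAggregationCheck {
    /**
     * Returns the aggregation degree (minimum over all projection memberships),
     * or an empty result if some object violates the "less than or equal" condition.
     * An empty object list yields 1.0 (the condition holds vacuously); that choice is an assumption.
     */
    static OptionalDouble aggregationDegree(List<PartMembership> objects) {
        double degree = 1.0;
        for (PartMembership o : objects) {
            for (double muPart : o.muParts) {
                if (o.muAggregate > muPart) {
                    return OptionalDouble.empty();   // condition of the method is not met
                }
                degree = Math.min(degree, muPart);
            }
        }
        return OptionalDouble.of(degree);
    }
}
```

A caller would build one `PartMembership` per object of the aggregate and treat an empty result as "not a fuzzy aggregation under this method".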
Let A be a fuzzy aggregation of fuzzy class sets $B_1, B_2, \ldots, B_n$, with instance
membership degrees $\mu_A, \mu_{B_1}, \mu_{B_2}, \ldots, \mu_{B_n}$, respectively. Let $\beta$ be a given
threshold. Then,
Fuzzy Association
Figure 8. A fuzzy aggregation relationship (classes Old Car, Old Engine, Interior, and Chassis)
Two levels of fuzziness can be identified in the association relationship. The first
level of fuzziness means that the association relationship itself exists only fuzzily
between the two associated classes; that is, the relationship occurs with a degree of
possibility. The second level arises when the association relationship certainly
holds between the two classes, but it is not known for certain whether two
particular instances, one from each associated class, are related by it. This second
level of fuzziness is caused by an instance belonging to a given class with a
membership degree.
[Figure 9: panels (a), (b), and (c), each showing the association installing between classes CD Player and Car]
Both levels of fuzziness may occur in an association relationship
simultaneously. In that case, the two classes have a fuzzy association
relationship at the class level, and the instances of these two classes may
additionally have a fuzzy association relationship at the instance level.
We can place the phrase WITH mem DEGREE ($0 \leq mem \leq 1$) after the role
name of an association relationship to represent the first level of fuzziness in the
association relationship. We use a double line with an arrowhead to denote the
second level of fuzziness in the association relationship. Figure 9 shows the two
levels of fuzziness in fuzzy association relationships. In part (a), it is uncertain
whether the CD player is installed in the car, and the possibility is 0.8: classes CD
Player and Car have the association relationship installing with a 0.8
membership degree. In part (b), it is certain that the CD player is installed in the
car, and the possibility is 1.0: classes CD Player and Car have the association
relationship installing with a 1.0 membership degree. At the level of instances,
however, the instances of classes CD Player and Car may or may not have the
association relationship installing. In part (c), the two kinds of
fuzzy association relationships in parts (a) and (b) arise simultaneously.
It has been shown above that three levels of fuzziness can occur in classes.
Classes with the second level of fuzziness generally induce the second level of
fuzziness in an association, provided that the association definitely exists (that is,
it has no first level of fuzziness). Let A and B be two classes with
the second level of fuzziness. Then an instance e of A has membership
degree $\mu_A(e)$, and an instance f of B has membership degree $\mu_B(f)$.
Assume that the association relationship between A and B, denoted ass(A, B),
has no first level of fuzziness. The association relationship between e and f,
denoted ass(e, f), then has the second level of fuzziness, that is, a membership
degree, which can be calculated as follows:
$\mu(ass(e, f)) = \min(\mu_A(e), \mu_B(f))$
The first level of fuzziness in the association relationship can be indicated
explicitly by the designers, even if the corresponding classes are crisp. Assume
that A and B are two crisp classes and ass(A, B) is the association relationship
with the first level of fuzziness, denoted ass(A, B) WITH degree_ass DEGREE.
In this case, $\mu_A(e) = 1.0$ and $\mu_B(f) = 1.0$. Then,
$\mu(ass(e, f)) = degree\_ass$
Classes with the first level of fuzziness generally induce the first level of
fuzziness in the association when the fuzziness of the association is not indicated
explicitly. Let A and B be two classes with only the first level of fuzziness,
denoted A WITH degree_A DEGREE and B WITH degree_B DEGREE, respectively. Then the
association relationship between A and B, denoted ass(A, B), has the first
level of fuzziness, namely, ass(A, B) WITH degree_ass DEGREE. Here
degree_ass is calculated as follows:
$degree\_ass = \min(degree\_A, degree\_B)$
For an instance e of A and an instance f of B, in which $\mu_A(e) = 1.0$ and
$\mu_B(f) = 1.0$, we have:
$\mu(ass(e, f)) = degree\_ass = \min(degree\_A, degree\_B)$
Finally, let us focus on the situation in which the classes have both the first and
the second level of fuzziness, and there is an explicitly indicated association
relationship with the first level of fuzziness between these two classes. Let
A and B be two classes with the first level of fuzziness, denoted A WITH
degree_A DEGREE and B WITH degree_B DEGREE, respectively. Let ass(A, B)
be the association relationship with the first level of fuzziness between A
and B, explicitly indicated by WITH degree_ass DEGREE. Also, let
an instance e of A have membership degree $\mu_A(e)$, and an instance f of B
have membership degree $\mu_B(f)$. Then we have:
$\mu(ass(e, f)) = \min(\mu_A(e), \mu_B(f), degree\_A, degree\_B, degree\_ass)$
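The cases discussed in this section all reduce to taking a minimum, which the following Java sketch makes explicit. The class and method names are hypothetical and serve only as an illustration; a degree that does not apply in a given case (for example, the class degrees of crisp classes) is passed as 1.0.

```java
final class FuzzyAssociationDegree {

    /** Rejects values outside [0, 1]; membership and possibility degrees are assumed to lie there. */
    private static double check(double d) {
        if (Double.isNaN(d) || d < 0.0 || d > 1.0) {
            throw new IllegalArgumentException("degree must be in [0, 1]: " + d);
        }
        return d;
    }

    /** Class-level degree implied by two first-level fuzzy classes: min(degree_A, degree_B). */
    static double impliedClassLevelDegree(double degreeA, double degreeB) {
        return Math.min(check(degreeA), check(degreeB));
    }

    /** Instance-level degree: min(mu_A(e), mu_B(f), degree_A, degree_B, degree_ass). */
    static double instanceDegree(double muAe, double muBf,
                                 double degreeA, double degreeB, double degreeAss) {
        double classes = Math.min(check(degreeA), check(degreeB));
        double instances = Math.min(check(muAe), check(muBf));
        return Math.min(Math.min(classes, instances), check(degreeAss));
    }
}
```

With all class and association degrees set to 1.0, `instanceDegree` yields the second-level formula min of the two instance memberships; with all instance memberships set to 1.0, it yields the first-level cases.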
Fuzzy Dependency
Let us now focus on the fuzzy dependency relationship between the source class
and the target class. The dependency relationship is only related to the classes
and does not require a set of instances for its meaning. Therefore, the
second-level fuzziness and the third-level fuzziness in a class do not affect the
dependency relationship.
A fuzzy dependency relationship is a dependency relationship with a degree of
possibility. Just like the fuzzy association relationship above, the fuzzy
dependency relationship can be indicated explicitly by the designers or be implied
by the source class, because the target class is decided
by the source class. Assume that the source class is fuzzy, with the first level of
fuzziness. The target class must then be fuzzy, with the first level of fuzziness. The
degree of possibility that the target class is decided by the source class is the
same as the membership degree of the source class. For the source class Employee
WITH 0.85 DEGREE, for example, the target class Employee Dependent
should be Employee Dependent WITH 0.85 DEGREE. The dependency
relationship between Employee and Employee Dependent should be fuzzy, with
a 0.85 degree of possibility. Notice that, unlike the fuzzy association
relationship, only one level of fuzziness can be identified in a dependency
relationship; it is implied by the first level of fuzziness of the source class if
it is not given explicitly.
An Illustrative Example
[Figure 11: fuzzy UML class diagram. Recoverable labels: Dependent, Car, using, Middle Employee, Old Employee, Old Car, Young Employee, New Car, liking WITH 0.9 DEGREE, Engine (ID, Turbo, FUZZY Size), Interior (ID, Dashboard, Seat), Chassis (ID)]
In Figure 11, we give a simple fuzzy UML data model utilizing some notations
introduced in this chapter. Class Car is a superclass, and New Car and Old Car
are its two fuzzy subclasses; that is, they may have fuzzy instances. Similarly,
class Employee has three fuzzy subclasses: Young Employee, Middle Employee,
and Old Employee. Classes Employee and Car have a fuzzy association
relationship using, which has fuzziness at the second level. Fuzzy
classes Young Employee and New Car have a fuzzy association relationship
liking, which has fuzziness at the first level. In addition, class Car is an
aggregation of three classes: Engine, Chassis, and Interior. Class Engine has three
attributes. The attributes ID and Turbo have crisp values, whereas Size is a fuzzy
attribute that can take a fuzzy value. Classes Chassis and Interior are crisp
classes, and they have no fuzziness at any of the three levels.
Related Work
By using fuzzy set theory, Zvieli and Chen (1986) introduced three levels of
fuzziness in the ER model, corresponding to three levels of database abstraction:
schema (metadata), instance (data), and value (data element). At the first level,
entity sets, relationships, and attribute sets may be fuzzy; that is, they have an
associated membership degree in the model. The second level is related to the
Conclusions
We present a fuzzy extended UML to cope with fuzzy as well as complex objects
in the real world at a conceptual level. Different levels of fuzziness are
introduced into the class diagram of the UML, and the corresponding graphical
representations are developed. It is not difficult to see that the classical UML is
essentially a subset of the fuzzy UML. When there is not any fuzziness in the
universe of discourse, the fuzzy UML can be reduced to the classical UML.
The focus of this chapter is on fuzzy data modeling in the UML. As we know,
the UML can be used for knowledge modeling, and knowledge may generally be
imprecise and uncertain. In future work, we will concentrate on the study of class
operations, constraints, and rules in the fuzzy UML modeling. In addition,
mapping the fuzzy UML data model into object-oriented databases will be
interesting.
References
Abiteboul, S., & Hull, R. (1987). IFO: A formal semantic database model. ACM
Transactions on Database Systems, 12(4), 525–565.
Ambler, S. W. (2000a). The design of a robust persistence layer for relational
databases. Retrieved from the World Wide Web: http://www.ambysoft.com/
persistenceLayer.pdf
Ambler, S. W. (2000b). Mapping objects to relational databases. Retrieved from
the World Wide Web: http://www.AmbySoft.com/mappingObjects.pdf
Baldwin, J. F., Cao, T. H., Martin, T. P., & Rossiter, J. M. (2000). Toward soft
computing object-oriented logic programming. In Proceedings of the
Ninth IEEE International Conference on Fuzzy Systems (pp. 768–773).
Blaha, M., & Premerlani, W. (1999). Using UML to design database applications. Retrieved from the World Wide Web: http://www.therationaledge.com/
rosearchitect/mag/archives/9904/f8.html
Booch, G., Rumbaugh, J., & Jacobson, I. (1998). The Unified Modeling
Language user guide. Reading, MA: Addison-Wesley.
Bordogna, G., Pasi, G., & Lucarella, D. (1999). A fuzzy object-oriented data
model for managing vague and uncertain information. International
Journal of Intelligent Systems, 14, 623–651.
Bosc, P., & Prade, H. (1993). An introduction to fuzzy set and possibility theory
based approaches to the treatment of uncertainty and imprecision in
database management systems. In Proceedings of the Second Workshop
on Uncertainty Management in Information Systems: From Needs to
Solutions.
Buckles, B. P., & Petry, F. E. (1982). A fuzzy representation of data for
relational database. Fuzzy Sets and Systems, 7(3), 213–226.
Cao, T. H. (2001). Uncertain inheritance and recognition as probabilistic default
reasoning. International Journal of Intelligent Systems, 16, 781–803.
Chaudhry, N. A., Moyne, J. R., & Rundensteiner, E. A. (1999). An extended
database design methodology for uncertain data management. Information
Sciences, 121(1–2), 83–112.
Chen, G. Q., & Kerre, E. E. (1998). Extending ER/EER concepts towards fuzzy
conceptual data modeling. In Proceedings of the 1998 IEEE International
Conference on Fuzzy Systems, 2, 1320–1325.
Chen, P. P. (1976). The entity-relationship model: Toward a unified view of data.
ACM Transactions on Database Systems, 1(1), 9–36.
Conrad, R., Scheffiner, D., & Freytag, J. C. (2000). XML conceptual modeling
using UML. In Proceedings of the 19th International Conference on
Conceptual Modeling (pp. 558–571).
Cross, V., & Firat, A. (2000). Fuzzy objects for geographical information
systems. Fuzzy Sets and Systems, 113, 19–36.
Cross, V., Caluwe, R., & Vangyseghem, N. (1997). A perspective from the
Fuzzy Object Data Management Group (FODMG). In Proceedings of the
1997 IEEE International Conference on Fuzzy Systems, 2, 721–728.
Dubois, D., Prade, H., & Rossazza, J. P. (1991). Vagueness, typicality, and
uncertainty in class hierarchies. International Journal of Intelligent
Systems, 6, 167–183.
George, R., Srikanth, R., Petry, F. E., & Buckles, B. P. (1996). Uncertainty
management issues in the object-oriented data model. IEEE Transactions
on Fuzzy Systems, 4(2), 179–192.
Gyseghem, N. V., & Caluwe, R. D. (1998). Imprecision and uncertainty in UFO
database model. Journal of the American Society for Information
Science, 49(3), 236–252.
Lee, J., Xue, N. L., Hsu, K. H., & Yang, S. J. (1999). Modeling imprecise
requirements with fuzzy objects. Information Sciences, 118, 101–119.
Liu, W. Y., & Song, N. (2001). The fuzzy association degree in semantic data
models. Fuzzy Sets and Systems, 117(2), 203–208.
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (1999). Assessment of data redundancy
in fuzzy relational databases based on semantic inclusion degree. Information
Processing Letters, 72(1–2), 25–29.
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (2004). Extending object-oriented
databases for fuzzy information modeling. Information Systems, 29(5),
421–435.
Ma, Z. M., Zhang, W. J., Ma, W. Y., & Chen, G. Q. (2001). Conceptual design
of fuzzy object-oriented databases using extended entity-relationship model.
International Journal of Intelligent Systems, 16, 697–711.
Marín, N., Medina, J. M., Pons, O., Sánchez, D., & Vila, M. A. (2003). Complex
object comparison in a fuzzy context. Information and Software
Technology, 45(7), 431–444.
Marín, N., Vila, M. A., & Pons, O. (2000). Fuzzy types: A new concept of type
for managing vague structures. International Journal of Intelligent
Systems, 15, 1061–1085.
Mili, F., Shen, W., Martinez, I., Noel, Ph., Ram, M., & Zouras, E. (2001).
Knowledge modeling for design decisions. Artificial Intelligence in
Engineering, 15, 153–164.
Naiburg, E. (2000). Database modeling and design using Rational Rose 2000.
Retrieved from the World Wide Web: http://www.therationaledge.com/
rosearchitect/mag/current/spring00/f5.html
OMG. (2001). Unified Modeling Language (UML), version 1.4. Retrieved from
the World Wide Web: http://www.omg.org/technology/documents/formal/
uml.htm
Prade, H., & Testemale, C. (1984). Generalizing database relational algebra for
the treatment of incomplete or uncertain information and vague queries.
Information Sciences, 34, 115–143.
Raju, K. V. S. V. N., & Majumdar, K. (1988). Fuzzy functional dependencies
and lossless join decomposition of fuzzy relational database systems. ACM
Transactions on Database Systems, 13(2), 129–166.
Vila, M. A., Cubero, J. C., Medina, J. M., & Pons, O. (1996). A conceptual
approach for dealing with imprecision and uncertainty in object-based data
models. International Journal of Intelligent Systems, 11, 791–806.
Yazici, A., Buckles, B. P., & Petry, F. E. (1999). Handling complex and
uncertain information in the ExIFO and NF2 data models. IEEE Transactions
on Fuzzy Systems, 7(6), 659–676.
SECTION III
Chapter VI
A Framework to Build Fuzzy Object-Oriented Capabilities Over an Existing Database System
Fernando Berzal, University of Granada, Spain
Nicolás Marín, University of Granada, Spain
Olga Pons, University of Granada, Spain
M. Amparo Vila, University of Granada, Spain
Abstract
Fuzzy object-oriented database models allow the representation, storage,
and retrieval of complex imperfect information according to the object-oriented
data paradigm. This chapter describes both a framework and an
architecture that can be used to develop fuzzy object-oriented capabilities
using the conventional features of the object-oriented data paradigm. We
present a framework composed of a set of classical classes, which gives
Introduction
In the last decade, an important group of database researchers focused its
studies on the adaptation of existing data models to imperfect information
management, most using the Fuzzy Subset Theory, which has proven to be a good
tool for handling this kind of information. At the same time, the object-oriented
data paradigm increased in popularity among programmers and designers, mainly
due to its powerful modeling capabilities.
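Most of the commercial database management systems that allow the
manipulation of objects belong to one of the following two categories:
1. Object-oriented database management systems (OODBMSs)
2. Object-relational database management systems (ORDBMSs)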
On the one hand, object-oriented databases are designed to work easily with
object-oriented programming languages such as Java, C#, and C++. OODBMSs
use the same model as object-oriented programming languages. In spite of the
difficulties and complexity that this approach involves, some commercial products
can be found (like O2, ObjectStore, Objectivity, and Versant), although they
represent only a small part of the market.
On the other hand, ORDBMSs span object and relational technology. Many of
the traditional relational products (like Oracle and Postgres) now incorporate the
object-relational framework.
Nowadays, most development efforts in the software world use the
object-oriented data paradigm to represent and manipulate their data. When these
applications are related to soft computing, fuzzy modeling and representation
capabilities are required.
In the world of databases, this fact has motivated the study and development of
fuzzy object-oriented database modeling tools. They arise from the combination
of object-oriented and fuzzy concepts in order to permit the representation of
complex imperfect information (Kuo et al., 2001; Caluwe, 1997).
Background
The beginning of the study of fuzziness in object-oriented models is in close
relation with advanced semantic data models (Ruspini, 1986; Zivieli et al., 1986;
Vanderberghe et al., 1991). After these initial steps, many relevant works can
be found in the literature:
1.
2.
3.
4.
N. Van Gyseghem et al. (1998) developed the UFO model, one of the most
complete proposals that can be found in the literature.
Other relevant works in this area can be consulted (Na et al., 1996, 1996b;
Baldwin et al., 2000, 2000b; Cao, 2001). Some define complex algebraic models,
while others are focused on the logic world. Even the entity/relationship model
is being studied as a design tool for object-oriented databases (Ma et al., 2001).
to-use transparent mechanism to develop applications dealing with fuzzy
information. Following our proposal, programmers and designers should be able to
directly use new structures developed to store fuzzy information without the
need for any special treatment, without altering the underlying programming
platform, and with the greatest possible transparency. Our proposal allows the
programmer to handle data imperfection in an important set of the situations in
which it can appear during an object-oriented software development effort.
This proposal resulted in the implementation of a framework that can be used in
two ways:
1. Programmers and designers can directly use our proposal over an existing
conventional database system.
2. The proposal could be the basis for the development of a fuzzy
object-oriented database system built over an existing conventional database
system.
As we will see in this chapter, the underlying system must include some
object-oriented capabilities among its characteristics. In fact, though existing
OODBMSs are a good choice, some advanced ORDBMSs, like the latest
versions of Oracle RDBMS, could also be used.
This chapter is devoted to the explanation of our proposal and is organized into
the following sections. Fuzziness and Object-Orientation describes the main
features of our proposal for dealing with fuzziness in an object-oriented context.
A Supporting Framework explains how to deal with fuzziness
using classical object-oriented concepts as the basis of the discussion. A
FOODBS Architecture presents an architecture that can be used to develop a
system able to store fuzzy information in a classical object-oriented system using
the framework described in the previous parts of the chapter. Some concluding
remarks and future work trends end the chapter.
2.
Both sentences are expressed with some lack of precision. We use linguistic
labels to express the imprecise values in each of these sentences, but each label
matches a different semantic pattern. An underlying basic domain exists below
the label used in the first sentence (positive real numbers). In contrast, it is not
easy to find such an underlying domain for the label high of the last sentence.
compare values of the domain, instead of the classical equality. (For example,
Table 1 contains the definition of a resemblance relation for quality labels.)
[Table 1: resemblance relation over the quality labels High, Regular, and Low; recoverable entries: 0.8, 0.8]
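$$
S(x, y) =
\begin{cases}
1 & (x = y) \wedge (x, y \in B) \\
0 & (x \neq y) \wedge (x, y \in B) \\
\mu_l(z) & ((x = l \in L) \wedge (y = z \in B)) \vee ((y = l \in L) \wedge (x = z \in B)) \\
\sup_{z \in B} (\mu_x(z) \wedge \mu_y(z)) & \text{otherwise}
\end{cases}
\tag{1}
$$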
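A minimal Java sketch of a resemblance measure of this shape is given below. It rests on several assumptions that are not taken from the chapter: the basic domain is numeric, labels have trapezoidal membership functions, conjunction is the minimum, and the supremum is approximated by sampling an interval [lo, hi] on a regular grid. The names `DomainValue` and `DisjunctiveResemblance` are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical domain value: either a basic value z in B or a label l in L. */
final class DomainValue {
    final Double basic;             // non-null for basic values
    final DoubleUnaryOperator mu;   // non-null for labels: membership function over B

    private DomainValue(Double basic, DoubleUnaryOperator mu) {
        this.basic = basic;
        this.mu = mu;
    }

    static DomainValue basic(double z) {
        return new DomainValue(z, null);
    }

    /** Label with a trapezoidal membership function, assuming a <= b <= c <= d. */
    static DomainValue label(double a, double b, double c, double d) {
        return new DomainValue(null, z -> {
            if (z < a || z > d) return 0.0;
            if (z < b) return (z - a) / (b - a);   // rising edge (b > a here)
            if (z <= c) return 1.0;                // core
            return (d - z) / (d - c);              // falling edge (d > c here)
        });
    }

    boolean isBasic() {
        return basic != null;
    }
}

final class DisjunctiveResemblance {
    /** Resemblance of x and y; the label/label case is a sampled approximation of the supremum. */
    static double s(DomainValue x, DomainValue y, double lo, double hi, int steps) {
        if (x.isBasic() && y.isBasic()) {
            return x.basic.doubleValue() == y.basic.doubleValue() ? 1.0 : 0.0;
        }
        if (x.isBasic()) return y.mu.applyAsDouble(x.basic);   // y is a label
        if (y.isBasic()) return x.mu.applyAsDouble(y.basic);   // x is a label
        double sup = 0.0;
        for (int i = 0; i <= steps; i++) {
            double z = lo + (hi - lo) * i / steps;
            sup = Math.max(sup, Math.min(x.mu.applyAsDouble(z), y.mu.applyAsDouble(z)));
        }
        return sup;
    }
}
```

The grid resolution bounds the accuracy of the label/label case; an exact computation for trapezoids would intersect the edges analytically instead.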
Fuzzy Collections
We now know how to deal with disjunctive fuzzy sets of values. However, we
may have to use fuzzy collections of values in order to express some information
about the object we want to represent in the system. These collections of values
have a conjunctive interpretation and, thus, need special treatment. For example,
consider that we want to represent the set comprised of students who attend their
lessons in a given room. We can relate each student with a room, taking into
account the amount of daytime he or she spends in this room attending his or her
lessons. According to this, the set of students of a room may be expressed as
follows:
$\mu(st_1)/st_1 + \mu(st_2)/st_2 + \cdots + \mu(st_n)/st_n$
where $st_i$ is a student, and $\mu(st_i)$ is the degree to which the student belongs to
the room.
If we want to represent this kind of fuzzy value in our system, we also need
suitable operators to compare the fuzzy values, taking into account that, now, the
semantics of the fuzzy set are conjunctive.
Conjunctive fuzzy set comparison is often done by means of the concept of
inclusion:
$A = B$ if and only if $(A \subseteq B) \wedge (B \subseteq A)$
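The double-inclusion idea can be made gradual, as in the following Java sketch. It is an illustration under stated assumptions and not the operator used in the chapter: the inclusion degree is taken as the minimum over all elements of an implication between membership degrees, the Gödel implication is chosen arbitrarily, and the conjunction of the two inclusions is the minimum. The class name `ConjunctiveFuzzySets` is hypothetical, and a fuzzy collection is represented as a map from element name to membership degree.

```java
import java.util.Map;

final class ConjunctiveFuzzySets {

    /** Goedel implication (an assumed choice): 1 if a <= b, otherwise b. */
    private static double implies(double a, double b) {
        return a <= b ? 1.0 : b;
    }

    /** Degree to which fuzzy set a is included in fuzzy set b. */
    static double inclusion(Map<String, Double> a, Map<String, Double> b) {
        double degree = 1.0;
        // Elements absent from a have membership 0 there and never lower the result.
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double muB = b.getOrDefault(e.getKey(), 0.0);
            degree = Math.min(degree, implies(e.getValue(), muB));
        }
        return degree;
    }

    /** Degree of equality: both inclusions must hold, combined with the minimum. */
    static double equality(Map<String, Double> a, Map<String, Double> b) {
        return Math.min(inclusion(a, b), inclusion(b, a));
    }
}
```

With this choice of implication, any fuzzy set is equal to itself to degree 1, which is one reason it is used in the sketch.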
To compute the inclusion degree of a fuzzy set A in a fuzzy set B, we can use the
following operator (Rossaza et al., 1998):
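$$\Theta_S(B \mid A) = \min_{x \in U} \max_{y \in U} \theta_{A,B,S}(x, y)$$
where
$$\theta_{A,B,S}(x, y) = \otimes(I(\mu_A(x), \mu_B(y)), \mu_S(x, y))$$
2.
$$\sqcap_{S,\otimes}(A, B) = \otimes(\Theta_S(B \mid A), \Theta_S(A \mid B))$$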
$$
F(A, B) =
\begin{cases}
1, & \text{if } A = B = \emptyset \\
\dfrac{\min(|A|, |B|)}{\max(|A|, |B|)}, & \text{otherwise}
\end{cases}
$$
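The cardinality ratio F can be sketched in Java as follows. The chapter text available here does not say how the cardinality |A| of a fuzzy set is measured, so the sketch assumes the sigma-count (the sum of the membership degrees); the class name `CardinalityRatio` is hypothetical.

```java
import java.util.Map;

final class CardinalityRatio {

    /** Sigma-count of a fuzzy set: the sum of its membership degrees (an assumed cardinality). */
    static double sigmaCount(Map<String, Double> set) {
        double sum = 0.0;
        for (double mu : set.values()) {
            sum += mu;
        }
        return sum;
    }

    /** F(A, B): 1 if both sets are empty, otherwise min(|A|, |B|) / max(|A|, |B|). */
    static double f(Map<String, Double> a, Map<String, Double> b) {
        double ca = sigmaCount(a);
        double cb = sigmaCount(b);
        if (ca == 0.0 && cb == 0.0) {
            return 1.0;   // both sets are empty
        }
        return Math.min(ca, cb) / Math.max(ca, cb);
    }
}
```

The ratio is 1 when both sets have the same cardinality and drops toward 0 as their sizes diverge.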
[Figure 2: rooms to be compared and their students. Recoverable content:]
Room2 (=?): Quality: regular; Extension: big; Floor: high; Students: 1/stu1 + 1/stu5 + 0.75/stu3 + 0.6/stu6
Student1: (1) Name: John; (0.75) Age: young; (0.75) Height: 1.85m
Student2: Name: Peter; Age: young; Height: 1.7m
Student3: Name: Mary; Age: middle-aged; Height: short
Student4: Name: Tom; Age: 24; Height: tall
Student5: Name: Peter; Age: 25; Height: medium
Student6: Name: Tom; Age: young; Height: 1.9m
2.
3.
The set of students is fuzzy, taking into account the percentage of time each
student spends receiving the lessons in each room to compute the membership degrees.
We need to use recursion in order to deal with complex objects (objects with
attribute values that are also objects). That is, to compute the resemblance
between rooms, we have to compare students. During this process, we
have to deal with the possible presence of cycles in the data graph (i.e., it
is possible that we have to compute the resemblance of objects o1 and o2
in order to compute the resemblance between objects o1 and o2).
2.
Taking into account the ideas presented in Marín et al. (2003), we can define the
calculus of the resemblance between two objects o1 and o2 of a given class C,
with a type that is made up of the set of attributes $Str_C = \{a_1, a_2, \ldots, a_n\}$, by means
of a function FE:
$FE: FC \times O(FC) \times O(FC) \times \mathcal{P}(\mathcal{P}_2(O(FC))) \times \mathcal{P}(\mathcal{P}_2(O(FC))) \to [0,1]$
where FC is the family of all the classes, and O(FC) is the set of all the class
instances. $\mathcal{P}$ stands for the power set, and $\mathcal{P}_2$ represents those members of the
power set whose cardinality is 2 (i.e., pairs of objects in our context). The
calculus of FE(C, o1, o2, visited, approx) involves the recursive computation
described below.
There are two basic cases:
1. If $o_1 = o_2$, then $FE = 1$.
2.
1. The first time that the pair {o1, o2} produces a cycle, which is detected
because {o1, o2} is already in visited, the pair is inserted into approx
in order to compute an approximation that focuses only on nonproblematic
attributes (those that do not lead to cycles):
$FE = FE(C, o_1, o_2, \emptyset, approx \cup \{\{o_1, o_2\}\})$
2. If the pair of objects is in approx (i.e., its resemblance is currently being
approximated), then we do not calculate a resemblance value, and the
function FE is undefined.
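The cycle handling described here can be sketched in Java as follows. This is a simplified illustration, not the FE function of the chapter: the object model (`ObjNode`), the crisp comparison of basic attribute values, and the aggregation of attribute resemblances by an unweighted mean over the defined values are all assumptions made for the sketch. Only the treatment of visited and approx follows the text: identical objects resemble each other to degree 1, a pair found in approx yields an undefined value, and the first cycle through a pair restarts the computation with visited emptied and the pair added to approx.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.Set;

/** Hypothetical object model: attribute values are plain values or other ObjNode instances. */
final class ObjNode {
    final String id;   // assumed to be unique per object
    final Map<String, Object> attrs = new LinkedHashMap<>();

    ObjNode(String id) {
        this.id = id;
    }
}

final class CycleAwareResemblance {

    /** Resemblance of o1 and o2; an empty result stands for "undefined". */
    static OptionalDouble fe(ObjNode o1, ObjNode o2) {
        return compare(o1, o2, new HashSet<>(), new HashSet<>());
    }

    private static OptionalDouble compare(ObjNode o1, ObjNode o2,
            Set<Set<String>> visited, Set<Set<String>> approx) {
        if (o1 == o2) return OptionalDouble.of(1.0);                // basic case
        Set<String> pair = Set.of(o1.id, o2.id);
        if (approx.contains(pair)) return OptionalDouble.empty();   // currently being approximated
        if (visited.contains(pair)) {
            // First cycle through this pair: restart with visited emptied and the pair in approx.
            Set<Set<String>> approx2 = new HashSet<>(approx);
            approx2.add(pair);
            return aggregate(o1, o2, new HashSet<>(), approx2);
        }
        Set<Set<String>> visited2 = new HashSet<>(visited);
        visited2.add(pair);
        return aggregate(o1, o2, visited2, approx);
    }

    /** Unweighted mean over the attributes whose resemblance is defined (assumed aggregation). */
    private static OptionalDouble aggregate(ObjNode o1, ObjNode o2,
            Set<Set<String>> visited, Set<Set<String>> approx) {
        double sum = 0.0;
        int defined = 0;
        for (Map.Entry<String, Object> e : o1.attrs.entrySet()) {
            Object v1 = e.getValue();
            Object v2 = o2.attrs.get(e.getKey());
            OptionalDouble r;
            if (v1 instanceof ObjNode && v2 instanceof ObjNode) {
                r = compare((ObjNode) v1, (ObjNode) v2, visited, approx);
            } else {
                r = OptionalDouble.of(v1 != null && v1.equals(v2) ? 1.0 : 0.0);
            }
            if (r.isPresent()) {   // undefined attributes are left out of the approximation
                sum += r.getAsDouble();
                defined++;
            }
        }
        return defined == 0 ? OptionalDouble.empty() : OptionalDouble.of(sum / defined);
    }
}
```

Every restart adds a new pair to approx, and there are finitely many pairs, so the recursion terminates even when objects reference each other.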
Attribute values
2.
3.
Class extents
4.
Inheritance relationships
5.
Let us describe how fuzziness can be added in these levels in order to improve
the modeling capability of the object-oriented model.
2.
3.
We can use different scales to express the explicit uncertainty that affects an
attribute value: we can use probability (w.r.t. possibility) measures defined
within the [0,1]-interval, linguistic labels of probability (w.r.t. possibility) with
semantic representation that is a disjunctive fuzzy set, certainty measures,
evidences, etc. Though we can express imprecision and explicit uncertainty, to
deal with them we have to take into account that they are convertible (Gonzalez
et al., 1999).
Attribute values, which involve a functional approach: Using this alternative, we relate a class with the classes that are its attribute domains.
2.
2.
We may want to consider that the connection among the objects admits
degrees of importance. This situation arises when not all the relationship
instances have the same strength. In these situations, as suggested by
Bordogna et al. (1994), we can use numerical or linguistic values to express
this strength.
Notice that the strength has a semantic interpretation different from the one
given to uncertainty. We know that the objects are related, but we consider
different strengths in their connection. Moreover, both semantic nuances can be
used at the same time, if required.
2.
The answers to these questions are not trivial and could lead us to situations in
which we are not sure about an object of a given class. In such situations, the
gradual membership is substituted by an uncertain membership.
2.
So far, in the object-oriented model, every instance of a class could reference any
of the attributes of the class (instance variables). However, with our new kind
of type, an instance of a given class may not incorporate certain attributes,
depending on the α-cut of the class structure with which it was created.
Each one of the methods defined in a class must have an associated precision
level (as is the case with the attributes or instance variables) that indicates the
minimum precision that an instance must have to incorporate a method in its
behavior. This level of precision depends on the attributes and other methods
referenced in the code of the method.
The change proposed in the concept of type involves modifications to the idea of
instantiation. In order to create a new object of a given class, we must be able
to choose the α-cut of properties of the type that will be used to represent it. To
do that, the model has a generic method new(α) (with $\alpha \in (0, 1]$), called the fuzzy
constructor. The receiver of this method can be any class C, while the parameter
is the level α of the structure of this class C needed to represent the new object.
The effect of sending the message new(α) to a class C with structural
component S and behavior component B consists of creating an object that
incorporates the set $S_\alpha$ of attributes. The set $B_\alpha$ of methods defines the
behavior of this object.
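The selection of attributes performed by the fuzzy constructor can be sketched in Java as follows. The sketch is an assumption-laden illustration: it models only the structural component, stores one precision level per attribute, and interprets the cut at level α as "all attributes whose level is at least α". The names `FuzzyTypeStructure`, `declare`, and `alphaCut` are hypothetical and are not part of the framework described in this chapter.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class FuzzyTypeStructure {
    // Attribute name -> precision level in (0, 1].
    private final Map<String, Double> levels = new LinkedHashMap<>();

    /** Declares an attribute together with its precision level. */
    FuzzyTypeStructure declare(String attribute, double level) {
        if (!(level > 0.0 && level <= 1.0)) {
            throw new IllegalArgumentException("level must be in (0, 1]");
        }
        levels.put(attribute, level);
        return this;
    }

    /** Attributes that an object created at level alpha would incorporate. */
    Set<String> alphaCut(double alpha) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        Set<String> selected = new LinkedHashSet<>();
        for (Map.Entry<String, Double> e : levels.entrySet()) {
            if (e.getValue() >= alpha) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }
}
```

Lower values of α produce objects with more attributes; α = 1 keeps only the attributes declared with full precision.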
The inheritance mechanism H must enable part of the class structure and
behavior to be inherited by its subclasses. As we have done with the instantiation
mechanism, we add a threshold to indicate what proportion of the properties we
want to be inherited. Two different forms of inheritance can be considered:
1.
2.
A Supporting Framework
As we mentioned in the introduction, our research is mainly motivated by the
need for an easy-to-use transparent mechanism to develop applications dealing
with fuzzy information. Following our proposal, programmers and designers
should be able to directly use new structures developed to store fuzzy information
without the need for any special treatment, without altering the underlying
[Figure: class hierarchy of the framework, showing the classes DomainWithoutRepresentation, DisjunctiveDomain, ConjunctiveDomain, DisjunctiveFiniteObject, ConjunctiveFiniteObject, TrapezoidalObject, and BasicObject, each of which provides a method fuzzyEquals(fuzzyObject).]
To illustrate how this framework can be used when writing a soft computing
application on one of the foremost programming platforms, consider the
following Java code for the example of rooms and students (Figure 2).
1.
    this.students = students;
  }
  // Field importance
  public static float fieldImportance(String fieldname) {
    String[] fields = { "quality", "extension", "floor", "students" };
    float[] importance = { 0.5f, 0.8f, 1.0f, 1.0f };
    for (int i = 0; i < fields.length; i++) {
      if (fields[i].equals(fieldname)) {
        return importance[i];
      }
    }
    // Fields that are not listed get full importance
    return 1.0f;
  }
}
3.
4.
Finally, the set of students is a fuzzy collection of students, where the fuzzy
collection StudentCollection inherits from ConjunctiveFiniteObject, and
students are similarly defined as a classroom is described.
The following code shows the creation of both rooms, once the classes
mentioned above are defined:
// Label definitions for students
Age young = new Age (new Label("young"), 0, 0, 23, 33 );
Age middle = new Age (new Label("middle-aged"), 23, 33, 44, 48 );
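The four numbers passed to each label can be read as the corners of a trapezoidal membership function. The following sketch shows how such a function could be evaluated. It is a minimal illustration, assuming that the parameters (a, b, c, d) describe a trapezoid with support [a, d] and core [b, c]; the class TrapezoidSketch is hypothetical and is not the TrapezoidalObject class of the framework.

```java
// Hypothetical sketch of a trapezoidal membership function (a, b, c, d).
final class TrapezoidSketch {
    final double a, b, c, d; // support [a, d], core [b, c]

    TrapezoidSketch(double a, double b, double c, double d) {
        if (!(a <= b && b <= c && c <= d)) {
            throw new IllegalArgumentException("expected a <= b <= c <= d");
        }
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
    }

    // Degree to which x is compatible with the label.
    double membership(double x) {
        if (x < a || x > d) {
            return 0.0;                  // outside the support
        }
        if (x >= b && x <= c) {
            return 1.0;                  // inside the core
        }
        if (x < b) {
            return (x - a) / (b - a);    // rising edge (a < b holds here)
        }
        return (d - x) / (d - c);        // falling edge (c < d holds here)
    }

    public static void main(String[] args) {
        TrapezoidSketch young = new TrapezoidSketch(0, 0, 23, 33);
        System.out.println(young.membership(28));
    }
}
```

Under this reading, an age of 28 would be compatible with both young and middle-aged to the same intermediate degree.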
2.
To represent new fuzzy extensions using classical object-oriented structures as the basis
As we saw in the previous section, we can consider the following list of fuzzy
extensions of classical object-oriented concepts:
1.
2.
3.
4.
5.
All of these characteristics can be directly translated into classical object-oriented structures. Table 2 summarizes how to deal with these fuzzy extensions
according to our proposal (Blanco et al., 2001).
A FOODBS Architecture
In this section, we present an architecture that can be used to develop a system
able to store fuzzy information in a classical object-oriented system using the
model described in the previous parts of the chapter. According to the principle
that guided our approach, all the proposed extensions of the object-oriented data
model are built by means of structures that can be directly translated into a set
of standard classes. This feature allows us to decrease the development effort
needed to implement a fuzzy object-oriented database system with the capabilities we propose.
Let us briefly examine our development strategy. Figure 4 depicts a simplification of the ANSI/SPARC standard database architecture with minor modifications.
External views are organized in such a way that the user can transparently
manage data imperfection. This is the fuzzy view of the system. At the same
time, the conceptual schema is divided into two different layers: the upper layer
contains fuzzy schemata definitions, while the lower layer holds the corresponding classical object-oriented representation needed to support these fuzzy
schemata. The internal schema is that of the classical database system being
used as the basis for the fuzzy database system.
The strategy discussed leads us to an architecture organized into three levels
(see Figure 5):
1.
Table 2

Fuzzy extension: Explicit uncertainty in attribute values; semantically enhanced relationships among objects
Classical implementation: We only have to add an extra attribute to the class that we want to extend in a fuzzy way. The domain of this attribute could be:
- The interval [0,1]
- A set of linguistic labels that express membership, defined over the aforementioned interval

Fuzzy extension: Fuzzy inheritance connections
Classical implementation: Some important models (Rossazza et al., 1998; George et al., 1993) consider that the superclass-subclass relationship can admit the use of degrees, founded on the idea of inclusion or matching between typical subclass attribute values and typical superclass attribute values. This characteristic can be represented in a classical object-oriented model by means of static variables that express these connection degrees using suitable scales.

Fuzzy extension: Fuzzy-type definitions
Classical implementation: This new way of considering the type definition can be easily modeled over a traditional object-oriented model, using the concept of a 1-ramified hierarchy of classes (Marín et al., 2001). A 1-ramified hierarchy of classes is defined as a series of classes $C_1, \ldots, C_i, C_{i+1}, \ldots, C_n$ verifying the following properties:
- For any $i \in 1..n-1$, $Sub\{C_i\} = \{C_{i+1}\}$ ($Sub\{C_i\}$ stands for the set of subclasses of $C_i$).
- For any $i \in 2..n$, $Sup\{C_i\} = \{C_{i-1}\}$ ($Sup\{C_i\}$ stands for the set of superclasses of $C_i$).
- A finite sequence of values $\alpha_i$ exists, associated with the hierarchy, such that $\alpha_1 = 1$, $\alpha_n > 0$, and $\alpha_i > \alpha_{i+1}$.
Each class of the hierarchy is used to represent an $\alpha$-cut of the type being defined.
2. The conceptual fuzziness handler will augment the classical system capabilities to allow imperfect data manipulation.
3. The interface will communicate with the previous level, hiding the underlying complexity, and will allow users to develop their fuzzy object-oriented databases.
A metadata catalog will store the fuzzy schemata defined by the user.
2.
A Prototype
In order to experiment with the mentioned architecture and framework, a
prototype was developed. FoodBi is a graphical system that allows the creation
and management of fuzzy object-oriented schemata. By means of this interface,
the user can build a hierarchy of classes with fuzzy types, using, at the same time,
suitable attribute domains for imperfection handling.
This prototype uses Java as the target object-oriented language and Oracle 9i (an
advanced object-relational DBMS) as the DBMS back end.
General metadata that describe the class: identifier, kind of extent (crisp or
fuzzy), description, and so on;
2.
3.
Set of methods that make up its behavioral component (which can also be
fuzzy); and
4.
Figure 6 illustrates the FoodBi class inspector when defining the structural part of
a class Image, which is organized in three levels of precision and has some
attributes that may have imprecise values (age and quality). The information
provided by the user when defining an attribute determines the way in which
fuzziness will be handled:
1. In the case of attributes with imprecise values, the user can build labeled domains by choosing among different semantics: with or without underlying basic domain, disjunctive, conjunctive, etc.
2. In case the attribute value can be affected by explicit uncertainty, the user can attach to the attribute domain a set of linguistic labels or the [0,1] interval in order to express this explicit uncertainty.
3. The user can even grade the relationship expressed by the attribute, combining the attribute domain with a suitable linguistic domain for expressing strength values.
Conclusions
In this chapter, we studied several suitable strategies to face the representation
of the different kinds of imperfections that may arise when a database is being
designed in an object-oriented paradigm, according to the level at which these
imperfections may occur.
As part of our proposal, we demonstrated how to implement reusable fuzzy
comparison capabilities in modern programming platforms through the use of
reflection and theoretical results that help us apply fuzzy techniques in object-oriented models.
We also presented an architecture for the development of a fuzzy object-oriented
database management system. This architecture is founded on the idea of
minimizing the development effort needed to obtain data imperfection management
capabilities. As the new structures needed to support data imperfection are
implemented using standard object-oriented techniques, we can use an existing
classical database system as the basis for our fuzzy one. This way, we only have
to develop an upper layer on top of the classical system, avoiding the effort
required by the implementation of a whole new system. A prototype was
developed to verify the viability of our proposals.
The theoretical approach is currently being extended in order to deal with
queries: in fact, the FuzzyEquals method described in the chapter is being used
as the basis for performing object queries (Marín et al., 2004).
The prototype is currently the basis for two main development efforts:
1.
2.
Acknowledgment
This work was partially supported by the Spanish Comisión Interministerial de
Ciencia y Tecnología under grants TIC2003-08687-C02-02 and TIC2002-04021-C02-02.
References
Baldwin, J. F., Cao, T. H., Martin, T. P., & Rossiter, J. M. (2000). Toward soft
computing object-oriented logic programming. In Proceedings of the
Ninth IEEE International Conference on Fuzzy Systems (pp. 768-773).
Baldwin, J. F., Cao, T. H., Martin, T. P., & Rossiter, J. M. (2000b). Implementing
Fril++ for uncertain object-oriented logic programming. In Proceedings of
the Eighth IEEE International Conference on Information Processing
and Management of Uncertainty in Knowledge-Based Systems (pp.
496-503).
Berler, M., Eastman, J., Jordan, D., Russell, C., Schadow, O., Stanienda, T., &
Velez, F. (2000). The object data standard: ODMG 3.0. New York: Morgan
Kaufmann Publishers.
Blanco, I. J., Marín, N., Pons, O., & Vila, M. A. (2001). Softening the object-oriented database-model: Imprecision, uncertainty, and fuzzy types. In
Proceedings of IFSA/NAFIPS World Congress.
Bordogna, G., Lucarella, D., & Pasi, G. (1994). A fuzzy object oriented data
model. In Proceedings of FUZZ-IEEE (pp. 313-317).
Caluwe, R. de. (1997). Fuzzy and uncertain object-oriented databases:
Concepts and models. Advances in fuzzy systems - applications and
theory (Vol. 13). Singapore: World Scientific.
Cao, T. H. (2001). Uncertain inheritance and recognition as probabilistic default
reasoning. International Journal of Intelligent Systems, 16, 781-803.
Cubero, J. C., Marín, N., Medina, J. M., Pons, O., & Vila, M. A. (2004). Fuzzy
object management in an object-relational framework. In Proceedings of
IPMU (pp. 1767-1774).
George, R., Buckles, B. P., & Petry, F. E. (1993). Modelling class hierarchies
in the fuzzy object-oriented data model. Fuzzy Sets and Systems, 60,
259-272.
Gonzalez, A., Pons, O., & Vila, M. A. (1999). Dealing with uncertainty and
imprecision by means of fuzzy numbers. International Journal of Approximate Reasoning, 21, 233-256.
Gyseghem, N. Van, & Caluwe, R. de. (1998). Imprecision and uncertainty in the
UFO database model. Journal of the American Society for Information
Science, 49, 236-252.
Koyuncu, M., & Yazici, A. (2003). IFOOD: An intelligent fuzzy object-oriented
database architecture. IEEE Transactions on Knowledge and Data
Engineering, 15(5), 1137-1154.
Kuo, J.-Y., Lee, J., & Xue, N.-L. (2001). A note on current approaches to
extend fuzzy logic to object oriented modeling. International Journal of
Intelligent Systems, 16, 807-820.
Ma, Z. M., Zhang, W. J., Ma, W. Y., & Chen, C. Q. (2001). Conceptual design
of fuzzy object-oriented databases using extended entity-relationship model.
International Journal of Intelligent Systems, 16, 697-711.
Marín, N., Pons, O., & Vila, M. A. (2001). A strategy for adding fuzzy types to
an object-oriented database system. International Journal of Intelligent
Systems, 16, 863-880.
Marín, N., Medina, J. M., Pons, O., Sánchez, D., & Vila, M. A. (2003). Complex
object comparison in a fuzzy context. Information and Software Technology, 45, 431-444.
Marín, N., Pons, O., & Vila, M. A. (2000). Fuzzy types: A new concept of type
for managing vague structures. International Journal of Intelligent
Systems, 15, 1061-1085.
Na, S. L., & Park, S. (1996). Management of fuzzy objects with fuzzy attribute
values in new fuzzy object oriented data model. In Proceedings of the
Second International Workshop on FQAS (pp. 19-40).
Na, S. L., & Park, S. (1996b). A fuzzy association algebra based on fuzzy object
oriented data model. In Proceedings of the 20th International Conference on Compsac (pp. 624-630).
Rossazza, J.-P., Dubois, D., & Prade, H. (1998). A hierarchical model of fuzzy
classes. In Fuzzy and uncertain object-oriented databases. Concepts
and models, Advances in fuzzy systems - applications and theory (Vol.
13, pp. 21-61).
Ruspini, E. H. (1986). Imprecision and uncertainty in the entity-relationship
model. In H. Prade, & C. V. Negiota (Eds.), Fuzzy logic and knowledge
engineering (pp. 18-28). Heidelberg: Verlag TÜV Rheinland.
Stonebraker, M., & Brown, P. (1999). Object/relational DBMSs: Tracking
the next great wave. New York: Morgan Kaufmann Publishers.
Vanderberghe, R. M., & Caluwe, R. de. (1991). An entity-relationship approach
to the modeling of vagueness in databases. In Proceedings of ECSQAU
Symbolic and quantitative approaches to uncertainty (pp. 338-343).
Vila, M. A., Cubero, J. C., Medina, J. M., & Pons, O. (1995). The generalized
selection: An alternative way for the quotient operations in fuzzy relational
databases. In B. Bouchon-Meunier, R. Yager, & L. Zadeh (Eds.), Fuzzy
logic and soft computing. Singapore: World Scientific Press.
Yazici, A., George, R., & Aksoy, D. (1998). Design and implementation issues
in the fuzzy object-oriented data model. Journal of Information Sciences,
108, 241-260.
Zivieli, A., & Chen, P. P. (1986). Entity-relationship modeling and fuzzy
databases. In Proceedings of the Second International Conference on
Data Engineering IEEE (pp. 18-28).
Chapter VII
Abstract
This chapter gives an overview of indexing techniques suitable for fuzzy
object-oriented databases (FOODBSs). First, typical query patterns used
in FOODBSs are identified, namely, single-valued, set-valued, navigational,
and type hierarchy access. The description of the patterns does not follow
a particular fuzzy object-oriented data model but is kept general enough to
be used in different FOODBS contexts. Second, for each query pattern,
index structures are presented that support the efficient evaluation of these
queries. These range from standard index structures (like B-trees) to
sophisticated access methods (like Join Index Hierarchies). Due to space
constraints, an explanation of the basic techniques is given rather than an
exhaustive description. However, the interested reader is supplied with a
broad list of references for further reading. Finally, a summary and
outlook conclude the chapter.
Introduction
One important technique used to accelerate the associative access in database
management systems (DBMS) is the use of index structures. When searching
for data, we want to avoid the worst case, i.e., having to scan through the whole
database and test every data object, because this is inefficient. Index structures
help here as they allow fast access to data by content.
Due to the semantic richness of object-oriented DBMSs, we have different
methods for indexing than, e.g., in relational DBMSs. Adding fuzziness increases
the number of possibilities even further. Unfortunately, publications on indexing
in fuzzy object-oriented DBMSs are few and far between. Although indexing in
advanced DBMSs (e.g., object-oriented, spatial, image, temporal, or XML
databases) is an established research topic (for overviews see Bertino, 1997;
Liu, 1996; Luk, 2002; Manolopoulos, 1999; Mueck, 1997), indexing in fuzzy
databases has not yet received much attention.
This chapter is organized as follows. First, we give a brief introduction to the
concepts of object-oriented DBMSs needed in the remainder of the chapter.
Next, we give an overview of the different aspects of accessing data in fuzzy
object-oriented DBMSs. In the next section, we investigate several index
structures supporting these access patterns. We then express our opinion on
future trends in the area of access methods for FOODBS systems. Finally, in the
last section, we conclude with a brief summary.
Preliminaries
Storage Hierarchy
Every computing system, and hence every DBMS, has several layers of
storage (Figure 1). Generally, the higher a memory type is positioned in this
hierarchy, the faster, the costlier, and the smaller it becomes. The differences
between the levels are usually several orders of magnitude. We divide this
hierarchy into three subcategories: primary, secondary, and tertiary storage.
Primary storage consists of CPU-registers, cache memory, and main memory;
secondary storage comprises the disk level; and tertiary storage includes the tape
level. We restrict ourselves to the levels that are most important for index
structures in DBMSs: main memory and disks.
Object Model
Now we present a brief introduction to a (nonfuzzy) object-oriented database
model. For a detailed definition see the standard by the Object Data Management
Group (ODMG) (Cattell, 2000). We introduce fuzziness to this model in the next
section when describing the access patterns.
Central to the object-oriented model are objects, which are database entities
described by their identities, their types, and their states. The identity of an
object is defined by a unique object identifier (OID), which never changes
during the lifetime of the object. Each object is also an instance of a certain type
(this also does not change for an object). The type determines the behavior and
structure of an object. The behavior is constituted by a set of operations the
object is able to execute. The structure, in turn, is described by a set of attributes
and the possible relationships the object can enter into with other objects.
Attributes are not restricted to domains with atomic values but are allowed to be
collections, like sets, lists, or tuples. At each point in time, an object has an
internal state. The state of an object is defined by the values of its attributes and
the current relationships it sustains.
A type can inherit its basic structure and behavior from another type and extend
this structure and behavior. In this case, we speak of inheritance: a subtype
inherits properties from a supertype. All objects belonging to a type (and all its
subtypes) are combined in an extent of this type. Another important feature of
an object model is substitutability, i.e., an object can be used at any place in
which an object of one of its supertypes is used. Last but not least, there is
polymorphism. A polymorphic operation is defined for a set of types, not only for
a single type. In this way, types that may otherwise be unrelated can show the
same behavior. For example, the operator + (addition) has to be implemented
differently for integers than it does for floats, but it has the same semantics.
2.
3. Navigational access via paths, i.e., objects are linked together with pointers. [Not all fuzzy object-oriented data models support fuzzy associations between objects; those that do include the models by Bordogna (1994), Na (1996), and Yazici (1997, 1998).]
4. Access via type hierarchies, i.e., queries may refer to specific types or a subhierarchy of types. [Again, not all fuzzy object-oriented models support fuzzy type hierarchies; those that do include the models by Bordogna (1994), George (1992), Na (1996), and Yazici (1997, 1998).]
Single-Valued Attributes
For our first access pattern, we are going to look at single-valued attributes that
have a grade of certainty (usually ranging from 0 to 1) attached to their values.
This grade reflects the level of belief in this value and is based on certainty theory
(Durkin, 1994; Shortliffe, 1975). Assume that we have a database for the
administration of a university. We could have a type called Staff that holds the
data for employees:
class Staff {
    attribute String Name;
    attribute String Position : degree;
    attribute Integer Age : degree;
}
The addition of the clause : degree after the attributes Position and Age tells
us that the values of these two attributes can be uncertain. So, if we are unsure
whether a person works as an assistant professor, we can store the value
"Assistant Professor" (0.6) in the attribute Position. (Note that this
approach could also be modeled in crisp object models by adding another
attribute holding the corresponding degree for each attribute that can contain
uncertain data.)
Possible queries in this context would be: "Give me the names of all staff that
work as an assistant professor with at least a degree of 0.7" or "Give me the
names and positions of all persons who are younger than 30 with a certainty of
0.4."
This approach is widely applied for inexact reasoning in expert systems. As
a matter of fact, the expert system MYCIN provided the basis for certainty
theory.
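To make this access pattern concrete, the two example queries can be sketched in Java by following the remark that a crisp model can hold one extra degree per uncertain attribute. This is a minimal sketch under our own assumptions: the classes Graded and Staff, the query methods, and the sample data are hypothetical, and "with a certainty of 0.4" is interpreted as "with a certainty of at least 0.4".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names belong to an existing FOODBS API.
final class CertaintyQuerySketch {

    // A crisp value together with the degree of certainty attached to it.
    static final class Graded<T> {
        final T value;
        final double degree;

        Graded(T value, double degree) {
            this.value = value;
            this.degree = degree;
        }
    }

    static final class Staff {
        final String name;
        final Graded<String> position;
        final Graded<Integer> age;

        Staff(String name, Graded<String> position, Graded<Integer> age) {
            this.name = name;
            this.position = position;
            this.age = age;
        }
    }

    // Names of all staff holding the position with a degree of at least minDegree.
    static List<String> namesByPosition(List<Staff> staff, String position, double minDegree) {
        List<String> names = new ArrayList<>();
        for (Staff s : staff) {
            if (s.position.value.equals(position) && s.position.degree >= minDegree) {
                names.add(s.name);
            }
        }
        return names;
    }

    // Names of all staff younger than maxAge with a certainty of at least minDegree.
    static List<String> namesYoungerThan(List<Staff> staff, int maxAge, double minDegree) {
        List<String> names = new ArrayList<>();
        for (Staff s : staff) {
            if (s.age.value < maxAge && s.age.degree >= minDegree) {
                names.add(s.name);
            }
        }
        return names;
    }

    // Made-up sample data for illustration.
    static List<Staff> sampleStaff() {
        return Arrays.asList(
            new Staff("Smith", new Graded<String>("Assistant Professor", 0.6), new Graded<Integer>(29, 0.9)),
            new Staff("Jones", new Graded<String>("Assistant Professor", 0.9), new Graded<Integer>(41, 1.0)),
            new Staff("Lee", new Graded<String>("Full Professor", 1.0), new Graded<Integer>(28, 0.3)));
    }
}
```

Both queries scan the whole list here; avoiding such scans is exactly what index structures are for.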
Set-Valued Attributes
A more flexible approach than the previous one is to represent the value of an
attribute by means of a (disjunctive) fuzzy set. Look at the following example
(again, we use a general notation for fuzzy attributes):
class Staff {
    attribute String Name;
    fuzzy attribute String Position;
    fuzzy attribute Integer Age;
}
Now the two attributes Position and Age are declared as fuzzy. What does
this mean? If we want to express that it is perfectly possible that a person works
as a research assistant or assistant professor, maybe is even an associate
professor, but probably not a full professor, we can describe this fact by the fuzzy
set in Figure 2(a). Describing the age of this person as young could be done with
the fuzzy set described by the membership function in Figure 2(b).
[Figure 2: (a) fuzzy set for the attribute Position; (b) membership function of the fuzzy set young, plotted over ages from 20 to 80.]
Querying on fuzzy sets is more flexible but also more complex than querying on
single-valued attributes. One popular approach is based on the possibility theory
by Prade and Testemale (1984). So, we are going to concentrate on this
technique and give a brief description in the following. We want to fetch all
objects (with a fuzzy attribute $A$) that satisfy a query condition $\theta a$, meaning $A\,\theta\,a$
is satisfied, where $\theta$ is a (fuzzy) comparison operator and $a$ is a (fuzzy) constant,
represented by $\mu_\theta$ and $\mu_a$, respectively. As the values of $A$ (and the query
condition) can be fuzzy, there is some uncertainty as to whether a data item
satisfies the condition or not. Two fuzzy measures are used to express this
degree of uncertainty. One is the possibility measure defined as follows:
$$\forall X \in P(\Omega): \quad \Pi(X) = \max_{\omega \in X} \pi_{A(o_i)}(\omega) \qquad (1)$$
where $\Omega$ is the domain of attribute $A$, while $P(\Omega)$ denotes the power set of $\Omega$. The
value of attribute $A$ of object $o_i$ is described by a possibility function $\pi_{A(o_i)}$ on $\Omega$
(which basically is a normalized fuzzy set, i.e., at least one item has a membership
degree of 1.0). Associated with each possibility measure is a necessity measure
$N(X)$:
$$\forall X \in P(\Omega): \quad N(X) = 1 - \Pi(\overline{X}) \qquad (2)$$
where $\overline{X}$ denotes the complement of $X$ in $\Omega$.
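The possibility that the value of attribute $A$ of data item $o_i$ belongs to the set of
values determined by $\theta$ and $a$ is equal to
$$\Pi(a \circ \theta \mid A(o_i)) = \max_{\omega \in \Omega} \min\bigl(\mu_{a \circ \theta}(\omega), \pi_{A(o_i)}(\omega)\bigr) \qquad (3)$$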
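The corresponding necessity is
$$N(a \circ \theta \mid A(o_i)) = \min_{\omega \in \Omega} \max\bigl(\mu_{a \circ \theta}(\omega), 1 - \pi_{A(o_i)}(\omega)\bigr) \qquad (4)$$
with
$$\mu_{a \circ \theta}(\omega) = \max_{\omega' \in \Omega} \min\bigl(\mu_\theta(\omega, \omega'), \mu_a(\omega')\bigr) \qquad (5)$$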
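Let us also present an example query for this access pattern. We want to find
all persons who are approximately young [see Figure 2(b) for the fuzzy set
young]. The comparison operator $\theta$ for "approximately equal to" could be
defined similarly to the one found in Prade (1984):
$$\mu_\theta(\omega, \omega') = \begin{cases} 1 - \dfrac{|\omega - \omega'|}{5} & \text{for } |\omega - \omega'| \le 5 \\ 0 & \text{else} \end{cases} \qquad (6)$$
For the fuzzy constant young, this gives
$$\mu_{young \circ \theta}(\omega) = \max_{\omega' \in \Omega} \min\bigl(\mu_\theta(\omega, \omega'), \mu_{young}(\omega')\bigr)$$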
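The evaluation of the query "approximately young" can be sketched in Java as follows. This is a minimal sketch under several assumptions of ours: the domain is restricted to integer ages from 0 to 100, the membership function young is a hypothetical stand-in for the one in Figure 2(b), and the necessity is computed in the standard dual form (the minimum over the domain of the larger of the query membership and one minus the possibility degree), which follows from the definition of the necessity measure. The comparison operator follows Equation (6).

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch of possibility and necessity evaluation over ages 0..100.
final class PossibilitySketch {
    static final int MAX_AGE = 100; // assumed discrete domain

    // Equation (6): "approximately equal to" with a tolerance of 5 years.
    static double approxEqual(int w, int w2) {
        int diff = Math.abs(w - w2);
        return diff <= 5 ? 1.0 - diff / 5.0 : 0.0;
    }

    // Assumed membership function of "young" (not taken from Figure 2(b)).
    static double young(int age) {
        if (age <= 25) {
            return 1.0;
        }
        if (age >= 35) {
            return 0.0;
        }
        return (35 - age) / 10.0;
    }

    // Membership of w in "approximately young":
    // maximum over w2 of min(approxEqual(w, w2), young(w2)).
    static double approxYoung(int w) {
        double best = 0.0;
        for (int w2 = 0; w2 <= MAX_AGE; w2++) {
            best = Math.max(best, Math.min(approxEqual(w, w2), young(w2)));
        }
        return best;
    }

    // Possibility that an attribute value, given as a possibility
    // distribution pi, satisfies the query.
    static double possibility(IntToDoubleFunction pi) {
        double best = 0.0;
        for (int w = 0; w <= MAX_AGE; w++) {
            best = Math.max(best, Math.min(approxYoung(w), pi.applyAsDouble(w)));
        }
        return best;
    }

    // Necessity that the attribute value satisfies the query.
    static double necessity(IntToDoubleFunction pi) {
        double worst = 1.0;
        for (int w = 0; w <= MAX_AGE; w++) {
            worst = Math.min(worst, Math.max(approxYoung(w), 1.0 - pi.applyAsDouble(w)));
        }
        return worst;
    }
}
```

For a crisp age, both measures coincide with the membership degree of that age in the query set; for a genuinely fuzzy value, the necessity is never larger than the possibility.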
Navigational Access
Relationships between objects are described by references from one object to
another. In Figure 3, we see a schema graph describing the fact that a
department employs several people who are engaged in different projects.
In a FOODBS system, the relationships may be fuzzy, i.e., each link from one
object to another has a degree of uncertainty associated with it. Figure 4 shows
an excerpt of an instantiation of the above schema. Looking at this example, we
see that the person with identification s1 is certainly employed at the department
d1, while we are not 100% sure that this person is working on project p1.
A possible query in this context could be: "Give me all departments that probably
(with a degree larger than 0.8) employ people who are almost surely (with a
degree greater than 0.95) involved in the projects p3 or p4."
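Without an index, such a query can be answered by following the fuzzy links and filtering on their degrees. The following Java sketch illustrates this. It is only an illustration: the class Link, the method departments, and all link degrees in example are hypothetical and are not read off Figure 4.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a navigational query over fuzzy links.
final class NavigationSketch {

    // A reference from one object to another, annotated with a degree.
    static final class Link {
        final String from;
        final String to;
        final double degree;

        Link(String from, String to, double degree) {
            this.from = from;
            this.to = to;
            this.degree = degree;
        }
    }

    // Departments that employ (degree > minEmploy) a person who works
    // (degree > minWork) on one of the given projects.
    static Set<String> departments(List<Link> employs, List<Link> worksOn,
                                   Set<String> projects, double minEmploy, double minWork) {
        Set<String> qualifiedStaff = new HashSet<>();
        for (Link link : worksOn) {
            if (link.degree > minWork && projects.contains(link.to)) {
                qualifiedStaff.add(link.from);
            }
        }
        Set<String> result = new LinkedHashSet<>();
        for (Link link : employs) {
            if (link.degree > minEmploy && qualifiedStaff.contains(link.to)) {
                result.add(link.from);
            }
        }
        return result;
    }

    // Made-up instance data for illustration.
    static Set<String> example() {
        List<Link> employs = Arrays.asList(
            new Link("d1", "s1", 1.0), new Link("d1", "s2", 0.8),
            new Link("d2", "s3", 0.5), new Link("d2", "s4", 1.0));
        List<Link> worksOn = Arrays.asList(
            new Link("s1", "p1", 0.9), new Link("s2", "p3", 1.0),
            new Link("s3", "p3", 1.0), new Link("s4", "p4", 1.0));
        Set<String> projects = new HashSet<>(Arrays.asList("p3", "p4"));
        return departments(employs, worksOn, projects, 0.8, 0.95);
    }
}
```

With this sample data, only d2 qualifies: d1 employs s2 with a degree of exactly 0.8, which is not larger than the threshold.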
[Figure 4: excerpt of an instantiation of the schema, in which the departments d1 and d2, the staff members s1 to s4, and the projects p1 to p5 are linked by references annotated with degrees of uncertainty.]
Type Hierarchies
A query in an object-oriented database system may refer to objects of a certain
type or to a certain type and all its subtypes. Look at the hierarchy of types
depicted in Figure 5. In the case of FOODBS systems, we may have objects that
are not clearly assigned to a certain type. We do not look at how the membership
grades are determined exactly but assume that we are able to compute them in
some way.
[Figure 5: type hierarchy containing the types Administrative, Technical, Academic, Teaching, and Research.]
A typical query involving type hierarchies might be: "List the names of all
academics who are older than 40 years. Make sure that the degree of
membership to the class Academics or a subclass is at least 0.9." As we will see,
efficiently evaluating queries in which type hierarchies and other properties are
mixed is not straightforward.
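The membership test in such a query can be sketched in Java as follows. The sketch rests on assumptions of ours: the root type Staff and the placement of Teaching and Research below Academic are hypothetical, and the degree of membership in "a type or one of its subtypes" is taken to be the largest degree the object has in any of these types, which is only one possible way of combining the grades.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a membership test against a type hierarchy.
final class TypeHierarchySketch {

    // Maps each type to its direct supertype (assumed hierarchy).
    static final Map<String, String> PARENT = new HashMap<>();

    static {
        PARENT.put("Administrative", "Staff");
        PARENT.put("Technical", "Staff");
        PARENT.put("Academic", "Staff");
        PARENT.put("Teaching", "Academic");
        PARENT.put("Research", "Academic");
    }

    // True if type equals ancestor or is a direct or indirect subtype of it.
    static boolean isSubtypeOrSame(String type, String ancestor) {
        for (String t = type; t != null; t = PARENT.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Largest membership degree of an object in the given type or any
    // of its subtypes; memberships maps type names to degrees.
    static double membershipIn(Map<String, Double> memberships, String type) {
        double best = 0.0;
        for (Map.Entry<String, Double> entry : memberships.entrySet()) {
            if (isSubtypeOrSame(entry.getKey(), type)) {
                best = Math.max(best, entry.getValue());
            }
        }
        return best;
    }
}
```

An object that belongs to Teaching with degree 0.95 would then pass the threshold of 0.9 for Academic, and the age condition could be checked afterward.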
Single-Valued Attributes
Accesses to single-valued attributes are easiest to handle, as we can use the
standard index structures of (relational) DBMSs. We present two of the most
widely known index structures: B-trees and external hashing.
[Figure 6. B-tree]
B-trees
B-trees (Bayer, 1972) (or the more advanced B+-trees) are the standard index
structures in relational database systems. They are balanced multiway trees, i.e.,
in contrast to binary trees, a node can have more than one key and more than two
children (multiway), and all leaves are on the same level (balanced). The keys
in a node N are sorted, and a subtree is assigned to each key. All keys in a subtree
are less than the assigned key. All keys greater than the keys in node N are saved
in an additional subtree (see Figure 6 for an example).
In a database system, the nodes of a B-tree are mapped to pages in the secondary
storage. A B-tree is much shallower than a binary tree, because the fan-out is
much higher. For this reason and because of the balancing, only a few page
accesses are necessary to find a key. To increase branching even further, B+-trees are used. In B+-trees, all records are kept in the leaves; the inner nodes
contain only reference keys. Normally, these keys are much smaller than the
records. Thus, the level of branching is increased, and the height of the tree
decreases.
More details on B-trees and B+-trees can be found in standard textbooks on
database systems (e.g., Silberschatz, 2001).
External Hashing
We describe an extendible hashing index here, as it is a typical representative of
an external hashing scheme. An extendible hashing index is divided into two
parts: a directory and buckets (for details, see also Fagin, 1979). In the buckets,
we store the full hash keys of and pointers to the indexed data items. We
determine the bucket into which a data item is inserted by looking at a prefix $h_d$
of $d$ bits of the hash key $h$. For each possible bit combination of the prefix, we
find an entry in the directory pointing to the corresponding bucket. The directory
has $2^d$ entries, where $d$ is called global depth (see also Figure 7). When a bucket
overflows, it is split, and all its entries are divided among the two resulting
buckets. In order to determine the new home of a data item, the length of the
inspected hash key prefix has to be increased until at least two data items have
different hash key prefixes. The size of the current prefix $d'$ of a bucket is called
local depth. If we notice after a split that the local depth $d'$ of a bucket is larger
than the global depth $d$, we have to increase the size of the directory. This is done
by doubling the directory as often as needed to have a new global depth $d$ equal
to the local depth $d'$. For the bucket that was split, the new pointers are put into
the directory. For the other buckets, the directory entries are copied.
[Figure 7: extendible hashing index with a directory of global depth d = 3 (entries 000 to 111) and four buckets with local depths d' = 2 (prefix 00), d' = 3 (prefix 010), d' = 3 (prefix 011), and d' = 1 (prefix 1).]
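The two directory operations described above, looking up an entry by hash prefix and doubling the directory, can be sketched in Java as follows. The sketch assumes 32-bit hash keys whose leading bits form the prefix and a directory that stores bucket numbers in an int array; the class and method names are hypothetical, and bucket splitting itself is left out.

```java
// Hypothetical sketch of directory addressing in extendible hashing.
final class ExtendibleHashingSketch {

    // Directory index of a 32-bit hash key: its first globalDepth bits.
    static int directoryIndex(int hash, int globalDepth) {
        if (globalDepth < 0 || globalDepth > 31) {
            throw new IllegalArgumentException("global depth out of range");
        }
        // An unsigned shift keeps only the leading bits of the key.
        return globalDepth == 0 ? 0 : hash >>> (32 - globalDepth);
    }

    // Doubles the directory, raising the global depth by one. Every old
    // entry i is copied to the new entries 2i and 2i + 1, because both
    // extended prefixes still refer to the same bucket.
    static int[] doubleDirectory(int[] directory) {
        int[] doubled = new int[directory.length * 2];
        for (int i = 0; i < directory.length; i++) {
            doubled[2 * i] = directory[i];
            doubled[2 * i + 1] = directory[i];
        }
        return doubled;
    }
}
```

After doubling, only the two entries that belong to the split bucket need to be redirected to the two new buckets.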
B-trees and external hashing assume that we want to submit queries involving
one attribute: "List the names of all persons that are 35 years old" or "Return all
persons on whose age we are certain (degree = 1.0)." Usually, queries will
combine attribute values with certainty degrees and will even use ranges: "I want
to have a list of all persons older than 40 with a certainty degree of at least 0.8."
In such cases, B-trees and external hashing will not be efficient. We need
multidimensional access methods like grid files or k-d trees (to name prominent
representatives).
Grid Files
A grid file can be seen as a generalization of hashing to multiple dimensions
(Nievergelt, 1984). Let us assume that we want to index the attribute Age with
its corresponding degree of uncertainty. Figure 8 shows an example of a grid file
for that case.
The data space is partitioned into cells. The cells can share data pages, as
indicated by the dashed lines in Figure 8. For each dimension, we provide a linear
scale that partitions the particular dimension in a uniform way, mapping the
domain to an index. Accesses to the grid are done via these linear scales to
determine the correct cell index. Range queries pose no problems. We just have
to be careful to eliminate false drops caused by the page sharing.
[Figure 8: grid file indexing Age and its degree of uncertainty, with one linear scale per dimension.]
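The lookup via linear scales and the filtering of false drops can be illustrated as follows. This is a minimal sketch, not the grid file of Nievergelt (1984): the split points, the class `GridFile`, the method `share_page`, and the sample objects are assumptions made for the example, and "older than 40" is approximated by an integer lower bound of 41.

```python
from bisect import bisect_right


class GridFile:
    def __init__(self, age_scale, degree_scale):
        self.scales = (age_scale, degree_scale)  # split points of each linear scale
        self.pages = {}                          # cell index -> data page (a list)

    def cell(self, age, degree):
        # Each linear scale maps a domain value to an index of its dimension.
        return tuple(bisect_right(scale, value)
                     for scale, value in zip(self.scales, (age, degree)))

    def share_page(self, cell_a, cell_b):
        # Let two cells point to the same data page.
        page = self.pages.setdefault(cell_a, [])
        other = self.pages.get(cell_b)
        if other is not None and other is not page:
            page.extend(other)
        self.pages[cell_b] = page

    def insert(self, age, degree, ref):
        self.pages.setdefault(self.cell(age, degree), []).append((age, degree, ref))

    def range_query(self, age_lo, age_hi, degree_lo, degree_hi):
        i_lo, j_lo = self.cell(age_lo, degree_lo)
        i_hi, j_hi = self.cell(age_hi, degree_hi)
        seen, result = set(), []
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                page = self.pages.get((i, j))
                if page is None or id(page) in seen:
                    continue          # read every shared page only once
                seen.add(id(page))
                # Shared pages may hold entries of other cells: drop them here.
                result.extend(ref for age, degree, ref in page
                              if age_lo <= age <= age_hi
                              and degree_lo <= degree <= degree_hi)
        return result


grid = GridFile(age_scale=[20, 40, 60], degree_scale=[0.25, 0.5, 0.75])
grid.share_page((2, 2), (2, 3))
for age, degree, ref in [(35, 1.0, "o1"), (45, 0.9, "o2"),
                         (50, 0.5, "o3"), (19, 0.9, "o4")]:
    grid.insert(age, degree, ref)
answer = grid.range_query(41, 120, 0.8, 1.0)
```

In this example, o2 and o3 live on the same shared page, so o3 is read during the query and has to be eliminated as a false drop.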
K-d Trees
The original k-d tree is a generalization of a binary tree to many dimensions
(Bentley, 1975). In an ordinary (balanced) binary tree, each node splits the
remaining data objects beneath it roughly into two halves. All objects with values
smaller than the node value are found to the left of the node, all those with greater
values are found to the right of the node. At each level of a k-d tree, a different
dimension is chosen to divide the data objects. In our running example, we would
first split according to age, then according to the uncertainty degree, then age
again, and so on.
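The alternation of dimensions can be made concrete with a small in-memory sketch. It assumes two-dimensional points of the form (age, degree), builds a balanced tree by splitting at the median, and answers a range query; the function names and the sample points are illustrative only.

```python
def build(points, depth=0):
    """Build a k-d tree over (age, degree) points, alternating the split dimension."""
    if not points:
        return None
    axis = depth % 2                       # level 0: age, level 1: degree, then age again
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}


def range_search(node, lo, hi, out=None):
    """Collect all points p with lo[i] <= p[i] <= hi[i] in both dimensions."""
    if out is None:
        out = []
    if node is None:
        return out
    point, axis = node["point"], node["axis"]
    if all(lo[i] <= point[i] <= hi[i] for i in range(2)):
        out.append(point)
    if lo[axis] <= point[axis]:            # the left subtree may still overlap the range
        range_search(node["left"], lo, hi, out)
    if point[axis] <= hi[axis]:            # the right subtree may still overlap the range
        range_search(node["right"], lo, hi, out)
    return out


points = [(35, 1.0), (45, 0.9), (50, 0.5), (19, 0.9), (62, 0.85), (41, 0.3)]
tree = build(points)
hits = range_search(tree, lo=(41, 0.8), hi=(120, 1.0))
```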
As binary trees are not well suited for secondary storage structures, several
extensions and modifications to k-d trees were proposed, e.g., k-d B-trees
(Robinson, 1981) and hB-trees (Evangelidis, 1995). (For a general overview of
multidimensional access methods, see Gaede, 1998.)
Set-Valued Attributes
This query type is more flexible than the previous one on single-valued attributes.
Therefore, this is the area where the most work has been done (Bosc, 1989, 1988;
Boss, 1999; Helmer, 2001). (Additionally, all of these techniques can be used in
other fuzzy DBMSs and are not restricted to object-oriented DBMS.) The basic
principle (as introduced by Prade, 1984) is to look at fuzzy attribute values in
[...] (7)
where $L_\alpha$ and $L_{>0}$ are α-cuts of fuzzy sets. An α-cut of a fuzzy set F
with membership function $\mu_F$ is defined as
$$L_\alpha(F) = \{\omega \mid \mu_F(\omega) \ge \alpha\} \qquad (0 < \alpha \le 1) \qquad (8)$$
[...] (9)
There are two special α-cuts, the core $L_1(F)$ and the support $L_{>0}(F)$ of a
fuzzy set F.
For more selective queries, an acceptance threshold α can be provided by the
user. Determining qualifying data items then boils down to
$$\Pi(a \mid A(o_i)) > \alpha \;\Rightarrow\; L_\alpha(A(o_i)) \cap L_\alpha(a) \neq \emptyset \qquad (10)$$
$$\Rightarrow\; L_{>0}(A(o_i)) \cap L_\alpha(a) \neq \emptyset \qquad (11)$$
[...] (12)
[...] (13)
$$L_1(A(o_i)) \subseteq L_\alpha(a) \qquad (14)$$
Hereafter, when searching for supports that intersect with the α-cut of the query
predicate, we call this a nonempty intersection query. When looking for cores
that are a subset of the α-cut of the query predicate, we call this a subset query.
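The two filter conditions can be tried out on small discrete fuzzy sets. The following sketch assumes that a fuzzy set is represented as a Python dictionary from values to membership degrees; the membership degrees and the function names are hypothetical and serve only as an illustration.

```python
def alpha_cut(fuzzy_set, alpha):
    """All values whose membership degree is at least alpha."""
    return {value for value, mu in fuzzy_set.items() if mu >= alpha}


def core(fuzzy_set):
    return alpha_cut(fuzzy_set, 1.0)


def support(fuzzy_set):
    return {value for value, mu in fuzzy_set.items() if mu > 0}


def nonempty_intersection_candidate(attribute, query, alpha):
    # The support of the stored value intersects the alpha-cut of the query predicate.
    return bool(support(attribute) & alpha_cut(query, alpha))


def subset_candidate(attribute, query, alpha):
    # The core of the stored value is a subset of the alpha-cut of the query predicate.
    return core(attribute) <= alpha_cut(query, alpha)


# Hypothetical fuzzy attribute value and query predicate.
position = {"Research Assistant": 1.0, "Assistant Professor": 1.0,
            "Associate Professor": 0.4}
query = {"Research Assistant": 0.9, "Assistant Professor": 0.7,
         "Associate Professor": 0.2}
```

Both functions only select candidates; whether a candidate really satisfies the query still has to be checked on the fuzzy sets themselves.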
Queries using this principle are supported by indexing the cores and supports of
the fuzzy sets, respectively. In the literature we find two different approaches.
The first approach assumes that the cores and supports may contain an infinite
number of elements from a (continuous) domain. However, we have to be able
to describe the cores and supports by closed intervals (Bosc, 1989). The second
approach assumes that the cores and supports contain a finite number of
elements from a (discrete) domain. An advantage here is that we are not
restricted to intervals. In the following, we are going to discuss index structures
capable of supporting the interval-based approach and then continue with those
for discrete values.
[Figure: interval tree in which each node v stores the lists Lv and Uv of the lower and upper bounds of its intervals.]
lower and upper bounds of our query interval q, respectively. While descending
down the tree, we have to distinguish three different cases. Figure 10 (taken from
Kriegel, 2000) illustrates this for intersection queries. When v is smaller than
the lower bound of q, we have to check Uv for possibly intersecting intervals. As
soon as we fail to find intersecting intervals, we can stop, as Uv is sorted by ui.
We then continue by following the reference to the right child. When the upper
bound of q is smaller than v, we have to check Lv for possible query answers and
continue down the left child of v. When v lies between the lower and upper bounds
of q, we output all intervals in Lv (or Uv) and visit both children.
Searching for subintervals of q (in the case of subset queries) is not hard to do
either. We just have to look at the nodes v that lie between the lower and upper
bounds of q and search for candidates in Lv and Uv. We can utilize the ordering
of the lists by searching them from back to front.
For the relational interval tree, the backbone is not actually materialized, as it has
a regular structure. We create three different relations: i(v, li, ui) for the
intervals, l(v, li) for the lists Lv, and u(v, ui) for the lists Uv (each with an
appropriate index). Querying is done by computing the numbers of the visited
nodes and submitting the corresponding range queries to the list relations l
and u.
G-trees
G-trees are a combination of grid files with B-trees using a clever partition
numbering. In order to describe the index structure in an understandable way, we
restrict ourselves to the two-dimensional case (which is also the case we need
to index fuzzy data). Assume that each partition can hold no more than two data
objects. We start with an initial partitioning as depicted in Figure 11(a) (taken
from Kumar, 1994), where we split along the first dimension and number the
partitions using the binary strings 0 and 1. After inserting some more objects, we
have to split the partition 0 [see Figure 11(b)]. We do so along the second
dimension, numbering the newly created partitions 00 and 01. As more overflows
occur, we alternate between the two dimensions and number the partitions
accordingly [see Figures 11 (c) and (d)]. This regular numbering scheme has
several advantages, e.g., finding parent and child partitions is straightforward, as
is finding complements of a partition (for details see Kumar, 1994).
[Figure 11: successive partitionings of the data space, panels (a) to (d); the final partitions are numbered 00, 010, 0110, 0111, and 1.]
The partitions are indexed using a B-tree-like structure called G-tree. First, the
binary numbers are converted to decimals in the following way. All binary
numbers are brought to the same length by padding them with trailing 0s. In our
example in Figure 11, we would have 0 (0000), 4 (0100), 6 (0110), 7 (0111), and
8 (1000). These numbers are inserted into the G-tree like into a B-tree. When
searching the tree, we have to compute the relevant partition numbers and then
look them up. When inserting and deleting objects, we have to adjust the
partitioning scheme accordingly (for details see Kumar, 1994).
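The conversion of partition numbers can be reproduced directly. The sketch below pads the binary strings of the example with trailing zeros and converts them to decimals; the helpers `parent` and `complement` rest on the assumption that a parent is obtained by dropping the last bit and a complement by flipping it, and all function names are illustrative.

```python
def partition_number(bits, length):
    """Pad a binary partition string with trailing 0s and read it as a decimal."""
    return int(bits.ljust(length, "0"), 2)


def parent(bits):
    # Assumption: the parent partition is obtained by dropping the last bit.
    return bits[:-1]


def complement(bits):
    # Assumption: the complement is the sibling created by the same split.
    return bits[:-1] + ("1" if bits[-1] == "0" else "0")


partitions = ["00", "010", "0110", "0111", "1"]
length = max(len(p) for p in partitions)
numbers = sorted(partition_number(p, length) for p in partitions)
```

For the five partitions of the example, this yields the keys 0, 4, 6, 7, and 8 that are inserted into the G-tree.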
Liu et al. adapted G-trees for fuzzy data by mapping fuzzy queries onto range
searches (Liu, 1996). The intervals of supports (and cores) of fuzzy sets are
mapped to two-dimensional space by considering the lower bound of the interval
as x-value and the upper bound as y-value. Possible candidates for nonempty
intersection queries are found by retrieving objects whose x-value lies between 0
and the upper bound of the query interval and whose y-value is at least its lower
bound. For subset queries, the x-value must be at least the lower bound of the
query interval, and the y-value must lie between 0 and its upper bound.
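The mapping of intervals to points can be illustrated without any index structure. The sketch below simply filters a list of points; it assumes closed intervals over nonnegative numbers and uses the elementary conditions for interval intersection and containment. The names `q_lo` and `q_hi` for the bounds of the query interval and the sample intervals are hypothetical.

```python
def to_point(interval):
    """Map an interval (lower, upper) to the point (x, y) = (lower, upper)."""
    lower, upper = interval
    return (lower, upper)


def intersection_candidates(points, q_lo, q_hi):
    # [x, y] intersects [q_lo, q_hi] iff x <= q_hi and y >= q_lo.
    return [(x, y) for x, y in points if 0 <= x <= q_hi and q_lo <= y]


def subset_candidates(points, q_lo, q_hi):
    # [x, y] is contained in [q_lo, q_hi] iff x >= q_lo and y <= q_hi.
    return [(x, y) for x, y in points if q_lo <= x and 0 <= y <= q_hi]


intervals = [(1, 3), (3, 7), (2, 5), (3, 3), (5, 7), (5, 6)]
points = [to_point(i) for i in intervals]
intersecting = intersection_candidates(points, 4, 6)
contained = subset_candidates(points, 4, 6)
```

Both conditions describe rectangular regions in the two-dimensional space, which is why a multidimensional range search can answer them.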
Signatures
We will now turn to index structures assuming a finite set of discrete values in
the cores and supports of the fuzzy sets to be indexed. First, we give a brief
review of the superimposed coding technique, and then we will discuss index
structures built around this method.
Superimposed coding is based on the idea of hashing values into random k-bit
codes in a b-bit field and superimposing the codes for each value in a signature
(Knuth, 1973). The fixed size b is called the signature length. We use signatures
to represent the α-cuts of the query predicate a and the supports and cores of
the indexed fuzzy sets. There are two advantages to signatures. One is their
constant length; keys of constant length are easier to manage than keys of
variable length. The other advantage is the great speed with which signatures
can be compared by using only bit operations.
Example 1: An example for encoding the core of the fuzzy set position from
Figure 2 in an 8-bit signature with k = 2 is:

value                 bitcode
Research Assistant    1001 0000
Assistant Professor   0001 0010
Signature             1001 0010
We cannot assume that the signatures of distinct sets are distinct. Still,
$$s \;\theta\; t \;\Rightarrow\; \mathrm{sig}(s) \;\theta\; \mathrm{sig}(t) \qquad \text{for } \theta \in \{\subseteq, \supseteq\} \qquad (15)$$
where s and t are arbitrary sets, and sig(s) ⊆ sig(t) and sig(s) ⊇ sig(t) are
defined as
$$\mathrm{sig}(s) \subseteq \mathrm{sig}(t) \;:\Leftrightarrow\; \mathrm{sig}(s) \,\&\, {\sim}\mathrm{sig}(t) = 0, \qquad \mathrm{sig}(s) \supseteq \mathrm{sig}(t) \;:\Leftrightarrow\; \mathrm{sig}(t) \,\&\, {\sim}\mathrm{sig}(s) = 0$$
with & denoting bitwise and and ~ denoting bitwise complement. Hence, a
pretest based on signatures can be fast because it involves only bit operations.
Now, instead of comparing $L_\alpha(a)$ to the support or core of each A(oi), we
first compare the signature of $L_\alpha(a)$ to the signature of each support or
core. During the evaluation of a query, if the signature of the support
$L_{>0}(A(o_i))$ shares set bits with $\mathrm{sig}(L_\alpha(a))$, or if
$\mathrm{sig}(L_1(A(o_i))) \subseteq \mathrm{sig}(L_\alpha(a))$ holds, we call oi a
drop. Additionally, if $\Pi(a \mid A(o_i)) > \alpha$ or $N(a \mid A(o_i)) > \alpha$
also holds, we have a right drop; otherwise, oi is a false drop. After determining
all data items that are drops, we have to filter out the false drops. [The
probabilities that data items turn out to be false drops have been studied
intensively in Ishikawa (1993). We will not go into detail here.]
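A signature pretest followed by the elimination of false drops might look as follows. The sketch makes several assumptions that are not part of the original description: codes are derived from SHA-256 digests, the signature length is b = 16 with k = 2, and the final check of a drop is simplified to an exact set comparison of the core with the α-cut instead of evaluating possibility or necessity degrees.

```python
import hashlib

B, K = 16, 2   # signature length b and number of bits set per value k (illustrative)


def code(value):
    """Hash a value to a b-bit code with exactly k bits set."""
    bits, counter = 0, 0
    while bin(bits).count("1") < K:
        digest = hashlib.sha256(f"{value}:{counter}".encode()).digest()
        bits |= 1 << (digest[0] % B)
        counter += 1
    return bits


def sig(values):
    """Superimpose (bitwise or) the codes of all values of a set."""
    signature = 0
    for value in values:
        signature |= code(value)
    return signature


def sig_subset(s, t):
    # sig(s) is contained in sig(t) iff no bit of s is missing in t.
    return (s & ~t) == 0


def subset_query(items, query_cut):
    """Return (right drops, false drops) for cores that should be subsets of query_cut."""
    q = sig(query_cut)
    drops = [(ref, core) for ref, core in items if sig_subset(sig(core), q)]
    right = [ref for ref, core in drops if core <= query_cut]
    false = [ref for ref, core in drops if not core <= query_cut]
    return right, false


items = [("o1", {"Research Assistant", "Assistant Professor"}),
         ("o2", {"Assistant Professor"}),
         ("o3", {"Full Professor"}),
         ("o4", {"Research Assistant", "Full Professor"})]
cut = {"Research Assistant", "Assistant Professor"}
right_drops, false_drops = subset_query(items, cut)
```

The pretest never loses a qualifying item, because every bit set for a subset is also set for its superset; it may, however, let nonqualifying items pass, which is why the second step is needed.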
SSF/Compressed SSF
A sequential signature file (SSF) (Ishikawa, 1993) is a simple index structure. It
consists of a sequence of pairs of signatures (of supports or cores, depending on
the supported query type) and references to data items.
During retrieval, the SSF is scanned and all data items oi with matching
signatures are fetched and tested for false drops. Boss and Helmer (1999)
showed that SSF and its compressed counterpart, compressed signature file
(CSF), can be used to index fuzzy sets, and that this approach is faster than
scanning all fuzzy sets. In the following section we will discuss how the usual
ways of structuring indexes, namely, hierarchical organization and partitioning,
are applied to signatures.
[Figure 12: hierarchical signature index; the entries of inner nodes are formed by the bitwise or (|) of the signatures sig(oi.A) below them, and the leaf entries are pairs [sig(oi.A), ref(oi)].]
[Figure 13: extendible signature hashing (ESH) with a directory of global depth d = 3; the buckets with the prefixes h2(x) = 00, h3(x) = 010, h3(x) = 011, and h1(x) = 1 hold entries [sig(oi.A), ref(oi)].]
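index. This is due to the fact that we store partial signatures in the directory of
ESH. Let $\mathrm{part}_d(\mathrm{sig}(s))$ denote the first d bits of the signature of set s. Then,
$$\mathrm{sig}(s) \subseteq \mathrm{sig}(t) \;\Rightarrow\; \mathrm{part}_d(\mathrm{sig}(s)) \subseteq \mathrm{part}_d(\mathrm{sig}(t))$$
but
$$\mathrm{sig}(s) \cap \mathrm{sig}(t) \neq \emptyset \;\not\Rightarrow\; \mathrm{part}_d(\mathrm{sig}(s)) \cap \mathrm{part}_d(\mathrm{sig}(t)) \neq \emptyset.$$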
Inverted Files
An inverted file (see Figure 14) consists of a directory containing all distinct
values in the domain Ω, and a list for each value consisting of the references to
those data items oi for which the support or core of A(oi) contains this value.
For an overview on traditional inverted files, see Kitagawa (1996) and
Sacks-Davis (1997). As done frequently, we can hold the search values of the
directory in a B+-tree. Moreover, the lists are modified by storing the
cardinality of the cores with each data item reference (denoted by |oixy.A| in
Figure 14). This enables us to answer subset queries efficiently by using the
cardinalities as a quick pretest. The lists can also be compressed using, for
example, lightweight compression techniques (Westmann, 2000).
[Figure 14: inverted file; the directory entries v1, ..., vn each point to a list of pairs [ref(oixy), |oixy.A|].]
When evaluating a nonempty intersection query, we simply fetch the lists for all
values in $L_\alpha(a)$ and form the union of the retrieved data items.
When evaluating a subset query, we traverse all lists associated with the values
in $L_\alpha(a)$. We count the number of occurrences for each reference appearing
in a retrieved list. When the counter for a reference is not equal to the cardinality
of its core, we eliminate that reference. We can do this because this reference
also appears in lists associated with values that are not in $L_\alpha(a)$, so the
referenced core cannot be a subset of $L_\alpha(a)$.
For both subset and nonempty intersection queries, we still have to check whether
the retrieved data items possibly (or necessarily) satisfy the query, as the
supports (and cores) serve only as filters.
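Both evaluation strategies can be sketched with ordinary dictionaries standing in for the directory and the lists. The function names and the sample cores are hypothetical; note that, in this simplified form, data items with an empty core never appear in any list and would have to be handled separately.

```python
from collections import Counter, defaultdict


def build_inverted_file(items):
    """items: reference -> set of values; each list entry stores (reference, cardinality)."""
    index = defaultdict(list)
    for ref, values in items.items():
        for value in values:
            index[value].append((ref, len(values)))
    return index


def nonempty_intersection(index, query_cut):
    # Union of the lists of all values in the alpha-cut of the query predicate.
    return {ref for value in query_cut for ref, _ in index.get(value, [])}


def subset(index, query_cut):
    # A set is a subset of query_cut iff it was found once for each of its values.
    counts, cardinality = Counter(), {}
    for value in query_cut:
        for ref, size in index.get(value, []):
            counts[ref] += 1
            cardinality[ref] = size
    return {ref for ref, count in counts.items() if count == cardinality[ref]}


cores = {"o1": {"Research Assistant", "Assistant Professor"},
         "o2": {"Assistant Professor"},
         "o3": {"Full Professor"},
         "o4": {"Research Assistant", "Full Professor"}}
index = build_inverted_file(cores)
cut = {"Research Assistant", "Assistant Professor"}
```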
Paths
In this section, we investigate index structures for indexing paths in object-oriented DBMSs and show how they can be adapted to fuzzy object-oriented
DBMSs. We are going to look at two index structures in particular: access
support relations (ASRs) (Kemper, 1992) and join index hierarchies (JIHs) (Han,
1999) and their respective adaptations to fuzzy DBMSs.
[Figure 15: relations of the running example]

Department.employs
Department  Staff  degree
d1          s1     1.0
d1          s2     0.8
d3          s8     0.2

Staff to Project
Staff  Project  degree
s1     p1       0.9
s1     p2       1.0
s2     p2       0.8
s2     p3       0.4
s2     p4       0.6
s6     p6       0.7
s7     p7       0.4

Project to Name
Project  Name
p1       Natix
p2       Timber
p3       Tamino
p4       Rainbow
p6       Galax
p8       IPSI-XQ

[Figure 16: canonical extension]

Department  degree  Staff  degree  Project  Name
d1          1.0     s1     0.9     p1       Natix
d1          1.0     s1     1.0     p2       Timber
d1          0.8     s2     0.8     p2       Timber
d1          0.8     s2     0.4     p3       Tamino
d1          0.8     s2     0.6     p4       Rainbow
Canonical extensions contain only information on complete paths, i.e., paths that
start at department objects and end at the names of projects (Figure 16).
Left-complete extensions include all paths starting at department objects but not
necessarily ending at projects (Figure 17).
Similar to this are right-complete extensions, which end at names of projects but
do not necessarily go all the way to department objects (Figure 18).
Full extensions also comprise all partial paths (Figure 19).
Usually we do not materialize all extensions but a mix of different extensions and
decompositions. A decomposition of an ASR is a projection on relevant (consecutive) attributes of an extension. Those access relations that are materialized
are indexed using B+-trees, speeding up navigational accesses considerably. For
details on how to optimize ASRs for specific applications, see Kemper (1989).
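The four kinds of extensions can be derived from the binary relations of the running example. The sketch below is one possible way to do so, assuming that the full extension corresponds to a chain of full outer joins and that the other extensions are obtained by filtering it; the relation names `employs`, `works_on`, and `names` and the helper `full_outer_join` are hypothetical.

```python
# Binary relations of the running example, with the join attribute placed last/first.
employs = [("d1", 1.0, "s1"), ("d1", 0.8, "s2"), ("d3", 0.2, "s8")]   # (dept, degree, staff)
works_on = [("s1", 0.9, "p1"), ("s1", 1.0, "p2"), ("s2", 0.8, "p2"),
            ("s2", 0.4, "p3"), ("s2", 0.6, "p4"), ("s6", 0.7, "p6"),
            ("s7", 0.4, "p7")]                                         # (staff, degree, project)
names = [("p1", "Natix"), ("p2", "Timber"), ("p3", "Tamino"),
         ("p4", "Rainbow"), ("p6", "Galax"), ("p8", "IPSI-XQ")]      # (project, name)


def full_outer_join(left, right):
    """Join on last column of `left` = first column of `right`, padding with None."""
    left_width, right_width = len(left[0]), len(right[0])
    rows, matched = [], set()
    for l in left:
        hits = [i for i, r in enumerate(right)
                if l[-1] is not None and r[0] == l[-1]]
        matched.update(hits)
        if hits:
            rows += [l + right[i][1:] for i in hits]
        else:
            rows.append(l + (None,) * (right_width - 1))
    rows += [(None,) * (left_width - 1) + r
             for i, r in enumerate(right) if i not in matched]
    return rows


# Columns: department, degree, staff, degree, project, name.
full = full_outer_join(full_outer_join(employs, works_on), names)
canonical = [row for row in full if None not in row]
left_complete = [row for row in full if row[0] is not None]
right_complete = [row for row in full if row[-1] is not None]
```

For the example data, this yields five complete paths, six left-complete paths, seven right-complete paths, and nine paths in the full extension.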
[Figure 17: left-complete extension]

Department  degree  Staff  degree  Project  Name
d1          1.0     s1     0.9     p1       Natix
d1          1.0     s1     1.0     p2       Timber
d1          0.8     s2     0.8     p2       Timber
d1          0.8     s2     0.4     p3       Tamino
d1          0.8     s2     0.6     p4       Rainbow
d3          0.2     s8     -       -        -

[Figure 18: right-complete extension]

Department  degree  Staff  degree  Project  Name
d1          1.0     s1     0.9     p1       Natix
d1          1.0     s1     1.0     p2       Timber
d1          0.8     s2     0.8     p2       Timber
d1          0.8     s2     0.4     p3       Tamino
d1          0.8     s2     0.6     p4       Rainbow
-           -       s6     0.7     p6       Galax
-           -       -      -       p8       IPSI-XQ

[Figure 19: full extension]

Department  degree  Staff  degree  Project  Name
d1          1.0     s1     0.9     p1       Natix
d1          1.0     s1     1.0     p2       Timber
d1          0.8     s2     0.8     p2       Timber
d1          0.8     s2     0.4     p3       Tamino
d1          0.8     s2     0.6     p4       Rainbow
d3          0.2     s8     -       -        -
-           -       s6     0.7     p6       Galax
-           -       -      -       p8       IPSI-XQ
-           -       s7     0.4     p7       -

[Figure 20: join index from departments to projects, skipping staff objects]

Department  Project  Name
d1          p1       Natix
d1          p2       Timber
d1          p3       Tamino
d1          p4       Rainbow
JIHs generalize the decomposition principle by allowing the omission of intermediate objects in a path (Han, 1999). For example, we could have an index that
jumps from department objects right to projects, skipping staff objects (see
Figure 20). A complete JIH schema for our example can be seen in Figure 21(a)
(d = Department, s = Staff, p = Project, n = Name). The lower part is the base
JIH, which consists of all binary relationships. Due to space constraints, usually
only part of a full JIH is materialized [see Figure 21(b)].
[Figure 21: (a) complete JIH schema; (b) partially materialized JIH.]
However, two difficulties have to be overcome. We have to guarantee the
correctness of updates on intermediate links in paths and have to find a way to
handle the intermediate uncertainty degrees.
Updates
Look at part of the schema instantiation in Figure 22. Clearly, there are two paths
from d1 to p2. When deleting one of them (e.g., the link from d1 to s2, because s2
starts working at another department), we have to decide what to do with our
relationship between d1 and p2 in Figure 20. By looking at the JIH in Figure 20
alone, we cannot decide whether this relationship should be deleted or not.
Han et al. solved this problem by counting the number of links between each pair
of objects. For the base JIH, this is trivial. For our example in Figure 22, we would
store the following four tuples in the appropriate base JIH relations: (d1, s1, 1),
(d1, s2, 1), (s1, p2, 1), and (s2, p2, 1). This is also done for the relations on higher
levels, e.g., in the tuple (d1, p2, 2). When deleting a link in the base JIH, we
propagate these changes to the higher levels. In this case, we would subtract one
from the counter for (d1, p2) and would know that d1 and p2 are still connected.
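The counting scheme can be illustrated for the two base relations of the example and the derived department-to-project relation. This is a minimal sketch under the assumption that only one higher level exists; the names `base_ds`, `base_sp`, `derive_dp`, and `delete_ds_link` are hypothetical.

```python
from collections import Counter

# Base JIH of the example: department -> staff and staff -> project links.
base_ds = {("d1", "s1"), ("d1", "s2")}
base_sp = {("s1", "p2"), ("s2", "p2")}


def derive_dp(ds, sp):
    """Higher-level relation: number of paths between each department and project."""
    return Counter((d, p) for d, s in ds for s2, p in sp if s == s2)


def delete_ds_link(d, s, ds, sp, dp):
    """Delete a base link and propagate the change to the derived counters."""
    if (d, s) not in ds:
        return
    ds.discard((d, s))
    for s2, p in sp:
        if s2 == s:
            dp[(d, p)] -= 1
            if dp[(d, p)] == 0:      # last path is gone: remove the derived tuple
                del dp[(d, p)]


dp = derive_dp(base_ds, base_sp)
paths_before = dp[("d1", "p2")]
delete_ds_link("d1", "s2", base_ds, base_sp, dp)
paths_after = dp[("d1", "p2")]
```

After deleting the link between d1 and s2, the counter for (d1, p2) drops from 2 to 1, so the derived tuple is kept.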
Uncertainty Degrees
The question remains how we handle the uncertainty degrees of the intermediate
links we cut away on the higher levels of a JIH. (It is no problem to store them
in the base JIH.) One possible solution is to allocate space in the levels above the
base JIH in which to store all the intermediate uncertainty degrees in lists.
However, we expect this to bloat the index significantly.
A more elegant solution can be found if we are interested in an overall
uncertainty degree of all paths. Assume that the function used to compute this
overall uncertainty degree is reversible, like multiplying the degrees along each
path and averaging all paths. For example, in Figure 22, we would store the sum
of the products of the uncertainty degrees (1.0 · 1.0 + 0.8 · 0.8 = 1.64) and the
number of paths in the tuple (d1, p2, 2, 1.64). When deleting a path, the sum is
reduced by the appropriate value, and the counter is decremented by one.
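A reversible aggregate of this kind needs only two stored values per tuple. The following sketch keeps the number of paths and the sum of the products of their degrees; the class name `PathAggregate` and its methods are illustrative and not part of the JIH proposal.

```python
from math import prod


class PathAggregate:
    """Stores the number of paths and the sum of the products of their degrees."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add_path(self, degrees):
        self.total += prod(degrees)
        self.count += 1

    def remove_path(self, degrees):
        # Reversible: subtract the product of the deleted path again.
        self.total -= prod(degrees)
        self.count -= 1

    def overall(self):
        # Overall uncertainty degree: average over all remaining paths.
        return self.total / self.count if self.count else None


d1_p2 = PathAggregate()
d1_p2.add_path([1.0, 1.0])   # path via s1
d1_p2.add_path([0.8, 0.8])   # path via s2
```

Because of floating-point rounding, comparisons of the stored sum should use a tolerance rather than exact equality.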
Type Hierarchies
In this section, we will briefly present the conventional techniques used for type
hierarchy indexing in non-FOODBS systems. In a second step, we will show how
to combine and extend these methods for FOODBS systems. One difficulty in
indexing type hierarchies is that we can either group the objects by type or by key
values. Each approach has its advantages and disadvantages, as we will see.
SC-trees
An SC-tree (Kim, 1989) is straightforward. Basically, we build a separate B+-tree for each type. When querying a subhierarchy of our example in Figure 5, we
determine all types included in the subhierarchy and evaluate a query on each
corresponding B+-tree. When interested in all academics, we have to query the
B+-trees for the classes Academic, Teaching, and Research.
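The idea of one index per type can be sketched with sorted lists standing in for the B+-trees. The shape of the type hierarchy below is an assumption (Figure 5 is not reproduced here), as are the class name `SCIndex` and the sample objects.

```python
from bisect import bisect_left, insort

# Assumed shape of the staff type hierarchy.
SUBTYPES = {"Staff": ["Academic", "Administrative"],
            "Academic": ["Teaching", "Research"],
            "Administrative": ["Technical", "Nontechnical"]}


def subhierarchy(type_name):
    """The type itself plus all of its direct and indirect subtypes."""
    types = [type_name]
    for child in SUBTYPES.get(type_name, []):
        types += subhierarchy(child)
    return types


class SCIndex:
    def __init__(self):
        self.trees = {}   # type -> sorted list of (key, reference), one per type

    def insert(self, type_name, key, ref):
        insort(self.trees.setdefault(type_name, []), (key, ref))

    def range_query(self, root_type, lo, hi):
        result = []
        for type_name in subhierarchy(root_type):    # one lookup per type
            entries = self.trees.get(type_name, [])
            i = bisect_left(entries, (lo,))
            while i < len(entries) and entries[i][0] <= hi:
                result.append(entries[i][1])
                i += 1
        return result


index = SCIndex()
for type_name, age, ref in [("Academic", 28, "o1"), ("Research", 33, "o2"),
                            ("Teaching", 52, "o3"), ("Technical", 30, "o4"),
                            ("Staff", 35, "o5")]:
    index.insert(type_name, age, ref)
academics = index.range_query("Academic", 25, 50)
```

The cost of a query grows with the number of types in the queried subhierarchy, since each type requires its own index lookup.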
H-trees
While an SC-tree maintains a set of isolated structures for each type, an H-tree
(Low, 1992) nests these B+-trees to avoid a full search of each component. This
means that the nodes of a superclass B+-tree may contain pointers to nodes of
subclass B+-trees. There are two important rules for nesting the B+-trees. First,
we have to make sure that the ranges of the nesting node and the nested node
are compatible, so that we do not accidentally end up in a different part of the
domain when traversing pointers to a different B+-tree. Second, all leaf nodes in
a subclass B+-tree have to be reachable from the corresponding superclass B+-tree. Due to space constraints, we are not going to present the details on how this
is done (for further explanations, see Low, 1992).
CH-index
A CH-index (Kim, 1989) uses a different approach than SC- or H-trees. Here,
the objects are indexed using a single B+-tree structure, and the inner pages look
like the inner pages of a regular B+-tree storing the values of the indexed
attributes. The leaf pages look different, however. In the leaf pages, we
distinguish between the different types of objects. Figure 23 shows a simplified
view of a CH-index (for details see Kim, 1989) indexing the ages of staff
members (with a path from the root node to a leaf). For each value (in a leaf
page), we have a list for each type for which objects exist that have this value.
CG-trees
Depending on the size of the indexed type hierarchy, we have many entries in a
leaf page of a CH-index that we are not interested in during query evaluation.
For example, if we want to retrieve all academics, we can ignore objects of the
types Staff, Administrative, Technical, and Nontechnical. Unfortunately, pointers to these objects are contained in the leaf pages of a CH-index.
In a CG-tree, we have at most one pointer per type. Figure 24 shows the two
lowest levels of a (slightly simplified) CG-tree (for implementation details, see
Kim, 1989). The objects belonging to the type Academic are stored on the pages
P1, P2, and P3, and those for type Research are stored on pages P4 and P5. As
objects of different types are probably not distributed in the same way, pages can
be shared. So, for example, if pages P1 and P2 are only lightly filled, they can be
merged to one page that is shared between the two entries for Academic on the
level above (for details on how to balance the leaf pages, see also Kilger, 1994).
[Figure 23: simplified CH-index on the ages of staff members, showing a path from the root node to a leaf page.]
[Figure 24: the two lowest levels of a CG-tree, with entries for the types Academic and Research pointing to the leaf pages P1 to P5.]
Multikey Index
The basic idea in using multikey indexes is to consider the type information as just
another dimension describing an object. The main problem with this approach is
the partial ordering of the types. We would like to impose a total order on the
types in such a way that all queries regarding subhierarchies map to contiguous
range queries. Assume that we want to retrieve all academics between the ages
of 25 and 50. Figure 25(a) shows an optimal way to linearize all the types of the
staff hierarchy, while Figure 25(b) shows a suboptimal solution. When the
objects are optimally arranged by type on disk, we can (for all subtype
hierarchies) retrieve all objects belonging to a certain subtype hierarchy via one
sequential scan without gaps. Mueck and Polaschek gave an algorithm that finds
an optimal linearization (if one exists) (Mueck, 1996, 1997).
[Figure 25: two linearizations of the types of the staff hierarchy, plotted against age (20 to 80): (a) optimal, (b) suboptimal.]
After linearizing the type hierarchy, we can use any standard multikey index
structure. However, it is not always possible to find an optimal linearization in the
case of multiple inheritance. This is also important in the context of FOODBS
systems, because fuzzy membership of objects in classes may lead to similar
problems.
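For a hierarchy with single inheritance, a contiguous linearization is easy to obtain. The sketch below uses a plain preorder traversal, which is a stand-in and not the algorithm of Mueck and Polaschek; the shape of the hierarchy, the function names, and the sample objects are assumptions made for the example.

```python
from bisect import bisect_left

# Assumed shape of the staff type hierarchy (single inheritance only).
HIERARCHY = {"Staff": ["Academic", "Administrative"],
             "Academic": ["Teaching", "Research"],
             "Administrative": ["Technical", "Nontechnical"]}


def linearize(hierarchy, root):
    """Preorder numbering: every subhierarchy occupies one contiguous range."""
    order, ranges = [], {}

    def visit(type_name):
        start = len(order)
        order.append(type_name)
        for child in hierarchy.get(type_name, []):
            visit(child)
        ranges[type_name] = (start, len(order) - 1)

    visit(root)
    return order, ranges


order, ranges = linearize(HIERARCHY, "Staff")
position = {type_name: i for i, type_name in enumerate(order)}

# Objects keyed by (type position, age), as in a two-dimensional multikey index.
objects = sorted([(position["Teaching"], 30, "o1"), (position["Research"], 45, "o2"),
                  (position["Academic"], 60, "o3"), (position["Technical"], 35, "o4"),
                  (position["Staff"], 40, "o5")])


def query(type_name, age_lo, age_hi):
    first, last = ranges[type_name]
    start = bisect_left(objects, (first,))       # one contiguous slice per subhierarchy
    stop = bisect_left(objects, (last + 1,))
    return [ref for _, age, ref in objects[start:stop] if age_lo <= age <= age_hi]


academics = query("Academic", 25, 50)
```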
Future Trends
Developing new and improving existing index structures for FOODBS systems
will remain a viable research topic in the future, as many open problems still exist.
Let us name a few important ones here.
Conclusions
Efficient data retrieval is a necessity for a database system in order to be
accepted by end users. The history of database systems is full of examples to
prove this. Relational systems were able to replace network and hierarchical
database systems only after their performance had improved considerably.
Non-FOODBS systems can only be found in niche applications today, as their
performance and scalability could not keep up with relational systems. One issue
today is the performance of native XML database systems, which still lags
behind expectations. In our opinion, the fate of each new kind of database system
will be partly decided by whether or not its performance improves significantly
over time.
This is also true for FOODBS systems. The task of improving their performance
will not be easy, because in addition to the fuzzy components, the regular object-oriented components also need to be improved. One important step in improving
the efficiency of a database system is the introduction of powerful index
structures. Although a promising start has been made for FOODBS systems, this
research area has not yet received enough attention. Especially in the area of
path accesses and fuzzy type hierarchies, there are still plenty of opportunities
left for future research.
Copyright 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
References
Bayer, R., & McCreight, E. (1972). Organization and maintenance of large
ordered indexes. Acta Informatica, 1, 173189.
Bentley, J. L. ( 1975). Multidimensional binary search trees used for associative
searching. Communications of the ACM, 18(9), 509517.
Bertino, E., Ooi, B. C., Sacks-Davis, R., Tan, K. -L., Zobel, J., Shidlovsky, B.,
& Catania, B. (1997). Indexing techniques for advanced database
systems. Dordrecht: Kluwer Academic Publishers.
Bordogna, G., Lucarella, D., & Pasi, G. (1994). A fuzzy object oriented data
model. In Proceedings of the Third IEEE Conference on Fuzzy Systems
(pp. 313318).
Bosc, P., & Galibourg, M. (1989). Indexing principles for a fuzzy database.
Information Systems, 14(6), 493499.
Bosc, P., Galibourg, M., & Hamon, G. (1988). Fuzzy querying with SQL:
Extensions and implementation aspects. Fuzzy Sets and Systems, 28, 333
349.
Boss, B., & Helmer, S. (1999). Index structures for efficiently accessing fuzzy
data including cost models and measurements. Fuzzy Sets and Systems,
108(1), 1137.
Cattell, R., Barry, D. K., Berler, M., Eastman, J., Jordan, D., Russell, C.,
Schadow, O., Stanienda, T., & Velez, F. (Eds.). (2000). The Object Data
Standard: ODMG 3.0. San Francisco: Morgan Kaufmann.
Deppisch, U. (1986). S-tree: A dynamic balanced signature index for office
retrieval. In Proceedings of the 1986 ACM Conference on Research
and Development in Information Retrieval (pp. 7787).
Durkin, J. (1994). Expert systems: Design and development. Upper Saddle
River, NJ: Prentice Hall.
Evangelidis, G., Lomet, D., & Salzberg, B. (1995). The hb -tree: A modified hbtree supporting concurrency, recovery and node consolation. In Proceedings of the 21st VLDB Conference (pp. 551561).
Fagin, R., Nievergelt, J., Pippenger, N., & Strong H. R. (1979). Extendible
hashing a fast access method for dynamic files. ACM Transactions on
Database Systems, 4(3), 315344.
Gaede, V., & Gnther, O. (1998). Multidimensional access methods. ACM
Computing Surveys, 30(2), 170231.
Copyright 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
238
Helmer
George, R., Buckles, B. P., & Petry, F. E. (1992). An object-oriented data model
to represent uncertainty in coupled artificial intelligence-database systems.
In M. P. Papazoglou, & J. Zeleznikow (Eds.), The next generation of
information systems: From data to knowledge A selection of papers
presented at two IJCAI-91 workshops, Sydney, Australia, August 26,
1991 (Vol. 611 of Lecture Notes in Computer Science, pp. 3748). Berlin:
Springer.
Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching.
In Proceedings of the 1984 ACM SIGMOD (pp. 4757).
Han, J., Xie, Z., & Fu, Y. (1999). Join index hierarchy: An indexing structure for
efficient navigation in object-oriented databases. ACM Transactions on
Knowledge and Data Engineering, 11(2), 321337.
Hellerstein, J. M., & Pfeffer, A. (1994). The RD-tree: An index structure for
sets. Technical Report 1252. Madison: University of Wisconsin.
Helmer, S. (2001). Indexing fuzzy data. In Proceedings of the Joint Ninth
IFSA World Congress and 20th NAFIPS International Conference (pp.
21202125).
Helmer, S., & Moerkotte, G. (2003). A performance study of four index
structures for set-valued attributes of low cardinality. VLDB Journal,
12(3), 244261.
Ishikawa, Y., Kitagawa, H., & Ohbo, N. (1993). Evaluation of signature files as
set access facilities in OODBs. In Proceedings of the 1993 ACM
SIGMOD (pp. 247256).
Kemper, A., & Moerkotte, G. (1989). Access support in object bases. Technical
Report 17/89. Karlsruhe: University of Karlsruhe.
Kemper, A., & Moerkotte, G. (1992). Access support relations: An indexing
method for object bases. Information Systems, 17(2), 117146.
Kilger, C., & Moerkotte, G. (1994). Indexing multiple sets. In Proceedings of
20th International Conference on Very Large Data Bases (pp. 180
191).
Kim, W., Kim, K. -C., & Dale, A. (1989). Indexing techniques for objectoriented databases. In W. Kim, & F. H. Lochovsky (Eds.), Objectoriented concepts, databases, and applications (pp. 371394). Reading, MA: Addison-Wesley.
Kitagawa, H., & Fukushima, K. (1996). Composite bit-sliced signature file: An
efficient access method for set-valued object retrieval. In Proceedings of
the International Symposium on Co-operative Database Systems for
Advanced Applications (CODAS) (pp. 388395).
Copyright 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
Knuth, D. E. (1973). The art of computer programming (Vol. 3): Sorting and
searching. Reading, MA: Addison-Wesley.
Kriegel, H.-P., Pötke, M., & Seidl, T. (2000). Managing intervals efficiently in
object-relational databases. In Proceedings of the 26th VLDB Conference (pp. 407-418).
Kumar, A. (1994). G-tree: A new data structure for organizing multidimensional
data. Transactions on Knowledge and Data Engineering, 6(2), 341-347.
Liu, C., Ouksel, A. M., Sistla, A. P., Wu, J., Yu, C. T., & Rishe, N. (1996).
Performance evaluation of G-tree and its application in fuzzy databases. In
CIKM '96, Proceedings of the Fifth International Conference on
Information and Knowledge Management (pp. 235-242).
Low, C. C., Ooi, B. C., & Lu, H. (1992). H-trees: A dynamic associative search
index for OODB. In Proceedings of the 1992 ACM SIGMOD Conference (pp. 134-143).
Luk, R. W. P., Leong, H. V., Dillon, T. S., Chan, A. T. S., Croft, W. B., & Allan,
J. (2002). A survey in indexing and searching XML documents. Journal of
the American Society for Information Science and Technology, 53(6),
415-437.
Manolopoulos, Y., Theodoridis, Y., & Tsotras, V. J. (1999). Advanced database indexing. Dordrecht: Kluwer Academic Publishers.
Mueck, T. A., & Polaschek, M. L. (1996). Indexing type hierarchies with
multikey structures. In Proceedings of the Seventh Workshop on Persistent Object Systems (POS) (pp. 184-193).
Mueck, T. A., & Polaschek, M. L. (1997). Index data structures in object-oriented databases. Dordrecht: Kluwer Academic Publishers.
Na, S., & Park, S. (1996). A fuzzy association algebra based on a fuzzy object-oriented
data model. In Proceedings of the 20th Computer Software and
Applications Conference (COMPSAC '96) (pp. 276-281).
Nievergelt, J., & Hinterberger, H. (1984). The grid file: An adaptable, symmetric
multikey file structure. ACM Transactions on Database Systems, 9(1),
38-71.
Prade, H., & Testemale, C. (1984). Generalizing database relational algebra for
the treatment of incomplete or uncertain information and vague queries.
Information Sciences, 34, 115-143.
Preparata, F. P., & Shamos, M. I. (1993). Computational geometry: An
introduction. Berlin: Springer.
Robinson, J. T. (1981). The k-d B-tree. In Proceedings of the 1981 ACM
SIGMOD (pp. 10-18).
Chapter VIII
Introducing
Fuzziness in Existing
Orthogonal Persistence
Interfaces and Systems
Miguel Ángel Sicilia
University of Alcalá, Spain
Elena García-Barriocanal
University of Alcalá, Spain
José A. Gutiérrez
University of Alcalá, Spain
Abstract
Previous research has resulted in generalizations of the capabilities of
OODB models and query languages to cope with imprecise and uncertain
information in several ways, informed by previous research in fuzzy
relational databases. As a result, a number of models and techniques to
integrate fuzziness in its various facets in object data stores are available
for researchers and practitioners, and even extensions to commercial
systems have been implemented. Nonetheless, for those models and
techniques to become widespread in industrial contexts, more attention
should be paid to their integration with current database design and
programming practices, so that the benefits of fuzzy extensions could be
easily adopted and seamlessly integrated in current applications. This
chapter attempts to provide some criteria to select the fuzzy extensions that
Introduction
A number of research groups have investigated the problem of modeling fuzziness
in the context of object-oriented databases (OODBs), e.g., De Caluwe (1998) and
Ma, Zhang, and Ma (2003), and some of their results include research implementations on top of commercial systems, e.g., those reported in Yazici, George, and
Aksoy (1998) and in Schenker, Last, and Kandel (2001). Despite the considerable amount of significant research in the field, no commercial system is available
today that supports fuzziness explicitly in its core physical or logical model, and
existing database standards regarding object persistence sources, like those
of the Object Data Management Group (ODMG) (Cattell, 2000) and Java Data
Objects (JDO) (Russell et al., 2001), do not support vagueness or any other
kind of generalized uncertainty information representation (Klir & Wierman,
1998) in their data models.
One possible reason for this lack of integration of fuzziness in industrial practices
may be found in the relative complexity of modeling with fuzzy mechanisms,
which makes it difficult for average practitioners to fully understand and exploit
the potential of fuzzy techniques. Studies coming from the field of psychology of
programming, like those by Green and Petre (1996) and Kao and Archer (1997),
may serve as points of departure to investigate how fuzziness affects the mental
models of programmers and designers. In any case, further research is needed
in how to extend existing (crisp) database programming technology to its fuzzy
generalization in an acceptable and usable way for the average developer. In
addition, some of these generalizations may eventually lead to reduced performance and other inefficiencies, precluding a priori their acceptability. This
chapter aims at providing an overview of some of the issues regarding the just
described situation, and at serving as a point of departure for further research in
the area.
The rest of this chapter is structured as follows. The second section provides a
brief review of existing research on extending OODB models, and the motivation
for research on usability and acceptability of fuzzy constructs in orthogonal
persistence systems and programming interfaces. The third section deals with
the introduction of specific fuzzy constructs in orthogonal persistence systems,
Background
Several fuzzy OODB models and applications have been reported to date.
Similarity-based models like the one described in Aksoy, Yazici, and George
(1996) provide class definitions based on similar value ranges of instances.
Models based on possibility theory (Dubois, Prade, & Rossazza, 1991) are able
to represent vagueness and uncertainty in class hierarchies by introducing
constraints in attribute values. Models like UFO (De Caluwe, 1998) provide a
variety of representations for imperfect information, separating concerns for
vagueness and for uncertainty. Other authors proposed fuzzy sets as first-class
programming objects (Inoue, Yamamoto, & Yasunobu, 1991). Existing applications of fuzzy object databases include geographical information systems (Cross
& Firat, 2000), applications to multimedia (Koprulu, Cicekli, & Yazici, 2003), and
retrieval in image databases (Nepal, Ramakrishna, & Thom, 1999).
Database models like FOOD (Yazici & Koyuncu, 1997) and FRIL++ (Cao &
Rossiter, 2003) integrate with logics or deductive capabilities to provide support
for fuzzy inference, but we will not deal with this issue here, because most
current industrial applications do not include reasoning and are not based on a
sort of knowledge representation formalism, in the sense given by Davis, Shrobe,
and Szolovits (1993).
Despite the fact that current approaches to uncertainty and imprecision in object
databases are fairly diverse in their supporting mathematical frameworks and
assumptions, for now, they are relegated to research systems for specific
applications. In fact, fuzzy object models are not considered in standard modeling
languages like the Unified Modeling Language (UML), and they are not
supported by any kind of free or commercial persistence system. This situation
is aggravated by the fact that object databases are currently considered niche
technologies (Kim, 2003) that have not reached a state of wide industrial
adoption, except for specialized applications like CAD/CAM, resulting in a lack
of common physical and distribution architectures.
Consequently, the case for fuzzy extensions to object databases requires the
practical integration of research models in existing products and programming
interfaces. Such pragmatically directed integration efforts should take as a point
of departure the existing mindset shaped by the most-used object-oriented
languages (like Java or C++) and database systems (converging on ODMG and
more recently, on JDO), considering consistency and ease of understanding as
the primary concerns. Extensions to database or object design artifacts should
first come in the forms of strictly additive increments, so that the (crisp)
semantics of the previous models remain unaffected for backward compatibility.
But this is not always easy, because generalizations often require changes in
basic model definitions, like those of existing extensions to ODMG type systems
(De Tré & De Caluwe, 2003) and to UML basic cardinality definitions (Sicilia,
García, & Gutiérrez, 2002). This chapter describes a concrete selection of basic
fuzzy extensions and their rationales, along with some implementation concerns
regarding their suitability in practical settings.
The extensions must be consistent with existing OODB design or implementation elements. That is, they must be recognizable as generalized or
decorated variants of well-known elements.
2.
3.
The selected extensions at the conceptual level must not express a concrete
imprecision or uncertainty handling procedure but only reflect properties
that can be captured by average modelers from the domain being modeled.
Figure 1. Example UML diagram with fuzziness at attribute and class levels
[Diagram not reproduced: class A, stereotyped <<fuzzy>>, with attributes a : AValueScale, <<interval>> b : double(idl), and <<poss>> c; enumeration AValueScale with literals very_low, low, medium, high, very-high.]
$$R = \{\, ((x, y), \mu_R(x, y)) \mid (x, y) \in X \times Y \,\} \tag{1}$$
All the relation concepts can be extended to the n-ary case, where
$$R(X_1, X_2, \ldots, X_n) \subseteq X_1 \times X_2 \times \cdots \times X_n \tag{2}$$
We will restrict ourselves to the binary case, because it is the most common case
in database applications. Fuzzy associations can be represented as literal tuples
between model elements that hold an additional value representing their membership grade to the association. This assumption implies some constraints in the
implementation of bidirectional associations, because both association ends
should be aware of updates on the other.
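As a minimal sketch of this representation, the following Java class stores a binary fuzzy association as pairs with a membership grade and keeps both association ends synchronized on every update. The class name FuzzyAssociation and its methods are hypothetical and are not taken from the chapter or from any persistence API; links with grade 0 are assumed to be simply absent.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a fuzzy binary association R over X x Y stored as
// (x, y) -> mu_R(x, y), with a forward and a reverse index updated together.
final class FuzzyAssociation<X, Y> {
    private final Map<X, Map<Y, Double>> forward = new HashMap<>();
    private final Map<Y, Map<X, Double>> reverse = new HashMap<>();

    // Adds or updates the link (x, y) with a membership grade in (0, 1].
    void link(X x, Y y, double mu) {
        if (!(mu > 0.0 && mu <= 1.0)) {
            throw new IllegalArgumentException("grade must be in (0, 1]: " + mu);
        }
        forward.computeIfAbsent(x, k -> new HashMap<>()).put(y, mu);
        reverse.computeIfAbsent(y, k -> new HashMap<>()).put(x, mu);
    }

    // Removes the link from both ends so that neither side goes stale.
    void unlink(X x, Y y) {
        Map<Y, Double> ys = forward.get(x);
        if (ys != null) {
            ys.remove(y);
            if (ys.isEmpty()) forward.remove(x);
        }
        Map<X, Double> xs = reverse.get(y);
        if (xs != null) {
            xs.remove(x);
            if (xs.isEmpty()) reverse.remove(y);
        }
    }

    // Membership grade of (x, y); 0 means the pair is not related.
    double grade(X x, Y y) {
        return forward.getOrDefault(x, Collections.emptyMap()).getOrDefault(y, 0.0);
    }

    // Read-only view of the links seen from the X end.
    Map<Y, Double> linkedFrom(X x) {
        return Collections.unmodifiableMap(forward.getOrDefault(x, Collections.emptyMap()));
    }

    // Read-only view of the links seen from the Y end.
    Map<X, Double> linkedTo(Y y) {
        return Collections.unmodifiableMap(reverse.getOrDefault(y, Collections.emptyMap()));
    }
}

final class FuzzyAssociationDemo {
    public static void main(String[] args) {
        FuzzyAssociation<String, String> interestedIn = new FuzzyAssociation<>();
        interestedIn.link("u1", "music", 1.0);
        interestedIn.link("u2", "music", 0.6);
        System.out.println(interestedIn.grade("u2", "music"));
        System.out.println(interestedIn.linkedTo("music").keySet());
    }
}
```

Routing every update through link and unlink is one way to satisfy the constraint that both ends be aware of changes on the other; a persistent implementation would additionally have to make the two index updates atomic.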
Fuzzy associations are represented in UML models by simply adding a <<fuzzy>>
stereotype, for the sake of maximum consistency, as first proposed by Gutiérrez,
Sicilia, and García (2002). The interpretation of the association is expressed by
additional substereotypes, but at the modeling and database representation level,
the top stereotype could suffice in most common domain modeling situations.
Additional restrictions on associations are represented, as usual, with OCL
constraints. The use of fuzzy cardinalities would require a change in the UML
meta-model, so annotations on the "many" cardinality (denoted by the
symbol *) could be used to specify them instead. In any case, cardinality restrictions do not
affect physical representation but only update semantics, which are usually
enforced by the application, even in the crisp case. An example of association
design will be described later.
2.
3.
The three elements can be used to make a choice for the underlying collections
supporting them, which may eventually be changed dynamically, reflecting
changes in the cardinality of the participating instances. Cardinalities of classes
and associations become the raw data required to build benchmarking suites, but
such suites should also consider the tolerance of queries in each given application to low
membership (relevance) of retrieved objects. This indicates that
tolerance becomes a dimension that must be considered when evaluating a fuzzy
OODBMS.
Information granulation is viewed as a form of compression inspired by human
perceptual processes (Zadeh, 1997). As such, the degree of granulation a given
application tolerates affects the storage requirements and the domain of
the types that hold the information, also constituting a dimension in the assessment of database systems for which further research would be necessary.
In addition, the adequacy of fuzzy databases can be approached from the
perspective of the concept of epistemological adequacy, proposed by McCarthy
(1981). Here the perspective is that of assessing the matching of the representational structures used with the actual forms of uncertainty or imprecision
inherent to the domain being modeled. Currently, this kind of assessment can only
be carried out by contrasting taxonomies of information imperfection (Smets,
1997) with an explicit modeler's concern for these kinds of imperfection in the
domain.
Case Studies
In this section, we illustrate some of the issues described in the previous sections
through concrete technological artifacts. First, the extension of JDO database
programming interfaces is discussed, and then performance issues regarding a
small footprint persistence engine and a full-fledged database server are
described.
The query language JDOQL uses Java syntax for the specification of queries,
which are essentially Boolean filters on instance collections. Because queries
are specified as Strings, the approach, to provide maximum consistency and role-expressiveness, is that of leaving the syntax unaffected and simply handling
fuzziness implicitly in operators. A typical extended query example is the
following:
    // Conventional JDOQL filter; its syntax is left unchanged
    String filter =
        "address.state == state && " +
        "salary >= sal && " +
        "department.name.startsWith(deptName) && " +
        "projects.contains(proj) && " +
        "proj.budget > 10000000";
    // Extended extent: ascending order, minimum membership degree 0.01
    Extent extent = pm.getExtent(ProductiveEmployee.class, true, "asc;min=0.01");
    Query query = pm.newFuzzyQuery(extent, filter);
    // Interpret every operator in the filter in fuzzy terms
    ((FuzzyQuery)query).interpretAllFuzzy();
    query.declareImports("import Project");
    query.declareVariables("Project proj");
    query.declareParameters(
        "String state, String deptName, int sal");
    Collection result = (Collection)query.execute(
        "Georgia", "Network", new Integer(100000));
In the above example, ProductiveEmployee is a fuzzy subclass comprising the employees
who performed properly in the last quarter, according to imprecise criteria. Its
extent is filtered with a minimum membership degree of 0.01, and then a conventional JDOQL query
is passed to a query object with fuzzy capabilities. The invocation of
interpretAllFuzzy indicates to the query resolution process that all the operators in
its filters are to be interpreted in fuzzy terms, and consequently, the logical and
operator (&&) will also produce the combination of scores according to a T-norm.
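To make the T-norm combination concrete, the following sketch folds the degrees to which an object satisfies each conjunct of a filter into a single score. The helper class TNorms is hypothetical and is not part of JDO or of the FuzzyQuery interface; the minimum and product T-norms are shown only as two common choices, and the degrees in the demo are assumed values.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical helper for illustration; not part of JDO or of FuzzyQuery.
final class TNorms {
    // Two common T-norms: minimum and algebraic product.
    static final DoubleBinaryOperator MINIMUM = Math::min;
    static final DoubleBinaryOperator PRODUCT = (a, b) -> a * b;

    private TNorms() {}

    // Combines the degrees of all conjuncts of a filter with the given T-norm.
    static double and(DoubleBinaryOperator tNorm, double... degrees) {
        double score = 1.0; // 1 is the neutral element of every T-norm
        for (double d : degrees) {
            if (!(d >= 0.0 && d <= 1.0)) {
                throw new IllegalArgumentException("degree out of [0, 1]: " + d);
            }
            score = tNorm.applyAsDouble(score, d);
        }
        return score;
    }
}

final class TNormDemo {
    public static void main(String[] args) {
        // Assumed degrees: class membership, salary predicate, budget predicate.
        double[] degrees = {0.8, 1.0, 0.5};
        System.out.println("min:     " + TNorms.and(TNorms.MINIMUM, degrees));
        System.out.println("product: " + TNorms.and(TNorms.PRODUCT, degrees));
    }
}
```

Crisp predicates contribute degrees of exactly 0 or 1, so under either T-norm a purely crisp filter behaves as the usual Boolean conjunction.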
Alternatively, the interfaces of FuzzyQuery could be used to force the interpretation of fuzziness only in some of the filters that are affecting the query. This
approach to extending JDOQL is similar to that used in fJDBC (Sicilia, García,
Díaz, & Aedo, 2002), and puts fuzziness as an optional feature, because
subsequent iteration may choose to discard membership values. It should also be
noted that complex approaches to object comparison (Marín, Medina, Pons,
Sánchez, & Vila, 2003) could be implemented without changing the JDOQL
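[Figure not reproduced; caption not recovered. Panels (a) and (b) show a <<fuzzy>> association and its design through a FuzzyUnorderedAssociationEnd class with role -assoc.]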
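$$R = \bigcup_{\alpha \in \Lambda_R} \alpha R_\alpha \tag{3}$$
where $\Lambda_R$ is the level set of $R$, $R_\alpha$ denotes an $\alpha$-cut of the fuzzy relation, and $\alpha R_\alpha$
is a fuzzy relation as defined in Equation (4):
$$\mu_{\alpha R_\alpha}(x, y) = \alpha \cdot \mu_{R_\alpha}(x, y) \tag{4}$$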
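The level set and the cuts of a fuzzy relation suggest an association end that groups linked instances by membership level, so that retrieval by degree touches only the relevant groups. The sketch below is one possible reading of that idea; the class LevelIndexedEnd and its methods are hypothetical names, and a sorted in-memory map is assumed in place of whatever persistent collection a real engine would use.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of an association end indexed by membership level.
final class LevelIndexedEnd<T> {
    // One bucket per distinct grade in use; the keys form the level set.
    private final NavigableMap<Double, Set<T>> buckets = new TreeMap<>();

    // Links a target with the given grade, replacing any previous grade.
    void add(T target, double mu) {
        if (!(mu > 0.0 && mu <= 1.0)) {
            throw new IllegalArgumentException("grade must be in (0, 1]: " + mu);
        }
        remove(target);
        buckets.computeIfAbsent(mu, k -> new HashSet<>()).add(target);
    }

    // Unlinks a target and drops its bucket if the bucket becomes empty.
    void remove(T target) {
        buckets.values().removeIf(bucket -> bucket.remove(target) && bucket.isEmpty());
    }

    // The distinct membership grades currently present.
    Set<Double> levelSet() {
        return Collections.unmodifiableSet(buckets.keySet());
    }

    // The alpha-cut: every target whose grade is at least alpha.
    Set<T> alphaCut(double alpha) {
        Set<T> result = new HashSet<>();
        for (Set<T> bucket : buckets.tailMap(alpha, true).values()) {
            result.addAll(bucket);
        }
        return result;
    }

    // Grade of a target, or 0 if it is not linked.
    double grade(T target) {
        for (Map.Entry<Double, Set<T>> e : buckets.entrySet()) {
            if (e.getValue().contains(target)) return e.getKey();
        }
        return 0.0;
    }
}

final class LevelIndexedEndDemo {
    public static void main(String[] args) {
        LevelIndexedEnd<String> usersOfMusic = new LevelIndexedEnd<>();
        usersOfMusic.add("u1", 1.0);
        usersOfMusic.add("u2", 0.6);
        usersOfMusic.add("u3", 0.2);
        System.out.println(usersOfMusic.levelSet());
        System.out.println(usersOfMusic.alphaCut(0.5));
    }
}
```

In this layout a cut is a range scan over the sorted levels, whereas looking up the grade of one target is a linear scan over the buckets; a design that needs both operations to be fast would keep an additional map from target to grade.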
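[Figure not reproduced; caption not recovered. Object diagram in which User instances u1, u2, and u3 and the Subject instance music are linked through FUAE objects that map membership values (mu = 1, 0.6, 0.2, 0.45, 0.8) to Set instances.]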
requiring instance selection based on fuzzy degrees. This situation points out the
necessity of separating the fuzzy mappings from the rest of the information on
fuzzy objects. That separation of objects and their membership degrees is a
concrete realization of the Head-Body Split technique described in Visnick
(2003). As a general database design pattern, it can be synthesized in the
following Java-like declarations using a simple delegation scheme:
// Original class
private XN xN;
// membership grade:
private double mu;
// methods
Once the split into two classes is done, the database designer must allocate
instances of FuzzyClass_Crisp classes in separate physical units, so that only the
lighter version of the instances of fuzzy class X are required to filter by
membership, resulting in decreased data transfer loads.
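The split can be sketched in Java as follows. This is an illustration under stated assumptions rather than the chapter's original listing: EmployeeCrisp plays the role of a FuzzyClass_Crisp class, the Supplier is a hypothetical stand-in for a database fetch of the body, and all class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Body: the crisp, heavyweight state (the role of a FuzzyClass_Crisp class).
final class EmployeeCrisp {
    final String name;
    final byte[] photo; // stands in for bulky attributes

    EmployeeCrisp(String name, byte[] photo) {
        this.name = name;
        this.photo = photo;
    }
}

// Head: only the membership grade and a handle to the body.
final class ProductiveEmployee {
    private final double mu;
    private final Supplier<EmployeeCrisp> loader; // hypothetical database fetch
    private EmployeeCrisp crisp;                  // null until first needed

    ProductiveEmployee(double mu, Supplier<EmployeeCrisp> loader) {
        this.mu = mu;
        this.loader = loader;
    }

    double mu() {
        return mu;
    }

    // Delegation: the crisp state is fetched on first use, then cached.
    String name() {
        if (crisp == null) crisp = loader.get();
        return crisp.name;
    }
}

final class HeadBodySplitDemo {
    // Filters by membership using heads only; no body is loaded here.
    static List<ProductiveEmployee> atLeast(List<ProductiveEmployee> extent, double alpha) {
        List<ProductiveEmployee> selected = new ArrayList<>();
        for (ProductiveEmployee e : extent) {
            if (e.mu() >= alpha) selected.add(e);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<ProductiveEmployee> extent = new ArrayList<>();
        extent.add(new ProductiveEmployee(0.9, () -> new EmployeeCrisp("Ana", new byte[1024])));
        extent.add(new ProductiveEmployee(0.3, () -> new EmployeeCrisp("Luis", new byte[1024])));
        for (ProductiveEmployee e : atLeast(extent, 0.5)) {
            System.out.println(e.name() + " " + e.mu());
        }
    }
}
```

Only the heads that pass the membership filter ever trigger a fetch of their bodies, which is the reduction in data transfer that the split is meant to achieve.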
In the case of fuzzy associations, the collections that hold the mappings of pairs
of instances should be isolated in independent clusters, so that clients are able to
first retrieve the entire fuzzy subset of the Cartesian product, select the fuzzy
links that are interesting for the given functionality, and then retrieve the subset
of pairs of instances that are relevant according to their degrees. The rationale
for such a technique is analogous to the Isolate Index technique described in
Visnick (2003). To summarize, cache-based object architectures require that
computations with membership grades be handled on the client side, so that
degrees of fuzzy classes or associations that are in a working set should be
clustered together.
Future Trends
The eventual widespread adoption of fuzzy object-oriented technology will be,
necessarily, accompanied by a generalized interest in fuzziness as a first-class
citizen in conceptual models and programming technology. Fuzziness generalizes
common crisp modeling constructs to a higher level of flexibility that is not always
required, so that a careful and progressive selection of the fuzzy extensions that
are introduced becomes crucial. A modular extension for fuzziness of the UML
language continuing previous work (Sicilia, García, & Gutiérrez, 2002) and
leveraging existing research on fuzzy conceptual models (Chen, 1998) may
represent an important step in that direction, especially now that its 2.0 major
version provides improved extension mechanisms.
Moreover, one of the major current drivers of database technology is the
specificity of Web information, which benefits from the navigational structure of
object stores. Recent advances in Web information storage and management
(May & Lausen, 2004) go a step further in the integration of object models with
the specifics of the hypermedia structure of the Web. In addition, provided that
the vision of a Semantic Web (Berners-Lee, Hendler, & Lassila, 2001) eventually becomes a reality, the amount of metadata expressed in XML-based
languages like RDF will call for new requirements on object models and
databases, and also new query languages (Karvounarakis et al., 2003). Consequently, research on the integration of fuzziness in languages for the description
of Web resources represents an important direction that has begun to be addressed
in a number of research works regarding fuzzy description logics (see, for
example, Straccia, 2001) and their practical applications for Web management
issues (Sicilia, 2003).
With respect to the design and implementation of ODB systems, aspect-oriented
design (AOD) represents a promising new technology that may eventually be
used to add fuzziness to object database models, isolating the storage and
computation of membership degrees from the functionality that is not affected
by them, extending existing related work (Rashid & Sawyer, 2001). Consequently, fuzziness can be considered a cross-cutting concern in information
systems, and its management can be modularized in aspects or other similar
design-level constructs to clearly differentiate it (Sicilia & García, 2004). This
would eventually result in aspect-enabled object data stores enabling the storage
and handling of uncertainty and imprecision at the programming language level (e.g.,
using the popular AspectJ Java extension; see Endnote 5), without changing the crisp
classes. This would result in a cleaner separation of concerns than approaches using
conventional inheritance (Yazici, George, & Aksoy, 1998).
Conclusions
The introduction of fuzziness in existing OODB models must be carried out by
considering existing database design and programming practices to make the
extensions easier to understand and adopt by practitioners not knowledgeable in
fuzzy set theory or related mathematical frameworks for uncertainty. This
approach is proposed as a way to foster fuzzy technology adoption by the
community of orthogonal-persistence developers. Using consistency and self- and domain closeness as general criteria, a restricted subset of the rich array of
proposed fuzzy extensions is selected, comprising fuzzy classes and inheritance
(respecting intensional definitions), fuzzy associations as specific fuzzy relations,
and fuzziness at the attribute level implemented as class responsibilities.
A number of issues regarding the physical storage and representation of such
fuzzy extensions were described and illustrated through case studies. First, the
integration of fuzziness with standard database access interfaces was
illustrated with the JDO API. Second, the importance of representing membership degrees in compact form was illustrated through a case study about the db4o
database engine. This association design approach provides improved performance in operations that involve link retrieval by membership value, and adds no
significant time overhead in common collection iteration processes. In addition,
it was illustrated how cache-based architectures for ODBs like that of
ObjectStore call for physical grouping techniques that must take into account
the fact that computation with membership degrees occurs prior to actual
data transfer processes.
References
Aksoy, D., & Yazici, A. (1993). Criteria for evaluating fuzzy object oriented
database models. In E. Gelenbe (Ed.), Proceedings of the Eighth International Symposium on Computer and Information Sciences (pp. 136-143).
Aksoy, D., Yazici, A., & George, R. (1996). Extending similarity-based fuzzy
object-oriented data model. In K. M. George, J. H. Carroll, D. Oppemheim,
& J. Hightower (Eds.), Proceedings of the 1996 ACM Symposium on
Applied Computing (pp. 542-546). New York: ACM Press.
Atkinson, C., & Kühne, T. (2000). Strict profiles: Why and how. In A. Evans,
S. Kent, & B. Selic (Eds.), UML 2000 - The Unified Modeling
Language, Third International Conference (Lecture Notes in Computer
Science 1939, pp. 309-322). New York: Springer.
Atkinson, M. P., Daynes, L., Jordan, M. J., Printezis, T., & Spence, S. (1996).
An orthogonally persistent Java. ACM Sigmod Record, 25(4), 68-75.
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic Web.
Scientific American, 284(5), 34-43.
Boss, B., & Helmer, S. (1999). Index structures for efficiently accessing fuzzy
data including cost models and measurements. Fuzzy Sets and Systems,
108(1), 11-37.
Cao, T. H., & Rossiter, J. M. (2003). A deductive probabilistic and fuzzy OODB
language. Fuzzy Sets and Systems, 140(1), 129-150.
Cattell, R., Barry, D., Berler, M., Eastman, J., Jordan, D., Russell, C., et al.
(2000). The object data standard: ODMG 3.0. San Francisco, CA:
Morgan Kaufmann Publishers.
Chen, G. (1998). Fuzzy logic in data modeling: Semantics, constraints, and
database design. Norwell, MA: Kluwer.
Cross, V., & Firat, A. (2000). Fuzzy objects for geographical information
systems. Fuzzy Sets and Systems, 113(1), 19-36.
Davis, R., Shrobe, H., & Szolovits, P. (1993). What is a knowledge representation? AI Magazine, 14(1), 17-33.
de Caluwe, R. (Ed.). (1998). Fuzzy and uncertain object-oriented databases: Concepts and models (Advances in Fuzzy Systems, Applications
and Theory, Vol. 13). River Edge, NJ: World Scientific.
de Tré, G., & De Caluwe, R. (2003). Level-2 fuzzy sets and their usefulness in
object-oriented database modeling. Fuzzy Sets and Systems, 140(1), 29-49.
Dubois, D., Prade, H., & Rossazza, J. P. (1991). Vagueness, typicality and
uncertainty in class hierarchies. Int. Journal Intelligent Systems, 6, 167-183.
Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1995). Design patterns:
Elements of reusable object-oriented software. Boston, MA: Addison-Wesley.
Green, T. R. G. (2000). Instructions and descriptions: Some cognitive aspects of
programming and similar activities. In V. Di Gesù, S. Levialdi, & L.
Tarantino (Eds.), Proceedings of Working Conference on Advanced
Visual Interfaces (pp. 21-28). New York: ACM Press.
Green, T. R. G., & Petre, M. (1996). Usability analysis of visual programming
environments: A cognitive dimensions framework. Journal of Visual
Languages and Computing, 7(2), 131-174.
Gutiérrez, J. A., Sicilia, M. A., & García, E. (2002). Integrating fuzzy associations and similarity relations in object oriented database systems. In
Proceedings of the International Conference on Fuzzy Sets Theory
and Its Applications (pp. 66-67).
Hansen, D., Adams, D., & Gracio, D. (1999). In the trenches with ObjectStore.
Theory and Practice of Object Systems, 5(1), 201-207.
Hosking, A. (1995). Benchmarking persistent programming languages: Quantifying the language/database interface. In Proceedings of the OOPSLA'95
Workshop on Object Database Behavior, Benchmarks, and Performance.
Inoue, Y., Yamamoto, S., & Yasunobu, S. (1991). Fuzzy set object: Fuzzy set as
first-class object. In Proceedings of IFSA 1991 (pp. 70-73).
Kao, D., & Archer, N. P. (1997). Abstraction in conceptual model design.
International Journal of Human-Computer Studies, 46(1), 125-150.
Karvounarakis, G., Magkanaraki, A., Alexaki, S., Christophides, V., Plexousakis,
D., Scholl, M., et al. (2003). Querying the semantic Web with RQL.
Computer Networks, 42(5), 617-640.
Kim, W. (2003). A retrospection on niche database technologies. Journal of
Object Technology, 2(2), 35-42.
Klir, G., & Wierman, M. (1998). Uncertainty-based information: Elements of
generalized information theory (Studies in Fuzziness and Soft Computing, Vol. 15). New York: Springer-Verlag.
Koprulu, M., Cicekli, N. K., & Yazici, A. (2003). Spatio-temporal querying in
video databases. Information Sciences (to appear).
Ma, Z. M., Zhang, W. J., & Ma, W. Y. (2003). Extending object-oriented
databases for fuzzy information modeling, Information Systems (in press).
Marín, N., Medina, J. M., Pons, O., Sánchez, D., & Vila, M. A. (2003). Complex
object comparison in a fuzzy context. Information and Software Technology, 45(7), 431-444.
May, W., & Lausen, G. (2004). A uniform framework for integration of
information from the Web. Information Systems, 29(1), 59-91.
McCarthy, J. L. (1981). Epistemological problems of artificial intelligence. In B.
L. Webber, & N. J. Nilsson (Eds.), Readings in artificial intelligence (pp.
459-465). Los Altos, CA: Kaufmann.
Medina, J. M., Pons, O., & Vila, M. A. (1994). GEFRED. A generalized model
of fuzzy relational databases. Information Sciences, 76(1-2), 87-109.
Nepal, A., Ramakrishna, M. V., & Thom, J. A. (1999). A fuzzy object query
language (FOQL) for image databases. In A. L. P. Chen, & F. H.
Lochovsky (Eds.), Proceedings of the Sixth International Conference
on Database Systems for Advanced Applications (pp. 117-127).
Piscataway, NJ: IEEE Press.
Object Management Group: OMG Unified Modeling Language Specification, Version 1.3 (1999).
Rashid, A., & Sawyer, P. (2001). Aspect-orientation and database systems: An
effective customisation approach. IEE Proceedings - Software, 148(5),
156-164.
Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., & Lorenson, W. (1996).
Object oriented modeling and design. Upper Saddle River, NJ: Prentice
Hall.
Russell, C. et al. (2001). Java Data Objects (JDO) Version 1.0 proposed final
draft, Java Specification Request JSR000012.
Schenker, A., Last, M., & Kandel, A. (2001). Fuzzification of an object-oriented
database system. International Journal of Fuzzy Systems, 3(2), 432-441.
Sicilia, M. A. (2003). The role of vague categories in semantic and adaptive Web
interfaces. In R. Meersman, & Z. Tari (Eds.), Proceedings of the
Workshop on Human Computer Interface for Semantic Web and Web
Applications (Lecture Notes in Computer Science 2519, pp. 210-222).
New York: Springer-Verlag.
Sicilia, M. A., & García, E. (2004). On imperfection in information as an early
crosscutting concern and its mapping to aspect-oriented design. In Proceedings of the Early Aspects Workshop: Aspect-Oriented Requirements Engineering and Architecture Design (to appear).
Sicilia, M. A., García, E., & Gutiérrez, J. A. (2002). Integrating fuzziness in
object oriented modelling languages: Towards a fuzzy-UML. In Proceed-
Yazici, A., George, R., & Aksoy, D. (1998). Design and implementation issues
in the fuzzy object-oriented data model. Information Sciences, 108(1-4),
241-260.
Zadeh, L. (1997). Toward a theory of fuzzy information granulation and its
centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems,
90(2), 111-127.
Endnotes
1 http://java.sun.com/products/jdo/
2 http://www.db4o.com/
3 http://sodaquery.sourceforge.net/
4 http://www.objectstore.net/
5 http://eclipse.org/aspectj/
SECTION IV
Chapter IX
An Object-Oriented
Approach to Managing
Fuzziness in Spatially
Explicit Ecological
Models Coupled to a
Geographic Database
Vincent B. Robinson
University of Toronto at Mississauga, Canada
Phil A. Graniero
University of Windsor, Canada
Abstract
This chapter uses a spatially explicit, individual-based ecological modeling
problem to illustrate an approach to managing fuzziness in spatial databases
that accommodates the use of nonfuzzy as well as fuzzy representations of
geographic databases. The approach taken here uses the Extensible
Component Objects for Constructing Observable Simulation Models (ECO-COSM) system loosely coupled with geographic information systems. ECO-COSM Probe objects flexibly express the contents of a spatial database
within the context of an individualized fuzzy schema. It affords the ability
to transform traditional nonfuzzy spatial data into fuzzy sets that capture
the uncertainty inherent in the data and the model's semantic structure. The
ecological modeling problem was used to illustrate how combining Probes
and ProbeWrappers with Agent objects affords a flexible means of handling
semantic variation and is an effective approach to utilizing heterogeneous
sources of spatial data.
Introduction
Progress in global connectivity has led to a situation where we now need to deal
with more heterogeneous information consisting of a broad variety of digital
spatial/geographical data and address operational sources, such as simulation
models, which create new data and information. The scale of the problem has
changed from just a few databases to thousands, perhaps millions, as geographical information resources. Such new resources are most often added independently to the accessible set of resources without regard to the myriad end-uses
that may be applied to them (Mackay, 1999). Thus, spatially explicit information
resources may be used in many different contexts without regard for the
underlying uncertainties of the data, or their relationships to the semantics of the
problem domain (Robinson & Frank, 1985; Burrough & Frank, 1996). Although
such uncertainties in geographic databases have been recognized for decades,
it would be extraordinary to have institutional databases contain anything as
detailed as fuzzy membership values or other detailed measures of uncertainty
attached to objects or tuples.
Geographic databases with no explicitly recorded uncertainty measures are
commonly used as the basis for computationally intensive investigations of
complex ecological systems. One major approach that developed over the past
few decades is individual-based modeling (IBM) (Grimm, 1999; Lomnicki, 1999;
Bian, 2003). It is a computational approach to modeling a system through the
interaction of atomic models of each individual inhabiting the system. Individual-based
models provide several advances over traditional ecosystem models. Foremost among
the advances is the fact that they discard the assumption that there is some
average, or mean, individual that adequately represents every individual in a
population. They also dispose of the assumption that significant interactions take
place evenly across populations. Such models are usually spatially explicit,
allowing interaction between individuals to occur over a wide range of space.
Importantly, they are able to represent the biological, physiological, and behavioral distinctions seen in individuals in the real world. Because the individual is
the atomic unit, the simulation is able to take spatially explicit localized interac-
The next section outlines the key concepts that link individual-based ecological
models, agent-based modeling, object-oriented design, and GIS databases, and
presents the primary challenges of representing fuzziness in such a complex
application domain. Then we present a conceptual overview of the squirrel
dispersal model we use as an illustrative example throughout this chapter. The
architecture of the modeling framework that was used to implement the model
is then described, and some of its key features that provide a solution to the
challenges of this problem domain are explained. The section on fuzzy spatial
relations and database query illustrates how context-specific fuzzy spatial
relations can be created ad hoc to constrain database queries. Then we present
an innovative way to add fuzzy information to a conventional, nonfuzzy GIS
database not only within a model's context, but also within the variable context
of individual model objects. The next section demonstrates the utility of deriving
fuzzy information from a nonfuzzy GIS database at the individual level by
presenting differences in modeled squirrel dispersal according to individual
variation in perception of the environment and variation in the decision-making
process. We conclude the chapter with discussion of the strengths, limitations,
and future possibilities of this approach.
in which the modeled agents should perceive the same landscape if they are
to remain operationally consistent with the modeled domain.
In this approach to modeling individual animals dispersing across a landscape,
animal objects pose spatial queries to the landscape to acquire information. Like
their counterparts in the real world, they are able to acquire information about the
landscape only within a certain distance determined by the animal's perceptual
range (Mech & Zollner, 2002; Zollner, 2000) or finite range of vision (Fahse et
al., 1998). That information is then processed to determine the specifics of which
movement behavior to pursue. Ruckelshaus et al. (1997) suggested that errors
in dispersal parameters have much larger consequences for predicting dispersal
success than do errors in landscape classification. Their conclusions suggest that
uncertainty surrounding dispersal parameters is a significant problem that
ecological models and modelers must face.
The role of fuzzy sets in the representation of objects in geographic databases
for a variety of applications has received considerable attention. However, the
usual approach is to address the representation of uncertainty directly, in some
fashion, with the objects stored in a database (Cross & Firat, 2000; Yazici &
Akkaya, 2000) or as part of the query subsystem (Yazici & Akkaya, 2000;
Morris, 2003). Although appropriate in many applications, such approaches have
limitations when using geographic databases in the context of information-based
simulation modeling of complex environmental and ecological processes. The
simulation models have their own semantics that may be distinct from or
unknown to the database author, the user, or other models (or submodels). This
is especially relevant when trying to reconcile the semantics of the original
observations with the semantics of a simulation modeling domain. In addition,
most complex environmental modeling domains contain many models and
submodels that interact with one another, consequently generating semantic
errors (see Mackay & Robinson, 2000; Mackay, 1999). Furthermore, Robinson
(2000) showed that in an object-oriented database with a visual query system,
environmental simulation models may be embedded in the query or in the query
results. In this case, the user may have one set of semantics in mind that may,
or may not, be consistent with the semantics of the simulation models being used
to generate the answer to the query. In fact, there may be no reconciliation
process. That led to research into methods for modeling semantic agreement and
model self-evaluation (Mackay & Robinson, 2000; Mackay, 1999) and would
seem to justify embedding more intelligence into such systems. Therefore, we
use the concept of Probes in an object-oriented, agent-based system as a
practical means of addressing issues of fuzziness in spatially explicit data, while
at the same time maintaining the integrity of large, complex simulation projects.
From a modeling perspective, this approach can substantially reduce artifacts
caused by parameter uncertainty (Robinson & Graniero, in press).
given the nature of the problem, it is possible that more than one location will have
the same maximum value. In that case, should there be ties, the first one in the
list is chosen (i.e., a lazy sufficing strategy). On moves beyond the first, there
is the question of directional bias. Based on previous work reported in the
ecological literature, a bias to move in the general direction of the last move is
incorporated in the decision set. In that case, should there be ties, a random
location among the candidate set (D_M) is chosen (i.e., an exploratory sufficing
strategy).
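The two tie-breaking rules described above can be sketched in a few lines of Java. This is only an illustration, not the ECO-COSM source: the class name `MoveChooser`, its method names, and the representation of the candidate set as a non-empty array of membership values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: picks a move from candidate membership values. */
final class MoveChooser {

    /** Lazy strategy: index of the first candidate holding the maximum value. */
    static int firstMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {   // strict '>' keeps the earliest tie
                best = i;
            }
        }
        return best;
    }

    /** Exploratory strategy: a random index among all candidates tied for the maximum. */
    static int randomMax(double[] scores, Random rng) {
        double max = scores[firstMax(scores)];
        List<Integer> ties = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] == max) {
                ties.add(i);
            }
        }
        return ties.get(rng.nextInt(ties.size()));
    }
}
```

Passing a seeded `Random` keeps a model run reproducible while still breaking ties without positional bias.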
Once the animal object has moved to a location, it must then decide whether it
is a location suitable for stopping its dispersal movement. Like the movement
decision model, this is one in which relevant goals (G_R) and constraints (C_R) are
expressed in terms of fuzzy sets, and a decision is determined by an appropriate
aggregation of the fuzzy sets (Bellman & Zadeh, 1970; Klir & Yuan, 1995). In
the residence decision model, the animal is constrained by whether or not its
current location is sufficiently spatially separated from conspecifics that a home
range can be established, while the goal is to have habitat of sufficient area.
Finally, a decision rule is applied to the decision set that leads to the animal taking
up residence at the location or attempting a move to another location. The details
of this decision model are presented in Robinson and Graniero (in press).
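As a rough illustration of a Bellman and Zadeh style decision, the sketch below combines one goal membership and one constraint membership with the minimum operator and then applies a threshold. Both choices are assumptions for the example: the chapter defers the actual aggregation and decision rule to Robinson and Graniero (in press), and the names `ResidenceDecision`, `decisionMembership`, and `takeUpResidence` are hypothetical.

```java
/** Hypothetical residence decision: fuzzy goal and constraint combined by min. */
final class ResidenceDecision {

    /** Membership of the current location in the decision set (goal AND constraint). */
    static double decisionMembership(double goalHabitat, double constraintSeparation) {
        return Math.min(goalHabitat, constraintSeparation);
    }

    /** Assumed decision rule: settle when the decision membership reaches a threshold. */
    static boolean takeUpResidence(double goalHabitat, double constraintSeparation,
                                   double threshold) {
        return decisionMembership(goalHabitat, constraintSeparation) >= threshold;
    }
}
```

With the minimum operator, a location with excellent habitat but poor separation from conspecifics is rejected, because the weaker criterion limits the result.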
Because this work is focused on modeling natal dispersal, we use the residence
decision primarily as a stopping rule. Future elaborations will incorporate
exploratory movement so that the agent explores the vicinity around its destination and uses that information in a more sophisticated decision process than
presented here, to choose whether to establish a home range or not. However,
at the present, we simplified the decision to address just a few key criteria that
were suggested by the literature (Allen, 1987; Wolff, 1999).
Scheduling Subsystem
Central to the operation of the system is the Scheduling subsystem. The Clock
and Schedule objects are the primary component objects of the Scheduling
subsystem. Each program is constrained to include only one instance of each.
Any object in the simulation program may access the Clock's time or add actions
to the Schedule. The Schedule object keeps track of all pending actions. It
decides which action should occur next and triggers that event. Currently,
scheduling is an event-driven structure, but discrete time step models may be
constructed by adding regularly occurring step actions that reschedule themselves every time step.
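The event-driven structure, and the way a self-rescheduling step action emulates a discrete time step model, can be sketched as follows. This is a simplified stand-in rather than the ECO-COSM Schedule and Clock: the class `ScheduleSketch`, its methods, and the use of a priority queue ordered by event time are assumptions for illustration.

```java
import java.util.PriorityQueue;

/** Hypothetical event-driven schedule with a self-rescheduling step action. */
final class ScheduleSketch {

    /** A pending action stamped with the simulated time at which it fires. */
    static final class Event implements Comparable<Event> {
        final double time;
        final Runnable action;
        Event(double time, Runnable action) { this.time = time; this.action = action; }
        @Override public int compareTo(Event other) { return Double.compare(time, other.time); }
    }

    private final PriorityQueue<Event> pending = new PriorityQueue<>();
    private double now = 0.0;   // plays the role of the Clock's time

    double now() { return now; }

    void add(double time, Runnable action) { pending.add(new Event(time, action)); }

    boolean isFinished() { return pending.isEmpty(); }

    /** Advance the clock to the earliest pending event and trigger it. */
    void triggerNext() {
        Event e = pending.poll();
        now = e.time;
        e.action.run();
    }

    /** Emulate a discrete time step model: the step re-adds itself until endTime. */
    void addRepeatingStep(double start, double dt, double endTime, Runnable step) {
        add(start, () -> {
            step.run();
            if (now + dt <= endTime) {
                addRepeatingStep(now + dt, dt, endTime, step);
            }
        });
    }

    /** Runs a small demonstration and returns how many steps fired. */
    static int demo() {
        ScheduleSketch schedule = new ScheduleSketch();
        int[] steps = {0};
        schedule.addRepeatingStep(0.0, 1.0, 3.0, () -> steps[0]++);
        while (!schedule.isFinished()) {
            schedule.triggerNext();
        }
        return steps[0];
    }
}
```

In this sketch the step fires at times 0, 1, 2, and 3. Events with identical times are not guaranteed to run in insertion order, so a fuller implementation would need an explicit tie-breaking rule.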
Modeling Subsystem
The Modeling subsystem provides the main components for constructing a
simulated world. The spatial and temporal structure of the world is defined by the
specific choice of object modules. The primary high-level object is the World,
Instrumentation Subsystem
The Instrumentation subsystem provides the information-access structures that
allow model components to discover the state of other components in a controlled
and safe fashion, ensuring the consistency and integrity of the source databases
and the model's overall operating state. The ability to collect data from the
running model is made possible by the Probe/Probeable interface mechanism.
Many of the objects in the Modeling subsystem implement the Probeable
interface as well as fulfill their own modeling functions. Probes can only be
created by Probeable objects; a request is made to the target Probeable object
via its getProbe() method, specifying the desired type of Probe using a keyword.
Each type of Probe is designed to query a specific aspect of the Probeable
object's state. Whenever the Probe's probe() method is invoked (e.g., by a
ProbeCommand on the Schedule, or by an Agent requiring current information
about another object), the Probeable's appropriate private data access
method or database query is invoked. As an example, in order to access the data
within a Grid (which is a Probeable object), the client object must call the
Grid's getProbe() method, and the Grid will return an appropriate Probe
object. When that Probe's probe() method is invoked, it will invoke its target
Grid's getValueAt() method using the Probe's current Location as a parameter.
The Grid returns its state at that Location to the Probe, which passes the
result to the object using the Probe. Using
this structure, a Probeable object only exposes attributes that are deemed
public knowledge to external objects. In order to keep other attributes
inaccessible, it does not distribute Probe objects that expose those attributes. At
the same time, the Probeable object keeps the access mechanism for those
attributes hidden from public knowledge. All Probes simply respond to a
probe() method, and what happens within that method is kept opaque to the user.
This allows database sources, implementations, or architectures to change with
no effect on the other model components. All Probes create read-only mechanisms;
external objects never have direct access to the Probeable object's state,
which means that they cannot accidentally change the object due to programming errors.
Figure 4. Structure of the Probe, Probeable, and ProbeWrapper relationship
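The Probe/Probeable mechanism described above can be sketched in Java roughly as follows. The interface and method names `Probe`, `Probeable`, `probe()`, `getProbe()`, and `getValueAt()` come from the text; their signatures, the `GridSketch` class, the `CellProbe` inner class, and the `"value"` keyword are assumptions made for the example and are not taken from ECO-COSM.

```java
/** Hypothetical, simplified versions of the interfaces described in the text. */
interface Probe {
    Object probe();   // read-only query; the access mechanism stays hidden
}

interface Probeable {
    Probe getProbe(String keyword);
}

/** A toy raster layer that hands out read-only Probes on its cell values. */
final class GridSketch implements Probeable {
    private final double[][] cells;

    GridSketch(double[][] cells) { this.cells = cells; }

    // Private accessor: external objects never call this directly.
    private double getValueAt(int row, int col) { return cells[row][col]; }

    /** A Probe aimed at one cell; its location can be changed by the client. */
    final class CellProbe implements Probe {
        private int row, col;
        void setLocation(int row, int col) { this.row = row; this.col = col; }
        @Override public Object probe() { return getValueAt(row, col); }
    }

    @Override public Probe getProbe(String keyword) {
        if ("value".equals(keyword)) {
            return new CellProbe();
        }
        throw new IllegalArgumentException("No Probe for keyword: " + keyword);
    }
}
```

Because the client only ever holds a `Probe`, the grid could be replaced by a database-backed layer without changing client code, and no client can write to the cell array.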
ProbeWrappers extend the power of the Probe mechanism. A ProbeWrapper
is a specialized Probe that has another Probe embedded within it (Figure 4). A
ProbeWrapper is used to modify the pure result retrieved from a Probeable
object in some way (Figure 5). For example, the land-cover type observed at a
distance may be subject to random misclassification due to limits of perceptual
range. Alternatively, the state's description scheme may be modified to suit the
purpose of the observer: the grid cell may be described as "mature oak" in the
land-cover Layer, but the observing Agent may perceive it as a "suitable location
for inhabiting."
Because ProbeWrappers are also Probes, an object (such as an Agent) can
use either pure Probes or Probes that are modified by ProbeWrappers
transparently, with no knowledge of the difference. By wrapping Probes in
slightly different ways for different individual Agents of a common type, it is
possible for the modeler to introduce variation in an individual's ability to perceive
the world, while using the same basic decision-making process. ProbeWrappers
may be nested as deeply as desired, so highly sophisticated perceptual filters
may be constructed. In addition, some specialized ProbeWrapper objects can
take the results of many nested Probes and combine their results in some
fashion, for example, returning the land-cover class that appears in the majority
of grid cells in a 5 × 5 window centered on the Probe's Location. In this way,
it is possible to create views of the modeled world and its components at different
scales of observation, yet treat them all in the decision-making process as
identical, localized observations.
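The wrapping and nesting behavior described above follows the decorator pattern. A minimal sketch is given below, assuming numeric probe results; `FunctionProbeWrapper` and `ProbeWrapperDemo` are hypothetical names, and the `Probe` interface is a simplified stand-in for the ECO-COSM one.

```java
import java.util.function.DoubleUnaryOperator;

/** Minimal Probe abstraction, repeated here so the sketch stands alone. */
interface Probe {
    Object probe();
}

/** Hypothetical wrapper: applies F(x) to whatever the embedded Probe returns. */
final class FunctionProbeWrapper implements Probe {
    private final Probe embedded;
    private final DoubleUnaryOperator f;

    FunctionProbeWrapper(Probe embedded, DoubleUnaryOperator f) {
        this.embedded = embedded;
        this.f = f;
    }

    @Override public Object probe() {
        double x = ((Number) embedded.probe()).doubleValue();  // pass the call through
        return f.applyAsDouble(x);                              // transform the result
    }
}

final class ProbeWrapperDemo {
    /** Nests two wrappers around a constant source and returns the perceived value. */
    static double demo() {
        Probe pure = () -> 0.5;                                         // stands in for a Layer Probe
        Probe halved = new FunctionProbeWrapper(pure, x -> x / 2);      // first filter
        Probe floored = new FunctionProbeWrapper(halved, x -> Math.max(x, 0.3));  // second filter
        return ((Number) floored.probe()).doubleValue();
    }
}
```

The client calls `probe()` on the outermost object and cannot tell whether it holds a pure Probe or a stack of wrappers, which is what lets two Agents share decision logic while perceiving the same layer differently.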
Figure 5. When the client object invokes the Probe's (in this case a
ProbeWrapper's) probe() method, the call passes through to the embedded
Probe. The Probeable object returns the state value x to the Probe, which
passes the value on to the ProbeWrapper. The ProbeWrapper transforms
the value by some function F(x), and returns the transformed value to the
client.
The Instrumentation subsystem also allows the modeler to instrument the
operating simulation model in order to monitor the model's evolution and collect
data for later analysis. A Sampler is made up of a set of one or more Probes
that perform the actual queries about system state. The Sampler will typically
take the Probe results and format them in an organized fashion for output to a
file on disk, or for periodic output to the computer console to inform the user on
progress. Data files produced by a Sampler may be used in other separate
analysis programs to generate summary statistics from a large number of model
runs.
The Simulation object acts as the core engine of the simulation model. It
manages the interaction of the components in the three subsystems. The setup()
method structures the simulation appropriately for the desired model, attaches
any instrumentation desired, and acquires any necessary memory or file resources required for the model. The run() method is simple: until the Schedule
is finished, it will trigger the next pending item on the Schedule. The teardown()
method releases any memory or file resources and gets ready for program
termination. The Simulation object may be instantiated and executed as an
independent, stand-alone program. It can also act as a pure object that is
contained in a larger program, such as a simulation experiment that executes
many instances of the Simulation object, each of which has slight variations in
its selection and configuration of model components.
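The life cycle described above can be outlined as a small skeleton. The method names `setup()`, `run()`, and `teardown()` come from the text; everything else, including the class name `SimulationSketch`, the queue that stands in for the Schedule, and the `runExperiment` helper, is an assumption made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical skeleton of the setup/run/teardown life cycle. */
final class SimulationSketch {
    private final Deque<Runnable> schedule = new ArrayDeque<>();  // stand-in for the Schedule
    private final int agentCount;
    private int actionsRun = 0;

    SimulationSketch(int agentCount) { this.agentCount = agentCount; }

    /** Structure the model: here, queue one action per agent. */
    void setup() {
        for (int i = 0; i < agentCount; i++) {
            schedule.add(() -> actionsRun++);
        }
    }

    /** Until the schedule is finished, trigger the next pending item. */
    void run() {
        while (!schedule.isEmpty()) {
            schedule.poll().run();
        }
    }

    /** Release resources; this toy version only has the queue to clear. */
    void teardown() { schedule.clear(); }

    int actionsRun() { return actionsRun; }

    /** An "experiment" that executes several differently configured instances. */
    static int runExperiment(int... agentCounts) {
        int total = 0;
        for (int n : agentCounts) {
            SimulationSketch sim = new SimulationSketch(n);
            sim.setup();
            sim.run();
            sim.teardown();
            total += sim.actionsRun();
        }
        return total;
    }
}
```

Keeping the three phases as separate methods is what allows the same object to run stand-alone or to be driven repeatedly by an enclosing experiment.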
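$$
P(x;\alpha,\beta) = \mu_P(x) =
\begin{cases}
1 & \text{if } d_{xc} \le \alpha \\
\beta(\alpha - d_{xc}) + 1 & \text{if } \alpha < d_{xc} < \alpha + 1/\beta \\
0 & \text{if } \alpha + 1/\beta \le d_{xc}
\end{cases}
\tag{1}
$$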
Once the perceptual range over an assumed flat surface, i.e., P, is specified, the
next step is to determine to what degree each location is within the visible
perceptual range. In other words, the influence of local topography is taken into
consideration. Let $L: X \to [0,1]$ be the fuzzy set describing the degree to which
location x is visible from a particular squirrel. The membership function for L is
defined by Equation (2) as a closed-form triangular function, where $los_{xc}$ is the
angle at which location x is visible from location c. It is based on the output style
of GRASS GIS (Neteler & Mitasova, 2002), where 90° is looking straight ahead,
below the line of sight is less than 90°, and above the line of sight is greater than
90°. If the local terrain creates a physical obstruction to visibility between c and
x, then $\mu_L(x) = 0$.
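$$
L(x;\alpha,\beta,\gamma) = \mu_L(x) = \max\left(\min\left(\frac{los_{xc}-\alpha}{\beta-\alpha},\ \frac{\gamma-los_{xc}}{\gamma-\beta}\right),\ 0\right)
\tag{2}
$$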
The degree to which a cell is both visible and falls within the perceptual range
is defined by the intersection $P \cap L$. This operation takes into account the level plain
perceptual distance and the potential effect topography may have on the ability
of an object to perceive a location. To make it an efficient process, we need only
calculate the value of L for the locations that fall in ${}^{0+}P$; thus ${}^{0+}P$ defines the spatial
extent over which information from the spatial database is extracted and utilized
by the individual agent. In the code for defining an Agent, the statement
spots = regPerceptualRange.getAllLocations();
in effect limits the calculation of L to those locations (x), spots, that fall within
the set ${}^{0+}P$. Subsequently, the membership values in lyrPerceptualRange and
lyrVisibility are combined using an aggregation operator to arrive at a spatial
object, lyrVisiblePerceptual, which is referenced by an Agent as its individual
visible perceptual range at that particular location at that time step in the
simulation.
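The membership calculations behind lyrPerceptualRange, lyrVisibility, and lyrVisiblePerceptual can be sketched as plain functions. Several things here are assumptions: the linear decline for perceptual range follows Equation (1) with a distance threshold and a slope parameter, the visibility function uses the standard three-parameter triangular form, and the minimum is used as the aggregation operator because the text does not name one. The class and method names are hypothetical.

```java
/** Hypothetical helpers for the perceptual-range and visibility memberships. */
final class PerceptionSketch {

    /** Full membership up to alpha, then a linear decline with slope beta. */
    static double perceptualRange(double distance, double alpha, double beta) {
        if (distance <= alpha) {
            return 1.0;
        }
        if (distance < alpha + 1.0 / beta) {
            return beta * (alpha - distance) + 1.0;
        }
        return 0.0;
    }

    /** Triangular membership on the line-of-sight angle; zero outside (low, high). */
    static double visibility(double losAngle, double low, double peak, double high) {
        double rising = (losAngle - low) / (peak - low);
        double falling = (high - losAngle) / (high - peak);
        return Math.max(Math.min(rising, falling), 0.0);
    }

    /** Visible perceptual range as a fuzzy intersection (min is an assumed operator). */
    static double visiblePerceptual(double p, double l) {
        return Math.min(p, l);
    }
}
```

Since `perceptualRange` is zero beyond `alpha + 1/beta`, visibility only needs to be evaluated for cells inside that distance, which mirrors the restriction to the support of P described above.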
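[Equation (3): a piecewise definition of LC(x) = μ_LC(x) that assigns each land-cover class a membership value. The values listed are 1.0, 0.9, 0.75, and 0.0 (the last for four classes); the class labels that survive are oak_forest, deciduous_forest, and conifer_forest.]
(3)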
When the Agent must assess the habitat quality, it does so by requesting the
habitat quality membership value from its corresponding Probe, which acts as
its sensory interaction with the surrounding environment. Operationally, the
Probe queries the spatial database for the habitat quality membership value at
the Agent's current location and returns that membership value to the Agent.
In addition to land cover, we use the size of an oak/deciduous forest patch as an
important factor in the residence decision. In Equation (4), we define a fuzzy set,
HA, to express the degree to which a location falls within the class of
minimum_habitat_area. The setting of the parameters $\alpha_{HA}$ and $\beta_{HA}$ will vary
depending on the species being modeled. The area measurement is based on the
sizes of patches formed from contiguous cells that were classified as oak,
deciduous, or oak/deciduous bottomland. Let farea(x) be the area in hectares of
the oak/deciduous forest patch within which location x falls.
Cognitively, the Agent is assessing the size of the oak/deciduous forest patch;
operationally, it is calculating a new fuzzy membership based on forest patch
sizes encoded in a raster, which resulted from a clumping operation on the
same land-cover raster used for evaluating habitat quality. The minimum area
Probe accesses the value of the forest patch grid cell corresponding to the
Agent's location and returns the value to the Agent, which then calculates the
fuzzy membership according to Equation (4).
Thus, each Agent has a number of SpatialProbes, that is, Probes that can each
be directed to a specified Location on a target Layer in order to collect
information from that specific Layer. Figure 6 shows how an Agent gets the
SpatialProbe prbLCHabitat for the Probeable Layer lchabitat, which
corresponds to LC above, and the SpatialProbe prbForestArea for the
Probeable Layer forarea, which corresponds to farea(x) in Equation (4).
$$
HA(x) = \mu_{HA}(x) = \max\left(0,\ \min\left(1,\ \frac{farea(x) - \alpha_{HA}}{\beta_{HA} - \alpha_{HA}}\right)\right)
\tag{4}
$$
)
);
HabitatGoal = FuzzyOp.compensatoryIntersection(
((Number)(prbLCHabitat.probe())).doubleValue(),
_HabitatArea
);
Recall from above that the land-cover type layer was preprocessed by the GIS;
lchabitat contains the fuzzy membership value, not the actual land-cover type.
This means that the SpatialProbe retrieves the fuzzy membership value and
passes it to Agent without any intermediate processing. Notice that in the case
of forarea, the Agent must do additional processing on the Probe's result
before forming the goal set, as shown in Figure 7. In contrast to the preprocessed
fuzziness for LC, HA is fuzzified after crisp data are queried from the database.
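The contrast between the two layers can be made concrete with a short sketch of the post-query fuzzification step. It assumes that Equation (4) is a linear ramp between two area parameters, and it uses the geometric mean as a stand-in for a compensatory aggregation. The latter is not the operator behind `FuzzyOp.compensatoryIntersection`, whose definition is not given in this section. The class, method, and parameter names are hypothetical.

```java
/** Hypothetical fuzzification of a crisp patch area queried from the database. */
final class HabitatAreaSketch {

    /** Linear ramp: 0 at or below alphaHA hectares, 1 at or above betaHA hectares. */
    static double minimumHabitatArea(double areaHa, double alphaHA, double betaHA) {
        double ramp = (areaHa - alphaHA) / (betaHA - alphaHA);
        return Math.max(0.0, Math.min(1.0, ramp));
    }

    /** Stand-in for a compensatory aggregation: the geometric mean of the two values. */
    static double habitatGoal(double landCoverMembership, double areaMembership) {
        return Math.sqrt(landCoverMembership * areaMembership);
    }
}
```

Here the land-cover membership arrives already fuzzified from the preprocessed layer, while the area value arrives crisp and is converted by `minimumHabitatArea` before the two are combined.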
In the earlier description of how an Agent uses a Probe to assess the local
habitat suitability, the entire land-cover raster was preprocessed according to
Equation (3), and the Probe accessed the grid cell values in the transformed
raster. This approach requires that each grid cell be converted only once rather
than every time the grid cell is considered by an Agent, thus streamlining the
computation. However, this restricts the flexibility for more sophisticated IBM
models, because it presumes that all Agents in the system perceive the habitat
quality of a particular land-cover type in the same way.
Different animal species, and perhaps even different individuals of the same
species, may map land-cover classes to slightly different membership values.
This necessitates the calculation of separate rasters for each remap equation,
which creates a much larger database. It also creates risk for database integrity
should the original land-cover map change and the remapped rasters not be
updated accordingly. Also, consider the case of a more intelligent agent that
evolves its perception of habitat quality as it gains experience over its lifetime.
Each change to the remap equation, i.e., each evolution in the Agent's
perception, would require a recalculation of its corresponding habitat quality
raster, increasing the computational burden for the model.
The ProbeWrapper provides the key mechanism with which to avoid these
problems. Recall that every instance of a ProbeWrapper implementation has
another Probe (possibly a ProbeWrapper) embedded within it. When the
ProbeWrapper's probe() method is invoked, it, in turn, invokes the embedded
Probe's probe() method. When it receives the embedded Probe's result, the
ProbeWrapper may perform any kind of operation on it before passing it on as
its own result.
As such, the remap equation can be embedded within a habitat quality
ProbeWrapper that contains the following:
1.
2.
3.
2. Program logic that applies Equation (4) with the ProbeWrapper's particular parameters
3.
The transformation code shown in Figure 7 moves out of the Agent and into its
minimum habitat area ProbeWrapper.
By using the ProbeWrapper approach, the Agent directly perceives the
habitat quality of its current position according to its own value scheme, and all
model logic occurs within the universe of discourse defined in the fuzzy problem
domain. The Probe handles the mechanics of accessing the spatial database,
thereby insulating the Agent's model logic from database-dependent programming issues. The ProbeWrapper takes the query result from the Probe and
independently manages the transformation from the GIS's relatively application-neutral, crisp land-cover scheme to the Agent's application-specific, fuzzy
perception of habitat quality. All Agents may access a single, shared land-cover
raster, and they may modify their perceptions of habitat quality at any time, with no risk
of compromising the database integrity or the behavioral integrity of other Agents.
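A habitat quality wrapper of the kind described above might look like the following sketch, in which each Agent owns its own remap table while all Agents read one shared land-cover source. The class name `HabitatQualityWrapper`, the `setMembership` method, the use of string class labels, and the default of 0.0 for unlisted classes are assumptions for illustration; the `Probe` interface is a simplified stand-in.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal Probe abstraction, repeated here so the sketch stands alone. */
interface Probe {
    Object probe();
}

/** Hypothetical wrapper that remaps a crisp land-cover class to a membership value. */
final class HabitatQualityWrapper implements Probe {
    private final Probe landCoverProbe;                 // returns a class label
    private final Map<String, Double> remap = new HashMap<>();

    HabitatQualityWrapper(Probe landCoverProbe, Map<String, Double> initialRemap) {
        this.landCoverProbe = landCoverProbe;
        this.remap.putAll(initialRemap);                // private copy per wrapper
    }

    /** Lets one Agent revise its own perception without touching the shared raster. */
    void setMembership(String landCoverClass, double membership) {
        remap.put(landCoverClass, membership);
    }

    @Override public Object probe() {
        String landCover = (String) landCoverProbe.probe();
        return remap.getOrDefault(landCover, 0.0);      // unlisted classes count as non-habitat
    }
}
```

Because each wrapper copies the initial table, changing one Agent's membership for a class leaves both the shared raster and every other Agent's perception untouched, and no remapped raster ever needs to be recomputed.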
There are many other ways in which the ProbeWrapper structure may be used
to control fuzzification of a spatial database. To illustrate, take an example based
on an early work demonstrating the use of fuzzy sets in the query of land-cover
databases (Robinson, 1988). Rather than simply retrieving a membership value
that was assigned to a land-cover class that represents the land-cover class's
degree of membership in the habitat set, let there be a similarity relation between
the land-cover types. This similarity relation can be a function of the degree of
confidence, or accuracy, felt to be likely at the location. If we assume that a
location has deciduous forest, then we would expect the similarity to be greater
with other forest types, especially oak. Consequently, the ProbeWrapper may
take this knowledge into consideration by assigning a final membership in the set
habitat based on a combination of the land-cover type at a location and its
similarity relation with other land-cover types. A more realistic approach would
be to take into consideration the surrounding cells as an additional information
channel to inform the ProbeWrapper how similar the location is to surrounding
locations. This can provide additional information to be used to estimate how well
the location fits in habitat. For example, a deciduous forest cell surrounded by
water, i.e., an island, would be poor habitat, whereas a deciduous forest cell
surrounded by deciduous forest might be judged high-quality habitat.
As another example, it is well known that no land-cover database is error free.
One long-standing problem has been the mixed pixel problem, where one grid cell
may have more than one land cover present but be forced by classification
methods to be classified as being in a single type of land cover (Robinson &
Thongs, 1986). The inherently fuzzy nature of land-cover classifications was
discussed by many researchers (Robinson & Frank, 1985; Robinson, 2002;
Matsakis et al., 2000; Cross & Firat, 2000; Hagen, 2003; Foody, 1996; Zhang &
Stuart, 2001). In ECO-COSM, ProbeWrappers can be used to implement an
information-processing function that applies a mixed pixel model to the underlying land-cover data, allowing the Agent to evaluate how closely its current
location conforms to a particular land-cover type.
Because land-cover classification is accomplished using remote sensing or other
classification methods that can incorporate fuzziness, the process can be used
to generate fuzzy geographical objects (Matsakis et al., 2000; Cross & Firat,
2000; Foody, 1996). In a simple case that is analogous to the Semantic Import
model (Robinson, 1988), each cell would have a vector of membership values
indicating the degree to which it belonged to a particular land-cover type. Thus,
a ProbeWrapper can use a Probe to access that information and process it
before passing it to the Agent. For example, a vector might look like {0.8, 0.75,
0.66, 0.3, 0.2, 0.2, 0.0}. Now, what information is passed to the Agent? Perhaps
the whole vector is passed, which means that the Agent would need to have a
method of combining it with the function that determines how well the location
fits the set habitat. Notice in Equation (3) that each land-cover type is associated
with a membership in LC, and that the vector {0.8, 0.75, 0.66, 0.3, 0.2, 0.2, 0.0},
associated with a single grid cell, provides information on the degree to which
that grid cell belongs to a particular land-cover class. Let $\mu_k^{LC}(x)$ be the
membership in LC of land-cover type k, while $\mu_k^{GIS}(x)$ is the membership value
of grid cell x in land cover k. Thus, we have two vectors $\mu^{LC}$ and $\mu^{GIS}$:
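$$
\mu^{LC} = \begin{bmatrix} \mu_1^{LC} \\ \mu_2^{LC} \\ \vdots \\ \mu_k^{LC} \end{bmatrix},
\qquad
\mu^{GIS} = \begin{bmatrix} \mu_1^{GIS} \\ \mu_2^{GIS} \\ \vdots \\ \mu_k^{GIS} \end{bmatrix}
$$
The element-wise minimum of the two vectors is
$$
\min(\mu_k^{LC}, \mu_k^{GIS}) = \begin{bmatrix} \mu_1^{LC,GIS} \\ \mu_2^{LC,GIS} \\ \vdots \\ \mu_k^{LC,GIS} \end{bmatrix}.
$$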
Then take the maximum value from this vector to represent the degree to which
the grid cell falls in the set habitat. This formulation can be computed quickly
by a ProbeWrapper, and no changes would be required in the
Agent code. In this manner, the Agent only sees the information presented
to it by the ProbeWrapper, and it focuses strictly on the behavioral elements of
the model, leaving the retrieval or derivation of the fuzzy value to the
ProbeWrapper. Thus, with this simple example, we illustrated how fuzziness
could be represented in a geographic database in two different ways and be used
by a ProbeWrapper to deliver meaningful fuzzy information to an Agent, with
no need to adjust the decision model of the Agent.
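The max-min combination described above is short enough to show in full. The sketch below assumes both vectors are stored as arrays in the same land-cover class order; the class name `MaxMinSketch` and the method `habitatMembership` are hypothetical.

```java
/** Hypothetical helper computing the max-min combination of two membership vectors. */
final class MaxMinSketch {

    /** Returns max over k of min(lc[k], gis[k]); both arrays must have equal length. */
    static double habitatMembership(double[] lc, double[] gis) {
        if (lc.length != gis.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double best = 0.0;
        for (int k = 0; k < lc.length; k++) {
            best = Math.max(best, Math.min(lc[k], gis[k]));
        }
        return best;
    }
}
```

As a usage example, pairing the grid-cell vector {0.8, 0.75, 0.66, 0.3, 0.2, 0.2, 0.0} with habitat memberships of {1.0, 0.9, 0.75, 0.0, 0.0, 0.0, 0.0} (an assumed class ordering) gives element-wise minima of {0.8, 0.75, 0.66, 0.0, 0.0, 0.0, 0.0}, so the cell's membership in habitat is 0.8.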
The other major informational component of the habitat portion of the residence
decision model is membership in HA, the minimum habitat area. It is a function
of the area of the forest patch. The forest patch is defined in a raster GIS as a
collection of grid cells contiguous with one another and of the same type. In a
vector representation, it would be a polygon. One approach is to represent a
fuzzy region, A, as composed of three parts: the core, the indeterminate
boundary, and the exterior. The indeterminate edge can further be decomposed
into the inside edge and the outside edge. If Z is a referential set of a finite number
of attributes and region A is a fuzzy subset defined in a two-dimensional space
$\mathbb{R}^2$ over Z, the membership function of A can be defined as $\mu_A: X \times Y \times Z \to [0,1]$.
Each point is assigned a membership value for attribute z, where $z \in Z$ (Zhan,
1998). This suggests several possible approaches to representing forest area
patches in this problem domain. In the current illustrative example, the forest
patches are determined according to a crisp membership rule of adjacency, and
then the area is calculated, followed by calculation of HA for each grid cell. This
means that the Layer forarea is composed of grid cells, each of which is coded
with membership values that are a function of the area of the patch in which it
belongs. However, if forest patches are fuzzy regions, then this simplistic
approach would need to be changed. Because the grid cell is the atomic spatial
element in our GIS database, the upshot of this approach would be that each cell
(i.e., location) could be a member of more than one patch. In other words, a patch
object may share a location (cell) with another patch object. This problem has
received some
attention in the fuzzy database community (Yazici & Akkaya, 2000; Cross &
Firat, 2000; Cheng et al., 2002; Robinson, 2000; Bordogna & Chiesa, 2003). Of
course, this implies that when estimating the area of a patch for habitat selection
purposes, a location (cell) will contribute to the area of more than one patch.
Hence, fuzzy set theory effectively expands the conventional assumptions
regarding the total area extent of thematic map classes used in nonfuzzy
geographic databases (Ricotta & Avena, 1999). Due to this characteristic of
fuzzy regions, a number of approaches were suggested for estimating the area
of a fuzzy region (Ricotta & Avena, 1999; Schneider, 2001; Yuan & Shen, 2001).
There has been some work on modeling fuzzy regions that exploits the concept
of the α-cut, some of which is explicitly linked to the query process (Morris, 2003;
Zhan, 1998; Schneider, 2000; Schneider, 2001). Previous work suggests that the
area of a fuzzy region might be computed as a weighted sum of the areas of all
α-level regions (Schneider, 2001). Consider that if $\tilde{F}$ is a fuzzy region, i.e., a
forest patch, and consists of a finite collection {F1, ..., Fn} of crisp α-level
regions, then the area of $\tilde{F}$ can be computed as in Equation (5):
(5)
In this case, area($\tilde{F}$) is a real number that could be used in Equation (4),
corresponding to farea(). There is a problem with this straightforward linkage,
because it is entirely possible, given the nature of fuzzy region objects, that a
single cell will be associated with more than one fuzzy region with a membership
level greater than 0.0. In such a case, a simple rule can be used such that area($\tilde{F}$)
is calculated for the fuzzy region that bestows the highest membership value on
the cell.
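A rough Python sketch of the two ideas in this passage follows: a weighted sum over α-level areas, and the rule that picks the region with the highest membership for a cell. The weighting used here, in which each α-level area is weighted by the difference between its level and the next lower level, is an assumption chosen for illustration, as are the function names and the sample numbers.

```python
from typing import Dict, Iterable, Tuple


def fuzzy_area(level_areas: Iterable[Tuple[float, float]]) -> float:
    """Weighted sum of crisp alpha-level areas.

    level_areas holds (alpha, area) pairs. Assumed weighting: each area is
    multiplied by the gap between its alpha and the next lower alpha.
    """
    total = 0.0
    previous_alpha = 0.0
    for alpha, area in sorted(level_areas):
        total += (alpha - previous_alpha) * area
        previous_alpha = alpha
    return total


def area_for_cell(memberships: Dict[str, float], region_areas: Dict[str, float]) -> float:
    """Return the area of the region that gives the cell its highest membership."""
    best_region = max(memberships, key=memberships.get)
    return region_areas[best_region]


# Illustrative numbers only: a patch with a 10-cell 0.5-level region and a 4-cell core.
patch_area = fuzzy_area([(0.5, 10.0), (1.0, 4.0)])
chosen = area_for_cell({"patchA": 0.3, "patchB": 0.7}, {"patchA": 5.0, "patchB": patch_area})
```

The value returned by area_for_cell is the kind of single real number that a Layer could store per cell for use in the decision model.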
An Agent obtains information about the area of forest patch through the Probe
prbForestArea that samples the Layer forarea that contains the value of
farea(). Likewise, it is possible to construct a Layer forarea that would be the
value of area($\tilde{F}$) for that region to which the cell belongs to the greatest degree.
Using Probes and ProbeWrappers, it would be possible to develop a multi-Layer, multi-Probe approach so that all degrees of membership could be seen
by the Agent. This would necessitate the management of multiple Probes by a
ProbeWrapper. It is possible that the ProbeWrapper might then combine that
information before passing it to the Agent, which would still rely on something
like Equation (4) in its decision-making model. Thus, the decision model would
be kept essentially the same, but through the use of Probes and ProbeWrappers,
the values of inputs used in the decision model would be changed.
Concluding Discussion
We used a spatially explicit individual-based model of a small mammal species'
natal dispersal behavior across a real-world landscape to illustrate an object-oriented approach to creating and managing operational fuzzy information in a
spatial database for use in a spatially explicit simulation model. Even though a
small subset of problems in spatially explicit ecological modeling was addressed
in this chapter, it highlights the breadth and depth of the problems that can be
usefully explored in this problem domain. The illustrative problems presented
here have demonstrated that this is a database and modeling domain rich in fuzzy
information-processing challenges. Hence, it is a scientific field of endeavor that
can benefit greatly from advances in fuzzy database modeling and application.
We would also argue that advances in the theoretical realm of fuzzy object-oriented databases could result from devoting attention to the needs of this problem
domain.
One of the major consequences of our use of the ECO-COSM modeling
framework has been our demonstration of the utility of using Probe objects and
Acknowledgments
Partial support in the form of operating research grants to each of the authors
from the Natural Sciences and Engineering Research Council (NSERC) of
Canada is gratefully acknowledged. We are especially grateful to Professor
Haluk Cetin and the Mid-America Remote Sensing Center (MARC) for
graciously providing the digital elevation and Kentucky GAP land-cover datasets.
Comments by anonymous reviewers improved the quality of this chapter.
References
Allen, A. W. (1987). Habitat suitability index models: Gray squirrel, revised
(United States Fish and Wildlife Service Biological Report 82(10.135)). Washington, D.C.: United States Department of the Interior.
Anderson, J. (2002). Providing a broad spectrum of agents in spatially explicit
simulation models: The Gensim approach. In H. R. Gimblett (Ed.), Integrating geographic information systems and agent-based modeling
Neteler, M., & Mitasova, H. (2002). Open source GIS: A GRASS GIS
approach. Boston, MA: Kluwer Academic Publishers.
Petry, F. E. (1996). Fuzzy databases, principles, and applications. Boston,
MA: Kluwer Academic Publishers.
Petry, F. E., Cobb, M. A., Ali, D., Angryk, R., Paprzycki, M., Rahimi, S., Wen,
L., & Yang, H. (2002). Fuzzy spatial relationships and mobile agent
technology in geospatial information systems. In P. Matsakis, & L. M.
Sztandera (Eds.), Applying soft computing in defining spatial relations
(pp. 121–155). Heidelberg: Physica-Verlag.
Petry, F. E., Cobb, M. A., Wen, L., & Yang, H. (2003). Design of system for
managing fuzzy relationships for integration of spatial data in querying.
Fuzzy Sets and Systems, 140, 51–73.
Rickel, B. W., Anderson, B., & Pope, R. (1998). Using fuzzy systems, object-oriented programming, and GIS to evaluate wildlife habitat. AI Applications, 12(1–3), 31–40.
Ricotta, C., & Avena, G. C. (1999). The influence of fuzzy set theory on the areal
extent of thematic map classes. International Journal of Remote Sensing, 20(1), 201–205.
Robinson, V. B. (1988). Some implications of fuzzy set theory applied to
geographic databases. Computers, Environment, and Urban Systems,
12(2), 89–97.
Robinson, V. B. (2000). On fuzzy sets and the management of uncertainty in an
intelligent geographic information system. In G. Bordogna, & G. Pasi
(Eds.), Recent issues on fuzzy databases (pp. 109–127). Berlin: Springer-Verlag.
Robinson, V. B. (2002). Using fuzzy spatial relations to control movement
behavior of mobile objects in spatially explicit ecological models. In P.
Matsakis, & L. M. Sztandera (Eds.), Applying soft computing in defining
spatial relations (pp. 158–178). Heidelberg: Physica-Verlag.
Robinson, V. B., & Frank, A. U. (1985). About different kinds of uncertainty in
collections of spatial data. In Proceedings of Seventh International
Symposium on Automated Cartography (Auto-Carto 7) (pp. 440–450).
Bethesda, MD: American Society for Photogrammetry and Remote Sensing and American Congress on Surveying and Mapping.
Robinson, V. B., & Graniero, P. A. (in press). Spatially explicit individual-based
ecological modeling with mobile fuzzy agents. In M. A. Cobb, F. E. Petry,
& V. B. Robinson (Eds.), Fuzzy modeling with spatial information for
geographic problems. Heidelberg: Springer.
Robinson, V. B., & Thongs, D. (1986). Fuzzy set theory applied to the mixed
pixel problem of multispectral landcover databases. In B. K. Opitz (Ed.),
Geographic information systems in government (pp. 871–885). Hampton, VA: A. Deepak Publishing.
Ruckelshaus, M., Hartway, C., & Kareiva, P. (1997). Assessing the data
requirements of spatially explicit dispersal models. Conservation Biology,
11(6), 1298–1306.
Russell, S., & Norvig, P. (1995). Artificial intelligence: A modern approach.
Upper Saddle River, NJ: Prentice Hall.
Schneider, M. (2000). Metric operations on fuzzy spatial objects in databases.
In Proceedings of the Eighth ACM International Symposium on Advances in Geographic Information Systems (pp. 21–26). New York:
ACM Press.
Schneider, M. (2001). Fuzzy topological predicates, their properties, and their
integration into query languages. In Proceedings of the Ninth ACM
International Symposium on Advances in Geographic Information
Systems (pp. 9–14). New York: ACM Press.
Westervelt, J. D. (2002). Geographic information systems and agent-based
modeling. In H. R. Gimblett (Ed.), Integrating geographic information
systems and agent-based modeling techniques for simulating social
and ecological processes (pp. 83–103). Oxford: Oxford University Press.
Westervelt, J. D., & Hopkins, L. D. (1999). Modeling mobile individuals in
dynamic landscapes. International Journal of Geographical Information Science, 13(3), 191–208.
Wolff, J. O. (1999). Behavioral model systems. In G. W. Barrett, & J. D. Peles
(Eds.), Landscape ecology of small mammals (pp. 11–26). New York:
Springer.
Yazici, A., & Akkaya, K. (2000). Conceptual modeling of geographic information system. In G. Bordogna, & G. Pasi (Eds.), Recent issues on fuzzy
databases (pp. 129–151). Berlin: Springer-Verlag.
Yuan, X., & Shen, Z. (2001). Notes on Fuzzy plane geometry I, II. Fuzzy Sets
and Systems, 121, 545–547.
Zhan, F. B. (1998). Approximate analysis of binary topological relations between
geographic regions with indeterminate boundaries. Soft Computing, 2, 28–34.
Zhang, J., & Stuart, N. (2001). Fuzzy methods for categorical mapping with
image-based land cover data. International Journal of Geographical
Information Science, 15(2), 175–195.
Zollner, P. A. (2000). Comparing the landscape level perceptual abilities of
forest sciurids in fragmented agricultural landscapes. Ecology, 80(3),
1019–1030.
Chapter X
Object-Oriented
Publish/Subscribe
for Modeling and
Processing Imperfect
Information
Haifeng Liu, University of Toronto, Canada
Hans-Arno Jacobsen, University of Toronto, Canada
Abstract
In the publish/subscribe paradigm, information providers disseminate
publications to all consumers who expressed interest by registering
subscriptions with the publish/subscribe system. This paradigm has found
widespread applications, ranging from selective information dissemination
to network management. In all existing publish/subscribe systems, neither
subscriptions nor publications can capture uncertainty inherent to the
information underlying the application domain. However, in many situations,
knowledge of either specific subscriptions or publications is not available.
To address this problem, this chapter proposes a new object-oriented
Introduction
A new data-processing paradigm, publish/subscribe, is becoming increasingly popular for information dissemination applications. Publish/subscribe systems anonymously interconnect information providers with information consumers in a distributed environment. Information providers publish information in the
form of publications, and information consumers register their interests in the
form of subscriptions. The publish/subscribe system performs the matching task
and ensures the timely delivery of published events (a.k.a. notifications) to all
interested subscribers. Publish/subscribe has been well studied, and many
systems have been developed supporting this paradigm. Existing research
prototypes include, among others, Gryphon (Aguilera, 1999), LeSubscribe
(Fabret, 2001), and ToPSS (Liu, 2002); industrial strength systems include
various implementations of JMS (Happner, 2002; Monson-Haefel, 2000), the
CORBA Notification Service (OMG, 2002), and TIB/RV. All of these systems
are based on a crisp data model, which means that neither subscribers nor
publishers can express imperfect information in subscriptions and publications,
respectively. In this crisp model, subscriptions are evaluated to be true or false
for a given publication. Moreover, most of these systems do not expose a well-structured subscription language model and publication data model.
However, in many situations, knowledge to specify subscriptions or publications
is not available. In these cases, uncertainty about the state of the world has to
be cast into the crisp data model that defines absolute limits. Moreover, for a user
of the publish/subscribe system, it may be simpler to describe the state of the
world with imperfect concepts, that is, in an approximate manner.
In a selective information dissemination context, for instance, users may want to
submit subscriptions about an apartment with a constraint on rent that is "cheap."
On the other hand, information providers may not have exact information for all
items published. In a secondhand market, a seller may not know the exact age
of a vase, so the seller can describe it as an "old" vase but cannot describe it with
an exact age. Temperature and humidity information collected by sensors is
often not precise but only correct within a certain error interval around the value
measured. It would be more appropriate to publish such imperfect information
rather than a wrong exact value if such publish/subscribe capabilities were
possible. Moreover, the underlying publish/subscribe system may need to store
the publications submitted for later processing (i.e., for subscriptions that are
submitted to the system after publication submission). For these reasons, it is an
advantage to provide a publish/subscribe data model and a matching scheme that
allow for the expression and processing of imperfect information for both
subscriptions and publications.
In a publish/subscribe system, we are concerned with two major types of
imperfect information as defined in Smets (1997): imprecision and uncertainty.
Imprecision is related to the content of the statement. Publications and
subscriptions are statements about events and users' interests. The expressions
may be incomplete, ambiguous, or not well-defined, but involve the content of the
statements. Thus, we refer to this type of imperfection in publications and
subscriptions as imprecision. Another type of imperfection exists in the matching
between publications and subscriptions, which we refer to as uncertainty.
Uncertainty concerns the state of knowledge about the relationship between the
world and the statement about the world. All publish/subscribe systems developed to date are based on the assumption that a match between a subscription
and a publication is either true or false. However, it is difficult to decide whether
a publication matches a subscription when the publication, the subscription, or
both involve imprecision. We call the imperfection inherent to the matching problem
uncertainty. To illustrate the difference between imprecision and uncertainty,
consider these two examples: (1) "Charles is a tall guy, and I am sure of it." (2)
"Charles is six feet tall, but I am not sure of it." The height of Charles is imprecise
in the former case, but it is certain. In the latter statement, the height is precise
but uncertain.
To support imperfect information in publish/subscribe, we extend current
subscription and publication languages to incorporate the expression of imprecision at the language level and develop a matching mechanism to support
processing of the extended language in publish/subscribe systems. To simplify
the terminology, we use approximate as a general term for all types of
imperfection involved. The extended subscriptions and publications supporting
imprecision will be called approximate subscriptions and publications. The
matching between approximate publications and approximate subscriptions is
called approximate matching. And the systems (or models) that support
approximate subscriptions/publications and implement approximate matching
are called approximate publish/subscribe systems (or models). Crisp is used
to refer to the traditional publish/subscribe systems.
Information Categorization
Publications and subscriptions are information in publish/subscribe systems.
There are three common approaches to grouping the information to help query
and search: channel-based, hierarchical, and type-based.
In the channel-based approach, information is grouped together under different
channels. A channel is a medium that carries information of related meaning. To
publish a message to a channel implies that this message will be broadcast to
all subscribers who have subscribed to this channel, and vice versa. Newsgroups
are an example of the use of a channel-based publish/subscribe system. CORBA
event service, CORBA notification service, and Java Message Service (JMS)
also use the channel-based data model.
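To make the channel-based approach concrete, the following minimal Python sketch shows a broker that broadcasts each message to every subscriber of the named channel. The class ChannelBroker and its methods are hypothetical names introduced for illustration; they do not correspond to the CORBA or JMS APIs.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ChannelBroker:
    """Hypothetical in-memory broker: subscribers are grouped by channel name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """Deliver the message to all subscribers of the channel; return the count."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = ChannelBroker()
inbox_a: List[str] = []
inbox_b: List[str] = []
broker.subscribe("apartments", inbox_a.append)
broker.subscribe("apartments", inbox_b.append)
broker.subscribe("vases", inbox_b.append)
delivered = broker.publish("apartments", "60 m2, cheap rent")
```

Note that the broker never inspects the message content; the channel name alone decides who receives it.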
[Figure: publish/subscribe architecture. Labels: Subscribers, Publishers, subscriptions, publications, Matching, Filtering, Matched subscriptions, Notification Engine, notifications.]
Expressiveness
Expressiveness refers to the ability of publishers and subscribers to express their
interests and events in the form of publications and subscriptions. A higher level
of expressiveness usually requires more computation power and a more advanced algorithm design.
The content-based data model provides more expressive power than other
models to filter publications and is more easily customized for individual
subscribers. The match between subscriptions and publications involves only the
content of the information without any other concerns. JMS lets subscribers
define message selectors, which are based on a subset of the SQL-92 conditional
expression syntax used in the WHERE clauses of SQL statements. CORBA
Notification Service takes a similar approach. From the perspective of information
transmission between subscribers and publishers through a broker, content-based
routing is also an interesting research topic aimed at improving the efficiency of
information delivery.
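As an illustration of content-based filtering in the crisp model, the sketch below evaluates a conjunction of attribute-operator-value predicates against a publication. It is not JMS selector syntax; the predicate triple format, the OPS table, and the sample attributes are assumptions made for this example.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Assumed predicate format: (attribute name, operator symbol, comparison value).
Predicate = Tuple[str, str, Any]

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(subscription: List[Predicate], publication: Dict[str, Any]) -> bool:
    """Crisp match: true only if every predicate holds for the publication."""
    for attribute, op, value in subscription:
        if attribute not in publication:
            return False
        if not OPS[op](publication[attribute], value):
            return False
    return True


subscription = [("symbol", "=", "IBM"), ("price", "<", 100)]
hit = matches(subscription, {"symbol": "IBM", "price": 95})
miss = matches(subscription, {"symbol": "IBM", "price": 120})
```

The result is strictly true or false, which is the limitation the approximate model in this chapter sets out to relax.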
Another publish/subscribe model that concerns event correlation uses a rule-based approach (Chakravarthy, 1994; Samani, 1997) with which subscriptions
and publications are expressed as a composition of events. An event is a
happening of interest. It is a state transition within the system, triggered
internally or externally. Brokers that can process composite events can make
publisher and subscriber processes easier to implement, because the event
correlation logic no longer needs to be handled programmatically.
Persistence
Persistence refers to the storage of data and states of publish/subscribe systems.
The ability of data and state persistence affects the behavior and efficiency of
systems. Most publish/subscribe systems are designed as memory-less messaging systems that do not save the contents or states of publications. The limitation
of a memory-less model can be overcome by an event history persistence model,
where all messages received by the broker are persisted, forming an event
history. It is common to use conventional relational databases as offline storage
systems. However, traditional databases are not designed to process data
streams (continuous sequences of messages entering the broker) efficiently. The
STREAM project (Bahu, 2001) led by Stanford University studies techniques
for special storage management and query processing for data streams.
A state-persistent publish/subscribe system stores the states of publications and
subscriptions. In such a system, a publication represents the state of some
objects of interest, and a subscription specifies a state that constitutes the
interests of the subscriber. The broker should only send notifications of a
publication to those subscribers whose subscriptions undergo state transitions in
the relationship with the publication. In other words, the broker component only
notifies subscribers of publications that enter the states specified by their
subscriptions. Hubert and Jacobsen (2003) proposed a subject space model for
state-persistent publish/subscribe systems. The objective of this data model is
the introduction of state persistence into publish/subscribe systems and the
symmetrical treatment of data and queries.
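The notify-on-transition behavior can be sketched as follows. This is a simplified illustration, not the subject space model itself: the class StatePersistentBroker, its dictionaries, and the temperature example are assumptions introduced here.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


class StatePersistentBroker:
    """Hypothetical broker that remembers, per subscription and object,
    whether the last published state satisfied the subscription."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Callable[[State], bool]] = {}
        self._last_match: Dict[Tuple[str, str], bool] = {}

    def subscribe(self, name: str, predicate: Callable[[State], bool]) -> None:
        self._subscriptions[name] = predicate

    def publish(self, object_id: str, state: State) -> List[str]:
        """Return the subscriptions whose specified state was just entered."""
        notified = []
        for name, predicate in self._subscriptions.items():
            now = predicate(state)
            before = self._last_match.get((name, object_id), False)
            if now and not before:
                notified.append(name)
            self._last_match[(name, object_id)] = now
        return notified


broker = StatePersistentBroker()
broker.subscribe("hot", lambda s: s["temp"] > 30)
history = [broker.publish("sensor1", {"temp": t}) for t in (25, 35, 36, 20, 31)]
```

In this run only the second and fifth publications produce a notification, because those are the only ones that move the sensor into the subscribed state.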
Type-Based Publish/Subscribe
The type-based publish/subscribe model was proposed as an alternative to
express publications, subscriptions, and their interactions. The type-based model
uses features of high-level, strongly typed programming languages, such as
strong typing, scoping, objects, classes, and inheritance to define matching
semantics between subscriptions and publications. In type-based publish/sub-
Application Domains
Publish/subscribe is a messaging paradigm and an information management
methodology. It is desirable that the technologies developed for publish/subscribe systems be generic and applicable in many application domains. Most
research studies on publish/subscribe systems use the stock-brokering application as an example and the motivation of various algorithm designs of publish/
subscribe systems. The stock-brokering application is a typical example, because the roles of publishers, subscribers, and brokers are well defined.
However, there are many other application domains with information management characteristics that satisfy the definition of the publish/subscribe paradigm.
Selective information dissemination is the class of distributed applications that
distributes information according to some restrictions or conditions. Conventional Internet search engines, such as Google, can be modeled as publish/
subscribe systems. The search engine indexes many Web pages, and users can
execute search queries on the indexed pages. A more general form of data
subscription is exemplified by the emerging peer-to-peer file sharing and
publishing systems, such as Napster, Gnutella, Mojo Nation, Free Haven
(Dingledine, 2000), and Freenet (Clarke, 2000). These systems are forms of
publish/subscribe systems, where the broker component is physically distributed.
They attempt to solve the problems of scalable distributed data storage and
retrieval. A geographic information system is an example where an application
can possess the roles of multiple logical components of a publish/subscribe
system. The location information of mobile users is used to provide users with
relevant information based on their positions. There are many other applications
to which the publish/subscribe paradigm is applicable, such as workflow management (Cugola, 2001), intraenterprise process automation, supply chain management, enterprise application integration (Barrett, 1996), and network monitoring.
[Figure: class diagram showing the classes Subscriber, Publisher, and Notifier, and listing the methods:
public int register_callback(subscription s, cb_info i)
public notification getNotification(subscription s)]
[Figure 3: Information class hierarchy]

Subscription
    int numOfPred
    predicate[] preds
    float threshold
    public Subscription()
    public addPred(predicate p)

Publication
    int numOfAttr
    attr_value[] av_pairs
    float threshold
    public Publication()
    public addAttrValue(attr_value av)

Notification
    Publication e
    Subscription s
    float matching_degree
    int notifyType
    public Notification()
    public sentNotification(subscription s)
    public getNotifyType()
allows the programmer to design entities that can poll for notification information
or can register callbacks for notifications. These notifier objects can be different
from the actual subscriber objects. In this design, the notifier objects are tied to
a specific subscription by passing it to the system through the method call. In our
design, subscriptions are represented by their subscription objects; an alternative
may be to identify subscriptions, publications, and notifications with identifiers
that are passed back upon successful submission of these objects. The ToPSS
system uses that approach.
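One way the three information classes might look in code is sketched below in Python. This is not the A-ToPSS implementation; it mirrors the attribute names of the class diagram where they are given, while the representation of predicates and attribute values as functions from a value to a degree in [0, 1] is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumed representation: a function from a domain value to a degree in [0, 1].
DegreeFn = Callable[[float], float]


@dataclass
class Subscription:
    preds: List[Tuple[str, DegreeFn]] = field(default_factory=list)
    threshold: float = 0.0

    def add_pred(self, attribute: str, membership: DegreeFn) -> None:
        self.preds.append((attribute, membership))

    @property
    def num_of_pred(self) -> int:
        return len(self.preds)


@dataclass
class Publication:
    av_pairs: Dict[str, DegreeFn] = field(default_factory=dict)
    threshold: float = 0.0

    def add_attr_value(self, attribute: str, distribution: DegreeFn) -> None:
        self.av_pairs[attribute] = distribution

    @property
    def num_of_attr(self) -> int:
        return len(self.av_pairs)


@dataclass
class Notification:
    e: Publication
    s: Subscription
    matching_degree: float
    notify_type: int = 0  # hypothetical default; the meaning of the codes is not specified


sub = Subscription(threshold=0.5)
sub.add_pred("size", lambda x: 1.0 if 50 <= x <= 70 else 0.0)
pub = Publication()
pub.add_attr_value("size", lambda x: 1.0 if x == 60 else 0.0)
note = Notification(e=pub, s=sub, matching_degree=1.0)
```

Holding the subscription object itself inside the notification follows the design described above; an identifier-based variant would store integer handles instead.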
The Information class hierarchy in Figure 3 foresees subscriptions, publications
and notifications. Subscriptions define user interests through Boolean combinations of predicates. The subscription type is determined by the predicate types.
We allow in our model the specification of crisp types, approximate types, and
For example, a student is looking for an apartment with constraints on price, size,
and age. Her subscription in natural language that specifies these constraints is:
S: (size is medium) AND (price is no more than 1500) AND (age is not very old)
The first predicate approximates the constraint using an uncertain notion
medium. A membership function is used to represent it:
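\[
\mu_{medium}(x) =
\begin{cases}
0 & \text{if } x \le 40 \\
\dfrac{x - 40}{10} & \text{if } 40 < x < 50 \\
1 & \text{if } 50 \le x \le 70 \\
1 - \dfrac{x - 70}{10} & \text{if } 70 < x < 80 \\
0 & \text{if } x \ge 80
\end{cases}
\]
\[
\mu_{old}(x) =
\begin{cases}
0 & \text{if } x \le 40 \\
\dfrac{x - 40}{40} & \text{if } 40 < x < 80 \\
1 & \text{if } x \ge 80
\end{cases}
\]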
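For readers who prefer code, the membership functions of this example can be written as small Python functions. The breakpoints (40, 50, 70, 80 for medium; 40 and 80 for old) are taken from the example; the treatment of "not very old" as the complement of the squared old membership is an assumption based on the common interpretation of "very" as concentration, and the function names are hypothetical.

```python
def mu_medium(x: float) -> float:
    """Trapezoidal membership for 'medium' size: rises 40-50, flat 50-70, falls 70-80."""
    if x <= 40 or x >= 80:
        return 0.0
    if x < 50:
        return (x - 40) / 10
    if x <= 70:
        return 1.0
    return 1 - (x - 70) / 10


def mu_old(x: float) -> float:
    """Ramp membership for 'old': 0 up to 40, linear between 40 and 80, 1 from 80."""
    if x <= 40:
        return 0.0
    if x >= 80:
        return 1.0
    return (x - 40) / 40


def mu_not_very_old(x: float) -> float:
    """Assumed modifiers: 'very' squares the membership, 'not' takes the complement."""
    return 1 - mu_old(x) ** 2


samples = {x: (mu_medium(x), mu_old(x), mu_not_very_old(x)) for x in (45, 60, 75)}
```

For instance, an age of 60 is old to degree 0.5 and therefore not very old to degree 0.75 under these assumptions.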
$e = \{(a_1, \pi_1), (a_2, \pi_2), \ldots, (a_n, \pi_n)\}$
For example, an apartment advertised for rent may be described with a condition
of 60 m² size and cheap rent. The first attribute is crisp; it defines a value for
the attribute size. The second attribute is approximate. It is qualified as cheap, which
is represented by a possibility distribution function $\pi_{cheap}$. $\pi_{cheap}$ defines the
possibility of each value in the domain of discourse (i.e., all admissible rent
values) as being cheap. The graphical representation of this event is shown in
Figure 5. Formally, this publication can be represented by a set of attribute
function pairs as follows:
$P = \{(size, \pi_{60}), (rent, \pi_{cheap})\}$
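where
\[
\pi_{60}(x) =
\begin{cases}
1 & \text{if } x = 60 \\
0 & \text{if } (x < 60) \lor (x > 60)
\end{cases}
\]
and
\[
\pi_{cheap}(x) =
\begin{cases}
1 & \text{if } x \le 1200 \\
1 - \dfrac{x - 1200}{300} & \text{if } 1200 < x < 1500 \\
0 & \text{if } x \ge 1500
\end{cases}
\]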
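The apartment publication can be expressed in code as a mapping from attribute names to possibility distributions. The sketch below is illustrative only; the helper crisp and the dictionary layout are assumptions, while the breakpoints 1200 and 1500 come from the example.

```python
from typing import Callable, Dict

Distribution = Callable[[float], float]


def crisp(value: float) -> Distribution:
    """Possibility distribution of a crisp attribute: 1 at the value, 0 elsewhere."""
    return lambda x: 1.0 if x == value else 0.0


def pi_cheap(x: float) -> float:
    """Possibility that rent x is 'cheap': 1 up to 1200, linear down to 0 at 1500."""
    if x <= 1200:
        return 1.0
    if x >= 1500:
        return 0.0
    return 1 - (x - 1200) / 300


# P = {(size, pi_60), (rent, pi_cheap)}
publication: Dict[str, Distribution] = {"size": crisp(60), "rent": pi_cheap}
possibility_of_1350 = publication["rent"](1350)
```

Treating the crisp value as a degenerate distribution lets crisp and approximate attributes share one representation.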
Matching in Publish/Subscribe
In the general approximate model, the subscription, the publication, or both may
refer to imperfect concepts. A truth value of true or false is no longer sufficient
for representing the state of a match between a publication and a subscription.
We need a value between 0 and 1 to represent the degree of the match between
a subscription and each publication processed by the system. An individual subscription can match a given publication, more or less, depending on this degree of
match.
Recall that subscriptions and publications are represented as follows:
$s = R((a_1, \mu_1), (a_2, \mu_2), \ldots, (a_m, \mu_m))$
$e = \{(a_1, \pi_1), (a_2, \pi_2), \ldots, (a_n, \pi_n)\}$
The semantics of matching subscriptions with publications is to measure the
possibility and necessity (Dubois, 1988) with which the publication satisfies the
expectation expressed by a subscription. Based on possibility theory, we use a
pair $(\Pi_i, N_i)$ to denote the evaluation of the possibility and necessity of how the
publication satisfies each predicate $\mu_i$ (i.e., the match between $\pi_i$ and $\mu_i$ in a
subscription). This measure is done by computing the intersection between $\pi_i$ and
$\mu_i$. In the following, we will discuss the match on the basis of a single predicate, then
introduce the matching problem for the whole subscription. The possibility and
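necessity of a match between two functions $\pi_i$ and $\mu_i$ are computed by
\[
\Pi_i = \sup_x \min(\mu_i(x), \pi_i(x)), \qquad N_i = \inf_x \max(\mu_i(x), 1 - \pi_i(x)).
\]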
\[
S_i(x_1, x_2, \ldots, x_n) = R(\mu_{i1}(x_1), \mu_{i2}(x_2), \ldots, \mu_{in}(x_n))
\]
\[
e(x_1, x_2, \ldots, x_n) = \{\pi_1(x_1), \pi_2(x_2), \ldots, \pi_n(x_n)\}
\]
\[
\Pi_{eS_i}(x_1, x_2, \ldots, x_n) = t\Big(\sup_{x_1} \min(\mu_{i1}(x_1), \pi_1(x_1)), \ldots, \sup_{x_n} \min(\mu_{in}(x_n), \pi_n(x_n))\Big)
\]
\[
N_{eS_i}(x_1, x_2, \ldots, x_n) = t\Big(\inf_{x_1} \max(\mu_{i1}(x_1), 1 - \pi_1(x_1)), \ldots, \inf_{x_n} \max(\mu_{in}(x_n), 1 - \pi_n(x_n))\Big).
\]
necessity of how their interests are matched. Users' constraints are matched if
both the possibility and necessity degrees are larger than the thresholds $\Pi$ and
$N$. The general representation of a subscription is modified to:
$sub = R((a_1, \mu_1, \Pi_1, N_1), \ldots, (a_m, \mu_m, \Pi_m, N_m))$
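A small Python sketch may help to show how possibility and necessity degrees and their thresholds could be evaluated in practice. It is a simplified illustration, not the A-ToPSS matching algorithm: the supremum and infimum are approximated over a finite list of sample points, the standard possibility-theory forms sup-min and inf-max with the complement of the publication's distribution are used, the predicates are combined with a plain AND, the minimum is assumed as the aggregation function, and all names and threshold values are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Fn = Callable[[float], float]


def possibility(mu: Fn, pi: Fn, domain: Sequence[float]) -> float:
    """sup over the sampled domain of min(mu(x), pi(x))."""
    return max(min(mu(x), pi(x)) for x in domain)


def necessity(mu: Fn, pi: Fn, domain: Sequence[float]) -> float:
    """inf over the sampled domain of max(mu(x), 1 - pi(x))."""
    return min(max(mu(x), 1.0 - pi(x)) for x in domain)


# Assumed predicate layout: (attribute, membership, possibility threshold, necessity threshold).
Pred = Tuple[str, Fn, float, float]


def match(preds: List[Pred], publication: Dict[str, Fn],
          domains: Dict[str, Sequence[float]]) -> Tuple[bool, float, float]:
    """Every predicate must exceed both thresholds; overall degrees use min (assumed)."""
    degrees = []
    ok = True
    for attr, mu, poss_min, nec_min in preds:
        p = possibility(mu, publication[attr], domains[attr])
        n = necessity(mu, publication[attr], domains[attr])
        ok = ok and p > poss_min and n > nec_min
        degrees.append((p, n))
    return ok, min(p for p, _ in degrees), min(n for _, n in degrees)


def cheap(x: float) -> float:
    return 1.0 if x <= 1200 else 0.0 if x >= 1500 else 1 - (x - 1200) / 300


def medium(x: float) -> float:
    if x <= 40 or x >= 80:
        return 0.0
    return (x - 40) / 10 if x < 50 else 1.0 if x <= 70 else 1 - (x - 70) / 10


publication = {"size": lambda x: 1.0 if x == 60 else 0.0,
               "rent": lambda x: 1.0 if x == 1350 else 0.0}
domains = {"size": range(0, 201, 5), "rent": range(0, 3001, 50)}
subscription = [("size", medium, 0.8, 0.8), ("rent", cheap, 0.4, 0.4)]
result = match(subscription, publication, domains)
```

With a crisp rent of 1350, the rent predicate is satisfied with possibility and necessity 0.5, so the subscription matches at these thresholds but would fail if the rent thresholds were raised to 0.6.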
System Architecture
The main challenge in applying publish/subscribe systems to real-world
applications lies in designing efficient matching algorithms that scale. At
Internet scale, such a system has to process millions of subscriptions
and react to thousands of publications. A-ToPSS is implemented with this
in mind. Figure 8 shows the architecture of A-ToPSS. Publishers and
subscribers send requests through a Web server (e.g., Apache) to the system.
The requests include registering personal information, subscribing to interests,
and publishing data. Subscriptions and publications are processed by
a matching engine. At the same time, all of the users' information passes through
a script engine (e.g., PHP, JavaServer Pages (JSP), or Meta-HTML)
and is stored in a database. The matching engine matches publications against
subscriptions and returns the matched subscriptions to a notification engine. The
pervasive notification engine sends different types of notifications (e.g., e-mail,
ICQ, TCP/UDP) to the subscribers according to their requests.
Web Interface
A-ToPSS provides a Web interface for users to interact with the system. The
interactive user interface is implemented in the Meta-HTML Web programming
language. Meta-HTML is a powerful, extensible server-side programming
language specifically designed for the World Wide Web. It resembles
a hybrid of HTML and Lisp and has a large existing function library,
including support for sockets, image creation, Perl, gnuplot, etc. It is
extensible in both Meta-HTML and other languages (e.g., C).
A-ToPSS offers four classes of normal operations: registration, subscribing,
publishing, and notification. The first time a user visits the Web interface,
registration is required to access the information resource. A user needs to
create an ID and set a password. Personal information such as name and address
is optional. However, the contact information relevant to the notification must be
provided in order to receive notifications successfully. For example, a user
who wants to receive notifications via e-mail must provide an e-mail
address. These are administrative operations, which are common to most Web
applications. Next, we will describe features specific to publish/subscribe
systems.
For simplicity, we will explain the operations for subscribing as an illustration.
Operations for publishing are similar, and we will not elaborate here. There are
two types of users in the system: administrators and regular users. Only
administrators have the privilege of creating new subscription types, editing the
existing ones, or deleting them. Subscription types are templates for subscriptions. These templates specify the number of predicates and whether an attribute
accepts crisp or approximate values. Before the modification or deletion of a
subscription type, the system will check whether any subscription is defined
under this type. Subscription types can only be edited when no subscriptions are
defined under them.
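The template rule described above can be illustrated with a small Python sketch. All class and method names are hypothetical and are not taken from A-ToPSS; the sketch also assumes that deletion is refused under the same condition as editing, which the text implies but does not state outright.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SubscriptionType:
    """Template for subscriptions (field names are hypothetical)."""
    name: str
    # attribute name -> True if approximate values are accepted, False if crisp only
    attributes: Dict[str, bool]


class TypeRegistry:
    def __init__(self) -> None:
        self._types: Dict[str, SubscriptionType] = {}
        self._in_use: Dict[str, int] = {}  # type name -> subscriptions defined under it

    def create_type(self, sub_type: SubscriptionType, is_admin: bool) -> None:
        self._require_admin(is_admin)
        if sub_type.name in self._types:
            raise ValueError("subscription type already exists")
        self._types[sub_type.name] = sub_type
        self._in_use[sub_type.name] = 0

    def edit_type(self, sub_type: SubscriptionType, is_admin: bool) -> None:
        self._require_admin(is_admin)
        self._require_unused(sub_type.name)
        self._types[sub_type.name] = sub_type

    def delete_type(self, name: str, is_admin: bool) -> None:
        self._require_admin(is_admin)
        self._require_unused(name)  # assumption: same check as for editing
        del self._types[name]
        del self._in_use[name]

    def add_subscription(self, type_name: str) -> None:
        self._in_use[type_name] += 1

    def remove_subscription(self, type_name: str) -> None:
        self._in_use[type_name] -= 1

    def get_type(self, name: str) -> SubscriptionType:
        return self._types[name]

    @staticmethod
    def _require_admin(is_admin: bool) -> None:
        if not is_admin:
            raise PermissionError("only administrators may manage subscription types")

    def _require_unused(self, name: str) -> None:
        if self._in_use[name] > 0:
            raise ValueError("subscriptions are still defined under this type")
```

Counting the subscriptions per type keeps the check constant-time; a real system would more likely ask the database whether any subscription row references the type.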
The user-level operations on subscriptions are designed for typical users.
Subscribers can add new subscriptions, edit them, or delete the subscriptions
they previously defined. When adding a new subscription, the user first chooses
a type, and the system then asks for the information required by that
subscription type. For crisp subscriptions, users need to provide attribute
names, operators (e.g., >, <, =, ≤, and ≥), and values (e.g., integers, floats,
strings). For approximate subscriptions, more input is needed. In addition to
attribute names, users need to provide the number of approximate constraints
for each attribute. For example, the price attribute may have three approximate
constraints: "expensive," "reasonable," and "cheap." For the representation of
each constraint, the Web interface provides a trapezoidal membership function
whose default values reflect common sense. The Web interface also lets users
adapt the membership function to their own specifications: a user chooses
among a family of functions to represent the imperfect information and sets the
parameters. Figure 9 shows a screen shot of the subscription entry panel of our
system, where a user can view and adapt the membership function representing
his or her predicate.
Figure 9. Power user's interface for defining approximate subscriptions
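A trapezoidal membership function of the kind offered by the Web interface can be written as follows. This is a generic sketch: the four-parameter form (a, b, c, d) is the usual textbook parameterization, and the breakpoints chosen for "cheap," "reasonable," and "expensive" are hypothetical values, not the defaults used by A-ToPSS.

```python
from typing import Callable


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Return a membership function that is 0 outside [a, d], 1 on [b, c],
    and linear on the two ramps [a, b] and [c, d]."""
    if not a <= b <= c <= d:
        raise ValueError("parameters must satisfy a <= b <= c <= d")

    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:  # rising ramp, only reachable when a < b
            return (x - a) / (b - a)
        return (d - x) / (d - c)  # falling ramp, only reachable when c < d

    return mu


# Three hypothetical approximate constraints on a "price" attribute.
cheap = trapezoid(0, 0, 100, 200)
reasonable = trapezoid(100, 200, 400, 500)
expensive = trapezoid(400, 500, 10_000, 10_000)
```

Setting a = b or c = d turns a ramp into a vertical edge, so the same function also covers the one-sided shapes needed for the first and last label.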
After users submit subscriptions and publications, their information will be stored
in a database and transmitted to the matching engine at the same time to be
processed. After the matching, matched subscriptions are sent back to the Web
interface and stored in the database. For the moment, A-ToPSS supports
notification only by a pull model. When a user clicks the notification button, the
results of matched publications for subscriptions will be displayed on the Web.
The user can browse the information through a link to the publication that
matches his or her subscription. If any subscription or publication is deleted, the
match related to it will be broken and will not be sent back to the user.
Evaluation
The performance is evaluated with respect to time and memory to confirm the
efficiency of the algorithms and to compare a crisp publish/subscribe model
with an approximate one. Experiments are run under various subscription and
publication workloads.
Performance Evaluation
To evaluate performance, the following metrics are considered: subscription
loading time, overall system throughput, and memory use. Time
measurements are taken in milliseconds and memory measurements in KB.
In Figure 12, we can see that there is a trade-off between loading time and
matching time. Spending more time at load time to organize subscriptions well
reduces the time needed to match them against incoming events. In real-world
applications, most subscriptions are static (i.e., they are stored in the system for
a long time), and therefore, the matching time is more important than the loading
time. Moreover, because the publication rate is usually high, it is more important
to have a fast matching algorithm that responds in a very short time. In the
memory comparison, the char-wise algorithm uses less memory than the
float-wise algorithm because it stores 1-byte chars instead of 4-byte floats.
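The memory argument can be made concrete with Python's standard `array` module. The sketch below is only an illustration of the storage trade-off: how the char-wise algorithm actually encodes its values is not described here, so the quantization of membership degrees to 256 levels is our assumption.

```python
from array import array
from typing import Iterable, List


def quantize(degrees: Iterable[float]) -> array:
    """Store degrees in [0, 1] as unsigned 1-byte codes in 0..255 (assumed encoding)."""
    return array("B", (round(d * 255) for d in degrees))


def dequantize(codes: array) -> List[float]:
    """Recover approximate degrees from the 1-byte codes."""
    return [c / 255 for c in codes]


degrees = [i / 1000 for i in range(1001)]  # sample membership degrees

as_floats = array("f", degrees)            # C floats, normally 4 bytes each
as_chars = quantize(degrees)               # unsigned chars, 1 byte each

float_bytes = len(as_floats) * as_floats.itemsize
char_bytes = len(as_chars) * as_chars.itemsize

# Largest error introduced by the 1-byte encoding.
max_error = max(abs(d - q) for d, q in zip(degrees, dequantize(as_chars)))
```

Under this encoding the saving comes at the price of precision: a degree can be off by up to half a quantization step (0.5/255, roughly 0.002), which matters only when a threshold falls within that distance of the computed degree.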
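Table 1. Numbers of matched subscriptions for a fixed publication (rows: α-cut; columns: subscription types)
α = 0      4628   4628   4438   3763
α = 0.5     184    804    184     47
α = 1         7    281     39      7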
are compared in two scenarios. In one scenario, the type of publication is fixed,
and we vary the types of subscriptions and thresholds to compare crisp matching
and approximate matching. The other scenario is the opposite of the first.
Table 1 shows the numbers of matched subscriptions when a fixed
publication is published to the system and matched against various types of
subscriptions with different α-cuts. (α is used as the threshold for both
possibilities and necessities.) For each subscription type, the number of
matches decreased as the α-cut value increased, which shows the threshold
effect of α. With the same α, the pessimistic case resulted in the largest
number of matches, and the optimistic case resulted in the fewest matches. The
approximate case and the middle case had almost the same results, because the
less restrictive the subscription, the higher the probability of being matched.
Table 2 shows the numbers of matched subscriptions for different types of
publications when the subscription type is fixed. The graphical explanation is
shown in Figure 13. When α = 0, the approximate publication returned the largest
number of matches, and the point type returned the smallest number of matches.
This happened because the value of the approximate publication has a wider
domain, and thus, there is a higher possibility that subscriptions' constraints are
matched. However, with higher values of α, the results reversed: the approximate
publication matched a very small number of subscriptions, while the point
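Table 2. Numbers of matched subscriptions for a fixed subscription type (rows: α-cut; columns: publication types)
α = 0      4628   3720   2960
α = 0.5     184    474   1932
α = 1         7    170    868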
Related Work
Industry Standards
There have been a number of standardization efforts on middleware architectures and distributed system interfaces to promote interoperability. The Common
Object Request Broker Architecture (CORBA) is a middleware architecture
standardized by the Object Management Group (OMG). The CORBA Event
Service (OMG, 2001) and Notification Services specifications (OMG, 2002)
augment the CORBA middleware platform with event-based messaging
capabilities. The Java Message Service (JMS) is the standard Java API for
message-oriented middleware proposed by Sun Microsystems to add messaging
integration capabilities to the J2EE platform.
The CORBA Event Service specification defines an indirect channel-based
event transport for distributed object frameworks. An event channel decouples
event suppliers and consumers. Suppliers generate events and place them onto
a channel. Consumers obtain events from the channel. Two serious limitations
of the Event Service Specification are that it only supports limited event-filtering
capabilities, and it cannot be configured to support different qualities of service.
Most Event Service implementations deliver all events that are sent to a
particular channel to all consumers connected to that channel on a best-effort
basis.
A primary goal of the Notification Service is to enhance the Event Service by
introducing the concepts of event filtering and quality of service specifications.
Clients of the Notification Service can subscribe to events by associating filter
objects with the proxies through which the clients communicate with event
channels. These filter objects encapsulate specific constraints on the events to
be delivered to the client. Furthermore, the Notification Service enables each
channel, each connection, and each message to be configured to support the
desired quality of service with respect to delivery guarantees, event aging
characteristics, and event priorities.
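The difference between the two services can be sketched with a toy channel in Python. This is not the CORBA API: `EventChannel`, `connect`, and `push` are hypothetical names, events are modeled as plain dictionaries, and quality-of-service settings are left out. A consumer connected without a filter sees every event, as in the Event Service; a consumer connected with a filter object sees only the events that satisfy its constraint, as in the Notification Service.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]
Consumer = Callable[[Event], None]
Filter = Callable[[Event], bool]


class EventChannel:
    """Decouples suppliers from consumers (illustrative, not a CORBA binding)."""

    def __init__(self) -> None:
        self._consumers: List[Tuple[Consumer, Optional[Filter]]] = []

    def connect(self, consumer: Consumer, event_filter: Optional[Filter] = None) -> None:
        # The filter plays the role of a filter object attached to the consumer's proxy.
        self._consumers.append((consumer, event_filter))

    def push(self, event: Event) -> None:
        # Suppliers only know the channel, never the individual consumers.
        for consumer, event_filter in self._consumers:
            if event_filter is None or event_filter(event):
                consumer(event)


everything: List[Event] = []
bargains: List[Event] = []

channel = EventChannel()
channel.connect(everything.append)                                   # unfiltered
channel.connect(bargains.append, lambda e: e.get("price", 0) < 100)  # filtered

channel.push({"item": "book", "price": 20})
channel.push({"item": "laptop", "price": 900})
```

After the two pushes, `everything` holds both events and `bargains` holds only the book.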
The JMS is an API for enterprise messaging created by Sun Microsystems. JMS
is not a messaging system. It is an abstraction of the interfaces and classes
needed by messaging clients when communicating with messaging systems.
JMS provides publish/subscribe and point-to-point messaging models. Under the
JMS publish/subscribe model, publishers can send messages to many consumers
through a virtual channel called a topic. All messages addressed to a topic are
delivered to all the topic's subscribers. The message delivery is push-based, and
no polling is required. The point-to-point messaging model uses queues to store
and forward messages from suppliers to consumers. A given queue may have
multiple receivers, but only one receiver may consume each message. It is a one-to-one communication model.
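The delivery semantics of the two models can be contrasted in a short Python sketch. It deliberately does not use the JMS API; `Topic` and `MessageQueue` are hypothetical in-memory stand-ins that show only who receives a message in each model.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class Topic:
    """Publish/subscribe: every subscriber receives every message (push-based)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, listener: Callable[[str], None]) -> None:
        self._subscribers.append(listener)

    def publish(self, message: str) -> None:
        for listener in self._subscribers:
            listener(message)


class MessageQueue:
    """Point-to-point: a message is stored until exactly one receiver takes it."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # Any receiver may call this, but each message is handed out only once.
        return self._messages.popleft() if self._messages else None


inbox_a: List[str] = []
inbox_b: List[str] = []
topic = Topic()
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)
topic.publish("price update")   # both inboxes receive the message

orders = MessageQueue()
orders.send("order-1")
first = orders.receive()        # one receiver consumes "order-1"
second = orders.receive()       # nothing is left for a second receiver
```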
Continuous Queries
Continuous queries are issued once and are logically run continuously over a
database. Sometimes they are referred to as queries for future data, because
data included in the result set may not exist at the time when the query was
created, but will be created in the future. Traditional one-time queries, in
contrast, run only once to completion and return a result based on the current data
high event-processing rates. The language and data models are based on an
LDAP-like semistructured data model for expressing subscriptions and publications. In this system, a subscription is a conjunction of predicates, each of which
is a triplet (attribute, operator, value). Supported relational operators include <,
≤, ≠, ≥, >. This system supports both push- and pull-based information dissemination. The matching engine of LeSubscribe falls within the class of two-step
matching algorithms: a predicate matching step and a subscription evaluation
step. In the first step, all predicates are matched against the publication. In the
second step, subscriptions are evaluated based on the set of matched predicates.
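The two steps can be sketched as follows. This is a naive illustration of the idea rather than the LeSubscribe engine: the class name, the ASCII operator spellings, and the linear scan over predicates are our assumptions, and a production matcher would index the predicates instead of testing each one in turn.

```python
import operator
from typing import Any, Dict, Iterable, List, Set, Tuple

Predicate = Tuple[str, str, Any]  # (attribute, operator, value)

OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}


class TwoStepMatcher:
    def __init__(self) -> None:
        self._predicate_ids: Dict[Predicate, int] = {}  # each distinct predicate stored once
        self._subscriptions: Dict[str, Set[int]] = {}   # subscription -> its predicate ids

    def subscribe(self, sub_id: str, predicates: Iterable[Predicate]) -> None:
        ids = set()
        for predicate in predicates:
            ids.add(self._predicate_ids.setdefault(predicate, len(self._predicate_ids)))
        self._subscriptions[sub_id] = ids

    def match(self, publication: Dict[str, Any]) -> List[str]:
        # Step 1: evaluate every distinct predicate once against the publication.
        satisfied: Set[int] = set()
        for (attribute, op, value), pid in self._predicate_ids.items():
            if attribute in publication and OPS[op](publication[attribute], value):
                satisfied.add(pid)
        # Step 2: a subscription (a conjunction) matches if all its predicates are satisfied.
        return sorted(sub_id for sub_id, ids in self._subscriptions.items()
                      if ids <= satisfied)

    def predicate_count(self) -> int:
        return len(self._predicate_ids)


matcher = TwoStepMatcher()
matcher.subscribe("s1", [("price", "<", 100), ("category", "=", "book")])
matcher.subscribe("s2", [("price", "<", 100)])
matcher.subscribe("s3", [("price", ">=", 500)])
```

Because `s1` and `s2` share the predicate `price < 100`, it is stored and evaluated only once per publication, which is the main benefit of separating the two steps.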
Instead of two-step matching algorithms, Gryphon uses a tree-based data
structure to index subscriptions, which leads to another category of matching
algorithms. In Gryphon, all subscriptions are preprocessed into a tree where each
non-leaf node is a test for one attribute, and the edges derived from that node
represent different results. During matching, the incoming publication goes down
through the branch it matches until it arrives at the leaf nodes containing the
matched subscriptions.
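A tree of this kind can be sketched in Python as follows. The sketch is a simplification and not the Gryphon implementation: it handles equality tests only, it fixes one global attribute order, and it adds a "don't care" edge (the `ANY` sentinel) for subscriptions that place no constraint on an attribute. All names are hypothetical.

```python
from typing import Any, Dict, List, Sequence

ANY = object()  # edge followed when a subscription does not constrain the attribute


class Node:
    def __init__(self) -> None:
        self.children: Dict[Any, "Node"] = {}  # test result -> child node
        self.subscriptions: List[str] = []     # filled only at leaf nodes


class MatchingTree:
    def __init__(self, attribute_order: Sequence[str]) -> None:
        self.order = list(attribute_order)     # level k of the tree tests order[k]
        self.root = Node()

    def insert(self, sub_id: str, constraints: Dict[str, Any]) -> None:
        node = self.root
        for attribute in self.order:
            key = constraints.get(attribute, ANY)
            node = node.children.setdefault(key, Node())
        node.subscriptions.append(sub_id)

    def match(self, publication: Dict[str, Any]) -> List[str]:
        frontier = [self.root]
        for attribute in self.order:
            next_frontier = []
            for node in frontier:
                # Follow the edge labeled with the publication's value, if present ...
                if attribute in publication and publication[attribute] in node.children:
                    next_frontier.append(node.children[publication[attribute]])
                # ... and always follow the "don't care" edge.
                if ANY in node.children:
                    next_frontier.append(node.children[ANY])
            frontier = next_frontier
        return sorted(s for leaf in frontier for s in leaf.subscriptions)


tree = MatchingTree(["category", "city"])
tree.insert("s1", {"category": "book", "city": "Toronto"})
tree.insert("s2", {"category": "book"})
tree.insert("s3", {"city": "Ottawa"})
```

Matching a publication visits only the branches consistent with its attribute values, so subscriptions that disagree on an early attribute are never examined.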
Another approach using a tree-based algorithm is binary decision diagrams
(BDDs) (Compailla, 2001). In this model, each subscription is a Boolean function
represented by a BDD. This approach is distinguished in two respects: first,
it can support any Boolean formula; second, overlapping subscription
expressions are evaluated only once if the variable ordering is chosen properly.
Elvin (Segall, 1997) is a content-based notification/messaging service that
targets application integration environments and monitoring of distributed
systems. Elvin supports a more expressive subscription language in which
subscriptions are written as strings. Subscriptions can contain powerful
string-processing functions and operators on built-in data types covering
integer, string, and Boolean relations. In addition to the traditional
comparison operators such as <, ≤, =, ≠, >, and ≥, Elvin supports
operations such as matching extended regular expressions against strings.
SIENA (Scalable Internet Event Notification Architectures) (Carzaniga, 1998)
comprises another example of a publish/subscribe event-notification service that
presents a similar publication and subscription language model. This research
project is based on a content-based networking service and focuses on the
routing of subscriptions and publications in a distributed environment so that both
services, notification selection (i.e., determining which publication matches
which subscription) and notification delivery (i.e., distributing matching
notifications from publishers to subscribers), are balanced. The advantage of this
infrastructure is that it maximizes expressiveness in the selection mechanism
without sacrificing scalability in the delivery mechanism.
The last research project we introduce here is READY (Gruber, 2000), led by
the AT&T research lab. READY is an implementation of the CORBA Notification Service. The specific features of READY, which are not offered by
existing commercial products, include information consumer specifications that
can be matched over single and compound event patterns, and quality of service
(QoS) that is managed by providing ordering properties for event delivery.
Summary
In this chapter, we presented the publish/subscribe paradigm and introduced a
model that allows expression of imperfect information in both subscriptions and
publications. Fuzzy set theory and possibility theory are used to represent notions
of imprecision in predicates and publications. The most important property of this
approximate publish/subscribe model is that the language model is flexible and
powerful in that it allows subscriptions and publications to be either crisp or
approximate. Furthermore, the possibility and necessity measures used to
calculate the degree of match are expressive. The two measures can be used to
model users with different preferences, such as optimistic and pessimistic.
References
Aguilera, M. K., Strom, R. E., Sturman, D. C., Astley, M., & Chandra, T. D.
(1999). Matching events in a content-based subscription system. Presented
at the Symposium on Principles of Distributed Computing.
Babu, S., & Widom, J. (2001). Continuous queries over data streams. ACM
Special Interest Group on Management of Data (SIGMOD) Record,
2001(3), 109–120.
Banavar, G., Chandra, T. D., Mukherjee, B., Nagarajarao, J., Strom, R. E., &
Sturman, D. C. (1999). An efficient multicast protocol for content-based
publish/subscribe systems. Presented at the International Conference on
Distributed Computing Systems.
Barrett, D. J., Clarke, L. A., Tarr, P. L., & Wise, A. E. (1996). A framework
for event-based software integration. ACM Transactions on Software
Engineering and Methodology, 5(4), 378–421.
Burcea, I., & Jacobsen, H. A. (2003). L-ToPSS: Push-oriented location-based services. Presented at the Fourth VLDB Workshop on Technologies for E-Services (TES'03). Humboldt University, Berlin, Germany.
Burcea, I., Jacobsen, H. A., DeLara, E., Muthusamy, V., & Petrovic, M. (2004).
Disconnected operations in publish/subscribe. In 2004 IEEE International Conference on Mobile Data Management (MDM).
Carzaniga, A., Rosenblum, D. S., & Wolf, A. L. (1998). Design of a scalable
event notification service: Interface and architecture. Technical Report
CU-US-863-98, Department of Computer Science, University of Colorado.
Chakravarthy, S., & Mishra, D. (1994). Snoop: An expressive event specification language for active databases. Data and Knowledge Engineering,
14(1):1-26, Nov.
Chen, J., Dewitt, D. J., Tian, F., & Wang, Y. (2000). NiagaraCQ: A scalable
continuous query system for internet databases. In Proceedings of the
2000 ACM Special Interest Group on Management of Data (SIGMOD)
International Conference on Knowledge Discovery and Data Mining
(pp. 917).
Clarke, I., Sandberg, O., Wiley, B., & Hong, T. W. (2000). Freenet: A distributed
anonymous information storage and retrieval system. In Proceedings of
ICSI Workshop on Design Issues in Anonymity and Unobservability,
International Computer Science Institute.
Compailla, A., Chaki, S., Jha, S., & Veith, H. (2001). Efficient filtering in publish/
subscribe system using binary decision diagrams. In the Proceedings of
the 23rd International Conference on Software Engineering (ICSE).
Cugola, G., Nitto, E. D., & Fuggetta, A. (2001). The JEDI event-based
infrastructure and its application to the development of the OPSS WFMS.
IEEE Transactions on Software Engineering, 27(9), 827–850.
Dingledine, R., Freedman, M. J., & Molnar, D. (2000). The Free Haven project:
Distributed anonymous storage service. In Proceedings of Workshop on
Design Issues in Anonymity and Unobservability.
Dubois, D., & Prade, H. (1988). Possibility theory: An approach to computerized processing of uncertainty. New York: Plenum Press.
Eugster, P. Th., Guerraoui, R., & Sventek, J. (2000). Distributed asynchronous
collections: Abstractions for publish/subscribe interaction. In 14th
AITO European Conference on Object Oriented Programming (ECOOP
2000), pp. 252–276.
Fabret, F., Jacobsen, H. A., Lirbat, F., Pereira, J., Ross, K. A., & Shasha, D.
(2001). Filtering algorithm and implementation for fast publish/subscribe
systems. Presented at the ACM Special Interest Group on Management
of Data (SIGMOD) Conference, Santa Barbara, CA.
Segall, B., & Arnold, D. (1997). Elvin has left the building: A publish/subscribe
notification service with quenching. Proceedings of the Australian UNIX
and Open Systems User Group Conference (AUUG97). Brisbane,
Australia.
Smets, P. (1997). Imperfect information: Imprecision-uncertainty, uncertainty
management in information systems: From needs to solutions (pp. 225–254).
Dordrecht: Kluwer Academic Publishers.
Sun Microsystems Inc. (2002). Java message service specification. Version 1.1.
Tam, D., Azimi, R., & Jacobsen, H. A. (2003). Building content-based publish/
subscribe systems with distributed hash tables. Presented at the International Workshop on Databases, Information Systems and Peer-to-Peer
Computing. Humboldt University, Berlin, Germany.
Wolski, A., & Bouaziz, T. (1998). Fuzzy triggers: Incorporating imprecise
reasoning into active database. In Proceedings of the 14th International
Conference on Data Engineering.
Xu, Z., & Jacobsen, H.A. (2004). Efficient constraint processing for highly
personalized location based services. In Proceedings of the 30th International Conference on Very Large Data Bases, Toronto, Canada.
Zadeh, L. A. (1989). Knowledge representation in fuzzy logic. IEEE Transactions on Knowledge and Data Engineering, 1, 89–100.
Zongmin Ma received his Ph.D. from the City University of Hong Kong (2001).
His current research interests include intelligent database systems, knowledge
management, Web-based data management, e-learning systems, intelligent
planning and scheduling, decision making, robot path/motion planning, engineering database modeling, and enterprise information systems. He published many
papers in journals, conferences, edited books, and encyclopedias in these areas.
Also, he is currently authoring and editing several upcoming books being
published by Kluwer Academic Publishers and Idea Group Inc., respectively.
* * *
Rafal Angryk received a Ph.D. in computer science from Tulane University
(USA) and also has an M.A. in business management and an M.Sc. in computer
systems. He worked as a research assistant at the Center for Computational
Sciences, a program organized in cooperation between Stennis Space Center
(NASA) and University of Southern Mississippi. Previously, he was on the
faculty at the Institute of Computer Science, University of Szczecin, Poland. His
current research interests are large databases (data mining, spatial databases),
mobile agents technology (distributed processing, Web-mining), and artificial
intelligence (fuzzy modeling, neural networks), and he has over a dozen papers
in these areas.
Fernando Berzal is an assistant professor in the Department of Computer
Science and Artificial Intelligence at the University of Granada, where he is a
science (1996), and his Ph.D. in artificial intelligence (2000), all from the
University of Bristol. His research interests include humanist computing, uncertain reasoning, uncertain conceptual structures, information fusion, image processing, and medical information processing. He is author and co-author of more
than 20 research papers in international journals, edited books, and conference
proceedings.
Miguel Á. Sicilia obtained a university degree in computer science from the
Pontifical University of Salamanca in Madrid, Spain (1996) and a Ph.D. from
Carlos III University in Madrid, Spain (2002). In 1997 he joined an
object-technology consulting firm, after enjoying a research grant at the Instituto de
Automática Industrial (Spanish Research Council). From 1997 to 1999, he worked
as assistant professor at the Pontifical University, after which he joined the
Computer Science Department of the Carlos III University in Madrid as a
lecturer, working simultaneously as a software architect in e-commerce consulting firms, and as a member of the development team of a personalization engine.
From 2002-October 2003, he worked as a full-time lecturer at Carlos III
University working actively in the area of adaptive hypermedia. Currently, he
works as a full-time professor at the Computer Science Department, University
of Alcalá (Madrid). His research interests are primarily adaptive hypermedia,
learning technology, and human-computer interaction, with special focus on the
role of uncertainty and imprecision handling techniques on those fields.
María-Amparo Vila received her M.S. in mathematics (1973) and her Ph.D. in
mathematics (1978), both from the University of Granada. Since 1992, she has been a
professor in the Department of Computer Science and Artificial Intelligence.
Since 1997, she has also been head of the department and the IdBIS research group. Her
research activity is centered around the application of soft computing techniques
to different areas of computer science and artificial intelligence, such as
theoretical aspects of fuzzy sets; decision and optimization processes in fuzzy
environments; fuzzy databases, including relational, logical, and object-oriented
data models; and information retrieval. She has been responsible for 10 research
projects and the advisor of seven Ph.D. theses. She published more than 50
papers in prestigious international journals, more than 60 contributions to
international conferences, and many book chapters.
Copyright 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
Index 339
Index
A
A-ToPSS 317
access patterns 209
access support relations (ASRs) 227
access via type hierarchies 209
adjustment belief revision 138
agent 272
application programming interfaces
(APIs) 246
approximate matching 303
approximate publish/subscribe systems
303
artificial intelligence 128
association rules 87
associations 158
atomic fuzzy selection expression 64
atomic type 54
attribute generalization 96
attribute generalization algorithm 85
attribute-oriented induction 86
B
B-trees 215
basic type 11, 15
Bayesian network-based 129
C
cardinality ratio 184
Cartesian product 69
CG-trees 233
CH-index 233
class 115
class hierarchy 48, 193
class inspector 202
class recognition 119
closeness of mapping 245
clusters 259
collection type 11, 15
complex objects 185
concept hierarchy 85, 96
conceptual data model 153
conceptual data modeling 153
conditional probability 48
consistent fuzzy concept hierarchy 98
constraint 23
constraint system 23, 24
continuous queries 325
core engine design 317
crisp concept hierarchy 97
Copyright 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
340 Index
D
data browser 130
data cube 88
data definition operators 32
data generalization 86
data graph 186
data manipulation operators 32
data mining 85
data warehouse 87
database management systems
(DBMS) 207
database model 1, 31
database query 273, 284
database researchers 178
database scheme 29
database trigger technology 325
db4o 255
decision model 277
dependency 158
difference 69
disjunctive fuzzy set 183
dispersal model 277
E
ECO-COSM 269
ecological models 273
ellipse problem 130
entity-relationship (ER) 154
enumeration type 12, 15
equality constraint 6
existing database system 177
expressiveness 306
extended possibilistic truth value
(EPTV) 8
extendible hashing 216
extendible signature hashing index
(ESH) 225
extent cardinality 251
external hashing 215
F
face problem 130
FILUM 138
FIRMS model 5
flat hierarchy 105
Copyright 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
Index 341
G
G-trees 221
general two-dimensional indexes 222
generalization 158
generalized constraint 6
generalized resemblance operator 184
geographic database 270
geographic information systems (GISs)
274
geographical data 270
global connectivity 270
global depth 215
grid files 216
H
H-trees 232
hierarchical model 146
hierarchical signature organization 224
I
imprecise value 181
imprecision 303
inclusion operator 184
index structures 214
individual-based modeling (IBM) 270
inducer 129
information categorization 305
information dissemination 305
inheritance 208
inheritance relationship 190
instrumentation subsystem 282
J
Java data objects (JDO) 242, 252
join-compatible 74
K
K-d trees 217
KBLIMS 272
knowledge discovery 86
knowledge representation 146
L
label clauses 123
label phase 123
landscape 271
linguistic labels 55, 181
lisp 125
location-aware ToPSS 328
logic programming 114
M
machine learning 86, 128
mass assignment 49
membership degree 47, 163
membership function 285
meta-meta-model layer (M3) 246
modeling subsystem 280
modeling with words 143
monitoring experiments 320
multikey Index 234
multivalued attributes 209
multivalued reference type 14
N
navigational access 213
navigational access via paths 209
necessity measure 211
neural network-based 129
nonempty intersection query 219
Copyright 2005, Idea Group Inc. Copying or distributing in print or electronic forms without written
permission of Idea Group Inc. is prohibited.
342 Index
O
object bases 48
object constraint language (OCL) 248
object data management group (ODMG)
208, 242
object identifier (OID) 208
object persistence sources 242
object scheme 29
object-centered model 3
object-oriented data paradigm 178
object-oriented database 113, 274
object-oriented database management
systems (OODBM) 178
object-oriented database model 2
object-oriented databases (OODBs)
85, 242
object-oriented logic programs 123
object-oriented model 3, 47
object-relational database management
systems 178
ObjectStore 259
orthogonal persistence interfaces 244
orthogonal persistence system 242
P
partition tree 99
partitioned signature organization 225
path expression 63
pattern recognition 86
perceptual range 271
persistent object 21
physical storage models 250
polymorphy 209
possibilistic constraint 6
possibility distribution 156, 181
possibility measure 211
possibility theory 302
preferred default subset 118
probabilistic combination strategies 53
probabilistic constraint 7
probabilistic default reasoning 117
probabilistic extent 60
probabilistic interpretation 48
probabilistic object base 46
probabilistic tuple values 56
probability degree 47
probability distribution 48
probability theory 47
probability-value constraint 7
probe 272
programming language 113
projection 69
PROLOG 123
properties 115
property inheritance 119
prototype 201
publication data model 309, 312
publish/subscribe messaging paradigm
305
publish/subscribe paradigm 301
publish/subscribe systems 305
Q
query language 241
query-directed approach 272
querying 211
R
random set constraint 7
recursion 186
reference instance 21
reference type 14
reflection capability 193
relational interval trees 219
renaming 69
renaming expression 70
resemblance relationship 181
RI-trees 219
role-expressiveness 245
rough object-oriented database 5
S
SC-trees 232
segments 259
selection expression 63
selection operation 63
semantic data model 179, 189
semantic representation 181
semantic structure 273
semantics of a constraint 23
T
top-level attributes 55
Toronto publish/subscribe system family 327
transient object 21
translation layer 179
tree hierarchy 145
tuple type 54
type 10
type hierarchies 213, 232
type system 10
type-based publish/subscribe 307
U
UFO model 4
UML 153
V
veristic constraint 7
virtual memory mapping architecture 259
void type 14
voting model 49