havior attributed to fuzzy systems (we assume that a crisp class assignment in the class domain is sought). Thus, they extend the symbolic ID3 procedure (Section 2), using fuzzy sets and approximate reasoning both for the tree-building and the inference procedures. At the same time, they borrow the rich existing decision tree methodologies for dealing with incomplete knowledge (such as missing features), extended to utilize the new wealth of information available in fuzzy representation [5].

ID3 is based on classical crisp sets, so an example satisfies exactly one of the possible conditions out of a node and thus falls into exactly one subnode, and consequently into exactly one leaf (except for the cases of missing features). In fuzzy decision trees, an example can match more than one condition, for these conditions are now fuzzy restrictions based on fuzzy sets. Because of that, a given example can fall into many children nodes (typically up to two) of a node. Because this may happen at any level, the example may eventually fall into many of the leaves. Because a fuzzy match is any number in [0,1], and approximate reasoning is closed in [0,1], a leaf node may contain a large number of examples, many of which can be found in other leaves as well, each to some degree in [0,1]. This fact is actually advantageous as it provides more graceful behavior, especially when dealing with noisy or incomplete information, while the approximate reasoning methods are designed to handle the additional complexity.

tion leading to $N_j$. Then, at node $N$ the event count for the fuzzy linguistic class $k$ is $\sum_{e \in E} f_1(N_e, f_0(e, A_k))$. Given that, the probability $p_k$ can be estimated as

$$p_k = \frac{\sum_{e \in E} f_1(N_e, f_0(e, A_k))}{\sum_{k' \in C} \sum_{e \in E} f_1(N_e, f_0(e, A_{k'}))}$$

and a tree can be constructed using the same algorithm as that of Section 2.
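To make the estimate concrete, the following minimal sketch computes the fuzzy event counts and the resulting class probabilities $p_k$ at a node. The choices of $f_1$ as the product and of $f_0$ as a simple lookup of the example's membership in the class fuzzy set are illustrative assumptions, not prescribed by the text.

    # Sketch: fuzzy event counts and class probabilities p_k at a node.
    # Assumptions: f1 is the product, f0 looks up the example's membership
    # in the fuzzy linguistic class A_k.

    def f0(example, fuzzy_class):
        """Membership of the example in the fuzzy linguistic class."""
        return example["class_memberships"][fuzzy_class]

    def f1(a, b):
        """Conjunctive aggregation of two membership degrees."""
        return a * b

    def class_probabilities(examples, node_memberships, classes):
        """Estimate p_k from fuzzy event counts.

        node_memberships[i] plays the role of N_e: the degree to which
        example i satisfies the fuzzy restrictions leading to the node.
        """
        counts = {k: sum(f1(n_e, f0(e, k))
                         for e, n_e in zip(examples, node_memberships))
                  for k in classes}
        total = sum(counts.values()) or 1.0
        return {k: counts[k] / total for k in classes}

    examples = [
        {"class_memberships": {"low": 0.8, "high": 0.2}},
        {"class_memberships": {"low": 0.3, "high": 0.7}},
    ]
    print(class_probabilities(examples, [0.9, 0.5], ["low", "high"]))

Substituting the minimum for the product in $f_1$ only requires changing that one definition; the counting scheme itself is unaffected.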
In the inference routine, the first step is to find to what degree a new example satisfies each of the leaves. In other words, it is necessary to find its membership in each of the partition blocks of the now fuzzy partition. Thus, the same $f_0$ and $f_1$ mechanisms must be used. The next step is to determine the response of each of the leaves individually, through $f_2$. As indicated earlier, in fuzzy decision trees the number of satisfied leaves dramatically increases. Such conflicts are handled by $f_3$ operators.
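As a rough illustration of this first step, the sketch below propagates a sample's match degrees down a small fuzzy tree so that every partially matched leaf receives a membership value. The tree layout, the membership functions, and the use of the product for $f_1$ are assumptions made only for the example.

    # Sketch: membership of a new sample in every leaf of a fuzzy tree.
    # An internal node is (attribute, [(term, membership_function, subtree)]);
    # a leaf is any non-tuple payload. f1 is assumed to be the product.

    def f1(a, b):
        return a * b

    def leaf_memberships(node, sample, degree=1.0, path=(), out=None):
        """Return {path: degree} for every leaf partially matched by the sample."""
        if out is None:
            out = {}
        if not isinstance(node, tuple):              # reached a leaf
            out[path] = degree
            return out
        attribute, branches = node
        value = sample[attribute]
        for term, mu, subtree in branches:
            match = mu(value)                        # fuzzy restriction on the branch
            if match > 0.0:
                leaf_memberships(subtree, sample, f1(degree, match),
                                 path + (term,), out)
        return out

    low = lambda x: max(0.0, min(1.0, (5.0 - x) / 5.0))
    high = lambda x: max(0.0, min(1.0, x / 5.0))
    tree = ("temp", [("low", low, "leaf-A"), ("high", high, "leaf-B")])
    print(leaf_memberships(tree, {"temp": 3.0}))     # the sample falls into both leaves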
However, there are likely to be conflicts as well in individual leaves. This is so because the fuzzy partition blocks are very likely to contain training data with different classifications. In cases where the class assignment is not just a fuzzy (or strict) linguistic term but a value from the classification domain (e.g., when the domain is continuous), there are no unique classifications whatsoever, and we may only speak of the grade of membership of such examples in the linguistic classes.
Such class information can be interpreted in a number of different ways. For example, this may be treated as a union of weighted fuzzy sets, or as a disjunction of weighted fuzzy propositions relating to class assignments, with the weights proportional to the proportion of examples of that leaf found in fuzzy sets associated with the linguistic classes.

Multiple leaves which can be partially matched by a sample provide the second conflict level. These conflicts result from overlapping fuzzy sets, and they are aggravated by cases of missing features in the sample to be recognized and by incomplete knowledge (missing branches in the tree). Because each leaf can be interpreted as a fuzzy rule, multiple leaves have the disjunctive interpretation.
A number of possible fuzzy inferences were described in [5]. Here, we describe a few different inferences, following the idea of exemplars [1] and implementing the local center-of-gravity inference with various interpretations of those two levels of conflicts.

In exemplar-based learning, selected training examples are retained, and a decision procedure returns the class assigned to the "closest" exemplar. Moreover, these examples can be generalized, and multiple exemplars can be used for making the decision [1].
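For readers unfamiliar with [1], the following minimal sketch shows only the basic decision procedure: keep a set of stored exemplars and return the class of the closest one. The distance measure is an arbitrary choice for illustration.

    # Sketch of the basic exemplar decision: return the class of the
    # "closest" stored exemplar. The distance measure is an assumption.

    def classify_by_exemplar(sample, exemplars):
        """exemplars: list of (feature_vector, class_label) pairs."""
        def distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        _, label = min(exemplars, key=lambda ex: distance(sample, ex[0]))
        return label

    exemplars = [((0.0, 1.0), "off"), ((4.0, 3.0), "on")]
    print(classify_by_exemplar((3.5, 2.0), exemplars))   # -> on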
The same ideas can lead to a number of new inferences from fuzzy decision trees. Consider a fuzzy decision tree built as above. Suppose that we interpret all individual leaves as generalized exemplars. Obviously, path-restrictions determine the generalized forms of the exemplars. However, in fuzzy decision trees such exemplars would not have unique classifications, while an exemplar, like an example, is supposed to have one. Thus, we need methods to create exemplars from leaves (in addition to methods for combining information from such exemplars).
As mentioned before, for practical reasons we follow the idea of local inferences from individual rules (leaves) [3]. We first define two coefficients, $R$ and $r$, two cases each, which will subsequently be used to define the inferences. Assume that $e_o$ is the numerical class assignment, that is, a value from the universe of the classification, of example $e$ (if the classifications are symbolic or linguistic fuzzy terms, these may be centers of gravity of such sets), and that $l_e$ denotes the membership of example $e$ in leaf $l$. These coefficients can be computed by taking all individual examples falling into a leaf $l$:

$$R^1_l = \sum_{e \in E} (l_e \, e_o) \qquad r^1_l = \sum_{e \in E} l_e$$

or by favoring examples associated with the best decision at each leaf $l$ (i.e., $\delta$ is the fuzzy linguistic class which has the highest accumulative membership of training data in $l$):

$$R^2_l = \sum_{e \in E} f_1(l_e \, e_o, f_0(e, A_\delta)) \qquad r^2_l = \sum_{e \in E} f_1(l_e, f_0(e, A_\delta))$$

Those coefficients give an obvious interpretation to $R/r$. For example, if $\forall (e \in E)\; e_o = a$, then $R^1/r^1 = a$. In general, this quotient implements the idea of the center of gravity of a leaf.
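A small sketch of these coefficients follows; it again assumes that $f_1$ is the product and that $f_0$ simply reads off a stored class membership, and it picks the best decision as the class with the highest accumulated membership in the leaf.

    # Sketch: the leaf coefficients R1, r1, R2, r2. Each training example
    # carries l_e (its membership in the leaf), e_o (its numerical class
    # value) and its memberships in the fuzzy linguistic classes.

    def f0(example, fuzzy_class):
        return example["class_memberships"][fuzzy_class]

    def f1(a, b):
        return a * b

    def leaf_coefficients(leaf_examples, classes):
        """Return (R1, r1, R2, r2) for one leaf."""
        R1 = sum(e["l_e"] * e["e_o"] for e in leaf_examples)
        r1 = sum(e["l_e"] for e in leaf_examples)
        # best decision: the class with the highest accumulated membership in the leaf
        best = max(classes,
                   key=lambda k: sum(f1(e["l_e"], f0(e, k)) for e in leaf_examples))
        R2 = sum(f1(e["l_e"] * e["e_o"], f0(e, best)) for e in leaf_examples)
        r2 = sum(f1(e["l_e"], f0(e, best)) for e in leaf_examples)
        return R1, r1, R2, r2

    data = [
        {"l_e": 0.9, "e_o": 2.0, "class_memberships": {"low": 0.8, "high": 0.2}},
        {"l_e": 0.4, "e_o": 7.0, "class_memberships": {"low": 0.1, "high": 0.9}},
    ]
    print(leaf_coefficients(data, ["low", "high"]))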
These coefficients can be used to define our inferences. We present them along with their intuitive interpretations; a small computational sketch follows the list. Note that different interpretations actually implicitly refer to different $f_3$ operators; $f_2$, in addition to the previous $f_0$ and $f_1$, must be assumed explicitly.

1. Find the center of gravity of all the examples in a leaf. This represents an exemplar. Then, find the classification of a super-exemplar taken over all leaves, while individual leaf-based exemplars are weighted by how well they match the new event and how much training data they contain:
$$\frac{\sum_{l \in Leaves} f_2(l_e, R^1_l)}{\sum_{l \in Leaves} f_2(l_e, r^1_l)}$$

2. The same as 1, but the individual exemplars are not weighted by how many examples they contain:
$$\sum_{l \in Leaves} f_2(l_e, R^1_l / r^1_l)$$

3. The same as case 1, but individual exemplars represent only examples of the best decision (fuzzy linguistic class) in the leaf:
$$\frac{\sum_{l \in Leaves} f_2(l_e, R^2_l)}{\sum_{l \in Leaves} f_2(l_e, r^2_l)}$$

4. The same as 2, but also restricted as in 3:
$$\sum_{l \in Leaves} f_2(l_e, R^2_l / r^2_l)$$

5. Suppose that $\lambda$ denotes the leaf whose data is "most" similar to the new example (i.e., the leaf maximizing $l_e$). Here, the decision is based on the exemplar from such a leaf:
$$R^1_\lambda / r^1_\lambda$$

6. The same as 5, but the selected exemplar represents only the best class from the leaf:
$$R^2_\lambda / r^2_\lambda$$
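The sketch below implements three of the inferences above (1, 2 and 5) directly from the formulas, given, for every leaf, the match degree $l_e$ of the new sample and the coefficients $R^1_l$ and $r^1_l$; taking $f_2$ to be the product is again only one possible choice.

    # Sketch: inferences 1, 2 and 5, taking per-leaf match degrees and
    # coefficients as input. f2 is assumed to be the product.

    def f2(a, b):
        return a * b

    def inference_1(leaves):
        """Super-exemplar weighted by match degree and amount of training data."""
        num = sum(f2(l["match"], l["R1"]) for l in leaves)
        den = sum(f2(l["match"], l["r1"]) for l in leaves)
        return num / den if den else None

    def inference_2(leaves):
        """As 1, but leaf exemplars are not weighted by how much data they hold."""
        return sum(f2(l["match"], l["R1"] / l["r1"]) for l in leaves)

    def inference_5(leaves):
        """Use only the exemplar of the best-matching leaf."""
        best = max(leaves, key=lambda l: l["match"])
        return best["R1"] / best["r1"]

    leaves = [
        {"match": 0.7, "R1": 1.8, "r1": 0.9},
        {"match": 0.3, "R1": 3.5, "r1": 0.5},
    ]
    print(inference_1(leaves), inference_2(leaves), inference_5(leaves))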
What the properties of these specific inferences are remains to be investigated. Some preliminary experiments on this subject, for example assuming that the goal is noise filtration and concept recovery from incomplete data, were reported in [4] for a number of similar inferences.

An apparent advantage of fuzzy decision trees is that they use the same routines as symbolic decision trees (but with fuzzy representation). This allows for utilization of the same comprehensible tree structure for knowledge understanding and verification. This also allows more robust processing with continuously gradual outputs. Moreover, one may easily incorporate rich methodologies for dealing with missing features and incomplete trees. For example, suppose that the inference descends to a node which does not have a branch (maybe due to tree pruning, which often improves generalization properties) for the corresponding feature of the sample. This dangling feature can be fuzzified, and then its match to the fuzzy restrictions associated with the available branches provides better than uniform discernibility among those children [5].
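One way to realize this, sketched below with assumed triangular membership functions, is to match the sample's (fuzzified) value against the fuzzy restrictions of the branches that do exist and to distribute the sample proportionally, falling back to a uniform split only when nothing matches.

    # Sketch: distributing a sample among the available branches of a node
    # when no branch covers its feature value (e.g., after pruning).

    def branch_weights(value, branches):
        """branches: list of (term, membership_function) for the node's children."""
        matches = [(term, mu(value)) for term, mu in branches]
        total = sum(m for _, m in matches)
        if total == 0.0:                      # no information: uniform split
            return {term: 1.0 / len(branches) for term, _ in branches}
        return {term: m / total for term, m in matches}

    low = lambda x: max(0.0, min(1.0, (5.0 - x) / 5.0))
    high = lambda x: max(0.0, min(1.0, x / 5.0))
    print(branch_weights(4.0, [("low", low), ("high", high)]))   # better than uniform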
Because fuzzy restrictions are evaluated using fuzzy membership functions, this process provides a linkage between continuous domain values and abstract features (here, fuzzy terms). However, numerical data is not essential for the algorithm since methods for matching two fuzzy terms are readily available. Thus, fuzzy decision trees can process data expressed with both numerical values (more information) and fuzzy terms. Because of the latter, fuzzy trees can also process inherently symbolic domains. Because the same processing has been adapted for fuzzy domains, processing purely symbolic data would result in the same behavior (given proper inference) as that of a symbolic decision tree. Thus, fuzzy decision trees are natural generalizations of symbolic decision trees.
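As an illustration of matching a fuzzy-term input against a fuzzy restriction when no numerical value is given, the sketch below uses the sup-min (possibility) measure over a sampled universe; this is one standard choice and not necessarily the one adopted by the algorithm.

    # Sketch: degree of match between an input fuzzy term and a branch
    # restriction, via sup-min over a discretized universe.

    def sup_min(mu_input, mu_restriction, universe):
        return max(min(mu_input(x), mu_restriction(x)) for x in universe)

    warm = lambda x: max(0.0, 1.0 - abs(x - 20.0) / 10.0)    # input given as a term
    hot = lambda x: max(0.0, min(1.0, (x - 20.0) / 10.0))    # branch restriction
    universe = [t / 2.0 for t in range(0, 81)]               # 0 .. 40 in steps of 0.5
    print(sup_min(warm, hot, universe))                      # partial match, 0.5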
Most research and empirical studies remain to be completed. For illustration, [5] presents two sample applications: one dealing with learning continuous functions, and the other for learning symbolic concepts. The same publication also details the complete algorithm.
5. Summary

We briefly introduced fuzzy decision trees. They are based on ID3 symbolic decision trees, but they utilize fuzzy representation. The tree-building routine has been modified to utilize fuzzy instead of strict restrictions. Following fuzzy approximate reasoning methods, the inference routine has been modified to process fuzzy information. The inference can be based on a number of different criteria, resulting in a number of different inferences. In this paper, we defined new inferences based on exemplar learning: tree leaves are treated as learning exemplars.

Fuzzy representation allows decision trees to deal with continuous data and noise. Unknown-feature methodologies, borrowed from symbolic decision trees, allow fuzzy trees to deal with missing information. The symbolic tree structure allows fuzzy trees to provide comprehensible knowledge. We are currently testing a new prototype, which will subsequently be used to determine various properties and trade-offs associated with fuzzy decision trees.

This research is in part being supported by grant NSF IRI-9504334.

References

[1] E.R. Bareiss, B.W. Porter & C.C. Wier. PROTOS: An Exemplar-Based Learning Apprentice. Machine Learning III, Kodratoff, Y. & Michalski, R. (eds.), Morgan Kaufmann, pp. 112-127.

[2] L. Breiman, J.H. Friedman, R.A. Olshen & C.J. Stone. Classification and Regression Trees. Wadsworth, 1984.

[3] D. Driankov, H. Hellendoorn & M. Reinfrank. An Introduction to Fuzzy Control. Springer-Verlag, 1993.

[4] C.Z. Janikow. Fuzzy Processing in Decision Trees. Proceedings of the Sixth International Symposium on Artificial Intelligence, 1993, pp. 360-367.

[5] C.Z. Janikow. Fuzzy Decision Trees: Issues and Methods. To appear, IEEE Transactions on Systems, Man, and Cybernetics.

[6] J.R. Quinlan. The Effect of Noise on Concept Learning. Machine Learning II, R. Michalski, J. Carbonell & T. Mitchell (eds.), Morgan Kaufmann, 1986.

[7] J.R. Quinlan. Induction of Decision Trees. Machine Learning, Vol. 1, 1986, pp. 81-106.

[8] J.R. Quinlan. Unknown Attribute-Values in Induction. Proceedings of the Sixth International Workshop on Machine Learning, Morgan Kaufmann, 1992.

[9] J.R. Quinlan. Combining Instance-Based and Model-Based Learning. Proceedings of the International Conference on Machine Learning 1993, Morgan Kaufmann, 1993, pp. 236-243.

[10] J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.

[11] M. Umano, H. Okamoto, I. Hatono, H. Tamura, F. Kawachi, S. Umedzu & J. Kinoshita. Fuzzy Decision Trees by Fuzzy ID3 Algorithm and Its Application to Diagnosis Systems. Proceedings of the Third IEEE Conference on Fuzzy Systems, 1994, pp. 2113-2118.

[12] L. Wang & J. Mendel. Generating Fuzzy Rules by Learning From Examples. IEEE Transactions on Systems, Man and Cybernetics, Vol. 22, No. 6, 1992, pp. 1414-1427.

[13] L.A. Zadeh. Fuzzy Logic and Approximate Reasoning. Synthese 30 (1975), pp. 407-428.