
DECISION TREE ARCHITECTURE

Decision trees are an alternative way of structuring such information, and there are efficient algorithms for constructing such trees from data. The past 30 years have seen the emergence of a family of learning systems that work in this way, for example CLS, ID3, ACLS, ASSISTANT and IND. ACLS has given rise to a number of commercial derivatives, such as Expert-Ease and Rule-Master, which have been successfully applied in industry.

THE STRUCTURE OF DECISION TREES

A decision tree represents a particular way of breaking up a data set into classes or categories. The root of the tree implicitly contains all the data to be classified, while the leaf nodes represent the final classes after categorization. Intermediate nodes represent choice points, or tests upon attributes of the data, which serve to further subdivide the data at that node.

Thus, Quinlan (1993) defines decision trees as structures that consist of either:

- leaf nodes, representing a class, or
- decision nodes, specifying some test to be carried out on a single attribute value, with one branch for each possible outcome of the test.
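The two kinds of node can be captured in a minimal Python sketch (my own illustration, not code from ID3 or C4.5); the names `Leaf`, `DecisionNode` and `classify` are assumptions introduced here and reused in the later sketches:

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    label: str                                    # the class assigned to cases reaching this leaf

@dataclass
class DecisionNode:
    attribute: str                                # the single attribute tested at this node
    branches: dict = field(default_factory=dict)  # test outcome -> child subtree

def classify(node, case):
    """Follow the outcome of each test on `case` until a leaf's class is reached."""
    while isinstance(node, DecisionNode):
        node = node.branches[case[node.attribute]]
    return node.label
```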

Another way of looking at decision trees is that nodes correspond to attributes of the objects to be classified, while the arcs correspond to alternative values for these attributes. The tree in the figure above has as its non-terminal nodes the weather attributes outlook, humidity and windy. The leaves are labelled with one of two classes, P or N. One can think of P as the class of positive instances of the concept being learned, and N as the class of negative instances.

The reason for using decision trees rather than rules is that there exist comparatively simple algorithms for taking a training set and deriving a decision tree that will correctly classify unseen objects. The ID3 algorithm, which performs this task, is conceptually quite simple, as we shall see in the next section. It is also computationally efficient, in that the time taken to build such trees increases only linearly with the size of the problem.

The ID3 algorithm

The problem that ID3 sets out to solve is easy to state. Given

- a set of disjoint target classes {C1, C2, ..., Ck}, and
- a set of training data, S, containing objects of more than one class,

ID3 uses a series of tests to refine S into subsets that contain objects of only one class. The heart of the algorithm is a procedure for building a decision tree, where non-terminal nodes in the tree correspond to classified subsets of the data set. As we shall see, the trick to doing this effectively is selecting the tests.
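The overall recursive procedure can be sketched as follows. This is a simplified illustration under my own assumptions, not Quinlan's implementation: it reuses the `Leaf` and `DecisionNode` sketch above, represents each training object as a pair of an attribute-value dictionary and a class label, and defers the crucial attribute-selection step to a hypothetical helper `best_test`, sketched below:

```python
from collections import Counter

def id3(examples, attributes):
    """Recursively refine `examples` into subsets containing only one class."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                  # subset already pure: make a leaf
        return Leaf(labels[0])
    if not attributes:                         # no tests left: label with the majority class
        return Leaf(Counter(labels).most_common(1)[0][0])

    attr = best_test(examples, attributes)     # choose the test (see below)
    node = DecisionNode(attr)
    for value in {case[attr] for case, _ in examples}:
        subset = [(c, l) for c, l in examples if c[attr] == value]
        remaining = [a for a in attributes if a != attr]
        node.branches[value] = id3(subset, remaining)   # recurse on each partition Si
    return node
```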

Let T be any test on a single attribute of the data, with O1, O2, ..., On representing the possible outcomes of applying T to any object x, which we shall write as T(x). T will therefore produce a partition {S1, S2, ..., Sn} of S such that

    Si = {x | T(x) = Oi}.

If we proceed recursively to replace each Si with a decision tree, we would have a decision tree for S. As noted earlier, the crucial factor in this problem reduction strategy is the choice of test. For each subtree, we need to find a good attribute for partitioning the objects.

In making this decision, Quinlan employs the notion of uncertainty from information theory. Uncertainty is a number describing a set of messages, M = {m1, m2, ..., mn}. Each message mi in the set has probability p(mi) of being received and contains an amount of information, I(mi), defined as

    I(mi) = -log2 p(mi).

Thus information is an inverse monotonic function of probability. Information and uncertainty are measured in bits, so logarithms are taken to base 2. The uncertainty of a message set, U(M), is just the sum of the information in the several possible messages weighted by their probabilities:

    U(M) = Σi p(mi) I(mi) = -Σi p(mi) log2 p(mi).

Speaking intuitively, we are uncertain about which message from the set will be sent to the degree to which we expect the messages to be informative; consequently, we compute the average information of the possible messages. If the messages in a set are equiprobable, then uncertainty is at a maximum.

Quinlan's use of this measure is based on the following assumptions:

- A correct decision tree for S will classify objects in the same proportion as their representation in S.
- Given a case to classify, a test can be regarded as the source of a message about that case.
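This measure suggests how the `best_test` helper assumed in the earlier sketch might work: compute the uncertainty of the class labels in each subset produced by a candidate test, weight each by the subset's relative size, and pick the attribute that leaves the least expected uncertainty (equivalently, the greatest information gain). The following is a rough sketch under those assumptions, not code from ID3 itself:

```python
import math
from collections import Counter

def uncertainty(labels):
    """U(M) = -sum of p(mi) * log2 p(mi) over the class labels in a subset."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_test(examples, attributes):
    """Choose the attribute whose partition leaves the least weighted uncertainty."""
    def expected_uncertainty(attr):
        total = len(examples)
        score = 0.0
        for value in {case[attr] for case, _ in examples}:
            subset_labels = [label for case, label in examples if case[attr] == value]
            score += (len(subset_labels) / total) * uncertainty(subset_labels)
        return score

    # Minimizing the residual uncertainty is the same as maximizing information gain.
    return min(attributes, key=expected_uncertainty)
```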

Nevertheless, the ID3 algorithm has been successfully applied to fairly large training sets. Its computational complexity hinges on the cost of choosing the next test to branch on, which is itself a linear function of the product of the number of objects in the training set and the number of attributes used to describe them.

The simplicity and efficiency of Quinlan's algorithm make it a feasible alternative to knowledge elicitation if sufficient data of the right kind are available. However, unlike the version spaces approach to concept learning, this method is not incremental. In other words, you cannot consider additional training data without reconsidering the classification of previous instances. ID3 is also not guaranteed to find the simplest decision tree that characterizes the training instances, because the information-theoretic evaluation function for choosing attributes is only a heuristic.

CHANGES AND ADDITIONS TO ID3 IN C4.5

C4.5 is a suite of programs that embody the ID3 algorithm and include a module, called C4.5RULES, that can generate a set of production rules from any resultant decision tree. The program uses pruning heuristics to simplify decision trees in an attempt to produce a result that is both easier to understand and less dependent upon the particular training set used. Quinlan employs the following strategies for forming a final rule set from a decision tree:

- Derive an initial rule set by enumerating paths from the root to the leaves (sketched below).
- Generalize the rules by possibly deleting conditions deemed to be unnecessary.
- Group the rules into subsets according to the target classes they cover, and then delete any rules that do not appear to contribute to overall performance on that class.
- Order the sets of rules for the target classes, and choose a default class to which cases will be assigned.
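The first of these steps, enumerating root-to-leaf paths as candidate rules, can be illustrated with a short sketch (my own simplification built on the `Leaf`/`DecisionNode` structures above, not the C4.5RULES program itself):

```python
def paths_to_rules(node, conditions=()):
    """Turn each root-to-leaf path into a (conditions, class) rule.

    Each rule pairs the attribute-value tests satisfied along one path
    with the class label found at that path's leaf.
    """
    if isinstance(node, Leaf):
        return [(list(conditions), node.label)]
    rules = []
    for value, child in node.branches.items():
        rules.extend(paths_to_rules(child, conditions + ((node.attribute, value),)))
    return rules
```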

The following requirements, given by Quinlan, are crucial:

- The classes into which the data will be divided must be determined ahead of time.
- These methods require large data sets, the larger the better. Training sets that are too small will lead to overfitting; that is, the classification will be too heavily influenced by individual data items, and the classifier will then perform badly on unseen data.
- The data must be in a regular attribute-value format; that is, each datum must be capable of being characterized in terms of a fixed set of attributes and their values, whether symbolic, ordinal or continuous (an encoding along these lines is sketched below).
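To make the attribute-value format concrete, a handful of hypothetical training cases in the style of the weather example above could be encoded as follows and fed to the sketches in this section (the particular cases are invented purely for illustration):

```python
# Each datum: a dict over a fixed set of attributes, paired with its class (P or N).
training_data = [
    ({"outlook": "sunny",    "humidity": "high",   "windy": "false"}, "N"),
    ({"outlook": "overcast", "humidity": "high",   "windy": "false"}, "P"),
    ({"outlook": "rain",     "humidity": "normal", "windy": "true"},  "N"),
    ({"outlook": "rain",     "humidity": "normal", "windy": "false"}, "P"),
]

tree = id3(training_data, ["outlook", "humidity", "windy"])
rules = paths_to_rules(tree)
print(classify(tree, {"outlook": "sunny", "humidity": "high", "windy": "false"}))
```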
