1 What are rule-based classifiers? How is a rule-based classifier built?
Explain the sequential covering algorithm. 2+3+3
• A rule-based classifier classifies records using a collection of "if ... then ..." rules.
• Rule: (Condition) → y
o where
▪ Condition is a conjunction of attribute tests
▪ y is the class label
o LHS: rule antecedent or condition
o RHS: rule consequent
o Examples of classification rules:
▪ (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
▪ (Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No
• A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule
➢ R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
➢ R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
➢ R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
➢ R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
➢ R5: (Live in Water = sometimes) → Amphibians
• Coverage of a rule:
o Fraction of all records that satisfy the antecedent of the rule
• Accuracy of a rule:
o Fraction of the records covered by the rule (those satisfying its antecedent) that also
satisfy its consequent
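The definitions of covering, coverage, and accuracy can be sketched in Python. The `Rule` class, the helper names, and the four toy records are hypothetical and chosen only for illustration; the rule itself is R1 from the list above.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A classification rule: a conjunction of attribute tests -> class label."""
    condition: dict  # attribute name -> required value (all must match)
    label: str       # class predicted when the condition holds


def covers(rule, record):
    """True if the record satisfies every attribute test in the antecedent."""
    return all(record.get(attr) == value for attr, value in rule.condition.items())


def coverage(rule, records):
    """Fraction of all records that satisfy the antecedent."""
    return sum(covers(rule, r) for r in records) / len(records)


def accuracy(rule, records):
    """Among the covered records, the fraction whose class matches the consequent."""
    covered = [r for r in records if covers(rule, r)]
    if not covered:
        return 0.0
    return sum(r["Class"] == rule.label for r in covered) / len(covered)


# Hypothetical toy records, not taken from the source.
records = [
    {"Give Birth": "no", "Can Fly": "yes", "Class": "Birds"},
    {"Give Birth": "no", "Can Fly": "yes", "Class": "Birds"},
    {"Give Birth": "no", "Can Fly": "yes", "Class": "Insects"},
    {"Give Birth": "yes", "Can Fly": "no", "Class": "Mammals"},
]

r1 = Rule({"Give Birth": "no", "Can Fly": "yes"}, "Birds")
r1_coverage = coverage(r1, records)  # 3 of the 4 records are covered
r1_accuracy = accuracy(r1, records)  # 2 of the 3 covered records are Birds
```

On this toy data the rule covers three of the four records, and two of those three carry the predicted label.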
OR
2 What are Bayesian Classifiers? Explain Bayes theorem for classification. 4+4
Bayesian classifier:
● An approach for modelling probabilistic relationships between the attribute set and the class
variable.
● Let X and Y be a pair of random variables. Their joint probability P(X=x, Y=y) is the
probability that X takes the value x and Y takes the value y.
● A conditional probability is the probability that a random variable takes a particular
value given that the outcome of another random variable is known.
● For example, P(Y=y | X=x) is the probability that Y takes the value y, given that X is
observed to have the value x.
● The joint and conditional probabilities are related by
P(X,Y) = P(Y|X) * P(X) = P(X|Y) * P(Y)
BE(CSE), VI Semester
Bayes theorem: Let X be the attribute set and Y the class variable.
If Y has a non-deterministic relationship with the attributes, then X and Y can be treated as
random variables and their relationship captured by P(Y|X).
Rearranging P(Y|X) * P(X) = P(X|Y) * P(Y) gives Bayes theorem: P(Y|X) = P(X|Y) * P(Y) / P(X).
P(Y|X) is known as the posterior probability and P(Y) as the prior probability.
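A small numeric sketch of how the posterior is obtained from the prior and the class-conditional probability. The function name and all probability values are hypothetical, chosen only for illustration; P(X) is computed with the law of total probability.

```python
def posterior(priors, likelihoods):
    """Return P(Y=y | X=x) for every class y.

    priors:      dict mapping class y -> P(Y=y)
    likelihoods: dict mapping class y -> P(X=x | Y=y) for one observed x
    """
    # Evidence: P(X=x) = sum over y of P(X=x | Y=y) * P(Y=y)
    evidence = sum(likelihoods[y] * priors[y] for y in priors)
    # Bayes theorem: P(Y|X) = P(X|Y) * P(Y) / P(X)
    return {y: likelihoods[y] * priors[y] / evidence for y in priors}


# Hypothetical numbers, not from the source.
priors = {"yes": 0.3, "no": 0.7}
likelihoods = {"yes": 0.8, "no": 0.1}

post = posterior(priors, likelihoods)
predicted = max(post, key=post.get)  # class with the largest posterior
```

A Bayesian classifier predicts the class with the largest posterior, so a class with a small prior can still win when the observed attributes are much more likely under it.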
Distance is measured from the query point x = 5.0.
X     Y   Distance   Rank
0.5   -   4.5        7
3.0   -   2.0        6
4.5   +   0.5        5
4.6   +   0.4        4
4.9   +   0.1        1
5.2   -   0.2        2
5.5   +   0.5        5
7.0   -   2.0        6
9.5   -   4.5        7
1-NN: +
3-NN: -
5-NN: +
9-NN: -
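The k-NN answers above can be reproduced with a short majority-vote sketch. The Distance column implies a query point of x = 5.0. The Rank column has no rank 3, so this sketch assumes a missing point x = 5.3 with label - (distance 0.3), as in the common textbook version of this exercise; that row is an assumption and does not appear in the table. The function name is hypothetical.

```python
from collections import Counter

# (x, label) pairs from the table. The 5.3 row is an ASSUMPTION that fills the
# missing rank 3; it does not appear in the table itself.
data = [(0.5, "-"), (3.0, "-"), (4.5, "+"), (4.6, "+"), (4.9, "+"),
        (5.2, "-"), (5.3, "-"), (5.5, "+"), (7.0, "-"), (9.5, "-")]


def knn_predict(query, data, k):
    """Majority vote among the k points closest to `query` (1-D distance)."""
    # sorted() is stable, so equidistant points keep their table order.
    neighbours = sorted(data, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


results = {k: knn_predict(5.0, data, k) for k in (1, 3, 5, 9)}
```

With the assumed row included, the votes agree with the listed answers for k = 1, 3, 5 and 9. Without it, the three nearest neighbours would be 4.9 (+), 5.2 (-) and 4.6 (+), which would give + for 3-NN.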
OR
4 What is hierarchical clustering? Use the similarity matrix in the table to perform single link
hierarchical clustering. Show the results by drawing clusters and a dendrogram. 2+6
     P1    P2    P3    P4    P5
P1   0.00  0.90  0.59  0.45  0.65
P2   0.90  0.00  0.36  0.53  0.02
P3   0.59  0.36  0.00  0.56  0.15
P4   0.45  0.53  0.56  0.00  0.24
P5   0.65  0.02  0.15  0.24  0.00
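A minimal sketch of single-link agglomerative clustering on this table. Because the diagonal entries are 0.0, the sketch reads the entries as pairwise distances (smaller means closer); that reading is an assumption, and the function name is hypothetical. The returned merge history is what a dendrogram would display.

```python
labels = ["P1", "P2", "P3", "P4", "P5"]
dist = [
    [0.00, 0.90, 0.59, 0.45, 0.65],
    [0.90, 0.00, 0.36, 0.53, 0.02],
    [0.59, 0.36, 0.00, 0.56, 0.15],
    [0.45, 0.53, 0.56, 0.00, 0.24],
    [0.65, 0.02, 0.15, 0.24, 0.00],
]


def single_link(dist, labels):
    """Agglomerative clustering with the single-link (MIN) criterion.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [frozenset([i]) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the two closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(labels[i] for i in clusters[a]),
                       sorted(labels[i] for i in clusters[b]), d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


merges = single_link(dist, labels)
for left, right, d in merges:
    print(left, "+", right, "at", d)
```

Under the distance reading, P2 and P5 merge first at 0.02, P3 joins at 0.15, P4 at 0.24, and P1 last at 0.45.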
5 Consider a training set that contains 100 +ve examples and 400 -ve examples. For each of the
following candidate rules, determine which is the best and worst candidate according to
i) Rule accuracy ii) FOIL's information gain. 4+4
R1: A → + (covers 4 +ve and 1 -ve examples)
R2: B → + (covers 30 +ve and 10 -ve examples)
R3: C → + (covers 100 +ve and 90 -ve examples)
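Both measures can be computed directly from the counts given. The sketch below assumes the usual textbook definition of FOIL's information gain, p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))), where p0 and n0 are the positive and negative counts before the rule is applied (here the whole training set) and p1 and n1 are the counts the rule covers. The function and variable names are hypothetical.

```python
from math import log2


def rule_accuracy(p, n):
    """Fraction of covered examples that are positive."""
    return p / (p + n)


def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain of a rule covering p1 positive and n1 negative
    examples, relative to a baseline covering p0 positive and n0 negative."""
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))


P0, N0 = 100, 400  # training set: 100 +ve, 400 -ve (baseline: empty rule)
rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}

acc = {name: rule_accuracy(p, n) for name, (p, n) in rules.items()}
gain = {name: foil_gain(P0, N0, p, n) for name, (p, n) in rules.items()}

best_by_acc, worst_by_acc = max(acc, key=acc.get), min(acc, key=acc.get)
best_by_gain, worst_by_gain = max(gain, key=gain.get), min(gain, key=gain.get)
```

Under these definitions, accuracy ranks R1 best (0.80) and R3 worst (about 0.53), while FOIL's gain ranks R3 best (about 139.6) and R1 worst (8.0), because the gain also rewards the number of positive examples a rule covers.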
OR
6 What are the characteristics of rule-based classifiers? Give the difference between the
rule-based ordering and class-based ordering schemes. 4+4
7 What is cluster analysis? Write the K-means clustering algorithm. List and explain different
types of clustering. 2+3+3
● A clustering is a set of clusters.
● An important distinction is between hierarchical and partitional sets of clusters.
● Partitional clustering
– A division of data objects into non-overlapping subsets (clusters) such that each data
object is in exactly one subset
● Hierarchical clustering
– A set of nested clusters organized as a hierarchical tree
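K-means, which the question asks for, produces a partitional clustering. A minimal sketch of the basic iteration follows: assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until the centroids stop changing. It assumes Euclidean distance and caller-supplied initial centroids; the function name and the sample points are hypothetical.

```python
def kmeans(points, centroids, max_iter=100):
    """Basic K-means on tuples of floats.

    points:    list of equal-length tuples
    centroids: list of K initial centroids (chosen by the caller)
    Returns (centroids, assignment), where assignment[i] is the cluster
    index of points[i].
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    assignment = []
    for _ in range(max_iter):
        # Step 1: assign every point to its closest centroid.
        assignment = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                      for p in points]
        # Step 2: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        # Step 3: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment


# Hypothetical data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignment = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Every point ends up in exactly one cluster, which is what makes the result partitional. In practice the initial centroids are usually picked at random, and different choices can lead to different clusterings.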
OR
8 Explain density based methods for clustering with an example of DBSCAN. 8
● Density-based
– A cluster is a dense region of points, which is separated by low-density regions,
from other regions of high density.
– Used when the clusters are irregular or intertwined, and when noise and outliers
are present.
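A compact sketch of DBSCAN, as an illustration of the density-based idea. It uses the algorithm's two usual parameters, a radius `eps` and a minimum neighbour count `min_pts`: a point with at least `min_pts` points within `eps` (itself included) is a core point, clusters grow outward from core points, and points reachable from no core point are labelled noise. The function name, the parameter values, and the sample points are hypothetical.

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1           # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (5, 15)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

On this data the two dense groups become clusters 0 and 1, and the isolated point (5, 15) is labelled -1, which shows how noise and outliers are set aside rather than forced into a cluster.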
9 Explain different methods for computing distances between clusters. 8
● ‘Closeness’ is measured by Euclidean distance, cosine similarity, correlation, etc.
OR
10 List and explain different types of evaluation measures that are used to judge various aspects of
cluster validity. Give equations for cohesion and separation. 4+4
● Evaluation measures applied to judge various aspects of cluster validity are classified into
the following three types.
– Unsupervised (Internal Index): Used to measure the goodness of a clustering
structure without respect to external information.
◆ Sum of Squared Error (SSE)
◆ Cluster cohesion
◆ Cluster separation
– Supervised (External Index): Used to measure the extent to which cluster labels
match externally supplied class labels.
◆ Entropy
– Relative Index: Used to compare two different clusterings or clusters.
◆ Often an external or internal index is used for this function, e.g., SSE or
entropy
● A proximity-graph-based approach can also be used for cohesion and separation.
– Cluster cohesion is the sum of the weights of all links within a cluster.
– Cluster separation is the sum of the weights of the links between nodes in the cluster and
nodes outside the cluster.
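The graph-based definitions above can be written as equations. Here proximity(x, y) denotes the weight of the link between objects x and y, and C_i is a cluster. The prototype-based pair after them is a commonly used alternative built on SSE, where m_i is the centroid of C_i and m is the overall mean. The symbol names are assumptions that follow common textbook usage, since only the verbal definitions are given here.

```latex
% Graph-based view
\mathrm{cohesion}(C_i) = \sum_{x \in C_i} \sum_{y \in C_i} \mathrm{proximity}(x, y)

\mathrm{separation}(C_i) = \sum_{x \in C_i} \sum_{y \notin C_i} \mathrm{proximity}(x, y)

% Prototype-based view (within-cluster and between-cluster sums of squares)
\mathrm{cohesion:}\quad \mathrm{SSE} = \sum_{i} \sum_{x \in C_i} \lVert x - m_i \rVert^2

\mathrm{separation:}\quad \mathrm{SSB} = \sum_{i} \lvert C_i \rvert \, \lVert m - m_i \rVert^2
```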