Classifiers
Introduction
The data is labeled: each datapoint x is assigned to a class label yi (supervised classification). The datapoints are split into training data and test data. The classifier is modeled on the training data; the resulting model is then tested on the test data, which determines the accuracy of the classifier.
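A minimal sketch of this workflow in Python (the dataset, classifier choice and split ratio are illustrative assumptions, not from the notes):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# labeled data: each datapoint x has a class label y
X, y = load_iris(return_X_y=True)

# split the datapoints into training data and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# model the classifier on the training data
clf = LinearDiscriminantAnalysis().fit(X_train, y_train)

# test the model on the held-out test data to determine its accuracy
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```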
A datapoint x is assigned to class y1 IF, given x, the probability that it belongs to y1 is higher than the probability that it belongs to y2.
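In symbols (a standard formulation of the Bayes decision rule; the notes state it only in words):

$$\text{assign } x \text{ to } y_1 \iff P(y_1 \mid x) > P(y_2 \mid x) \iff p(x \mid y_1)\,P(y_1) > p(x \mid y_2)\,P(y_2)$$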
These class-conditional densities can be approximated, for example with histogram-based density estimation. Using the datapoints, the distributions are estimated from the frequency with which a feature's value falls into each bin of the histogram. However, this requires enough data, so that the bins can be filled accurately. Without enough data some bins stay empty, and classification for those bins is not possible. The more features you include and the smaller the bin size, the more accurate the classifier can be, but also the more sample data you need to approximate the distributions (the curse of dimensionality).
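A minimal sketch of a histogram-based classifier for a single feature and two classes (the synthetic data and the bin count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 500)  # feature values observed for class y1
x2 = rng.normal(2.0, 1.0, 500)  # feature values observed for class y2

bins = np.linspace(-4.0, 6.0, 21)                  # 20 bins = 20 parameters per class
p1, _ = np.histogram(x1, bins=bins, density=True)  # estimated p(x | y1)
p2, _ = np.histogram(x2, bins=bins, density=True)  # estimated p(x | y2)

def classify(x):
    # find the bin that x falls into
    i = int(np.clip(np.digitize(x, bins) - 1, 0, len(p1) - 1))
    # if both densities are 0 here (empty bins), classification is effectively impossible
    return 1 if p1[i] > p2[i] else 2

print(classify(0.5), classify(2.5))  # expected: 1 2
```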
The number of bins is the number of parameters you have for the density estimation. More features (dimensions) mean more parameters.
A Gaussian density estimate, by contrast, needs only 2 parameters per feature (dimension): the mean and the variance.
For multivariate Gaussians, the variance is replaced by a covariance matrix. These parameters can be estimated accordingly; xi here is a feature vector (containing all dimensions/features).
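In formulas, the standard maximum-likelihood estimates for the mean vector and the covariance matrix from n feature vectors $x_i$ are:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})(x_i - \hat{\mu})^{\top}$$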
4. explain the advantages and disadvantages of the Quadratic classifier, the LDA and the nearest mean classifier
Quadratic classifier:
Advantage: the most flexible of the three, since it allows quadratic decision boundaries.
Disadvantage: requires the most parameters, and thus the most sample data: a separate mean and a separate covariance matrix per class.
LDA:
Advantage: requires fewer parameters: separate means per class, but one covariance matrix shared by all classes.
Nearest mean classifier:
Advantage: no covariance matrix has to be estimated (an identity covariance matrix is used instead), so even less data is required.
Disadvantage: assumes all features have the same variance and are uncorrelated.
The three are compared in the sketch below.
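A minimal comparison sketch using scikit-learn's implementations of the three classifiers (the dataset and split are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import (
    QuadraticDiscriminantAnalysis, LinearDiscriminantAnalysis)
from sklearn.neighbors import NearestCentroid

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [
    ("quadratic: mean + covariance per class ", QuadraticDiscriminantAnalysis()),
    ("LDA: means per class, shared covariance", LinearDiscriminantAnalysis()),
    ("nearest mean: class means only         ", NearestCentroid()),
]:
    clf.fit(X_tr, y_tr)
    print(name, clf.score(X_te, y_te))  # test-set accuracy
```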
5. identify when scaling of the features is important and how to cope with feature scaling
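A common way to cope with differently scaled features is z-score standardization, so that each feature gets mean 0 and variance 1. A minimal sketch (the data is illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# two features on very different scales
X_train = np.array([[1.0, 1000.0],
                    [2.0, 2000.0],
                    [3.0, 3000.0]])

# learn the per-feature mean and standard deviation on the training data only,
# then apply the same transformation to any test data later
scaler = StandardScaler().fit(X_train)
print(scaler.transform(X_train))  # each column now has mean 0 and variance 1
```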
Log-likelihood parameter estimation
Naïve Bayes
Decision Trees, Perceptrons and non-linear classifiers
Reject option and Fairness
Clustering