
Week 2 – Parametric Density-based Classifiers, Gaussian Density-based Classifiers

Introduction

Learning goals 2.1 and 2.2


1. Explain how you obtain a classifier using a Gaussian (multivariate) distribution for each class
Each measurement/datapoint is a vector in feature space. The length of the datapoint vector (its dimensionality) equals the number of features.

The data is labeled: each datapoint x is assigned a class label yi (supervised classification). The
datapoints are split into training data and test data. The classifier is fitted on the training data and then
evaluated on the test data: with this information the accuracy of the classifier can be determined.

Models can be parametric: the density is described by a fixed set of parameters (weights, means, variances)


OR
Models can be non-parametric: no fixed parametric form is assumed; the model is determined directly by the data

Models can be discriminative: the posterior p(y|x) is modelled directly and used to classify objects


OR
Models can be generative: the prior and the class-conditional probability densities are estimated, and the posterior
probability is computed from them

The method of a parametric density-based classifier is:

IF: Given the datapoint x, the probability it belongs to y1 is higher than the probability it belongs to y2.

THEN: datapoint x belongs to class y1

How to determine these (posterior) probabilities: with Bayes' rule,

p(y|x) = p(x|y) · p(y) / p(x)

Therefore, we need p(x|y): the likelihood (class-conditional density), p(y): the prior probability, and p(x): the unconditional data distribution, which acts only as a normalisation factor.

Densities can be approximated. This can be done using histogram-based density estimation.

Using the datapoints, the distributions are estimated from the frequency with which a feature value falls
into each bin of the histogram. However, this requires enough data, so the bins can be filled accurately.
Without enough data, some bins remain empty and classification for those bins is not possible. The more
features you include and the smaller the bin size, the more accurate the classifier can be, but also the
more sample data you need to approximate the distributions (curse of dimensionality).
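
A minimal sketch of histogram-based density estimation for two classes and a single feature (Python/NumPy; the synthetic data, bin edges and variable names are illustrative assumptions, not taken from the course material):

import numpy as np

# Illustrative 1-D training data for two classes
x_class0 = np.random.normal(loc=0.0, scale=1.0, size=200)
x_class1 = np.random.normal(loc=2.0, scale=1.0, size=200)

# Shared bin edges over the whole feature range (20 bins)
bins = np.linspace(-4, 6, 21)

# Histogram counts per class, normalised to densities
p_x_given_0, _ = np.histogram(x_class0, bins=bins, density=True)
p_x_given_1, _ = np.histogram(x_class1, bins=bins, density=True)

# Class priors estimated from the class frequencies
prior_0 = len(x_class0) / (len(x_class0) + len(x_class1))
prior_1 = 1.0 - prior_0

def classify(x):
    # Assign x to the class with the highest (unnormalised) posterior;
    # an empty bin gives a density of 0, so that class cannot be chosen.
    i = np.clip(np.digitize(x, bins) - 1, 0, len(bins) - 2)
    return 0 if p_x_given_0[i] * prior_0 >= p_x_given_1[i] * prior_1 else 1

print(classify(0.5), classify(2.5))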

The number of bins is the number of parameters you have for the density estimation. More features
(dimensions) means more parameters: with b bins per feature and d features, the histogram needs b^d bins in total.

This quickly becomes too much. Therefore, assumptions about the data can be made:


We can assume the data is distributed according to a normal distribution = Gaussian distribution:

This only needs 2 parameters per feature (dimension): the mean and the variance.
For multivariate Gaussians, the variance is replaced by a covariance matrix. These can be estimated from
the training data as follows, where xi is a feature vector (containing all dimensions/features) and N is the
number of training points:

μ = (1/N) Σi xi
Σ = (1/N) Σi (xi − μ)(xi − μ)^T

For the estimation to be possible, you do still need enough data.
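
A minimal sketch of estimating these parameters for one class with NumPy (the synthetic data is an illustrative assumption; bias=True makes np.cov divide by N, matching the 1/N estimate above):

import numpy as np

# Illustrative 2-D training data for one class: rows = datapoints, columns = features
X_class = np.random.multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 2]], size=300)

mu_hat = X_class.mean(axis=0)                          # mean vector, one entry per feature
Sigma_hat = np.cov(X_class, rowvar=False, bias=True)   # covariance matrix (features x features)

print("estimated mean:", mu_hat)
print("estimated covariance:\n", Sigma_hat)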


2. implement a simple univariate classifier in Python
1. Look at the code; a minimal sketch is given below.
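
A sketch of what such a classifier could look like, assuming two classes and one feature: one Gaussian is fitted per class and a point is assigned to the class with the highest posterior (the class name, data and helper methods are illustrative, not the official course code):

import numpy as np

class UnivariateGaussianClassifier:
    # Fit one 1-D Gaussian per class; classify by the highest posterior.

    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.means_ = {c: x[y == c].mean() for c in self.classes_}
        self.vars_ = {c: x[y == c].var() for c in self.classes_}
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        return self

    def _likelihood(self, x, c):
        # Gaussian density with the estimated mean and variance of class c
        return (np.exp(-(x - self.means_[c]) ** 2 / (2 * self.vars_[c]))
                / np.sqrt(2 * np.pi * self.vars_[c]))

    def predict(self, x):
        # Posterior is proportional to likelihood * prior; p(x) cancels out
        posteriors = np.array([self._likelihood(x, c) * self.priors_[c]
                               for c in self.classes_])
        return self.classes_[np.argmax(posteriors, axis=0)]

# Illustrative usage with synthetic data
x = np.concatenate([np.random.normal(0, 1, 100), np.random.normal(3, 1, 100)])
y = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])
clf = UnivariateGaussianClassifier().fit(x, y)
print(clf.predict(np.array([-0.5, 1.5, 4.0])))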

3. explain what the 'curse of dimensionality' is


Using more features should give more information about the outcome to predict. However, the more
features (dimensions) you use in the datapoints, the more parameters you need to estimate. More
estimates require more datapoints, and the number of datapoints needed can quickly grow to impractical
amounts. This is the curse of dimensionality.

4. explain the advantages and disadvantages are of the Quadratic classifier, the LDA and the
nearest mean classifier
Quadratic classifier:

Advantages: is accurate; the quadratic decision boundary can curve around the data

Disadvantage: Requires more parameters, thus more sample data: separate mean and separate
covariance matrices per class

Linear Discriminant analysis:

Advantages: requires fewer parameters: separate means, but equal covariance matrix per class

Disadvantage: it is linear, so might have a high error with curved data

Nearest mean classifier:

Advantages: no covariance matrix needs to be estimated; an identity covariance matrix is used instead, so
less data is required.

Disadvantage: assumes all features have the same variance and are uncorrelated
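
A minimal sketch comparing the three classifiers with scikit-learn, where QuadraticDiscriminantAnalysis, LinearDiscriminantAnalysis and NearestCentroid play the roles of the quadratic, LDA and nearest mean classifiers (the synthetic dataset is only illustrative):

from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid

# Illustrative 2-class, 2-feature dataset, split into training and test data
X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "Quadratic (separate covariance per class)": QuadraticDiscriminantAnalysis(),
    "LDA (shared covariance)": LinearDiscriminantAnalysis(),
    "Nearest mean (identity covariance)": NearestCentroid(),
}

# Fit on the training data, report accuracy on the test data
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, "test accuracy:", round(clf.score(X_test, y_test), 2))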
5. identify when scaling of the features is important and how to cope with feature scaling
Log-likelihood parameter estimation
Naïve Bayes
Decision Trees, Perceptrons and non-linear classifiers
Reject option and Fairness
Clustering
