
Soft Classifications Methods

Mixed Pixel Problem


Whether a pixel is pure or mixed depends upon the spatial resolution of the sensor.

[Figure: a grid of pixels labelled Pure and Mixed, illustrating how a single pixel may straddle two or more land covers]

Both supervised and unsupervised classification may be applied to perform hard and soft classification.
Hard classification allocates each pixel of a remote sensing image to a single class.
This carries an inherent assumption that all the pixels in the remote sensing imagery are pure.
However, images are often dominated by mixed pixels that do not represent one particular land cover but contain two or more Land Cover (LC) classes in a single pixel.

The coarser the spatial resolution, the higher the chance of mixed pixels occurring.
Although the chances of two or more classes contributing to a mixed pixel are high with a coarse spatial resolution, the number of such pixels is small. On the other hand, with improved spatial resolution, the number of classes within a pixel is reduced, but the number of mixed pixels increases.
Furthermore, at improved spatial resolution, masking due to shadow also results in a loss of information.

The presence of mixed pixels creates a problem in image classification.
A mixed pixel displays a composite spectral response that may be dissimilar to the spectral response of each of its component LC classes; the pixel may therefore not be allocated to any of its component LC classes.
Conventional image classification techniques may thus lose much of the information present in a pixel. These techniques therefore tend to over- or under-estimate the actual areal extents of the LC classes on the ground, degrading the classification accuracy of an image contaminated by mixed pixels.

Resolution and spectral mixing

CLASSIFICATION AND TARGET IDENTIFICATION
Spectral analysis methods usually compare pixel spectra with a
reference spectrum (often called a target). Target spectra can be
derived from a variety of sources, including spectral libraries,
regions of interest within a spectral image, or individual pixels
within a spectral image.

Whole Pixel Methods


Whole pixel analysis methods attempt to determine whether one
or more target materials are abundant within each pixel in a
multispectral or hyperspectral image on the basis of the spectral
similarity between the pixel and target spectra.

Whole pixel tools include standard supervised classifiers such as Minimum Distance or Maximum Likelihood, as well as tools developed specifically for hyperspectral imagery, such as:
Spectral Angle Mapper
Spectral Feature Fitting

1. Spectral Angle Mapper (SAM)

Consider a scatter plot of pixel values from two bands of a spectral image. In such a plot, pixel spectra and target spectra plot as points (vectors from the origin).
The Spectral Angle Mapper (Yuhas et al., 1992) computes a spectral angle between each pixel spectrum and each target spectrum; the smaller the angle, the more similar the spectra.
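As a sketch of the computation (not from the original slides), the spectral angle between two spectra treated as n-band vectors can be computed as:

```python
import numpy as np

def spectral_angle(pixel, target):
    """Spectral angle (in radians) between a pixel spectrum and a target
    spectrum, each treated as an n-band vector from the origin.

    Smaller angles mean greater similarity; because only the direction of
    the vectors matters, the measure is insensitive to overall brightness.
    """
    pixel = np.asarray(pixel, dtype=float)
    target = np.asarray(target, dtype=float)
    cos_angle = pixel @ target / (np.linalg.norm(pixel) * np.linalg.norm(target))
    # Clip guards against floating-point values marginally outside [-1, 1]
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))
```

A pixel that is a scaled version of the target (for instance, the same material under weaker illumination) yields an angle of zero.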

2. Spectral Feature Fitting


In Spectral Feature Fitting the user specifies a range of wavelengths within which a unique absorption feature exists for the chosen target. The pixel spectra are then compared to the target spectrum using two measurements:
1. The depth of the feature in the pixel is compared to the depth of the feature in the target, and
2. The shape of the feature in the pixel is compared to the shape of the feature in the target (using a least-squares technique).
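A minimal sketch of the two measurements, assuming the spectra have already been continuum-removed over the chosen wavelength range (the function name and inputs are illustrative, not from the slides):

```python
import numpy as np

def feature_fit(pixel_cr, target_cr):
    """Least-squares fit of a continuum-removed target feature to a pixel.

    Both inputs are continuum-removed spectra over the chosen wavelength
    range (values near 1 outside the absorption feature, dipping below 1
    inside it).  Returns the fitted scale (relative feature depth) and the
    RMS error (goodness of the shape match).
    """
    # Express both features as depths below the continuum (1 - value)
    pixel_depth = 1.0 - np.asarray(pixel_cr, dtype=float)
    target_depth = 1.0 - np.asarray(target_cr, dtype=float)
    # Scale that best matches the target depth to the pixel depth
    scale = pixel_depth @ target_depth / (target_depth @ target_depth)
    rms = np.sqrt(np.mean((pixel_depth - scale * target_depth) ** 2))
    return scale, rms
```

A large scale with a small RMS error indicates a deep, well-matched absorption feature.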

3. Complete Linear Spectral Unmixing


It is also known as spectral mixture modeling or spectral mixture analysis.
The set of spectrally unique surface materials existing within a scene are often referred to as the spectral endmembers.
The reflectance spectrum of any pixel is assumed to be a linear combination of the spectra of all endmembers inside that pixel, weighted by their fractional abundances.
Unmixing simply solves a set of n linear equations for each pixel, where n is the number of bands in the image.
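The per-pixel solve can be sketched with an ordinary least-squares fit. This is a simplification: practical unmixing usually adds sum-to-one and non-negativity constraints on the fractions.

```python
import numpy as np

def unmix(pixel, endmembers):
    """Solve the linear mixing model pixel ~ E @ f for the abundances f.

    endmembers: (n_bands, n_endmembers) matrix whose columns are the
    endmember spectra; pixel: length-n_bands spectrum.  A plain
    least-squares solve; constrained solvers are used in practice.
    """
    E = np.asarray(endmembers, dtype=float)
    f, *_ = np.linalg.lstsq(E, np.asarray(pixel, dtype=float), rcond=None)
    return f
```

Repeating this solve for every pixel yields one fraction image per endmember.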

Matched Filtering
Often called partial unmixing, since there is no need to find the spectra of all endmembers in the scene to get an accurate analysis.
It was originally developed to compute abundances of targets that are relatively rare in the scene.
Matched Filtering filters the input image for good matches to the chosen target spectrum by maximizing the response of the target spectrum within the data and suppressing the response of everything else.
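A compact sketch of such a filter, using the image mean and covariance as the background model (variable names are illustrative, not from the slides):

```python
import numpy as np

def matched_filter(image_2d, target):
    """Matched-filter scores for an (n_pixels, n_bands) image array.

    The filter weights maximise the response of the target spectrum while
    suppressing the background; scores are normalised so the background
    mean maps to 0 and a pure target pixel maps to roughly 1.
    """
    X = np.asarray(image_2d, dtype=float)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    cov_inv = np.linalg.pinv(cov)          # pseudo-inverse for stability
    d = np.asarray(target, dtype=float) - mu
    w = cov_inv @ d / (d @ cov_inv @ d)    # filter maximising target response
    return (X - mu) @ w
```

Pixels scoring near 1 are good candidates for the (possibly rare) target material.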

Soft Classification
Each pixel may represent multiple and partial class memberships.
It is an alternative to hard classification because of its ability to deal with mixed pixels.
A membership function allocates to each pixel a real value between 0 and 1, i.e. a membership grade.
Sub-pixel scale information is typically represented in the output of a soft classification by the strength of membership a pixel displays to each class.
This membership is used to reflect the relative proportion of the classes in the area represented by the pixel.

Soft classifiers
The most common soft classifiers are:
Maximum likelihood classification
Fuzzy set theory based approaches:
 Fuzzy c-means
 Possibilistic c-means
 Noise Clustering
Artificial neural networks
Decision Trees

These techniques can be applied to resolve a pixel into its various LC class components, thus generating soft class outputs.
In soft classification the output is not a single classified image. Instead, a number of images are obtained as the classified output; each pixel in each image (generally referred to as a fraction image) depicts the proportion of an individual LC class.
However, these proportions do not actually represent the spatial distribution of the LC classes on the ground.

Maximum Likelihood Classifier (MLC):


MLC is one of the most widely used hard classifiers.
In a standard MLC each pixel is allocated to the class with which it has the highest posterior probability of class membership.
MLC has been adapted for the derivation of sub-pixel information. This is possible because a by-product of a conventional MLC is the set of posterior probabilities of each class for each pixel.

The posterior probability of each class provides a relative measure of class membership, and can therefore be used as an indicator of sub-pixel proportions.
Many authors use the term Fuzzy MLC to discriminate this softened output from the (hard) MLC.
Conceptually, there is no direct link between the proportional coverage of a class and its posterior probability. In fact, posterior probabilities are an indicator of the uncertainty in making a particular class allocation. However, many authors have found in practice that useful sub-pixel information can be derived from this approach.

X belongs to LC class m if and only if

p_m > p_i \quad \text{for all } i = 1, \ldots, c, \; i \neq m

Where
X is the vector of DN values of the unclassified pixel
p_i is the likelihood of the ith LC class (i = 1 to c), given by

p_i = -\frac{1}{2} \ln |N_i| - \frac{1}{2} (X - \mu_i)^T N_i^{-1} (X - \mu_i)

N_i is the variance-covariance matrix of LC class i

\mu_i is the vector of mean DN values of the training data of LC class i

The a posteriori probability p_a of a pixel belonging to LC class m can then be given by:

p_a(m) = \frac{p_m}{\sum_{j=1}^{c} p_j}

These a posteriori probabilities represent the soft classification output. For example, suppose the a posteriori probabilities of class membership for a pixel containing three LC classes, soil, water and vegetation, are obtained as 0.75, 0.03 and 0.22 respectively. The MLC in its hard form will assign the pixel to soil, its probability of occurrence being maximum in that pixel. A softened output, on the other hand, will report the probabilities of each of the LC classes considered in the pixel.
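The softened output can be sketched numerically: the function below normalises per-class Gaussian likelihoods into a posteriori probabilities, assuming equal prior probabilities (an assumption not stated in the slides):

```python
import numpy as np

def mlc_posteriors(x, means, covs):
    """Per-class posterior probabilities for one pixel (equal priors assumed).

    means: list of per-class mean DN vectors; covs: list of per-class
    variance-covariance matrices.  The Gaussian likelihoods are normalised
    to sum to one, giving the soft (fuzzy MLC) output for this pixel.
    """
    x = np.asarray(x, dtype=float)
    likelihoods = []
    for mu, cov in zip(means, covs):
        cov = np.asarray(cov, dtype=float)
        diff = x - np.asarray(mu, dtype=float)
        expo = -0.5 * diff @ np.linalg.inv(cov) @ diff
        norm = 1.0 / np.sqrt((2 * np.pi) ** len(x) * np.linalg.det(cov))
        likelihoods.append(norm * np.exp(expo))
    p = np.array(likelihoods)
    return p / p.sum()          # a posteriori probabilities summing to 1
```

The argmax of the returned vector reproduces the hard MLC allocation; the full vector is the soft output.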

Fuzzy c-Means (FCM):


It is an iterative clustering method that may be employed to partition the pixels of a satellite image into different class membership values.
Each pixel in the satellite image is related to every information class by a function known as a membership function. The value of the membership function, known simply as the membership, varies between 0 and 1.
A membership value close to 1 implies that the pixel is highly representative of that particular information class, while a membership value close to 0 implies that the pixel has little or no similarity with the information class.
The net effect of such a function is to produce a fuzzy c-partition of a given data set (a satellite image in the case of remote sensing).

The objective function for FCM can be given by

J_{fcm}(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ki})^m D(x_k, v_i)

where

D(x_k, v_i) = d_{ki}^2 = \| x_k - v_i \|_A^2 = (x_k - v_i)^T A (x_k - v_i)

subject to the constraints

\sum_{i=1}^{c} \mu_{ki} = 1 \text{ for all } k; \quad \sum_{k=1}^{N} \mu_{ki} > 0 \text{ for all } i; \quad 0 \le \mu_{ki} \le 1 \text{ for all } k, i

Where
U = [\mu_{ki}]_{N \times c} is the matrix of class membership values
V = (v_1, \ldots, v_c) is the collection of vectors with the information class centers v_i
\mu_{ki} is the class membership value of pixel k in class i
d_{ki} is the distance in feature space between x_k and v_i
x_k is the vector (feature vector) denoting the spectral response of pixel k
v_i is the vector (prototype vector) denoting the information class center of class i
c and N are the total number of information classes and pixels respectively
A is a weight matrix
m is a weighting exponent (or fuzzifier), 1 < m < ∞

From the objective function of the FCM the membership value can be calculated as:

\mu_{ki} = \frac{1}{\sum_{j=1}^{c} \left( \frac{D(x_k, v_i)}{D(x_k, v_j)} \right)^{\frac{1}{m-1}}}, \quad \text{where } D(x_k, v_j) \neq 0

\mu_{ki} is the realized value of the class membership of pixel k in class i


The center of an information class can be computed as:

v_i = \frac{\sum_{k=1}^{N} (\mu_{ki})^m x_k}{\sum_{k=1}^{N} (\mu_{ki})^m}
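The alternating updates above (memberships from distances, then centres from memberships) can be sketched as follows, with the weight matrix A taken as the identity, i.e. plain squared Euclidean distance:

```python
import numpy as np

def fcm(X, c, m=2.0, iters=50, seed=0):
    """Minimal fuzzy c-means sketch (Euclidean distance, i.e. A = I).

    X: (N, n_bands) array of pixel vectors.  Returns (U, V): U of shape
    (N, c) holds memberships summing to one per pixel; V holds the class
    centres.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                    # sum-to-one constraint
    for _ in range(iters):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]         # centre update
        D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # squared distances
        D = np.maximum(D, 1e-12)                         # avoid division by zero
        U = D ** (-1.0 / (m - 1))                        # membership update
        U /= U.sum(axis=1, keepdims=True)
    return U, V
```

Each column of U, reshaped to the image grid, is one fraction image of the soft classification.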

Possibilistic c-Means (PCM):


The objective function for PCM can be given by:

J_{pcm}(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ki})^m D(x_k, v_i) + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{N} (1 - \mu_{ki})^m

subject to the constraints

\max_i \mu_{ki} > 0 \text{ for all } k; \quad \sum_{k=1}^{N} \mu_{ki} > 0 \text{ for all } i; \quad 0 \le \mu_{ki} \le 1 \text{ for all } k, i

The specificity of the new second term is that it emphasizes (assigns a high membership value to) representative feature points and de-emphasizes (assigns a low membership value to) unrepresentative feature points present in the data.

From the objective function of the PCM the membership value can be calculated as:

\mu_{ki} = \frac{1}{1 + \left( \frac{D(x_k, v_i)}{\eta_i} \right)^{\frac{1}{m-1}}}

where

\eta_i = K \frac{\sum_{k=1}^{N} (\mu_{ki})^m D(x_k, v_i)}{\sum_{k=1}^{N} (\mu_{ki})^m}

\eta_i is known as the bandwidth parameter (K is typically set to 1)
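Given the squared distances and the bandwidth parameters, the possibilistic membership update can be sketched as follows (an illustrative helper, not from the slides):

```python
import numpy as np

def pcm_membership(D, eta, m=2.0):
    """Possibilistic memberships from squared distances.

    D: (N, c) squared distances to the class centres; eta: length-c
    bandwidth parameters (one per class).  Unlike FCM, rows need not sum
    to one: each membership depends only on its own class, so an outlier
    can be given low membership in every class.
    """
    D = np.asarray(D, dtype=float)
    eta = np.asarray(eta, dtype=float)
    return 1.0 / (1.0 + (D / eta) ** (1.0 / (m - 1)))
```

Note that a point at distance \eta_i from centre i gets membership exactly 0.5, which is how the bandwidth parameter sets the "size" of each class.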

Noise Clustering (NC):


In FCM, noisy points (i.e. outliers) are forced into the information classes, since their memberships must still sum to one.
In NC, noise points (outliers) can be segregated from the core information classes (clusters), so that they do not degrade the quality of the clustering analysis.
The main concept of the NC algorithm is the introduction of a single noise information class (c+1) that will contain all noise data points.

The objective function for NC can be given by:

J_{nc}(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ki})^m D(x_k, v_i) + \sum_{k=1}^{N} \delta^2 (\mu_{k,c+1})^m

The membership values are then:

\mu_{ki} = \frac{1}{\sum_{j=1}^{c} \left( \frac{D(x_k, v_i)}{D(x_k, v_j)} \right)^{\frac{1}{m-1}} + \left( \frac{D(x_k, v_i)}{\delta^2} \right)^{\frac{1}{m-1}}}, \quad 1 \le i \le c

\mu_{k,c+1} = \frac{1}{\sum_{j=1}^{c} \left( \frac{\delta^2}{D(x_k, v_j)} \right)^{\frac{1}{m-1}} + 1}

The performance of the NC classifier is dependent on the resolution parameter (\delta).
An optimized value of the resolution parameter is required.
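The NC memberships can be sketched by treating the noise class as a (c+1)-th class lying at a constant squared distance δ² from every pixel (helper names are illustrative):

```python
import numpy as np

def nc_membership(D, delta, m=2.0):
    """Noise-clustering memberships for c classes plus a noise class.

    D: (N, c) squared distances to the c class centres; delta is the
    resolution (noise distance) parameter.  Returns an (N, c + 1) matrix
    whose last column is the noise-class membership; rows sum to one.
    """
    D = np.asarray(D, dtype=float)
    # Append the constant noise distance delta**2 as the (c+1)-th class
    D_full = np.hstack([D, np.full((D.shape[0], 1), delta ** 2)])
    inv = np.maximum(D_full, 1e-12) ** (-1.0 / (m - 1))
    return inv / inv.sum(axis=1, keepdims=True)
```

A pixel far from every class centre (relative to δ) ends up mostly in the noise class instead of contaminating the information classes.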

Artificial Neural Network (ANN):


An ANN is a form of artificial intelligence that imitates some functions of the human brain.
An ANN consists of a series of layers, each containing a set of processing units (i.e. neurones).
All neurones on a given layer are linked by weighted connections to all neurones on the previous and subsequent layers.
During the training phase, the ANN learns about the regularities present in the training data and, based on these regularities, constructs rules that can be extended to unknown data.

Advantages of ANN
It is a non-parametric classifier, i.e. it does not require any assumption about the
statistical distribution of the data.
High computation rate, achieved by their massive parallelism, resulting from a dense
arrangement of interconnections (weights) and simple processors (neurones), which
permits real-time processing of very large datasets.

Disadvantages of ANN
ANNs are semantically poor; it is difficult to gain any understanding of how a result was achieved.
The training of an ANN can be computationally demanding and slow.
ANNs are perceived to be difficult to apply successfully: it is difficult to select the type of network architecture, the initial values of parameters such as the learning rate and momentum, the number of iterations required to train the network, and the initial weights.

Decision Trees (DT):


DT can be used as both a hard and a soft classifier.
Advantages:
Ability to handle non-parametric training data, i.e. DT are not based on any assumption about the training data distribution.
DT can reveal nonlinear and hierarchical relationships between input variables and use these to predict class membership.
DT yield a set of rules which are easy to interpret and suitable for deriving a physical understanding of the classification process.
Good computational efficiency.
DT, unlike ANN, do not need an extensive design and training.
Disadvantage:
The use of hyperplane decision boundaries parallel to the feature axes may restrict their use to problems in which classes are clearly distinguishable along individual feature axes.

Super-resolution Mapping :
Although soft classification is informative and meaningful, it fails to account for the actual spatial distribution of class proportions within the pixel.
Super-resolution mapping (or sub-pixel mapping) is a step forward.
Super-resolution mapping considers the spatial distribution within and between pixels in order to produce maps at a sub-pixel scale.

Several approaches to super-resolution mapping have been developed:
Markov random fields
Hopfield neural networks
Linear optimization
Pixel-swapping solution (based on geostatistics)