
2 VHR data processing

Earth Observation (EO) technologies have facilitated the understanding of historical land development patterns and archaeological prospection.
They have proved to be a useful support for the knowledge and management of environmental resources and for the identification of elements useful for the study of the territory. More recently they have been adopted in the archaeological field, where they have enabled numerous discoveries and have shown several uses, from discovery to monitoring, from documentation to preservation, and for the detection of cultural features.
Today the availability of images with very high spatial, spectral, radiometric and temporal resolution, together with user-friendly software and routines for data processing and analysis, has strongly increased the number of users interested in remote sensing technologies. Archaeologists are among them, thanks to the benefits that remote sensing applications offer to their investigations, such as: i) reduction of the costs, time and risks associated with archaeological excavations; ii) creation of site strategies addressed to conservation and preservation.
Nevertheless, this increasing interest in remote sensing has not been accompanied by a new perspective on data processing, analysis and interpretation.
Presently, no effective automatic procedures are available for archaeological purposes, but semiautomatic and/or enhancement techniques seem to work quite well, even though they can be "site specific" or "feature specific".
Specific methodologies, developed ad hoc for archaeology, are needed in order to optimize the extraction and understanding of the information content of the numerous active and passive satellite data sets. Radiometric and geometric distortions, noise reduction and data integration have often not been addressed at all, mainly because the adopted approach has been very close to photo-interpretation, since, historically, aerial photography was the first remote sensing technology extensively used in archaeology.
To this aim, this section gives an overview of the main procedures commonly used in the processing of VHR satellite data, useful for extracting as much information as possible from the images and for aiding the interpretation of remote sensing satellite data, also for archaeological purposes.
The choice of specific techniques or algorithms to use depends on the goals of each
individual project and on the data-set available.
This overview starts with a discussion of pre-processing techniques, which include radiometric correction, to compensate for uneven sensor response over the whole image, and geometric correction, to compensate for geometric distortion due to the Earth's rotation and other imaging conditions. The image may also be transformed to conform to a specific map projection system. Furthermore, if the accurate geographical location of an area on the image needs to be known, ground control points (GCPs) are used to register the image to a precise map (geo-referencing). Alternatively, and particularly when georeferencing is not important, an image-to-image registration can be performed.
In order to aid visual interpretation, the visual appearance of the objects in the image can be improved by image enhancement, which includes radiometric and geometric enhancement. In this direction, Vegetation Indices, Principal Component Analysis, the Tasseled Cap Transformation, edge detection algorithms and spatial statistics are discussed.
Classification algorithms are then described as a tool to discriminate different land-cover types in an image using spectral features.
Finally, in order to obtain more information than can be derived from each single sensor, data fusion techniques are introduced. In particular, emphasis is given to the fusion of two images, one acquired by a multi-spectral sensor with lower spatial resolution and the other by a panchromatic sensor with higher spatial resolution.

2.1 Pre-processing techniques


Some satellite images require preprocessing before the needed information can be properly extracted from the data.
The most common pre-processing techniques are: (i) radiometric correction (also called atmospheric correction) and (ii) geometric correction.
Radiometric correction aims to reduce unwanted effects on the measured brightness values of the pixels. The mechanisms that create these effects can lead to two broad types of radiometric distortion: 1) the relative distribution of brightness over an image in a given band can be different from that in the ground scene; 2) the relative brightness of a single pixel from band to band can be distorted compared with the spectral reflectance character of the corresponding region on the ground. Both types can result from the presence of the atmosphere as a transmission medium through which radiation must travel from its source to the sensor, and can also be a result of instrumentation effects (Richards, 2006).
Geometric correction aims to reduce image geometry errors, which can arise in many ways:
i. the rotation of the Earth during image acquisition;
ii. the finite scan rate of some sensors;
iii. the wide field of view of some sensors;
iv. the curvature of the Earth;
v. sensor non-idealities;
vi. variations in platform altitude, attitude and velocity;
vii. panoramic effects related to the imaging geometry.
There are two techniques to correct the various types of geometric distortion in digital images: (i) to model the nature and magnitude of the sources of distortion and use these models to establish correction formulae (this approach is effective when the types of distortion are well known); (ii) to establish a mathematical relationship between the addresses of pixels in an image and the corresponding coordinates of those points on the ground.
Geometric correction must be applied to each band of the image. If the bands are well registered to each other, the steps taken to correct one band can be applied to all the remaining bands.
The models used for these corrections are not described in this chapter, because the satellite images used in this thesis are already radiometrically and geometrically corrected, but several references are available that deal in depth with these corrections.
2.2 Image registration
An image can be registered to a map coordinate system using correction techniques, so that its pixels are addressable in terms of map coordinates (easting and northing, or latitude and longitude) rather than pixel and line numbers.
Alternatively, and particularly when georeferencing is not important, an image-to-image registration can be performed. In this case one image is chosen as a master to which the other is to be registered.
This registration is very useful for comparing images of the same area acquired in different periods, because registered images facilitate a pixel-by-pixel comparison.

2.3 Processing techniques


When the image is ready to be processed, the image processing techniques that are essential for successful interpretation of remotely sensed data can be applied.
These processing techniques can be divided into image enhancement and image classification.
The purpose of image enhancement is to improve the visual impact of the image and to help pattern recognition (Richards, 2006). It includes radiometric enhancement and geometric enhancement.
Classification is the procedure most often used in image processing for grouping all the pixels in an image into a finite number of individual classes or categories to produce a thematic representation.

2.3.1 Radiometric enhancement


Radiometric enhancement improves the contrast of certain pixels at the expense of other pixels. This is achieved by altering the intensity value histogram of an image.
Contrast modification is a mapping of brightness values, in which the brightness value of a particular histogram bar is respecified more favourably; such display-oriented enhancements do not alter the original digital numbers of the image. Radiometric enhancement is a valuable tool for archaeological analysis.
Multiband enhancements are radiometric enhancement techniques which use several bands of a multispectral digital sensor, for example through mathematical operations performed on the band data. A commonly used example is the Normalized Difference Vegetation Index (NDVI).
Change detection is another form of band mathematics, which determines changes between two datasets (images): the values of a later image are subtracted from those of an earlier image. This may be useful in archaeology for multitemporal analysis.
There are also image enhancements based on statistical operations. The most common are Principal Component Analysis and the Tasseled Cap Transformation.

2.3.1.1 Vegetation Index


Since the 1960s, scientists have extracted and modelled various vegetation biophysical
variables using remotely sensed data. Much of the effort has gone into the development of
vegetation indices defined as dimensionless, radiometric measures that function as
indicators of relative abundance and activity of green vegetation, often including leaf-area-
index (LAI), percentage green cover, chlorophyll content, green biomass, and absorbed
photosynthetically active radiation (APAR).
Vegetation indices (VIs) are based on digital brightness values; they attempt to measure biomass or vegetative vigour. A VI is formed from combinations of several spectral values that are added, divided, or multiplied in a manner designed to yield a single value that
indicates the amount or vigour of vegetation within a pixel. High values of the VI identify
pixels covered by substantial proportions of healthy vegetation. The simplest form of VI is a
ratio between two digital values from separate spectral bands.
Some band ratios have been defined by applying knowledge of the spectral behaviour of
living vegetation (Campbell, 2002).
Band ratios are quotients between measurements of reflectance in separate portions of the
spectrum. Ratios are effective in enhancing or revealing latent information where there is an
inverse relationship between two spectral responses to the same biophysical phenomenon. If
two features have the same spectral behaviour, ratios provide little additional information;
but if they have quite different spectral responses, the ratio between the two values provides
a single value that concisely expresses the contrast between the two reflectances (Campbell, 2002).
For living vegetation, the ratioing strategy can be especially effective because of the inverse relationship between vegetation brightness in the red and infrared regions. That is, absorption of red light (R) by chlorophyll and strong reflection of infrared (IR) radiation by mesophyll tissue ensure that the red and near-infrared values will be quite different, and that the ratio (IR/R) of actively growing plants will be high. Non-vegetated surfaces, including open water, man-made features, bare soil, and dead or stressed vegetation, will not display this specific spectral response, and the ratios will decrease in magnitude. Thus, the IR/R ratio can provide a measure of photosynthetic activity and biomass within a pixel (Campbell, 2002).
The IR/R ratio is only one of many measures of vegetation vigour and abundance. The
green/red (G/R) ratio is based upon the same concepts used for the IR/R ratio, although it is
considered less effective.
One of the most widely used VIs, developed by Rouse et al. in 1974, is known as the normalized difference vegetation index (NDVI):

NDVI = (IR − R) / (IR + R)

This index in principle conveys the same kind of information as the IR/R and G/R ratios, but it is constrained to vary within limits that provide desirable statistical properties in the resulting values (Campbell, 2002; Jensen, 2000).
Although such ratios have been shown to be powerful tools for studying vegetation, they must be used with care if the values are to be rigorously (rather than qualitatively) interpreted. Values of ratios and VIs can be influenced by many factors external to the plant leaf, including viewing angle, soil background, and differences in row direction and spacing in the case of agricultural crops. Ratios may also be sensitive to atmospheric degradation. Because atmospheric path length varies with viewing angle, values calculated using off-nadir satellite data vary according to position within the image (Campbell, 2002).
QuickBird normalized difference vegetation index (NDVI) data were used in my Dissertation (2005) in order to assess their capability in the field of archaeological prospection. The investigations were performed for a test case (Jure Vetere, in the south of Italy) characterized by the presence of dense vegetation mainly composed of herbaceous plants. The results showed the high capability of QuickBird NDVI to enhance the typical surface anomalies linked to the presence of buried archaeological remains. The detected anomalies were confirmed by independent investigations based on geophysical prospections performed in 2005.
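As an illustration of how the index can be computed in practice, the following Python/NumPy sketch derives the NDVI from the red and near-infrared channels of a multispectral image; the band order and the array layout used here are assumptions made only for this example, not a description of the software actually used in this work.

import numpy as np

def ndvi(red, nir):
    # NDVI = (NIR - R) / (NIR + R); values range between -1 and +1.
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    denom = nir + red
    denom[denom == 0] = 1e-10      # avoid division by zero on dark pixels
    return (nir - red) / denom

# Hypothetical 4-band array ordered (blue, green, red, NIR):
# bands = load_multispectral_image()   # placeholder loader, not a real API
# vi = ndvi(bands[2], bands[3])        # high values -> vigorous vegetation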

2.3.1.2 Principal Component Analysis (PCA)


Principal Component Analysis (PCA) is a linear transformation which decorrelates
multivariate data by translating and/or rotating the axes of the original feature space, so that
the data can be represented without correlation in a new component space (Richards, 1989).
In order to do this, the process first computes the covariance matrix S of the input data; the eigenvalues and eigenvectors of S are then calculated in order to obtain the new feature components.

cov(k1, k2) = [1 / (n·m)] · Σ_{i=1..n} Σ_{j=1..m} [SB(i, j, k1) − μ_k1] · [SB(i, j, k2) − μ_k2]     (8)

where k1, k2 are two input images, SB(i, j, k) is the digital number (DN) value of the processed image k in row i and column j, n is the number of rows, m is the number of columns, and μ_k1, μ_k2 are the means of all pixel SB values of the two inputs.
The percent of total dataset variance explained by each component is obtained by formula 9:

%_i = 100 · λ_i / Σ_i λ_i     (9)

where λ_i are the eigenvalues of S.

Finally, a series of new image layers (called eigenchannels or components) is computed (formula 10) by multiplying, for each pixel, the eigenvectors of S by the original values of that pixel in the input images:

P_i = Σ_k P_k · u_{k,i}     (10)

where P_i indicates the data set in component i, u_{k,i} is the eigenvector element for component i in input data k, and P_k is the DN for input data k, the sum running over the number of input data sets.

A loading, or correlation R, of each component i with each input data set k can be calculated by using formula 11:

R_{k,i} = u_{k,i} · λ_i^(1/2) / var(k)^(1/2)     (11)

where var(k) is the variance of input data k (obtained by reading the k-th diagonal element of the covariance matrix).
The PCA transforms the input data set into new components that should make the identification of distinct features and surface types easier. The major portion of the variance is associated with homogeneous areas, whereas localized surface anomalies are enhanced in later components, which contain less of the total dataset variance. For this reason the later components may represent information relating to small areas, or essentially noise, in which case they must be disregarded. Some problems can arise from the fact that the eigenvectors cannot have a general and universal meaning, since they are extracted from the specific data series.
When PCA is applied to a satellite image, the parameters of the formulas above become the following:
- k1, k2 are two input spectral channels;
- SB(i, j, k) is the spectral value of the given channel in row i and column j;
- n is the number of rows of the image;
- m is the number of columns of the image;
- μ_k is the mean of all pixel SB values in the subscripted input channel.
The percent of total dataset variance explained by each component is obtained by formula (9). Moreover:
- P_i indicates a spectral channel in component i;
- u_{k,i} is the eigenvector element for component i in input band k;
- P_k is the spectral value for channel k.
A loading, or correlation R, of each component i with each input band k can be calculated by using formula (11).
In archaeology, PCA has been usefully applied, together with spatial filtering, to Landsat 7 images for linear pattern detection and the identification of Pre-hispanic pathways in Aztec cities within and outside the Valley of Mexico (Argote-Espino & Chavez, 2005); for the discrimination of surface archaeological remains in Hisar (southwest Turkey) (De Laet, 2007); and for the extraction of land patterns useful for palaeogeographic and palaeoenvironmental investigations in Metaponto, on the Ionian coast of Southern Italy (Masini & Lasaponara, 2006).

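A minimal sketch of how formulas 8 to 11 can be implemented for a multispectral image is given below (Python/NumPy). The band-first (k, n, m) array layout is an assumption made only for this example.

import numpy as np

def pca_components(bands):
    # bands: array of shape (k, n, m) with k spectral channels.
    # Returns (components, percent_variance, loadings) following formulas 8-11.
    k, n, m = bands.shape
    X = bands.reshape(k, -1).astype(np.float64)           # one row per channel
    Xc = X - X.mean(axis=1, keepdims=True)
    S = np.cov(Xc)                                        # k x k covariance matrix (formula 8)
    eigvals, eigvecs = np.linalg.eigh(S)                  # eigenvalues/eigenvectors of S
    order = np.argsort(eigvals)[::-1]                     # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    percent_variance = 100.0 * eigvals / eigvals.sum()    # formula 9
    components = (eigvecs.T @ Xc).reshape(k, n, m)        # formula 10
    loadings = (eigvecs * np.sqrt(np.clip(eigvals, 0, None))
                / np.sqrt(np.diag(S))[:, None])           # formula 11
    return components, percent_variance, loadings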
2.3.1.3 Tasseled Cap Transformation (TCT)
The TCT, also known as the Kauth-Thomas transformation, was devised for enhancing the spectral information content of satellite data. The TCT is a linear affine transformation, substantially based on the conversion of a given input channel data set into a new data set of composite values. The transformation depends on the sensor considered.
Usually just three composite variables (tasseled cap transform bands) are used:
TCT-band 1
TCT-band 2
TCT-band 3
In particular, TCT-band 1 is a weighted sum of all spectral bands and can be interpreted as the overall brightness or albedo at the surface. TCT-band 2 primarily measures the contrast between the visible bands and the near-infrared bands and is similar to a vegetation index. TCT-band 3 can be interpreted as a measure of soil and plant moisture.
The original TCT was derived (Kauth and Thomas, 1976) for the four bands of the Landsat
MSS sensor.

For MSS data: T.C.T band n = A1*(MSS4) + A2*(MSS5) + A3*(MSS6) + A4*(MSS7)

A1 A2 A3 A4
Brightness 0.32331 0.60316 0.67581 0.26278
Greenness -0.28317 -0.66006 0.57735 0.38833
Yellowness -0.89952 0.42830 0.07592 -0.04080

Later, the TCT was extended to the Landsat TM (Crist and Cicone, 1984) and ETM sensors (the latter is available, for example, as a routine of the Geomatica software by PCI).

For TM data:
T.C. = A1*(TM1) + A2*(TM2) + A3*(TM3) + A4*(TM4) + A5*(TM5) + A6*(TM7)

For TM data we have chosen the coefficients as follows:

              A1       A2       A3       A4       A5       A6
Brightness    0.3037   0.2793   0.4743   0.5585   0.5082   0.1863
Greenness    -0.2848  -0.2435  -0.5436   0.7243   0.0840  -0.1800
Wetness       0.1509   0.1973   0.3279   0.3406  -0.7112  -0.4572

For LANDSAT 7 data:

T.C. = A1*(L7_1) + A2*(L7_2) + A3*(L7_3) + A4*(L7_4) + A5*(L7_5) + A7*(L7_7)

For LANDSAT 7 data we have chosen the coefficients as follows:

              A1       A2       A3       A4       A5       A7
Brightness    0.1544   0.2552   0.3592   0.5494   0.5490   0.4228
Greenness    -0.1009  -0.1255  -0.2866   0.8226  -0.2458  -0.3936
Wetness       0.3191   0.5061   0.5534   0.0301  -0.5167  -0.2604

All the existing TCTs are performed on a pixel basis, to best show the underlying structure of the image by using weighted sums of the input channels.
Later, the TCT was also extended to the IKONOS (Horne, 2003) and QuickBird sensors (Lasaponara and Masini, 2007).
The weighted sums developed by Horne (2003) for IKONOS input channels were:

TCT IKONOS-band1 =  0.326 BLUE + 0.509 GREEN + 0.560 RED + 0.567 NIR   (1)
TCT IKONOS-band2 = -0.311 BLUE - 0.356 GREEN - 0.325 RED + 0.819 NIR   (2)
TCT IKONOS-band3 = -0.612 BLUE - 0.312 GREEN + 0.722 RED - 0.081 NIR   (3)
TCT IKONOS-band4 = -0.650 BLUE + 0.719 GREEN - 0.243 RED - 0.031 NIR   (4)

The TCT was applied to the QuickBird spectral channels in two different ways (Lasaponara and Masini, 2007): firstly, the weighted sums devised for ETM imagery were adopted, solely considering the values for the BLUE, GREEN, RED and NIR channels (see equations 5 to 7); secondly, those specifically developed for IKONOS data were used (see equations 1 to 4).

TCT ETM-band 1 =  0.1544 BLUE + 0.2552 GREEN + 0.3592 RED + 0.5494 NIR   (5)
TCT ETM-band 2 = -0.1009 BLUE - 0.1255 GREEN - 0.2866 RED + 0.8226 NIR   (6)
TCT ETM-band 3 =  0.3191 BLUE + 0.5061 GREEN + 0.5534 RED + 0.0301 NIR   (7)

Lasaponara and Masini (2007) applied the Tasseled Cap Transformation (TCT) to QuickBird multispectral images in order to extract archaeological features linked to ancient human transformations of the landscape. The investigation was performed on the Metaponto area.
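A minimal sketch of how these weighted sums can be applied to a four-band image is shown below, using the IKONOS coefficients of equations 1 to 4 (Horne, 2003); the array layout and band order are assumptions made only for this example.

import numpy as np

# Horne (2003) IKONOS tasseled cap weights for (BLUE, GREEN, RED, NIR), equations 1-4.
TCT_IKONOS = np.array([
    [ 0.326,  0.509,  0.560,  0.567],   # TCT band 1 (brightness)
    [-0.311, -0.356, -0.325,  0.819],   # TCT band 2 (greenness-like)
    [-0.612, -0.312,  0.722, -0.081],   # TCT band 3
    [-0.650,  0.719, -0.243, -0.031],   # TCT band 4
])

def tasseled_cap(bands, weights=TCT_IKONOS):
    # bands: (4, n, m) array ordered blue, green, red, NIR.
    # Each output layer is the pixel-wise weighted sum of the input channels.
    k, n, m = bands.shape
    flat = bands.reshape(k, -1).astype(np.float64)
    return (weights @ flat).reshape(weights.shape[0], n, m)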

2.3.2 Geometric enhancement


Geometric enhancement techniques modify and emphasize spatial detail in an image. They are characterized by operations over neighbourhoods: the procedures determine a modified brightness value for each pixel of the image, derived from the brightness of a set of surrounding pixels.
If geometric enhancement techniques are applied to the image data directly, they are called image domain techniques; otherwise they can be applied in the spatial frequency domain. In contrast to radiometric enhancements, geometric techniques do alter the original data values.
One common geometric enhancement technique is edge detection.

2.3.2.1 Edge detection


Edge detection is an image processing technique aimed at detecting discontinuities in the image.
An edge can be defined as the boundary between objects, or parts of objects, in an image where brightness, colour, texture or other physical parameters change. Edges are the sign of a lack of continuity, or of an ending. As a result of this transformation, an edge image is obtained without any change in the physical qualities of the main image.
Edge detection algorithms use matrix convolution to detect edges. In fact, most edge detection methods are expressed as template techniques, in which a window is defined and moved over each pixel of the image. Templates can be defined with different sizes. The result is a new image r(i, j) whose pixels are radiometrically modified according to the specific numbers loaded into the template. For a digital image, r(i, j) is calculated for each pixel in position (i, j) as a linear combination of the image values within the window and the values of the template (considering an M x N template):

r(i, j) = Σ_{m=1..M} Σ_{n=1..N} φ(m, n) · t(m, n)

where φ(m, n) is the pixel brightness value at a given location within the window and t(m, n) is the template entry at that location.
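As a sketch of how such template operations can be carried out in practice, the window sum of the formula above corresponds to a correlation of the image with the template; the SciPy call below is a standard library function, while the example template is arbitrary.

import numpy as np
from scipy.ndimage import correlate

def apply_template(image, template):
    # r(i, j) = sum over the window of pixel brightness * template entry.
    return correlate(image.astype(np.float64),
                     np.asarray(template, dtype=np.float64),
                     mode="nearest")

# Example: a 3x3 averaging (smoothing) template.
# smoothed = apply_template(image, np.ones((3, 3)) / 9.0)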


Since with edge detection methods the value of a cell is recalculated by a new function, it is possible to emphasize the difference between the values of neighbouring cells, for example by using the first or second derivative.
In an image with different grey levels, the main types of edge profiles that can be distinguished are shown in Figure 1.2.
A step edge is typical when the image intensity changes from one value on one side of the discontinuity to a different value on the opposite side; a line edge is typical where the image intensity abruptly changes value but then returns to the starting value within some short distance. However, step and line edges are rare in real images: because of low frequency components or of the smoothing introduced by most sensing devices, sharp discontinuities rarely exist in real signals. Step edges become ramp edges and line edges become roof edges, which occur where intensity changes are not instantaneous but take place over a finite distance.

Fig. 1.2 - Types of edges: (a) Step Edge (b) Ramp Edge (c) Line Edge (d) Roof Edge

Edge detection involves three main steps (Senthilkumaran and Rajesh, 2009):
1) Filtering: images are often corrupted by random variations in intensity values, called noise. Some common types of noise are salt-and-pepper noise, impulse noise and Gaussian noise. Salt-and-pepper noise contains random occurrences of both black and white intensity values. However, there is a trade-off between edge strength and noise reduction: more filtering to reduce noise results in a loss of edge strength (Senthilkumaran and Rajesh, 2008).
2) Enhancement: In order to facilitate the detection of edges, it is essential to determine
changes in intensity in the neighborhood of a point. Enhancement emphasizes pixels where
there is a significant change in local intensity values and is usually performed by computing
the gradient magnitude (Xian Bin Wen et al., 2008).
3) Detection: Many points in an image have a nonzero value for the gradient, and not all of
these points are edges for a particular application. Therefore, some method should be used to
determine which points are edge points. Frequently, thresholding provides the criterion used
for detection (Paulinas M. and Usinskas A., 2007).
All the Edge Detection Algorithms have the following common principles:
good detection: the algorithm should mark as many real edges in the image as possible.
good localization : edges marked should be as close as possible to the edge in the real
image.
minimal response: a given edge in the image should only be marked once, and where
possible, image noise should not create false edges.
Edge detection algorithms were the main subject of my Dissertation (2005). They were successfully applied to data-fused QuickBird images in order to emphasize the marks arising from the presence of buried structures.

2.3.2.1.1 Median Filter


The median filter computes the median value of the grey-level values within a rectangular
filter window surrounding each pixel. This has the effect of smoothing the image and
preserving edges.
The median filter finds the median pixel value, i.e. the "middle" value in an ordered set of
values, below and above which there is an equal number of values.
If we suppose to have a 3x3 template like this:

a1  a2  a3
a4  CT  a5
a6  a7  a8

then the filtered pixel value will be

CT_filtered = [ ABS(CT − a1) + .... + ABS(CT − a8) ] / 8

where:
- a1, ..., a8 = grey values of the pixels in the template
- CT = grey value of the pixel at the centre of the template
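For reference, the standard median operation described in the first paragraph of this section can be obtained directly with SciPy; this is only a sketch of the general technique, not of the specific routine used in this work.

from scipy.ndimage import median_filter

def median_3x3(image):
    # Replace each pixel by the median grey level of its 3x3 neighbourhood;
    # this smooths noise (e.g. salt-and-pepper) while preserving edges.
    return median_filter(image, size=3)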

2.3.2.1.2 Sharpening Filter


SHARP uses a subtractive smoothing method to sharpen an image.
The processing is composed of the following steps:
1. Application of an average filter to the image. It retains all low spatial frequency
information but has its high frequency features, such as edges and lines, attenuated.
2. The subtraction of the averaged image from the original. The resultant difference image will have substantially only the edges and lines remaining.
3. Determination of the edges.
4. Addition of the difference image back to the original to give an edge enhanced image.
The resultant image will have clearer high frequency detail; however there is a tendency
for noise to be enhanced, as might be expected.

If we suppose to have the following template:

a1 a2 a3
a4 a5 a6
a7 a8 a9

The value of the filtered pixel is calculated from the 3x3 average

average = ( a1 + a2 + a3 + a4 + ... + a9 ) / 9

as

filtered pixel = a5 + ( a5 − average )

i.e. the difference between the original centre pixel and the local average is added back to the original value (steps 2 and 4 above).
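A sketch of the four steps listed above, assuming a 3x3 averaging window, could look as follows; the code only mimics the described subtractive smoothing procedure and is not the routine cited in the text.

import numpy as np
from scipy.ndimage import uniform_filter

def sharpen(image):
    img = image.astype(np.float64)
    averaged = uniform_filter(img, size=3)   # step 1: low-pass (3x3 average)
    difference = img - averaged              # step 2: edges and lines remain
    return img + difference                  # step 4: edge-enhanced image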

2.3.2.1.3 1st derivative edge detection methods

If an image consists of a continuous brightness function φ(x, y) of a pair of continuous coordinates x and y, then a vector gradient can be defined in the image according to:

∇φ(x, y) = [∂φ(x, y)/∂x] i + [∂φ(x, y)/∂y] j

where i, j are a pair of unit vectors. The direction of the vector gradient is the direction of maximum upward slope and its amplitude is the value of the slope:

θ(x, y) = tan⁻¹( ∇2 / ∇1 )

where ∇1 = ∂φ/∂x and ∇2 = ∂φ/∂y are the two orthogonal components of the gradient.

The magnitude of the gradient defines the edge according to the following formula:

|∇| = sqrt( ∇1² + ∇2² ) = [ (∂φ(x, y)/∂x)² + (∂φ(x, y)/∂y)² ]^(1/2)

The direction of the gradient is useful for contouring applications or for determining aspect in a DTM (Richards, 2006).

Roberts Filtering
The Roberts algorithm computes a cross-difference in the two diagonal directions:

φ(i+1, j+1) − φ(i, j)   and   φ(i, j+1) − φ(i+1, j)

so the convolution templates have a 2x2 size:

 1   0         0   1
 0  -1   and  -1   0

Since this procedure computes a local gradient it is necessary to choose a threshold value
above which edge gradients are said to occur.

Sobel Operators
Sobel edge detection produces an image where higher grey-level values indicate the presence of an edge between two objects. The Sobel edge detection filter computes the root mean square of two 3x3 templates. The Sobel operator computes the discrete gradient in the horizontal and vertical directions at the pixel location (i, j). The orthogonal components of the gradient are:

∇1 = { φ(i−1, j+1) + 2φ(i, j+1) + φ(i+1, j+1) } − { φ(i−1, j−1) + 2φ(i, j−1) + φ(i+1, j−1) }

∇2 = { φ(i−1, j−1) + 2φ(i−1, j) + φ(i−1, j+1) } − { φ(i+1, j−1) + 2φ(i+1, j) + φ(i+1, j+1) }

which are equivalent to the following templates:

-1  0  1         1  2  1
-2  0  2   and   0  0  0
-1  0  1        -1 -2 -1
The first matrix implements a spatial derivative in the horizontal direction, whilst the second
matrix implements a spatial derivative in the vertical direction.
If we have the following window, with a1-a9 grey pixel values:

a1 a2 a3
a4 a5 a6
a7 a8 a9

the algorithm makes the following computations:


X = -1*a1 + 1*a3 - 2*a4 + 2*a6 - 1*a7 + 1*a9
Y = 1*a1 + 2*a2 + 1*a3 - 1*a7 - 2*a8 - 1*a9

Finally the Sobel gradient is:
Sobel gradient = SQRT (X*X + Y*Y)
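The same computation, written as a short Python sketch using the SciPy implementation of the two Sobel templates, is given below.

import numpy as np
from scipy.ndimage import sobel

def sobel_gradient(image):
    img = image.astype(np.float64)
    gx = sobel(img, axis=1)            # horizontal derivative (template X)
    gy = sobel(img, axis=0)            # vertical derivative (template Y)
    return np.sqrt(gx * gx + gy * gy)  # Sobel gradient = SQRT(X*X + Y*Y)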

Prewitt filtering
Prewitt edge detection produces an image where higher grey-level values indicate the
presence of an edge between two objects. The Prewitt Edge Detection filter computes the
root mean square of two 3X3 templates. It is one of the most popular 3X3 edge detection
filters.
The components of the gradient are:

∇1 = { φ(i−1, j+1) + φ(i, j+1) + φ(i+1, j+1) } − { φ(i−1, j−1) + φ(i, j−1) + φ(i+1, j−1) }

∇2 = { φ(i−1, j−1) + φ(i−1, j) + φ(i−1, j+1) } − { φ(i+1, j−1) + φ(i+1, j) + φ(i+1, j+1) }

which are equivalent to the following templates:

-1  0  1         1  1  1
-1  0  1   and   0  0  0
-1  0  1        -1 -1 -1

The first matrix implements a spatial derivative in the horizontal direction, whilst the second
matrix implements a spatial derivative in the vertical direction.
If we have the following window:

a1 a2 a3
a4 a5 a6
a7 a8 a9

the algorithm makes the following computations:


X = -1*a1 + 1*a3 - 1*a4 + 1*a6 - 1*a7 + 1*a9
Y = 1*a1 + 1*a2 + 1*a3 - 1*a7 - 1*a8 - 1*a9
Finally the Prewitt gradient is:

Prewitt gradient = SQRT (X*X + Y*Y)

2.3.2.1.4 2nd derivative edge detection methods


In second order derivative methods, the sign of the derivative is checked to detect whether a pixel lies on the dark or the light side of an edge. The zero crossing property of the second order derivative is useful for locating the centres of thick edges.
A Laplacian filter is a second derivative edge enhancement filter that operates without
regard to edge direction. Laplacian filtering emphasizes maximum values within the image
by using a kernel with a high central value typically surrounded by negative weights in the
north-south and east-west directions and zero values at the kernel corners.
For a given image, the Laplacian is the sum of the second order partial derivatives along the x and y axes, given by the following equation:

∇²φ(x, y) = ∂²φ(x, y)/∂x² + ∂²φ(x, y)/∂y²

This algorithm is very sensitive to noise, so the noise has to be suppressed before applying the Laplacian for edge detection.
The Laplacian of Gaussian (LoG) algorithm is therefore composed of the following steps:
- noise suppression before applying the Laplacian;
- application of the Laplacian;
- localization of the centres of thick edges based on the zero crossing property of the second order derivative.

In this approach, noise is first reduced by convolving the image with a Gaussian filter: isolated noise points and small structures are filtered out. With smoothing, however, edges are spread. Those pixels that have a locally maximum gradient are considered as edges by the edge detector, in which the zero crossings of the second derivative are used. To avoid the detection of insignificant edges, only the zero crossings whose corresponding first derivative is above some threshold are selected as edge points. The edge direction is obtained from the direction in which the zero crossing occurs.
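A compact sketch of the LoG approach, with a simple zero-crossing test, is given below; the value of sigma is an arbitrary choice made only for this example.

import numpy as np
from scipy.ndimage import gaussian_laplace

def log_edges(image, sigma=2.0):
    # Gaussian smoothing followed by the Laplacian; edges lie at zero crossings.
    response = gaussian_laplace(image.astype(np.float64), sigma=sigma)
    # Mark pixels where the sign of the response changes between horizontal
    # or vertical neighbours (a simple zero-crossing test).
    zc = np.zeros_like(response, dtype=bool)
    zc[:, :-1] |= np.signbit(response[:, :-1]) != np.signbit(response[:, 1:])
    zc[:-1, :] |= np.signbit(response[:-1, :]) != np.signbit(response[1:, :])
    return zc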

2.3.2.1.5 Canny Algorithm
Canny used the calculus of variations to find the function which optimizes a given edge detection criterion. The optimal function in Canny's detector is described by the sum of four exponential terms, but it can be approximated by the first derivative of a Gaussian. On this basis, Canny defined the following algorithm:
- smooth the image with a Gaussian filter;
- compute the gradient magnitude and orientation using finite-difference approximations for the partial derivatives;
- apply non-maxima suppression to the gradient magnitude;
- use the double thresholding algorithm to detect and link edges.
The Canny edge detector approximates the operator that optimizes the product of signal-to-noise ratio and localization; it is generally the first derivative of a Gaussian.
The "non-maximal suppression" step works on an estimate of the image gradients: a search is carried out to determine whether the gradient magnitude assumes a local maximum in the gradient direction. From this stage a set of edge points is obtained; these are sometimes referred to as "thin edges".
It is in most cases impossible to specify a threshold at which a given intensity gradient switches from corresponding to an edge to not doing so. Therefore Canny uses thresholding with hysteresis.
This requires two thresholds: high and low. Making the assumption that important edges should lie along continuous curves in the image allows us to follow a faint section of a given line and to discard a few noisy pixels that do not constitute a line but have produced large gradients. The algorithm therefore begins by applying the high threshold. This marks out the edges that we can be fairly sure are genuine. Starting from these, and using the directional information derived earlier, edges can be traced through the image. While tracing an edge, the lower threshold is applied, which allows faint sections of edges to be traced as long as a starting point has been found.
The use of two thresholds with hysteresis in Canny Algorithm allows more flexibility than in
a single-threshold approach, but general problems of thresholding approaches still apply. A
threshold set too high can miss important information. On the other hand, a threshold set too
low will falsely identify irrelevant information (such as noise) as important. It is difficult to
give a generic threshold that works well on all images. No tried and tested approach to this
problem yet exists.
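In practice the whole chain (Gaussian smoothing, gradient computation, non-maxima suppression and hysteresis thresholding) is available in common libraries; the sketch below uses scikit-image, and the sigma and threshold values are purely illustrative assumptions.

import numpy as np
from skimage import feature

def canny_edges(image, sigma=2.0, low=0.05, high=0.15):
    # Returns a boolean edge map; low/high are the hysteresis thresholds.
    return feature.canny(image.astype(np.float64), sigma=sigma,
                         low_threshold=low, high_threshold=high)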

2.3.2.1.6 Edge Detection Algorithm based on the frequency domain: the Fourier transform
An image can be represented in the spatial domain, as in the previous cases, but also in the frequency domain. In the frequency domain each image channel is represented in terms of sinusoidal waves.
The Fourier transform image represents the composition of the original image in terms of spatial frequency components, i.e. sine and cosine components. Spatial frequency is the image analogue of the frequency of a signal in time.
If we have a pixel at location (i, j) in a K x K pixel image, and each pixel has brightness φ(i, j), the Fourier transform in discrete form is described by the following formula:

F(r, s) = Σ_{i=0..K−1} Σ_{j=0..K−1} φ(i, j) · exp[ −j2π(ir + js) / K ]

From the transformed image the original image can be reconstructed according to the following formula:

φ(i, j) = (1/K²) · Σ_{r=0..K−1} Σ_{s=0..K−1} F(r, s) · exp[ j2π(ir + js) / K ]

Thus the discrete Fourier transform of an image transforms each single row to generate an intermediate image, and then transforms this by column to obtain the final result.
Usually high spatial frequency in an image is associated with frequent changes of brightness with position, whilst gradual changes of brightness with position are typical of the low frequency content of the spectrum.
The interpretation of frequency-transformed images can be quite complicated. In fact, when the Fourier transformation is selected, the output domain is the two-dimensional frequency spectrum of the input image. If these results are output to the display, a fairly symmetric pattern appears. Frequencies are along two directions (X and Y). The DC component, which corresponds to the average brightness (frequency = (0, 0)), is at (K/2+1, K/2+1), where K is the image size.
Points away from the DC point indicate higher frequencies. The transform at point (K/2+1+x, K/2+1+y) corresponds to the cosine wave component which repeats every K/x pixels along the X direction and every K/y pixels along the Y direction.

Image features which are aligned horizontally make up the vertical components in the Fourier spectrum (and vice versa).
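As a short computational sketch, the spectrum described above can be obtained with the fast Fourier transform; the logarithmic scaling shown for display is a common convention, not part of the definition.

import numpy as np

def fourier_spectrum(image):
    # 2-D discrete Fourier transform; fftshift moves the DC component
    # (the average brightness) to the centre of the spectrum.
    return np.fft.fftshift(np.fft.fft2(image.astype(np.float64)))

# Log-magnitude image of the (fairly symmetric) spectrum:
# magnitude = np.log1p(np.abs(fourier_spectrum(image)))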

2.3.3 Spatial statistics


Spatial statistics is a part of the statistics literature that includes geostatistics, spatial autoregression, point pattern analysis, centrographic measures, and image analysis.
In image analysis, spatial statistics measure the spatial autocorrelation and semivariance of images, i.e. they study the distribution of phenomena in space, the forms of aggregation and the spatial relations between data. In a remote sensing context, the values of interest in a spatial autocorrelation analysis are commonly the digital number (DN) values of the pixels in an image.
Spatial autocorrelation in an image can be affected by object size, spacing, and shape (Jupp et al., 1988); consequently, the sensor's spatial resolution will have an effect on the overall spatial autocorrelation (Jupp et al., 1988).
Spatial autocorrelation may be classified as positive or negative (see Fig. 1.3). Positive
spatial autocorrelation has all similar values appearing together, while negative spatial
autocorrelation has dissimilar values appearing in close association. When no statistically
significant spatial autocorrelation exists, the pattern of spatial distribution is considered
random.

Fig. 1.3 (a) positive autocorrelation; (b) negative autocorrelation; (c) no autocorrelation
(or random)

In the context of image processing, spatial autocorrelation statistics can be used to measure and analyze the degree of dependency among spectral features. It is generally described through some index of covariance computed for a series of lag distances (or distance classes) from each point. The plot of a given index against the distance classes d is called a correlogram, and illustrates the autocorrelation at each lag distance. The distance at which the value of spatial autocorrelation crosses its expected value indicates the range of the patch size, or simply the spatial range of the pattern.

In the context of image processing, for each index and lag distance the output is a new image which contains a measure of autocorrelation.
Classic spatial autocorrelation statistics include a spatial weights matrix that reflects the intensity of the geographic relationship between observations in a neighbourhood. Such a spatial weights matrix indicates which elements of the computation are to be included or excluded; in this way it is possible to define ad hoc weights to extract and emphasize specific patterns.

2.3.3.1 Global Spatial Statistics in image processing


Global spatial statistics look for an overall pattern between proximity and the similarity of pixel values. These statistics provide a single value that describes the spatial autocorrelation of the dataset as a whole. Three commonly used methods for measuring global spatial autocorrelation have been defined: semivariance (Matheron, 1971), Geary's c (Geary, 1954), and Moran's I (Moran, 1948). When semivariance is plotted against lag, a variogram is produced; for Geary's c and Moran's I the equivalent graph is termed a correlogram.
Semivariance is a measure of the average variance of the differences between all pairs of measurements that are separated by the lagged distance (Curran, 1998):

γ(h) = (1 / 2N) · Σ_{i=1..N} [ z(i) − z(i+h) ]²     (14)

where: h = spatial lag
z(i) = DN value at location i
N = number of DN pairs at lag h.

The semivariogram is the relationship between semivariance and lag. It has been studied
extensively, and a strong theoretical understanding exists about its behavior (e.g. Jupp et al.,
1988).
The Geary's c statistic is based on the squared difference between spatially lagged pairs of pixels, normalized by the overall scene variance. In a remote sensing context, Geary's c statistic can be defined as:

c = [ (n − 1) · Σ_i Σ_j w_ij (x_i − x_j)² ] / [ 2 · (Σ_i Σ_j w_ij) · Σ_i (x_i − μ)² ]     (15)

where: n = number of observations
w_ij = weighted spatial lag between pixels i and j
x_i, x_j = DN values of pixels i and j
μ = mean DN value of the scene.

The value of Geary's c ranges from 0 to 2, where:
- 0 ≤ c < 1 indicates strong positive spatial autocorrelation;
- 1 < c ≤ 2 indicates strong negative spatial autocorrelation;
- c = 1 indicates that the spatial data are uncorrelated.

In Moran's I, spatial autocorrelation is calculated as a function of the covariation between pixels i and j. This statistic is calculated as follows:

I = [ n · Σ_i Σ_j w_ij (x_i − μ)(x_j − μ) ] / [ (Σ_i Σ_j w_ij) · Σ_i (x_i − μ)² ]     (16)

The value of Moran's I ranges from -1 to 1; in particular:
- 0 < I ≤ 1 indicates strong positive spatial autocorrelation;
- -1 ≤ I < 0 indicates strong negative spatial autocorrelation;
- I = 0 indicates that the spatial data are uncorrelated.
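A minimal sketch of formulas 15 and 16 for an image is given below; the use of binary rook (lag-1) neighbour weights is an assumption made only for this example.

import numpy as np

def global_morans_i_gearys_c(image):
    # Global Moran's I and Geary's c with binary rook (lag-1) spatial weights.
    x = image.astype(np.float64)
    z = x - x.mean()
    n = x.size
    ss = (z ** 2).sum()                                 # sum of squared deviations
    # Horizontal and vertical neighbour pairs (each unordered pair counted once).
    pairs = [(z[:, :-1], z[:, 1:]), (z[:-1, :], z[1:, :])]
    cross = sum((a * b).sum() for a, b in pairs)        # sum of w_ij * z_i * z_j
    sqdiff = sum(((a - b) ** 2).sum() for a, b in pairs)
    W = sum(a.size for a, _ in pairs)                   # total weight (number of pairs)
    morans_i = (n / W) * cross / ss
    gearys_c = ((n - 1) / (2.0 * W)) * sqdiff / ss
    return morans_i, gearys_c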

2.3.3.2 Local measures of spatial autocorrelation


Global measures of spatial autocorrelation potentially ignore important local variation in the data. In response to this shortcoming, local spatial statistics have been introduced.
Local measures of spatial autocorrelation look for specific areas in an image that have clusters of similar or dissimilar values. The output is an image for each calculated index, which contains a measure of the autocorrelation around each pixel. ENVI provides three local spatial statistics: Anselin Local Moran's I, Getis-Ord Local Gi, and Anselin Local Geary's C.

Anselin Local Moran's I index identifies pixel clustering. Positive values indicate a cluster of
similar values, while negative values imply no clustering (that is, high variability between
neighboring pixels).

Anselin Local Moran's I is calculated as follows:

I_i = [ (x_i − μ) / S² ] · Σ_j w_ij (x_j − μ)     (17)

where S² is the variance of x and μ its mean.

The Local Geary's C index identifies areas of high variability between a pixel value and its
neighboring pixels. It is useful for detecting edge areas between clusters and other areas with
dissimilar neighboring values.

Anselin Local Geary's C is calculated as follows:

c_i = Σ_j w_ij (x_i − x_j)²     (18)

The Getis-Ord Local Gi index (1992) compares pixel values at a given location with those
pixels at a lag, d, from the original pixel at location i. So it identifies hot spots, such as areas
of very high or very low values that occur near one another. This is useful for determining
clusters of similar values, where concentrations of high values result in a high Gi value and
concentrations of low values result in a low Gi value. The results of this index differ from
the results of the Local Moran's I index because clusters of negative values give high values
for I, but low values for Gi.

Getis-Ord Local Gi is calculated as follows:

G_i(d) = [ Σ_j w_ij(d) x_j − W_i μ ] / [ s · sqrt( W_i (n − W_i) / (n − 1) ) ]   for j ≠ i     (19)

where W_i = Σ_j w_ij(d)
w_ij(d) = weighted spatial lag for distance d between pixels i and j
x_i, x_j = DN values of pixels i and j
μ, s = mean and standard deviation of the DN values of the scene.

Getis and Ord (1992) introduced a local autocorrelation measure, the Gi statistic. Anselin (1995) subsequently proposed local indicators of spatial association (LISA) as a general means for decomposing global autocorrelation measurements, so that the individual contribution of each observation can be assessed and local hot spots identified (Anselin, 1995).
To summarize, spatial autocorrelation statistics measure and analyze the degree of dependence among features that have clusters of similar or dissimilar values. The use of classic spatial autocorrelation statistics such as Moran's I, Geary's C and the Getis-Ord Local Gi index (for more information, see Anselin (1995) and Getis and Ord (1994)) enables the characterization of the spatial autocorrelation within a user-defined distance. For each index, the output is a new image which contains a measure of the autocorrelation around the given pixel. In particular:
(i) the Local Moran's I index identifies pixel clustering: positive values imply the presence of a cluster of similar values, i.e. low variability between neighbouring pixels, whereas negative values indicate the absence of clustering, i.e. high variability between neighbouring pixels;
(ii) the Getis-Ord Gi index permits the identification of areas characterized by very high or very low values (hot spots) compared with those of neighbouring pixels;
(iii) the Local Geary's C index allows the identification of edges and of areas characterized by high variability between a pixel value and its neighbouring pixels;
(iv) all of these indices are available as tools in commercial software for Geographical Information Systems (GIS) or image processing, such as ENVI.
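As an illustrative sketch (the actual computations in this work were done with the ENVI tools mentioned above), the Local Moran's I of formula 17 can be written with a simple 4-neighbour binary weights kernel, an assumption made only for this example.

import numpy as np
from scipy.ndimage import convolve

def local_morans_i(image):
    # Local Moran's I: I_i = (x_i - mean) * sum of neighbouring deviations / variance.
    x = image.astype(np.float64)
    z = x - x.mean()
    kernel = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]], dtype=np.float64)   # rook (4-neighbour) weights
    neighbour_sum = convolve(z, kernel, mode="constant", cval=0.0)
    return z * neighbour_sum / x.var()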

2.3.4 Classification
Classification is the procedure most often used in image processing for grouping all the
pixels in an image into a finite number of individual classes or categories to produce a
thematic representation.
It can be performed on single or multiple image channels to separate areas according to their
different scattering or spectral characteristics.
The classification procedures are differentiated as being either supervised or unsupervised
(clustering).
An example of the successful use of classification in archaeology is Malinverni and Fangi (2008), who used K-means classification on a QuickBird image to locate emerging archaeological features. In reality, the use of classification algorithms in archaeology is limited, because they can at most provide a classification of the modern land cover and the distribution of each class.

2.3.4.1 Supervised Classification


For the quantitative analysis of remote sensing image data, a supervised classification is often used.
It identifies spectrally similar areas on an image by identifying training sites of known targets and then extrapolating those spectral signatures to other areas of unknown targets.
A variety of algorithms is available for supervised classification, ranging from those based on probability distribution models for the classes of interest to those in which the multispectral space is partitioned into class-specific regions using optimally located surfaces. The major difference between them is their emphasis and their ability to incorporate remotely sensed information. The basic procedures common to all supervised classification algorithms are:
- To choose a priori the set of ground cover types into which the image must be segmented (e.g. water, urban regions, woods, etc.).
- To choose representative pixels from each of the classes or categories for any one land cover within the image. This procedure creates the training data, which lie in a common region enclosed by a border, the so-called training field.
- To use the training areas, usually small and discrete compared to the full image, to train the classification algorithm to recognize the ground cover classes on the basis of their spectral signatures, as found in the image.
- To assess the accuracy of the selected classification using the labelled testing data set and to refine the training process on the basis of the obtained results. The accuracy assessment must be made to determine how correct the classified image is; it involves the determination of the overall accuracy of the classification, errors of omission, errors of commission, producer's accuracy, and consumer's (user's) accuracy. All these measures give an indication of how well the classification of the image was conducted.
Some common classification algorithms include:
- Minimum Distance to Mean classifier: it uses the mean values of each of the ground cover classes calculated from the training areas. Each pixel within the image is then examined to determine which mean value it is closest to. Whichever mean value the pixel is closest to, based on Euclidean distance, defines the class to which that pixel will be assigned.
- Parallelepiped classifier: it uses a mean vector as opposed to a single mean value. The vector contains an upper and a lower threshold, which dictate which class a pixel will be assigned to. If a pixel is above the lower threshold and below the upper threshold, then it is assigned to that class. If the pixel does not lie within the thresholds of any mean vector, then it is assigned to an unclassified or null category.
- The Mahalanobis Distance classification: is a direction-sensitive distance classifier that
uses statistics for each class. It is similar to the Maximum Likelihood classification but
assumes all class covariances are equal and therefore is a faster method. All pixels are
classified to the closest ROI class unless you specify a distance threshold, in which case
some pixels may be unclassified if they do not meet the threshold.
- Maximum Likelihood Classifier: evaluates the variance and co-variance of the various
classes when determining in which class to place an unknown pixel. The statistical
probability of a pixel belonging to a class is calculated based on the mean vector and co-
variance matrix. A pixel is assigned to the class that contains the highest probability.
- The Spectral Angle Mapper algorithm (SAM): it measures the spectral similarity by calculating the angle between two spectra, treating them as vectors in n-dimensional space, where n is the number of bands (Kruse et al., 1993; Van der Meer et al., 1997; Rowan and Mars, 2003). The reference spectra can be taken from spectral libraries, from field measurements, or extracted directly from the image. From a mathematical point of view, this method relies on the fact that the dot product between a test spectrum u and a reference spectrum v is:

u · v = Σ_{i=1..n} u_i v_i = |u| |v| cos α

where u_i and v_i are the components of the vectors u and v in n-dimensional space. The angle α between the spectra is therefore calculated with the following equation:

α = arccos[ (u · v) / (|u| |v|) ] = arccos[ Σ_{i=1..n} u_i v_i / ( (Σ_{i=1..n} u_i²)^(1/2) · (Σ_{i=1..n} v_i²)^(1/2) ) ]

The angle α lies in the range from zero to π/2. Small angles between two spectra indicate high similarity and large angles indicate low similarity. Pixels further away than the specified maximum angle threshold (in radians) are not classified. This method is relatively insensitive to illumination and albedo, because the angle between the two vectors is independent of the vectors' lengths (Crosta et al., 1998; Kruse et al., 1993).
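A minimal sketch of the angle computation between a test spectrum and a reference spectrum is given below; the classification rule (assign the pixel to the reference with the smallest angle below the chosen threshold) follows from the description above.

import numpy as np

def spectral_angle(u, v):
    # Spectral angle (radians) between test spectrum u and reference spectrum v.
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))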
It is common practice to make two or more iterations of a classification process, to improve
the accuracy of the result. With each iteration, the test sites are edited to better reflect the
representation of their class and to remove or reduce any class overlap.

2.3.4.2 Unsupervised Classification


Unsupervised classification automatically groups pixels with similar spectral reflective characteristics into distinct clusters, depending on their spectral features. These spectral clusters are then labelled with a certain class name. The important difference from supervised classification is that the classes do not have to be defined a priori: pixels in a data set are clustered based on statistics only, without any user-defined training classes. This means that unsupervised classification makes it possible to discover classes that were not known before the classification, but the classes have to be recognized a posteriori.

The two unsupervised classifications most commonly used in remote sensing are the ISODATA and K-means algorithms. Both are iterative procedures. In general, both of them first assign an arbitrary initial set of cluster vectors. Then they classify each pixel to the closest cluster and finally recalculate the cluster mean vectors on the basis of all the pixels in each cluster. The second and third steps are repeated until the change between two iterations is small. The change can be defined in several different ways, either by measuring the distances by which the mean cluster vectors have moved from one iteration to another, or by the percentage of pixels that have changed class between iterations.
The objective of the K-means algorithm is to minimize the within-cluster variability. The objective function (which is to be minimized) is the sum of the squared distances (errors) between each pixel and its assigned cluster centre.
It is widely used for classifying satellite imagery, but it is sensitive to the initially selected mean values and may misclassify pixels into the wrong class.
ISODATA is an unsupervised classification method that uses an iterative approach
incorporating a number of heuristic procedures to compute classes. The ISODATA utility
repeats the clustering of the image into classes until either a specified maximum number of
iterations has been performed, or a maximum percentage of unchanged pixels has been
reached between two successive iterations. The algorithm starts by randomly selecting
cluster centers in the multidimensional input data space. Each pixel is then grouped into a
candidate cluster based on the minimization of a distance function between that pixel and
the cluster centers. After each iteration, the cluster means are updated, and clusters may be
split or merged further, depending on the size and spread of the data points in the clusters.
The ISODATA clustering method uses the minimum spectral distance formula to form
clusters. The equation for classifying by spectral distance is based on the equation for
Euclidean distance, i.e.:
SD_xyC = sqrt( Σ_{i=1..n} ( μ_Ci − X_xyi )² )

where: n = number of bands
C = a particular class
X_xyi = DN value of pixel (x, y) in band i
μ_Ci = mean of the DN values in band i for class C
SD_xyC = spectral distance from pixel (x, y) to the mean of class C


The ISODATA algorithm adds some further refinements by splitting and merging clusters (Jensen, 1996). Clusters are merged if either the number of members (pixels) in a cluster is less than a certain threshold or if the centres of two clusters are closer than a certain threshold. A cluster is split into two different clusters if its standard deviation exceeds a predefined value and the number of members (pixels) is twice the threshold for the minimum number of members. This algorithm is similar to the K-means algorithm, with the distinct difference that ISODATA allows for a different number of clusters, while K-means assumes that the number of clusters is known a priori.
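A minimal sketch of the iterative K-means procedure described above (arbitrary initial centres, assignment of each pixel to the closest cluster by Euclidean spectral distance, recomputation of the cluster means) is given below; the array layout and the number of clusters are assumptions of the example.

import numpy as np

def kmeans_classify(bands, n_clusters=5, max_iter=20, seed=0):
    # bands: (k, n, m) multispectral image; returns an (n, m) label map.
    k, n, m = bands.shape
    X = bands.reshape(k, -1).T.astype(np.float64)         # one row per pixel
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], n_clusters, replace=False)]
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(max_iter):
        # Euclidean spectral distance from every pixel to every cluster centre.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                         # assignments stable: stop
        labels = new_labels
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)  # recompute cluster means
    return labels.reshape(n, m)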

2.3.4.3 Spectral Separability Measures


It is possible to have a measure of the separability between two spectral classes (which is never known a priori). This separability can, for example, help in choosing the ROIs for a supervised classification, but it can also simply measure the spectral separability between two selected pixel sets.
A quantitative evaluation of the spectral separability of archaeological marks and their surroundings was carried out by Lasaponara and Masini (2007) using one of the most widely used indices, the Jeffries-Matusita (JM) distance.
The Bhattacharyya (or Jeffries-Matusita) distance and the Transformed Divergence are two of the most widely used indices for estimating the spectral separability between distributions.
Transformed Divergence is a popular empirical measure which is computationally simpler than the Bhattacharyya distance. However, the Bhattacharyya distance is more theoretically sound, because it is directly related to the upper bound of the probability of classification errors.
Both the Transformed Divergence and the Bhattacharyya distance are real values between 0 and 2, where '0' indicates complete overlap between the signatures of two classes and '2' indicates complete separation between the two classes. Both measures are monotonically related to classification accuracies: the larger the separability values, the better the final classification results will be.
The following rules are suggested for the possible ranges of separability values:
- 0.0 to 1.0: very poor separability. It indicates that the two signatures are statistically very close to each other. The user has two options: the first is that one signature can be arbitrarily discarded (suggested when the separability is closer to 0); the second is that the two signatures can be merged (suggested when the separability is closer to 1).
- 1.0 to 1.9: poor separability. It indicates that the two signatures are separable to some extent. However, it is desirable to improve the separability if possible. Low signature separability is usually caused by improper combinations of image bands and/or by training sites which have large internal variability within each class.
- 1.9 to 2.0: good separability.

The Transformed Divergence is given by the following equations:

TD(i,j) = 2 * [1 - exp(-D(i,j)/8)]

where TD(i,j) = Transformed Divergence between classes i and j
D(i,j) = divergence between classes i and j

D(i,j) = 0.5 * T[M(i)-M(j)] * [InvS(i)+InvS(j)] * [M(i)-M(j)] + 0.5 * Trace[InvS(i)*S(j) + InvS(j)*S(i) - 2*I]

where M(i) = mean vector of class i, where the vector has Nchannel elements (Nchannel is the number of channels used)
S(i) = covariance matrix for class i, which has Nchannel by Nchannel elements
InvS(i) = inverse of matrix S(i)
Trace[ ] = trace of a matrix (sum of its diagonal elements)
T[ ] = transpose of a matrix
I = identity matrix

The Jeffries-Matusita (JM) distance is obtained from the Bhattacharyya distance B, shown in equation (12).

B = \frac{1}{8}\,(m_i - m_j)^{T}\left[\frac{\Sigma_i + \Sigma_j}{2}\right]^{-1}(m_i - m_j) \;+\; \frac{1}{2}\,\ln\!\left(\frac{\left|\,(\Sigma_i + \Sigma_j)/2\,\right|}{|\Sigma_i|^{1/2}\,|\Sigma_j|^{1/2}}\right)    (12)

in which m_i and m_j are the class mean vectors and Σ_i and Σ_j are the class covariance matrices.
The Bhattacharyya distance can be seen as the sum of two components: the first part of equation (12) accounts for the difference between the class means, whereas the second part accounts for the difference between the class covariances. For the BD, a greater value indicates a greater average distance.
A drawback of the BD is that such an index does not provide any indication of threshold values for separability.
The Jeffries-Matusita (JM) distance is shown in equation (13).

J_{ij} = 2\left(1 - e^{-B}\right)    (13)

where B is the Bhattacharyya distance computed using equation (12).


The presence of the exponential factor in equation (13) gives an exponentially decreasing weight to increasing separations between spectral classes.
In general, to perform a feature selection, one needs to find the subset of features that gives the largest average JM distance. The average pairwise distance is given by:

d_{ave} = \sum_{i=1}^{M}\;\sum_{j=i+1}^{M} p(\omega_i)\, p(\omega_j)\, d_{ij}

where M is the number of spectral classes and p(\omega_i) and p(\omega_j) are the class prior probabilities.
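As a minimal illustration of equations (12) and (13), the following NumPy sketch estimates the class means and covariances from two sets of training pixels and evaluates B and the JM distance; the function name and the array layout are choices made only for this example.

```python
import numpy as np

def jeffries_matusita(samples_i, samples_j):
    """JM distance between two classes, each given as an (N, bands) sample array."""
    m_i, m_j = samples_i.mean(axis=0), samples_j.mean(axis=0)
    S_i = np.cov(samples_i, rowvar=False)
    S_j = np.cov(samples_j, rowvar=False)

    S_mean = 0.5 * (S_i + S_j)
    diff = (m_i - m_j).reshape(-1, 1)

    # Bhattacharyya distance, equation (12)
    term1 = 0.125 * float(diff.T @ np.linalg.inv(S_mean) @ diff)
    term2 = 0.5 * np.log(np.linalg.det(S_mean) /
                         np.sqrt(np.linalg.det(S_i) * np.linalg.det(S_j)))
    B = term1 + term2

    # Jeffries-Matusita distance, equation (13): ranges between 0 and 2
    return 2.0 * (1.0 - np.exp(-B))
```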

2.3.5 Data Fusion


With the development of multiple types of remote sensors on board satellites, more and more data have become available for scientific research. As the volume of data grows, so does the need to combine data gathered from different sources to extract the most useful information. Data fusion is an effective way to make optimum use of large volumes of data from multiple sources. Multi-sensor data fusion seeks to combine information from multiple sensors and sources to achieve inferences that are not feasible from a single sensor or source.

The goal of multi-sensor data fusion is to integrate complementary and redundant information in order to provide a composite image which can be used for a better understanding of the entire scene.

The fusion of information from sensors with different physical characteristics enhances the
understanding of our surroundings and provides the basis for planning, decision-making, and
control of autonomous and intelligent machines (Hall, 1997). In the past decades it has been
applied to different fields such as pattern recognition, visual enhancement, classification,
change detection, object detection and area surveillance (Pohl, 1998).
Multi-sensor data fusion can be performed at four different processing levels, according to
the stage at which the fusion takes place: signal level, pixel level, feature level, and decision
level (Dai, 1999).
(1) Signal level fusion. In signal-based fusion, signals from different sensors are combined to create a new signal with a better signal-to-noise ratio than the original signals.
(2) Pixel level fusion. Pixel-based fusion is performed on a pixel-by-pixel basis. It generates a fused image in which the information associated with each pixel is determined from a set of pixels in the source images, in order to improve the performance of image processing tasks such as segmentation.
(3) Feature level fusion. Fusion at the feature level requires the extraction of the objects recognised in the various data sources, i.e. of salient features such as pixel intensities, edges or textures, which depend on their environment. These similar features from the input images are then fused.
(4) Decision level fusion. Decision-level fusion merges information at a higher level of abstraction and combines the results from multiple algorithms to yield a final fused decision. The input images are processed individually for information extraction, and the obtained information is then combined by applying decision rules to reinforce a common interpretation.
Among the hundreds of variations of image fusion techniques, the most popular and effective methods include, but are not limited to, intensity-hue-saturation (IHS), high-pass filtering, principal component analysis (PCA), different arithmetic combinations (e.g., the Brovey transform), multi-resolution analysis-based methods (e.g., pyramid algorithms, the wavelet transform), and Artificial Neural Networks (ANNs).
Principal component analysis (PCA), intensity-hue-saturation (IHS), the Brovey transform, the Synthetic Variable Ratio (SVR) and high-pass filtering are the standard fusion algorithms.
Three problems must be considered before their application: (1) standard fusion algorithms generate a fused image from a set of pixels in the various sources; these pixel-level fusion methods are very sensitive to registration accuracy, so co-registration of the input images at sub-pixel level is required; (2) one of the main limitations of IHS and the Brovey transform is that the number of input multispectral bands should be equal to or less than three at a time; (3) standard image fusion methods are often successful at improving the spatial resolution, but they tend to distort the original spectral signatures to some extent [9,10]. More recently, new techniques such as the wavelet transform appear to reduce the colour distortion problem and to keep the statistical parameters invariant.
Some of these data fusion algorithms are described in the following sections.

2.3.5.1 Principal Component Analysis Sharpening


In PCA-based sharpening (Chavez et al., 1991), the PAN image replaces the first principal component before the inverse principal component transform is performed. Prior to the substitution, the PAN image is stretched so that its mean and variance are similar to those of the first principal component.
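A minimal sketch of this substitution scheme, assuming a co-registered multispectral array ms of shape (bands, rows, cols) already resampled to the grid of the PAN array pan; the variable names and the plain eigen-decomposition are choices made for the example rather than the implementation of any particular software package.

```python
import numpy as np

def pca_sharpen(ms, pan):
    """PCA-based pan-sharpening sketch: replace PC1 with the (stretched) PAN band."""
    bands, rows, cols = ms.shape
    X = ms.reshape(bands, -1).astype(float)          # pixels as columns
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean

    # Forward PCA: eigenvectors of the band covariance matrix, PC1 first
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc))
    E = eigvecs[:, np.argsort(eigvals)[::-1]]
    pcs = E.T @ Xc

    # Stretch PAN to the mean/variance of PC1, then substitute it
    p = pan.reshape(-1).astype(float)
    pcs[0] = (p - p.mean()) / p.std() * pcs[0].std() + pcs[0].mean()

    # Inverse transform back to the original band space
    return (E @ pcs + mean).reshape(bands, rows, cols)
```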

2.3.5.2 Intensity Hue Saturation (IHS)


The IHS fusion converts a color MS image from the RGB space into the IHS color space.
Because the intensity (I) band resembles a panchromatic (PAN) image, it is replaced by a
high-resolution PAN image in the fusion. A reverse IHS transform is then performed on the
PAN together with the hue (H) and saturation (S) bands, resulting in an IHS fused image.
It uses three positional parameters in lieu of Red, Green and Blue (RGB): Intensity, Hue
and Saturation. Intensity relates to the overall brightness of a color or energy level of the
light and is devoid of any color content. It shows how close it is to black or white. Hue
refers to the dominant or average wavelength of light contributing to a color, i.e. the actual
perceived color such as red, blue, yellow, orange, etc. Saturation specifies the degree to
which the color is pure from white light (grayscale) dilution or pollution. It runs from
neutral gray through pastel to saturated colors. The transformation from RGB color space to
IHS space is nonlinear, lossless and reversible. One can vary each of the IHS components
without affecting the others. It is performed by a rotation of axis from the first orthogonal
RGB system to a new orthogonal IHS system. The equations describing the transformation
to the IHS are as follows (Pellemans, et al., 1993):

The forward step maps the RGB triplet into an intermediate Cartesian space (x, y, z) through a fixed linear (rotation) matrix A:

\begin{bmatrix} x \\ y \\ z \end{bmatrix} = A \begin{bmatrix} B \\ G \\ R \end{bmatrix}

where the coefficients of A are those given by Pellemans et al. (1993).
The values of H, S and I can then be computed as:

H = \tan^{-1}\!\left(\frac{y}{z}\right), \qquad S = \frac{1}{m(H)}\cos^{-1}\!\left(\frac{x}{x+y+z}\right), \qquad I = \frac{x+y+z}{I_M(H,S)}

where m(H) is the maximum co-latitude permitted at a given hue and I_M(H, S) is the maximum intensity permitted at a given hue and co-latitude.
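The substitution workflow itself can be illustrated independently of the particular RGB-to-IHS formulas. The sketch below uses a widely used linear intensity/chromaticity variant of the transform, not the Pellemans et al. (1993) formulation quoted above, and assumes that the three bands have been resampled to the PAN grid and that the PAN band has been histogram-matched to the intensity beforehand.

```python
import numpy as np

def ihs_fuse(rgb, pan):
    """IHS-style fusion sketch using a common linear I-v1-v2 variant of the transform.

    rgb : (3, rows, cols) multispectral bands resampled to the PAN grid
    pan : (rows, cols) panchromatic band, histogram-matched to the intensity
    """
    R, G, B = rgb.astype(float)

    # Forward transform: the two chromatic components (the intensity will be replaced)
    v1 = (-np.sqrt(2) / 6) * (R + G) + (np.sqrt(2) / 3) * B
    v2 = (R - G) / np.sqrt(2)

    # Substitute the intensity with the matched PAN band and invert the transform
    I = pan.astype(float)
    R_f = I - v1 / np.sqrt(2) + v2 / np.sqrt(2)
    G_f = I - v1 / np.sqrt(2) - v2 / np.sqrt(2)
    B_f = I + np.sqrt(2) * v1
    return np.stack([R_f, G_f, B_f])
```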

2.3.5.3 Brovey transformation


The Brovey transformation is a numerical method to merge different sources of data (Vrabel 1996, Liu 2000), which is implemented in the PCI software in the PANFUSE procedure. The Brovey equation is designed on the assumption that the spectral range of the panchromatic image is the same as that covered by the multispectral data. The transformation is defined by the following equation:

Y_k(i,j) = \frac{X_k(i,j)\; X_p(m,n)}{\sum_{k=1}^{n} X_k(i,j)}

where Y_k(i, j) and X_k(i, j) are the kth fused multispectral band and the original multispectral band, respectively; i and j denote the pixel and line number. X_p(m, n) is the original panchromatic band, with m and n denoting its pixel and line number.
In the present case, the image fusion for QuickBird data is carried out as follows:
(i) selection of the spectral bands;
(ii) resampling of these bands to the panchromatic spatial resolution;
(iii) application of the Brovey transformation to the resampled image data.
The resulting image consists of a combination of the n multispectral bands and the panchromatic image.
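A minimal NumPy sketch of step (iii), assuming that the selected multispectral bands ms (n bands, rows, cols) have already been resampled to the PAN grid pan; the small epsilon added to the denominator is an implementation choice to avoid division by zero, not part of the original formulation.

```python
import numpy as np

def brovey_fuse(ms, pan, eps=1e-6):
    """Brovey transform: each band is scaled by PAN / (sum of the MS bands)."""
    ms = ms.astype(float)
    band_sum = ms.sum(axis=0) + eps                 # denominator of the Brovey equation
    return ms * pan.astype(float) / band_sum        # broadcast over the band axis
```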

2.3.5.4 Synthetic Variable Ratio (SVR)

Munechika et al. (1993) developed the SVR method to merge an MS image and a high-resolution panchromatic image, as described by the following equation:

XS_{Pi} = Pan_{H}\,\frac{XS_{Li}}{Pan_{LSyn}}

where XS_Pi is the grey value of the ith band of the merged image, Pan_H is the grey value of the original high spatial resolution image, XS_Li is the grey value of the ith band of the original MS image, and Pan_LSyn is the grey value of the low-resolution synthetic panchromatic image obtained through the following simulation equation proposed by Suits et al. (1988):

Pan_{LSyn} = \sum_{i=1}^{4} \varphi_i\, XS_{Li}

In order to obtain the parameters φ_i, Munechika et al. (1993) adopted a modified atmospheric model. The parameters φ_i were then calculated through a regression analysis between the values simulated through the atmospheric model and those measured for five typical land cover types: urban, soil, water, trees and grass. After construction of Pan_LSyn, a linear histogram match was used to force the original SPOT Pan image to match Pan_LSyn in order to eliminate atmospheric and illumination differences.

Zhang (1999) modified the SVR method in order to obtain more stable parameters φ_i.
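The two SVR steps can be sketched under simple assumptions: here the weights φ_i are estimated by an ordinary least-squares regression of the degraded PAN values against the MS bands, standing in for the atmospheric-model-based regression of Munechika et al. (1993); all names and the epsilon guard are choices made for the example.

```python
import numpy as np

def svr_fuse(ms, pan, pan_low):
    """SVR-style fusion sketch.

    ms      : (bands, rows, cols) MS bands resampled to the PAN grid
    pan     : (rows, cols) high-resolution PAN band
    pan_low : (rows, cols) PAN degraded/smoothed to the MS resolution, on the same grid
    """
    bands = ms.shape[0]
    X = ms.reshape(bands, -1).T.astype(float)        # (pixels, bands)
    y = pan_low.reshape(-1).astype(float)

    # Least-squares estimate of the weights phi_i (stand-in for the
    # atmospheric-model regression used in the original method)
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Synthetic low-resolution panchromatic image Pan_LSyn
    pan_syn = np.tensordot(phi, ms.astype(float), axes=1)

    # Ratio-based injection of the high-resolution PAN
    return ms * pan / (pan_syn + 1e-6)
```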

2.3.5.5 High-pass filtering


The high pass filtering (HPF) sharpening technique (Chavez et al., 1991) extracts high
frequency components from the PAN image using a high pass filter. These are then
incorporated into low frequency components of the multispectral images, as given by the
following equation,

S_\lambda(x,y) = \left\{\,w_{high}\; HPF\!\left[\,PAN(x,y)\,\right]\right\} + \left\{\,w_{low}\; LPF\!\left[\,O'_\lambda(x,y)\,\right]\right\}

to generate the corresponding sharpened images.

In the equation above, PAN(x, y) and O'_λ(x, y) correspond to a pixel at location (x, y) in the PAN image and in the interpolated and registered multispectral image (at wavelength λ), respectively.
HPF and LPF correspond to the high-pass and low-pass filter operators, respectively. The result of applying a filter operator to an image I at location (x, y) can be expressed as

\sum_{j=-k_y}^{k_y}\;\sum_{i=-k_x}^{k_x}\left[\,w_{i,j}\; I(x+i,\, y+j)\,\right]

where k_x and k_y are related to the kernel sizes N_x and N_y along the x and y axes as k_x = (N_x+1)/2 and k_y = (N_y+1)/2, and w_{i,j} are the filter coefficients. The frequency components are combined in a proportion determined by the corresponding weighting coefficients w_high and w_low. The high-frequency components can be related to the spatial information in an image, whereas the low-frequency components correspond to the spectral information. Factors that influence the performance of this technique are the filter coefficients (w_{i,j}) and the window size (N_x and N_y), in addition to the weights (w_high and w_low) that determine the amount of each frequency component being combined.
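A compact sketch of HPF sharpening that follows the equation above, assuming co-registered arrays; a simple box filter provides the low-pass step and its complement the high-pass step, and the kernel size and the weights w_high, w_low are illustrative choices.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def hpf_sharpen(ms, pan, size=5, w_high=1.0, w_low=1.0):
    """High-pass-filter sharpening sketch.

    ms  : (bands, rows, cols) multispectral bands resampled to the PAN grid
    pan : (rows, cols) panchromatic band
    """
    pan = pan.astype(float)
    high = pan - uniform_filter(pan, size=size)              # high-frequency PAN detail
    fused = np.empty_like(ms, dtype=float)
    for b in range(ms.shape[0]):
        low = uniform_filter(ms[b].astype(float), size=size)  # low-frequency MS content
        fused[b] = w_high * high + w_low * low
    return fused
```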

2.3.5.6 Zhang Pansharpening data fusion


The fusion between the panchromatic and the multispectral QuickBird images was performed by using a data fusion algorithm specifically developed for VHR satellite images by Zhang (2004). The author found that the existing data fusion approaches were not adequate for VHR satellite data for two main reasons: (i) colour distortion, and (ii) dataset dependency.
Zhang (2004) devised a statistics-based fusion technique able to solve the two previously quoted problems. Such a data fusion algorithm differs from the other techniques in the following two principal ways:
(i) first, to reduce the colour distortion, it utilizes the least squares technique to find the best fit between the grey values of the image bands being fused and to adjust the contribution of the individual bands to the fusion result;

(ii) second, to eliminate the problem of dataset dependency, it employs a set of statistical approaches to estimate the grey-value relationship between all the input bands.
This algorithm was adopted by DigitalGlobe [http://www.pcigeomatics.com/support_center/tech_papers/techpapers_main.php] and is also available as a PCI Geomatica routine (PANSHARP). In the PANSHARP routine, if the original MS and Pan images are geo-referenced, the resampling can be accomplished together with the fusion in a single step. All the MS bands can be fused at one time, or the fusion can be performed solely on user-specified MS bands.
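The exact PANSHARP implementation is proprietary, but the least-squares idea described above can be sketched as follows: a synthetic intensity is fitted to the PAN band from the MS bands, and per-band injection gains are then derived from simple statistics. This is only an illustrative approximation of a statistics-based fusion, not Zhang's published algorithm.

```python
import numpy as np

def statistics_based_fuse(ms, pan):
    """Least-squares, statistics-based pansharpening sketch (illustrative only).

    ms  : (bands, rows, cols) MS bands resampled to the PAN grid
    pan : (rows, cols) PAN band
    """
    bands = ms.shape[0]
    X = ms.reshape(bands, -1).astype(float)
    p = pan.reshape(-1).astype(float)

    # Least-squares fit of a synthetic intensity to the PAN band
    A = np.vstack([X, np.ones_like(p)])               # band values plus an offset term
    coeffs, *_ = np.linalg.lstsq(A.T, p, rcond=None)
    intensity = A.T @ coeffs

    # Per-band injection gains from the covariance with the synthetic intensity
    detail = p - intensity
    fused = np.empty_like(X)
    for b in range(bands):
        gain = np.cov(X[b], intensity)[0, 1] / (intensity.var() + 1e-12)
        fused[b] = X[b] + gain * detail
    return fused.reshape(ms.shape)
```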

2.3.5.7 Wavelet transforms


Wavelet transforms provide a framework in which an image is decomposed into a set of levels, each corresponding to a coarser resolution band. In the case of fusing an MS image with a high-resolution PAN image, the PAN image is first decomposed into a set of low-resolution PAN images with the corresponding wavelet coefficients (spatial details) for each level.
Individual bands of the MS image then replace the low-resolution Pan at the resolution level
of the original MS image. The high resolution spatial detail is injected into each MS band by
performing a reverse wavelet transform on each MS band together with the corresponding
wavelet coefficients. In the wavelet-based fusion schemes, detail information is extracted
from the PAN image using wavelet transforms and injected into the MS image. Distortion of
the spectral information is minimized compared to the standard methods.
In order to achieve optimum fusion results, various wavelet-based fusion schemes have been tested by many researchers.
In general, as a typical feature-level fusion method, wavelet-based fusion can perform evidently better than conventional methods in terms of minimizing colour distortion and denoising effects. It has been one of the most popular fusion methods in remote sensing in recent years and has become a standard module in many commercial image processing software packages, such as ENVI, PCI and ERDAS. Problems and limitations associated with it include: (1) its computational complexity compared to the standard methods; (2) the spectral content of small objects is often lost in the fused images; (3) it often requires the user to determine appropriate values for certain parameters (such as thresholds). The development of more sophisticated wavelet-based fusion algorithms (such as the Ridgelet, Curvelet and Contourlet transforms) could improve the performance, but these new schemes may entail greater complexity in the computation and in the setting of parameters.
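A minimal sketch of the substitution scheme described above using the PyWavelets package: each MS band (resampled to the PAN grid) provides the approximation coefficients, while the PAN decomposition provides the detail coefficients injected by the inverse transform; the wavelet ('haar') and the number of levels are illustrative choices.

```python
import numpy as np
import pywt

def wavelet_fuse(ms, pan, wavelet="haar", levels=2):
    """Wavelet pan-sharpening sketch: keep PAN details, substitute the MS approximation.

    ms  : (bands, rows, cols) MS bands resampled to the PAN grid
    pan : (rows, cols) PAN band
    """
    fused = []
    for band in ms:
        pan_coeffs = pywt.wavedec2(pan.astype(float), wavelet, level=levels)
        ms_coeffs = pywt.wavedec2(band.astype(float), wavelet, level=levels)
        # The MS approximation replaces the PAN approximation; the PAN detail
        # coefficients carry the high-resolution spatial information.
        pan_coeffs[0] = ms_coeffs[0]
        fused.append(pywt.waverec2(pan_coeffs, wavelet))
    return np.stack(fused)
```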

2.3.5.8 Artificial neural network
Artificial neural networks (ANNs) have proven to be a more powerful and self-adaptive
method of pattern recognition as compared to traditional linear and simple nonlinear
analyses. The ANN-based method employs a nonlinear response function that iterates many
times in a special network structure in order to learn the complex functional relationship
between input and output training data.
The general scheme of the ANN-based image fusion method can be summarized as follows: the input layer has several neurons, which represent the feature factors extracted and normalized from image A and image B; the hidden layer has several neurons; and the output layer has one neuron (or more neurons). The ith neuron of the input layer is connected to the jth neuron of the hidden layer by the weight W_ij, and the weight between the jth neuron of the hidden layer and the tth neuron of the output layer is V_jt (in this case t = 1). The weighting function is used to simulate and recognize the response relationship between the features of the fused image and the corresponding features of the original images (image A and image B).
As the first step of ANN-based data fusion, the two registered images are decomposed into several blocks of size M by N. Features of the corresponding blocks in the two original images are then extracted, and the normalized feature vectors fed to the neural network are constructed. The next step is to select some vector samples to train the neural network. Many neural network models have been proposed for image fusion, such as the BP, SOFM and ARTMAP neural networks.
The ANN-based fusion method exploits the pattern recognition capabilities of artificial neural networks, while the learning capability of neural networks makes it feasible to customize the image fusion process. Many applications have indicated that ANN-based fusion methods have advantages over traditional statistical methods, especially when the input multi-sensor data are incomplete or very noisy. The ANN often serves as an efficient decision-level fusion tool thanks to its self-learning characteristics, especially in land use/land cover classification. In addition, the multiple-input multiple-output framework makes it a possible approach for fusing high-dimensional data, such as long-term time-series data or hyperspectral data.
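As a toy illustration of the block-based workflow described above, the sketch below trains a small multilayer perceptron (scikit-learn's MLPRegressor) to map per-block features of two input images to a target fusion value; the chosen features (block mean and standard deviation), the block size and the training target are hypothetical choices made only for the example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def block_features(img, block=8):
    """Mean and standard deviation of each block x block tile (toy features)."""
    rows, cols = (s // block for s in img.shape)
    tiles = img[:rows * block, :cols * block].reshape(rows, block, cols, block)
    return np.stack([tiles.mean(axis=(1, 3)).ravel(),
                     tiles.std(axis=(1, 3)).ravel()], axis=1)

def train_fusion_net(img_a, img_b, target):
    """Fit an MLP that predicts the fused block value from features of images A and B."""
    X = np.hstack([block_features(img_a), block_features(img_b)])
    y = block_features(target)[:, 0]          # e.g. the mean of each reference block
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    return net.fit(X, y)
```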

2.3.5.9 Data Fusion Performance evaluation

Ideally, the set of fused multispectral images should be as identical as possible to the set of multispectral images that the corresponding (reference) sensor would observe at the high spatial resolution of the panchromatic band. As no multispectral reference images are available at the requested higher spatial resolution, assessing the quality of the fused products is not straightforward. Several score indices or figures of merit have been designed over the years (see Thomas and Wald, 2007) to evaluate the performance of fused images. Both intra-band and inter-band indices have been set up in order to measure, respectively, spatial distortions (radiometric and geometric distortions) and spectral distortions (colour distortions).
In order to assess the performance of data fusion algorithms, three properties should be verified, as expressed by Wald et al. (1997):
1. The data fusion products, once degraded to their original resolution, should be equal to the
original.
2. The data fusion image should be as identical as possible to the MS image that would be
acquired by the corresponding sensor with the high spatial resolution of the Pan sensor.
3. The MS set of fused images should be as identical as possible to the set of MS images that
would be acquired by the corresponding sensor with the high spatial resolution of Pan.
As no multispectral reference images are available at the requested higher spatial resolution, the verification of the second and the third properties is not straightforward. In order to overcome this drawback, three different methodological approaches can be followed: the Wald protocol, the Zhou protocol and, finally, the QNR (Quality with No Reference) index devised by Alparone et al. (2007).

2.3.5.9a Wald protocol


In order to solve the problems linked to the unavailability of multispectral reference images, Wald et al. (1997; see also Alparone et al., 2007b) suggested a protocol to be applied in order to evaluate the quality of data fusion products. Such a protocol is based on the following three steps:
1. spatial degradation of both the Pan and the MS images by the same factor;
2. fusion of the MS images at the degraded scale;
3. comparison of the fused MS images with the original MS images, which act as the reference.
The Wald protocol assumes a scale-invariance behaviour.

This means that the performance of a fusion method is assumed to be unchanged when the algorithm is applied at the full spatial resolution. Nevertheless, in the context of remote sensing for archaeology, the small features that represent a large part of the archaeological heritage can be lost after degrading both the Pan and the MS images. In such situations the archaeological features are missed and the evaluation of the data fusion results cannot be performed over the targets of interest. To avoid the degradation, one can consider the two alternative approaches described in the two following sections.
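The three steps of the Wald protocol can be sketched as a simple evaluation loop. Here the degradation is a block average by the PAN/MS resolution ratio, fuse_fn stands for any fusion routine (assumed to handle the resampling of the degraded MS to the degraded PAN grid internally), and quality_fn is any band-wise quality index, for example the Q index sketched in section 2.3.5.9c.

```python
import numpy as np

def degrade(img, factor):
    """Block-average degradation by an integer resolution factor (sketch)."""
    r, c = (s // factor * factor for s in img.shape[-2:])
    img = img[..., :r, :c]
    shape = img.shape[:-2] + (r // factor, factor, c // factor, factor)
    return img.reshape(shape).mean(axis=(-3, -1))

def wald_protocol(ms, pan, factor, fuse_fn, quality_fn):
    """Degrade, fuse at the degraded scale, compare with the original MS bands."""
    ms_low = degrade(ms, factor)             # step 1: degrade the MS bands ...
    pan_low = degrade(pan, factor)           # ... and the PAN by the same factor
    fused_low = fuse_fn(ms_low, pan_low)     # step 2: fuse at the degraded scale
    # step 3: compare each fused band with the original MS band (the reference)
    return [quality_fn(f, ref) for f, ref in zip(fused_low, ms)]
```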

2.3.5.9b Zhou protocol


As an alternative to Wald's protocol, the problem of measuring the fusion quality may be approached at the full spatial scale, without any degradation, by applying Zhou's protocol (Zhou et al., 1998). Such a protocol is based on the following three criteria:
(1) Both the spectral and the spatial quality are evaluated, but by using separate scores obtained from the available data: the first from the low-resolution MS bands and the second from the high-resolution Pan image.
(2) The spectral quality is evaluated for each band by computing an absolute cumulative difference between the fused and the input MS images.
(3) The spatial quality is obtained as the correlation coefficient (CC) computed between the spatial details of the Pan image and those of each fused MS band.
Such spatial details are extracted by using a Laplacian filter. Unfortunately, some problems can arise when using Zhou's protocol (Alparone et al., 2007b). Firstly, the two quality measures follow opposite trends. Secondly, at the degraded scale, the obtained results may not be in agreement with objective quality indices.
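Criterion (3) can be sketched directly with SciPy's Laplacian filter: the correlation coefficient between the Laplacian-filtered PAN image and the Laplacian-filtered fused band serves as the spatial-quality score.

```python
import numpy as np
from scipy.ndimage import laplace

def zhou_spatial_quality(fused_band, pan):
    """Correlation between the Laplacian details of a fused band and of the PAN image."""
    d_fused = laplace(fused_band.astype(float))
    d_pan = laplace(pan.astype(float))
    return np.corrcoef(d_fused.ravel(), d_pan.ravel())[0, 1]
```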

2.3.5.9c QNR index (Alparone et al., 2007)


The QNR (Quality with No Reference) index devised by Alparone et al. (2007) is a blind index capable of jointly measuring the spectral and spatial quality at the full scale. This index should make it possible to overcome the drawbacks that can arise when using Zhou's protocol. The QNR computes both the spatial and the spectral distortion from the quality index (Q) of Wang and Bovik (2002).
This index combines the correlation coefficient with luminance and contrast distortion. It was devised for image fusion to assess the quality of the output image, as well as for evaluating image processing systems and algorithms. Given an image X and its reference image Y, the quality index proposed by Wang and Bovik is calculated as:

Q = \frac{\left(2\mu_x\mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)}

where C_1 = (K_1 L)^2 and C_2 = (K_2 L)^2.


Here μ_x and μ_y indicate the means of the image X and of its reference image Y, σ_x and σ_y are the standard deviations, σ_xy represents the covariance between the two images, and L is the dynamic range of the image pixel values; K_1 << 1 and K_2 << 1 are two constants, chosen equal to 0.01 and 0.03, respectively.
Although the values selected for K_1 and K_2 are arbitrary, experience shows that the quality index is insensitive to variations of K_1 and K_2. Note that C_1 and C_2 are introduced solely to stabilize the measure, in other words to prevent the denominator from approaching zero in flat regions.
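A direct NumPy sketch of the index above, computed globally over the whole image; in practice the index is usually computed over sliding windows and averaged, a refinement omitted here, and the default dynamic range L = 255 is an assumption for 8-bit data.

```python
import numpy as np

def q_index(x, y, K1=0.01, K2=0.03, L=255):
    """Wang-Bovik-style quality index between an image x and its reference y (global form)."""
    x, y = x.astype(float).ravel(), y.astype(float).ravel()
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = np.mean((x - mx) * (y - my))          # covariance between x and y
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / ((mx**2 + my**2 + C1) * (vx + vy + C2))
```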
To measure the overall image quality, the mean quality index can be rewritten as the product of three factors, which can be regarded as relatively independent:

Q(x,y) = f\big(l(x,y),\,c(x,y),\,s(x,y)\big) = \frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3}\;\cdot\;\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1}\;\cdot\;\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2}

where C_3 is a small positive constant, like C_1 and C_2.


In particular, among the three factors of the equation above, the first (varying between -1 and 1) represents the correlation coefficient between the two images x and y; the second (varying between 0 and 1) measures the similarity between the mean luminance values of x and y; and the third (varying between 0 and 1) measures the contrast similarity.
The rationale of the QNR index devised by Alparone et al. (2007) is that the Q index calculated between any two spectral bands, and between each band and the Pan image, should be unchanged after fusion. In order to obtain a single figure, the spectral and the spatial distortion indices are complemented and combined to produce a single index that measures the global quality of the fused image.
In detail, the spectral distortion is computed as follows:

- The spectral distortion is obtained by computing the difference of the Q values calculated on the fused MS bands and on the input MS bands, the latter re-sampled to the same spatial resolution as the Pan image.
- Q is calculated for each couple of bands of the fused and of the re-sampled MS data, forming two matrices whose main diagonals are equal to 1.
- The measure of spectral distortion D_λ is computed as a value proportional to the p-norm of the difference between the two matrices:


D_\lambda = \sqrt[p]{\frac{1}{L(L-1)}\sum_{l=1}^{L}\;\sum_{\substack{r=1 \\ r \neq l}}^{L}\left|\,Q\big(\hat{G}_l,\hat{G}_r\big) - Q\big(\tilde{G}_l,\tilde{G}_r\big)\right|^{p}}

where L is the number of spectral bands processed, Q(\hat{G}_l,\hat{G}_r) denotes the Q index calculated for each couple of bands of the fused MS data, and Q(\tilde{G}_l,\tilde{G}_r) the Q index calculated for the corresponding couple of bands of the re-sampled MS data.
The spatial distortion is computed twice: 1) between each fused MS band and the Pan image; and 2) between each input MS band and the spatially degraded Pan image. The spatial distortion D_S is then calculated as a value proportional to the q-norm of the differences:

D_S = \sqrt[q]{\frac{1}{L}\sum_{l=1}^{L}\left|\,Q\big(\hat{G}_l,P\big) - Q\big(\tilde{G}_l,\tilde{P}\big)\right|^{q}}

where L is the number of spectral bands processed, Q(\hat{G}_l,P) denotes the Q index calculated between each fused MS band and the Pan image, and Q(\tilde{G}_l,\tilde{P}) denotes the Q index calculated between each input MS band and the spatially degraded Pan image.
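Using the q_index function sketched above, the two distortion indices and a final score (commonly combined as QNR = (1 - D_λ)^α (1 - D_S)^β, a combination not spelled out in the text above) can be computed as follows; the exponents and the norms p, q default to 1 here, as is common practice.

```python
import numpy as np

def qnr(fused, ms_resampled, pan, pan_degraded, p=1, q=1, alpha=1, beta=1):
    """QNR sketch: spectral distortion D_lambda, spatial distortion D_S, combined score.

    fused, ms_resampled : (L, rows, cols) fused and input MS bands on the PAN grid
    pan                 : (rows, cols) PAN band
    pan_degraded        : (rows, cols) PAN degraded to the MS resolution and resampled back
    """
    L = fused.shape[0]

    # Spectral distortion: inter-band Q differences between fused and input MS bands
    d_lambda = 0.0
    for l in range(L):
        for r in range(L):
            if r != l:
                d_lambda += abs(q_index(fused[l], fused[r]) -
                                q_index(ms_resampled[l], ms_resampled[r])) ** p
    d_lambda = (d_lambda / (L * (L - 1))) ** (1.0 / p)

    # Spatial distortion: Q against the PAN (fused) vs Q against the degraded PAN (input)
    d_s = np.mean([abs(q_index(fused[l], pan) -
                       q_index(ms_resampled[l], pan_degraded)) ** q
                   for l in range(L)]) ** (1.0 / q)

    return (1 - d_lambda) ** alpha * (1 - d_s) ** beta
```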

Hall, L.; Llinas, J. An introduction to multisensor data fusion. Proc. IEEE 1997, 85, 6-23.

Pohl, C.; Van Genderen, J.L. Multisensor image fusion in remote sensing: concepts, methods and applications. Int. J. Remote Sens. 1998, 19, 823-854.

Dai, X.; Khorram, S. Data fusion using artificial neural networks: a case study on multitemporal change analysis. Comput. Environ. Urban Syst. 1999, 23, 19-31.

