Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
IMAGE TRANSFORMS
CONTENTS
2.1. Unitary Transforms
2.1.1. One dimensional signals
2.1.2. Two dimensional signals (images)
2.1.3. Fundamental properties of unitary transforms
2.2. The Two Dimensional Fourier Transform
2.2.1. Continuous space and continuous frequency
2.2.2. Discrete space and continuous frequency
2.3. Discrete space and discrete frequency
2.3.1. The two dimensional Discrete Fourier Transform (2-D DFT)
2.3.2. Properties of the 2-D DFT
2.3.3. The importance of the phase in 2-D DFT.
2.4. The Discrete Cosine Transform (DCT)
2.4.1..One dimensional signals
2.4.2. Two dimensional signals (images)
2.4.3. Properties of the DCT transform
2.5. Walsh Transform (WT)
2.5.1. One dimensional signals
2.5.2. Two dimensional signals
2.5.3. Properties of the Walsh Transform
2.6. Hadamard Transform (HT)
2.6.1..Properties of the Hadamard Transform
2.7. Karhunen-Loeve (KLT) or Hotelling Transform
2.7.1. The case of one realisation of a signal or image
2.7.2. Properties of the Karhunen-Loeve transform
2.8. Slant Transform
2.9. Haar transform
2.9.1. Haar Matrix
2.9.2. Properties of the Haar transform
2.10. The Discrete Sine Transform (DST)
2.10.1. Properties of the DST transform
2.11. Question bank
TECHNICAL TERMS
1. Image
An image may be defined as two dimensional light intensity function f(x, y)
2. Pixel
A digital image is composed of a finite number of elements each of which has
particular location or value
3. Path
Path from pixel p with co-ordinates (x, y) to pixel q with co- ordinates (s, t) is
a sequence of distinct pixels with co-ordinates.
4. Image Transform
An image transform provides a set of coordinates or basis vectors for vector space.
5. Baud rate
The speed of data transmission, (i.e.) bits per second.
6. Addressable resolution
The maximum number of locations addressable at one pixel per location.
7. Pixelation
The degree of visibility of these blocky pixels depends upon the viewing distance
from the cathode-ray tube
IMAGE TRANSFORMS
one
dimensional
f f (0) f (1) f ( N 1)
{ f ( x), 0 x N 1}
sequence
represented
as
vector
g T f g (u ) T (u , x) f ( x), 0 u N 1
x 0
Where g (u ) is the transform (or transformation) of f ( x ) , and T (u, x ) is the so called forward
transformation kernel. Similarly, the inverse transform is the relation
N 1
f ( x ) I ( x, u ) g (u ), 0 x N 1
u 0
The matrix T is called unitary, and the transformation is called unitary as well. It can be proven
(how?) that the columns (or rows) of an N N unitary matrix are orthonormal and therefore, form a
complete set of basis vectors in the N dimensional vector space.
In that case
f T
N 1
g f ( x) T (u, x) g (u )
u 0
vectors of T .
g (u , v) T (u , v, x, y ) f ( x, y )
x 0 y 0
N 1 N 1
f ( x, y ) I ( x, y, u , v) g (u , v)
u 0 v 0
Where, again,
T (u , v, x, y )
and
I ( x, y , u , v )
T (u , v, x, y ) T1 (u , x )T2 (v, y )
It is said to be symmetric if T1 is functionally equal to T2 such that
T (u , v, x, y ) T1 (u , x )T1 (v, y )
The same comments are valid for the inverse kernel.
If the kernel T (u , v, x, y ) of an image transform is separable and symmetric, then the transform
N 1 N 1
N 1 N 1
x 0 y 0
x 0 y 0
t ij T1 (i , j ) . If, in addition, T 1 is a unitary matrix then the transform is called separable unitary and the
f T 1 g T 1
Thus, a unitary transformation preserves the signal energy. This property is called energy preservation
property.
This means that every unitary transformation is simply a rotation of the vector
dimensional vector space.
For the 2-D case the energy preservation property is written as
N 1 N 1
N 1 N 1
f ( x , y ) g ( u, v )
x 0 y 0
u 0 v 0
in the N -
f ( x , y ) of two variables. If
f ( x , y ) is
continuous and integrable and F ( u, v ) is integrable, the following Fourier transform pair exists:
F (u , v ) f ( x, y )e j 2 (ux vy ) dxdy
f ( x, y )
1
j 2 ( ux vy )
dudv
F (u , v)e
( 2 ) 2
In general F ( u, v ) is a complex-valued function of two real frequency variables u, v and hence, it can
be written as:
F (u , v ) R (u , v) jI (u , v )
The amplitude spectrum, phase spectrum and power spectrum, respectively, are defined as follows.
R 2 ( u, v ) I 2 ( u, v )
F ( u, v )
I (u , v )
R (u , v )
(u , v ) tan 1
P( u, v ) F ( u, v )
R 2 ( u, v ) I 2 ( u, v )
f ( x, y )e j ( xu vy )
F ( u, v )
x y
f ( x, y )
1
( 2 ) 2
F (u, v )e
j ( xu vy )
dudv
u v
lim lim
N2
f ( x , y )e
j ( xu vy )
N1 N 2 x N y N
1
2
and v .
F (u , v)
1 M 1 N 1
f ( x, y )e j 2 (ux / M vy / N )
MN x 0 y 0
u 0, , M 1 , v 0, , N 1
f ( x , y ) F ( u, v )e j 2 ( ux / M vy / N )
u 0 v 0
F ( u, v )
f ( x, y )
1 N 1 N 1
j 2 ( ux vy )/ N
f ( x , y )e
N x 0 y 0
1 N 1 N 1
j 2 ( ux vy )/ N
F ( u, v )e
N u 0 v 0
It is straightforward to prove that the two dimensional Discrete Fourier Transform is separable, symmetric
and unitary.
2.3.2. Properties of the 2-D DFT
Most of them are straightforward extensions of the properties of the 1-D Fourier Transform. Advise
any introductory book on Image Processing.
2.3.3. The importance of the phase in 2-D DFT.
The Fourier transform of a sequence is, in general, complex-valued, and the unique representation
of a sequence in the Fourier transform domain requires both the phase and the magnitude of the Fourier
transform. In various contexts it is often desirable to reconstruct a signal from only partial domain
information. Consider a 2-D sequence f ( x , y ) with Fourier transform F (u, v ) f ( x , y ) so that
F (u , v ) { f ( x, y} F (u , v ) e
j f ( u , v )
It has been observed that a straightforward signal synthesis from the Fourier transform phase
f (u , v ) alone often captures most of the intelligibility of the original image
F (u , v )
f ( x , y ) (why?). A
generally capture the original signals intelligibility. The above observation is valid for a large number of
signals (or images). To illustrate this, we can synthesize the phase-only signal
magnitude-only signal f m ( x, y ) by
f p ( x, y ) 1 1e
j f ( u , v )
f p ( x, y ) and the
f m ( x, y ) 1 F (u , v ) e j 0
Consider two images f ( x , y ) and g ( x, y ) . From these two images, we synthesize two other images
f 1 ( x, y ) and g 1 ( x, y ) by mixing the amplitudes and phases of the original images as follows:
F (u, v) e
f1 ( x, y ) 1 G (u , v ) e
g1 ( x, y ) 1
j f ( u , v )
j g ( u , v )
intelligibility of g ( x, y ) .
(2 x 1)u
, u 0,1, , N 1
2N
N 1
C (u ) a (u ) f ( x) cos
x0
1/ N
u0
2/ N
u 1, , N 1
a (u )
( 2 x 1)u
2N
N 1
f ( x) a (u )C (u ) cos
u 0
( 2 x 1)u
( 2 y 1)v
cos
2N
2N
C (u , v) a (u ) a (v) f ( x, y ) cos
x0 y 0
(2 x 1)u
(2 y 1)v
cos
2N
2N
N 1 N 1
f ( x, y ) a (u )a (v )C (u , v) cos
u 0 v 0
The DCT is a real transform. This property makes it attractive in comparison to the Fourier transform.
The DCT has excellent energy compaction properties. For that reason it is widely used in image
compression standards (as for example JPEG standards).
There are fast algorithms to compute the DCT, similar to the FFT for computing the DFT.
and
and
we need
bits to
we can write:
W (u )
1 N 1
n 1
b ( x )b
(u )
f ( x) (1) i n 1i or
N x 0
i 0
n 1
bi ( x )bn 1i (u )
1 N 1
W (u )
f ( x)(1) i 0
N x 0
The array formed by the Walsh kernels is again a symmetric matrix having orthogonal rows and columns.
Therefore, the Walsh transform is and its elements are of the form T (u , x )
n 1
( 1)
i 0
bi ( x ) bn 1i ( u )
. You
the rows of the matrix T which are the vectors T (u ,0) T (u ,1) T (u , N 1) have the form of
square waves. As the variable
square waves frequency increases as well. For example for u 0 we see that
(u )10 bn 1 (u )bn 2 (u ) b0 (u ) 2 00 0 2
T (0, x ) 1 and W (0)
and hence,
bn 1 i (u ) 0 , for any
i . Thus,
1 N 1
f ( x) . We see that the first element of the Walsh transform in the mean
N x 0
of the original function f (x ) (the DC value) as it is the case with the Fourier transform.
The inverse Walsh transform is defined as follows.
N 1
n 1
f ( x) W (u ) ( 1) bi ( x )bn 1 i (u ) or
u 0
i 0
n 1
bi ( x )bn 1 i (u )
f ( x) W (u )( 1) i 0
N 1
u 0
W (u, v )
1 N 1 N 1
n 1
( b ( x )b
( u ) bi ( y ) bn1i ( v ))
f ( x, y ) ( 1) i n1i
or
N x 0 y 0
i 0
n 1
(bi ( x ) bn 1 i ( u ) bi ( y ) bn 1 i ( v ))
1 N 1 N 1
W ( u, v )
f ( x, y )( 1) i 0
N x 0 y 0
The inverse Walsh transform is defined as follows for two dimensional signals.
f ( x, y )
1 N 1 N 1
n 1
( b ( x )b
( u ) bi ( y ) bn 1i ( v ))
W (u, v ) ( 1) i n1i
or
N u 0 v 0
i 0
n 1
(bi ( x ) bn 1i ( u ) bi ( y ) bn1i ( v ))
1 N 1 N 1
f ( x, y )
W (u, v )( 1) i0
N u 0 v 0
Unlike the Fourier transform, which is based on trigonometric terms, the Walsh transform consists of a
series expansion of basic functions whose values are only 1 or 1 and they have the form of square
waves. These functions can be implemented more efficiently in a digital environment than the
exponential basis functions of the Fourier transform.
The forward and inverse Walsh kernels are identical except for a constant multiplicative factor of
1
N
The forward and inverse Walsh kernels are identical for 2-D signals. This is because the array formed
by the kernels is a symmetric matrix having orthogonal rows and columns, so its inverse array is the
same as the array itself.
The concept of frequency exists also in Walsh transform basis functions. We can think of frequency as
the number of zero crossings or the number of transitions in a basis vector and we
call this number sequence. The Walsh transform exhibits the property of energy compaction as all the
transforms that we are currently studying.
For the fast computation of the Walsh transform there exists an algorithm called Fast Walsh Transform
(FWT). This is a straightforward modification of the FFT. Advise any introductory book for your own
interest.
H ( u, v )
1 N 1 N 1
n 1
( b ( x ) b ( u ) bi ( y ) bi ( v ))
n
f ( x, y ) ( 1) i i
, N 2 or
N x 0 y 0
i 0
n 1
(bi ( x ) bi ( u ) bi ( y ) bi ( v ))
1 N 1 N 1
i 0
H (u, v )
f
(
x
,
y
)(
1
)
N x 0 y 0
Inverse
f ( x, y )
1 N 1 N 1
n 1
( b ( x ) b ( u ) bi ( y ) bi ( v ))
H (u, v ) ( 1) i i
etc.
N u 0 v 0
i 0
Most of the comments made for Walsh transform are valid here.
The Hadamard transform differs from the Walsh transform only in the order of basis functions. The
order of basis functions of the Hadamard transform does not allow the fast computation of it by using a
straightforward modification of the FFT. An extended version of the Hadamard transform is the
Ordered Hadamard Transform for which a fast algorithm called Fast Hadamard Transform (FHT) can
be applied.
An important property of Hadamard transform is that, letting H N represent the matrix of order N ,
the recursive relationship is given by the expression
H
H 2N N
HN
HN
H N
Consequently, the KL transform is also called the Hotelling transform or the method of principal
components. The term KLT is the most widely used.
x1
x
2
x
xn
The mean vector of the population is defined as
m x E x
The operator E refers to the expected value of the population calculated theoretically using the
probability density functions (pdf) of the elements x i and the joint probability density functions between
the elements xi and x j .
The covariance matrix of the population is defined as
C x E ( x m x )( x m x )T
Because x is
n -dimensional,
element cii of C x is the variance of x i , and the element cij of C x is the covariance between the
elements x i and x j . If the elements xi and x j are uncorrelated, their covariance is zero and, therefore,
cij c ji 0 .
For M vectors from a random population, where M is large enough, the mean vector and
covariance matrix can be approximately calculated from the vectors by using the following relationships
where all the expected values are approximated by summations
mx
Cx
1
M
x
k 1
1 M
T
T
xk xk m x mx
M k 1
Very easily it can be seen that C x is real and symmetric. In that case a set of
you are familiar with that term) eigenvectors always exists. Let e i and i , i 1,2, , n , be this set of
eigenvectors and corresponding eigenvalues of C x , arranged in descending order so that i i 1 for
i 1,2, , n 1 . Let A be a matrix whose rows are formed from the eigenvectors of C x , ordered so
that the first row of A is the eigenvector corresponding to the largest eigenvalue, and the last row the
eigenvector corresponding to the smallest eigenvalue.
Suppose that A is a transformation matrix that maps the vectors x' s into vectors
y' s
by using the
following transformation
y A( x m x )
The above transform is called the Karhunen-Loeve or Hotelling transform. The mean of the
vectors resulting from the above transformation is zero (try to prove that)
my 0
and C y is a diagonal matrix whose elements along the main diagonal are the eigenvalues of C x (try to
prove that)
Cy
The off-diagonal elements of the covariance matrix are 0 , so the elements of the
vectors are
uncorrelated.
Lets try to reconstruct any of the original vectors x from its corresponding
of A are orthonormal vectors (why?), then A 1 AT , and any vector x can by recovered from its
corresponding vector
x A y mx
Suppose that instead of using all the eigenvectors of C x we form matrix A K from the K
eigenvectors corresponding to the K largest Eigen values, yielding a transformation matrix of order
K n . The
vectors would then be K dimensional, and the reconstruction of any of the original
is
The mean square error between the perfect reconstruction x and the approximate reconstruction x
j 1
j 1
j K 1
ems j j j .
By using A K instead of A for the KL transform we achieve compression of the available data.
is a vector obtained by
lexicographic ordering of the pixels f ( x, y ) within a block of size M M (placing the rows of the
block sequentially).
The mean vector of the random field inside the block is a scalar that is estimated by the
approximate relationship
2
1 M
m f 2 f (k )
M k 1
and the covariance matrix of the 2-D random field inside the block is C f where
2
1 M
cii 2 f (k ) f (k ) m 2f
M k 1
and
1
cij c i j 2
M
M2
f (k ) f (k i j ) m
k 1
2
f
After knowing how to calculate the matrix C f , the KLT for the case of a single realisation is the same as
described above.
Its basis functions depend on the covariance matrix of the image, and hence they have to recomputed
and transmitted for every image.
Perfect decorrelation is not possible, since images can rarely be modeled as realizations of ergodic
fields.
Where
an=(3N2/4(N2-1))1/2
bn=((N2-1)/ 4(N2-1))1/2
i=N/2,
sequency=2
.The shape of
and k is in a range of
When
; when
, the Haar
function is defined as
From the above equation, one can see that p determines the amplitude and width of the nonzero part of the function, while q determines the position of the non-zero part of the Haar function.
2.9.1. Haar Matrix
The discrete Haar functions formed the basis of the Haar matrix H
Where
and
, where
is an
matrix and
is a
matrix, is
expressed as
When
Where
is a
matrix, and
is a Haar function.
, i.e.
An un-normalized 8-point Haar matrix
From the definition of the Haar matrix H, one can observe that, unlike the Fourier transform, H
matrix has only real element (i.e., 1, -1 or 0) and is non-symmetric.
2.10. THE DISCRETE SINE TRANSFORM (DST)
This is a transform that is similar to the Fourier transform in the sense that the new independent
variable represents again frequency. The DST is defined below.
The DST is a real transform. This property makes it attractive in comparison to the Fourier transform.
The DST has excellent energy compaction properties. For that reason it is widely used in image
compression standards (as for example JPEG standards).
There are fast algorithms to compute the DST, similar to the FFT for computing the DFT.