Neural maps in remote sensing image analysis
Thomas Villmann a,*, Erzsebet Merenyi b, Barbara Hammer c

a Klinik für Psychotherapie, Universität Leipzig, Karl-Tauchnitz-Str. 25, 04107 Leipzig, Germany
b Department of Electrical and Computer Engineering, Rice University, 6100 Main Street, MS 380, Houston, TX, USA
c Department of Mathematics/Computer Science, University of Osnabrück, Albrechtstraße 28, 49069 Osnabrück, Germany
Abstract
We study the application of self-organizing maps (SOMs) for the analysis of remote sensing spectral images. Advanced airborne and satellite-based imaging spectrometers produce very high-dimensional spectral signatures that provide key information for many scientific investigations about the surface and atmosphere of Earth and other planets. These new, sophisticated data demand new and advanced approaches to cluster detection, visualization, and supervised classification. In this article we concentrate on the issue of faithful topological mapping in order to avoid false interpretations of cluster maps created by an SOM. We describe several new extensions of the standard SOM developed in the past few years: the growing SOM, magnification control, and generalized relevance learning vector quantization, and demonstrate their effect on both low-dimensional traditional multi-spectral imagery and ~200-dimensional hyperspectral imagery.
© 2003 Published by Elsevier Science Ltd.
Keywords: Self-organizing map; Remote sensing; Generalized relevance learning vector quantization
1. Introduction
In geophysics, geology, astronomy, as well as in many environmental applications, air- and satellite-borne remote sensing spectral imaging has become one of the most advanced tools for collecting vital information about the surface of Earth and other planets. Spectral images consist of an array of multi-dimensional vectors, each of which is assigned to a particular spatial area, reflecting the response of a spectral sensor for that area at various wavelengths (Fig. 1). Classification of intricate, high-dimensional spectral signatures has turned out to be far from trivial. Discrimination among many surface cover classes and discovery of spatially small, interesting spectral species have proved to be an insurmountable challenge to many traditional clustering and classification methods. By customary measures (such as, for example, principal component analysis (PCA)) the intrinsic spectral dimensionality of hyperspectral images appears to be surprisingly low, 5-10 at most. Yet dimensionality reduction to such low numbers has not been successful in terms of preservation of important class distinctions. The spectral bands, many of which are highly correlated, may lie on a low-dimensional but non-linear manifold, which is a scenario that eludes many classical approaches. In addition, these data comprise
enormous volumes and are frequently noisy. This motivates
research into advanced and novel approaches for remote
sensing image analysis and, in particular, neural networks
(Merenyi, 1999). Generally, artificial neural networks attempt to replicate the computational power of biological neural networks, with the following important (incomplete) list of features:

- adaptivity: the ability to change the internal representation (connection weights, network structure) if new data or information become available;
- robustness: handling of missing, noisy or otherwise corrupted data;
- power/speed: handling of large data volumes in acceptable time due to inherent parallelism;
- non-linearity: the ability to represent non-linear functions or mappings.
Exactly these properties make the application of neural
networks interesting in remote sensing image analysis. In
the present contribution, we will concentrate on a special
neural network type, neural maps. Neural maps constitute an
important neural network paradigm. In brains, neural maps
occur in all sensory modalities as well as in motor areas. In
technical contexts, neural maps are utilized in the fashion of
neighborhood preserving vector quantizers. In both cases
* Corresponding author. Tel.: +49-341-9718868; fax: +49-341-2131257. E-mail address: villmann@informatik.uni-leipzig.de (T. Villmann).
these networks project data from some possibly high-
dimensional input space onto a position in some output
space. In the area of spectral image analysis we apply neural
maps for clustering, data dimensionality reduction, learning
of relevant data dimensions and visualization. For this
purpose we highlight several useful extensions of the basic
neural map approaches.
This paper is structured as follows. In Section 2 we briefly introduce the problems in remote sensing spectral image analysis. In Section 3 we give the basics of neural maps and vector quantization together with a short review of new extensions: structure adaptation and magnification control (Section 3.2), and learning of proper metrics or relevance learning (Section 3.3). In Section 4 we present applications of the reviewed methods to multi-spectral LANDSAT TM imagery as well as very high-dimensional hyperspectral AVIRIS imagery.
2. Remote sensing spectral imaging
As mentioned earlier, air- and satellite-borne remote sensing spectral images consist of an array of multi-dimensional vectors assigned to particular spatial regions (pixel locations), reflecting the response of a spectral sensor at various wavelengths. These vectors are called spectra. A spectrum is a characteristic pattern that provides a clue to the surface material within the respective surface element. The utilization of these spectra includes areas such as mineral exploration, land use, forestry, ecosystem management, assessment of natural hazards, water resources, environmental contamination, biomass and productivity; and many other activities of economic significance, as well as prime scientific pursuits such as looking for possible sources of past or present life on other planets. The number of applications has dramatically increased in the past 10 years with the advent of imaging spectrometers such as AVIRIS of NASA/JPL, which greatly surpass the traditional multi-spectral sensors (e.g. the LANDSAT thematic mapper (TM)).
Imaging spectrometers can resolve the known, unique, discriminating spectral features of minerals, soils, rocks, and vegetation. While a multi-spectral sensor samples a given wavelength window (typically the 0.4-2.5 μm range in the case of visible and near-infrared imaging) with several broad bandpasses, leaving large gaps between the bands, imaging spectrometers sample a spectral window contiguously with very narrow, 10-20 nm bandpasses (Fig. 1) (Richards & Jia, 1999). Hyperspectral technology is in great demand because direct identification of surface compounds is possible
Fig. 1. Comparison of the spectral resolution of the AVIRIS hyperspectral imager and the LANDSAT TM imager in the visible and near-infrared spectral range. AVIRIS provides quasi-continuous spectral sampling whereas LANDSAT TM has coarse spectral resolution with large gaps between the bandpasses. Additionally, spectral reflectance characteristics of common Earth surface materials are shown: 1, water; 2, vegetation; 3, soil (from Richards and Jia (1999)).
Fig. 2. Left: the concept of hyperspectral imaging. Figure from Campbell (1996). Right: the spectral signature of the mineral alunite as seen through the six broad bands of Landsat TM, as seen by the moderate spectral resolution sensor MODIS (20 bands in this region), and as measured in the laboratory. Figure from Clark (1999). Hyperspectral sensors such as AVIRIS of NASA/JPL (Green, 1996) produce spectral detail comparable to laboratory measurements.
without prior field work, for materials with known spectral signatures. Depending on the wavelength resolution and the width of the wavelength window used by a particular sensor, the dimensionality of the spectra can be as low as 5-6 (such as in LANDSAT TM), or as high as several hundred for hyperspectral imagers (Green, 1996).
Spectral images can formally be described as a matrix $\mathbf{S} = \left(\mathbf{v}^{(x,y)}\right)$, where $\mathbf{v}^{(x,y)} \in \mathbb{R}^{D_V}$ is the vector of spectral information associated with pixel location $(x,y)$. The elements $v_i^{(x,y)}$, $i = 1 \ldots D_V$, of spectrum $\mathbf{v}^{(x,y)}$ reflect the responses of a spectral sensor at a suite of wavelengths (Fig. 2; Campbell, 1996; Clark, 1999). The spectrum is a characteristic fingerprint pattern that identifies the surface material within the area defined by pixel $(x,y)$. The individual two-dimensional image $\mathbf{S}_i = \left(v_i^{(x,y)}\right)$ at wavelength $i$ is called the $i$th image band. The data space $V$ spanned by visible-near infrared reflectance spectra is $[0 - \mathrm{noise},\ U + \mathrm{noise}]^{D_V} \subseteq \mathbb{R}^{D_V}$, where $U > 0$ represents an upper limit of the measured scaled reflectivity and noise is the maximum value of noise across all spectral channels and image pixels. The data density $P(V)$ may vary strongly within this space. Sections of the data space can be very densely populated while other parts may be extremely sparse, depending on the materials in the scene and on the spectral bandpasses of the sensor. According to this model, traditional multi-spectral imagery has a low $D_V$ value while $D_V$ can be several hundred for hyperspectral images. The latter case is of particular interest because the great spectral detail, complexity, and very large data volume pose new challenges in clustering, cluster visualization, and classification of images with such high spectral dimensionality (Merenyi, 1998).
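To make this notation concrete, the following minimal sketch (Python with NumPy; the array sizes and random values are purely illustrative and not taken from any real scene) represents a spectral image as a three-dimensional array and shows how a pixel spectrum $\mathbf{v}^{(x,y)}$, an image band $\mathbf{S}_i$, and the flattened data set $V$ are obtained:

```python
import numpy as np

# A spectral image as a 3-D array S[x, y, i]: spatial position (x, y), band index i.
# The sizes and random values below are purely illustrative; real scenes
# (e.g. LANDSAT TM with D_V = 6, AVIRIS with ~200 usable bands) are read from
# sensor-specific file formats instead.
n_x, n_y, D_V = 100, 120, 6
S = np.random.rand(n_x, n_y, D_V)      # scaled reflectance values

v_xy = S[42, 17, :]                    # spectrum v^{(x,y)} of pixel (x, y) = (42, 17)
band_i = S[:, :, 3]                    # the i-th image band S_i (a two-dimensional image)

# Flattening the spatial grid gives the set V of D_V-dimensional vectors on which
# the vector quantizers of Section 3 are trained.
V = S.reshape(-1, D_V)                 # shape (12000, 6)
```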
In addition to dimensionality and volume, other factors specific to remote sensing can make the analyses of hyperspectral images even harder. For example, given the richness of the data, the goal is to separate many cover classes; however, surface materials that are significantly different for an application may be distinguished only by very subtle differences in their spectral patterns. The pixels can be mixed, which means that several different materials may contribute to the spectral signature associated with one pixel. Training data may be scarce for some classes, and classes may be represented very unevenly. All these difficulties motivate research into advanced and novel approaches (Merenyi, 1999).
Noise is far less problematic than the intricacy of the spectral patterns, because of the high signal-to-noise ratios (500-1500) that present-day hyperspectral imagers provide. For this discussion, we will omit noise issues, and additional effects such as atmospheric distortions, illumination geometry and albedo variations in the scene, because these can be addressed through well-established procedures prior to clustering or classification.
3. Neural maps and vector quantization
3.1. Basic concepts and notations
Neural maps as tools for (unsupervised) topographic vector quantization algorithms map data vectors $\mathbf{v}$ sampled from a data manifold $V \subseteq \mathbb{R}^{D_V}$ onto a set $A$ of neurons $r$:

$$\Psi_{V \to A}: V \to A. \qquad (3.1)$$

Associated with each neuron is a pointer (weight vector) $\mathbf{w}_r \in \mathbb{R}^{D_V}$ specifying the configuration of the neural map $W = \{\mathbf{w}_r\}_{r \in A}$. The task of vector quantization is realized by the map $\Psi$ as a winner-take-all rule, i.e. a stimulus vector $\mathbf{v} \in V$ is mapped onto that neuron $s \in A$ the pointer $\mathbf{w}_s$ of which is closest to the presented stimulus vector $\mathbf{v}$,

$$\Psi_{V \to A}: \mathbf{v} \mapsto s(\mathbf{v}) = \operatorname{argmin}_{r \in A} d(\mathbf{v}, \mathbf{w}_r), \qquad (3.2)$$

where $d(\mathbf{v}, \mathbf{w}_r)$ is the Euclidean distance. The neuron $s$ is referred to as the winner or best-matching unit. The reverse mapping is defined as $\Psi_{A \to V}: r \mapsto \mathbf{w}_r$. These two mappings together determine the neural map

$$\mathcal{M} = (\Psi_{V \to A}, \Psi_{A \to V}), \qquad (3.3)$$

realized by the network. The subset of the input space

$$V_r = \{\mathbf{v} \in V : r = \Psi_{V \to A}(\mathbf{v})\}, \qquad (3.4)$$

which is mapped to a particular neuron $r$ according to Eq. (3.2), forms the (masked) receptive field of that neuron. If the intersection of two masked receptive fields $V_r$, $V_{r'}$ is non-vanishing we call $V_r$ and $V_{r'}$ neighbored. The neighborhood relations form a corresponding graph structure $G_V$ in $A$: two neurons are connected in $G_V$ if and only if their masked receptive fields are neighbored. The graph $G_V$ is called the induced Delaunay graph (see, for example, Martinetz and Schulten (1994) for detailed definitions). Due to the bijective relation between neurons and weight vectors, $G_V$ also represents the Delaunay graph of the weights (Martinetz & Schulten, 1994).
In the most widely used neural map algorithm, the self-organizing map (SOM) (Kohonen, 1995), the neurons are arranged a priori on a fixed grid $A$.^1 Other algorithms such as the topology representing network (TRN), which is based on the neural gas algorithm (NG) (Martinetz, Berkovich, & Schulten, 1993), do not specify the topological relations among neurons, but construct a neighborhood structure in the output space during learning (Martinetz & Schulten, 1994).
^1 Usually the grid $A$ is chosen as a $d_A$-dimensional hypercube and then one has $\mathbf{i} = (i_1, \ldots, i_{d_A})$. Yet, other ordered arrangements are also admissible.

In any of these algorithms the pointers are adapted such that the input data are matched appropriately. For this purpose a random sequence of data points $\mathbf{v} \in V$ from the stimulus distribution $P(\mathbf{v})$ is presented to the
map, the winning neuron $s$ is determined according to Eq. (3.2), and the pointer $\mathbf{w}_s$ is shifted toward $\mathbf{v}$. Interactions among the neurons are included via a neighborhood function determining to what extent the units that are close to the winner in the output space participate in the learning step

$$\Delta \mathbf{w}_r = \epsilon\, h_{\lambda}(r, s, \mathbf{v})(\mathbf{v} - \mathbf{w}_r). \qquad (3.5)$$

$h_{\lambda}(r, s, \mathbf{v})$ is usually chosen to be of Gaussian shape

$$h_{\lambda}(r, s, \mathbf{v}) = \exp\!\left(-\frac{\left(d_A(r, s(\mathbf{v}))\right)^2}{\lambda}\right), \qquad (3.6)$$

where $d_A(r, s(\mathbf{v}))$ is a distance measure on the set $A$. We emphasize that $h_{\lambda}(r, s, \mathbf{v})$ implicitly depends on the whole set $W$ of weight vectors via the winner determination of $s$ according to Eq. (3.2). In SOMs it is evaluated in the output space $A$: $d_A^{\mathrm{SOM}}(r, s(\mathbf{v})) = \|r - s(\mathbf{v})\|_A$, whereas for the NG we have $d_A^{\mathrm{NG}}(r, s(\mathbf{v})) = k_r(\mathbf{v}, W)$, where $k_r(\mathbf{v}, W)$ gives the number of pointers $\mathbf{w}_{r'}$ for which the relation $d(\mathbf{v}, \mathbf{w}_{r'}) \le d(\mathbf{v}, \mathbf{w}_r)$ is valid (Martinetz et al., 1993), i.e. $d_A^{\mathrm{NG}}$ is evaluated in the input space (here, $d(\cdot,\cdot)$ is the Euclidean distance). In the SOM usually $\lambda = 2\sigma^2$ is used.
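As an illustration of these update rules, the following sketch implements one on-line SOM step (Eqs. (3.2), (3.5) and (3.6)) and, for comparison, a rank-based neural gas step in the standard form of Martinetz et al. (1993). The function names, parameter values, and the use of NumPy are our own choices and not part of the original algorithms' specification:

```python
import numpy as np

def som_step(W, grid_pos, v, eps=0.1, lam=4.5):
    """One on-line SOM update (Eqs. 3.2, 3.5, 3.6).

    W        : (N, D_V) array of pointers w_r (modified in place)
    grid_pos : (N, d_A) array of lattice coordinates r of the output space A
    v        : (D_V,) stimulus vector
    """
    # winner-take-all rule, Eq. (3.2): s(v) = argmin_r d(v, w_r)
    s = np.argmin(np.linalg.norm(W - v, axis=1))
    # Gaussian neighborhood evaluated in the OUTPUT space, Eq. (3.6), with lambda = 2*sigma^2
    d_A = np.linalg.norm(grid_pos - grid_pos[s], axis=1)
    h = np.exp(-d_A ** 2 / lam)
    # shift all pointers toward v, Eq. (3.5)
    W += eps * h[:, None] * (v - W)
    return s

def ng_step(W, v, eps=0.1, lam=1.0):
    """One neural gas update: the neighborhood is ranked in the INPUT space."""
    dists = np.linalg.norm(W - v, axis=1)
    k = np.argsort(np.argsort(dists))      # k_r(v, W): rank of every pointer w.r.t. v
    W += eps * np.exp(-k / lam)[:, None] * (v - W)
```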
The particular definitions of the distance measures in the two models cause further differences. In Martinetz et al. (1993) the convergence rate of the neural gas network was shown to be faster than that of the SOM. Furthermore, it was proven that the adaptation rule for the weight vectors follows, on average, a potential dynamic. In contrast, a global energy function does not exist for the SOM algorithm (Erwin, Obermayer, & Schulten, 1992).
One important extension of the basic concepts of neural maps concerns the so-called magnification. The standard SOM distributes the pointers according to the input distribution

$$P(W) \propto P(V)^{\alpha}, \qquad (3.7)$$

with the magnification factor $\alpha_{\mathrm{SOM}} = \frac{2}{3}$ (Ritter & Schulten, 1986; Kohonen, 1999).^{2,3} For the NG one finds, by analytical considerations, that for small $\lambda$ the magnification factor is $\alpha_{\mathrm{NG}} = d/(d+2)$, which only depends on the dimensionality $d$ of the input space embedded in $\mathbb{R}^{D_V}$, i.e. the result is valid for all dimensions (Martinetz et al., 1993).
Topology preservation or topographic mapping in neural maps is defined as the preservation of the continuity of the mapping from the input space onto the output space; more precisely, it is equivalent to the continuity of $\mathcal{M}$ between the topological spaces with properly chosen metrics in both $A$ and $V$. For lack of space we refer to Villmann, Der, and Herrmann (1997) for detailed considerations. For the TRN the metric in $A$ is defined according to the given data structure, whereas in case of the SOM violations of topology preservation may occur if the structure of the output space does not match the topological structure of $V$. In SOMs the topology preserving property can be used for immediate evaluations of the resulting map, for instance by interpretation as a color space, as demonstrated in Section 4.2.2. Topology preservation also allows the application of interpolating schemes such as the parametrized SOM (PSOM) (Ritter, 1993) or the interpolating SOM (I-SOM) (Goppert & Rosenstiel, 1997). A higher degree of topology preservation, in general, improves the accuracy of the map (Bauer & Pawelzik, 1992).
As pointed out in Villmann et al. (1997), violations of topographic mapping in SOMs can result in false interpretations. Several approaches were developed to judge the degree of topology preservation for a given map. Here we briefly describe a variant $\tilde{P}$ of the well-known topographic product $P$ (Bauer & Pawelzik, 1992). Instead of the Euclidean distances between the weight vectors, $\tilde{P}$ uses the respective distances $d_{G_V}(\mathbf{w}_r, \mathbf{w}_{r'})$ of minimal path lengths in the induced Delaunay graph $G_V$ of the $\mathbf{w}_r$. During the computation of $\tilde{P}$, for each node $r$ the sequences $n_j^A(r)$ of $j$th neighbors of $r$ in $A$, and $n_j^V(r)$ describing the $j$th neighbor of $\mathbf{w}_r$ in $V$, have to be determined. These sequences and further averaging over neighborhood orders $j$ and nodes $r$ finally lead to

$$\tilde{P} = \frac{1}{N(N-1)} \sum_r \sum_{j=1}^{N-1} \frac{1}{2j} \log(Q), \qquad (3.8)$$

with

$$Q = \prod_{l=1}^{j} \frac{d_{G_V}\!\left(\mathbf{w}_r, \mathbf{w}_{n_l^A(r)}\right)}{d_{G_V}\!\left(\mathbf{w}_r, \mathbf{w}_{n_l^V(r)}\right)} \cdot \frac{d_A\!\left(r, n_l^A(r)\right)}{d_A\!\left(r, n_l^V(r)\right)}, \qquad (3.9)$$

and $d_A(r, r')$ is the distance between the neuron positions $r$ and $r'$ in the lattice $A$. $\tilde{P}$ can take on positive or negative values with similar meaning as for the original topographic product $P$: if $\tilde{P} < 0$ holds, the output space is too low-dimensional, and for $\tilde{P} > 0$ the output space is too high-dimensional. In both cases neighborhood relations are violated. Only for $\tilde{P} \approx 0$ does the output space approximately match the topology of the input data. The present variant $\tilde{P}$ overcomes the problem of strongly curved maps, which may be judged neighborhood violating by the original $P$ even though the shape of the map might be perfectly justified (Villmann et al., 1997).
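A possible implementation of $\tilde{P}$ is sketched below. It assumes the induced Delaunay graph is given as an adjacency matrix, uses SciPy shortest paths for the graph distances $d_{G_V}$ (with edges weighted by the Euclidean distance between the connected pointers, one plausible reading of "path length"), and ignores practical issues such as disconnected graphs or tied distances:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def topographic_product_variant(W, grid_pos, adjacency):
    """Sketch of the modified topographic product P~ (Eqs. 3.8, 3.9).

    W         : (N, D_V) weight vectors
    grid_pos  : (N, d_A) neuron positions in the lattice A
    adjacency : (N, N) 0/1 matrix of the induced Delaunay graph G_V
    """
    N = len(W)
    # graph distances d_GV: shortest paths in G_V, edges weighted by Euclidean length
    edge_len = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)
    d_GV = shortest_path(np.where(adjacency, edge_len, 0.0), directed=False)
    d_A = np.linalg.norm(grid_pos[:, None, :] - grid_pos[None, :, :], axis=-1)

    P = 0.0
    for r in range(N):
        nA = np.argsort(d_A[r]);  nA = nA[nA != r]    # j-th neighbors of r in the lattice A
        nV = np.argsort(d_GV[r]); nV = nV[nV != r]    # j-th neighbors of w_r w.r.t. d_GV
        for j in range(1, N):
            Q = np.prod(d_GV[r, nA[:j]] / d_GV[r, nV[:j]]) * \
                np.prod(d_A[r, nA[:j]] / d_A[r, nV[:j]])
            P += np.log(Q) / (2 * j)
    return P / (N * (N - 1))
```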
^2 This result is valid for the one-dimensional case and for higher-dimensional cases which separate.
^3 The notation magnification factor is, strictly speaking, not correct. Sometimes it is called (mathematically exactly) the magnification exponent. However, the notation magnification factor is commonly used (van Hulle, 2000). Therefore we use this notation here.
3.2. Magnification control and structure adaptation
3.2.1. Magnification control
The first approach to influence the magnification of a vector quantizer, proposed in DeSieno (1988), is the mechanism of conscience, which yields approximately the same winning probability for each neuron. Hence, the approach yields density matching between input and output space, which corresponds to a magnification factor of 1. Bauer, Der, and Herrmann (1996) suggested a more general scheme for the SOM to achieve an arbitrary magnification. They introduced a local learning parameter $\epsilon_r$ for each neuron $r$, with $\langle \epsilon_r \rangle \propto P(\mathbf{w}_r)^m$ in Eq. (3.5), where $m$ is an additional control parameter. Eq. (3.5) now reads as

$$\Delta \mathbf{w}_r = \epsilon_s\, h_{\lambda}(r, s, \mathbf{v})(\mathbf{v} - \mathbf{w}_r). \qquad (3.10)$$

Note that the learning factor $\epsilon_s$ of the winning neuron $s$ is applied to all updates. This local learning leads to a relation similar to Eq. (3.7),

$$P(W) \propto P(V)^{\alpha'}, \qquad (3.11)$$

with $\alpha' = \alpha(m + 1)$, and allows magnification control through the choice of $m$. In particular, one can achieve a resolution of $\alpha' = 1$, which maximizes mutual information (Linsker, 1989; Villmann & Herrmann, 1998). Application of this approach to the derivation of the dynamics of the TRN yields exactly the same result (Villmann, 2000).
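The sketch below shows how the local learning rate of Eq. (3.10) can be wired into an SOM step. How the local density estimate $\hat{P}(\mathbf{w}_r)$ is obtained is deliberately left open and passed in as an array (Bauer et al. (1996) derive suitable on-line estimators); the function name and parameter defaults are our own:

```python
import numpy as np

def som_step_local_lr(W, grid_pos, v, p_hat, m, eps0=0.1, lam=4.5):
    """SOM step with magnification control via local learning rates (Eq. 3.10).

    p_hat : (N,) estimate of the data density P(w_r) at each pointer; how this
            estimate is obtained is left open here (Bauer et al. (1996) derive
            suitable on-line estimators).
    m     : control parameter; the resulting magnification is alpha' = alpha * (m + 1).
    """
    s = np.argmin(np.linalg.norm(W - v, axis=1))        # winner, Eq. (3.2)
    eps_s = eps0 * p_hat[s] ** m                         # local learning rate of the winner
    d_A = np.linalg.norm(grid_pos - grid_pos[s], axis=1)
    h = np.exp(-d_A ** 2 / lam)
    W += eps_s * h[:, None] * (v - W)                    # Eq. (3.10): eps_s applied to all updates
    return s
```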
3.2.2. Structure adaptation
For both neural map types, the SOM and the TRN,
growing variants for structure adaptation are known.
In the growing SOM (GSOM) approach the output space $A$ is a hypercube in which both the dimension and the respective edge-length ratios are adapted during the learning procedure in addition to the weight vector learning, i.e. the overall dimensionality and the extents along the individual directions of the hypercube change. The GSOM starts from an initial 2-neuron chain, learns according to the regular SOM algorithm, adds neurons to the output space according to a certain criterion to be described later, learns again, adds again, etc., until a prespecified maximum number $N_{\max}$ of neurons is distributed. During this procedure, the output space topology remains of the form $n_1 \times n_2 \times \cdots$, with $n_j = 1$ for $j > D_A$, where $D_A$ is the current dimensionality of $A$.^4 From there it can grow either by adding nodes in one of the directions which are already spanned by the output space or by initializing a new dimension. This decision is made on the basis of the receptive fields $V_r$. When reconstructing $\mathbf{v} \in V$ from neuron $r$, an error $\mathbf{u} = \mathbf{v} - \mathbf{w}_r$ remains; it is decomposed along the different directions which result from projecting the output grid $A$ back into the input space $V$:

$$\mathbf{u} = \mathbf{v} - \mathbf{w}_r = \sum_{i=1}^{D_A} a_i(\mathbf{v}) \frac{\mathbf{w}_{r+\mathbf{e}_i} - \mathbf{w}_{r-\mathbf{e}_i}}{\left\| \mathbf{w}_{r+\mathbf{e}_i} - \mathbf{w}_{r-\mathbf{e}_i} \right\|} + \mathbf{v}'. \qquad (3.12)$$

Here, $\mathbf{e}_i$ denotes the unit vector in direction $i$ of $A$; $\mathbf{w}_{r+\mathbf{e}_i}$ and $\mathbf{w}_{r-\mathbf{e}_i}$ are the weight vectors of the neighbors of neuron $r$ in the $i$th direction of the lattice $A$.^5 Considering a receptive field $V_r$ and determining the first principal component $\mathbf{v}_{\mathrm{PCA}}$ of $V_r$ allows a further decomposition of $\mathbf{v}'$. Projection of $\mathbf{v}'$ onto the direction of $\mathbf{v}_{\mathrm{PCA}}$ then yields $a_{D_A+1}(\mathbf{v})$:

$$\mathbf{v}' = a_{D_A+1}(\mathbf{v}) \frac{\mathbf{v}_{\mathrm{PCA}}}{\left\| \mathbf{v}_{\mathrm{PCA}} \right\|} + \mathbf{v}''. \qquad (3.13)$$

The criterion for growing now is to add nodes in that direction which has, on average, the largest of the expected (normalized) error amplitudes $\tilde{a}_i$:

$$\tilde{a}_i = \sqrt{\frac{n_i}{n_i + 1}} \sum_{\mathbf{v} \in V_r} \frac{\left| a_i(\mathbf{v}) \right|}{\sqrt{\sum_{j=1}^{D_A+1} a_j^2(\mathbf{v})}}, \qquad i = 1, \ldots, D_A + 1. \qquad (3.14)$$

After each growth step, a new learning phase has to take place in order to readjust the map. For a detailed study of the algorithm we refer to Bauer and Villmann (1997).
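A simplified sketch of the growth decision (Eqs. (3.12)-(3.14)) is given below. It treats the back-projected lattice directions as if they were orthogonal, uses the mean instead of the sum in Eq. (3.14), and omits the $\sqrt{n_i/(n_i+1)}$ correction, so it illustrates the idea rather than reproducing Bauer and Villmann (1997) exactly:

```python
import numpy as np

def growth_direction(samples, w_r, neighbor_diffs):
    """Pick the output-space direction in which a GSOM should grow (cf. Eqs. 3.12-3.14).

    samples        : (n, D_V) data points of one receptive field V_r
    w_r            : (D_V,) weight vector of neuron r
    neighbor_diffs : list of D_A vectors w_{r+e_i} - w_{r-e_i} (back-projected lattice directions)
    Returns an index in 0..D_A; the value D_A means "open a new lattice dimension".
    """
    dirs = np.stack([d / np.linalg.norm(d) for d in neighbor_diffs])   # (D_A, D_V)
    U = samples - w_r                                # reconstruction errors u = v - w_r
    amp = U @ dirs.T                                 # amplitudes a_i(v) (orthogonality assumed)
    resid = U - amp @ dirs                           # residual v' of Eq. (3.12)
    # candidate new direction: first principal component of the residuals, Eq. (3.13)
    _, _, Vt = np.linalg.svd(resid - resid.mean(axis=0), full_matrices=False)
    a_new = resid @ Vt[0]
    amp = np.column_stack([amp, a_new])              # a_1 .. a_{D_A+1} per sample
    norms = np.sqrt((amp ** 2).sum(axis=1, keepdims=True)) + 1e-12
    tilde_a = np.abs(amp / norms).mean(axis=0)       # normalized error amplitudes, cf. Eq. (3.14)
    return int(np.argmax(tilde_a))
```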
The growing TRN adapts the number of neurons in the TRN, whereby the structure between them is rearranged according to the TRN definition (Fritzke, 1995). Hence, it is capable of representing the data space in a topologically faithful manner with increasing accuracy, and it also realizes a structure adaptation.
3.3. Relevance learning
As mentioned earlier, neural maps are unsupervised learning algorithms. In the case of labeled data one can apply supervised classification schemes for vector quantization (VQ). Assuming that the data are labeled, an automatic clustering can be learned via attaching maps to the SOM or adding a supervised component to VQ to achieve learning vector quantization (LVQ) (Kohonen, Lappalainen, & Saljarvi, 1997; Meyering & Ritter, 1992). Various modifications of LVQ exist which ensure faster convergence, a better adaptation of the receptive fields to the optimum Bayesian decision, or an adaptation for complex data
^4 Hence, the initial configuration is $2 \times 1 \times \cdots \times 1$; $D_A = 1$.
^5 At the border of the output space grid, where not two but just one neighboring neuron is available, we use $(\mathbf{w}_r - \mathbf{w}_{r-\mathbf{e}_i}) / \| \mathbf{w}_r - \mathbf{w}_{r-\mathbf{e}_i} \|$ or $(\mathbf{w}_{r+\mathbf{e}_i} - \mathbf{w}_r) / \| \mathbf{w}_{r+\mathbf{e}_i} - \mathbf{w}_r \|$ to compute the back-projection of the output space direction $\mathbf{e}_i$ into the input space.
structures, to mention just a few (Kohonen et al., 1997; Sato & Yamada, 1995; Somervuo & Kohonen, 1999).
A common feature of unsupervised algorithms and LVQ is the information provided by the distance structure in $V$, which is determined by the chosen metric. Learning heavily relies on this feature and therefore crucially depends on how appropriate the chosen metric is for the respective learning task. Several methods have been proposed which adapt the metric during training (Kaski, 1998; Kaski & Sinkkonen, 1999; Pregenzer, Pfurtscheller, & Flotzinger, 1996).
Here we focus on metric adaptation for LVQ, since LVQ combines the elegance of the simple and intuitive updates of unsupervised algorithms with the accuracy of supervised methods. Let $c_{\mathbf{v}} \in L$ be the label of input $\mathbf{v}$, and let $L$ be a set of labels (classes) with $\#L = N_L$. LVQ uses a fixed number of codebook vectors (weight vectors) for each class. Let $W = \{\mathbf{w}_r\}$ be the set of all codebook vectors and $c_r$ be the class label of $\mathbf{w}_r$. Furthermore, let $W_c = \{\mathbf{w}_r \mid c_r = c\}$ be the subset of weights assigned to class $c \in L$.
The training algorithm adapts the codebook vectors such that for each class $c \in L$, the corresponding codebook vectors $W_c$ represent the class as accurately as possible. This means that the set of points in any given class, $V_c = \{\mathbf{v} \in V \mid c_{\mathbf{v}} = c\}$, and the union $U_c = \bigcup_{r \mid \mathbf{w}_r \in W_c} V_r$ of receptive fields of the corresponding codebook vectors should differ as little as possible. For a given data point $\mathbf{v}$ with class label $c_{\mathbf{v}}$, let $\mu(\mathbf{v})$ denote some function which is negative if $\mathbf{v}$ is classified correctly, i.e. it belongs to a receptive field $V_r$ with $c_r = c_{\mathbf{v}}$, and which is positive if $\mathbf{v}$ is classified incorrectly, i.e. it belongs to a receptive field $V_r$ with $c_r \ne c_{\mathbf{v}}$. Let $f : \mathbb{R} \to \mathbb{R}$ be some monotonically increasing function. The general scheme of GLVQ aims to minimize the cost function

$$\mathrm{Cost} = \sum_{\mathbf{v}} f(\mu(\mathbf{v})) \qquad (3.15)$$

via stochastic gradient descent. Different choices of $f$ and $\mu(\mathbf{v})$ yield LVQ or LVQ2.1 as introduced in Kohonen et al. (1997), respectively, as shown in Sato and Yamada (1995). Denote by $d$ the squared Euclidean distance of $\mathbf{v}$ to the nearest codebook vector. The choice of $f$ as the sigmoidal function and

$$\mu(\mathbf{v}) = \frac{d_{r^+} - d_{r^-}}{d_{r^+} + d_{r^-}}, \qquad (3.16)$$

where $d_{r^+}$ is the squared Euclidean distance of the input vector $\mathbf{v}$ to the nearest codebook vector labeled with $c_{r^+} = c_{\mathbf{v}}$, say $\mathbf{w}_{r^+}$, and $d_{r^-}$ is the squared Euclidean distance to the best matching codebook vector labeled with $c_{r^-} \ne c_{r^+}$, say $\mathbf{w}_{r^-}$, yields a powerful and noise tolerant behavior, since it combines adaptation near the optimum Bayesian borders, similarly as in LVQ2.1, with prohibiting the possible divergence of LVQ2.1 (as reported in Sato and Yamada (1995)). We refer to this modification as GLVQ. The respective learning dynamics are obtained by differentiation of the cost function (3.15) (Sato & Yamada, 1995):

$$\Delta \mathbf{w}_{r^+} = \epsilon \cdot 4 f'\big|_{\mu(\mathbf{v})} \frac{d_{r^-}}{(d_{r^+} + d_{r^-})^2} (\mathbf{v} - \mathbf{w}_{r^+}) \quad \text{and} \quad \Delta \mathbf{w}_{r^-} = -\epsilon \cdot 4 f'\big|_{\mu(\mathbf{v})} \frac{d_{r^+}}{(d_{r^+} + d_{r^-})^2} (\mathbf{v} - \mathbf{w}_{r^-}). \qquad (3.17)$$
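A minimal sketch of one GLVQ step with a logistic sigmoid $f$ is given below; the helper name glvq_step and the learning-rate value are our own choices, and the sketch uses exactly the squared Euclidean distances of Eqs. (3.16) and (3.17):

```python
import numpy as np

def glvq_step(W, c_W, v, c_v, eps=0.05):
    """One GLVQ update (Eqs. 3.15-3.17) with a logistic sigmoid f.

    W : (N, D_V) codebook vectors (modified in place), c_W : (N,) their class labels
    v : (D_V,) training vector with class label c_v
    """
    d = ((W - v) ** 2).sum(axis=1)                     # squared Euclidean distances
    same, other = (c_W == c_v), (c_W != c_v)
    r_p = np.where(same)[0][np.argmin(d[same])]        # closest correct codebook vector w_{r+}
    r_m = np.where(other)[0][np.argmin(d[other])]      # closest wrong codebook vector w_{r-}
    d_p, d_m = d[r_p], d[r_m]

    mu = (d_p - d_m) / (d_p + d_m)                     # classifier function mu(v), Eq. (3.16)
    f_prime = 0.25 * (1.0 - np.tanh(mu / 2.0) ** 2)    # derivative of the logistic sigmoid at mu

    # gradient steps of Eq. (3.17)
    W[r_p] += eps * 4 * f_prime * d_m / (d_p + d_m) ** 2 * (v - W[r_p])
    W[r_m] -= eps * 4 * f_prime * d_p / (d_p + d_m) ** 2 * (v - W[r_m])
    return mu < 0                                      # True if v was classified correctly
```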
To obtain a metric adaptation we now introduce input weights $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_{D_V})$, $\lambda_i \ge 0$, $\|\boldsymbol{\lambda}\| = 1$, in order to allow a different scaling of the input dimensions: substituting the squared Euclidean metric by its scaled variant

$$d^{\lambda}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{D_V} \lambda_i (x_i - y_i)^2, \qquad (3.18)$$

the receptive field of codebook vector $\mathbf{w}_r$ becomes

$$V_r^{\lambda} = \{\mathbf{v} \in V : r = \Psi^{\lambda}_{V \to A}(\mathbf{v})\}, \qquad (3.19)$$

with

$$\Psi^{\lambda}_{V \to A} : \mathbf{v} \mapsto s(\mathbf{v}) = \operatorname{argmin}_{r \in A} d^{\lambda}(\mathbf{v}, \mathbf{w}_r) \qquad (3.20)$$

as mapping rule. Replacing $V_r$ by $V_r^{\lambda}$ and $d$ by $d^{\lambda}$ in the cost function Cost in Eq. (3.15) yields a variable weighting of the input dimensions and hence an adaptive metric. Now Eq. (3.17) reads as

$$\Delta \mathbf{w}_{r^+} = \epsilon \cdot 4 f'\big|_{\mu(\mathbf{v})} \frac{d^{\lambda}_{r^-}}{(d^{\lambda}_{r^+} + d^{\lambda}_{r^-})^2} \Lambda (\mathbf{v} - \mathbf{w}_{r^+}) \quad \text{and} \quad \Delta \mathbf{w}_{r^-} = -\epsilon \cdot 4 f'\big|_{\mu(\mathbf{v})} \frac{d^{\lambda}_{r^+}}{(d^{\lambda}_{r^+} + d^{\lambda}_{r^-})^2} \Lambda (\mathbf{v} - \mathbf{w}_{r^-}), \qquad (3.21)$$

with $\Lambda$ being the diagonal matrix with entries $\lambda_1, \ldots, \lambda_{D_V}$.
Appropriate weighting factors $\boldsymbol{\lambda}$ can also be determined automatically via a stochastic gradient descent. In this way the GLVQ learning rule (3.17) is augmented as follows:

$$\Delta \lambda_k = -\epsilon_1 \cdot 2 f'\big|_{\mu(\mathbf{v})} \left( \frac{d^{\lambda}_{r^-}}{(d^{\lambda}_{r^+} + d^{\lambda}_{r^-})^2} (\mathbf{v} - \mathbf{w}_{r^+})_k^2 - \frac{d^{\lambda}_{r^+}}{(d^{\lambda}_{r^+} + d^{\lambda}_{r^-})^2} (\mathbf{v} - \mathbf{w}_{r^-})_k^2 \right), \qquad (3.22)$$

with $\epsilon_1 \in (0, 1)$ and $k = 1, \ldots, D_V$, followed by normalization to obtain $\|\boldsymbol{\lambda}\| = 1$. We term this generalization of relevance learning vector quantization (RLVQ) (Bojer, Hammer, Schunk, & von Toschanowitz, 2001) and GLVQ generalized relevance learning vector quantization (GRLVQ), as introduced in Hammer and Villmann (2001a, 2002).
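The sketch below extends the GLVQ step to GRLVQ: the scaled metric of Eq. (3.18) replaces the squared Euclidean distance, the codebook update follows Eq. (3.21), and the relevance factors are adapted according to Eq. (3.22). Clipping to non-negative values and the sum-to-one renormalization of $\boldsymbol{\lambda}$ are our own implementation choices for enforcing $\lambda_i \ge 0$ and the normalization of $\boldsymbol{\lambda}$:

```python
import numpy as np

def grlvq_step(W, c_W, lam, v, c_v, eps=0.05, eps1=0.01):
    """One GRLVQ update: GLVQ with the adaptive scaled metric (Eqs. 3.18-3.22).

    lam : (D_V,) relevance factors lambda_i >= 0 (modified in place)
    """
    diff2 = (W - v) ** 2
    d = diff2 @ lam                                    # scaled squared distances d^lambda, Eq. (3.18)
    same, other = (c_W == c_v), (c_W != c_v)
    r_p = np.where(same)[0][np.argmin(d[same])]
    r_m = np.where(other)[0][np.argmin(d[other])]
    d_p, d_m = d[r_p], d[r_m]

    mu = (d_p - d_m) / (d_p + d_m)
    f_prime = 0.25 * (1.0 - np.tanh(mu / 2.0) ** 2)
    denom = (d_p + d_m) ** 2

    # codebook updates, Eq. (3.21): the diagonal matrix Lambda acts elementwise via lam
    W[r_p] += eps * 4 * f_prime * d_m / denom * lam * (v - W[r_p])
    W[r_m] -= eps * 4 * f_prime * d_p / denom * lam * (v - W[r_m])

    # relevance update, Eq. (3.22), then enforce lambda_i >= 0 and renormalize
    lam -= eps1 * 2 * f_prime * (d_m / denom * diff2[r_p] - d_p / denom * diff2[r_m])
    np.clip(lam, 0.0, None, out=lam)
    lam /= lam.sum()
```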
Obviously, the same idea could be applied to any gradient dynamics. We could, for example, minimize a different error function such as the Kullback-Leibler divergence between the distribution which is to be learned and the distribution which is implemented by the vector quantizer. This approach is not limited to supervised tasks. Unsupervised methods such as the NG (Martinetz
et al., 1993), which obey a gradient dynamics, could be
improved by application of weighting factors in order to
obtain an adaptive metric. Furthermore, in unsupervised
topographic mapping models like the SOM or NG one
can try to learn the relevance of input dimensions subject to
the requirement of topology preservation (Hammer &
Villmann, 2001a,b).
3.4. Further approaches
Besides the methods introduced above, further neurally inspired algorithms are well known. In this context we have to refer to the family of fuzzy clustering algorithms, for instance the Fuzzy-SOM (Bezdek & Pal, 1993; Graepel, Burger, & Obermayer, 1998; Hasenjager, Ritter, & Obermayer, 1999), Fuzzy-c-means (Bezdek, Pal, Hathaway, & Karayiannis, 1995; Gath & Geva, 1989), or other (neural) vector quantizers based on the minimization of an energy function (Bishop, Svensen, & Williams, 1998; Buhmann & Kuhnel, 1993; Linsker, 1989). For an overview of the respective approaches we refer to Haykin (1994) (neural network approaches), Duda and Hart (1973), and Schurmann (1996) (pattern classification).
4. Application in remote sensing image analysis
4.1. Low-dimensional data: LANDSAT TM multi-spectral
images
4.1.1. Image description
LANDSAT TM satellite-based sensors produce images of the Earth in seven different spectral bands. The ground resolution is 30 × 30 m² for bands 1-5 and band 7. Band 6 (thermal band) is often dropped from analyses because of its lower spatial resolution. The LANDSAT TM bands were strategically determined for optimal detection and discrimination of vegetation, water, rock formations and cultural features within the limits of broad band multi-spectral imaging. The spectral information associated with each pixel of a LANDSAT scene is represented by a vector $\mathbf{v} \in V \subseteq \mathbb{R}^{D_V}$ with $D_V = 6$. The aim of any classification algorithm is to subdivide this data space into subsets of data points, with each subset corresponding to specific surface covers such as forest, industrial region, etc. The feature categories are specified by prototype data vectors (training spectra).
In the present contribution we consider a LANDSAT TM image from the Colorado area, USA.^6 In addition we have a manually generated label map (ground truth image) for the assessment of classification accuracy. All pixels of the Colorado image have associated ground truth labels, categorized into 14 classes. The classes include several regions of different vegetation and soil, and can be used for supervised learning schemes like GRLVQ (Table 1).
4.1.2. Analyses of LANDSAT TM imagery
An initial Grassberger-Procaccia analysis (Grassberger & Procaccia, 1983) yields $D_{GP} \approx 3.1414$ for the intrinsic dimensionality (ID) of the Colorado image.
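For readers unfamiliar with the Grassberger-Procaccia method, the following sketch estimates the correlation dimension from the slope of $\log C(r)$ versus $\log r$; the pair subsampling and the choice of radii are our own simplifications and strongly influence the estimate in practice:

```python
import numpy as np

def correlation_dimension(X, radii, n_pairs=200_000, seed=0):
    """Crude Grassberger-Procaccia estimate of the intrinsic dimensionality D_GP.

    The correlation integral C(r) is the fraction of point pairs closer than r;
    for small r it scales as C(r) ~ r^D_GP, so D_GP is read off as the slope of
    log C(r) versus log r.  X : (n, D_V) data (e.g. all pixel spectra of a scene).
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), n_pairs)
    j = rng.integers(0, len(X), n_pairs)
    keep = i != j
    dists = np.linalg.norm(X[i[keep]] - X[j[keep]], axis=1)
    C = np.array([np.mean(dists < r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(C + 1e-12), 1)
    return slope
```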
SOM analysis. The GSOM generates a 12 × 7 × 3 lattice ($N_{\max} = 256$), in agreement with the Grassberger-Procaccia analysis ($D_{GP} \approx 3.1414$); this corresponds to a $\tilde{P}$-value of 0.0095, indicating good topology preservation (Fig. 3).
One way to visualize the GSOM clusters of LANDSAT TM data is to use a SOM dimension $D_A = 3$ (Gross & Seibert, 1993) and interpret the positions of the neurons $r$ in the lattice $A$ as vectors $\mathbf{c}(r) = (r, g, b)$ in the RGB color space, where $r$, $g$, $b$ are the intensities of the colors red, green and blue, respectively, computed as $c_l(r) = [(r_l - 1)/(n_l - 1)] \cdot 255$ for $l = 1, 2, 3$ (Gross & Seibert, 1993).
Table 1
Table of labels and classes occurring in the LANDSAT TM Colorado image

Class label   Ground cover type
a             Scotch pine
b             Douglas fir
c             Pine/fir
d             Mixed pine forest
e             Supple/prickle pine
f             Aspen/mixed pine forest
g             Without vegetation
h             Aspen
i             Water
j             Moist meadow
k             Bush land
l             Grass/pastureland
m             Dry meadow
n             Alpine vegetation
o             Misclassification
Fig. 3. Left: the reference classification for the Landsat TM Colorado image, provided by M. Augusteijn. Classes are keyed as shown on the color wedge on the left, and the corresponding surface units are described in Table 1. Right: cluster map of the Colorado image derived from all six bands by the 12 × 7 × 3 GSOM solution. Note that, due to the visualization scheme, which uses a continuous scale of RGB colors described in the text, this map does not show hard cluster boundaries. More similar colors indicate more similar spectral classes.
^6 Thanks to M. Augusteijn (University of Colorado) for providing this image.
Such assignment of colors to winner neurons immediately yields a pseudo-color cluster map of the original image for visual interpretation. Since we are mapping the data clouds from a six-dimensional input space onto a three-dimensional color space, dimensional conflicts may arise and the visual interpretation may fail. However, in the case of a topologically faithful mapping this color representation, prepared using all six LANDSAT TM image bands, contains considerably more information than a customary color composite combining three of the TM bands (Gross & Seibert, 1993).
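The color assignment is simple enough to state as a short sketch; the function below computes $c_l(r)$ for every neuron of a three-dimensional lattice (the 0-based array index corresponds to $r_l - 1$), and is our own illustration rather than the exact implementation used here:

```python
import numpy as np

def lattice_to_rgb(grid_shape):
    """RGB color for every neuron of a 3-D SOM lattice (Gross & Seibert, 1993).

    c_l(r) = (r_l - 1) / (n_l - 1) * 255 for the three lattice coordinates, so that
    neighboring neurons receive similar colors.  grid_shape is a tuple such as
    (12, 7, 3); each dimension is assumed to hold at least two neurons.
    """
    colors = np.zeros(grid_shape + (3,), dtype=np.uint8)
    for idx in np.ndindex(*grid_shape):
        colors[idx] = [round(idx[l] / (grid_shape[l] - 1) * 255) for l in range(3)]
    return colors

# A pseudo-color cluster map is then obtained by coloring each image pixel with the
# RGB value of its winning neuron: cluster_map[x, y] = colors[lattice_position_of_winner(v_xy)].
```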
Application of GRLVQ. We trained the GRLVQ with 42 codebook vectors (three for each class) on 5% of the data set until convergence. The algorithm converged in about 10 cycles for $\epsilon = 0.1$ and $\epsilon_1 = 0.01$. The GRLVQ produced a classification accuracy of about 90% on the training set as well as on the entire data set. The weighting factors resulting from the GRLVQ analysis provide a ranking of the data dimensions:

$$\boldsymbol{\lambda} = (0.1,\ 0.17,\ 0.27,\ 0.21,\ 0.26,\ 0.0). \qquad (4.1)$$

Clearly, dimension 6 is ranked as least important, with a weighting factor close to 0. Dimension 1 is the second least important, followed by dimensions 2 and 4. Dimensions 3 and 5 are of similarly high importance according to this ranking. If we prune dimensions 6, 1, and 2, an accuracy of 84% can still be achieved. Pruning dimension 4 as well brings the classification accuracy down to 50%, which is a very significant loss of information. This indicates that the intrinsic data dimension may not be as low as 2. These classification results are shown in Fig. 4, with misclassified pixels colored black and classes keyed by the color wedge.
Use of the scaled metric (3.18) during training (which is equivalent to having a new input space $V^{\lambda}$) of a GSOM results in a two-dimensional lattice structure of size 11 × 10. This is in agreement with a corresponding Grassberger-Procaccia estimate of the intrinsic data dimension of $D_{GP}^{\lambda} \approx 2.261$ when this metric is applied. However, as we saw above and in Fig. 4, reducing the dimensionality of this Landsat TM image to 2 is not as simple as discarding the four input image planes that received the lowest weightings from the GRLVQ. Both the Grassberger-Procaccia estimate and the GSOM suggest an intrinsic dimension of 2 for $V^{\lambda}$, whereas the dimension suggested by GRLVQ is at least 4. Hence, the (scaled) data lie on a two-dimensional submanifold in $V^{\lambda}$. The distribution of weights for the two-dimensional lattice structure is visualized in Fig. 5 for each original (but scaled) input dimension. All input dimensions except the fourth seem to be correlated. The fourth dimension shows a clearly different distribution, which causes a two-dimensional representation. The corresponding clustering, visualized in the two-dimensional $(r, g, 0)$ color space, is depicted in Fig. 4. Based on visual inspection, this cluster structure compares better with the reference classification in Fig. 4, upper left, than the GSOM clustering in Fig. 4. The scaling information provided to the GSOM by GRLVQ based on training labels seems to have improved the quality of the clustering.
4.2. Hyperspectral data: the lunar crater volcanic field AVIRIS image

4.2.1. Image description
A visible-near infrared (0.4-2.5 μm), 224-band, 20 m/pixel AVIRIS image of the lunar crater volcanic
Fig. 4. GRLVQ results for the Landsat TM Colorado image. Upper left: GRLVQ without pruning. Upper right: GRLVQ with pruning of dimensions 1, 2, and 6. Lower left: GRLVQ with pruning of dimensions 1, 2, 6, and 4. Lower right: clustering of the Colorado image using the GSOM with GRLVQ scaling. For the three supervised classifications, the same color wedge and class labels apply as in Fig. 3, left.
Fig. 5. The distribution of weight values in the two-dimensional 11 × 10 GSOM lattice for the non-vanishing input dimensions 1-5, as a result of GRLVQ scaling of the Colorado image during GSOM training. Dimension 4 has a significantly different weight distribution from all others, while in the rest of the dimensions the variations are less dramatic. (Please note that the scaling is different for each dimension.) For visualization the SOM Toolbox provided by the Neural Network Group at Helsinki University of Technology was used.
field (LCVF), Nevada, USA, was analyzed in order to study SOM performance for high-dimensional remote sensing spectral imagery. (AVIRIS is the airborne visible-near infrared imaging spectrometer, developed at NASA/Jet Propulsion Laboratory. See http://makalu.jpl.nasa.gov for details on this sensor and on imaging spectroscopy.) The LCVF is one of NASA's remote sensing test sites, where images are obtained regularly. A great amount of accumulated ground truth from comprehensive field studies (Arvidson et al., 1991) and research results from independent earlier work such as Farrand (1991) provide a detailed basis for the evaluation of the results presented here.
Fig. 6 shows a natural color composite of the LCVF with labels marking the locations of 23 different surface cover types of interest. This 10 × 12 km² area contains, among other materials, volcanic cinder cones (class A, reddest peaks) and weathered derivatives thereof such as ferric oxide rich soils (L, M, W), basalt flows of various ages (F, G, I), a dry lake divided into two halves of sandy (D) and clayey composition (E), a small rhyolitic outcrop (B), and some vegetation at the lower left corner (J) and along washes (C). Alluvial material (H), dry (N, O, P, U) and wet (Q, R, S, T) playa outwash with sediments of various clay contents as well as other sediments (V) in depressions of the mountain slopes, and basalt cobble stones strewn around the playa (K) form a challenging series of spectral signatures for pattern recognition (Merenyi, 1998). A long, NW-SE trending scarp, straddled by the label G, borders the vegetated area. Since this color composite only contains information from three selected image bands (one red, one green, and one blue), many of the cover type variations remain undistinguished. They will become evident in the cluster and class maps below.
After atmospheric correction and removal of excessively noisy bands (saturated water bands and overlapping detector channels), 194 image bands remained from the original 224. These 194-dimensional spectra are the input patterns in the following analyses.
The spectral dimensionality of hyperspectral images is not well understood and it is an area of active research. While many believe that hyperspectral images are highly redundant because of band correlations, others maintain the opposite view. Few investigations exist into the ID of hyperspectral images. Linear methods such as PCA or the determination of mixture model endmembers (Adams, Smith, & Gillespie, 1993; R.S. Inc., 1997) usually yield 3-8 endmembers. Optimally topology preserving maps, a TRN approach (Bruske & Merenyi, 1999), find the spectral ID of the LCVF AVIRIS image to be between 3 and 7, whereas the Grassberger-Procaccia estimate (Grassberger & Procaccia, 1983) for the same image is $D_{GP} \approx 3.06$. These surprisingly low numbers, which increase with improved sensor performance (Green & Boardman, 2000), result from using statistical thresholds for the determination of what is relevant, regardless of the application dependent meaning of the data.
The number of relevant components increases dramatically when specific goals are considered, such as which cover classes should be separated. With an associative neural network, Pendock (1999) extracted 20 linear mixing
Fig. 6. The lunar crater volcanic field. RGB natural color composite from a 1994 AVIRIS image. The full hyperspectral image comprises 224 image bands over the 0.4-2.5 μm wavelength range, 512 × 614 pixels, altogether 140 MB of data. Labels indicate 23 different cover types described in the text. The ground resolution is 20 m/pixel.
endmembers from a 50-band (2.0-2.5 μm) segment of an AVIRIS image of Cuprite, Nevada (another well known remote sensing test site), setting only a rather general surface texture criterion. Benediktsson et al. (1994) performed feature extraction on an AVIRIS geologic scene of Iceland, which resulted in 35 bands. They used an ANN (the same network that performed the classification itself) for decision boundary feature extraction (DBFE). The DBFE is claimed to preserve all features that are necessary to achieve the same accuracy as by using all the original data dimensions, by the same classifier for predetermined classes. However, no comparison of classification accuracy was made using the full spectral dimension to support the DBFE claim. In their study a relatively low number of classes, 9, were of interest, and the goal was to find the number of features that describe those classes. Separation of a higher number of classes may require more features.
It is not clear how feature extraction should be done in order to preserve relevant information in hyperspectral images. Selection of 30 bands from our LCVF image by any of several methods leads to a loss of a number of the originally determined 23 cover classes. One example is shown in Fig. 8. Wavelet compression studies on an earlier image of the same AVIRIS scene (Moon & Merenyi, 1995) conclude that various schemes and compression rates affect different spectral classes differently, and none was found to be overall better than another, within 25-50% compression (retaining 75-50% of the wavelet coefficients). In a study on simulated, 201-band spectral data, Benediktsson et al. (1990) show a slight accuracy increase across classifications on 20-, 40-, and 60-band subsets. However, that study is based on only two vegetation classes, the feature extraction is a progressive hierarchical subsampling of the spectral bands, and there is no comparison with the full, 201-band case. Comparative studies using full spectral resolution and many classes are lacking, in general, because few methods can cope with such high-dimensional data technically, and the ones that are capable (such as minimum distance, parallelepiped) often perform too poorly to merit consideration.
Undesirable loss of relevant information can result from using any of these feature extraction approaches. In any case, finding an optimal feature extraction requires great preprocessing efforts just to tailor the data to the available tools. An alternative is to develop capabilities to handle the full spectral information. Analysis of unreduced data is important for the establishment of benchmarks, for exploration and novelty detection, as well as to allow for the distinction of a significantly greater number of cover types than from traditional multi-spectral imagery (such as LANDSAT TM), in accordance with the purpose of modern imaging spectrometers.
4.2.2. SOM analyses of hyperspectral imagery
A systematic supervised classification study was conducted on the LCVF image (Fig. 6) to simultaneously assess the loss of information due to reduction of spectral dimensionality, and to compare the performances of several traditional classifiers and an SOM-based hybrid ANN classifier. The 23 geologically relevant classes indicated in Fig. 6 represent a great variety of surface covers in terms of spatial extent, the similarity of spectral signatures (Merenyi, 1998), and the number of available training samples. The full study, complete with evaluations of classification accuracies, is described in Merenyi et al. (2001). Average spectral shapes of these 23 classes are also shown in Merenyi (1998).
Fig. 7, top panel, shows the best classification, with 92% overall accuracy, produced by an SOM-hybrid ANN using all 194 spectral bands as input. This ANN first learns in an unsupervised mode, during which the input data are clustered in the hidden SOM layer. In this version the SOM uses the conscience mechanism of DeSieno (1988), which forces density matching (i.e. a magnification factor of 1), as pointed out in Section 3.2 and illustrated for this particular LCVF image and SOM mapping in Merenyi (2000). After the convergence of the SOM, the output layer is allowed to learn class labels via a Widrow-Hoff learning rule. The preformed clusters in the SOM greatly aid accurate and sensitive classification by helping prevent the learning of inconsistent class labels. A detailed description of this classifier is given in several previous scientific studies, which produced improved interpretation of high-dimensional spectral data compared to earlier analyses (Howell, Merenyi, & Lebofsky, 1994; Merenyi, Singer, & Miller, 1996; Merenyi et al., 1997). Training samples for the supervised classifications were selected based on field knowledge. The SOM hidden layer was not evaluated and used for identification of spectral types (SOM clusters) prior to training sample determination. Fig. 7 reflects the geologist's view of the desirable segmentation.
In order to apply maximum likelihood and other covariance based classifiers, the number of spectral channels needed to be reduced to 30, since the maximum number of training spectra that could be identified for some classes was 31. Dimensionality reduction was performed in several ways, including PCA, equidistant subsampling, and band selection by a domain expert. Band selection by the domain expert proved most favorable. Fig. 7, bottom panel, shows the maximum likelihood classification with 30 bands, which produced 51% accuracy. A number of classes (notably the ones with subtle spectral differences, such as N, Q, R, S, T, V, W) were entirely lost. Class K (basalt cobbles) disappeared from most of the edge of the playa, and only traces of B (rhyolitic outcrop) remained. Classes G and F were greatly overestimated. Although the ANN classifier produced better results (not shown here) on the same 30-band reduced data set than the maximum likelihood classifier, a marked drop in accuracy (to 75% from 92%) occurred compared to classification on the full data set. This emphasizes that accurate mapping of interesting, spatially small geologic units is possible from full hyperspectral information and with appropriate tools.
Fig. 7. Top: SOM-hybrid supervised ANN classification of the LCVF scene, using 194 image bands. The overall classification accuracy is 92% (Merenyi et al., 2001). Bottom: maximum likelihood classification of the LCVF scene. 30 strategically selected bands were used due to the limited number of training samples for a number of classes. Considerable loss of class distinction occurred compared to the ANN classification, resulting in an overall accuracy of 51%. 'bg' stands for background (unclassified pixels).
Discovery in hyperspectral images with an SOM. The previous section demonstrated the power of the SOM in helping to discriminate among a large number of predetermined surface cover classes with subtle differences in the spectral patterns, using the full spectral resolution. It is even more interesting to examine the SOM's performance for the detection of clusters in high-dimensional data. Fig. 8 displays a 40 × 40 SOM trained with the conscience algorithm (DeSieno, 1988). The input data space was the entire 194-band LCVF image. Groups of neurons, altogether 32, that were found to be sensitized to groups of similar spectra in the 194-dimensional input data are indicated by various colors. The boundaries of these clusters were determined by a somewhat modified version of the U-matrix method (Ultsch, 1992). The density matching property of this variant of the SOM facilitates proper mapping of spatially small clusters and therefore increases the possibility of discovery. Examples of discoveries are discussed later. Areas where no data points (spectra) were mapped are the grey corners with uniformly high fences, and are relatively small. The black background in the SOM lattice shows areas that have not been evaluated for cluster detection. The spatial locations of the image pixels mapped onto the groups of neurons in Fig. 8 are shown in the same colors in Fig. 9. Color coding for clusters that correspond to classes or subclasses of those in Fig. 7, top, is the same as in Fig. 7, to show similarities. Colors for additional groups were added.
The first observation is the striking correspondence between the supervised ANN class map in Fig. 7, top panel, and this clustering: the SOM detected all classes that were known as meaningful geological units. The discovery of classes B (rhyolitic outcrop, white), F (young basalt flows, dark grey and black, some shown in the black ovals), G (a different basalt, exposed along the scarp, dark blue, one segment outlined in the white rectangle), K (basalt cobbles, light blue, one segment shown in the black rectangle), and other spatially small classes such as the series of playa deposits (N, O, P, Q, R, S, T) is significant. This is the capability we need for sifting through high-volume, high-information-content data to alert us to interesting, novel, or hard-to-find units. The second observation is that the SOM detected more, spatially coherent, clusters than the number of classes that we trained for in Fig. 7. The SOM's view of the data is more refined and more precise than that of the geologists. For example, class A (in Fig. 7) is split here into a red (peak of cinder cones) and a dark orange (flanks of cinder cones) cluster that make geologic sense. The maroon cluster to the right of the red and dark orange clusters at the bottom of the SOM fills in some areas that remained unclassified (bg) in the ANN class map in Fig. 7. An example is the arcuate feature at the base of the cinder cone in the white oval, which apparently contains a material different enough to merit a separate spectral cluster. This material fills other areas, also unclassified in Fig. 7, consistently at the foot of cinder cones (another example is seen in the large black oval). Evaluation of further refinements is left to the reader. Evidence that the SOM mapping in Fig. 9 approximates an equiprobabilistic mapping (that the magnification factor for the SOM in Fig. 9 is close to 1), using DeSieno's algorithm, is presented in Merenyi (2000).
GSOM analysis of the LCVF hyperspectral image. As mentioned earlier, previous investigations yielded an intrinsic spectral dimensionality of 3-7 for the LCVF data set
Fig. 9. The clusters from Fig. 8 remapped to the original spatial image, to
show where the different spectral types originated from. The relatively
large, light grey areas correspond to the black, unevaluated parts of the
SOM in Fig. 8. Ovals and rectangles highlight examples of small classes
with subtle spectral differences, discussed in the text. 32 clusters were
detected, and found geologically meaningful, adding to the geologists view
reected in Fig. 7, top panel.
Fig. 8. Clusters identied in a 40 40 SOM. The SOM was trained on the
entire 194-band LCVF image, using the DeSieno algorithm (DeSieno,
1988).
(Bruske & Merenyi, 1999). The Grassberger–Procaccia estimate (Grassberger & Procaccia, 1983), $D_A^{\mathrm{GP}} \approx 3.06$, corroborates the lower end of the above range, suggesting that the data are highly correlated and that a drastic dimensionality reduction may therefore be possible. However, a faithful topographic mapping is necessary to preserve the information contained in the hyperspectral image. In addition to the conscience algorithm, explicit magnification control (Bauer et al., 1996) according to Eq. (3.10) and the growing SOM (GSOM) procedure, as extensions of the standard SOM, are suitable tools (Villmann, 2002).
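For readers unfamiliar with the Grassberger–Procaccia method, the sketch below shows the generic correlation-dimension estimate: the correlation integral C(r), the fraction of point pairs closer than r, scales as a power of r over a suitable range of radii, and the correlation dimension is read off as the slope of log C(r) versus log r. This is a minimal illustration on synthetic data, not the exact computation behind the value quoted above; the quantile-based choice of radii and the sample sizes are assumptions made for the example.

import numpy as np
from scipy.spatial.distance import pdist

def correlation_dimension(X, n_radii=20, q_low=0.02, q_high=0.2):
    """Grassberger-Procaccia correlation-dimension estimate.

    X is an (n_samples, dim) data matrix, e.g. a random subsample of image
    spectra.  The correlation integral C(r) is evaluated at radii spanning
    the lower quantiles of the pairwise-distance distribution, and the
    dimension estimate is the slope of log C(r) versus log r in that range.
    """
    d = pdist(X)                                    # condensed pairwise Euclidean distances
    radii = np.quantile(d, np.linspace(q_low, q_high, n_radii))
    C = np.array([np.mean(d < r) for r in radii])   # fraction of pairs closer than r
    slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
    return slope

# Toy check: points on a 3-dimensional linear subspace of a 194-dimensional space.
rng = np.random.default_rng(1)
X = rng.random((400, 3)) @ rng.random((3, 194))
print(correlation_dimension(X))                     # roughly 3 over this scaling range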
The GSOM produced a lattice of dimensions 8 × 6 × 6 for the LCVF image, which represents a radical dimension reduction in agreement with the Grassberger–Procaccia analysis above. The resulting false-color visualization of the spectral clusters is depicted in Fig. 10. It shows a segmentation similar to that of the supervised classification in Fig. 7, top panel. Closer examination reveals, however, that a number of the small classes with subtle spectral differences from others were lost (classes B, G, K, Q, R, S, T in Fig. 7, top panel). In addition, some confusion of classes can be observed. For example, the bright yellow cluster appears at locations of known young basalt outcrops (class F, black, in Fig. 7), and it also appears at locations of alluvial deposits (class H, orange, in Fig. 7, top panel). This may inspire further investigation into the use of magnification control, and perhaps into the growth criteria of the GSOM.
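One candidate for such further investigation, explicit magnification control, can be sketched at the level of its general principle: the learning rate applied at the winning unit is modulated by a power of an estimated local input density, which shifts the magnification exponent of the resulting map (Bauer et al., 1996; Villmann & Herrmann, 1998). The fragment below only illustrates that idea under illustrative naming and parameter assumptions; it does not reproduce Eq. (3.10), the density estimator, or the exponent actually used in the study, and the density estimate is left as a user-supplied callable (for example a kernel density estimate fitted on a subsample of the spectra).

import numpy as np

def magnification_controlled_step(x, weights, density_estimate, m=0.5, eta0=0.1):
    """One online step with a winner-dependent ("local") learning rate.

    x                : current input vector, shape (dim,)
    weights          : prototype vectors, shape (n_units, dim), updated in place
    density_estimate : callable returning a positive density estimate at a
                       given weight vector
    m                : exponent controlling how strongly the local density
                       modulates the step size; its relation to the achieved
                       magnification follows the original derivation, which is
                       not reproduced here
    """
    d2 = np.sum((weights - x) ** 2, axis=1)
    winner = int(np.argmin(d2))
    eta_local = eta0 * density_estimate(weights[winner]) ** m   # density-modulated step size
    weights[winner] += eta_local * (x - weights[winner])        # winner update; neighborhood step omitted
    return winner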
5. Conclusion
SOMs have shown great promise for the analyses of remote sensing spectral images. With recent advances in remote sensor technology, very high-dimensional spectral data have emerged and demand new and advanced approaches to cluster detection, visualization, and supervised classification. While standard SOMs produce good results, the high dimensionality and large amount of hyperspectral data call for very careful evaluation and control of the faithfulness of the topological mapping performed by SOMs. Faithful topological mapping is required in order to avoid false interpretations of cluster maps created by an SOM. We summarized several new approaches developed in the past few years. Extensions to the standard Kohonen SOM, the growing SOM, magnification control, and GRLVQ were discussed along with the modified topographic product $\tilde{P}$. These ensure topology preservation through mathematical considerations. Their performance, and their relationship to an earlier powerful SOM extension, the DeSieno conscience mechanism, were discussed in the framework of case studies for both low-dimensional traditional multi-spectral and very high-dimensional (hyperspectral) imagery. The Grassberger–Procaccia analysis served as an independent estimate of the intrinsic dimensionality (ID) to benchmark ID estimation by GSOM and GRLVQ. While we show some excellent data clustering and classification, a certain discrepancy remains between theoretical considerations and application results, notably with regard to ID measures and the consequences of dimensionality reduction for classification accuracy. This will be targeted in future work. Finally, since it is outside the scope of this contribution, we want to point out that full-scale investigations such as the LCVF study also have to make heavy use of advanced image processing tools and user interfaces, to handle great volumes of data efficiently and for effective graphics/visualization. References to such tools are made in the cited literature on data analyses.
Acknowledgements
E.M. has been supported by the Applied Information Systems Research Program of NASA, Office of Space Science, NAG5-9045 and NAG5-10432. Contributions by Dr William H. Farrand, providing field knowledge and data for the evaluation of the LCVF results, are gratefully acknowledged. Khoros (Khoral Research, Inc.) and NeuralWorks Professional (NeuralWare) packages were utilized in research software development.
Fig. 10. GSOM-generated cluster map of the same 194-band hyperspectral image of the Lunar Crater Volcanic Field, Nevada, USA, as in Fig. 7. It shows groups similar to those in the supervised classification map in Fig. 7, top panel. Subtle spectral units such as B, G, K, Q, R, S, T were not separated, however, and, for example, the yellow unit appears both at locations of known young basalt deposits (class F, black, in Fig. 7, top panel) and at locations of alluvial deposits (class H, orange, in Fig. 7).

References

Adams, J. B., Smith, M. O., & Gillespie, A. R. (1993). Imaging spectroscopy: Interpretation based on spectral mixture analysis. In C. Peters, & P. Englert (Eds.), Remote geochemical analysis: Elemental and mineralogical composition (pp. 145–166). New York: Cambridge University Press.
Arvidson, R. E., M. D.-B., et al. (1991). Archiving and distribution of geologic remote sensing field experiment data. EOS, Transactions of the American Geophysical Union, 72(17), 176.

Bauer, H.-U., Der, R., & Herrmann, M. (1996). Controlling the magnification factor of self-organizing feature maps. Neural Computation, 8(4), 757–771.

Bauer, H.-U., & Pawelzik, K. R. (1992). Quantifying the neighborhood preservation of self-organizing feature maps. IEEE Transactions on Neural Networks, 3(4), 570–579.

Bauer, H.-U., & Villmann, T. (1997). Growing a hypercubical output space in a self-organizing feature map. IEEE Transactions on Neural Networks, 8(2), 218–226.

Benediktsson, J. A., Sveinsson, J. R., et al. (1994). Classification of very-high-dimensional data with geological applications. Proceedings of MAC Europe 91 (pp. 13–18). Lenggries, Germany.

Benediktsson, J. A., Swain, P. H., et al. (1990). Classification of very high dimensional data using neural networks. IGARSS'90 10th Annual International Geoscience and Remote Sensing Symposium (Vol. 2, p. 1269).

Bezdek, J., & Pal, N. R. (1993). Fuzzification of the self-organizing feature map: Will it work? Proceedings of the SPIE, The International Society for Optical Engineering, 2061 (pp. 142–162).

Bezdek, J. C., Pal, N. R., Hathaway, R. J., & Karayiannis, N. B. (1995). Some new competitive learning schemes. Proceedings of the SPIE, The International Society for Optical Engineering, 2492 (pt. 1) (pp. 538–549) (Applications and Science of Artificial Neural Networks Conference, 17–21 April 1995, Orlando, FL, SPIE).

Bishop, C. M., Svensen, M., & Williams, C. K. I. (1998). GTM: The generative topographic mapping. Neural Computation, 10, 215–234.

Bojer, T., Hammer, B., Schunk, D., & von Toschanowitz, K. T. (2001). Relevance determination in learning vector quantization. Proceedings of the European Symposium on Artificial Neural Networks (ESANN'2001) (pp. 271–276). Brussels, Belgium: D facto publications.

Bruske, J., & Merenyi, E. (1999). Estimating the intrinsic dimensionality of hyperspectral images. Proceedings of the European Symposium on Artificial Neural Networks (ESANN'99) (pp. 105–110). Brussels, Belgium: D facto publications.

Buhmann, J., & Kuhnel, H. (1993). Vector quantization with complexity costs. IEEE Transactions on Information Theory, 39, 1133–1145.

Campbell, J. (1996). Introduction to remote sensing. USA: The Guilford Press.

Clark, R. N. (1999). Spectroscopy of rocks and minerals, and principles of spectroscopy. In A. Rencz (Ed.), Manual of remote sensing. New York: Wiley.

DeSieno, D. (1988). Adding a conscience to competitive learning. Proceedings of ICNN'88, International Conference on Neural Networks (pp. 117–124). Piscataway, NJ: IEEE Service Center.

Duda, R., & Hart, P. (1973). Pattern classification and scene analysis. New York: Wiley.

Erwin, E., Obermayer, K., & Schulten, K. (1992). Self-organizing maps: Ordering, convergence properties and energy functions. Biological Cybernetics, 67(1), 47–55.

Farrand, W. H. (1991). VIS/NIR reflectance spectroscopy of tuff rings and tuff cones. PhD Thesis, University of Arizona.

Fritzke, B. (1995). A growing neural gas network learns topologies. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems (Vol. 7). Cambridge, MA: MIT Press.

Gath, I., & Geva, A. (1989). Unsupervised optimal fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11, 773–781.

Goppert, J., & Rosenstiel, W. (1997). The continuous interpolating self-organizing map. Neural Processing Letters, 5(3), 185–192.

Graepel, T., Burger, M., & Obermayer, K. (1998). Self-organizing maps: Generalizations and new optimization techniques. Neurocomputing, 21(1–3), 173–190.

Grassberger, P., & Procaccia, I. (1983). Measuring the strangeness of strange attractors. Physica D, 9, 189–208.

Green, R. O. (1996). Summaries of the Sixth Annual JPL Airborne Geoscience Workshop. AVIRIS Workshop, Pasadena, CA, March 4–6.

Green, R. O., & Boardman, J. (2000). Exploration of the relationship between information content and signal-to-noise ratio and spatial resolution. Proceedings of the Ninth AVIRIS Earth Science and Applications Workshop, Pasadena, CA.

Gross, M. H., & Seibert, F. (1993). Visualization of multidimensional image data sets using a neural network. Visual Computer, 10, 145–159.

Hammer, B., & Villmann, T. (2001a). Estimating relevant input dimensions for self-organizing algorithms. In N. Allinson, H. Yin, L. Allinson, & J. Slack (Eds.), Advances in self-organising maps (pp. 173–180). London: Springer.

Hammer, B., & Villmann, T. (2001b). Input pruning for neural gas architectures. Proceedings of the European Symposium on Artificial Neural Networks (ESANN'2001) (pp. 283–288). Brussels, Belgium: D facto publications.

Hammer, B., & Villmann, T. (2002). Generalized relevance learning vector quantization. Neural Networks, 15(8/9), 1059–1068.

Hasenjager, M., Ritter, H., & Obermayer, K. (1999). Active data selection for fuzzy topographic mapping of proximities. In G. Brewka, R. Der, S. Gottwald, & A. Schierwagen (Eds.), Fuzzy-Neuro-Systems '99 (FNS '99) (pp. 93–103). Proceedings of the FNS-Workshop. Leipzig: Leipziger Universitatsverlag.

Haykin, S. (1994). Neural networks: A comprehensive foundation. New York: IEEE Press.

Howell, E. S., Merenyi, E., & Lebofsky, L. A. (1994). Classification of asteroid spectra using a neural network. Journal of Geophysics Research, 99(10), 847–865.

Kaski, S. (1998). Dimensionality reduction by random mapping: Fast similarity computation for clustering. Proceedings of IJCNN'98, International Joint Conference on Neural Networks (Vol. 1, pp. 413–418). Piscataway, NJ: IEEE Service Center.

Kaski, S., & Sinkkonen, J. (1999). Metrics induced by maximizing mutual information. Publications in Computer and Information Science, Report A55. Espoo, Finland: Helsinki University of Technology.

Kohonen, T. (1995). Self-organizing maps (Vol. 30). Springer Series in Information Sciences. Berlin: Springer. Second extended edition 1997.

Kohonen, T. (1999). Comparison of SOM point densities based on different criteria. Neural Computation, 11(8), 212–234.

Kohonen, T., Kaski, S., Lappalainen, H., & Saljarvi, H. (1997). The adaptive-subspace self-organizing map (ASSOM). International Workshop on Self-Organizing Maps (WSOM'97) (pp. 191–196), Helsinki.

Linsker, R. (1989). How to generate maps by maximizing the mutual information between input and output signals. Neural Computation, 1, 402–411.

Martinetz, T. M., Berkovich, S. G., & Schulten, K. J. (1993). Neural-gas network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4(4), 558–569.

Martinetz, T., & Schulten, K. (1994). Topology representing networks. Neural Networks, 7(3), 507–522.

Merenyi, E. (1998). Self-organizing ANNs for planetary surface composition research. Proceedings of the European Symposium on Artificial Neural Networks (ESANN'98) (pp. 197–202). Brussels, Belgium: D facto publications.

Merenyi, E. (1999). The challenges in spectral image analysis: An introduction and review of ANN approaches. Proceedings of the European Symposium on Artificial Neural Networks (ESANN'99) (pp. 93–98). Brussels, Belgium: D facto publications.

Merenyi, E. (2000). Precision mining of high-dimensional patterns with self-organizing maps: Interpretation of hyperspectral images. In P. Sincak, & J. Vascak (Eds.), Quo vadis computational intelligence? New trends and approaches in computational intelligence (Vol. 54). Studies in Fuzziness and Soft Computing. Physica-Verlag.
Merenyi, E., Howell, E. S., et al. (1997). Prediction of water in asteroids from spectral data shortward of 3 microns. ICARUS, 129(10), 421–439.

Merenyi, E., Farrand, W. H., et al. (2001). Efficient geologic mapping from hyperspectral images with artificial neural networks classification: A comparison to conventional tools. IEEE TGARS (in preparation).

Merenyi, E., Singer, R. B., & Miller, J. S. (1996). Mapping of spectral variations on the surface of Mars from high spectral resolution telescopic images. ICARUS, 124(10), 280–295.

Meyering, A., & Ritter, H. (1992). Learning 3D-shape-perception with local linear maps. Proceedings of IJCNN'92 (pp. 432–436).

Moon, T., & Merenyi, E. (1995). Classification of hyperspectral images using wavelet transforms and neural networks. Proceedings of the Annual SPIE Conference (p. 2569), San Diego, CA.

Pendock, N. (1999). A simple associative neural network for producing spatially homogeneous spectral abundance interpretations of hyperspectral imagery. Proceedings of the European Symposium on Artificial Neural Networks (ESANN'99) (pp. 99–104). Brussels, Belgium: D facto publications.

Pregenzer, M., Pfurtscheller, G., & Flotzinger, D. (1996). Automated feature selection with a distinction sensitive learning vector quantizer. Neurocomputing, 11(1), 19–29.

R. S. Inc. (1997). ENVI v.3 User's Guide.

Richards, J., & Jia, X. (1999). Remote sensing digital image analysis (Third, revised and enlarged edition). Berlin: Springer.

Ritter, H. (1993). Parametrized self-organizing maps. In S. Gielen, & B. Kappen (Eds.), Proceedings of the ICANN'93 International Conference on Artificial Neural Networks (pp. 568–575). London: Springer.

Ritter, H., & Schulten, K. (1986). On the stationary state of Kohonen's self-organizing sensory mapping. Biological Cybernetics, 54, 99–106.

Sato, A. S., & Yamada, K. (1995). Generalized learning vector quantization. In G. Tesauro, D. Touretzky, & L. Leen (Eds.), Advances in neural information processing systems (Vol. 7) (pp. 423–429). Cambridge, MA: MIT Press.

Schurmann, J. (1996). Pattern classification. New York: Wiley.

Somervuo, P., & Kohonen, T. (1999). Self-organizing maps and learning vector quantization for feature sequences. Neural Processing Letters, 10(2), 151–159.

Ultsch, A. (1992). Knowledge acquisition with self-organizing neural networks. In I. Aleksander, & J. Taylor (Eds.), Artificial neural networks, 2 (Vol. I) (pp. 735–738). Amsterdam: North-Holland.

Villmann, T. (2000). Controlling strategies for the magnification factor in the neural gas network. Neural Network World, 10(4), 739–750.

Villmann, T. (2002). Neural maps for faithful data modelling in medicine: State of the art and exemplary applications. Neurocomputing, 48(1–4), 229–250.

Villmann, T., Der, R., Herrmann, M., & Martinetz, T. (1997). Topology preservation in self-organizing feature maps: Exact definition and measurement. IEEE Transactions on Neural Networks, 8(2), 256–266.

Villmann, T., & Herrmann, M. (1998). Magnification control in neural maps. Proceedings of the European Symposium on Artificial Neural Networks (ESANN'98) (pp. 191–196). Brussels, Belgium: D facto publications.

van Hulle, M. M. (2000). Faithful representations and topographic maps: From distortion- to information-based self-organization. London: Wiley.