Abstract — The issue of data normalization has been extensively studied with respect to many different applications. However, before implementing any data normalization, it is necessary to establish whether the data requires normalization, and this can be determined by a simple test-bench test. Our test-bench compared seven well-known normalization techniques for nine possible sensitivity functions recorded from brain neurons to variations in Interaural Level Differences (ILD), a cue to the azimuthal location of a sound source. These nine realistic ILD functions were systematically varied to determine the most suitable normalization technique. This approach then helped to select a coherent normalization technique for the data before applying the statistical technique of Cluster Analysis, which can be explored in future studies.

Keywords - Data Normalization; Interaural Level Difference

I. INTRODUCTION TO DATA NORMALIZATION

Data normalization is a scaling process for numbers in a data array and is used where heterogeneity in the numbers renders any standard statistical analysis difficult. The data are then often normalized before any application process is applied, and data normalization is therefore usually termed data pre-processing. Many different data normalization techniques have been developed in diverse applications, such as diagnostic circuits in electronics [1], temporal coding in vision [2], predictive control systems for seismic activity [3], modeling labor market activity [4], pattern recognition [5], and, extensively, microarray data analysis in genetics [6], [7], [8], [9], [10], [11], [12], [13].

The purpose of data normalization depends on the proposed application, and hence data normalization includes the use of linear scaling to compress a large dynamic range [1], scaling of values to correct for variation in laser intensity [11], handling obscure variation [6] or removing systematic errors in data [5], [10], [13], and efficiently removing redundancy in a non-linear model as an optimal transformation for temporal processing [2]. Although the benefits of data normalization depend on data type, data size and normalization method, generally the advantages of data normalization are (a) to give a more meaningful range of scaled numbers for use, (b) to rearrange the data array into a more regular distribution, (c) to enhance the correctness of subsequent calculations, and (d) to increase the significance or importance of the most descriptive numbers in a non-normally distributed data set.

Here we examine the use of data normalization methods to model responses of brain neurons to variations in an important parameter for localization of the azimuthal location of high-frequency sounds, Interaural Level Differences (ILDs) [14]. ILDs are the differences in sound levels at the two ears as a sound source moves about an animal, and are created by head and body shadowing effects which affect high-frequency sounds more than low-frequency sounds [15]. There is a vast literature on the importance of ILDs and on how neurons at various brain levels respond to ILDs that cover a wide azimuthal range across frontal space, from opposite one ear across to opposite the other. We focus on the application of data normalization techniques to the response heterogeneity in ILD functions (plots of the strength of neuronal responses to variations in ILDs) recorded from neurons in an obligatory midbrain auditory relay structure, the Inferior Colliculus. To examine the effect of different normalization methods, we created a test bench of theoretical ILD functions. These functions included variations in different features of ILD functions to simulate all the ILD variants reported in the literature. Use of this standard simulated set of ILD functions allowed direct comparison between different normalization methods, and we carried out a three-step procedure to find the best 'tailored' data normalization for prototypical ILD functions.

II. GENERATING PROTOTYPICAL ILD FUNCTIONS

Our own database and an extensive literature review showed that four prototypical ILD functions can be recorded in different neurons at all levels of the brain above the brainstem. These four functions (Figure 1) consist of (a) two Sigmoidal functions, where neuronal responses vary sigmoidally across a wide range of ILDs with a plateau of responses in the ILD range favoring either one ear or the other, (b) a Peaked function, where neuronal responses peak at some ILD within the range encompassing frontal space, and (c) Insensitive functions, where neuronal responses vary very little with ILDs. Each of these four broad response categories encompasses functions that can vary in the metrics defining the features of the ILD function, e.g. the position along the ILD axis of the peak of responses or of the slope from maximum to minimum responses, and the steepness of the slope: features that have been variously discussed as the information-bearing elements that may be used to derive the azimuthal location of a sound source [16].

In the simulated ILD sensitivity functions, 13 ILD values were used, ranging from +30 dB (30 dB louder in one ear) to -30 dB (30 dB louder in the other ear), as detailed in Section V below. Neuronal responses were represented as spikes/stimulus on a scale from '0' to '100' ("m", the maximum response). This normalized scale allowed us to simulate ILD functions in absolute values across all normalization tests, allowing comparison of effects across the tests.
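For illustration, the four prototypical function shapes on the 13-point ILD axis described in Section II can be simulated as in the following Python sketch. This is not the study's code: the function names and the parameter values (midpoint, slope, width, baseline rate) are hypothetical choices of ours.

```python
import numpy as np

# 13 ILD values from -30 dB to +30 dB in 5 dB steps, as in Section II.
ILD_AXIS = np.linspace(-30, 30, 13)

def sigmoid_ei(ilds, midpoint=0.0, slope=0.3, max_rate=100.0):
    """Sigmoidal function: responses rise to a plateau favoring one ear."""
    return max_rate / (1.0 + np.exp(-slope * (ilds - midpoint)))

def sigmoid_ie(ilds, midpoint=0.0, slope=0.3, max_rate=100.0):
    """Mirror-image Sigmoidal function favoring the opposite ear."""
    return max_rate / (1.0 + np.exp(slope * (ilds - midpoint)))

def peaked(ilds, center=0.0, width=10.0, max_rate=100.0):
    """Peaked function: responses maximal at one ILD within frontal space."""
    return max_rate * np.exp(-((ilds - center) ** 2) / (2 * width ** 2))

def insensitive(ilds, rate=60.0, jitter=2.0, seed=0):
    """Insensitive function: responses nearly constant across ILDs."""
    rng = np.random.default_rng(seed)
    return rate + rng.uniform(-jitter, jitter, size=ilds.shape)
```

Varying the midpoint, slope steepness, width, or maximum rate of these templates reproduces the kinds of feature variations (cut-off position, slope steepness, spike-count range) enumerated in Figure 1.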
preferred method is the use of correlation matrices for reduction of data dimension (such as in Principal Component Analyses), because the correlation matrix is a normalized measure of the linear relationship between variables [26]. The result of this data normalization technique is that (1) the maximum and minimum values of all normalized data variables are spread between negative and positive values, which misrepresents the original values of the ILD data set, and (2) variations of the four Insensitive ILD patterns are exaggerated by normalization.

B. Data normalization by a single maximum value

All unnormalized data are scaled to range between 0 and a single maximum value (Table 1.2). This maximum value is the largest of the maxima across all unnormalized data sets and is used to divide each matrix component to generate a "Vn(i, j)" normalized matrix, i.e. the maximum is always = 1 and the minimum always = 0 (where ∀ Xn(i) ≠ 0). This technique is often used for microarray data [13], [28]. However, for our data it resulted in value changes that were so small that it was not possible to distinguish very small changes among similar types of ILD patterns.

C. Data normalization by each vector's maximum value

Data points in each ILD function are divided by the maximum spike count in that function, i.e., each vector's maximum "Xn(i, j)" is used to normalize the "Xn" raw matrix (Table 1.3). This has the advantage that all functions are scaled to a maximum of 1 (i.e. the maximum spike count for that function), and other values are effectively expressed as a proportion of this maximum spike count for that function. The disadvantage of this procedure was that the shapes of ILD functions with small spike counts became distorted, because each vector's maximum spike count was normalized to the same maximum of 1. Thus, this technique was not suitable for all ILD functions, especially those with small spike-count changes and slight perturbations.

D. Data normalization by each vector's standard deviation

An uncommon technique is one where each data point is normalized by its standard deviation (Table 1.4: "σ" = SD, "X" = raw matrix, and "V" = normalized matrix). This produces a better data spread, especially for asymmetric functions (e.g., our Sigmoidal functions) [29]. However, it was unsuitable for our data because it did not preserve the different numbers of spike counts in a proportionally scaled manner for similar types of nonlinear ILD functions.

E. Logarithmic normalization

Logarithmic normalization is widely used, especially when data analysis involves large numbers of data points [30]. It is a nonlinear procedure [10] that is suitable for dealing with nonlinear data [12]. Each data point "Xn" is divided by the mean value "µn" for that function, and the logarithm to base 2 (log2) "Vn" is then calculated (Table 1.5). This transformation decreases variance [8], as large spike counts are reduced more than low ones. However, this technique was also unsuitable for the ILD functions. The spike counts for all nine ILD functions originally varied from nearly 0 to 100 (spikes/stimulus). Logarithmic normalization scaled down the spike-count range by log2, but the outcome was an exaggerated perturbation of the ILD functions, which deformed the original functions. Moreover, some irregular transformations occurred; in particular, transitions from Peaked to Sigmoidal and to Insensitive ILD functions were not smooth as in the original data sets.

F. Unit Total Probability Mass normalization

The unit total probability mass (UTPM) normalization, or total intensity normalization [9], is achieved by dividing each vector's elements by the sum of that vector's variables and multiplying by the mean (Table 1.6, where "Xn" = raw data, "µn" = mean value, and "Vn" = normalized data) [3], [4], [31], [32]. It has been used in the cumulative distribution function [33] as a discrete type of data normalization and in normalizing input for cluster analysis [8]. It was well suited to our test bench because: (1) the variations of Sigmoidal, Peaked, and Insensitive ILD functions with varying spike counts were not distorted; (2) the positions of varying cut-offs and varying slopes of the ILD functions were kept the same as in the original functions; and (3) the transitions from Peaked to Sigmoidal and to Insensitive ILD functions remained as in the original data set. The maximum values of each ILD function were all scaled down by ~13% without losing their original shapes.

G. Normalization by data standardization

Data standardization is achieved by dividing the mean-subtracted (mean-corrected) data by its standard deviation (Table 1.7: the mean "µn" is subtracted from each data point "Xn", and the result is divided by its SD "σn"). The variances of standardized variables are 1, and the covariance of standardized variables therefore always ranges between -1 and +1 [28]. An advantage of this normalization is that data are expressed in comparable units. On some occasions, data have been standardized to zero mean and unity SD (as with σ = 1 in [34], towards a standard procedure for Principal Component Analysis). This normalization can also make a more efficient front end for neural network training [1]. However, this technique was not suitable for the Insensitive type of ILD functions, as perturbations of the ILD functions were adversely affected, causing ILD function distortion. The maxima and minima in this normalization method are also spread between negative and positive values to keep the mean = 0 and SD = 1, which suits more nonlinear types of functions, i.e. not Insensitive-type ILD functions.

VII. CONCLUSION AND DISCUSSION

In general terms, normalization is signal intensity divided by a reference value, to reduce systematic errors in data variables [30]. Data normalization also maximizes variance [34], which is especially important before applying data dimension reduction techniques.
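The per-function normalization methods compared in Sections B-G can be summarized in code. This is an illustrative Python sketch, not the implementation used in the study; the function names are ours, and each row of "X" is assumed to be one ILD function of spike counts.

```python
import numpy as np

def norm_global_max(X):
    """B: divide every value by the single largest maximum across all functions."""
    return X / X.max()

def norm_vector_max(X):
    """C: divide each function (row) by its own maximum spike count."""
    return X / X.max(axis=1, keepdims=True)

def norm_vector_sd(X):
    """D: divide each function by its own standard deviation."""
    return X / X.std(axis=1, keepdims=True)

def norm_log2(X, eps=1e-9):
    """E: divide by each function's mean, then take log2 (eps is our addition
    to avoid log2(0) for zero spike counts)."""
    return np.log2(X / X.mean(axis=1, keepdims=True) + eps)

def norm_utpm(X):
    """F: unit total probability mass -- divide by the row sum, times the row mean."""
    return X / X.sum(axis=1, keepdims=True) * X.mean(axis=1, keepdims=True)

def norm_standardize(X):
    """G: subtract each function's mean, divide by its SD (z-score)."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
```

Note that because each row's sum equals the number of ILD values times its mean, the UTPM formula as stated in Table 1.6 reduces to dividing each function by the number of ILD points, a pure per-function linear rescaling, which is consistent with the shape-preserving behavior reported for it above.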
For ILD-type data, data normalization is often a prerequisite for statistical data analysis, and finding a suitable scaling technique for the data is an important task. In a novel approach for the field, we developed a test bench of prototypical ILD functions to investigate appropriate normalization techniques. We found that the unit total probability mass normalization method was the best for ILD response functions.

Other data normalization techniques can be generated by variation of existing ones, such as dividing by the sum of all signals, or by the standard error of the signals after mean correction [12]. Slight variations of the normalization techniques used here could also give a procedure useful for nonlinear features of the data [10]. For our data this could be achieved by multiplying by the mean value of the data in our unit total probability mass normalization technique for each ILD function.

In addition to visual comparison of the result of a selected normalization technique against the raw data, other methods are available for better selection of the correct normalization technique. The quality of a normalization technique can be estimated by: (i) calculating the sum of squares of differences between the model and the normalization histogram, or (ii) using Pearson correlation coefficients between the values before and after data normalization [35]. Such a quantification method for normalization selection is worth investigating, but is beyond the scope of this study.

REFERENCES

[1] M. Aminian and F. Aminian, "Neural-network based analog-circuit fault diagnosis using wavelet transform as preprocessor," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, v. 47, pp. 151-6, 2000.
[2] M. Buiatti and C. van Vreeswijk, "Variance normalisation: a key mechanism for temporal adaptation in natural vision?," Vision Research, v. 43, pp. 1895-906, 2003.
[3] Y. Kosugi, M. Sase, H. Kuwatani, N. Kinoshita, T. Momose, J. Nishikawa, and T. Watanabe, "Neural network mapping for nonlinear stereotactic normalization of brain MR images," Journal of Computer Assisted Tomography, v. 17, pp. 455-60, 1993.
[4] G. Skoog and J. Ciecka, "Probability mass functions for additional years of labor market activity induced by the Markov (increment-decrement) model," Economics Letters, v. 77, pp. 425-31, 2002.
[5] H. Kim, D. Kim, and S. Bang, "Face recognition using the mixture-of-eigenfaces method," Pattern Recognition Letters, v. 23, pp. 1549-58, 2002.
[6] B. Bolstad, R. Irizarry, M. Astrand, and T. Speed, "A comparison of normalization methods for high density oligonucleotide array data based on variance and bias," Bioinformatics (Oxford, UK), v. 19, pp. 185-93, 2003.
[7] E. Dougherty, J. Barrera, M. Brun, S. Kim, R. Cesar, Y. Chen, M. Bittner, and J. Trent, "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, v. 9, pp. 105-26, 2002.
[8] J. Kasturi, R. Acharya, and M. Ramanathan, "An information theoretic approach for analyzing temporal patterns of gene expression," Bioinformatics (Oxford, UK), v. 19, pp. 449-58, 2003.
[9] J. Quackenbush, "Microarray data normalization and transformation," Nature Genetics, v. 32, pp. 496-501, 2002.
[10] G. Tseng, M. Oh, L. Rohlin, J. Liao, and W. Wong, "Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects," Nucleic Acids Research, v. 29, pp. 2549-57, 2001.
[11] D. Venet, "MatArray: a Matlab toolbox for microarray data," Bioinformatics, v. 19, pp. 659-60, 2003.
[12] J. Weiner III, C. Zimmerman, H. Gohlmann, and R. Herrmann, "Transcription profiles of the bacterium Mycoplasma pneumoniae grown at different temperatures," Nucleic Acids Research, v. 31, pp. 6306-20, 2003.
[13] C. Workman, L. Jensen, H. Jarmer, R. Berka, L. Gautier, H. Nielser, et al., "A new non-linear normalization method for reducing variability in DNA microarray experiments," Genome Biology, v. 3, 2002.
[14] D. Irvine, "IID in the cat: changes in sound pressure level at the two ears associated with azimuthal displacements in the frontal horizontal plane," Hearing Research, v. 26, pp. 267-86, 1987.
[15] W. Hartmann and B. Rakerd, "ILD: Diffraction and localization by human listeners," The Journal of the Acoustical Society of America, v. 129, p. 2622, 2011.
[16] B. Grothe, M. Pecka, and D. McAlpine, "Mechanisms of Sound Localization in Mammals," Physiological Reviews, v. 90, pp. 983-1012, 2010.
[17] H. Tuckwell, Stochastic processes in the neurosciences. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 1989.
[18] D. Irvine, The Auditory Brainstem: A Review of the Structure and Function of Auditory Brainstem Processing Mechanisms, 1st ed. N.Y.: Springer-Verlag, 1986.
[19] D. Irvine, V. Park, and L. McCormick, "Mechanisms underlying the sensitivity of neurons in the lateral superior olive to IID," Journal of Neurophysiology, v. 86, pp. 2647-66, 2001.
[20] T. Lohuis and Z. Fuzessery, "Neuronal sensitivity to interaural time differences in the sound envelope in the auditory cortex of the pallid bat," Hearing Research, v. 143, pp. 43-57, 2000.
[21] D. Phillips and D. Irvine, "Responses of single neurons in physiologically defined area AI of cat cerebral cortex: sensitivity to IID," Hearing Research, v. 4, pp. 299-307, 1981.
[22] L. Aitkin, The auditory midbrain: structure and function in the central auditory pathway. Clifton, N.J.: Humana Press, 1986.
[23] L. Aitkin, D. Irvine, J. Nelson, M. Merzenich, and J. Clarey, "Frequency representation in the auditory midbrain and forebrain of a marsupial, the northern native cat," Brain, Behavior and Evolution, v. 29, pp. 17-28, 1986.
[24] L. Aitkin, The auditory cortex: structural and functional bases of auditory perception, 1st ed. London: Chapman & Hall, 1990.
[25] L. Aitkin, M. Merzenich, D. Irvine, J. Clarey, and J. Nelson, "Frequency representation in auditory cortex of the common marmoset," Journal of Comparative Neurology, v. 252, pp. 175-85, 1986.
[26] J. Jackson, A user's guide to principal components. N.Y.: Wiley & Sons, Inc., 1991.
[27] H. Moghaddam and K. Zadeh, "Fast adaptive algorithms and networks for class-separability features," Pattern Recognition, v. 36, pp. 1695-702, 2003.
[28] S. Sharma, Applied multivariate techniques. N.Y.: John Wiley, 1996.
[29] A. Zaknich, Neural networks for intelligent signal processing, v. 4. River Edge, NJ, USA: World Scientific, 2003.
[30] D. Geschwind and J. Gregg, Microarrays for the neurosciences: an essential guide. Cambridge, MA, USA: MIT Press, 2002.
[31] Z. Ahmad, L. Balsamo, B. Sachs, B. Xu, and W. Gaillard, "Auditory comprehension of language in young children: neural networks identified with fMRI," Neurology, v. 60, pp. 1598-605, 2003.
[32] S. Patra and R. Misra, "Evaluation of probability mass function of flow in a communication network considering a multistate model of network links," Microelectronics and Reliability, v. 36, pp. 415-21, 1996.
[33] C. C. Abnet and D. M. Freeman, "Deformations of the isolated mouse tectorial membrane produced by oscillatory forces," Hearing Research, v. 144, pp. 29-46, June 2000.
[34] J. Lattin, P. Green, and J. Carroll, Analyzing multivariate data. Pacific Grove, CA, USA: Thomson Brooks/Cole, 2003.
[35] I. Sidorov, D. Hosack, D. Gee, J. Yang, M. Cam, R. Lempicki, and D. Dimitrov, "Oligonucleotide microarray data distribution and normalization," Information Sciences, v. 146, pp. 67-73, 2002.
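The two quantitative selection criteria suggested in the Discussion (the sum of squared differences, and the Pearson correlation between values before and after normalization) can be sketched as follows. This is an illustrative example, not code from the study.

```python
import numpy as np

def sum_squared_diff(model, normalized):
    """(i) Sum of squared differences between a model function and the
    normalized data."""
    model, normalized = np.asarray(model), np.asarray(normalized)
    return float(np.sum((model - normalized) ** 2))

def pearson_before_after(raw, normalized):
    """(ii) Pearson correlation coefficient between the values before and
    after normalization."""
    return float(np.corrcoef(raw, normalized)[0, 1])
```

A purely linear rescaling (such as the UTPM method favored above) leaves the Pearson coefficient at exactly 1, so departures from 1 flag shape distortion introduced by a nonlinear normalization.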
TABLE 1: SEVEN DATA NORMALIZATION METHODS (WITH THEIR EQUATIONS) WERE APPLIED TO THE NINE ("A" TO "I") PROTOTYPICAL ILD FUNCTIONS. FOR EACH METHOD, THE MINIMUM OF MINIMA (FOUR VECTORS) AND THE MAXIMUM OF MAXIMA (FOUR VECTORS) ARE SHOWN IN SPIKE COUNTS.
Fig. 1: The four ILD patterns, Sigmoidal-EI (A4), Sigmoidal-IE (G4), Peaked (D4), and Insensitive (I2), are all described in numbers of spike counts "#sp.c." (spikes/stimulus), which varied between a maximum of '100' and a minimum of '0', within ILDs of -30 dB to +30 dB. Nine possible ILD functions were generated from variations of these four typical ILD-sensitive functions. These are: Sigmoidal functions with varying number of spike counts (A), position of the cutoff (B), and steepness of the slope (C); Peaked functions with varying number of spikes/stimulus (D), cutoff (E), and cutoff & slope (F); Peaked with unilateral transition to Sigmoidal (G); Peaked with bilateral transition to Insensitive (H); and Insensitive functions with varying number of spike counts (I).