All content following this page was uploaded by M. Nasser on 22 May 2014.
www.ijert.org 1
International Journal of Engineering Research & Technology (IJERT)
ISSN: 2278-0181
Vol. 1 Issue 6, August - 2012
decomposition for the case of nonsquare matrices. It shows that any real matrix can be diagonalized by two orthogonal matrices. The eigenvalue decomposition, by contrast, works only on square matrices and uses a single matrix (and its inverse) to achieve diagonalization. If the matrix is square and symmetric, the two orthogonal matrices of the SVD coincide, and the eigenvalue decomposition and the SVD become one and the same. Because the SVD is far more general than the eigenvalue decomposition, and is intimately related to matrix rank and reduced-rank least squares approximation, it is an important and useful tool in matrix theory, statistics and signal analysis. It can also be used as a data reduction technique.

Singular value decomposition, especially its low rank approximation property, is an elegant part of modern matrix theory. After its inception (1936)[10], its two-way data reduction capacity remained unnoticed until the last quarter of the last century. Since then, statisticians have shown increasing interest in SVD for principal component analysis (PCA), canonical correlation analysis (CCA) and cluster analysis. Principal component analysis, often performed by singular value decomposition, is a popular analysis method that has recently been explored for analyzing large-scale expression data (Raychaudhuri et al., 2000; Alter et al., 2000)[11,12]. Additionally, SVD/PCA has been used to identify high-amplitude modes of fluctuation in macromolecular dynamics simulations (Garcia, 1992; Romo et al., 1995)[13,14], and to identify structural intermediates in lysozyme folding using small-angle scattering experiments (Chen et al., 1996)[15]. One of the challenges of bioinformatics is to develop effective ways to analyze global gene expression data. A rigorous approach to gene expression analysis must involve an up-front characterization of the structure of the data. In addition to their broader utility in analysis methods, singular value decomposition (SVD) and principal component analysis (PCA) can be valuable tools in obtaining such a characterization. SVD and PCA are common techniques for the analysis of multivariate data, and gene expression data are well suited to analysis using SVD/PCA. A single microarray experiment can generate measurements for thousands, or even tens of thousands, of genes. Gene expression data are currently rather noisy, and SVD can detect and extract small signals from noisy data. Since SVD can reduce data in both ways, columns (which generally index variables) and rows (which generally index cases), is more numerically stable, and moreover yields PCA as a by-product, in modern research it is used increasingly in place of classical PCA for data compression (Diamantaras and Kung, 1996)[16], clustering (Murtagh, 2002)[17] and multivariate outlier detection (Penny and Jolliffe, 2001)[18].

3. Low Rank Approximation of SVD

Low rank approximation (C. Eckart and G. Young, 1936)[10] is an important property of SVD. It has a remarkable data reduction capacity with minimum recovery error: by using SVD we can reduce variables as well as observations. If X is an m×n matrix of rank k ≤ min(m,n), then by singular value decomposition we can write

X = U Σ V^T    (1)

where U is the column-orthonormal matrix whose columns are the eigenvectors of XX^T, Σ is the diagonal matrix containing the singular values of X, and V is the orthogonal matrix whose columns are the eigenvectors of X^T X.

From (1) we can write

X = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_k u_k v_k^T.

Suppose we approximate X by X̃, whose rank is r < k ≤ min(m,n):

X̃ = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_r u_r v_r^T,

that is, X̃ = U_r Σ_r V_r^T, where U_r is m×r, Σ_r is a diagonal matrix of order r, and V_r is n×r. Now post multiply
by V_r on both sides; we have

X̃ V_r = U_r Σ_r .

Its first column represents the first PC, its second column represents the second PC, and so on. Hence we see that X̃ is an m×n matrix, but X̃ V_r is m×r. Since n generally represents the number of variables, SVD reduces the data by reducing the number of variables.

… outside the box. So in our graph above we can say that observations 6, 25, 30, 32, 34, 36, 102, 104-111 are unusual. Hubert, Rousseeuw and Branden (2005)[19] declared observations 25, 30, 32, 34, 36, 102-111 as outliers by using ROBPCA.
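As a concrete illustration (a sketch, not code from the paper), the rank-r truncation and the score matrix X̃V_r = U_rΣ_r can be computed with NumPy. The data matrix, the sizes m and n, and the cutoff r below are arbitrary assumptions for the demo, and the 1.5×IQR "outside the box" rule is one simple stand-in for the boxplot-based flagging discussed above:

```python
# Illustrative sketch only: low-rank SVD approximation and PC scores.
# The data matrix X, the sizes m and n, and the cutoff r are assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 100, 10, 3
X = rng.normal(size=(m, n))        # m cases (rows), n variables (columns)

# Thin SVD: X = U @ diag(s) @ Vt, singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

Ur = U[:, :r]                      # m x r
Sr = np.diag(s[:r])                # r x r diagonal
Vr = Vt[:r, :].T                   # n x r

X_tilde = Ur @ Sr @ Vr.T           # rank-r approximation of X

# Post-multiplying by Vr gives the principal component scores:
# X_tilde @ Vr = Ur @ Sr @ (Vr.T @ Vr) = Ur @ Sr, an m x r matrix.
scores = X_tilde @ Vr
assert np.allclose(scores, Ur @ Sr)

# Eckart-Young: the squared Frobenius recovery error equals the
# sum of the squared discarded singular values.
err = np.linalg.norm(X - X_tilde, "fro") ** 2
assert np.isclose(err, np.sum(s[r:] ** 2))

# A 1.5*IQR boxplot rule on the first score column is one simple way
# to flag observations that fall "outside the box".
q1, q3 = np.percentile(scores[:, 0], [25, 75])
iqr = q3 - q1
unusual = np.flatnonzero((scores[:, 0] < q1 - 1.5 * iqr) |
                         (scores[:, 0] > q3 + 1.5 * iqr))
```

The key point of the sketch is the dimension change: the m×n matrix X is replaced by the m×r score matrix, so the number of variables drops from n to r while the recovery error is the smallest possible for any rank-r matrix.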
… multivariate outliers, but the SVD based technique is better than those methods.

8. References

[1] Rousseeuw P.J. and Leroy A. (1987). Robust Regression and Outlier Detection. New York: Wiley. doi:10.1002/0471725382

[2] Rousseeuw P.J. and Van Zomeren B.C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85(411), 633-651.

[3] Srikantan K.S. (1961). Testing for the single outlier in a regression model. Sankhya, Series A, 23, 251-260.

[4] Ellenberg J.H. (1976). Testing for a single outlier from a general regression. Biometrics, 32, 637-645. doi:10.2307/2529752

[5] Rousseeuw P.J. (1984). Least median of squares regression. Journal of the American Statistical Association, 79, 871-880.

[6] Hadi A.S. and Simonoff J.S. (1993). Procedures for the identification of multiple outliers in linear models. Journal of the American Statistical Association, 88, 1264-1272. doi:10.1080/01621459.1993.10476407

[7] Atkinson A.C. (1994). Fast very robust methods for the detection of multiple outliers. Journal of the American Statistical Association, 89, 1329-1339. doi:10.1080/01621459.1994.10476872

[8] Munier S. (1999). Multiple outlier detection in logistic regression. Student, 3, 117-126.

[9] Imon A.H.M.R. and Hadi A.S. (2005). Identification of Multiple Outliers in Logistic Regression. International Statistics Conference on Statistics in the Technological Age, Institute of Mathematical Sciences, University of Malaya, Kuala Lumpur, Malaysia, December 2005. doi:10.1080/03610920701826161

[10] Eckart C. and Young G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211-218. doi:10.1007/BF02288367

[11] Raychaudhuri S., Stuart J.M. and Altman R.B. (2000). Principal components analysis to summarize microarray experiments: application to sporulation time series. Pacific Symposium on Biocomputing 2000, 455-466.

[12] Alter O., Brown P.O. and Botstein D. (2000). Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences USA, 97, 10101-10106. doi:10.1073/pnas.97.18.10101

[13] Garcia A.E. (1992). Large-amplitude nonlinear motions in proteins. Physical Review Letters, 68, 2696-2699. doi:10.1103/PhysRevLett.68.2696

[14] Romo T.D., Clarage J.B., Sorensen D.C. and Phillips G.N. Jr. (1995). Automatic identification of discrete substates in proteins: singular value decomposition analysis of time-averaged crystallographic refinements. Proteins, 22, 311-321. doi:10.1002/prot.340220403

[15] Chen L., Hodgson K.O. and Doniach S. (1996). A lysozyme folding intermediate revealed by solution X-ray scattering. Journal of Molecular Biology, 261, 658-671. doi:10.1006/jmbi.1996.0491

[16] Diamantaras K.I. and Kung S.Y. (1996). Principal Component Neural Networks: Theory and Applications. John Wiley & Sons, Inc., New York, 45-46.

[17] Murtagh F. (2002). Clustering in high-dimensional data spaces. In Classification, Clustering and Data Analysis, eds. Jajuga K., Sokolowski A. and Bock H., Springer-Verlag, Berlin, pp. 89-96.

[18] Penny K.I. and Jolliffe I.T. (2001). A comparison of multivariate outlier detection methods for clinical laboratory safety data. Journal of the Royal Statistical Society, Series D, part 3, 295-308. doi:10.1111/1467-9884.00279

[19] Hubert M., Rousseeuw P.J. and Branden K.V. (2005). ROBPCA: a new approach to robust principal component analysis. Technometrics, 47, 64-79. doi:10.1198/004017004000000563