$\lambda_- < \lambda < \lambda_+$, where the RMT bounds $\lambda_-$ and $\lambda_+$ are positive and close to 1 (the explicit formula is
given in the next section). Consequently, it was argued that most of
the structure of the cross-correlation matrix is due to noise. This clustered
group of eigenvalues is called the bulk.
There is one largest eigenvalue $\lambda_{max}$, which is 10 to 30 times higher than the
maximum value predicted by RMT, $\lambda_+$; its corresponding
eigenvector is assigned to the market portfolio.
There are several other eigenvalues slightly greater than $\lambda_+$, which reflect
the sector behavior.
There are a number of eigenvalues smaller than $\lambda_-$.
In consequence, the total ratio of eigenvalues which fall within the RMT
bounds is far smaller [20, 22].
Point 1 is quite evident because in emerging markets stocks move
more in tandem, owing to the low diversification level of companies and the
important impact of common macroeconomic factors. In consequence, the
average correlation coefficient in emerging markets is usually higher than
that in developed markets. Furthermore, as the market participants are less
professional, they create more volatility and larger fluctuations of the correlation
between stocks.
In investigating the time evolution of the largest eigenvalue, Kulkarni et al. [19]
found a positive correlation between the largest eigenvalue and the market
volatility. Because it is commonly known that correlation tends to increase
during volatile periods, a correlation between the largest eigenvalue and the
correlation level is implied. In a subsequent section, we will show
numerically that the largest eigenvalue is proportional to the average value of
the correlation matrix elements. Together with point 1, this means that the largest eigenvalue
in emerging markets turns out to be higher than that in developed markets,
which is point 2. In [21], the author mentioned a temporal opposite movement
between the largest eigenvalue and the bulk of small eigenvalues due to the
repulsion effect: as the sum of all eigenvalues is always constant, a change in
the value of the largest eigenvalue must be compensated by an opposite change
of the others, that is, a shift of the bulk. This remark explains point 4,
because a high value of the largest eigenvalue in emerging markets results
in a shift of the eigenvalue bulk below the lower RMT bound $\lambda_-$.
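The compensation argument can be written out as a short worked relation. This is only a sketch, assuming (as stated in Section 2) that the eigenvalues of an $N \times N$ correlation matrix sum to $N$; the symbol $\bar{\lambda}_{\mathrm{rest}}$ for the mean of the remaining $N-1$ eigenvalues is introduced here purely for illustration.

```latex
\sum_{i=1}^{N} \lambda_i = N
\quad\Longrightarrow\quad
\bar{\lambda}_{\mathrm{rest}} = \frac{N - \lambda_{max}}{N - 1},
\qquad
\frac{\partial \bar{\lambda}_{\mathrm{rest}}}{\partial \lambda_{max}} = -\frac{1}{N - 1} < 0 .
```

Any increase of $\lambda_{max}$ therefore lowers the mean of the other eigenvalues by that increase divided by $N-1$, which is the opposite movement described above.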
Apart from these deviations from conventional RMT, there are more
subtle but systematic differences in the bulk of the eigenspectrum as well, as
discussed in [23, 24]. These differences, masked by noise, may contain useful
information on the cross-correlation structure. In addition, RMT has also been used
to study time-lag cross-correlations in multiple time series in finance
[25] as well as in biology and atmospheric geophysics [26]. These authors found
long-range power-law cross-correlations in the absolute values of returns, which
quantify risk, and found that they decay much more slowly than cross-correlations
between the returns.
In this paper, we investigate the statistical properties of the cross-correlation
matrix of N = 90 stocks traded on the Ho Chi Minh City Stock Exchange
(HCMCSE) from 1 January 2007 to 2 May 2012 using the RMT method. Our
results are in accord with previous findings in both developed and emerging
markets. In addition, we try to quantify the magnitude of the largest
eigenvalue by analyzing its dependence on the number of stocks and the average
cross-correlation coefficient. We employ a one-factor model similar to [21]
in order to demonstrate these behaviors. We present our method and the
one-factor model in Section 2, then briefly introduce the data from the
Vietnamese stock market in Section 3. In Section 4 we look at the results obtained
from the empirical data as well as from the simple model of the correlation matrix. We
conclude in Section 5 with some potential extensions of the model.
2. Method
We consider a set of N stocks over a period of T trading days. Let $S_i(t)$ be
the price of stock $i$ ($i = 1, \ldots, N$) at day $t$ ($t = 1, \ldots, T$). The daily return
$R_i(t)$ of stock $i$ is defined by

$$R_i(t) = \ln S_i(t+1) - \ln S_i(t) \qquad (1)$$
Since different stocks may have different volatility, we normalize the return by

$$r_i(t) = \frac{R_i(t) - \langle R_i(t) \rangle}{\sigma_i} \qquad (2)$$
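where $\sigma_i = \sqrt{\langle R_i^2(t) \rangle - \langle R_i(t) \rangle^2}$ is the standard deviation, or volatility, of $R_i(t)$
and $\langle \cdots \rangle$ denotes the time average over the period studied, so that $\langle R_i(t) \rangle$ is the average return (or trend). The
equal-time cross-correlation matrix C between the N stocks is an $N \times N$ matrix
with elements

$$C_{ij} = \langle r_i(t)\, r_j(t) \rangle \qquad (3)$$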
$C_{ij}$ denotes the correlation coefficient between stocks $i$ and $j$, which takes values
between $-1$ and 1. $C_{ij} = 0$ means that there is no correlation between stocks
$i$ and $j$, while $0 < C_{ij} \le 1$ ($-1 \le C_{ij} < 0$) signifies positive (negative)
correlation. By construction, C has the value 1 on its diagonal and values between $-1$
and 1 when $i \neq j$. In practice, all non-diagonal elements of the matrix C are
empirical estimators of those of the true correlation matrix E and always
contain errors due to the finite time length T. The N eigenvalues $\lambda_i$ and
their corresponding eigenvectors $u_i$ are calculated by diagonalizing C and
satisfy

$$C u_i = \lambda_i u_i, \quad i = 1, \ldots, N \qquad (4)$$
Note that $\sum_i \lambda_i$ equals the trace of C, which is equal to N regardless of
the correlation structure of the market.
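Equations (1) to (4) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function names `correlation_matrix` and `spectrum` are hypothetical, and the prices are synthetic random walks, not the HCMCSE data.

```python
import numpy as np

def correlation_matrix(prices):
    """Equal-time correlation matrix from a (T+1, N) array of prices."""
    log_p = np.log(np.asarray(prices, dtype=float))
    R = log_p[1:] - log_p[:-1]                  # Eq. (1): daily log-returns
    r = (R - R.mean(axis=0)) / R.std(axis=0)    # Eq. (2): zero mean, unit variance
    return r.T @ r / r.shape[0]                 # Eq. (3): time average of r_i r_j

def spectrum(C):
    """Eigenvalues (largest first) and matching eigenvectors, Eq. (4)."""
    eigvals, eigvecs = np.linalg.eigh(C)        # eigh: symmetric input, ascending output
    return eigvals[::-1], eigvecs[:, ::-1]

# Synthetic random-walk prices, purely illustrative (not the HCMCSE data).
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(0.01 * rng.standard_normal((501, 10)), axis=0))

C = correlation_matrix(prices)
eigvals, eigvecs = spectrum(C)
print(np.trace(C), eigvals.sum())   # both equal N = 10 up to rounding error
```

Using the population standard deviation (NumPy's default `ddof=0`) matches the definition of $\sigma_i$ above and makes the diagonal of C exactly 1, so the eigenvalues sum to N.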
In RMT, proposed by Wigner, Dyson and Mehta [27, 28, 29, 30, 31, 32,
33], the Hamiltonian of a (nuclear) system is described by a random matrix
with independent random elements drawn from a known probability
distribution p. For the case of financial time series, a random cross-correlation
matrix $C_{RMT}$ is obtained from N time series of length L of random
returns with zero mean and unit variance, using the same calculation as in (2) and (3).
The statistical properties of such random cross-correlation matrices were solved by
Dyson and Mitra in [34, 35, 36]. In particular, when $N \to \infty$ and $L \to \infty$ such
that $Q \equiv L/N > 1$ is fixed, one obtains an analytical density of the eigenvalues of
$C_{RMT}$, given by the Marchenko-Pastur formula [36]

$$\rho_{RMT}(\lambda) = \frac{Q}{2\pi} \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda} \qquad (5)$$

where $\lambda$ lies between $\lambda_-$ and $\lambda_+$, the minimum and maximum eigenvalues of
$C_{RMT}$, respectively given by

$$\lambda_\pm = 1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}} \qquad (6)$$

and the density is equal to zero elsewhere. As mentioned above, eigenvalues of C that fall
between these bounds are interpreted as noise.
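The bounds of Eq. (6) and the density of Eq. (5) are straightforward to evaluate numerically. The sketch below is illustrative only; the function names `mp_bounds` and `mp_density` are hypothetical. It uses $Q = 1324/90$, the values of L and N reported for the data set in Section 3.

```python
import numpy as np

def mp_bounds(Q):
    """Eq. (6): lower and upper edges of the RMT spectrum for Q = L/N > 1."""
    lam_minus = 1.0 + 1.0 / Q - 2.0 * np.sqrt(1.0 / Q)
    lam_plus = 1.0 + 1.0 / Q + 2.0 * np.sqrt(1.0 / Q)
    return lam_minus, lam_plus

def mp_density(lam, Q):
    """Eq. (5): eigenvalue density, set to zero outside (lam_minus, lam_plus)."""
    lam = np.asarray(lam, dtype=float)
    lam_minus, lam_plus = mp_bounds(Q)
    rho = np.zeros_like(lam)
    inside = (lam > lam_minus) & (lam < lam_plus)
    rho[inside] = (Q / (2 * np.pi)
                   * np.sqrt((lam_plus - lam[inside]) * (lam[inside] - lam_minus))
                   / lam[inside])
    return rho

Q = 1324 / 90                       # L / N for the data set of Section 3
lam_minus, lam_plus = mp_bounds(Q)
print(lam_minus, lam_plus)

# Sanity check: the density should integrate to about 1 over the bulk.
grid = np.linspace(lam_minus, lam_plus, 200001)
integral = np.sum(mp_density(grid, Q)) * (grid[1] - grid[0])
print(integral)
```

The printed bounds are close to the values 0.546 and 1.589 quoted in Section 4.2; small differences in the last digit depend on how Q is rounded.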
In our one-factor model, we propose a simple form of the true $N \times N$
cross-correlation matrix $\tilde{E}$, in which all non-diagonal elements are constant, similar
to [21], with value $\rho_0$ in $(-1, 1)$ (its diagonal elements are 1). We independently
generate T times a jointly standard normal distributed N-vector with correlation matrix $\tilde{E}$.
Combining these T vectors, we obtain N simulated time series of length T,
which replace the empirical return time series of the N stocks. The simulated
correlation matrix $\tilde{C}$ is calculated using equations (2) and (3). When T is large enough,
the non-diagonal elements of $\tilde{C}$ will be normally distributed around $\rho_0$. It is easy
to deduce the eigenvalues of $\tilde{E}$: one large eigenvalue equal to $1 + (N-1)\rho_0$
($\approx N\rho_0$ when N is large enough) and an $(N-1)$-fold degenerate eigenvalue
equal to $1 - \rho_0 < 1$. Therefore, we expect that the largest eigenvalue of $\tilde{C}$ will
be of order $N\rho_0$ and that there will be a bulk of small eigenvalues distributed
around $1 - \rho_0 < 1$. These eigenvalues will be calculated from the simulated
data using equations (2) and (3). An immediate consequence of this assumption
is that for similar stock number N and average correlation coefficient $\rho_0$,
the largest eigenvalues are of the same order of magnitude, as pointed out in
[38].
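The one-factor construction can be sketched as follows. The text does not say how the correlated normal vectors are drawn; the Cholesky factorization used here is one standard choice and is an assumption of this sketch, as are the function name `simulate_one_factor` and the symbol `rho0` for the constant off-diagonal correlation.

```python
import numpy as np

def simulate_one_factor(N, T, rho0, seed=0):
    """Return the 'true' matrix E and a simulated correlation matrix C_sim."""
    rng = np.random.default_rng(seed)
    E = np.full((N, N), rho0)
    np.fill_diagonal(E, 1.0)
    # Cholesky needs E positive definite, i.e. -1/(N-1) < rho0 < 1.
    chol = np.linalg.cholesky(E)
    x = rng.standard_normal((T, N)) @ chol.T    # T draws with correlation E
    r = (x - x.mean(axis=0)) / x.std(axis=0)    # Eq. (2)
    C_sim = r.T @ r / T                         # Eq. (3)
    return E, C_sim

N, T, rho0 = 90, 1324, 0.3664                   # values quoted in the text
E, C_sim = simulate_one_factor(N, T, rho0)
eig_E = np.linalg.eigvalsh(E)                   # ascending order
eig_C = np.linalg.eigvalsh(C_sim)

print(eig_E[-1], 1 + (N - 1) * rho0)            # exact largest eigenvalue of E
print(eig_E[0], 1 - rho0)                       # degenerate small eigenvalue
print(eig_C[-1])                                # simulated largest eigenvalue
```

The simulated largest eigenvalue fluctuates from seed to seed but stays of order $N\rho_0$, while the remaining eigenvalues scatter around $1 - \rho_0$.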
3. Data
The Vietnamese stock market is a frontier market that dates back only to 2000,
when the HCMCSE was established. In 2006, a second exchange was opened
in Hanoi, which trades relatively small stocks. The number of quoted stocks,
and hence the market size, has increased gradually and has become considerable since
2007. As the market size of the HCMCSE is 3 times higher than that of the Hanoi exchange,
we analyze the daily stock returns on the HCMCSE from 1 January 2007 to 2 May
2012. During this period there are N = 90 stocks that were continuously
traded, for a total of L = 1324 trading days. We use the closing price to
calculate the daily return.
4. Results and discussion
4.1. Distribution and dynamics of correlation coefficient
In this section, we analyze the statistical properties of the elements of the
cross-correlation matrix C. Figure 1(a) shows the distribution $P(C_{ij})$ for the
90 stocks over the whole period analyzed, from 1 January 2007 to 2 May
2012. Other descriptive statistics are presented in Table 1. We found that
the average value $\langle C_{ij} \rangle$ of 0.3664 is relatively large in comparison with other
studies, suggesting that stocks in the Vietnamese market are strongly correlated.
Furthermore, the correlation matrix elements for the whole period are all positive,
suggesting that the diversification level within the Vietnamese stock market
is quite low. In consequence, its systematic risk is rather high. We also found
that over the whole period the distribution is relatively close to normal, with
a small positive skewness of 0.13 (a small number of elements are
higher than 0.6) and a kurtosis of 2.7 (3.0 corresponds to a normal distribution). These
findings support our above assumptions in the one-factor model.
In Figure 1(b) we plot the time-varying distribution of the cross-correlation
matrix elements using a sliding window of 250 days with a lag of five days. We found
that this distribution changes considerably over time. Figure 1(c) shows the
temporal dynamics of the average value $\langle C_{ij} \rangle$ and other descriptive statistics
of the elements of C using the same sliding windows. The average correlation
coefficient peaked during the crisis period at the end of 2008, then gradually decreased.
Until August 2010, the standard deviation of the distribution is relatively low, its
skewness is negative and its kurtosis is close to the corresponding normal value.
After August 2010, the distribution became positively skewed and its shape
deviated from that of a normal distribution.
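The sliding-window statistics described above can be sketched as follows, assuming a (T, N) array of daily returns. The function names `offdiag_stats` and `sliding_stats` are hypothetical, and the input here is independent synthetic noise rather than the HCMCSE returns. `np.corrcoef` is used as a shortcut for Eqs. (2) and (3), since the normalization constants cancel.

```python
import numpy as np

def offdiag_stats(returns):
    """Mean, std, skewness and kurtosis of the off-diagonal C_ij."""
    C = np.corrcoef(returns, rowvar=False)      # columns are stocks
    c = C[np.triu_indices_from(C, k=1)]         # each pair i < j once
    z = (c - c.mean()) / c.std()
    return c.mean(), c.std(), np.mean(z**3), np.mean(z**4)  # kurtosis: normal = 3

def sliding_stats(returns, window=250, step=5):
    """One row of statistics per window position."""
    starts = range(0, returns.shape[0] - window + 1, step)
    return np.array([offdiag_stats(returns[s:s + window]) for s in starts])

# Independent synthetic returns (illustrative only): 1324 days, 20 series.
rng = np.random.default_rng(1)
returns = rng.standard_normal((1324, 20))
stats = sliding_stats(returns)
print(stats.shape)   # (number of windows, 4)
```

The kurtosis is the non-excess fourth standardized moment, so that 3.0 corresponds to a normal distribution, in line with the convention used in the text.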
Figure 1: (a) Probability density of $C_{ij}$ for the whole period 2007-2012. (b) Temporal
evolution of the cross-correlation distribution using a sliding window of 250 days with a lag of five
days. One notices strong fluctuations of the average value $\langle C_{ij} \rangle$ as well as of the
distribution shape. (c) Summary statistics of the distributions in (b) plotted as a function
of time.
Mean     Standard deviation   Skewness   Kurtosis
0.3664   0.0910               0.1347     2.7147

Table 1: Statistics of the cross-correlation coefficients of Vietnamese stocks, January 2007 to May 2012
4.2. Eigenvalue spectrum
In this section, we decompose the cross-correlation matrix and calculate
its eigenvalues and eigenvectors. The eigenvalue spectrum is shown in
Figure 2(a) together with the spectrum predicted by RMT. We find
characteristics similar to most other studies: the largest eigenvalue is $\lambda_{max} = 34.4$,
while $N \langle C_{ij} \rangle = 90 \times 0.3664 \approx 33.0$. The similarity between $\lambda_{max}$ and
$N \langle C_{ij} \rangle$ again supports the arguments of our simple one-factor model. In our
study, the theoretical $\lambda_-$ and $\lambda_+$ are 0.546 and 1.589, respectively. In the
inset graph, we find that apart from the largest eigenvalue, there are two
other eigenvalues beyond the maximum RMT value $\lambda_+$. This number of
deviating eigenvalues is small in comparison with other studies of developed
markets [9, 10], but in line with those of emerging markets [20]. These
eigenvalues are assigned to the sector groups in the correlation matrix [10]. This
suggests that stocks in emerging markets are less sector-specific than those in developed
markets.
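As a worked check of the comparison between $\lambda_{max}$ and $N \langle C_{ij} \rangle$, the exact largest eigenvalue of a constant-correlation matrix, $1 + (N-1)\rho_0$ (Section 2), can be evaluated with the empirical values. Here $\rho_0$ denotes the constant off-diagonal correlation of the one-factor model and is simply set equal to $\langle C_{ij} \rangle = 0.3664$, which is an assumption of that model rather than a fitted value.

```latex
1 + (N-1)\,\rho_0 = 1 + 89 \times 0.3664 \approx 33.6,
\qquad
N \rho_0 = 90 \times 0.3664 \approx 33.0,
\qquad
\lambda_{max}^{\mathrm{empirical}} = 34.4 .
```

The two estimates lie roughly 2% and 4% below the empirical value, respectively.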
Nb.  Group        Market      L      N     Q       $\langle C_{ij} \rangle$   $\lambda_{max}$   $\lambda_+$
1    V. Plerou    US          8685   422   20.58   0.12     46.3    1.49*
2    A. Utsugi    US          2598   297   8.75    -        52.2    1.79
3    A. Utsugi    Tokyo       1848   493   3.75    -        121.6   2.3
4    S. Cukur     Turkey      1516   206   7.36    0.35     83.3    1.87
5    G. Oh        Korea       2845   473   6.01    0.20     96      1.8
6    D. Wilcox    S. Africa   1304   244   5.34    -        21.2    2.23
7    V. Kulkarni  India       80     70    1.14    -        9.17    3.63
8    Our group    Vietnam     1324   90    14.7    0.366    34.4    1.59

Table 2: Data summary and largest-eigenvalue statistics of some previous studies; ():
estimated from publication; (*): calculated using formula (6). Note that in studies 1 and
5 the values of N are close, but as $\langle C_{ij} \rangle$ is 2 times higher in 5, $\lambda_{max}$ is also 2 times higher in 5. In studies 4 and 8 the values of
$\langle C_{ij} \rangle$ are similar, but N is 2.3 times higher in 4, resulting in a $\lambda_{max}$ that is 2.4 times higher.
Figure 2: (a) Probability density of $\lambda_i$ in comparison with the RMT density (red solid
line). Note that the largest eigenvalue $\lambda_{max}$ is 21 times higher than $\lambda_+$. The inset graph
shows a zoom into the bulk. We found 2 other eigenvalues larger than $\lambda_+$. The bulk
deviates significantly from RMT, with only half of it falling within the RMT bounds. (b) Inverse
participation ratio for each eigenvalue. (c) Time-varying comparison of $\langle C_{ij} \rangle$, the largest
eigenvalue, the percentage of deviating eigenvalues and the average of the 80 smallest eigenvalues,
using sliding windows of 250 days with a lag of 5 days.

In addition, we found that only about half of the small-eigenvalue bulk lies
between the RMT bounds, considerably lower than the results for developed
markets [10]. We attribute this fact to the repulsion effect between the largest
eigenvalue and the small eigenvalues [21], since the sum of all eigenvalues remains
constant: as $\lambda_{max}$ is higher in emerging markets, as discussed previously, the
small-eigenvalue bulk is repelled further to the left, resulting in a high ratio of eigenvalues
outside the RMT limits. We demonstrate this effect in Figure 2(c), where the
largest eigenvalue and the average of the 80 smallest eigenvalues move in opposite directions,
with a correlation coefficient of -0.97. Combined with the previous findings, we
deduce that the ratio of eigenvalues deviating from the RMT bounds, mainly due
to the shift of the small-eigenvalue bulk beyond
$$I_k = \sum_{l=1}^{N} [u^k_l]^4 \qquad (7)$$
where $u^k_l$, $l = 1, \ldots, N$, are the components of the $k$-th eigenvector. The inverse
of the IPR represents the number of components that contribute significantly to
the portfolio corresponding to that eigenvector. The IPR spectrum is shown
in Figure 2(b). As expected, the IPR of the largest eigenvalue is
1/82.5, showing that almost all stocks participate. The IPRs of the eigenvectors of the second and
third largest eigenvalues, which deviate from RMT, are also low, suggesting
that these two eigenvalues may represent two large groups of stocks (sector-specific
or formed for other reasons) that have a higher degree of correlation.
On the other hand, the eigenvectors of the two smallest eigenvalues have the highest
IPR, which shows that only a few stocks participate in them, as we have seen
in the previous paragraph. These small eigenvalues result from
particular highly correlated pairs or triples of stocks [10]. It is interesting to note
that $I_k$ is higher than the bulk average at both the high and low ends
of the small-eigenvalue bulk (see also [20]), suggesting that there are
particular correlation structures among small groups of stocks yet to be identified.
It is remarkable that the IPR at the high end of the bulk is higher than the
bulk average while the corresponding eigenvalues are around 1 and still
inside the range predicted by RMT. One may suspect that the eigenvalues that
fall within the RMT bounds are not necessarily pure noise [40].
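The inverse participation ratio of Eq. (7) reduces to a one-line computation once the eigenvectors are available as unit-norm columns. The sketch below is illustrative (the function name is hypothetical) and checks the two limiting cases that frame the interpretation above: a perfectly uniform vector and a vector concentrated on a single stock.

```python
import numpy as np

def inverse_participation_ratio(eigvecs):
    """I_k = sum_l (u_l^k)^4 for each unit-norm column k of eigvecs, Eq. (7)."""
    return np.sum(eigvecs**4, axis=0)

N = 90
uniform = np.full(N, 1 / np.sqrt(N))    # every stock contributes equally
localized = np.zeros(N)
localized[0] = 1.0                      # a single stock carries all the weight
ipr = inverse_participation_ratio(np.column_stack([uniform, localized]))
print(1 / ipr)   # participating components: N for uniform, 1 for localized
```

An IPR of 1/82.5 for the largest eigenvalue therefore sits close to the uniform limit 1/N = 1/90, consistent with almost all stocks participating.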
4.3. One-factor model
In this section, we present simulation results of the one-factor model for
the cross-correlation matrix described earlier. Our model takes N, L and
$\rho_0$ as input and generates the cross-correlation matrix $\tilde{C}$ and its eigenvalue
spectrum as output. First, we simulate the matrix $\tilde{C}$ corresponding to the empirical one by taking
the same N and L and setting $\rho_0$ equal to $\langle C_{ij} \rangle$. We present the
eigenvalue spectrum of $\tilde{C}$ in Figure 4(a) and the inverse participation ratio of the simulated
data in Figure 4(b). As N and L are the same as in Section 4.2, the theoretical RMT
spectrum is the same and is plotted alongside. Figure 5 displays the
components of some eigenvectors. The eigenvalue statistics of the simulated and
empirical data are shown in Table 3.

Figure 3: Components of the eigenvectors $u_1$, $u_2$, $u_{45}$ and $u_{90}$. In $u_1$ the components of
all stocks are positive and relatively uniform; this eigenvector characterizes the market
portfolio. In $u_{90}$, the eigenvector of the smallest eigenvalue, there are only 3 significant components. These three stocks
are found to be strongly correlated with each other. The other two eigenvectors do not show
remarkable characteristics.

Figure 4: The one-factor model approximates the empirical data well for the largest eigenvalue
and the small-eigenvalue bulk, both in the eigenvalue spectrum (a) and in the inverse participation ratio (b).
Sample       $\langle C_{ij} \rangle$   $\lambda_{max}$   $\lambda_{min}$   $> \lambda_+$   $< \lambda_-$   Noise fraction
Real data    0.3664   34.45   0.1874   3    44   0.48
Simulation   0.3712   34.05   0.3461   1    34   0.63

Table 3: Eigenvalue statistics of real and simulated data
We found that the largest eigenvalue of the simulated data is close to that
of the empirical data, suggesting that the high magnitude of $\lambda_{max}$, the most
important deviation from RMT, can be explained by a single factor: the
positive average correlation between stocks. A large stock number
N enhances this effect but is not its original cause. On the other hand,
we found that the small-eigenvalue bulk also shifts to the left of the RMT
bounds, as a consequence of the repulsion effect of the largest eigenvalue.
The model predicts about 70% of eigenvalues below the lower RMT limit $\lambda_-$. The
model also explains the IPR values of the largest and bulk eigenvectors.
However, it does not explain the appearance of a few eigenvalues higher than
the upper RMT bound $\lambda_+$; the component spectrum of the eigenvector corresponding to
the smallest eigenvalue (where there are a few high components, as discussed
in Section 4.2); or the high IPR of some eigenvalues at both ends of the small-eigenvalue
bulk, as shown in Figure 2(b). Additional features need to be
included in the model in order to reproduce these behaviors.
Figure 5: Components of the eigenvectors $u_1$, $u_2$, $u_{45}$ of randomly generated data. Components
of the largest $u_1$
are identical and equal to 1/