
Physica A 287 (2000) 429–439

www.elsevier.com/locate/physa

The entropy as a tool for analysing statistical dependences in financial time series

Georges A. Darbellay a,b,∗, Diethelm Wuertz c

a Laboratoire de traitement des signaux, École polytechnique fédérale (EPFL), CH-1015 Lausanne, Switzerland
b Ústav teorie informace a automatizace, AV ČR, Pod vodárenskou věží 4, CZ-182 08 Prague, Czech Republic
c Institut für theoretische Physik, Eidgenössische Technische Hochschule (ETH), CH-8093 Zürich, Switzerland
Received 8 May 2000; received in revised form 9 June 2000

Abstract
The entropy is a concept which may serve to define quantities such as the conditional entropy and the mutual information. Using a novel algorithm for the estimation of the mutual information from data, we analyse several financial time series and demonstrate the usefulness of this new approach. The issues of long-range dependence and non-stationarity are discussed. © 2000 Elsevier Science B.V. All rights reserved.

1. Introduction
The entropy was introduced in thermodynamics by Clausius in 1865. Later, around
1900, within the framework of statistical physics established by Boltzmann and Gibbs,
it came to be understood as a statistical concept. Around the middle of the 20th century,
it found its way into engineering and mathematics, most notably through the works of
Shannon in communication engineering and of Kolmogorov in probability theory and
dynamical systems theory.
Concepts based on the entropy, such as the conditional entropy or the mutual information, are well suited for studying statistical dependences in time series [1–3].
By statistical dependences we mean any kind of statistical correlations, not only linear correlations. Traditionally, nonlinear correlations have been studied with higher-order moments (or cumulants). The entropy, with its ability to capture a stochastic

∗ Correspondence address: Laboratoire de traitement des signaux, École polytechnique fédérale (EPFL), CH-1015 Lausanne, Switzerland.

0378-4371/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII: S0378-4371(00)00382-4


relationship as a whole, and not necessarily as a sum of moment contributions, opens another route.
Quite often, statistical studies which use the entropy assume that the variables of interest are discrete, or may be discretised in some straightforward manner (e.g. Ref. [4], for the case of financial data). The statistical estimation is obviously far easier when dealing with data taking discrete values, but this restricts the range of problems to which this approach may be applied. In many situations it is desirable to address the continuous case, and this is what we will do in this contribution. This means that we will work with random variables or processes taking continuous values. These random variables may be vector-valued. The estimator we will use, for our data analysis in Section 3, is described in detail in Refs. [2,5–7].
In mathematical finance the use of continuous processes is indeed widespread. The random walk, and its continuous-time analogue, i.e., Brownian motion, have become cornerstones of financial modelling. Such models are based on the assumptions that the increments of the prices, namely the returns, are independent and that they obey a Gaussian distribution. Such Gaussian processes make it possible to construct an elegant theory, which is today widely used, despite the fact that it is only a first approximation [8].
Many empirical studies have shown that financial returns do not follow a Gaussian process, and the pioneering work of Benoit Mandelbrot in this area is particularly well known (e.g. Ref. [9]). One way of showing that financial returns cannot be governed by a Gaussian process is to look at their scaling law. For a Gaussian process, the average of the fluctuations within a given time interval is proportional to the square root of that time interval. If p(t) is the price at time t, and r_Δt(t) = log p(t + Δt) − log p(t) the log-return at time t over some interval Δt, then one investigates the scaling law

\langle |r_{\Delta t}(t)|^{q} \rangle \propto \Delta t^{\tau(q)}     (1)
with q = 1 and where ⟨·⟩ denotes the time average. For financial returns, the exponent τ(1) is however often different from 1/2. The case of the exchange rate between the US Dollar (USD) and the German Mark (DEM) is shown in Fig. 1. To make matters yet more complicated, the exponent is not stable over the years. This is illustrated in Fig. 2. The data are quotes averaged over 30 min periods between October 1992 and May 1997. The volatility time, denoted as the upsilon time υ, is a monotone transformation of the physical time: the period is shorter than 30 min when the volatility is high and longer than 30 min when the volatility is low (e.g. during the weekends) [10]. This kind of intrinsic time, to which we come back in Section 3, is also referred to as the operational time, since heavy trading results in higher activity [11,12]. As we can see, in both time scales the exponent is not 1/2.
The exponent τ in the scaling law depends on the power q > 0 to which the absolute returns have been raised. For certain processes it is possible to derive the relation between τ and q [13]. For Lévy processes of index 0 < α ≤ 2, we have τ(q) = q/α if q < α and τ(q) = 1 if q ≥ α. Gaussian processes correspond to the case α = 2.


Fig. 1. Double-logarithmic plot of the mean absolute return versus the time interval Δt over which the return was calculated (see Eq. (1)).

Thus, some data generating process can only be a Lévy process if the last relation is true for all values of q.
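In practice, the exponent is obtained by linear regression on a double-logarithmic plot such as Fig. 1. The following is a minimal sketch of this procedure, assuming only numpy; the function name and the synthetic Gaussian test series are ours, not the paper's:

```python
# Estimate the scaling exponent tau(q) of Eq. (1) by regressing
# log <|r_dt(t)|^q> on log dt over a range of intervals dt.
import numpy as np

def scaling_exponent(prices, q=1.0, intervals=(1, 2, 4, 8, 16, 32, 64)):
    log_p = np.log(prices)
    moments = []
    for dt in intervals:
        returns = log_p[dt:] - log_p[:-dt]             # log-returns over dt
        moments.append(np.mean(np.abs(returns) ** q))  # <|r_dt(t)|^q>
    slope, _ = np.polyfit(np.log(intervals), np.log(moments), 1)
    return slope  # the fitted slope is the scaling exponent tau(q)

# sanity check on a Gaussian random walk, where tau(1) = 1/2 must hold
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(0.001 * rng.standard_normal(100_000)))
print(scaling_exponent(prices, q=1.0))  # close to 0.5
```

A fitted slope deviating from q/2, as in Figs. 1 and 2, is then evidence against the Gaussian hypothesis.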
The scaling analysis, as outlined above, leads to the conclusion that the central limit theorem (CLT) does not apply to financial returns. This conclusion may also be reached by looking at histograms and realising that the returns follow heavy-tailed distributions. To understand why the CLT is inapplicable, it is important to check whether the returns are sequentially independent or not. Studies of the scaling behaviour, however, do not provide much of an answer about the (in)dependence of the increments. Their lack of independence is usually studied by looking at the linear correlations of some power q of the absolute (log-)returns. In virtually all such investigations, with the exception of Ref. [14], only the powers q = 1, 2 have been analysed (e.g. Refs. [12,15]). Again, such analyses remain incomplete unless a whole range of q-values is investigated.
The approach based on the entropy proceeds along a more direct avenue. It does not require considering all powers q. In the next section, we provide some theoretical background on our approach. The word prediction is obviously connected to the fact that dependence means predictability; the limiting case of linear prediction theory is treated in Refs. [1,2]. Two financial time series are analysed in Section 3, and our conclusions are summarised in Section 4.


Fig. 2. Values of the scaling exponent τ(1) = 1/E calculated over a rolling 12-month window, which is shifted by 2 weeks for each new value.

2. Entropy and statistical dependences


2.1. Entropy and information
The Boltzmann–Gibbs–Shannon entropy h(X) of a continuous random variable X with probability density function f_X, and taking its values in R^d, is

h(X) = -\int f_X(x) \log f_X(x) \, dx .     (2)
It has the advantage, over some other types of entropies, of satisfying many desirable properties, which were found to be important for a great variety of physical or engineering systems. Before proceeding further, we note that the integration in (2) is carried over the whole support of the function f_X, and that log denotes the logarithm, which will be taken to be the natural logarithm. As a result the entropy will be measured in nats. If instead of using the base e one uses the base 2, then the entropy will be expressed in bits. It is straightforward to transform one into the other: h[nats] = ln(2) h[bits], where ln = log_e. The entropy (2) is called the differential entropy and it shares many properties with the entropy of a discrete random variable, but not all of them [16,17]. In particular, it is not necessarily positive and it is not invariant under a one-to-one transformation of the random variable X.
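Both caveats are visible in the Gaussian case, for which the differential entropy is known in closed form; this is a standard result (cf. Ref. [16]), quoted here as a worked example:

```latex
% Differential entropy of X ~ N(mu, sigma^2), in nats:
h(X) = \tfrac{1}{2} \ln\!\left( 2 \pi e \sigma^{2} \right) ,
% negative whenever sigma^2 < 1/(2 pi e), and under rescaling
h(aX) = h(X) + \ln |a| , \qquad a \neq 0 ,
% so even a linear bijection shifts the value of the entropy.
```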


Consider now a second random variable Y with probability density f_Y. The joint probability density of X and Y will be f_{X,Y}, and the joint entropy of X and Y is defined as

h(X,Y) = -\iint f_{X,Y}(x,y) \log f_{X,Y}(x,y) \, dx \, dy .     (3)

Again the integration is done over the whole support of the function f_{X,Y}. The difference

h(Y|X) = h(X,Y) - h(X) = -\iint f_{X,Y}(x,y) \log \frac{f_{X,Y}(x,y)}{f_X(x)} \, dx \, dy     (4)

defines the conditional entropy. Similarly h(X|Y) = h(X,Y) - h(Y). The difference

I(X;Y) = h(Y) - h(Y|X) = h(X) + h(Y) - h(X,Y)
       = \iint f_{X,Y}(x,y) \log \frac{f_{X,Y}(x,y)}{f_X(x) \, f_Y(y)} \, dx \, dy     (5)
is the mutual information between X and Y. It is a measure of the dependence between X and Y. It satisfies

I(X;Y) \ge 0     (6)

with equality if and only if X and Y are independent. This follows from the inequality log u ≤ u − 1 for u > 0. Equality in the previous inequality and in (6) is achieved if and only if, respectively, u = 1 or f_{X,Y}(x,y) = f_X(x) f_Y(y) for all x, y. If X and Y are independent, then from (5) and (6), we have h(Y|X) = h(Y). Note that, in the (in)equations above, X and Y may be understood as random vectors, i.e., vectors of random variables.
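A worked special case, which also makes the connection to the linear correlation explicit, is the jointly Gaussian pair; this is a standard result (cf. Ref. [16]):

```latex
% X, Y jointly Gaussian with variances sigma_X^2, sigma_Y^2 and
% correlation coefficient rho:
h(X,Y) = \tfrac{1}{2} \ln\!\left[ (2\pi e)^{2}
         \sigma_X^{2} \sigma_Y^{2} \left( 1 - \rho^{2} \right) \right] ,
\qquad
I(X;Y) = h(X) + h(Y) - h(X,Y)
       = -\tfrac{1}{2} \ln\!\left( 1 - \rho^{2} \right) .
```

In this linear limit the mutual information is a function of ρ alone: it vanishes for ρ = 0 and diverges as |ρ| → 1. For non-Gaussian variables, such as the financial returns studied below, I(X;Y) also captures the dependences that ρ misses.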
2.2. Estimation from data
The difficulty in calculating the mutual information from empirical data lies in the fact that the relevant probability density functions are unknown. One standard way is to approximate the densities by means of histograms. However, imposing some arbitrary histogram will not do. In general, it will lead to gross underestimation or overestimation, depending on the particular distribution governing the data set. What we need is an adaptive histogram, i.e., a histogram that is able to adapt itself to any (joint) probability density as well as possible. Fortunately, there is a general definition of the mutual information based on partitions, and this provides a way of building an adaptive histogram.
A (finite) partition of R^d is any finite system of non-intersecting subsets of R^d, whose union is the whole of R^d. The subsets are often called the cells of the partition. For obvious reasons, in practice one works with rectangular cells. In other words, the cells are hyperrectangles in R^d. We will denote them as C_k = A_k × B_k, where A_k is the orthogonal projection of C_k onto the space where X takes its values, and B_k the projection of C_k onto the space where Y takes its values. It can be shown that the mutual information is a supremum over partitions [18],

I(X;Y) = \sup_{\{C_k\}} \sum_{k} P_{X,Y}(C_k) \ln \frac{P_{X,Y}(C_k)}{P_X(A_k) \, P_Y(B_k)} .     (7)

Here, {C_k} denotes a partition made of cells C_k, P_{X,Y}(C_k) is the probability that the pair (X,Y) takes its values in the cell C_k, P_X(A_k) the probability that X takes its values in A_k, and P_Y(B_k) the probability that Y takes its values in B_k. It can also be shown that by constructing a sequence of finer and finer partitions, the corresponding sequence of mutual informations will increase monotonically. It will stop increasing when conditional independence is achieved on all cells of the partition. Thus, by testing for independence we can decide when to stop a (recursive) partitioning scheme. As a criterion one may use any independence test, e.g. the χ² statistic. Full details, including the computer code, are to be found in Refs. [2,5–7].
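A deliberately simplified sketch of such a recursive scheme is given below, for scalar X and Y. This is not the authors' published code: Refs. [5–7] handle vector-valued arguments and use a more careful stopping test, and all names here are ours. Mapping the marginals to ranks first makes the marginal probabilities of a rectangular cell equal to its side lengths, which simplifies the bookkeeping in Eq. (7):

```python
# Plug-in estimate of I(X;Y) in nats by recursive adaptive partitioning,
# a simplified variant of the scheme of Refs. [5,7] (scalar X and Y only).
import numpy as np

def mutual_information(x, y, min_points=32):
    n = len(x)
    # map each marginal to ranks in (0,1): the probability of an interval
    # under the rank-transformed marginal is then simply its length
    rx = (np.argsort(np.argsort(x)) + 0.5) / n
    ry = (np.argsort(np.argsort(y)) + 0.5) / n

    def recurse(idx, x0, x1, y0, y1):
        k = len(idx)
        if k == 0:
            return 0.0
        p_xy = k / n  # empirical P_XY of this cell
        term = p_xy * np.log(p_xy / ((x1 - x0) * (y1 - y0)))
        if k < min_points:
            return term  # too few points to justify further splitting
        # split at the cell-conditional medians, so that under conditional
        # independence each of the 4 subcells expects k/4 points
        mx, my = np.median(rx[idx]), np.median(ry[idx])
        right, top = rx[idx] > mx, ry[idx] > my
        quads = [idx[~right & ~top], idx[right & ~top],
                 idx[~right & top], idx[right & top]]
        counts = np.array([q.size for q in quads])
        chi2 = np.sum((counts - k / 4.0) ** 2) / (k / 4.0)
        if chi2 < 7.81:  # 95% point of chi-square with 3 degrees of freedom
            return term  # counts compatible with independence: stop here
        return (recurse(quads[0], x0, mx, y0, my)
                + recurse(quads[1], mx, x1, y0, my)
                + recurse(quads[2], x0, mx, my, y1)
                + recurse(quads[3], mx, x1, my, y1))

    return max(0.0, recurse(np.arange(n), 0.0, 1.0, 0.0, 1.0))

# sanity check against the Gaussian formula of Section 2.1:
rng = np.random.default_rng(1)
x = rng.standard_normal(20_000)
y = 0.6 * x + 0.8 * rng.standard_normal(20_000)  # correlation rho = 0.6
print(mutual_information(x, y))  # theory: -0.5*ln(1 - 0.36) = 0.223 nats
```

The initial rank transform is harmless here: since the mutual information is invariant under bijective transformations of each variable, I(X;Y) is unchanged.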

3. Application to nancial time series


For a long time, most economists considered financial returns to be independent, at least for all practical purposes. Then, during the course of the 1980s, it became widely known that the (linear) autocorrelations of their squares or their absolute values show some kind of long-range dependence effect (for a review, see e.g. Ref. [19]). Finding an explanation for this effect is not easy, and we will return to it below.
On the other hand, the (linear) autocorrelations of the returns are not statistically different from zero, except possibly at very short time lags. Most observers, though not all of them, regard this putative dependence as insufficient for making riskless profits, and would thus not try to beat the market by using such autocorrelations.
We will now demonstrate that the mutual information provides a concise approach to the question of dependences in the returns, and in their volatilities. Two time series will be considered. The first one is formed by the log-returns of the exchange rate between the US Dollar (USD) and the German Mark (DEM). This time series, which contains more than 81 788 points recorded at intervals of approximately 30 min, extends from October 1992 to May 1997. The second time series is made of the log-returns of the Dow Jones (DJ) industrial stock index from the New York Stock Exchange. These 24 277 daily data records cover a period from 1901 to 1998.
For the exchange rate time series, we have used, as in Section 1, the υ-time, whose clock runs faster during periods of high volatility [10]. Highly volatile market periods are expanded and less volatile market periods are shortened. This type of transformation, whose aim is to reduce market seasonalities, was introduced by the Olsen group [11], who used a yearly averaged time scale to define the so-called ϑ-time. The closely related υ-time is defined by means of a weekly averaged time scale instead, and its calculation is simpler [10].


Fig. 3. Mutual information functions ⟨I(r(t); r(t−τ))⟩ (full line) and ⟨I(|r(t)|; |r(t−τ)|)⟩ (dashed line) as a function of the time lag τ. r(t) denotes the returns of the USD–DEM foreign exchange rate and ⟨·⟩ the averaging over t. The 336 lags cover one week. The two functions are virtually indistinguishable.

Since we have a single series of measurements across time, the mutual information is calculated as a time average. The interpretation of such statistical estimates depends on whether or not the time series is stationary and/or ergodic. Fig. 3 shows two autoinformation curves for the USD–DEM returns. The first curve (full line) is the mutual information ⟨I(r(t); r(t−τ))⟩, where τ is the time lag and r(t) the return at time t. The second curve (dashed line), ⟨I(|r(t)|; |r(t−τ)|)⟩, is virtually identical to the first curve, except maybe for the lag τ = 1. The periodicity is caused by the remaining seasonality. There are 10 daily peaks during the 2 weeks (the weekends have been washed out as there is virtually no market activity during Saturdays and Sundays). We recall that the mutual information is invariant with respect to bijective transformations (unlike the linear correlation, which is invariant with respect to linear transformations only). As a result,

\langle I(|r(t)|^{q} ; |r(t-\tau)|^{q}) \rangle = \langle I(|r(t)|; |r(t-\tau)|) \rangle \quad \text{for all } q > 0 .     (8)

We can thus conclude that, except maybe for the first lag, all the information about the present returns contained in some past returns is carried by the amplitudes of the returns and not by their signs. The same conclusion applies to the Dow Jones returns, except for a small difference between ⟨I(r(t); r(t−τ))⟩ and ⟨I(|r(t)|; |r(t−τ)|)⟩ for the first five days (τ = 1, …, 5).
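Curves like those of Fig. 3 can be produced by applying the estimator sketched in Section 2.2 to lagged copies of a series. A minimal sketch, assuming the mutual_information() function defined there and an array r of log-returns (the function name is ours):

```python
# Autoinformation curves in the spirit of Fig. 3. The time average over t
# is implicit: all pairs (r(t), r(t - tau)) enter the estimator at once.
import numpy as np

def autoinformation(r, max_lag):
    signed, absolute = [], []
    for tau in range(1, max_lag + 1):
        signed.append(mutual_information(r[tau:], r[:-tau]))
        absolute.append(mutual_information(np.abs(r[tau:]), np.abs(r[:-tau])))
    return np.array(signed), np.array(absolute)
```

Plotting the two returned curves against the lag τ then reproduces the comparison of the full and dashed lines in Fig. 3.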
Are the values of ⟨I(r(t); r(t−τ))⟩ and ⟨I(|r(t)|; |r(t−τ)|)⟩ stable through time (windows)? To check this we recalculated these time averages over a rolling window, both for the USD–DEM as well as for the Dow Jones time series. If this is done for a fixed lag τ one obtains a graph which is similar in spirit to Fig. 2 or Fig. 7. In other words, the mutual information is not stable through time. This fact supports the view that these time series are not stationary. This could also explain the long-range dependence that can be seen in Fig. 3 or Fig. 4: the decline of the autoinformation functions is not exponential but hyperbolic.

Fig. 4. Mutual informations ⟨I(s(t); s(t−τ))⟩ as a function of the time lag τ, where s(t) is the volatility of the Dow Jones stock index returns.

One can construct stationary processes with long-range dependence, by means of the so-called fractional processes. However, non-stationarity may also cause long-range dependence. As a simple illustration, consider a stationary time series with a low dispersion (which could be measured by the variance) and another stationary time series with a high dispersion. Let each series have a length N/2. We assume them to obey one of the numerous processes with exponential decline in the (linear) autocorrelation function or the autoinformation function (e.g. some autoregressive process). Now, append the two series together and consider them as a single process. Obviously, under the condition that the lag separating two data points is sufficiently smaller than N, the data points belonging to the part with low variance will often be paired with each other, and likewise for those belonging to the part with high variance. This will induce long-range dependence.
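This argument is easy to reproduce numerically. A minimal sketch, with purely illustrative parameters of our choosing: two AR(1) halves share the same dynamics but differ in innovation variance, and the sample autocorrelation of the absolute values of the appended series stays positive far beyond the AR(1) correlation time:

```python
# Append two stationary AR(1) series with different variances and watch
# spurious long-range dependence appear in the absolute values.
import numpy as np

def ar1(n, phi, sigma, rng):
    x = np.zeros(n)
    for t in range(1, n):  # x(t) = phi * x(t-1) + sigma * noise
        x[t] = phi * x[t - 1] + sigma * rng.standard_normal()
    return x

def acf_abs(x, lag):
    a = np.abs(x) - np.abs(x).mean()  # demeaned absolute values
    return float(np.sum(a[lag:] * a[:-lag]) / np.sum(a * a))

rng = np.random.default_rng(2)
half = 50_000
low = ar1(half, 0.5, 1.0, rng)   # low-dispersion half
high = ar1(half, 0.5, 5.0, rng)  # high-dispersion half
both = np.concatenate([low, high])
for lag in (1, 10, 100, 1000):
    print(lag, round(acf_abs(low, lag), 3), round(acf_abs(both, lag), 3))
# a single AR(1) piece decorrelates after a few lags, while the appended
# series keeps a roughly constant positive value, mimicking long memory
```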
The volatility is a measure of the variability of a random variable through time. Any power q of the absolute returns could do. In our case one value suffices, see (8), and we will use q = 1. We thus consider

s_m(t) = \frac{1}{m} \sum_{i=t-m+1}^{t} |r(i)| ,     (9)

where m is the length of the window for the calculation of the volatility. The mutual information function displays a very clear hyperbolic decline. The case of the Dow Jones stock index is shown in Fig. 4. A curve with a similar shape was obtained for the USD–DEM exchange rate. However, as for the returns, the mutual information function is not stable through time windows. At this stage one may ask whether the dependence in the volatility could be of any help in predicting the returns. To this end we considered the mutual information between, on one hand, the return at time t, and, on the other hand, the return at time t−1 and the volatility at time t−1.


Fig. 5. Mutual informations ⟨I(r(t); r(t−1), s_m(t−1))⟩ (full line), ⟨I(|r(t)|; |r(t−1)|, s_m(t−1))⟩ (dashed line), ⟨I(r(t); s_m(t−1))⟩ (dash-dotted line) and ⟨I(|r(t)|; s_m(t−1))⟩ (dotted line) for the USD–DEM foreign exchange. They are displayed as a function of the length m of the window for calculating the volatility.

Fig. 6. Same as Fig. 5 but for the Dow Jones stock index.

In Figs. 5 and 6 we show ⟨I(r(t); r(t−1), s_m(t−1))⟩ (full line) and ⟨I(|r(t)|; |r(t−1)|, s_m(t−1))⟩ (dashed line) as functions of the length of the volatility window m. It can be seen that the two curves track each other quite well. So, there is again very little information that could be exploited for predicting the sign of the return. The quantities ⟨I(r(t); s_m(t−1))⟩ (dash-dotted line) and ⟨I(|r(t)|; s_m(t−1))⟩ (dotted line) are also shown. For the USD–DEM time series, the inclusion of the volatility as a second input does improve the predictability of the absolute value of the returns. For the Dow Jones time series, this effect is very small.
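The scalar quantities in this family can be reproduced with the sketch from Section 2.2; the vector-valued ones, which take (r(t−1), s_m(t−1)) as a joint argument, require the full estimator of Refs. [5–7]. A sketch under that restriction, with s_m computed as the rolling mean of Eq. (9) (function names are ours):

```python
# Volatility-based curves as in Figs. 5 and 6 (scalar cases only):
# I(r(t); s_m(t-1)) and I(|r(t)|; s_m(t-1)) versus the window length m.
import numpy as np

def volatility(r, m):
    # s_m(t) = (1/m) * sum_{i=t-m+1}^{t} |r(i)|, defined from t = m-1 on
    return np.convolve(np.abs(r), np.ones(m) / m, mode="valid")

def volatility_information(r, windows=(1, 2, 4, 8, 16, 32)):
    out = {}
    for m in windows:
        s = volatility(r, m)  # s[j] is s_m at time t = j + m - 1
        r_next = r[m:]        # aligns r(t) with s_m(t - 1) = s[t - m]
        out[m] = (mutual_information(r_next, s[:-1]),
                  mutual_information(np.abs(r_next), s[:-1]))
    return out
```

The text above reports that, for the USD–DEM series, the volatility carries information about |r(t)|; the sketch lets one check this pattern on any return series.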


Fig. 7. Evolution of the mutual information ⟨I(r(t); r(t−1), s_m(t−1))⟩, with m = 12, over a one-year rolling window, for the USD–DEM time series.

In Fig. 7 the behaviour of ⟨I(r(t); r(t−1), s_m(t−1))⟩, with m = 12, over a one-year rolling window is displayed. A similar picture was obtained for the Dow Jones time series. The mutual information is not stable through time. It may even sometimes change quite drastically over some fairly short period.

4. Conclusions
We demonstrated that ideas revolving around the entropy may be sensibly applied to financial time series. The decisive advantage of this approach resides in its ability to account for nonlinear dependences. Above, we have illustrated the method by answering the following questions:
Are the returns statistically independent? No. Is there any information in the signs of the returns? No, all the information is contained in their absolute values, except possibly for very short time lags.
Are the volatilities statistically dependent? Yes. Can the volatilities help predict the returns? No, except possibly for very short time lags, but the effect is so small that it is probably useless.
Are the (long-range) dependences in the absolute values of the returns or in the volatilities stable over time windows? No, and a partial explanation of this seems to be that financial time series show some kind of non-stationarity.
Concepts and methods of statistical physics are increasingly being applied to economics, to the point that a new word was coined, econophysics [20]. The process need not be one-directional. It is worth noting that power-law scaling and Brownian motion were in fact known in economics before they appeared in physics.


References
[1] H.-P. Bernhard, A tight upper bound on the gain of linear and nonlinear predictors for stationary stochastic processes, IEEE Trans. Signal Process. 46 (1998) 2909–2917.
[2] G.A. Darbellay, Predictability: an information-theoretic perspective, in: A. Prochazka, J. Uhlir, P.J.W. Rayner, N.G. Kingsbury (Eds.), Signal Analysis and Prediction, Birkhauser, Boston, 1998, pp. 249–262.
[3] W. Ebeling, J. Freund, F. Schweitzer, Komplexe Strukturen: Entropie und Information, Teubner, Stuttgart, 1998.
[4] L. Molgedey, W. Ebeling, Local order, entropy and predictability of financial time series, Eur. Phys. J. B 15 (2000) 733.
[5] G.A. Darbellay, An estimator of the mutual information based on a criterion for independence, Comput. Statist. Data Anal. 32 (1999) 1–17.
[6] G.A. Darbellay, J. Franek, http://siprint.utia.cas.cz/timeseries/.
[7] G.A. Darbellay, I. Vajda, Estimation of the information by an adaptive partitioning of the observation space, IEEE Trans. Inform. Theory 45 (1999) 1315–1321.
[8] J.-P. Bouchaud, M. Potters, Theorie des Risques Financiers, Alea, Saclay, 1997.
[9] B. Mandelbrot, A Multifractal Walk down Wall Street, Sci. Am. (February 1999) 50–53.
[10] R. Schnidrig, D. Wuertz, Investigation of the volatility and autocorrelation function of the exchange rate on operational time scales, ETH research report No. 95-04 (1995).
[11] M.M. Dacorogna, U.A. Mueller, R.J. Nagler, R.B. Olsen, O.V. Pictet, A geographical model for the daily and weekly seasonal volatilities in the FX market, J. Int. Money Finance 12 (1993) 413–438.
[12] S. Galluccio, G. Caldarelli, M. Marsili, Y.C. Zhang, Scaling in currency exchange, Physica A 245 (1997) 423–436.
[13] F. Schmitt, D. Schertzer, S. Lovejoy, Multifractal analysis of foreign exchange data, Appl. Stochastic Models Data Anal. 15 (1999) 29–53.
[14] Z. Ding, C.W.J. Granger, R.F. Engle, A long memory property of stock market returns and a new model, J. Empirical Finance 1 (1993) 83–106.
[15] Y. Liu, P. Cizeau, M. Meyer, C.K. Peng, H.E. Stanley, Correlations in economic time series, Physica A 245 (1997) 437–440.
[16] T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[17] N.S. Jayant, P. Noll, Digital Coding of Waveforms, Prentice-Hall, Englewood Cliffs, NJ, 1984.
[18] R.L. Dobrushin, General formulation of Shannon's main theorem in information theory, Uspekhi Mat. Nauk 14 (1959) 3–104 (in Russian). Translated in Am. Math. Soc. Trans. 33 (1959) 323–438.
[19] J.Y. Campbell, A.W. Lo, A. Craig MacKinlay, The Econometrics of Financial Markets, Princeton University Press, Princeton, NJ, 1997.
[20] J.D. Farmer, Physicists attempt to scale the ivory towers of finance, Comput. Sci. Eng. (November/December 1999) 26–39.
