www.elsevier.com/locate/physa
Abstract
The entropy is a concept which may serve to define quantities such as the conditional entropy and the mutual information. Using a novel algorithm for the estimation of the mutual information from data, we analyse several financial time series and demonstrate the usefulness of this new approach. The issues of long-range dependence and non-stationarity are discussed. © 2000 Elsevier Science B.V. All rights reserved.
1. Introduction
The entropy was introduced in thermodynamics by Clausius in 1865. Later, around
1900, within the framework of statistical physics established by Boltzmann and Gibbs,
it came to be understood as a statistical concept. Around the middle of the 20th century,
it found its way into engineering and mathematics, most notably through the works of
Shannon in communication engineering and of Kolmogorov in probability theory and
dynamical systems theory.
Concepts based on the entropy, such as the conditional entropy or the mutual information, are well suited for studying statistical dependences in time series [1–3]. By statistical dependences we mean any kind of statistical correlations, not only linear correlations. Traditionally, nonlinear correlations have been studied with higher-order moments (or cumulants). The entropy, by its ability to capture stochastic dependences in full generality, goes beyond such moment-based analyses.
Correspondence address: Laboratoire de traitement des signaux, Ecole polytechnique federale (EPFL),
CH-1015 Lausanne, Switzerland.
For financial returns one commonly studies the scaling law

⟨|r(Δt)|^q⟩ ∝ (Δt)^γ(q) ,    (1)

here with q = 1, where ⟨·⟩ denotes the time average and r(Δt) the return over the time interval Δt. For financial returns, however, the exponent γ(1) is often different from 1/2. The case of the exchange rate between the US Dollar (USD) and the German Mark (DEM) is shown in Fig. 1. To make matters yet more complicated, the exponent is not stable over the years. This is illustrated in Fig. 2.
The data are quotes averaged over 30 min periods between October 1992 and May 1997. The volatility time, denoted the υ-time, is a monotonic transformation of the physical time: the period is shorter than 30 min when the volatility is high and longer than 30 min when the volatility is low (e.g. during the weekends) [10]. This kind of intrinsic time, to which we come back in Section 4, is also referred to as operational time, since heavy trading results in higher activity [11,12]. As we can see, in both time scales the exponent is not 1/2.
The exponent in the scaling law depends on the power q ≥ 0 to which the absolute returns have been raised. For certain processes it is possible to derive the relation between γ and q [13]. For Lévy processes of index α, 0 < α ≤ 2, we have γ(q) = q/α if q < α and γ(q) = 1 if q ≥ α. Gaussian processes correspond to the case α = 2. Thus,
Fig. 1. Double-logarithmic plot of the mean absolute return versus the time interval Δt over which the return was calculated (see Eq. (1)).
a given data-generating process can only be a Lévy process if this relation holds for all values of q.
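To make the scaling analysis concrete, the following numerical sketch (our own illustration, not from the paper; all names and parameters are our choices) recovers γ(1) = 1/2 for a process with independent Gaussian increments, by regressing log ⟨|r(Δt)|⟩ on log Δt:

```python
import numpy as np

# Simulate i.i.d. Gaussian elementary returns; aggregating them over an
# interval of length dt gives a return whose mean absolute value scales
# as dt**0.5, i.e. gamma(1) = 1/2 in the scaling law (1).
rng = np.random.default_rng(42)
r = rng.standard_normal(1 << 17)          # elementary returns
dts = np.array([1, 2, 4, 8, 16, 32, 64])  # interval lengths in elementary steps

mean_abs = []
for dt in dts:
    m = (len(r) // dt) * dt
    agg = r[:m].reshape(-1, dt).sum(axis=1)  # returns over interval dt
    mean_abs.append(np.abs(agg).mean())

# Slope of the double-logarithmic plot = scaling exponent gamma(1)
slope = np.polyfit(np.log(dts), np.log(mean_abs), 1)[0]
```

For heavy-tailed financial returns the same regression typically yields a slope different from 1/2, which is the phenomenon shown in Fig. 1.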
The scaling analysis, as outlined above, leads to the conclusion that the central limit theorem (CLT) does not apply to financial returns. This conclusion may also be reached by looking at histograms and realising that the returns follow heavy-tailed distributions. To understand why the CLT is inapplicable, it is important to check whether the returns are sequentially independent or not. Studies of the scaling behaviour, however, do not provide much of an answer about the (in)dependence of the increments. Their lack of independence is usually studied by looking at the linear correlations of some power q of the absolute (log-)returns. In virtually all such investigations, with the exception of Ref. [14], only the powers q = 1, 2 have been analysed (e.g. Refs. [12,15]). Again, such analyses remain incomplete unless a whole range of q-values is investigated.
The approach based on the entropy proceeds along a more direct avenue. It does not require considering all powers q. In the next section, we provide some theoretical background on our approach. The word prediction is obviously connected to the fact that dependence means predictability. In Section 3 we consider the limiting case of linear prediction theory. Two financial time series are analysed in Section 4, and our conclusions are summarised in Section 5.
Fig. 2. Values of the scaling exponent γ(1) = 1/E calculated over a rolling 12-month window, which is shifted by 2 weeks for each new value.
Consider now a second random variable Y with probability density f_Y. The joint probability density of X and Y will be f_{X,Y}, and the joint entropy of X and Y is defined as

h(X, Y) = − ∫∫ f_{X,Y}(x, y) log f_{X,Y}(x, y) dx dy .    (3)
Again the integration is done over the whole support of the function f_{X,Y}. The difference

h(Y | X) = h(X, Y) − h(X) = − ∫∫ f_{X,Y}(x, y) log [ f_{X,Y}(x, y) / f_X(x) ] dx dy    (4)
defines the conditional entropy. Similarly, h(X | Y) = h(X, Y) − h(Y). The difference

I(X; Y) = h(Y) − h(Y | X)
        = h(X) + h(Y) − h(X, Y)    (5)
        = ∫∫ f_{X,Y}(x, y) log [ f_{X,Y}(x, y) / (f_X(x) f_Y(y)) ] dx dy    (6)

defines the mutual information. It is nonnegative, I(X; Y) ≥ 0,
with equality if and only if X and Y are independent. This follows from the inequality log u ≤ u − 1 for u > 0. Equality in this inequality and in (6) is achieved if and only if, respectively, u = 1 or f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x, y. If X and Y are independent, then from (5) and (6) we have h(Y | X) = h(Y). Note that, in the (in)equalities above, X and Y may be understood as random vectors, i.e., vectors of random variables.
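For completeness, the nonnegativity of the mutual information follows in two lines from the inequality log u ≤ u − 1, applied with u = f_X f_Y / f_{X,Y}:

```latex
% log u <= u - 1 with u = f_X(x) f_Y(y) / f_{X,Y}(x,y):
-I(X;Y) = \iint f_{X,Y}(x,y)\,
            \log\frac{f_X(x)\,f_Y(y)}{f_{X,Y}(x,y)}\,\mathrm{d}x\,\mathrm{d}y
        \le \iint f_{X,Y}(x,y)
            \left(\frac{f_X(x)\,f_Y(y)}{f_{X,Y}(x,y)} - 1\right)
            \mathrm{d}x\,\mathrm{d}y
        = \iint f_X(x)\,f_Y(y)\,\mathrm{d}x\,\mathrm{d}y - 1 = 0 .
```

Hence I(X; Y) ≥ 0, with equality exactly when u = 1 almost everywhere, i.e. when the joint density factorises.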
2.2. Estimation from data
The difficulty in calculating the mutual information from empirical data lies in the fact that the relevant probability density functions are unknown. One standard way is to approximate the densities by means of histograms. However, imposing some arbitrary histogram will not do. In general, it will lead to gross underestimation or overestimation, depending on the particular distribution governing the data set. What we need is an adaptive histogram, i.e., a histogram that is able to adapt itself to any (joint) probability density as well as possible. Fortunately, there is a general definition of the mutual information based on partitions, and this provides a way of building an adaptive histogram.
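As an illustration of why a fixed, arbitrary histogram will not do, the following sketch (our own illustration; `mi_histogram` is a hypothetical name) computes the plug-in estimate from a fixed 16 × 16 histogram for two independent samples. The true mutual information is zero, yet the estimate is systematically positive; the bias is of order (B − 1)²/(2N) nats for B bins per axis and N points:

```python
import numpy as np

def mi_histogram(x, y, bins=16):
    """Plug-in mutual information estimate (in nats) from a fixed 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                        # empirical joint probabilities
    px = pxy.sum(axis=1, keepdims=True)     # empirical marginal of X
    py = pxy.sum(axis=0, keepdims=True)     # empirical marginal of Y
    mask = pxy > 0                          # only occupied cells contribute
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = rng.standard_normal(2000)   # independent of x: true mutual information is 0
est = mi_histogram(x, y)        # positive, despite independence
```

The bias depends on the bin geometry relative to the distribution, which is exactly the motivation for the adaptive partitioning described next.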
A (finite) partition of R^d is any finite system of non-intersecting subsets of R^d whose union is the whole of R^d. The subsets are often called the cells of the partition. For obvious reasons, in practice one works with rectangular cells. In other words, the cells are hyperrectangles in R^d. We will denote them as C_k = A_k × B_k, where A_k is the orthogonal projection of C_k onto the space where X takes its values, and B_k the projection of C_k onto the space where Y takes its values. It can be shown that the
mutual information is a supremum over partitions [18],

I(X; Y) = sup_{{C_k}} Σ_k P_{X,Y}(C_k) ln [ P_{X,Y}(C_k) / (P_X(A_k) P_Y(B_k)) ] .    (7)
Here, {C_k} denotes a partition made of cells C_k, P_{X,Y}(C_k) is the probability that the pair (X, Y) takes its values in the cell C_k, P_X(A_k) the probability that X takes its values in A_k, and P_Y(B_k) the probability that Y takes its values in B_k. It can also be shown that, by constructing a sequence of finer and finer partitions, the corresponding sequence of mutual informations will increase monotonically. It will stop increasing when conditional independence is achieved on all cells of the partition. Thus, by testing for independence we can decide when to stop a (recursive) partitioning scheme. As a criterion one may use any independence test, e.g. the χ² statistic. Full details, including the computer code, are to be found in Refs. [2,5–7].
Fig. 3. Mutual information functions ⟨I(r(t); r(t − τ))⟩ (full line) and ⟨I(|r(t)|; |r(t − τ)|)⟩ (dashed line) as a function of the time lag τ. r(t) denotes the returns of the USD-DEM foreign exchange rate and ⟨·⟩ the averaging over t. The 336 lags cover one week. The two functions are virtually indistinguishable.
Since we have a single series of measurements across time, the mutual information is calculated as a time average. The interpretation of such statistical estimates depends on whether or not the time series is stationary and/or ergodic. Fig. 3 shows two autoinformation curves for the USD-DEM returns. The first curve (full line) is the mutual information ⟨I(r(t); r(t − τ))⟩, where τ is the time lag and r(t) the return at time t. The second curve (dashed line), ⟨I(|r(t)|; |r(t − τ)|)⟩, is virtually identical to the first curve, except maybe for the lag τ = 1. The periodicity is caused by the remaining seasonality. There are 10 daily peaks during the 2 weeks (the weekends have been washed out, as there is virtually no market activity during Saturdays and Sundays). We recall that the mutual information is invariant with respect to bijective transformations (unlike the linear correlation, which is invariant with respect to linear transformations only). As a result,
⟨I(|r(t)|^q; |r(t − τ)|^q)⟩ = ⟨I(|r(t)|; |r(t − τ)|)⟩  for all q > 0 .    (8)
We can thus conclude that, except maybe for the first lag, all the information about the present returns contained in some past returns is carried by the amplitudes of the returns and not by their signs. The same conclusion applies to the Dow Jones returns, except for a small difference between ⟨I(r(t); r(t − τ))⟩ and ⟨I(|r(t)|; |r(t − τ)|)⟩ for the first five days (τ = 1, …, 5).
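The invariance behind Eq. (8) can be checked numerically. In the following sketch (our own illustration; the function names are hypothetical), the plug-in estimate is computed on equal-count rank bins; since ranks are untouched by any strictly increasing transformation, raising positive amplitudes to a power q leaves the estimate exactly unchanged:

```python
import numpy as np

def rank_bins(v, bins):
    # Assign each point to one of `bins` equal-count bins via its rank;
    # any strictly increasing transform of v yields the same assignment.
    r = np.argsort(np.argsort(v))
    return (r * bins) // len(v)

def mi_plugin(ix, iy, bins=8):
    # Plug-in mutual information (in nats) from binned indices.
    pxy = np.zeros((bins, bins))
    np.add.at(pxy, (ix, iy), 1.0)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    m = pxy > 0
    return float(np.sum(pxy[m] * np.log(pxy[m] / (px * py)[m])))

rng = np.random.default_rng(3)
a = np.abs(rng.standard_normal(3000))                  # stand-in amplitudes
b = 0.7 * a + 0.3 * np.abs(rng.standard_normal(3000))  # dependent on a
i_q1 = mi_plugin(rank_bins(a, 8), rank_bins(b, 8))
i_q3 = mi_plugin(rank_bins(a**3, 8), rank_bins(b**3, 8))  # q = 3: identical
```

Because v ↦ v³ is strictly increasing on nonnegative amplitudes, the two estimates agree bit for bit; a linear correlation coefficient would not enjoy this invariance.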
Are the values of ⟨I(r(t); r(t − τ))⟩ and ⟨I(|r(t)|; |r(t − τ)|)⟩ stable through time (windows)? To check this, we recalculated these time averages over a rolling window, both for the USD-DEM and for the Dow Jones time series. If this is done for a fixed lag, one obtains a graph which is similar in spirit to Fig. 2 or Fig. 7. In other words, the mutual information is not stable through time. This fact supports the view that these time series are not stationary. This could also explain the long-range dependence that can be seen in Fig. 3 or Fig. 4: the decline of the autoinformation functions is
Fig. 4. Mutual informations ⟨I(s(t); s(t − τ))⟩ as a function of the time lag τ, where s(t) is the volatility of the Dow Jones stock index returns.
not exponential but hyperbolic. One can construct stationary processes with long-range dependence by means of the so-called fractional processes. However, non-stationarity may also cause long-range dependence. As a simple illustration, consider a stationary time series with a low dispersion (which could be measured by the variance) and another stationary time series with a high dispersion. Let each series have a length N/2. We assume them to obey one of the numerous processes with an exponential decline in the (linear) autocorrelation function or the autoinformation function (e.g. some autoregressive process). Now, append the two series together and consider them as a single process. Obviously, under the condition that the lag separating two data points is sufficiently smaller than N, the data points belonging to the part with low variance will often be paired, and those belonging to the part with high variance likewise. This will induce long-range dependence.
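The illustration above can be reproduced in a few lines (a sketch with parameters of our own choosing): two AR(1) series share the same fast, exponentially decaying memory, yet their concatenation shows a clearly positive autocorrelation of the absolute values at a lag where each half alone has essentially none:

```python
import numpy as np

def ar1(n, phi, sigma, rng):
    """AR(1) process x_t = phi * x_{t-1} + e_t with innovation scale sigma."""
    x = np.zeros(n)
    e = rng.standard_normal(n) * sigma
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def acf(v, lag):
    """Sample autocorrelation of v at the given lag."""
    v = v - v.mean()
    return float(np.dot(v[:-lag], v[lag:]) / np.dot(v, v))

rng = np.random.default_rng(7)
low = ar1(4000, 0.5, 1.0, rng)    # low-dispersion half
high = ar1(4000, 0.5, 4.0, rng)   # high-dispersion half, same short memory
joined = np.concatenate([low, high])

r50_single = acf(np.abs(low), 50)     # essentially zero: memory is 0.5**lag
r50_joined = acf(np.abs(joined), 50)  # clearly positive: spurious long memory
```

At lag 50 each stationary half has autocorrelation of order 0.5⁵⁰, i.e. zero for all practical purposes, while the concatenated series inherits a lag-independent positive level from the variance contrast between its two halves.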
The volatility is a measure of the variability of a random variable through time. Any power q of the absolute returns could do. In our case one value suffices, see (8), and we will use q = 1. We thus consider

s_m(t) = (1/m) Σ_{i = t−m+1}^{t} |r(i)| ,    (9)
where m is the length of the window for the calculation of the volatility. The mutual information function displays a very clear hyperbolic decline. The case of the Dow Jones stock index is shown in Fig. 4. A curve with a similar shape was obtained for the USD-DEM exchange rate. However, as for the returns, the mutual information function is not stable through time windows. At this stage one may ask whether the dependence in the volatility could be of any help in predicting the returns. To this end, we considered the mutual information between, on the one hand, the return at time t and, on the other hand, the return at time t − 1 and the volatility at time t − 1. In Figs. 5
Fig. 5. Mutual informations ⟨I(r(t); r(t − 1), s_m(t − 1))⟩ (full line), ⟨I(|r(t)|; |r(t − 1)|, s_m(t − 1))⟩ (dashed line), ⟨I(r(t); s_m(t − 1))⟩ (dash-dotted line) and ⟨I(|r(t)|; s_m(t − 1))⟩ (dotted line) for the USD-DEM foreign exchange rate. They are displayed as a function of the length m of the window for calculating the volatility.
Fig. 6. Same as Fig. 5 but for the Dow Jones stock index.
and 6 we show ⟨I(r(t); r(t − 1), s_m(t − 1))⟩ (full line) and ⟨I(|r(t)|; |r(t − 1)|, s_m(t − 1))⟩ (dashed line) as a function of the length m of the volatility window. It can be seen that the two curves track each other quite well. So, there is again very little information that could be exploited for predicting the sign of the return. The quantities ⟨I(r(t); s_m(t − 1))⟩ (dash-dotted line) and ⟨I(|r(t)|; s_m(t − 1))⟩ (dotted line) are also shown. For the USD-DEM time series, the inclusion of the volatility as a second input does improve the predictability of the absolute value of the returns. For the Dow Jones time series, this effect is very small.
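The volatility of Eq. (9) is simply a causal moving average of the absolute returns and can be computed in one line (a sketch; the toy return values are our own):

```python
import numpy as np

def volatility(r, m):
    # s_m(t) of Eq. (9): average of the last m absolute returns,
    # defined for t = m-1, ..., len(r)-1.
    return np.convolve(np.abs(r), np.ones(m) / m, mode="valid")

r = np.array([0.01, -0.02, 0.005, 0.03, -0.01])  # toy returns
s = volatility(r, 3)
```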
Fig. 7. Evolution of the mutual information ⟨I(r(t); r(t − 1), s_m(t − 1))⟩, with m = 12, over a one-year rolling window, for the USD-DEM time series.
In Fig. 7 the behaviour of ⟨I(r(t); r(t − 1), s_m(t − 1))⟩, with m = 12, over a one-year rolling window is displayed. A similar picture was obtained for the Dow Jones time series. The mutual information is not stable through time. It may even sometimes change quite drastically over a fairly short period.
5. Conclusions
We demonstrated that ideas revolving around the entropy may be sensibly applied to financial time series. The decisive advantage of this approach resides in its ability to account for nonlinear dependences. Above, we have illustrated the method by answering the following questions:
Are the returns statistically independent? No. Is there any information in the signs of the returns? No, all the information is contained in their absolute values, except possibly for very short time lags.
Are the volatilities statistically dependent? Yes. Can the volatilities help predict the returns? No, except possibly for very short time lags, but the effect is so small that it is probably useless.
Are the (long-range) dependences in the absolute values of the returns or in the volatilities stable over time windows? No, and a partial explanation of this seems to be that financial time series show some kind of non-stationarity.
Concepts and methods of statistical physics are increasingly being applied to economics, to the point that a new word was coined: econophysics [20]. The process need not be one-directional. It is worth noting that power-law scaling and Brownian motion were in fact known in economics before they appeared in physics.
References
[1] H.-P. Bernhard, A tight upper bound on the gain of linear and nonlinear predictors for stationary stochastic processes, IEEE Trans. Signal Process. 46 (1998) 2909–2917.
[2] G.A. Darbellay, Predictability: an information-theoretic perspective, in: A. Prochazka, J. Uhlir, P.J.W. Rayner, N.G. Kingsbury (Eds.), Signal Analysis and Prediction, Birkhäuser, Boston, 1998, pp. 249–262.
[3] W. Ebeling, J. Freund, F. Schweitzer, Komplexe Strukturen: Entropie und Information, Teubner, Stuttgart, 1998.
[4] L. Molgedey, W. Ebeling, Local order, entropy and predictability of financial time series, Eur. Phys. J. B 15 (2000) 733.
[5] G.A. Darbellay, An estimator of the mutual information based on a criterion for independence, Comput. Statist. Data Anal. 32 (1999) 1–17.
[6] G.A. Darbellay, J. Franek, http://siprint.utia.cas.cz/timeseries/.
[7] G.A. Darbellay, I. Vajda, Estimation of the information by an adaptive partitioning of the observation space, IEEE Trans. Inform. Theory 45 (1999) 1315–1321.
[8] J.-P. Bouchaud, M. Potters, Théorie des Risques Financiers, Aléa, Saclay, 1997.
[9] B. Mandelbrot, A multifractal walk down Wall Street, Sci. Am. (February 1999) 50–53.
[10] R. Schnidrig, D. Wuertz, Investigation of the volatility and autocorrelation function of the exchange rate on operational time scales, ETH research report No. 95-04 (1995).
[11] M.M. Dacorogna, U.A. Mueller, R.J. Nagler, R.B. Olsen, O.V. Pictet, A geographical model for the daily and weekly seasonal volatilities in the FX market, J. Int. Money Finance 12 (1993) 413–438.
[12] S. Gallucio, G. Caldarelli, M. Marsili, Y.-C. Zhang, Scaling in currency exchange, Physica A 245 (1997) 423–436.
[13] F. Schmitt, D. Schertzer, S. Lovejoy, Multifractal analysis of foreign exchange data, Appl. Stochastic Models Data Anal. 15 (1999) 29–53.
[14] Z. Ding, C.W.J. Granger, R.F. Engle, A long memory property of stock market returns and a new model, J. Empirical Finance 1 (1993) 83–106.
[15] Y. Liu, P. Cizeau, M. Meyer, C.-K. Peng, H.E. Stanley, Correlations in economic time series, Physica A 245 (1997) 437–440.
[16] T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[17] N.S. Jayant, P. Noll, Digital Coding of Waveforms, Prentice-Hall, Englewood Cliffs, NJ, 1984.
[18] R.L. Dobrushin, General formulation of Shannon's main theorem in information theory, Uspekhi Mat. Nauk 14 (1959) 3–104 (in Russian). Translated in Am. Math. Soc. Trans. 33 (1959) 323–438.
[19] J.Y. Campbell, A.W. Lo, A. Craig MacKinlay, The Econometrics of Financial Markets, Princeton University Press, Princeton, NJ, 1997.
[20] J.D. Farmer, Physicists attempt to scale the ivory towers of finance, Comput. Sci. Eng. (November/December 1999) 26–39.