Department of Economics,
Mathematics and Statistics
Graduate Certificates and Diplomas
Economics, Finance, Financial Engineering;
BSc FE, ESP.
2009-2010
Ron Smith
Email R.Smith@bbk.ac.uk
CONTENTS
PART I: COURSE INFORMATION
1. Aims, readings and approach
2. Class Exercises
3. Assessment
4. How to do your project
PART II: NOTES
5. Introduction
6. Descriptive Statistics
7. Economic and Financial Data I: Numbers
8. Applied Exercise I: Ratios and descriptive statistics
9. Index Numbers
10. Probability
11. Discrete Random Variables
12. Continuous Random Variables
13. Economic and Financial Data II: Interest and other rates
14. Applied Exercise II: Sampling distributions
15. Estimation
16. Confidence Intervals and Hypothesis Tests for the mean
17. Bivariate Least Squares Regression
18. Matrix Algebra & Multiple Regression
19. Properties of Least Squares estimates
20. Regression Confidence Intervals and Tests
21. Economic and Financial Data III: Relationships
22. Applied Exercise III: Running regressions
23. Dynamics
24. Additional matrix results
25. Index
1. PART I: Course Information
1.1. Aims
Economists have been described as people who are good with numbers but not
creative enough to be accountants. This course is designed to ensure that you
are good with numbers; that you can interpret and analyse economic and financial
data and develop a critical awareness of some of the pitfalls in collecting,
presenting and using data. Doing applied work involves a synthesis of various
elements. You must be clear about why you are doing it: what the purpose of
the exercise is (e.g. forecasting, policy making, choosing a portfolio of stocks,
answering a particular question or testing a hypothesis). You must understand
the characteristics of the data you are using and appreciate their weaknesses. You
must use theory to provide a model of the process that may have generated the
data. You must know the statistical methods, which rely on probability theory,
to summarise the data, e.g. in estimates. You must be able to use the software,
e.g. spreadsheets, that will produce the estimates. You must be able to interpret
the statistics or estimates in terms of your original purpose and the theory. Thus
during this course we will be moving backwards and forwards between these ele-
ments: purpose, data, theory and statistical methods. It may seem that we are
jumping about, but you must learn to do all these different things together.
Part I of this booklet provides background information: reading lists; details of
assessment (70% exam, 30% project) and instructions on how to do your project.
Part II provides a set of notes. These include notes on the lectures, notes on
economic and financial data, and applied exercises.
Not all the material in this booklet will be covered explicitly in lectures, particularly
the sections on economic and financial data. But you should be familiar
with that material. Lots of the worked examples are based on old exam questions.
Sections labelled background contain material that will not be on the exam. If
you have questions about these sections raise them in lectures or classes. If you
find any mistakes in this booklet, please tell me. Future cohorts of students will
thank you.
AUTUMN
1. Introduction
2. Descriptive Statistics
3. Index Numbers
4. Probability
5. Random Variables
SPRING
1. The normal and related distributions
2. Estimation
3. Con…dence Intervals and Hypothesis Tests
4. Bivariate Least Squares Regression
5. Matrix Algebra & Multiple Regression
6. Properties of Least Squares estimates
7. Tests for regressions
8. Dynamics
9. Applications
10. Revision
Tutorial Classes run through the spring term, doing the exercises in
section 2.
The sections in the notes on Economic and Financial Data and Applied Exercises
will be used for examples at various points in the lectures. You should work
through them where they come in the sequence in the notes. This material will
be useful for class exercises, exam questions and your project.
Explain how measures of economic and financial variables such as GDP, unemployment,
and index numbers such as the RPI and FTSE, are constructed;
be aware of the limitations of the data and be able to calculate derived statistics
from the data, e.g. ratios, growth rates, real interest rates etc.
Explain the basic principles of estimation and hypothesis testing.
Read and understand articles using economic and financial data at the level
of the FT or Economist.
Conduct and report on a piece of empirical research that uses simple statistical
techniques.
Get familiar with economic and financial data by reading newspapers (the
FT is best, but Sunday Business sections are good), The Economist, etc.
In looking at articles note how they present Tables and Graphs; what data
they use; how they combine the data with the analysis; how they structure
the article. You will need all these skills, so learn them by careful reading.
Try to attend all lectures and classes; if you have to miss them, make sure
that you know what they covered and get copies of notes from other students.
Do the exercises for the classes in the Spring term in advance. Continuously
review the material in lectures, classes and these notes, working in groups
if you can.
Identify gaps in your knowledge and take action to fill them, by asking
questions of lecturers or class teachers and by searching in text books. We
are available to answer questions during office hours (posted on our doors)
or by email.
Do the applied exercise (section 8 of the notes) during the first term. We
will assume that you have done it and base exam questions on it.
Start work on your project early in the second term; advice on this is in
section 4.
1.5. Reading
There are a large number of good text books on introductory statistics, but none
that exactly match the structure of this course. This is because we cover in one
year material that is usually spread over three years of an undergraduate degree:
economic and financial data in the first year, statistics in the second year, and
econometrics in the third year. Use the index in the text book to find the topics
covered in this course.
These notes cross-reference introductory statistics to Barrow (2009) and the
econometrics and more advanced statistics to Verbeek (2008). This is one of the
books that is used on the MSc in Economics econometrics course. There are a
large number of other similar books, such as Gujarati and Porter (2009) and Stock
and Watson (2009).
There are a range of interesting background books on probability and sta-
tistics. The history of probability can be found in Bernstein (1996), which is an
entertaining read, as are other general books on probability like Gigerenzer (2002),
and Taleb (2004, 2007). A classic on presenting graphs is Tufte (1983).
Where economic or financial topics appear in these notes, they are explained.
But it is useful to also do some general reading. On economics there are a range
of paperbacks aimed at the general reader such as Kay (2004) and Smith (2003).
Similarly, there are lots of paperbacks on finance aimed at the general reader.
Mandelbrot and Hudson (2005) is excellent. Mandelbrot, a mathematician who
invented fractals, has done fundamental work on finance since the 1960s. Although
he is highly critical of a lot of modern finance theory, he gives an excellent
exposition of it. Das (2006) provides an excellent non-technical introduction to
derivatives, as well as a lot of funny and often obscene descriptions of what life
is actually like in financial markets. Although written before the credit crunch,
Taleb, Mandelbrot and Das all pointed to the danger of such events.
References
Barrow, Michael (2009) Statistics for Economics, Accounting and Business Studies,
5th edition, FT-Prentice Hall.
Bernstein, Peter L. (1996) Against the Gods, the Remarkable Story of Risk,
Wiley.
Das, Satyajit (2006) Traders, Guns and Money, Pearson.
Gigerenzer, Gerd (2002) Reckoning with Risk, Penguin.
Gujarati, D.N. and D.C. Porter (2009) Basic Econometrics, 5th edition, McGraw
Hill.
Kay, John (2004) The Truth about Markets, Penguin
Mandelbrot, Benoit and Richard Hudson (2005) The (Mis)Behaviour of Markets,
Profile Books.
Smith, David (2003) Free Lunch, Profile Books.
Stock, J.H. and M.W. Watson (2007) Introduction to Econometrics, 2nd edition,
Pearson-Addison Wesley.
Taleb, Nassim Nicholas (2004) Fooled by Randomness: the hidden role of
chance in life and in the markets, 2nd edition, Thomson
Taleb, Nassim Nicholas (2007) The Black Swan: The impact of the highly
improbable, Penguin.
Tufte, Edward R (1983) The Visual Display of Quantitative Information,
Graphics Press
Verbeek, Marno (2008) A guide to modern econometrics, 3rd edition, Wiley.
2. Class exercises Spring term (Many are past exam questions).
2.1. Week 1 Descriptive Statistics
(1) In a speech, Why Banks failed the stress test, February 2009, Andrew Haldane
of the Bank of England provides the following summary statistics for the "golden
era" 1998-2007 and for a long period. Growth is annual percent GDP growth,
inflation is annual percent change in the RPI, and for both the long period is
1857-2007. FTSE is the monthly percent change in the all share index and the
long period is 1693-2007.
           Growth         Inflation       FTSE
           98-07   long   98-07   long    98-07   long
Mean        2.9     2.0    2.8     3.1     0.2     0.2
SD          0.6     2.7    0.9     5.9     4.1     4.1
Skew        0.2    -0.8    0.0     1.2    -0.8     2.6
Kurtosis   -0.8     2.2   -0.3     3.0     3.8    62.3
(a) Explain how the mean; the standard deviation, SD; the coefficient of skewness;
and the coefficient of kurtosis are calculated.
(b) What values for the coefficients of skewness and kurtosis would you expect
from a normal distribution? Which of the series shows the least evidence of
normality?
(c) Haldane says "these distributions suggest that the 'Golden Era' distributions
have a much smaller variance and slimmer tails" and "many risk management
models developed within the private sector during the golden decade were,
in effect, pre-programmed to induce disaster myopia." Explain what he means.
(2) The final grade that you get on this course (fail, pass, merit, distinction) is
a summary statistic. 40-59 is a pass, 60-69 is a merit, 70 and over is a distinction.
In the Grad Dips, the grade is based on marks (some of which are averages) in 5
elements. Merit or better is the criterion for entering the MSc.
Final overall grades are awarded as follows:
Distinction: Pass (or better) in all elements, with Distinction marks in three
elements and a Merit (or better) mark in a fourth.
Merit: Pass (or better) in all elements, with Merit marks (or better) in four
elements.
Pass: In order to obtain a Pass grade, a student should take all examinations
and obtain Pass marks (or better) in at least four elements.
Notice that the grade is not based on averages. This is like the driving test.
If you are good on average, excellent on steering and acceleration, terrible on
braking, you fail; at least in the UK.
Consider the following four candidates.
     Mac   Mic   QT   AES   Opt
a     80    80   30    80    80
b     80    80   40    80    80
c     60    60   40    60    60
d     80    80   80    30    30
(a) What final grade would each get?
(b) How would grades based on the mean, median or mode differ?
(c) What explanation do you think there is for the rules? Do you think that
they are sensible?
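The rules above can be applied mechanically. A minimal Python sketch of one reading of them (the function name is mine, and reading "Distinction marks in three elements and a Merit or better in a fourth" as counts over the five marks is an assumption):

```python
def final_grade(marks):
    """Apply the grade rules: pass = 40+, merit = 60+, distinction = 70+."""
    passes = sum(m >= 40 for m in marks)  # elements passed
    merits = sum(m >= 60 for m in marks)  # elements at merit or better
    dists = sum(m >= 70 for m in marks)   # elements at distinction
    if passes == len(marks) and dists >= 3 and merits >= 4:
        return "Distinction"
    if passes == len(marks) and merits >= 4:
        return "Merit"
    if passes >= 4:  # took all exams, pass marks in at least four elements
        return "Pass"
    return "Fail"

candidates = {"a": [80, 80, 30, 80, 80], "b": [80, 80, 40, 80, 80],
              "c": [60, 60, 40, 60, 60], "d": [80, 80, 80, 30, 30]}
print({k: final_grade(v) for k, v in candidates.items()})
# {'a': 'Pass', 'b': 'Distinction', 'c': 'Merit', 'd': 'Fail'}
```

Note how candidate a, with the highest average, does worse than candidate c: the rule is not based on averages.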
(c) Are the two events passing A and passing B (i) mutually exclusive (ii)
independent?
         PA    FA    Total
PB       50            80
FB
Total    60           100
(3) You are in a US quiz show. The host shows you three closed boxes in one
of which there is a prize. The host knows which box the prize is in, you do not.
You choose a box. The host then opens another box, not the one you chose, and
shows that it is empty. He can always do this. You can either stick with the box
you originally chose or change to the other unopened box. What should you do:
stick or change? What is the probability that the prize is in the other unopened
box?
(4) (Optional). Calculate the probability that two people in a group of size
N will have the same birthday. What size group do you need for there to be a
50% chance that two people will have the same birthday? Ignore leap years.
Use a spreadsheet for this and work it out in terms of the probability of not
having the same birthday. In the first row we are going to put values for N
(the number of people in the group), in the second row we are going to put the
probability that no two people in a group of that size have the same birthday.
In A1 put 1, in B1 put =A1+1, copy this to the right to Z1.
In A2 put 1. Now in B2 we need to calculate the probability that two people
will NOT share the same birthday. There are 364 possible days, i.e. any day but
the first person's birthday, so the probability is 364/365. So put in B2 =A2*(365-
A1)/365. Copy this right. Go to C2: the formula will give you 1 × (364/365) ×
(363/365); the third person has to have a birthday different from the first and
the second. Follow along until the probability of no two people having
the same birthday falls below a half.
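The same column-by-column calculation the spreadsheet builds can be sketched in a few lines of Python (the function name is mine):

```python
def p_no_shared(n):
    """Probability that no two of n people share a birthday (365 equally likely days)."""
    p = 1.0
    for k in range(n):
        p *= (365 - k) / 365  # person k+1 must avoid the k birthdays already taken
    return p

# smallest group size with at least a 50% chance of a shared birthday
n = 1
while 1 - p_no_shared(n) < 0.5:
    n += 1
print(n)  # 23
```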
          1995        1996        1997
          p     q     p     q     p     q
Red       3    20     4    15     5    10
White     4    20     4    25     4    30
Orange    1    10     2    10     3    10
(2) Consider the following bivariate regression model:
Y_i = α + βX_i + u_i
estimated on a sample of data i = 1, 2, ..., N, where Y_i is an observed dependent
variable, X_i is an observed exogenous regressor, u_i is an unobserved disturbance,
and α and β are unknown parameters.
(a) Derive the least squares estimators for α and β.
(b) Under what assumptions about ui will these least squares estimators be
Best Linear Unbiased.
(c) Explain what Best Linear Unbiased means.
(d) Explain what exogenous means.
2.8. Week 8, Regression
Consider the linear regression model
y = Xβ + u
where y is a T × 1 vector of observations on a dependent variable, X a full rank T × k
matrix of observations on a set of exogenous variables, β a k × 1 vector of unknown
coefficients, and u an unobserved disturbance with E(u) = 0 and E(uu′) = σ²I.
(a) Derive the least squares estimator β̂.
(b) Derive the variance covariance matrix of β̂.
(c) Show that β̂ is unbiased.
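The formulae asked for here, b = (X′X)⁻¹X′y and its variance covariance matrix s²(X′X)⁻¹, can be checked numerically; a sketch on simulated data (the sample size, design and true coefficients are arbitrary choices of mine, not course data):

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 200, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])  # constant plus one regressor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=T)  # u ~ N(0, 1), so sigma^2 = 1

b = np.linalg.solve(X.T @ X, X.T @ y)   # b = (X'X)^{-1} X'y
u_hat = y - X @ b                       # residuals
s2 = u_hat @ u_hat / (T - k)            # unbiased estimate of sigma^2
vcov = s2 * np.linalg.inv(X.T @ X)      # estimated V(b) = s^2 (X'X)^{-1}
```

With 200 observations b should land close to the true (1, 2), illustrating unbiasedness.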
2.10. Week 10, Regression.
Using US data on company earnings, E_t, and the dividends paid out to shareholders,
D_t, t = 1872-1986, the following results were obtained (standard errors in
parentheses):
D_t = 0.011 + 0.088E_t + 0.863D_{t-1} + û_{1t}
      (0.009)  (0.008)   (0.019)
R² = 0.998, SER = 0.074.
ln D_t = -0.136 + 0.312 ln E_t + 0.656 ln D_{t-1} + û_{2t}
         (0.015)  (0.025)        (0.029)
R² = 0.993, SER = 0.085.
SER is the standard error of regression.
(a) Test whether the intercepts in each equation are significantly different from
zero at the 5% level and interpret them. Do they have sensible values?
(b) It is suggested that the linear equation is a better equation than the logarithmic
because it has a higher R². Do you agree with this?
Interpret the role of the lagged dependent variable and calculate the long-run
effect of earnings on dividends in each case.
(c) A test for second order serial correlation had a p value of 0.008 in the
linear model and 0.161 in the logarithmic model. Explain what second order
serial correlation is and why it is a problem. Is it a problem in either of these
models?
Extra questions
1. From observations taken over many years it is found that marks on a
particular exam are normally distributed with an expected value of 50 and a
variance of 100. For a standard normal distribution Pr(Z < z) =0.6915 for z=0.5;
0.8413 for z=1; 0.9332 for z=1.5; 0.9772 for z=2.
(a) What is the probability of a student getting below 40 marks on this exam?
(b) What is the probability of a student getting below 30 marks on this exam?
(c) Suppose that in a class of 16 students a new teaching method was used and
the average mark in this class was 54. Is this statistically significant evidence, at
the 5% level, that the new method is more effective? Suppose that the average of
54 had been obtained in a class of 36 students, would this have been statistically
significant evidence? Assume that the new teaching method did not change the
variance.
(d) Show that the arithmetic mean is an unbiased estimator of the expected
value.
(e) Give an example of the type of distribution where the arithmetic mean
would not be a good measure of the typical value of a random variable.
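For extra question 1, the tabulated values given in the question can be reproduced from the standard normal CDF; a sketch using math.erf rather than tables (the helper name is mine):

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

sd = 10  # marks ~ N(50, 100), so the standard deviation is 10
p_below_40 = Phi((40 - 50) / sd)   # part (a)
p_below_30 = Phi((30 - 50) / sd)   # part (b)
z16 = (54 - 50) / (sd / sqrt(16))  # part (c), class of 16
z36 = (54 - 50) / (sd / sqrt(36))  # part (c), class of 36
print(round(p_below_40, 4), round(p_below_30, 4), round(z16, 1), round(z36, 1))
# 0.1587 0.0228 1.6 2.4
```

Against a one-tailed 5% critical value of 1.645, only the class-of-36 average (z = 2.4) is statistically significant.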
3. Data are available for the quantity of a good consumed Q_t, real income
Y_t, the price of the good P_t, and the average of all other prices P_t*, for years
t = 1, 2, ..., T. The demand function is assumed to take the form
Q_t = A Y_t^α P_t^{β₁} (P_t*)^{β₂} e^{u_t}
2.11. Answers to selected exercises
2.11.1. Week 2, question 1.
Note that
(x_i − x̄)² = x_i² + x̄² − 2x̄x_i
and that x̄ is a constant, it does not vary with i, so Σ x̄² = N x̄². Hence
N⁻¹ Σ_{i=1}^{N} (x_i − x̄)² = N⁻¹ Σ x_i² + N⁻¹ N x̄² − N⁻¹ 2x̄ Σ x_i
                            = N⁻¹ Σ x_i² + x̄² − 2x̄²
                            = N⁻¹ Σ x_i² − x̄²
the prize is in C and the host opened B:
P(W_B ∩ H_C) + P(W_C ∩ H_B)
= P(W_B)P(H_C | W_B) + P(W_C)P(H_B | W_C)
= (1/3) × 1 + (1/3) × 1 = 2/3.
The second line follows from the definition of conditional probabilities. This formula
might seem complicated, but it appears in a very popular work of teenage
fiction: The Curious Incident of the Dog in the Night-time, by Mark Haddon,
Random House, 2003.
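The 2/3 answer is also easy to confirm by simulation; a sketch (the function name and trial count are arbitrary):

```python
import random

def play(switch, trials=100_000):
    """Fraction of games won under a stick or switch strategy."""
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        choice = random.randrange(3)
        # the host opens a box that holds no prize and was not chosen
        opened = next(b for b in range(3) if b != choice and b != prize)
        if switch:
            choice = next(b for b in range(3) if b != choice and b != opened)
        wins += choice == prize
    return wins / trials

random.seed(1)
print(round(play(switch=False), 2), round(play(switch=True), 2))  # ~0.33 ~0.67
```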
2.11.4. Week 7.
(a) α is the income elasticity of demand for energy, β is the price elasticity.
(b) R² = 1 − Σ ε̂_t² / Σ(q_t − q̄)²; SER = √(Σ ε̂_t² / (T − k)), where k = 3 here.
R² measures the fit relative to the variance of the dependent variable; the SER
just measures the fit. The rankings would only necessarily be the same if all the
dependent variables had the same variance.
(e) Include a time trend:
q_t = α + βy_t + γp_t + δt + ε_t
2.11.5. Week 10
(a) The intercept of the linear equation has t ratio 0.011/0.009 = 1.22, not
significantly different from zero. The intercept in the log equation has t ratio
−0.136/0.015 = −9.06, significantly different from zero. The intercepts measure
different things in the two equations. See (c) below.
(b) No, the R² cannot be compared because the dependent variables are different.
(c) The lagged dependent variable captures the effect that firms smooth dividends
and only adjust them slowly in response to earnings changes. The long run
relations are
D = 0.011/(1 − 0.863) + [0.088/(1 − 0.863)]E
  = 0.080 + 0.642E
ln D = −0.136/(1 − 0.656) + [0.312/(1 − 0.656)] ln E
     = −0.395 + 0.907 ln E
In the case of the linear model the long-run intercept should be zero: dividends
should be zero when earnings are zero. In the logarithmic case it is a constant of
proportionality, exp(−0.395) = 0.67, so the long-run relation is
D = 0.67E^0.91.
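The long-run arithmetic above can be verified directly by setting D_t = D_{t−1} = D in each equation:

```python
from math import exp

# linear model: D_t = a + b*E_t + c*D_{t-1}, so in the long run D = a/(1-c) + b/(1-c)*E
a, b, c = 0.011, 0.088, 0.863
print(round(a / (1 - c), 3), round(b / (1 - c), 3))      # 0.08 0.642

# log model: ln D_t = al + bl*ln E_t + cl*ln D_{t-1}
al, bl, cl = -0.136, 0.312, 0.656
print(round(al / (1 - cl), 3), round(bl / (1 - cl), 3))  # -0.395 0.907
print(round(exp(al / (1 - cl)), 2))                      # 0.67
```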
(d) Second order serial correlation is a relation between the residuals of the
form
u_t = ρ₁u_{t−1} + ρ₂u_{t−2} + e_t
It is a problem because it indicates the model is likely to be misspecified. The
linear model p value indicates that there is serial correlation, p < 0.05; the logarithmic
model p value indicates that there is probably not serial correlation, p > 0.05,
at the 5% level.
so
E(x̄) = μ + E(N⁻¹ Σ_{i=1}^{N} u_i) = μ.
1e. In any very skewed distribution, such as income, the average can be very
different from the typical value, so the mode < median < mean.
2a. The price index for consumption is P_t = 100 × C_t/RC_t; inflation is
I_t = 100 × (P_t − P_{t−1})/P_{t−1}; the ex post real interest rate is RIR = TBR_t − I_{t+1};
the savings rate is SR = 100(1 − C/Y). Thus
Year    P_t     I_t    RIR    SR
1995    100            3.21   8.16
1996    103.1   3.1    3.76   6.88
1997    105.7   2.5    5.03   6.78
1998    107.9   2.1
The real interest rate is rising, while the savings rate is falling, the opposite of
what one might expect.
3. (a) First take logarithms then estimate by LS.
(b) The income elasticity of demand α > 0 (for a normal good); the own price
elasticity β₁ < 0 (not a Giffen good); the cross price elasticity with all other goods
β₂ > 0 (it cannot be a complement with all other goods).
(c) Reparameterise the estimated equation as
ln Q_t = a + α ln Y_t + β₁(ln P_t − ln P_t*) + (β₁ + β₂) ln P_t* + u_t
and conduct a t test on the hypothesis that the coefficient of ln P_t* is zero.
(d) Only relative prices matter.
3. Assessment and specimen exam.
3.1. Assessment
Assessment is 70% on the exam, 30% on the empirical project submitted in mid
May. You should read the advice on doing the project fairly early in the course to
get the general idea of what we are looking for and then refer back to it regularly
as you do your project.
The exam will have six questions; the first three questions are in Section A, the
last three are in Section B and are more applied. You must do three
questions: at least one from each section and one other. The questions will be
about:
1. Least Squares, e.g. deriving the least squares estimator and its variance
covariance matrix, proving that it is an unbiased estimator, etc. This will be
the only question that requires matrix algebra.
3. Hypothesis testing, e.g. explain the basis of tests applied either to means or
regression coefficients.
4. Economic and financial data; this will involve calculations, e.g. of index
numbers, growth rates, ratios, derived measures, and some interpretation.
6. Probability and distributions, e.g. being able to use the basic rules of probability;
given the mean and variance for a normal distribution, calculate the
probability of various events happening, etc.
Before 2004 there were separate exams for Applied Finance and Statistics and
Applied Economics and Statistics. Examples of question 1 can be found on AFS,
though not AES papers; examples of question 4 can be found on AES but not
AFS papers. Examples of the other questions appear in both.
The 2004 exam, with answers, is given below.
3.2. Specimen Exam (2004)
Answer THREE Questions, at least one from each section and one other. All
questions are weighted equally.
SECTION A
1. Consider the model:
y_t = β₁ + β₂x_{2t} + u_t
R_t = α + βπ_t + u_t.
R̂_t = 6.37 + 0.33π_t + û_t
      (0.66)  (0.10)
(d) What assumptions are required for least squares to give good estimates?
(3) Using information in question 2, and assuming that the 95% critical value
for a t test is 2:
(a) Test the hypotheses α = 0 and β = 1 at the 5% level.
(b) Explain why the hypothesis β = 1 might be interesting to test.
(c) Explain what Type I and Type II errors are. What is the probability of
a Type I error in your test in part (a)?
(d) Give a 95% confidence interval for β̂. Explain what a confidence interval
is.
SECTION B
4. The following data were taken from Economic Trends, February 2004.
Year    NDY        RDY        RPI      CPI      HP       TBY
2000    654,649    654,649    170.3    105.6     87.7    5.69
2001    700,538    685,263    173.3    106.9     95.1    3.87
2002    721,044    696,224    176.2    108.3    111.2    3.92
The estimated coefficients, with their standard errors in parentheses, are:
w_i = 7.46 + 6.6A_i + 9.0E_i − 6.7A_i² + 4.1A_iE_i
      (0.19)  (0.62)  (1.01)   (0.56)    (1.95)
3.3. Answers
Question 1
(a) In matrix form, y = Xβ + u:
(y_1, ..., y_T)′ = [1 x_21; ...; 1 x_2T](β₁, β₂)′ + (u_1, ..., u_T)′
(b) u′u = (y − Xβ)′(y − Xβ) = y′y + β′X′Xβ − 2β′X′y, which is clearly a
function of β.
(c)
∂(u′u)/∂β = 2X′Xβ − 2X′y = 0
X′Xβ̂ = X′y
β̂ = (X′X)⁻¹X′y
(d)
β̂ = (X′X)⁻¹X′(Xβ + u)
  = β + (X′X)⁻¹X′u
E(β̂) = β + (X′X)⁻¹X′E(u)
E(β̂) = β
Question 2
(a) In R_t = α̂ + β̂π_t + û_t, û_t is the estimated residual, the difference between
the actual and predicted value of R_t.
(b) R² = 1 − Σû_t² / Σ(R_t − R̄)² gives the proportion of the variation in the
dependent variable explained by the regression, 29% in this case, so quite low.
s = √(Σû_t² / (T − 2)) is a measure of the average error, 2.75 percentage points in
this case, quite a large error in predicting the interest rate.
DW = Σ(û_t − û_{t−1})² / Σû_t² is a test for serial correlation; it should be close
to two, so at 0.63 this regression suffers from severe positive serial correlation.
(c) α is the value the interest rate would take if inflation were zero: interest
rates would be 6.37%. β is the effect of inflation on interest rates: a 1 percentage
point increase in inflation raises interest rates by 0.33 percentage points.
(d) In the model:
the regressors should be exogenous, uncorrelated with the errors, E(π_t u_t) = 0;
the regressors should not be linearly dependent: the variance of π_t not equal to
zero, in the case of a single regressor.
The disturbances should have
expected value (mean) zero, E(u_t) = 0;
be serially uncorrelated, E(u_t u_{t−s}) = 0, s ≠ 0;
with constant variance, E(u_t²) = σ².
Question 3
(a) t(α = 0) = 6.37/0.66 = 9.65: reject the hypothesis that α equals zero.
t(β = 1) = (0.33 − 1)/0.10 = −6.7: reject the hypothesis that β equals one.
(b) If the real interest rate is constant plus a random error (Fisher Hypothesis),
R_t − π_t = ī + u_t, then R_t = ī + π_t + u_t, so in the regression α = ī and β = 1.
(c) Type I error is rejecting the null when it is true; Type II error is accepting
the null when it is false. The probability of a Type I error in (a) is 5%.
(d) The 95% confidence interval is 0.33 ± 2 × 0.10, i.e. the range 0.13 to 0.53.
We are 95% confident that this range covers the true value of β.
Question 4
(a) The deflator is the ratio NDY/RDY:
Year    NDY        RDY        Ratio × 100
2000    654,649    654,649    100
2001    700,538    685,263    102.3
2002    721,044    696,224    103.6
(b) Inflation, 2001-2:
DY = 1.27%; RPI = 1.67%; CPI = 1.31%; HP = 16.92%.
Massive house price boom; RPI slightly higher than CPI or DY.
(c) Real return is capital gain on house prices, less interest cost, less rate of
inflation. Using CPI (others are acceptable): 16.92 − 3.92 − 1.31 = 11.69%.
(d) Yes: even if you own your own home and have paid off your mortgage, there
is an implicit rental cost of home ownership and this will increase when house
prices increase.
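The percentage changes in (b) and the real return in (c) all come from one formula; a sketch using the index values from the question's table, reported to one decimal place (the helper name is mine):

```python
def pct(new, old):
    """Percent change from old to new."""
    return 100 * (new - old) / old

rpi = pct(176.2, 173.3)  # RPI inflation, 2001-2
cpi = pct(108.3, 106.9)  # CPI inflation, 2001-2
hp = pct(111.2, 95.1)    # house price inflation, 2001-2
real_return = hp - 3.92 - 1.31  # capital gain less Treasury bill yield less CPI inflation
print(round(rpi, 1), round(cpi, 1), round(hp, 1), round(real_return, 1))
# 1.7 1.3 16.9 11.7
```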
Question 5.
(a) The stochastic component is u_i; it is the bit of log earnings not explained
by the regressors, and will reflect unmeasured ability etc.
(b) We would expect earnings to rise and then fall with age; the quadratic
term captures this feature.
(c) Yes. For earnings to rise and fall with age, we need β₁ > 0 and γ₁ < 0.
You earn more with better education, so β₂ > 0. The positive coefficient on the
interaction term γ₂ makes peak earnings later for more highly educated men,
which is likely.
(d) To get the maximum:
∂w/∂A = β₁ + 2γ₁A + γ₂E = 0
A = −(2γ₁)⁻¹(β₁ + γ₂E)
  = 0.0746(6.6 + 4.1E)
  = 0.49 + 0.31E
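The peak-age line in (d) follows from the first-order condition; as a quick numerical check (variable names are mine):

```python
b1, g1, g2 = 6.6, -6.7, 4.1  # estimated coefficients on A, A^2 and A*E
# dw/dA = b1 + 2*g1*A + g2*E = 0  =>  A = -(b1 + g2*E) / (2*g1)
intercept = -b1 / (2 * g1)
slope = -g2 / (2 * g1)
print(round(intercept, 2), round(slope, 2))  # 0.49 0.31
```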
Question 6.
(a) The probability that a random variable Z takes a value less than a particular
value z.
(b)
(i) z = (1000 − 1005)/2 = −2.5. P(Z < −2.5) = 1 − P(Z < 2.5) = 1 − 0.9938 =
0.0062, 0.6%, roughly one chance in 200.
(ii) z = (1004 − 1005)/2 = −0.5. P(−0.5 < Z < 0.5) = 2 × (0.6915 − 0.5) = 0.383 = 38%.
(c) (i) The standard error of the mean is 2/√16 = 2/4 = 0.5.
(ii) P(X̄ < 1004): z = (1004 − 1005)/0.5 = −2. P(Z < −2) = 1 − P(Z < 2) =
1 − 0.9772 = 0.0228, 2.28%.
(d) From c(ii) the probability of getting this value or less is 2.28%, which is a
small number, so it is probably not working correctly. This is not needed for the
answer, but if you wanted to test at the 5% level the null hypothesis that it was
working properly (μ = 1005), you would need to be careful whether the alternative
was μ ≠ 1005, in which case there would be 2.5% in each tail, or μ < 1005, in
which case there would be 5% in the lower tail. Since the probability is less than
2.5%, you would reject either on a two tailed or one tail test.
(e) Estimate the sample variance from your sample of 16, s² = (16 − 1)⁻¹ Σ(x_i − x̄)²,
to check whether the variance seemed to have increased from 4. This is not
needed for the answer, but you would use a variance ratio F test.
4. Doing Your Project
To do your project you need to choose a topic and collect some data; do some
statistical analysis (e.g. graphs, summary statistics); draw some conclusions and
write up your results clearly in a standard academic style in less than 3,000 words.
The project will count for 30% of the total marks for the course and must be
submitted in mid May. This is to test your ability to collect and interpret data,
not a test of the material covered in this course, e.g. you do not need to do
regressions or give text-book material on statistical procedures.
You must submit a hard-copy version of the project, with an electronic copy of
the data (e.g. on CD). We will not return your project, which we keep on file for
writing references, etc. Make a copy for your own use. We do not show projects
to anyone but the examiners. This is to allow students to use confidential data
from work.
Keep safe backup copies of your data and drafts of your text as you
work (college computers are a safe place). We are very unsympathetic
if you lose work because it was not backed up properly. If you lose
work, it was not backed up properly.
The first page should be a title page with the following information:
The course title and year (eg GDE ASE Project 2010)
Title of project
Your name
You must graph the data (line graphs, histograms or scatter diagrams)
All graphs and tables must have titles and be numbered
You must have a bibliography
You must detail the sources of your data and provide it.
The project must be your own work. You can discuss it with friends or colleagues,
and it is a good idea for students to read and comment on each other's
work, but it must be your own work which you submit. Plagiarism is a serious
offence (see the section in the course handbook).
topic, such as CAPM or the aggregate consumption function, than about a slightly
more unusual topic. We get bored reading over 100 projects; try to make yours
memorable.
Analysis. Does your work indicate a good understanding of the relevant
context, e.g. economics, institutions? Have you brought appropriate concepts,
e.g. economic or finance theory, to bear on your work? Can you develop a logical
argument and use evidence effectively to support your argument? Did you answer
the question you posed? Are you clear about the direction of causality?
Data collection/limitations. Have you collected appropriate data (given
time limitations)? Have you taken reasonable care to check the raw data and
derived variables? Do you understand what your data actually measure? Are you
aware of the limitations of your data? You will receive some credit for any unusual
amount of work you have put into collecting data. Unless you have experience in
designing surveys, do not conduct a survey to collect data.
Data summary and presentation. Have you computed appropriate de-
rived variables? Have you noticed apparently anomalous observations? Do you
demonstrate the ability to summarize and present data in a clear and effective
way?
Statistical Methods. Have you used appropriate statistical methods? Use
the simplest technique that will answer your question. Have you qualified any
conclusions that you have drawn? E.g. pointed out that the sample size is small
or that you have been unable to control for certain factors, etc. Beware of using
advanced statistical techniques that you do not understand; you will be penalised
for any mistakes you make in their use.
Interpretation. How well have you interpreted your data? Have you borne
its limitations in mind when interpreting it? Does your interpretation reveal
understanding of the relevant concepts?
4.4. DATA
Barrow Ch. 9 discusses data. You must give us a copy of the data you have
used. If you need to use confidential work-related data, we can provide a letter
to your employer explaining that it will be kept confidential. You should choose
a topic on which you can find data without too much effort. If you cannot make
substantial progress in finding data in 2-3 hours systematic search, either in the
library or over the internet, you should probably change your topic. There is a
vast amount of statistics available on the Web from governments, central banks,
and international organisations (IMF, OECD or World Bank). Also check Birkbeck
eLibrary statistical databases; Datastream is available in the library. The main
UK source is the Office for National Statistics; US data is available on the Federal
Reserve Economic Database and the Bureau of Economic Analysis. Try Google
or other search engines: just type the topic you are interested in and then "data";
e.g. "Road Traffic Deaths Data" got various sites with international data on road
traffic deaths.
Check your data, no matter where it comes from. Errors (e.g. a decimal point in
the wrong place) can cause havoc if you miss them. Check for units, discontinuities
and changes in definitions of series (e.g. unification of Germany). Check derived
variables as well as the raw data. Calculating the minimum, maximum and mean
can help to spot errors. Carry out the checks again if you move data from one type
of file to another.
4.5.1. ABSTRACT
Here you must summarize your project in 100 words or less. Many journals print
abstracts at the start of each paper; copy their form.
4.5.2. INTRODUCTION.
Explain what you are going to investigate, the question you are going to answer,
and why it is interesting. Say briefly what sort of data you will be using (e.g.
quarterly UK time-series 1956-2009 in section 7.7). Finish this section with a
paragraph which explains the organization of the rest of your report.
4.5.3. BACKGROUND
This section provides context for the analysis to follow, discusses any relevant
literature, theory or other background, e.g. explanation of specialist terms. Do
not give standard textbook material; you have to tell us about what we do not
know, not what we do know. On some topics there is a large literature on others
there will be very little. The library catalogue, the EconLit database and the
library sta¤ can help you to …nd literature.
In many cases, this section will describe features of the market or industry
you are analyzing. In particular, if you are writing about the industry in which
you work, you should make sure you explain features of the industry, or technical
terms used in it, which may be very well known to everyone in it, but not to
outsiders.
4.5.4. DATA
Here you should aim to provide the reader with enough information to follow the
rest of the report, without holding up the story line. Details can be provided in
an appendix. You should discuss any peculiarities of the data, or measurement
difficulties. You may need to discuss changes in the definition of a variable over
time.
4.5.5. ANALYSIS
The background should guide you in suggesting features of the data to look at,
hypotheses to test, questions to ask. You must have tables and graphs describing
the broad features of the data. In the case of time series data these features might
include trends, cycles, seasonal patterns and shifts in the mean or variance of the
series. In the case of cross-section data they might include tables of means and
standard deviations, histograms or cross-tabulations. In interpreting the data, be
careful not to draw conclusions beyond those that are warranted by it. Often
the conclusions you can draw will be more tentative than you would like; data
limitations alone may ensure this. Do not allow your emotional or ethical responses
to cloud your interpretation of what you …nd in the data.
If you run regressions, report: the names of variables (including the dependent
variable); the number of observations and the definition of the sample; coefficients
and either t-ratios, standard errors or p-values; R-squared (or R-bar-squared);
the standard error of the regression; and any other appropriate test statistics,
such as Durbin-Watson for time series.
4.5.7. BIBLIOGRAPHY
You must give a bibliographic citation for any work referred to in the text,
following the Harvard system used in section 1.5.
4.5.8. APPENDICES
You must have a data appendix, giving precise definitions of variables, and details
of the sources. The guiding principle is that you should provide enough detail
to enable the reader to reproduce your data. Give the data in electronic form
attached to the project.
5. PART II: NOTES
The word Statistics has at least three meanings. Firstly, it is the data themselves,
e.g. the numbers that the Office of National Statistics collects. Secondly, it has a
technical meaning as measures calculated from the data, e.g. an average. Thirdly,
it is the academic subject which studies how we make inferences from the data.
Descriptive statistics provide informative summaries (e.g. averages) or presentations
(e.g. graphs) of the data. We will consider this type of statistics first.
Whether a particular summary of the data is useful or not depends on what you
want it for. You will have to judge the quality of the summary in terms of the
purpose for which it is used; different summaries are useful for different purposes.
Statistical inference starts from an explicit probability model of how the data
were generated. For instance, an empirical demand curve says quantity demanded
depends on income, price and random factors, which we model using probability
theory. The model often involves some unknown parameters, such as the price
elasticity of demand for a product. We then ask how to get an estimate of this
unknown parameter from a sample of observations on price charged and quantity
sold of this product. There are usually lots of different ways to estimate the
parameter and thus lots of different estimators: rules for calculating an estimate
from the data. Some ways will tend to give good estimates, some bad, so we need
to study the properties of different estimators. Whether a particular estimator is
good or bad depends on the purpose.
For instance, there are three common measures (estimators) of the typical value
(central tendency) of a set of observations: the arithmetic mean or average; the
median, the value for which half the observations lie above and half below; and the
mode, the most commonly occurring value. These measure different aspects of the
distribution and are useful for different purposes. For many economic measures,
like income, they can be very different. Be careful with averages. If we
have a group of 100 people, one of whom has had a leg amputated, the average
number of legs is 1.99. Thus 99 out of 100 people have an above-average number
of legs. Notice that in this case the median and modal number of legs is two.
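As a quick check of this example, here is a minimal Python sketch (not part of the original notes) computing the three measures with the standard library:

```python
from statistics import mean, median, mode

# 100 people: 99 with two legs, one amputee with one leg
legs = [2] * 99 + [1]

print(mean(legs))    # 1.99
print(median(legs))  # 2.0
print(mode(legs))    # 2
```

The mean is pulled below two by the single outlier, while the median and mode are unaffected, which is why 99 of the 100 people are "above average".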
We often want to know how dispersed the data are, the extent to which they
can differ from the typical value. A simple measure is the range, the difference
between the maximum and minimum value, but this is very sensitive to extreme
values and we will consider other measures below.
Sometimes we are interested in a single variable, e.g. height, and consider its
average in a group and how it varies in the group. This is univariate statistics,
to do with one variable. Sometimes we are interested in the association between
variables: how does weight vary with height? Or how does quantity vary with
price? This is multivariate statistics; more than one variable is involved, and
the most common models of association between variables are correlation and
regression, covered below.
A model is a simplified representation of reality. It may be a physical model,
like a model airplane. In economics, a famous physical model is the Phillips
Machine, now in the Science Museum, which represented the flow of national
income by water going through transparent pipes. Most economic models are just
sets of equations. There are lots of possible models and we use theory (interpreted
widely to include institutional and historical information) and statistical methods
to help us choose the best model of the available data for our particular purpose.
The theory also helps us interpret the estimates or other summary statistics that
we calculate.
Doing applied quantitative economics or finance, usually called econometrics,
thus involves a synthesis of various elements. We must be clear about why we are
doing it: the purpose of the exercise. We must understand the characteristics of
the data and appreciate their weaknesses. We must use theory to provide a model
of the process that may have generated the data. We must know the statistical
methods which can be used to summarise the data, e.g. in estimates. We must
be able to use the computer software that helps us calculate the summaries. We
must be able to interpret the summaries in terms of our original purpose and the
theory.
ships, reducing the probability of them damaging the ships. Other examples of
this sort of use of statistics in World War II can be found in The Pleasures of
Counting, T.W. Korner, Cambridge University Press, 1996.
y_t = y_{t−1} + ε_t.
This says that the value a variable takes today, time t, is the value that it had
yesterday, time t−1, plus a random shock, ε_t. The shock can be positive or
negative, averages zero and cannot be predicted in advance. Such shocks are often
called 'white noise'. To a first approximation, this is a very good description of the
logarithm of many asset prices, such as stock market prices and foreign exchange
rates, because markets are quite efficient: the change in log price (the growth
rate) Δy_t = y_t − y_{t−1} = ε_t is random, unpredictable. Suppose that people knew
something that would raise the price of a stock tomorrow; they would buy today,
and that would raise the price of the stock today. Any information about the future
that can be predicted will be reflected in the price of the stock now. So your
best estimate of tomorrow's price is today's price. What will move the price of
the stock is new, unpredicted, information. The random shock or error ε_t
represents that unpredictable information that changes prices. Most of our models
will involve random shocks like ε_t. Sometimes a firm will report a large loss and its
stock price will go up. This is because the market had been expecting even worse
losses, which had been reflected in the price. When reported losses were not as bad
as expected, the price goes up. Whether the efficient market hypothesis is strictly
true is a subject of controversy, but it is an illuminating first approximation.
If the variable has a trend, this can be allowed for in a random walk with drift
y_t = α + y_{t−1} + ε_t.
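The random walk, with or without drift, is easy to simulate. The sketch below is illustrative and not from the notes; the function name and parameters are my own:

```python
import random

def random_walk(T, drift=0.0, sigma=1.0, y0=0.0):
    """Simulate y_t = drift + y_{t-1} + e_t, with white-noise shocks e_t."""
    y = [y0]
    for _ in range(T):
        y.append(drift + y[-1] + random.gauss(0.0, sigma))
    return y

random.seed(0)
prices = random_walk(100)               # driftless: best forecast of tomorrow is today
trended = random_walk(100, drift=0.5)   # trending series, like a growing log price
```

Plotting a few such simulated series next to a real log stock-price series shows how alike they look.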
5.3. Notation
It is very convenient to express models in mathematical notation, but notation
is not consistent between books and the same symbols mean different things
in different disciplines. For instance, Y often denotes the dependent variable,
but since it is the standard economic symbol for income, it often appears as an
independent variable. It is common to use lower-case letters to indicate deviations
from the mean, but it is also common to use lower-case letters to denote logarithms.
Thus y_t could indicate Y_t − Ȳ or it could indicate ln(Y_t). The logarithm may be
written ln(Y_t) or log(Y_t), but in empirical work natural logarithms, to the base
e, are almost always used. The number of observations in a sample is sometimes
denoted T for time series and sometimes N or n for cross sections.
In statistics we often assume that there is some true unobserved parameter
and wish to use data to obtain an estimate of it. Thus we need to distinguish the
true parameter from the estimate. This is commonly done in two ways. The true
parameter, say the standard deviation, is denoted by a Greek letter, say σ, and
the estimate is denoted either by putting a hat over it, σ̂, said 'sigma hat', or by
using the equivalent Latin letter, s. In many cases we have more than one possible
estimator (a formula for generating an estimate from the sample) and we have
to distinguish them. This is the case with the standard deviation: there are two
formulae for calculating it, denoted in these notes by σ̂ and s. However, books are
not consistent about which symbol they use for which formula, so you have to be
careful.
The Greek alphabet is used a lot. It is given below, with the upper-case letter,
lower case, name and example.
A α alpha; often used for intercept in regression.
B β beta; often used for regression coefficients and a measure of the risk of
a stock in finance.
Γ γ gamma.
Δ δ delta; Δ used for changes, Δy_t = y_t − y_{t−1}; δ often rate of depreciation.
E ε epsilon; ε often error term.
Z ζ zeta.
H η eta; often elasticity.
Θ θ theta; Θ sometimes parameter space; θ often a general parameter.
I ι iota.
K κ kappa.
Λ λ lambda; often a speed of adjustment.
M μ mu; often denotes expected value or mean.
N ν nu.
Ξ ξ xi.
O ο omicron.
Π π pi; π (ratio of circumference to diameter) often used for inflation; Π is the
product symbol: Π y_i = y_1 × y_2 × ... × y_n.
P ρ rho; often denotes autocorrelation coefficient.
Σ σ sigma; σ² usually a variance, σ a standard deviation; Σ is the summation
operator, also sometimes used for a variance-covariance matrix.
T τ tau.
Υ υ upsilon.
Φ φ phi; Φ(y) sometimes normal distribution function, φ(y) normal density
function.
X χ chi; χ² distribution.
Ψ ψ psi.
Ω ω omega; Ω often a variance-covariance matrix.
6. Descriptive statistics
Data tend to come in three main forms:
-time-series, e.g. observations on annual inflation in the UK over a number of
years;
-cross-section, e.g. observations on annual inflation in different countries in a
particular year; and
-panels, e.g. observations on inflation in a number of countries in a number of
years.
Time-series data have a natural order, 1998 comes after 1997; cross-section
data have no natural order; the countries could be ordered alphabetically, by size
or any other way.
The data are usually represented by subscripted letters. So the time-series data
on inflation may be denoted y_t, t = 1, 2, ..., T. This indicates we have a sequence
of observations on inflation running from t = 1 (say 1961) to t = T (say 1997), so
the number of observations is T = 37. For a set of countries, we might denote this
by y_i, i = 1, 2, ..., N, where, if they were arranged alphabetically, i = 1 might
correspond to Albania and i = N to Zambia. Panel data would be denoted y_it,
with a typical observation being on inflation in a particular country i, say the UK,
in a particular year t, say 1995; this gives T × N observations in total. We will use
both T and N to denote the number of observations in a sample.
Graphs are generally the best way to describe data. There are three types of
graph economists commonly use. Firstly, for time-series data, we use a line graph,
plotting the series against time. We can then look for trends (general tendency to
go up or down); regular seasonal or cyclical patterns; outliers (unusual events like
wars or crises). Secondly, we can plot a histogram, which gives the number (or
proportion) of observations which fall in a particular range. Thirdly, we can plot
one variable against another to see if they are associated; this is a scatter diagram
or X-Y plot. Barrow Chapter 1 has lots of examples.
x_i, i = 1, 2, ..., N, where N = 4. The sum of these is 20, which we denote
Σ_{i=1}^{N} x_i = 2 + 4 + 6 + 8 = 20;
this simply says add together the N elements of x. If we multiply each number
by a constant b and add a constant a to each number to create y_i = a + b x_i, then
Σ_{i=1}^{N} y_i = Σ_{i=1}^{N} (a + b x_i) = N a + b Σ_{i=1}^{N} x_i.
In the example above, for a = 1, b = 2, then y_i = 5, 9, 13, 17, with sum 44, which
is the same as 4 × 1 + 2 × 20.
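The summation rules above can be verified directly; a small Python check (mine, not the notes'):

```python
x = [2, 4, 6, 8]
N = len(x)
a, b = 1, 2

# y_i = a + b*x_i gives 5, 9, 13, 17
y = [a + b * xi for xi in x]

assert sum(x) == 20
assert sum(y) == N * a + b * sum(x)   # 44 = 4*1 + 2*20
```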
The arithmetic mean is x̄ = Σ_{i=1}^{N} x_i / N; in this example it is 20/4 = 5.
The formula just says add up all the values and divide by the number of observations.
There are other sorts of mean. For instance, the geometric mean is the Nth root of
the product of the numbers,
GM(x) = (x_1 × x_2 × ... × x_N)^{1/N},
and can be calculated as the exponential (anti-log) of the arithmetic mean of the
logarithms of the numbers; see Barrow p. 54.
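Both routes to the geometric mean, the Nth root of the product and the anti-log of the mean log, can be checked in a few lines of Python (my sketch, not from the notes):

```python
import math

x = [2, 4, 6, 8]
N = len(x)

gm_root = math.prod(x) ** (1 / N)                      # Nth root of the product
gm_logs = math.exp(sum(math.log(xi) for xi in x) / N)  # exp of the mean of the logs
# the two agree up to floating-point rounding
```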
of the observations will lie in the range the mean plus or minus two standard
deviations.
One estimator of the variance of x_i (sometimes called the population variance)
is
σ̂² = Σ_{i=1}^{N} (x_i − x̄)² / N.
Notice here we distinguish between the true value σ² and our estimate of it, σ̂².
This formula gives a set of instructions. It says: take each of the observations and
subtract the mean, (x_i − x̄); square them, (x_i − x̄)²; add them together,
Σ_{i=1}^{N} (x_i − x̄)²; and divide by the number of observations, 4 in this case:
Σ_{i=1}^{N} (x_i − x̄)² / N = 20/4 = 5.
i     x_i   x_i − x̄   (x_i − x̄)²
1      2      −3          9
2      4      −1          1
3      6       1          1
4      8       3          9
sum   20       0         20
In this case both the mean and the variance are 5. The standard deviation,
SD(x) = σ̂, is the square root of the variance: 2.24 in this case.
Another estimator of the variance of x_i (sometimes called the sample variance)
is
s² = Σ_{i=1}^{N} (x_i − x̄)² / (N − 1).
A measure of how two variables x and y move together is the covariance,
Cov(x, y) = Σ_{i=1}^{N} (x_i − x̄)(y_i − ȳ) / N.
The covariance will be positive if high values of x are associated with high values
of y, negative if high values of x are associated with low values of y. It will be zero
if there is no linear relationship between the variables. The covariance can be
difficult to interpret, so it is often standardised to give the correlation coefficient,
by dividing the covariance by the product of the standard deviations of the two
variables:
r = Cov(x, y) / (SD(x) SD(y)).
The correlation coefficient lies between plus and minus one, −1 ≤ r ≤ 1. A
correlation coefficient of −1 means that there is an exact negative linear relation
between the variables, +1 an exact positive linear relation, and 0 no linear relation.
Correlation does not imply causation. Two variables may be correlated
because they are both caused by a third variable.
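The estimators of this section can be collected into a few small functions. This Python sketch (mine, not from the notes) follows the formulae above, with both the /N and /(N−1) variance:

```python
import math

def mean(x):
    return sum(x) / len(x)

def var_pop(x):
    """sigma-hat squared: divide by N."""
    m = mean(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

def var_sample(x):
    """s squared: divide by N - 1."""
    m = mean(x)
    return sum((xi - m) ** 2 for xi in x) / (len(x) - 1)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def corr(x, y):
    """Correlation: covariance over the product of standard deviations."""
    return cov(x, y) / math.sqrt(var_pop(x) * var_pop(y))

x = [2, 4, 6, 8]
print(var_pop(x))             # 5.0, as in the worked example
print(corr(x, [1, 2, 3, 4]))  # 1.0: an exact positive linear relation
```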
This new variable, z_i = (x_i − x̄)/SD(x), has mean zero and variance (and standard
deviation) of one. Notice the correlation coefficient is the covariance between the
standardised measures of x and y.
6.1.5. Moments
A distribution is often described by:
-its moments, Σ_{i=1}^{N} x_i^r / N. The mean, x̄ = Σ x_i / N, is the first
moment, r = 1.
-its centred moments, Σ_{i=1}^{N} (x_i − x̄)^r / N. The variance,
σ² = Σ_{i=1}^{N} (x_i − x̄)² / N, is the second centred moment, r = 2. The first
centred moment, Σ_{i=1}^{N} (x_i − x̄) / N, equals 0.
-its standardised moments, Σ z_i^r / N, where z_i = (x_i − x̄)/s. The third
standardised moment, r = 3, is a measure of whether the distribution is symmetrical
or skewed. The fourth standardised moment, r = 4, is a measure
of kurtosis (how fat the tails of the distribution are). For a normal distribution,
the coefficient of skewness Σ z_i³ / N is zero, and the coefficient of
kurtosis Σ z_i⁴ / N is 3.
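The standardised moments can be computed directly. A Python sketch (mine; for simplicity z_i is formed here with the /N standard deviation rather than s):

```python
import math
import random

def standardised_moment(x, r):
    """r-th standardised moment: mean of z_i**r, with z_i = (x_i - mean)/sd."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((xi - m) ** 2 for xi in x) / n)
    return sum(((xi - m) / sd) ** r for xi in x) / n

# for normal data, skewness should be near 0 and kurtosis near 3
random.seed(1)
sample = [random.gauss(0, 1) for _ in range(100_000)]
skew = standardised_moment(sample, 3)
kurt = standardised_moment(sample, 4)
```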
Some distributions do not have moments. The average (mean) time to get a
PhD is not defined, since some students never finish, though the median is defined:
the time it takes 50% to finish.
and half below) falls in the range 0 to 5%. The distribution is broadly symmetric.
To calculate means and variances, we need to assign values to the ranges, which
is inevitably arbitrary to some extent. We will use mid-points of closed ranges,
e.g. the mid point of 5-15% is 10%; and for the open ranges treat more than 15%
as 20%, and less than -10% as -15%. This gives the values Xi below. We also give
pi the proportions in each range, percentages divided by 100.
We cannot use our standard formula for the mean, so we need to adjust it; see
Barrow p. 27. Call the total number of respondents N = 225, and the number who
responded in each range N_i. So the number in the lowest range (who responded
that prices would fall by more than 10%) is N_1 = 0.18 × 225 = 40.5. The percentages
are rounded, which is why it is not an integer. We could calculate the mean by
multiplying the value of each answer by the number who gave that value,
adding those up over the 5 ranges and dividing by the total number, but it is
easier to calculate if we rewrite that formula in terms of the proportions:
X̄ = (Σ_{i=1}^{5} N_i X_i) / N = Σ_{i=1}^{5} (N_i / N) X_i = Σ_{i=1}^{5} p_i X_i.
So in the table, we give the values X_i and the proportions p_i (note they sum to
one); calculate the product p_i X_i, then sum these to get X̄ = 2.05. Then we
calculate (X_i − X̄) and p_i (X_i − X̄) (note the latter sum to zero). Then multiply
p_i (X_i − X̄) by (X_i − X̄) to get p_i (X_i − X̄)². We sum these to get the variance.
The calculations are given below.
X_i     p_i    p_i X_i   X_i − X̄   p_i (X_i − X̄)   p_i (X_i − X̄)²
 20     0.15    3.00       17.95      2.6925           48.330375
 10     0.25    2.50        7.95      1.9875           15.800625
  2.5   0.18    0.45        0.45      0.081             0.03645
 −5     0.24   −1.20       −7.05     −1.692            11.9286
−15     0.18   −2.70      −17.05     −3.069            52.32645
Sum     1       2.05        2.25      0                128.4225
Since the variance is σ² = 128.4225, the standard deviation, its square root, is
σ = 11.33.
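The grouped-data calculation in the table can be reproduced from the proportions p_i and the values X_i assigned to the ranges; a Python check of the worked example (my sketch, not from the notes):

```python
import math

# mid-point values assigned to the survey ranges, and proportions responding
X = [20, 10, 2.5, -5, -15]
p = [0.15, 0.25, 0.18, 0.24, 0.18]

mean = sum(pi * xi for pi, xi in zip(p, X))                    # about 2.05
variance = sum(pi * (xi - mean) ** 2 for pi, xi in zip(p, X))  # about 128.4225
sd = math.sqrt(variance)                                       # about 11.33
```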
To summarise: the mean forecast of SBE respondents was for house price growth
of 2.05%, with a standard deviation of 11.33 = √128.4225, which indicates the large
range of disagreement. The mean falls in the same range as the median, which also
indicates that the distribution is fairly symmetric, but it is bimodal, with about a
quarter thinking prices would rise between 5 and 15%, and a quarter thinking that
they would fall by up to 10%.
In June 2004, when the survey was conducted, the average house price in the
UK according to the Nationwide Building Society was £151,254. In June 2007,
three years later, it was £184,074, a rise of 22%, though prices subsequently fell
and in June 2009 it was £156,442.
the ratio of the difference between the mean return on the stock and the risk-free
rate of interest to the standard deviation of the stock's returns. There is a
range of other financial performance measures that combine risk and return.
You can reduce risk by diversifying your portfolio, holding more than one
share. Suppose there are two stocks, with mean returns μ_1 and μ_2, variances σ_1²
and σ_2², and covariance between them of σ_12. Suppose you form a portfolio with
share w in stock one and (1 − w) in stock two. The mean return on the portfolio
is
μ_p = w μ_1 + (1 − w) μ_2;
the variance of the portfolio is
σ_p² = w² σ_1² + (1 − w)² σ_2² + 2w(1 − w) σ_12.
Therefore, if the covariance between the stocks is negative, the portfolio will have
a smaller variance: when one stock is up, the other is down. Even if the covariance
is zero there are gains from diversification. Suppose σ_1² = σ_2² = σ² and σ_12 = 0.
Then
σ_p² = w² σ² + (1 − w)² σ² = (1 + 2(w² − w)) σ².
Since 0 < w < 1, we have w² < w, so the second term is negative, making σ_p² < σ².
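The portfolio formulae can be sketched as two small functions (mine, not the notes'), confirming that even with equal variances and zero covariance a 50/50 split halves the variance:

```python
def portfolio_mean(w, mu1, mu2):
    return w * mu1 + (1 - w) * mu2

def portfolio_var(w, var1, var2, cov12):
    return w ** 2 * var1 + (1 - w) ** 2 * var2 + 2 * w * (1 - w) * cov12

var = 4.0
half_half = portfolio_var(0.5, var, var, 0.0)  # 2.0: half the single-stock variance
hedged = portfolio_var(0.5, var, var, -2.0)    # 1.0: negative covariance cuts risk more
```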
7. Economic and Financial Data I: numbers and graphs
7.1. Tables and Calculations
Data will typically come in a table, either electronic or hard copy. When
constructing your own tables, make sure you put a title, full definitions
of the variables, the units of measurement and the source of the data.
Be clear on the units of measurement and get a feel for the orders of magnitude:
what are typical values, what is the range (the highest and lowest values). When
comparing series or graphing them together, make sure they are in comparable
units.
Consider the following table.
Gross National Product, GNP, and Gross Domestic Product, GDP, are two different
measures of the output of a country. GNP includes production by the nationals
of the country; GDP includes the production within the boundary of the country,
whether by nationals or foreigners. The difference is net property income from
abroad. The definition of Population is straightforward, though it can be difficult
to count everybody. The number in the armed forces can raise problems of definition
in countries with para-military units like the French Gendarmerie: are they armed
forces or not? Defining what should be included in military expenditure also
raises difficulties, and many countries are secretive about what they spend on the
military. There are quite large margins of error on all these numbers. Many
economic measures are only ROMs (rough orders of magnitude); others are WAGs
(wild arsed guesses).
From this table we can calculate derived measures like (a) per-capita income,
GNP per head, by dividing GNP by population, for the world as a whole and for the
developed and developing countries (we would have to calculate the world totals);
(b) the average annual growth rate of population or per-capita income (GNP per
head) between 1985 and 1995; (c) the percentage share of military expenditure
in GDP; and (d) the number of people in the armed forces per 1000 population, for
the world as a whole and for the developed and developing countries, in 1985 and
1995.
When doing these calculations, it is crucial to be careful about units. These
variables are all measured in different units, and ratios will depend on the units
of the numerator and denominator. Expressing the units as powers of 10 is often
useful: 1 = 10^0, 10 = 10^1, 100 = 10^2, 1,000,000 = 10^6. The power gives you the
number of zeros after the one.
GNP and Military Expenditure are measured in billions (thousand millions,
10^9) of 1995 US$, so the military expenditure-GDP ratio is a proportion. For
developed countries in 1985 it is 1100.8/21190 = 0.0519; to convert to percent,
multiply by 100: 5.19% of GNP was devoted to the military.
Population is measured in millions, 10^6, and the number in the armed forces in
thousands, 10^3. Dividing the number in the armed forces by population for
developed countries in 1985 gives 11920/1215.7 = 9.8. This is in units of the numerator
divided by the denominator: 10^3/10^6 = 10^{3−6} = 10^{−3}, one per thousand. Thus
there were roughly 10 members of the armed forces per thousand population in the
developed countries in 1985, about 1% of the population.
GNP per capita is GNP (10^9) divided by population (10^6), so it is measured in
thousands (10^3); the 1985 figure for developed countries is 21190/1215.7 = 17.4.
Thus average income in the developed world in 1985, in 1995 dollars, was
about $17,500, compared with the figure for the developing world of 4184/3620.8 =
1.155, i.e. about $1,155.
The growth rate in GNP for the developed countries from 1985 to 1995 is
(23950 − 21190)/21190 = 23950/21190 − 1 = 0.13, i.e. 13% over ten years, roughly
1.3% per annum. Notice whether growth rates are expressed as proportions (0.13)
or percentages (13%), and the period over which they are calculated.
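These unit conversions and derived measures are easy to script. The figures below are the ones quoted from the table; the script itself is a Python sketch, not part of the notes:

```python
# developed countries, from the table: GNP and military spending in 10^9 1995 US$,
# population in 10^6 people, armed forces in 10^3 people
gnp_1985, gnp_1995 = 21190, 23950
pop_1985 = 1215.7
milex_1985 = 1100.8
forces_1985 = 11920

growth = 100 * (gnp_1995 / gnp_1985 - 1)     # about 13% over the decade
milex_share = 100 * milex_1985 / gnp_1985    # about 5.19% of GNP
forces_per_1000 = forces_1985 / pop_1985     # 10^3/10^6 = per thousand, about 9.8
income_per_head = gnp_1985 / pop_1985        # in 10^3 US$, about 17.4
```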
7.2. Graphs
Graphs are usually the most revealing way to present data. See examples in 7.7.
For time-series data the starting point is a line graph, just a plot over time. If you
want to plot more than one series on the graph to see if they move together, make
sure that they are on the same scale. Do not put too many series on the same
graph. When you look at a time-series graph, look for: trends (steady upward, or
less often downward, movement in the series); seasonal patterns (if it is quarterly
or monthly data); cycles, periods of boom or recession spreading over a number
of years; and outliers, which look out of line with the other numbers. Outliers may
be produced by unusual events (wars, financial crises, etc.) or they may just be
mistakes in the data.
Scatter diagrams plot one series against another and are typically used to
investigate whether there is an association between two variables. Look to see
how close the association is, whether it is positive or negative, and whether there
are outliers which do not fit the standard pattern.
Histograms, or frequency distributions, are pictures which give the number
(or proportion) of the observations that fall into particular ranges. We will use
these extensively in lectures. Look to see whether the distribution is unimodal or
bimodal; whether it is skewed; how dispersed it is and whether there are outliers.
7.3. Transformations
In many cases, we remove the effects of trends, changes in price levels, etc. by
working with either growth rates or ratios. In economics and finance certain
ratios tend to be reasonably stable (i.e. not trended). An example is the Average
Propensity to Consume (the ratio of consumption to income) or the Savings
Ratio. Since income equals savings plus consumption, Y = S + C, the average
propensity to consume equals one minus the savings ratio: APC = C/Y = 1 − S/Y,
where Y is income, S savings, C consumption, SR the savings ratio and APC the
average propensity to consume. SR and APC can be expressed either as proportions,
as here, or multiplied by 100 to give percent. In finance, we work with ratios like the
Price-Earnings Ratio or the Dividend Yield. Notice these ratios can be compared
across countries, because the units of currency in the numerator and denominator
cancel.
Theory will often tell you what variables to construct, e.g.
-the real interest rate, equal to the nominal interest rate minus the (expected)
rate of inflation;
-the real exchange rate, the nominal exchange rate times the ratio of foreign to
domestic price indexes;
-the velocity of circulation, the ratio of nominal GDP to the money supply.
production), environmental impacts, illegal activities, etc. If there is an increase in
crime which leads to more security guards being hired and more locks fitted, this
increases GDP. There are various attempts to adjust the totals for these effects
although, so far, they have not been widely adopted. You should be aware of the
limitations of GDP etc. as measures. There is a good discussion of the issues and
alternatives in the Report by the Commission on the Measurement of Economic
and Social Progress (2009), available on www.stiglitz-sen-fitoussi.fr.
The accounts are divided by sector. The private sector covers firms (the corporate
sector, usually divided into financial and non-financial) and households;
the public sector covers general government (which may be national or local)
and sometimes state-owned enterprises, though they may be included with the
corporate sector; the overseas sector covers trade. Corresponding to the output,
expenditure and income flows, there are financial flows between sectors. Define
T_t as taxes less transfer payments. The total Y_t − T_t, factor income minus taxes
plus transfer payments (e.g. state pensions or unemployment benefit), is known as
disposable income. Subtract T_t from both sides of the income-expenditure identity:
Y_t − T_t = C_t + I_t + G_t − T_t + X_t − M_t;
note that savings S_t = Y_t − T_t − C_t. Move C_t and I_t to the left-hand side to give:
S_t − I_t = (G_t − T_t) + (X_t − M_t).
7.5. Unemployment
We often have a theoretical concept and need to provide an 'operational' definition:
a precise set of procedures which can be used by statistical offices to obtain
measures. This raises questions like what is the best operational measure and
how well it corresponds to the particular theoretical concept. Unemployment
is a case in point. There are a number of different theoretical concepts of
unemployment and a number of different ways of measuring it.
Do you think the following people are unemployed:
a student looking for a summer job who cannot find one;
a 70 year old man who would take a job if one was offered;
a mother looking after her children who would take a job if she could find good
child care;
an actor 'resting' between engagements;
someone who has been made redundant and will only accept a job in the field
they previously worked in for the same or better salary?
One method is the 'claimant count', i.e. the number who are registered unemployed
and receiving benefit. But this is obviously very sensitive to exact political
and administrative decisions as to who is entitled to receive benefit.
An alternative is a survey, which asks people of working age such questions as
(i) are you currently employed; if not
(ii) are you waiting to start a job; if not
(iii) have you looked for work in the last four weeks.
Those in category (iii) will be counted as unemployed.
return to gold at its pre-war parity of 4.87. The consequence was continued
recession and mass unemployment.”
Answer
Year   Growth%   INF%    EARIR   EPRIR   $/£
1918
1919    −11       33.3    −29.8   10.6    4.42
1920     −2       −7.1     13.3   15.8    3.66
1921    −11       −9.6     14.2   23.7    3.85
1922      5      −19.1     21.7           4.43
(a) Growth is 100(Y_t/Y_{t−1} − 1). For 1919 it is 100((48/54) − 1) ≈ −11%: per
capita income fell by over 10%, and continued falling till 1921. This fall produced
rising unemployment. Inflation is calculated as the percentage change in the RPI.
Prices were rising between 1918 and 1919, but then fell, giving negative inflation,
deflation. Between 1919 and 1922 prices fell by almost a third.
(b) If you lend £1 at 15% for a year you get £1.15 at the end of the year, but
if prices rise at 10% over the year, what you can buy with your £1.15 has fallen;
the real rate of return is only 5% = 15% − 10%. This is the ex post (from afterwards)
rate, using the actual inflation over the time you lend. So the ex post real interest
rate for 1919 is EPRIR = 3.5 − (−7.1) = 10.6.
(c) When you lend the money you do not know what the rate of inflation will
be; the ex ante (from before) rate is the interest rate minus the expected rate of
future inflation. In many cases the expected rate of inflation can be approximated
by the current rate of inflation, which you know, so the ex ante real interest rate
is the nominal interest rate minus the current rate of inflation. So the ex ante
real interest rate for 1919 is EARIR = 3.5 − 33.3 = −29.8. At the beginning the ex
ante real rate was negative because inflation was higher than the nominal interest
rate; subsequently, with quite high nominal rates and deflation (negative inflation),
real rates became very high.
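The two real-rate definitions can be written out explicitly; this Python sketch (mine, not from the notes) reproduces the 1919 entries:

```python
def ex_ante_real_rate(nominal, current_inflation):
    """Ex ante: nominal rate minus the inflation currently observed when lending."""
    return nominal - current_inflation

def ex_post_real_rate(nominal, actual_inflation):
    """Ex post: nominal rate minus the inflation actually realised over the loan."""
    return nominal - actual_inflation

# 1919: nominal rate 3.5%, current inflation 33.3%, realised inflation -7.1%
earir = ex_ante_real_rate(3.5, 33.3)   # about -29.8
eprir = ex_post_real_rate(3.5, -7.1)   # about 10.6
```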
(d) The statement is true; the combination of sharply reduced military spending
and high real interest rates caused deflation (falling prices), falling output,
rising unemployment and, after 1920, a strengthening of the exchange rate. The
Chancellor of the Exchequer, Winston Churchill, returned sterling to the gold standard
at its pre-war parity in 1925. Keynes blamed this policy for the depression
of the early 1920s.
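The ex ante and ex post calculations in (b) and (c) can be checked with a few lines of Python. The rates used are the figures quoted in the answer above; treat them as illustrative inputs rather than official data.

```python
# Sketch of the ex ante / ex post real interest rate calculations for 1919.
nominal_rate_1919 = 3.5   # nominal interest rate, per cent
inflation_1919 = 33.3     # current inflation, known when lending
inflation_1920 = -7.1     # actual inflation over the loan year

# Ex ante: nominal rate minus currently observed inflation.
ex_ante = nominal_rate_1919 - inflation_1919   # about -29.8

# Ex post: nominal rate minus the inflation that actually occurred.
ex_post = nominal_rate_1919 - inflation_1920   # about 10.6

print(f"ex ante {ex_ante:.1f}%, ex post {ex_post:.1f}%")
```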
7.7. Example: Were the nineties and noughties NICE?
7.7.1. Introduction
Mervyn King, the Governor of the Bank of England, described the UK economic
environment at the end of the 20th century and the beginning of the 21st century
as NICE: non-inflationary, consistently expanding. Subsequently it became VILE:
volatile inflation, less expansion. This example uses descriptive statistics and
graphs to compare UK growth and inflation over the period 1992-2007 with their
earlier behaviour, to see how nice this period was.
7.7.2. Data
The original series, from the Office for National Statistics, are for 1955Q1-2009Q2,
Q_t = Gross Domestic Product, chained volume measure, constant 2003 prices,
seasonally adjusted (ABMI), and for 1955Q1-2009Q1, E_t = Gross Domestic Product
at market prices: Current price: Seasonally adjusted (YBHA). The price index,
the GDP deflator, is P_t = E_t/Q_t. Growth (the percentage change in output), g_t,
and inflation (the percentage change in prices), π_t, are measured over the same
quarter in the previous year as:
g_t = 100(Q_t - Q_{t-4})/Q_{t-4},  π_t = 100(P_t - P_{t-4})/P_{t-4}.
Such annual differences smooth the series and would remove seasonality if the series
were not already seasonally adjusted. Notice that by taking the four-quarter
change, whereas the data for output and prices start in 1955Q1, the data for
growth and inflation only start in 1956Q1.
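The four-quarter change described above can be sketched in Python. The quarterly series here is invented for illustration, not the ONS data; only the formula mirrors the text.

```python
# A minimal sketch of the four-quarter growth calculation: 100*(Q_t - Q_{t-4})/Q_{t-4}.
def four_quarter_growth(q):
    """First four quarters are undefined, hence the None placeholders."""
    return [None] * 4 + [100 * (q[t] - q[t - 4]) / q[t - 4] for t in range(4, len(q))]

quarterly_output = [100, 101, 102, 103, 104, 105, 106, 107]  # made-up data
print(four_quarter_growth(quarterly_output))
# growth is only defined from the fifth observation onwards, which is why the
# growth series starts a year after the output series
```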
the middle 1980s, culminating in joining the European Exchange Rate Mechanism.
With the ejection of sterling from the ERM in September 1992, inflation targets
were adopted. The Bank of England was given independent responsibility for
targeting inflation when the Labour Government was elected in 1997. This
history is reflected in the graphs for growth and inflation below. The "stop-go"
pattern of the 1950s and 1960s is obvious; then there is a peak when growth
reached almost 10% during the "Barber Boom" of the early 1970s, following the
collapse of the Bretton Woods system of fixed exchange rates. Anthony Barber was
the Conservative Chancellor at the time. The first oil price shock of 1973, following
the Arab-Israeli war, sent the economy into deep recession, with growth negative in
most quarters between 1974Q1 and 1975Q4. During 1976 the UK had to borrow
from the IMF. Growth recovered in the later 1970s, before a further recession in
the early 1980s following the second oil price shock after the Iranian revolution and
Mrs Thatcher's monetarist policies. Growth recovered in the later 1980s with the
boom under Nigel Lawson, the Conservative Chancellor, then sank into recession
again in the early 1990s, possibly worsened by the fixed exchange rate required
by membership of the European Exchange Rate Mechanism. The UK left the
ERM in September 1992 and adopted inflation targeting, with independence of
the Bank of England in 1997. There was then a period of relative stability, before
the effects of the 2007 Credit Crunch began to impact on the economy. Output
fell by 5.8% in the year up to 2009Q2, the lowest observed in this sample; but the
2009 figures are likely to be revised as more data become available.
Inflation was fairly low, below 10%, though volatile, during the 1960s and 1970s.
In the mid 1970s it shot up to almost 25%, before falling back to almost 10%, then
rising again to over 20% following the election of Mrs Thatcher in 1979. It then came
down below 5%, with a burst in the late 1980s and early 1990s, before stabilising
at a low level subsequently. There are a number of different measures of inflation,
CPI, RPI etc., and they show slightly different patterns from the GDP deflator
used here.
[Figure: GROWTH, per cent per annum, quarterly, 1955-2009]
[Figure: INFLATION, per cent per annum, quarterly, 1955-2009]
[Figure: Histogram and summary statistics for GROWTH]
Series: GROWTH, Sample 1955Q1-2009Q2, Observations 214.
Mean 2.374050, Median 2.508702, Maximum 9.516651, Minimum -5.795526,
Std. Dev. 2.161040, Skewness -0.678412, Kurtosis 5.299602,
Jarque-Bera 63.56818 (Probability 0.000000).
[Figure: Histogram and summary statistics for INFLATION]
Series: INFLATION, Sample 1955Q1-2009Q2, Observations 213.
Mean 5.656730, Median 4.360406, Maximum 24.87334, Minimum 0.310697,
Std. Dev. 4.690943, Skewness 1.905004, Kurtosis 6.913777,
Jarque-Bera 264.7751 (Probability 0.000000).
[Figure: Scatter plot of INFLATION against GROWTH]
7.7.6. Differences between periods
We shall divide the sample into period A, the 37 years 1956-1992, and period
B, the 15 years 1993-2007. The table gives the mean, median, standard deviation,
coefficients of skewness and kurtosis and the number of observations for growth
and inflation over these two periods. Relative to the first period, inflation was
much lower and much less volatile, while growth was slightly higher and much
less volatile, in the second period. The standard error of mean growth in period A
is 2.3/√148 ≈ 0.18 and in period B is 0.7/√60 ≈ 0.09. So the 95% confidence
interval for the period A mean growth is 2.3 ± 2(0.18), that is 1.94 to 2.66, while
that for period B is 2.9 ± 2(0.09), that is 2.72 to 3.08. They do not overlap.
Growth and inflation, A: 1956-1992, B: 1993-2007

            Infl          Gr
Period     A      B      A      B
Mean      7.1    2.5    2.3    2.9
Median    5.8    2.5    2.2    2.9
St.Dev    5.0    0.7    2.3    0.7
Skew      1.5    0.2    0.2    0.4
Kurt      5.4    2.7    3.8    3.0
NObs      148    60     148    60
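The standard errors and confidence intervals in the text can be reproduced in a few lines. `mean_ci` is a hypothetical helper using the rough ±2 standard error interval the notes use; the inputs are the period means, standard deviations and sample sizes from the table.

```python
import math

def mean_ci(mean, sd, n):
    """Approximate 95% confidence interval: mean +/- 2 standard errors."""
    se = sd / math.sqrt(n)
    return mean - 2 * se, mean + 2 * se

print(mean_ci(2.3, 2.3, 148))  # period A growth: roughly (1.9, 2.7)
print(mean_ci(2.9, 0.7, 60))   # period B growth: roughly (2.7, 3.1)
# the two intervals do not overlap, as the text notes
```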
7.7.7. Conclusion
Compared to previous (and subsequent?) history, the period 1993-2007 was nice.
From being high and volatile, in‡ation became low and stable. Growth was slightly
higher and much less volatile. Although there was an economic cycle 1997-2007,
it was less pronounced than in earlier years. Thus there is some basis for the claim
by Gordon Brown, Chancellor then Prime Minister, to have abolished boom and
bust. Whether this was a matter of luck or good policy remains a matter of
debate, as does whether the easy-money policy of these years contributed to the
subsequent financial crisis. This "Great Moderation" was not confined to the UK,
but seems to have been a general feature of many advanced economies. There
were global economic shocks over this period: the Asian crisis of 1997-8; the
Russian default and the LTCM crisis of 1998; the dot.com boom and bust of
2001; the gyrations in the oil price, which went from around $10 in 1998 to $147
in 2008; and the 9/11 attacks and the wars in Iraq and Afghanistan. But despite
these shocks, there was smooth non-inflationary growth in the UK, as in many
economies. Whether the Great Moderation was merely a transitory interlude of
stability in a crisis-prone system remains to be seen; as the warning on financial
products says, past performance is not necessarily a guide to the future.
8.1. Data
This will be in an Excel file called Shiller.xls. Copy this file to your own directory
and when you have finished the exercise copy the new file to your own disk using
a new name.
In the file, rows 2 to 131 contain US data from 1871 to 2000; row 1 contains
headings:
A-YEAR
B-NSP: Stock Price Index (Standard & Poor's composite, S&P) in current prices
(Nominal terms), January figure.
C-ND: Dividends per share in current prices, average for the year.
D-NE: Earnings per share in current prices, average for the year.
E-R: Interest rate, average for the year.
F-PPI: The producer price index, January figure, 1967 average = 100.
The letters A to F indicate the columns in the spreadsheet. Note 2000 data
are missing for ND, NE, R.
8.2. Transformations: ratios, growth rates, correcting for inflation, etc.
In column G construct the (backwards) Price Earnings Ratio. Type PER in cell
G1 as a heading. Put =B3/D2 in cell G3. Copy this down for the rest of the
years, ending in G131. The price earnings ratio is a measure of the underlying
value of a share: how much you have to pay to buy a stream of earnings.
Highlight the data for PER over G3:G131. Use Chart Wizard (on the toolbar) and
choose a line graph, top left sub-type; choose next and go to next. Comment
on the picture. Can you identify the stock market booms of the late 1920s and
1990s? What happened next in both cases? What other features of the data can
you see?
In the same way, create the following new variables for the period 1872 to
1999, i.e. rows 3 to 130, with the variable name at the top:
H-Dividend Yield: DY (type DY in cell H1, type the formula =C3/B3
in cell H3 and copy down)
I-Capital Gain: CG (type CG in I1, type the formula =(B4-B3)/B3 in
I3 and copy down)
J-Inflation: INF=(F4-F3)/F3 (type the formula in J3)
K-Real Return on equities: RRE=H3+I3-J3
L-Real Interest Rate: RIR=(E3/100)-J3.
Notice that these are proportions, e.g. numbers like 0.05. This corresponds to
5%. Why do we subtract the rate of inflation to get the real return, whereas we
would divide the stock price index by the price level to get the real stock price
index? Why do we divide the interest rate by 100? Plot inflation. Notice how we
have matched the dates: e.g. in defining inflation, we have had to take account of
the fact that the price index is a January figure.
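As a cross-check on the spreadsheet formulas, the same transformations can be sketched in Python. The data fragment is entirely invented; only the formulas mirror the Excel instructions above (PER = B3/D2, DY = C3/B3, CG = (B4-B3)/B3, INF = (F4-F3)/F3, RRE = DY+CG-INF, RIR = R/100-INF).

```python
# Made-up fragment standing in for the Shiller columns.
nsp = [4.44, 4.86, 5.11]   # B: stock price index, January figure
nd  = [0.26, 0.30, 0.33]   # C: dividends per share
ne  = [0.40, 0.43, 0.46]   # D: earnings per share
r   = [5.3, 6.3, 4.9]      # E: interest rate, per cent
ppi = [13.6, 14.2, 13.9]   # F: producer price index

per = [nsp[t] / ne[t - 1] for t in range(1, len(nsp))]             # backwards P/E
dy  = [nd[t] / nsp[t] for t in range(len(nsp) - 1)]                # dividend yield
cg  = [(nsp[t + 1] - nsp[t]) / nsp[t] for t in range(len(nsp) - 1)]   # capital gain
inf = [(ppi[t + 1] - ppi[t]) / ppi[t] for t in range(len(ppi) - 1)]   # inflation
rre = [d + c - i for d, c, i in zip(dy, cg, inf)]                  # real return, a proportion
rir = [r[t] / 100 - inf[t] for t in range(len(inf))]               # real interest rate
```

Note the date matching: the price is a January figure, so the capital gain and inflation use next year's value relative to this year's, exactly as in the spreadsheet.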
to L132 and 133. You now have the mean (average) and the (unbiased estimate
of) the standard deviation for the real return on equities and the real interest
rate. The mean real return on equities is much higher than that from the interest
rate. This is known as the equity premium. But the risk on equities, measured
by the standard deviation, is also much higher. Calculate the average real return
on equities, 1872-1990 and 1991-1999.
ȳ = Σ_{t=1}^T y_t / T
Then we calculate the minimum value; the value which 25% of the observations
lie below; the median, the value which 50% of the observations lie below; the value
which 75% of the observations lie below; and the maximum. These are known as
Quartiles. Excel also allows you to calculate percentiles: the value below which x%
of the observations lie, for any x. Returns were negative in over 25% of the years. The median
is very similar to the mean which suggests that the distribution is symmetric. The
range of real returns is very large, between minus 50% and plus 50%.
The measure of skewness is roughly
Skew = (1/T) Σ_{t=1}^T ((y_t - ȳ)/s)³,
the standardised third moment. In fact Excel makes degrees of freedom adjustments,
similar to the sample standard deviation above. If the distribution is
symmetrical, the measure of skewness should be zero. In this case, it is pretty
near zero.
The measure of (excess) kurtosis is roughly
Kurt = (1/T) Σ_{t=1}^T ((y_t - ȳ)/s)⁴ - 3;
if the distribution is normal the expected value of the first term (the fourth
standardised centred moment) is three, so values around zero indicate a roughly normal
distribution. You can get exact definitions from HELP, STATISTICAL FUNCTIONS,
SKEW & KURT.
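The rough formulas above can be sketched in Python (without Excel's degrees-of-freedom adjustments). `moments` is a hypothetical helper and the data are made up.

```python
import math

def moments(y):
    """Rough skewness and excess kurtosis, as in the text's formulas."""
    t = len(y)
    ybar = sum(y) / t
    s = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (t - 1))  # sample sd
    skew = sum(((yi - ybar) / s) ** 3 for yi in y) / t
    excess_kurt = sum(((yi - ybar) / s) ** 4 for yi in y) / t - 3
    return skew, excess_kurt

data = [-1.0, 0.0, 0.5, 1.0, 2.0, -0.5, 0.25, 0.75]   # invented sample
skew, kurt = moments(data)
# for a symmetric distribution skew is near 0; for a normal one kurt is near 0
```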
9. Index Numbers
9.1. Introduction
Inflation, the growth rate of the price level, is measured by the percentage change in
a price index:
π_t = 100(P_t - P_{t-1})/P_{t-1}
where P_t is a price index. There are a lot of different price indexes. In the past
the Bank of England had a target for inflation in the Retail Price Index excluding
mortgage interest payments (which go up when interest rates are raised), RPIX, of
2.5%. In 2004 this was replaced by a 2% target for inflation in the Consumer Price
Index, CPI. This was previously known as the Harmonised Index of Consumer
Prices, HICP, the type of index the European Central Bank uses. There are two
main differences. One is in the method of construction: RPI uses arithmetic
means, CPI uses geometric means. This difference makes the RPI run about 0.5%
higher, hence the reduction in target from 2.5% to 2%. The other is that the
CPI excludes housing. In August 2003 RPIX was 2.9%, CPI 1.3%, most of the
difference accounted for by the high rate of UK housing inflation, while in May 2009
the CPI was at +2.2% and the RPI at -1.1%, deflation, not inflation, because
of falling house prices. The RPI and CPI measure consumer prices; the GDP
deflator, another price index used in section 7.7, measures prices in the whole
economy.
Distinguish between the price level and the rate of inflation. When the inflation
rate falls, but is still positive, prices are still going up, just at a slower rate. If
inflation is negative, prices are falling. Suppose the Price Index was 157 in 1995
and 163 in 1996; then the rate of inflation is 3.82%. Notice that this can also be
expressed as a proportion, 0.0382. In many cases, we will calculate the growth
rate by the change in the logarithm, which is very close to the proportionate
change for small changes, e.g. < 0.1, i.e. 10%. We usually work with natural logs
to the base e, often denoted by LN rather than LOG, which is sometimes used just
for base 10. Price indexes are arbitrarily set at 100, or 1, in some base year, so the
indexes themselves cannot be compared across countries. The index can be used
to compare growth relative to the base year if they all have the same base year,
e.g. 1990 = 100 for all countries.
If the inflation rate rises from 3% to 6%, it has risen by three percentage points.
It has not risen by three percent; in fact it has risen by 100%. If something falls
by 50% and then rises by 50%, it does not get back to where it started. If you
started at 100, it would fall to 50, then rise by 50% of 50, i.e. 25, to get to 75.
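Both points above can be checked numerically: the log change approximates the proportionate change for small changes, and a 50% fall followed by a 50% rise does not return to the start. The index values 157 and 163 are the ones from the text.

```python
import math

p0, p1 = 157, 163
prop_change = (p1 - p0) / p0                 # about 0.0382
log_change = math.log(p1) - math.log(p0)     # about 0.0375, close for small changes

x = 100 * 0.5    # fall by 50%: 100 -> 50
x = x * 1.5      # rise by 50%: 50 -> 75, not back to 100
print(prop_change, log_change, x)
```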
present this is on a graph with price and quantity on the two axes. Revenue is
then the area of the rectangle, price times quantity. Draw the two rectangles for
years t and t 1: The di¤erence between their areas will be made up of the three
components of the final equation.
Most of the time, we are not dealing with a single good, but with aggregates
of goods, so that total expenditure is the sum of the prices times the quantities of
the different goods, i = 1, 2, ..., N, whose prices and quantities change over time:
E_t = Σ_{i=1}^n p_it q_it.
This is like your supermarket receipt for one week: it lists how much of each item
was bought at each price and the total spent. To provide a measure of quantity, we
hold prices constant at some base year, 0; say 2000 and then our quantity or
constant price measure is
Q_t = Σ_{i=1}^n p_i0 q_it.
Monetary series can be either in nominal terms (in the current prices of the
time, like expenditures) or in real terms (in the constant prices of some base year
to correct for in‡ation, to measure quantities). To convert a nominal series into
a real series it is divided by a price index. So if we call nominal GDP E_t, real
GDP Q_t, and the price index P_t, then E_t = P_t Q_t. So given data on nominal
(current price) GDP and a price index we can calculate real (constant price) GDP
as Q_t = E_t/P_t, where P_t is the value of a price index. Alternatively if we have
data on current price (nominal) and constant price (real) GDP, we can calculate
the price index (usually called the implicit deflator) as the ratio of the current to
constant price series: P_t = E_t/Q_t.
Most statistical sources only give two of the three possible series, nominal,
real and price, assuming (somewhat implausibly) that users will know how to
calculate the third from the other two.
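The identity E_t = P_t Q_t means any two of the three series determine the third, which can be sketched directly; the figures here are invented.

```python
# Given nominal GDP and the deflator, recover real GDP; then recover the
# implied deflator back from nominal and real, confirming the identity.
nominal = [1000.0, 1100.0, 1250.0]   # current price GDP, E_t (made up)
deflator = [1.00, 1.05, 1.15]        # price index, P_t, base year = 1 (made up)

real = [e / p for e, p in zip(nominal, deflator)]    # Q_t = E_t / P_t
implied_p = [e / q for e, q in zip(nominal, real)]   # P_t = E_t / Q_t

assert all(abs(p - ip) < 1e-9 for p, ip in zip(deflator, implied_p))
print(real)
```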
in year t, e.g. current price GDP, is E_t = Σ_{i=1}^N p_it q_it. We could also express this
as an index, relative to its value in some base year:
E_t^I = Σ_{i=1}^N p_it q_it / Σ_{i=1}^N p_i0 q_i0;
here the index would be 1 in the base year; usually they are all multiplied by 100
to make them 100 in the base year. If the base is 100, then E_t^I - 100 gives the
percentage change between the base year and year t. Index numbers are 'unit
free'. This is an expenditure index.
A constant price series would measure quantities all evaluated in the same
base year prices. Suppose we used year zero, then the constant price measure of
quantity would be
Q_t = Σ_{i=1}^N p_i0 q_it.
Constant price GDP was a measure of this form, where the base year was changed
every five years or so. Recently this fixed base approach has been replaced by a
moving base called a chain-weighted measure.
We can construct a price index as the ratio of the expenditure series to the
constant price series (in the case of GDP, this would be called the GDP deflator):
P_t^1 = E_t/Q_t = Σ_{i=1}^N p_it q_it / Σ_{i=1}^N p_i0 q_it.
It measures prices in year t relative to prices in year zero, using quantities in year
t as weights. Where t = 0, P_t^1 = 1. The index always equals 1 (or 100) in its base
year. This is a price index.
We could also use quantities in year zero as weights, and this would give a
different price index:
P_t^2 = Σ_{i=1}^N p_it q_i0 / Σ_{i=1}^N p_i0 q_i0.
Notice that these will give di¤erent measures of the price change over the period
0 to t. In particular, for goods that go up (down) in price, quantities in year
t are likely to be lower (higher) than in year 0. Indexes that use beginning of
the period values as weights are called Laspeyres indexes, those that use end of
period values are called Paasche indexes. There are a range of other ways we could
calculate price indexes; chain indexes use moving weights. Apart from the problem
of choosing an appropriate formula, there are also problems of measurement; in
particular, measuring the quantities of services supplied, accounting for quality
change and the introduction of new goods. Barrow Chapter 2 discusses index
numbers.
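The two weightings defined above can be computed side by side on invented data for two goods, to see how the base-quantity (Laspeyres, P²) and current-quantity (Paasche, P¹) indexes differ.

```python
# Invented prices and quantities for two goods.
p0 = [1.0, 2.0]; q0 = [10, 5]    # base-year prices and quantities
pt = [1.5, 1.8]; qt = [8, 7]     # year-t prices and quantities

def dot(p, q):
    """Sum of price times quantity over goods."""
    return sum(pi * qi for pi, qi in zip(p, q))

laspeyres = dot(pt, q0) / dot(p0, q0)   # P2_t: base-year quantity weights
paasche = dot(pt, qt) / dot(p0, qt)     # P1_t: current-year quantity weights
print(laspeyres, paasche)
# the good whose price rose has a lower quantity in year t, so the
# current-weighted (Paasche) index is below the base-weighted (Laspeyres) one
```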
You will often find that you have overlapping data. For instance, one edition
of your source gives a current price series and a constant price series in 1980
prices for 1980 to 1990; the second gives you a current price series and a constant
price series in 1985 prices for 1985 to 1995. This raises two problems. Firstly the
current price series may have been revised. Use the later data where it is available
and the earlier data where it is not. Secondly, you have to convert the data to
a common price basis. To convert them, calculate the ratio in 1985 (the earliest
year of the later source) of the 1985 constant price series to the 1980 constant
price series; then multiply the earlier 1980 price series by this ratio to convert
the 1980 constant price series to 1985 constant prices. If the two estimates of the
current price series for 1985 were very different, you would also have to adjust for
the ratio of the current price series.
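The splicing procedure just described can be sketched as follows: rescale the older 1980-price series by the ratio of the two series in the overlap year (1985), then join, preferring the later source. All numbers are invented.

```python
# Two overlapping constant-price series (made-up values).
old_1980p = {1983: 90.0, 1984: 95.0, 1985: 100.0}    # constant 1980 prices
new_1985p = {1985: 130.0, 1986: 136.0, 1987: 140.0}  # constant 1985 prices

# Ratio of the two series in the overlap year converts 1980 prices to 1985 prices.
ratio = new_1985p[1985] / old_1980p[1985]

spliced = {yr: v * ratio for yr, v in old_1980p.items() if yr < 1985}
spliced.update(new_1985p)   # use the later data where it is available
print(sorted(spliced.items()))
```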
(c) Using 2000 quantities as weights, inflation is 25% (= 100(50,000/40,000 - 1));
using 2001 quantities as weights, inflation is -20% (= 100(40,000/50,000 - 1)).
(d) Because of demand responses to price (the firm bought more hardware,
which had fallen in price, and less software, which had risen in price), base weighted
measures tend to overestimate inflation (+25%) and terminal weighted measures
tend to underestimate it (-20%). The truth lies somewhere in between.
t = T (31/7/03), the S&P index fell 30.5%:
R_1 = (Σ_{i=1}^N V_iT - Σ_{i=1}^N V_i0) / Σ_{i=1}^N V_i0 = -0.305,
where the terms in [..] are the weights, the share of the market accounted for by
firm i in the base year. In 1 + R_2 each of the weights is just 1/N.
Most indexes are weighted by market capitalisation, but other forms of weighting
are becoming more widespread, e.g. 'fundamental indices', which use measures
like the firm's revenues.
of the contract price over the lifetime of the contract. This particular over-spend
got very little publicity because most journalists and MPs tend to fall asleep once
index numbers are mentioned.
To see the relation between prices and wages, write the total value of sales
(price times quantity) as a markup, μ, on labour costs, wages times numbers employed:
P_t Q_t = (1 + μ) W_t E_t
P_t = (1 + μ) W_t E_t / Q_t
so if mark-ups are constant, output price inflation is the rate of growth of wages
minus the rate of growth of productivity:
Δ ln P_t = Δ ln W_t - Δ ln(Q_t/E_t)
10. Probability
10.1. Introduction
We need to analyse cases where we do not know what is going to happen: where
there are risks, randomness, chances, hazards, gambles, etc. Probabilities provide
a way of doing this. Some distinguish between (a) risk: the future is unknown
but you can assign probabilities to the set of possible events that may happen;
(b) uncertainty: you know the set of possible events but cannot assign proba-
bilities to them; and (c) unawareness where you cannot even describe the set of
possible events, what US Defense Secretary Donald Rumsfeld called the unknown
unknowns, the things you do not even know that you do not know about. People
seem to have difficulty with probabilities, and probability is a relatively recent branch of
mathematics. Nobody seems to have regarded probabilities as things that could
be calculated before about 1650 (after calculus), and the axiomatic foundations of
probability theory were only provided in the 1930s by the Russian mathematician
Kolmogorov.
Probabilities are numbers between zero and one, which represent the chance
of an event happening. Barrow chapter 2 discusses them. If an event is certain
to happen, it has probability one; if an event is certain not to happen, it has
probability zero. It is said that only death and taxes are certain, everything
else is uncertain. Probabilities can either represent degrees of belief, or be based
on relative frequency, the proportion of times an event happens. So if in past
horse races the favourite (the horse with the highest probability, the shortest odds
offered by bookmakers) won a quarter of the time, you might say the probability of
the favourite winning was 0.25; this is a relative frequency estimate. Alternatively
you could look at a particular future race, study the history (form) of the horses
and guess the probability of the favourite in that race winning; this is a degree of
belief estimate. You bet on the favourite if your estimate of the probability of the
favourite winning is greater than the bookmaker's estimate, expressed in the odds
offered; the odds are the ratio of the probability to one minus the probability.
There is a large literature on the economics and statistics of betting. Notice that
although the probabilities of the possible events should add up to one (it is certain
that some horse will win the race), the implied probabilities in the odds offered
by bookmakers do not. That is how they make money on average. There are also
systematic biases. For instance, the probability of the favourite winning is usually
slightly better than the bookmaker's odds suggest and the probability of an
outsider slightly worse. This favourite-longshot bias has been noted for over 60
years in a variety of horse-races, but its explanation is still subject to dispute.
If you throw a dice (one dice is sometimes known as a die) there are six possible
outcomes, 1 to 6, and if the die is fair each outcome has an equal chance; so the
probability of any particular number is 1/6. On one throw you can only get one
number, so the probability of getting both a 3 and a 4 on a single throw is zero,
it cannot happen. Events which cannot both happen (where the probability of
both happening is zero) are said to be mutually exclusive. For mutually exclusive
events, the probability of one or the other happening is just the sum of their
probabilities, so the probability of getting either a 3 or a 4 on one throw of a dice
is 1/6+1/6=2/6=1/3.
Suppose two people, say A and B, each throw a dice; the number B gets is
independent of the number A gets. The result of A's throw does not influence B's
throw. The probability of two independent events happening is the product of
their probabilities. So the probability of both A and B getting a 3 is 1/6 × 1/6 =
1/36. There are 36 (6²) possible outcomes and each is equally likely. The 36
outcomes are shown in the grid below, with the six cases where A and B get an
equal score shown in bold. So there is a probability of 6/36 = 1/6 of a draw. We
can also use the grid to estimate the probability of A getting a higher score than
B. These events correspond to the 15 events above the diagonal, so the probability
of A winning is 15/36 = 5/12; the probability of B winning is also 5/12 and the
probability of them getting an equal score is 1/6 = 2/12. Notice the 3 events (A
wins, B wins, a draw) are mutually exclusive and their probabilities sum to one,
12/12.
A
1 2 3 4 5 6
1 x x x x x x
2 x x x x x x
B 3 x x x x x x
4 x x x x x x
5 x x x x x x
6 x x x x x x
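The grid above can be enumerated directly to check these probabilities:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely (A, B) outcomes from the grid.
outcomes = list(product(range(1, 7), repeat=2))
n = len(outcomes)   # 36

p_draw = Fraction(sum(a == b for a, b in outcomes), n)    # 6/36 = 1/6
p_a_wins = Fraction(sum(a > b for a, b in outcomes), n)   # 15/36 = 5/12
p_b_wins = Fraction(sum(a < b for a, b in outcomes), n)   # 5/12

# The three mutually exclusive events exhaust all outcomes.
assert p_draw + p_a_wins + p_b_wins == 1
print(p_draw, p_a_wins, p_b_wins)
```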
When events are not mutually exclusive, one has to allow for the probability of
both events happening. This seems to have been first pointed out by Bernoulli in
his Ars Conjectandi in 1713, with a gruesome example. "If two persons sentenced
to death are ordered to throw dice under the condition that the one who gets the
smaller number of points will be executed, while he who gets the larger number
will be spared, and both will be spared if the number of points are the same,
we find that the expectation of one of them is 7/12. It does not follow that the
other has an expectation of 5/12, for clearly each of them has the same chance,
so the second man has an expectation of 7/12, which would give the two of them
an expectation of 7/6 of life, i.e. more than the whole life. The reason
is that there is no outcome such that at least one of them is not spared, while
there are several in which both are spared."1 A will win 5/12 times, draw 2/12
times, so survives 7/12 times. Similarly for B. The probability of at least one
surviving is the sum of the probability of each surviving minus the probability of
both surviving: 7/12 + 7/12 - 1/6 = 1. The probability of both has to be subtracted
to stop double counting. Check that the probability of getting either a 3 or a 4
on two throws of a dice is 1/3 + 1/3 - 1/9 = 20/36. Notice this is different from
the probability of getting either a 3 or a 4 on both throws of the dice, which
is (1/3)² = 1/9. You must be careful about exactly how probability events are
described.
Below we will calculate the probability of winning the jackpot in the lottery.
Strictly this is a conditional probability: the probability of an event A (winning
the jackpot), given event B (buying a lottery ticket). Winning the jackpot and
not buying a ticket are mutually exclusive events. Conditional probabilities play
a very important role in decision making. They tell you how the information that
B happened changes your estimate of the probability of A happening. If A and
B are independent P (A j B) = P (A); knowing that B happened does not change
the probability of A happening: Similarly, the probability of B happening given
that A happens is:
P(B | A) = P(A ∩ B) / P(A).   (10.2)
Multiply both sides of (10.1) by P(B) and both sides of (10.2) by P(A), and
rearrange to give
P(A ∩ B) = P(A | B)P(B) = P(B | A)P(A):
the joint probability is the product of the conditional probability and the marginal
probability in each case. Using the two right hand side relations gives Bayes
Theorem:
P(A | B) = P(B | A)P(A) / P(B).
This formula is widely used to update probabilities of an event A in the light of
new information, B. In this context P(A) is called the prior probability of A,
P(B | A) is called the likelihood, and P(A | B) is called the posterior probability.
disease, P(TP | N) = 1 - P(TN | N), is called the probability of a false positive.
The probability of testing negative when you do have the disease, P(TN | D) =
1 - P(TP | D), is the probability of a false negative. The fact that a decision can
lead to two sorts of error, false positives and false negatives in this case, appears
under a variety of di¤erent names in many areas.
Question
Suppose there is a disease which 1% of the population suffer from: P(D) = 0.01.
There is a test which is 99% accurate, i.e. 99% of those with the disease test
positive and 99% of those without the disease test negative: P(TP | D) = P(TN |
N) = 0.99. Suppose you test positive; what is the probability that you actually
have the disease?
Answer
It is often simpler and clearer to work with numbers rather than probabilities
and present the results as numbers. This is also often more useful for non-specialist
audiences. Imagine a population of one hundred thousand: 100,000. Then a
thousand (1,000 = 0.01 × 100,000) have the disease and 99,000 are healthy. Of
those with the disease, 990 (0.99 × 1,000) test positive, 10 test negative. Of those
without the disease, 990 (0.01 × 99,000) also test positive, 98,010 test negative.
Of the 2 × 990 = 1,980 people who test positive, half have the disease, so the
probability of having the disease given that you tested positive is 50%. Thus you
should not worry too much about a positive result. A negative result is reassuring
since only 10 out of the 98,020 who test negative have the disease. Positive results
are usually followed up with other tests, biopsies, etc.
We could represent the joint and marginal frequencies as a table.

        D        N
TP     990      990     1,980
TN      10   98,010    98,020
     1,000   99,000   100,000
We could also calculate the conditional probability directly using Bayes Theorem:
P(D | TP) = P(TP | D)P(D) / P(TP) = (0.99 × 0.01) / (0.99 × 0.01 + 0.01 × 0.99) = 0.5
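Both routes to the answer, counting a notional population and applying Bayes Theorem directly, can be sketched in a few lines:

```python
# Counting route: a notional population of 100,000, P(D) = 0.01, 99% accuracy.
population = 100_000
diseased = population // 100           # 1,000 people have the disease
healthy = population - diseased        # 99,000
true_pos = diseased * 99 // 100        # 99% of the sick test positive: 990
false_pos = healthy // 100             # 1% of the healthy test positive: 990

p_disease_given_pos = true_pos / (true_pos + false_pos)   # 990/1,980 = 0.5

# Bayes Theorem route gives the same answer.
bayes = (0.99 * 0.01) / (0.99 * 0.01 + 0.01 * 0.99)
assert abs(p_disease_given_pos - bayes) < 1e-12
print(p_disease_given_pos)
```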
In practice screening is confined to groups where P(D) is high, to avoid this problem.
The decision to establish a screening program depends on a judgement of the
balance between (a) the benefits of detecting the disease, e.g. whether early treatment
saves lives, (b) the costs of false positives: inappropriate treatment, worry
etc. and (c) the cost of testing, e.g. time off work to take the test. Since people
disagree about these costs and benefits, screening is controversial. For instance,
there is a test for an indicator of prostate cancer, PSA. The British Medical
Association say "two-thirds of men with high PSA do not have prostate cancer, some
men with prostate cancer do not have high PSA and no evidence exists to show
whether treating localised prostate cancer does more harm than good". There is
a profitable private industry in screening tests.
P(N) = 1 - P(A ∪ B)
     = 1 - (P(A) + P(B) - P(A ∩ B))
0.45 = 1 - (0.5 + 0.4 - 0.35).
Notice the categories innovator, 550, and non-innovator, 450, are mutually exclusive;
the probability of being both an innovator and a non-innovator is zero by
definition.
(b) If they were independent, the product of the probability of product innovation
times the probability of process innovation would give the probability of
doing both: P(A)P(B) = P(A ∩ B). In this case 0.5 × 0.4 = 0.2, which is much
less than 0.35, so they are not independent. You are more likely to do a second
type of innovation if you have already done one type.
(c) The probability of doing product innovation conditional on process innovation
is the probability of doing both divided by the probability of doing process
innovation:
P(A | B) = P(A ∩ B) / P(B) = 0.35/0.4 = 0.875
(d) Similarly, the probability of doing process innovation conditional on product
innovation is
P(B | A) = P(A ∩ B) / P(A) = 0.35/0.5 = 0.7
70% of product innovators also introduce a new process. Notice that the answers
to (c) and (d) are di¤erent.
The report that the cab was blue increases the probability that the cab was
blue from the unconditional prior probability of 0.15 to the conditional posterior
probability of 0.41, but it is still a lot less than 0.8.
In this case we knew the prior probabilities, the proportion of blue and green
cabs, that we used to adjust the report. In other cases where people report events
we do not know the prior probabilities, e.g. when 15% of people in California
report having been abducted by aliens.
the realisations of the random variable. For instance, X, the total obtained from
throwing two dice, is a discrete random variable. It can take the values 2 to 12.
After you throw the dice, you observe the outcome, the realisation, a particular
number, x_i. Associated with the random variable is a probability distribution,
p_i = f(x_i), which gives the probability of obtaining each of the possible outcomes
the random variable can take. The cumulative probability distribution,
F(x_j) = Σ_{i=1}^j f(x_i) = P(X ≤ x_j)
gives the probability of getting a value less than or equal to x_j. So in the dice
case:
x_i    f(x_i)   F(x_i)
1      0        0
2      1/36     1/36
3      2/36     3/36
4      3/36     6/36
5      4/36     10/36
6      5/36     15/36
7      6/36     21/36
8      5/36     26/36
9      4/36     30/36
10     3/36     33/36
11     2/36     35/36
12     1/36     36/36
Make sure that you can calculate all the probabilities; use the 6x6 grid in
section 10.1 if necessary. Notice f(1) = 0, it is impossible to get 1, and F(12) = 1,
you are certain to get a value less than or equal to 12. f(7) = 6/36, because there
are six different ways of getting a 7: (1,6), (6,1), (2,5), (5,2), (3,4), (4,3). These
are the diagonal elements (running from bottom left to top right) in the grid above
in section 10.1. Note Σ f(x_i) = 1: this is always true for a probability distribution.
This probability distribution is symmetric with mean=median=mode=7.
The mathematical expectation or expected value of a random variable (often
denoted by the Greek letter μ) is the sum of each value it can take, xi, multiplied
by the probability of it taking that value, pi = f(xi):

E(X) = Σ_{i=1}^{N} f(xi) xi = μ.     (11.1)
The expected value of the total from throwing two dice is seven, calculated as

7 = 2(1/36) + 3(2/36) + 4(3/36) + ... + 12(1/36).

If all the values are equally likely, f(xi) = 1/N, so the expected value is the
arithmetic mean.
The variance of a random variable is defined as

V(X) = E(X − E(X))² = Σ_{i=1}^{N} f(xi)(xi − μ)² = σ².     (11.2)

If f(xi) = 1/N this is just the same as the population variance we encountered
in descriptive statistics, section 6.1.2. This is the same formula that we used in
section 6.2 with f(xi) = pi. In the dice example, the variance is 5.8 and the
standard deviation 2.4.
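As a check on the table and the moments quoted above, a short Python sketch that enumerates the 36 equally likely outcomes (exact fractions avoid rounding):

```python
from itertools import product
from fractions import Fraction

# Build the distribution f(x) of the total X of two dice.
f = {}
for d1, d2 in product(range(1, 7), repeat=2):
    x = d1 + d2
    f[x] = f.get(x, Fraction(0)) + Fraction(1, 36)

assert sum(f.values()) == 1  # probabilities sum to one

mean = sum(x * p for x, p in f.items())               # E(X) = 7
var = sum(p * (x - mean) ** 2 for x, p in f.items())  # V(X) = 35/6, about 5.8
```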
Suppose that there are two random variables X and Y with individual (mar-
ginal) probabilities f(xi) and f(yi) and joint probabilities f(xi, yi). The joint
probability indicates the probability of both X taking a particular value, xi, and
Y taking a particular value, yi, and corresponds to P(A ∩ B) above. So if X is
the number on the first dice and Y is the number on the second dice,

f(6, 6) = P(X = 6 ∩ Y = 6) = 1/36.

If the random variables are independent, then the joint probability is just the
product of the individual probabilities, as we saw above,

f(xi, yi) = f(xi)f(yi),

and if they are independent, the expected value of the product is the product of
the expected values:

E(XY) = E(X)E(Y).
Expected values behave like N⁻¹ Σ. So if a is a constant, E(a) = a. If a and b are
constants, E(a + bxi) = a + bE(xi).
The covariance between two random variables is

Cov(X, Y) = E[(X − E(X))(Y − E(Y))].

If f(xi) = 1/N this is

Cov(X, Y) = (1/N) Σ_{i=1}^{N} (xi − x̄)(yi − ȳ),
as we saw in section 6.1.3. If the random variables are independent, the covariance
is zero. However, a covariance of zero does not imply that they are independent;
independence is a stronger property.
(6/49) × (5/48) × (4/47) × (3/46) × (2/45) × (1/44) = 720/10,068,347,520;

this is a 1 in 13,983,816 chance, about 1 in 14 million. Notice that low probability
events are not necessarily rare, it depends on the population exposed to them. Winning
the jackpot is a low probability event for any particular person, but it happens to
someone almost every week. Always check the time horizon that the probability
applies to. Someone shouting “we are all going to die” is not very worrying, since
that is certainly true eventually, though if they mean in the next five minutes, it
may be more worrying.
The usual formula for calculating the lottery odds is the number of ways in which a
group of r objects (in this case 6) can be selected from a larger group of n objects
(in this case 49) where the order of selection is not important. It is just the inverse
of the formula above:

nCr = n!/(r!(n − r)!) = (49 × 48 × 47 × 46 × 45 × 44)/(6 × 5 × 4 × 3 × 2 × 1).
The expected value of any particular game depends on whether the jackpot
has been increased by being rolled over from previous games where it was not
won. Even if the jackpot is over £14m, the expected value may not be positive,
84
because you may have to share the jackpot with other winners who chose the
same number, (unless you are a member of a gang that bought all the available
tickets and made sure nobody else could buy any tickets). Choosing an unpopular
number, that others would not choose, will not change the probability of winning
but may increase the probability of not having to share the jackpot. For instance,
people sometimes use birthdays to select numbers, so do not choose numbers over
31. You can choose to buy random numbers to avoid this problem. Optimal
design of lotteries raises interesting economic questions.
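The combination formula above can be checked with Python's standard library; `math.comb` computes nCr directly:

```python
import math

# Number of ways to choose 6 numbers from 49, order irrelevant.
n_combinations = math.comb(49, 6)   # 13,983,816
p_jackpot = 1 / n_combinations      # about 1 in 14 million

# Equivalent to the product form given in the text:
numerator = 49 * 48 * 47 * 46 * 45 * 44   # 10,068,347,520
denominator = math.factorial(6)           # 720
assert numerator // denominator == n_combinations
```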
PV = Σ_{t=0}^{∞} V(1 − p)^t/(1 + r)^t
   = V Σ_{t=0}^{∞} [(1 − p)/(1 + r)]^t
   = V [1 − (1 − p)/(1 + r)]⁻¹
   = V (1 + r)/(1 + r − (1 − p))
   = (1 + r)V/(r + p)
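The closed form for this geometric sum can be verified numerically; the values of r, p and V below are illustrative assumptions, not taken from the text.

```python
# Illustrative values: discount rate r, failure probability p, payoff V.
r, p, V = 0.05, 0.1, 100.0

# Truncate the infinite sum at a large horizon; the terms shrink geometrically,
# so the tail beyond 10,000 periods is negligible.
pv_sum = sum(V * (1 - p) ** t / (1 + r) ** t for t in range(10_000))

# Closed form derived above: PV = (1 + r)V/(r + p).
pv_closed = (1 + r) * V / (r + p)
```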
12. Continuous random variables
Whereas a discrete random variable can only take specified values, continuous
random variables (e.g. inflation) can take an infinite number of values. Corre-
sponding to the probabilities f(xi) for discrete random variables there is a prob-
ability density function, pdf, also denoted f(xi), for continuous random variables,
and a distribution function F(xi) = P(X ≤ xi) which gives the probability that
the random variable will take a value less than or equal to a specified value xi.
The Bank of England publishes its estimate of the probability density function
for inflation as a fan chart. Since there are an infinite number of points on the
real line, the probability of any one of those points is zero, although the pdf will
be defined for it. But we can always calculate the probability of falling into a
particular interval, e.g. that inflation will fall into the range 1.5% to 2.5%. In the
definitions of expected value and variance for a continuous random variable we
replace the summation signs in (11.1) and (11.2) for the discrete case by integrals,
so

E(X) = ∫ x f(x) dx = μ

V(X) = ∫ (x − μ)² f(x) dx = σ².
sample mean will be approximately normal, whatever the distribution of the orig-
inal variable, and that this approximation to normality will get better the larger
the sample size. The normal distribution is completely defined by a mean (first
moment) and variance (second moment); it has a coefficient of skewness (third
moment) of zero and a coefficient of kurtosis (fourth moment) of three. The
standard deviation is the square root of the variance. For a normal distribution
roughly two thirds of the observations lie within one standard deviation of the
mean and 95% lie within two standard deviations of the mean.
Many economic variables, e.g. income or firm size, are not normally distributed
but are very skewed and not symmetrical. However, the logarithm of the variable
is often roughly normal. This is another reason we often work with logarithms of
variables in economics.
Suppose that we have a random variable Y which is normally distributed with
expected value E(Y) = μ and variance

V(Y) = E(Y − E(Y))² = E(Y − μ)² = σ².
This is called the standard normal; it has expected value zero and variance (and
standard deviation) of one (like any standardised variable) and is tabulated in
most statistics and econometrics books. Barrow Table A2 gives the table of 1 −
F(z) = P(Z > z) for values of z > 0. So from the table in Barrow, P(Z > 0.44) =
0.33. Read down the first column till 0.4 and then go across the row to the 0.04
column. Since the normal distribution is symmetric, P(Z > z) = P(Z < −z), so
P(Z < −0.44) = 0.33 also.
The standard normal is useful because we can always convert from Y to z
using the formula above and convert from z back to Y using

Yi = σzi + μ.
The distribution has a bell shape and is symmetric with mean = median = mode.
The formula for the normal distribution is

f(yi) = (1/√(2πσ²)) exp{−(1/2)((yi − μ)/σ)²}

f(zi) = (2π)^(−1/2) exp(−zi²/2)

where zi = (yi − μ)/σ is N(0, 1), standard normal. The normal distribution is the
exponential of a quadratic. The (2πσ²)^(−1/2) makes it integrate (add up) to unity,
which all probability distributions should.
greater than the mean. There is an 84% chance of getting a value less than
the mean plus one standard deviation, z = 1. The chance of being within one
standard deviation of the mean is P(−1 < Z < +1) = 0.8413 − (1 − 0.8413) = 0.6826.
There is a 16% (1 − 0.84) chance of being less than one standard deviation below
the mean, and a 16% chance of being more than one standard deviation above
the mean. The chance of being more than two standard deviations from the mean
is 2(1 − 0.9772) = 0.0456, roughly 5%. Strictly, 95% of the normal distribution lies
within 1.96 standard deviations of the mean, but 2 is close enough for most
practical purposes.
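The table look-ups above can be reproduced with Python's standard library `statistics.NormalDist`, rather than Barrow's tables:

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # the standard normal

p_above_044 = 1 - Z.cdf(0.44)            # about 0.33, as in Barrow Table A2
within_one = Z.cdf(1) - Z.cdf(-1)        # about 0.6827: roughly two thirds
outside_two = 2 * (1 - Z.cdf(2))         # about 0.0455: roughly 5%
within_196 = Z.cdf(1.96) - Z.cdf(-1.96)  # about 0.95: the strict 95% band
```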
Yt ~ N(10%, (15%)²),
students are also less erratic with a standard deviation of 5 compared to a standard
deviation of 10 for green students.
(a) What proportion of blue students get more than 70?
(b) What proportion of green students get more than 70?
(c) Of those who get over 70 what proportion are green and what proportion
are blue?
Answer
We have B ~ N(60, 5²), G ~ N(55, 10²).
(a) We want to find the probability that the mark is over 70. For blue students

z = (70 − 60)/5 = 2,

so the probability of a mark over 70 is P(Z > 2) = 1 − P(Z < 2) = 1 − 0.9772 =
0.0228, or 2.28%. The 0.9772 came from the table of areas under a normal
distribution.
(b) For green students

z = (70 − 55)/10 = 1.5,

so the probability of a mark over 70 is P(Z > 1.5) = 1 − P(Z < 1.5) = 1 −
0.9332 = 0.0668, or 6.68%. The 0.9332 came from the table of areas under a
normal distribution.
(c) In a large class with equal numbers of blue and green students, 4.48%
of all students, (2.28 + 6.68)/2, would get over 70. The proportion of those
that are blue is 25% (= 2.28/(2.28 + 6.68)); the proportion that are green is 75%
(= 6.68/(2.28 + 6.68)).
Even though the question says blues are better at maths, and it is true that
their average is higher, three quarters of the top group in maths are green (as
are three quarters of the bottom group). The lesson is to think about the whole
distribution, not just the averages or parts of the distribution (e.g. the top of the
class), and try not to be influenced by value-laden descriptions: ‘better’ or ‘less
erratic’.
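A quick check of these answers using Python's `statistics.NormalDist`; the N(60, 5²) and N(55, 10²) assumptions are as stated in the question:

```python
from statistics import NormalDist

blue = NormalDist(mu=60, sigma=5)    # blue marks ~ N(60, 5^2)
green = NormalDist(mu=55, sigma=10)  # green marks ~ N(55, 10^2)

p_blue_over_70 = 1 - blue.cdf(70)    # about 0.0228
p_green_over_70 = 1 - green.cdf(70)  # about 0.0668

# With equal numbers of blue and green students, the share of the
# over-70 group that is green:
share_green = p_green_over_70 / (p_blue_over_70 + p_green_over_70)  # ~ 0.75
```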
12.3.1. Chi-squared
Suppose zi is IN(0, 1), independently distributed standard normal, and we form

A = Σ_{i=1}^{n} zi² ~ χ²(n)
12.3.2. t distribution
A standard normal divided by the square root of a chi-squared distribution, di-
vided by its degrees of freedom, say n, is called the t distribution with n degrees
of freedom:

t(n) = z/√(χ²(n)/n).

We often divide an estimate of the mean or a regression coefficient (which are
normally distributed) by their standard errors (which are the square root of a χ²
divided by its degrees of freedom), and this is the formula for doing this. The t dis-
tribution has fatter tails than the normal, but as the sample size gets larger (about
30 is big enough), the uncertainty due to estimating the standard error becomes
small and the distribution is indistinguishable from a normal. It is sometimes
called the Student’s t distribution. W.S. Gosset, who discovered it, worked for
Guinness and because of a company regulation had to publish it under a pseudonym;
he chose Student.
12.3.3. F distribution
Fisher’s F distribution is the ratio of two independent chi-squared variables, each
divided by its degrees of freedom:

F(n₁, n₂) = [χ²(n₁)/n₁]/[χ²(n₂)/n₂].
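These constructions can be illustrated by simulation: the sketch below builds chi-squared, t and F draws from standard normals only. The seed, degrees of freedom and number of replications are arbitrary choices for illustration.

```python
import random

random.seed(0)

def chi2(n):
    """One draw from chi-squared(n): the sum of n squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(n))

# A chi-squared(n) variable has mean n; check by simulation.
n, reps = 5, 20_000
mean_chi2 = sum(chi2(n) for _ in range(reps)) / reps  # should be close to 5

# Illustrative single draws from the derived distributions:
t_draw = random.gauss(0, 1) / (chi2(n) / n) ** 0.5    # t(n)
f_draw = (chi2(3) / 3) / (chi2(8) / 8)                # F(3, 8)
```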
V0 = 100
V1 = 110
V2 = 121
V3 = 133.1

For other examples, V might be GDP (growth rates), a price index (inflation), etc.
The gross rate of return in the first year is (1 + r1) = V1/V0 = 1.1. The (net) rate
of return in the first year is r1 = (V1 − V0)/V0 = V1/V0 − 1 = 0.1; the percentage
rate of return is 100r1 = 10%. Be aware of the difference between proportionate,
0.1, and percentage, 10%, rates of interest and return. Interest rates are also often
expressed in basis points; 100 basis points is one percentage point.
From the definition of the gross return, (1 + r1) = V1/V0, we can write V1 =
(1 + r)V0. The rate of return in this example is constant at 10%, ri = r = 0.1.
Check this by calculating r2 and r3. The value of the investment in year 2 is

V2 = (1 + r)V1 = (1 + r)^2 V0
and for year t

Vt = (1 + r)^t V0.     (13.1)

Notice how interest compounds: you get interest paid on your interest. Interest
rates are often expressed at annual rates even when they are for shorter or longer
periods. If interest was paid out quarterly during the year, you would get 2.5% a
quarter, not 10% a quarter. However, it would be paid out four times as often, so
the formula would be

Vt = (1 + r/4)^(4t) V0,

or if it is paid out n times a year,

Vt = (1 + r/n)^(nt) V0.
As n becomes large, this tends to the continuous compounding formula

Vt = e^(rt) V0.     (13.2)

The irrational number e ≈ 2.718 seems to have been discovered by Italian bankers
doing compound interest in the late middle ages. Since ln Vt = rt + ln V0, the
continuously compounded return is

d ln V/dt = (1/V)(dV/dt) = r.

For discrete data this can be calculated as

r = (ln Vt − ln V0)/t,

while the annually compounded rate is

r = exp({ln Vt − ln V0}/t) − 1.

In addition,

ln Vt − ln Vt−1 = ln(Vt/Vt−1) = ln(1 + rt) ≈ rt

if rt is small, e.g. 0.1, another justification for using the difference of the logarithms.
Growth rates and inflation rates are also calculated as differences in the logarithms.
Multiply them by 100 if you want percentage rates.
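The compounding formulas (13.1) and (13.2) can be compared numerically, using the 10% example from the text:

```python
import math

V0, r, t = 100.0, 0.10, 3

annual = V0 * (1 + r) ** t              # 133.1, as in the example above
quarterly = V0 * (1 + r / 4) ** (4 * t)  # paid out four times a year
continuous = V0 * math.exp(r * t)        # the limit as n goes to infinity

# Continuously compounded return recovered from the log difference:
r_cc = (math.log(annual) - math.log(V0)) / t   # ln(1.1), about 0.0953
```

More frequent compounding always gives a larger terminal value, which the assertions below confirm.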
So far we have assumed the rate of return is constant. Suppose instead of
investing in a safe asset with a fixed return we had invested in a speculative
equity and the values of our asset in 2000, 2001, and 2002 were

V0 = 100
V1 = 1000
V2 = 100

so r1 = 1000/100 − 1 = 9 = 900%; r2 = 100/1000 − 1 = −0.9 = −90%. The
average (arithmetic mean) return is (900 − 90)/2 = 405%. This example brings out
two issues. Firstly, percentages are not symmetric. Our investment got back to
where it started after a 900% increase and only a 90% fall. Secondly, the arithmetic
mean does not seem a very good indicator of the average return (except perhaps
to the salesman who sold us the stock). We get an average return of 405% and
have exactly the same amount as when we started. The geometric mean return
in this case is zero:

GM = √((1 + r1)(1 + r2)) − 1
   = √((10)(0.1)) − 1
   = √1 − 1 = 0,

which seems more sensible.
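A two-line check of the arithmetic versus geometric mean calculation above:

```python
# Asset worth 100, then 1000, then 100: r1 = 9 (900%), r2 = -0.9 (-90%).
r1, r2 = 9.0, -0.9

arithmetic_mean = (r1 + r2) / 2                    # 4.05, i.e. 405%
geometric_mean = ((1 + r1) * (1 + r2)) ** 0.5 - 1  # 0: we end where we began
```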
There are also interest rates at various maturities, depending on how long the
money is being borrowed or lent. The pattern of interest rates with respect to
maturity is called the term structure of interest rates or yield curve. Typically
the term structure slopes upwards. Long rates, interest rates on money borrowed
for a long period of time, such as 10 year government bonds, are higher than
short rates, money borrowed for a short period of time, such as 3 month Treasury
Bills. Interest rates are usually expressed at annual rates, whatever the length
of the investment. When monetary policy is tight, the term structure may slope
downwards, the yield curve is inverted: short rates are higher than long rates.
This is often interpreted as a predictor of a forthcoming recession. Monetary
policy is operated by the Central Bank through the control of a short overnight
interest rate called the policy rate, Repo rate, Bank Rate or, in the US, Federal
Funds Rate. Usually other short rates such as LIBOR (London Inter-Bank Offer
Rate, the rate at which banks lend to each other) are very close to the policy rate.
However, during the credit crunch starting in August 2007 they diverged: banks
required a risk premium to lend to other banks.
13.2. Exchange Rates
13.2.1. Spot and forward
The spot exchange rate is the rate for delivery now: the exchange takes place
immediately. The spot exchange rate is usually quoted as domestic currency per
unit of foreign currency, with the dollar being treated as the foreign currency:
Swiss Francs per Dollar for instance. A rise indicates a depreciation in the Swiss
Franc: more Swiss Francs are needed to buy a dollar. Some are quoted as foreign
currency per unit domestic, in particular Sterling, which is quoted Dollars per
Pound. In this case a rise indicates an appreciation of the Pound, a pound buys
more dollars. Forward rates are for delivery at some time in the future. The one
year forward rate is for delivery in a year’s time, when the exchange takes place at
a rate quoted and agreed upon today.
ential. This relation is called covered interest parity and follows from arbitrage.
If it did not hold, banks could make a riskless return by exploiting the difference.
Notice that the forward rate on 27/7/4 indicated that the market expected sterling
to fall in value over the next year. In fact, because it is determined by the interest
rate differential, the forward rate tends to be a poor predictor of the future spot
rate. In this case the rate on 29/7/5 was $1.759, so the forward rate was quite a
good predictor. Verbeek section 4.11 gives some empirical examples of the use of
these relationships.
is just one of a large number of possible samples that we might have taken. This
exercise is designed to illustrate the idea of a sampling distribution. You do not
need any data for this exercise, you create it yourself.
Go into Excel and in cell A1 type =RAND( ). This generates a random number
uniformly distributed over the interval zero-one. Each number over that interval
has an equal chance of being drawn. Copy this cell right to all the cells till O1.
In P1 type =AVERAGE(A1:O1). In Q1 type =STDEVP(A1:O1). In R1 type
=STDEV(A1:O1). Copy this row down to line 100.
You now have 100 samples of 15 observations from a uniform distribution and
100 estimates of the mean and 100 estimates for each of two estimators of the
standard deviation. An estimator is a formula which tells you how to calculate an
estimate from a particular sample; an estimate is the number that results from
applying the estimator to a sample.
We can then look at the sampling distribution of the mean and standard
deviation. Calculate and draw the histogram for the 100 estimates of the mean.
Do the same for the two estimators of the standard deviation. What do you think
of their shapes? Are they close to a normal distribution? Go to P101, type in
=AVERAGE(P1:P100). Go to P102, type in =STDEV(P1:P100). Go to P103,
type in =SKEW(P1:P100). Go to P104, type in =KURT(P1:P100). Copy these to
the Q and R columns to give the descriptive statistics for the two estimates of the
standard deviation.
If x is uniformly distributed over [a, b] then E(x) = (a + b)/2 and Var(x) =
(b − a)²/12. In this case a = 0, b = 1, so the theoretical mean should be 0.5
(compare this with the number in cell P101) and the theoretical variance 1/12,
with standard deviation √(1/12) = 0.288675 (compare this with the number in
Q101, which should be biased downwards, and with the number in R101, which
should be closer). The standard deviation of the mean from a sample of size N
(in this case 15) is

SD(x̄) = √(Var(x)/N),
so we would expect the standard deviation of our distribution of means to be
0.07453 (compare this with the number in P102). As N becomes large (the num-
ber of observations in each sample, 15 in this case which is not very large), the
distribution of the mean tends to normality (the central limit theorem). Do the
measures of skewness and excess kurtosis given in P103 and P104 suggest that
these means are normal? The values should be close to zero for normality. Is the
mean more normally distributed than the standard deviation?
What we have done here is called a ‘Monte Carlo’ simulation. We have ex-
amined the properties of the estimators by generating lots of data randomly and
looking at the distributions of the estimates from lots of samples. In practice, 100
replications of the sample is rather small, in Monte Carlo studies many thousands
of replications are typically used. Because 100 is quite small, you will get rather
different answers (e.g. for the overall mean) from me. However, by making the
number of replications sufficiently large, we could make the difference between
you and me as small as we like (law of large numbers).
In this case, we did not need to do a Monte Carlo because we can derive the
properties of the estimators theoretically. But for more complicated problems this
is not the case and we must do it numerically, as here. However, doing the Monte
Carlo gives you a feel for what we mean when we discuss the distribution of an
estimator.
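The Excel exercise above can also be replicated in Python; the sketch below mirrors the 100 samples of 15 uniform draws. The seed is an arbitrary choice, so your Excel numbers (and any other seed's) will differ, as the law-of-large-numbers discussion predicts.

```python
import random
import statistics

random.seed(42)

# 100 samples of 15 uniform(0, 1) draws, mirroring =RAND() in Excel.
samples = [[random.random() for _ in range(15)] for _ in range(100)]
means = [statistics.fmean(s) for s in samples]

grand_mean = statistics.fmean(means)  # should be near E(x) = 0.5
sd_of_means = statistics.stdev(means)  # near sqrt((1/12)/15), about 0.0745
```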
15. Estimation
15.1. Introduction
In the …rst part of the course we looked at methods of describing data, e.g. using
measures like the mean (average) to summarise the typical values the variable
took. In the second part of the course, we learned how to make probability
statements. Now we want to put the two together and use probability theory to
judge how much confidence we have in our summary statistics. The framework
that we will use to do this is mathematical: we will make some assumptions and
derive some results by deduction. Chapters 4 and 5 of Barrow cover these issues.
There are a number of steps.
1. We start with a model of the process that generates the data. For instance,
the efficient market theory says that the return on a stock in any period t is
Yt = α + ut, where Yt is the return, which we can observe from historical
data, α is the expected return, an unknown parameter, and ut is an unpre-
dictable random error that reflects all the new information in period t. We
make assumptions about the properties of the errors ut. We say that the
error ut is ‘well behaved’ when it averages zero, E(ut) = 0; is uncorrelated
through time, E(ut ut−1) = 0; and has constant variance, E(ut²) = σ².
hat) of α, that gives Yt = α̂ + ût: (1) the method of moments, which chooses the
estimator that makes our population assumptions, e.g. E(ut) = 0, hold in
the sample, so N⁻¹ Σ ût = 0; (2) least squares, which chooses the estimator
that has the smallest variance and minimises Σ ût². In the cases we look at,
these two procedures give the same estimator, but this is not generally true.
3. We then ask how good the estimator is. To do this we need to determine
what the expected value of the estimator is and the variance of the esti-
mator, or its square root: the standard error. We then need to estimate
this standard error. Given our assumptions, we can derive all these things
mathematically and they allow us to determine how confident we are in our
estimates. Notice the square root of the variance of a variable is called its
standard deviation, the square root of the variance of an estimator is called
its standard error.
4. We then often want to test hypotheses. For instance, from Applied Exercise
I, we found that the mean real return on equities over the period 1872-1999
was 0.088 (8.8%) with a standard deviation of 0.184; but the mean 1872-
1990 was only 0.080, while the return during 1991-1999 was 0.18 (18%) with
a standard deviation of 0.15. In the year 2000, you might have wanted
to ask whether there really was a New Economy, with significantly higher
returns (over twice the historical average) and lower risk (a lower standard
deviation); or whether you might just get the numbers observed during the
1990s, purely by chance.
We will go through this procedure twice, first for estimating the sample mean
or expected value and testing hypotheses about it, and then follow the same
procedure for regression models, where the expected value is not a constant, but
depends on other variables.
15.1.1. A warning
The procedures we are going to cover are called classical statistical inference and
the Neyman-Pearson approach to testing. When first encountered they may seem
counter-intuitive, complicated and dependent on a lot of conventions. But once
you get used to them they are quite easy to use. The motivation for learning these
procedures is that they provide the standard approach to dealing with quantita-
tive evidence in science and other areas of life, where they have been found useful.
However, because they are counter-intuitive and complicated it is easy to make
mistakes. It is claimed that quite a large proportion of scientific articles using
statistics contain mistakes of calculation or interpretation. A common mistake is
to confuse statistical significance with substantive importance. Significance just
measures whether a difference could have arisen by chance; it does not measure
whether the size of the difference is important. There is another approach to
statistics based on Bayes Theorem. In many ways Bayesian statistics is more in-
tuitive, since it does not involve imagining lots of hypothetical samples as classical
statistics does. It is conceptually more coherent, since it just involves using your
new data to update your prior probabilities in the way we did in section 10.2.2.
However, it is often mathematically more complex, since it usually involves inte-
grals. Modern computers are making this integration easier. Gary Koop, Bayesian
Econometrics, Wiley 2003 provides a good introduction.
It is important to distinguish two different things that we are doing. First, in
theoretical statistics we are making mathematical deductions: e.g. proving that
an estimator has minimum variance in the class of linear unbiased estimators.
Second, in applied statistics, we are making inductions, drawing general conclusions
from a particular set of observations. Induction is fraught with philosophical
difficulties. Even if every swan we see is white, we are not entitled to claim ‘all
swans are white’; we have not seen all swans. But seeing one black swan does
prove that the claim ‘all swans are white’ is false. Given this, it is not surprising
that there are heated methodological debates about the right way to do applied
statistics and no ‘correct’ rules. What is sensible depends on the purpose of the
exercise. Kennedy, A Guide to Econometrics, fifth edition, chapter 21 discusses
these issues.
Yt = α + ut

where ut is a random variable with mean zero and variance σ², and the observa-
tions are uncorrelated or independent through time; i.e. E(ut) = 0, E(ut²) = σ²,
E(ut ut−i) = 0. Notice the number of observations here is T; earlier we used N or
n for the number of observations. We wish to choose a procedure for estimating
the unknown parameter α from this sample. We will call the estimator α̂ (said
‘alpha hat’). We get an estimate by putting the values for a particular sample
into the formula. We derive the estimator α̂ in two ways: method of moments,
which matches the sample data to our population assumptions, and least squares,
which minimises the variance.
S = Σ_{t=1}^{T} (Yt − α̂)²
  = Σ (Yt² + α̂² − 2α̂Yt)
  = Σ Yt² + Tα̂² − 2α̂ Σ Yt

To find the α̂ that minimises this, we take the first derivative of S with respect
to α̂ and set it equal to zero:

∂S/∂α̂ = 2Tα̂ − 2 Σ Yt = 0.

Divide through by 2 and move the Σ_{t=1}^{T} Yt to the other side of the equality,
which gives Tα̂ = Σ Yt, or

α̂ = Σ_{t=1}^{T} Yt / T,

so again α̂ = Ȳ.
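That the mean minimises the sum of squared deviations can be checked numerically on a small made-up sample:

```python
# A small illustrative sample; any numbers would do.
Y = [2.0, 5.0, 1.0, 8.0, 4.0]
mean = sum(Y) / len(Y)  # 4.0

def S(a):
    """Sum of squared deviations of the sample from a candidate value a."""
    return sum((y - a) ** 2 for y in Y)

# S is larger at any other candidate value than at the mean:
assert S(mean) < S(mean + 0.1) and S(mean) < S(mean - 0.1)
```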
15.3.2. The variance and standard error of the mean α̂

The variance of α̂, say V(α̂) = E(α̂ − E(α̂))² = E(α̂ − α)², since E(α̂) = α. Since
α̂ − α = Σ ut/T,

E(α̂ − α)² = E(Σ ut/T)².

The right hand side can be written

E[(u1/T + u2/T + ... + uT/T)(u1/T + u2/T + ... + uT/T)].

This product will have T² terms. There are T terms with squares like u1², and
T² − T terms with cross-products like u1u2. The expectations of the squares are
E(ut²)/T² = σ²/T², since the variance of the ut, E(ut²) = σ², is assumed constant
for all t. There are T terms like this, so the sum is T(σ²/T²) = σ²/T. The
expectations of the cross products are of the form E(ut ut−j)/T². But since the
errors are assumed independent, E(ut ut−i) = 0 for i ≠ 0, so the expectation of
all the cross-product terms equals zero. Thus we have derived the variance of the
mean, which is:

V(α̂) = E(α̂ − E(α̂))² = σ²/T,

where T is the number of observations.
The square root of the variance, σ/√T, is called the standard error of the
mean. It is used to provide an indication of how accurate our estimate is. Notice
when we take the square root of the variance of a variable we call it a standard
deviation; when we take the square root of the variance of an estimator, we call
it a standard error. They are both just square roots of variances.
The proof of this is simple, but long, so we do not give it. The second estimator,
s², sometimes called the sample variance, is an unbiased estimator. The bias
arises because we use an estimate of the mean, and the dispersion around the
estimate is going to be smaller than the dispersion around the true value, because
the estimated mean is designed to make the dispersion as small as possible. If we
used the true value of α there would be no bias. The correction T − 1 is called the
degrees of freedom: the number of observations minus the number of parameters
estimated, one in this case, α̂. We estimate the standard error of the mean by

SE(α̂) = s/√T.

On the assumptions that we have made, it can be shown that the mean is the
minimum variance estimator of the expected value among all estimators which
are linear functions of the Yi and are unbiased. This is described as the mean
being the Best (minimum variance) Linear Unbiased Estimator (BLUE) of the
expected value of Y. This is proved later in a more general context, but it is a
natural result because we chose this estimator to minimise the variance.
15.3.5. Summary

So far we have (1) found out how to estimate the expected value of Y, α = E(Y),
by the mean; (2) shown that if the expected value of the errors is zero the mean
is an unbiased estimator; (3) shown that if the errors also have constant variance
σ² and are independent, the variance of the mean is σ²/T, where T is the number
of observations; (4) shown that the standard error of the mean can be estimated
by s/√T, where s² is the unbiased estimator of the variance; and claimed (5) that
the mean has the minimum variance possible among linear unbiased estimators
(the Gauss-Markov theorem) and (6) that for large T the distribution of α̂ will
be normal whatever the distribution of Y (the central limit theorem).
16. Confidence intervals and Hypothesis Tests

Earlier we noted that if a variable was normally distributed, the mean plus or
minus two standard deviations would be expected to cover about 95% of the
observations. This range, plus or minus two standard deviations, is called a 95%
confidence interval. In addition to constructing confidence intervals for a variable,
we also construct confidence intervals for our estimate of the mean, where we use
the standard error of the mean instead of the standard deviation. Barrow Chapter
4 discusses these issues.
Notice that we are using the mean calculated over 9 years, 1991-99, so we divide by
√9, not by the number of years that we used to calculate the standard deviation.
If our estimate of the mean is normally distributed, with a true expected value
α0 = 8%, we would expect there to be a 95% chance of the estimate falling into
the range

α0 ± 1.96 se(α̂) = 8 ± 1.96 × (6.13) = 8 ± 12.

Thus we would be 95% confident that the range −4 to +20 would cover the true
value. There is a 2.5% chance that a 9 year average would be above 20% and a
2.5% chance it would be below −4%. The historical estimate falls in this range,
so even if the true expected return is equal to its historical value, 8%, we would
expect to see 9 year average returns of 18% just by chance, more than 5% of the
time. Suppose the true expected value is 8%; what is the probability of observing
18% or more? We can form

z = (α̂ − α0)/se(α̂) = (18 − 8)/6.13 = 10/6.13 = 1.63.

Using the tables of the normal distribution we find P(Z ≥ 1.63) = 0.0516. Just
over 5% of the time, we will observe periods of 9 years with mean returns of 18%
or greater, if the expected return is constant at 8%.
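The z statistic and its p value can be computed directly; 6.13 is the standard error used above, the historical standard deviation of 18.4 divided by √9.

```python
from statistics import NormalDist

# 1990s mean return 18%, hypothesised mean 8%, standard error 6.13.
z = (18 - 8) / 6.13               # about 1.63
p_value = 1 - NormalDist().cdf(z)  # about 0.05: just over 5% of the time
```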
This assumes everything is normally distributed; if the distribution had heavier
tails than a normal, these probabilities would be a little larger. We could also
centre our confidence interval on α̂ rather than α0 and report α̂ ± 1.96 se(α̂). This
would be 18 ± 12, the range 6 to 30. Notice this confidence interval covers the
historical mean.
We would conclude that, assuming normality, at the 95% level the 1990s
return is not statistically significantly different from the historical mean.
So

α̂ ~ N(α, σ²/T).

s(α̂) = σ/√T is the standard error of α̂ and can be estimated by ŝ(α̂) = s/√T,
where

s = √(Σ(Yt − α̂)²/(T − 1)).

From tables of the normal distribution, we know that it is 95% certain that α̂
will be within 1.96 standard errors of its true value α. The range α̂ ± 1.96 s(α̂) is
called the 95% confidence interval. The 68% confidence interval is α̂ ± s(α̂): the
range covered by the estimate plus and minus one standard error will cover the
true value just over two thirds of the time. If that confidence interval covers some
hypothesised value α0, then we might be confident that the true value could be
α0. If α̂ is more than about 2 standard errors from the hypothesised value, α0, we
think it unlikely that the difference could have occurred by chance (there is less
than a 5% chance) and we say the difference is statistically significant. That is,
we calculate the test statistic

z = (α̂ − α0)/s(α̂)

and reject the null hypothesis that α = α0 at the 5% level if the absolute value
of the test statistic is greater than 1.96. We can also calculate the ‘p value’, the
probability of getting the observed value of the test statistic if the hypothesis was
true, and reject the hypothesis if the p value is less than 0.05.
the variance was large and we could use the normal distribution. If we had used
the estimate of the standard deviation from the 1990s the number of observations
would have been small, 9, and we would have had to use the t distribution.
16.4. Testing
In testing we start with what is called the null hypothesis, H0: μ = μ0. It is called null because in many cases our hypothesised value is zero, i.e. μ0 = 0. We reject it in favour of what is called the alternative hypothesis, H1: μ ≠ μ0, if there is very strong evidence against the null hypothesis. This is a two sided alternative: we reject if our estimate is significantly bigger or smaller. We could also have one sided alternatives, μ < μ0 or μ > μ0. The convention in economics is to use two sided alternatives.
The problem is how we decide whether to reject the null hypothesis or not. In criminal trials, the null hypothesis is that the defendant is innocent. The jury can only reject this null hypothesis if the evidence indicates guilt "beyond reasonable doubt". Even if you think the defendant is probably guilty (better than 50% chance), that is not enough: you have to acquit. In civil trials juries decide "on the balance of the evidence"; there is no reason to favour one decision rather than another. So when OJ Simpson was tried for murder, a criminal charge, the jury decided that the evidence was not beyond reasonable doubt and he was acquitted. But when the victims' families brought a civil case against him, to claim compensation for the deaths, the jury decided on the balance of the evidence that he did it. This difference reflects the fact that losing a criminal case and losing a civil case have quite different consequences.
Essentially the same issues are involved in hypothesis testing. We have a null hypothesis: the defendant is innocent. We have an alternative hypothesis: the defendant is guilty. We can never know which is true. There are two possible decisions: either accept the null hypothesis (acquit the defendant) or reject the null hypothesis (find the defendant guilty). In Scotland the jury has a third possible verdict: not proven. Call the null hypothesis H0; this could be defendant innocent or μ = μ0. Then the possibilities are

            H0 true        H0 false
Accept H0   Correct        Type II error
Reject H0   Type I error   Correct
In the criminal trial, Type I error is convicting an innocent person; Type II error is acquitting a guilty person. Of course, we can avoid Type I error completely: always accept the null hypothesis, acquit everybody. But we would make a lot of Type II errors, letting guilty people go. Alternatively we could make Type II errors zero: convict everybody. Since we do not know whether the null hypothesis is true (whether OJ is really innocent), we have to trade off the two risks. Accepting the null hypothesis can only be tentative: this evidence may not reject it, but future evidence may.
Statistical tests design the test procedure so that there is a fixed risk of Type I error: rejecting the null hypothesis when it is true. This probability is usually fixed at 5%, though this is just a convention.
So the procedure in testing is:
1. Specify the null hypothesis, μ = μ0.
2. Specify the alternative hypothesis, μ ≠ μ0.
3. Design a test statistic, which is only a function of the observed data and the null hypothesis, not a function of unknown parameters:

τ = (μ̂ − μ0) / ŝ(μ̂)

4. Find the distribution of the test statistic if the null hypothesis is true. In this case the test statistic, τ, has a t distribution in small samples (less than about 30), a normal distribution in large samples.
5. Use the distribution to specify the critical values, so that the probability of τ̂ being outside the critical values is small, typically 5%.
6. Reject the null if τ̂ is outside the critical values (in this case outside the range ±2); do not reject the null otherwise.
7. Consider the power of the test. The power is the probability of rejecting the null hypothesis when it is false (1 − P(Type II error)), which depends on the true value of the parameters.
In the medical example of screening for a disease, that we used in section 11, we also had two types of errors (false positives and false negatives), and we had to balance the two types of error in a similar way. There we did it on the basis of costs and benefits. When the costs and benefits can be calculated, that is the best way to do it. In cases where the costs and benefits are not known, we use significance tests.
Statistical significance and substantive significance can be very different. An effect may be very small, of no importance, but statistically very significant, because we have a very large sample and a small standard error. Alternatively, an effect may be large, but not statistically significant, because we have a small sample and it is imprecisely estimated. Statistical significance asks 'could the difference have arisen by chance in a sample of this size?' not 'is the difference important?'
When we discussed confidence intervals we said that the 68% confidence interval is μ̂ ± s(μ̂): the range covered by the estimate plus and minus one standard error will cover the true value, μ, just over two thirds of the time. There is a strong temptation to say that the probability that μ lies within this range is two thirds. Strictly this is wrong: μ is fixed, not a random variable, so there are no probabilities attached to μ. The probabilities are attached to the random variable μ̂, which differs in different samples. Bayesian statistics does treat the parameters as random variables, with some prior probability distribution; uses the data to update the probabilities; and does not use the Neyman-Pearson approach to testing set out above.
(d) Probability of a return less than −50%? z = (−50 − 10)/20 = −3. The distribution is symmetrical, so P(Z < −3) = P(Z > 3) = 1 − P(Z < 3). Prob = 1 − 0.9987 = 0.0013 or 0.13%.
(e) Probability of a positive return: z = (0 − 10)/20 = −0.5; P(Z > −0.5) = P(Z < 0.5) = 0.6915 or 69%.
Notice the importance of whether we are using the standard deviation of returns or the standard error of the mean, σ/√T.
where qi = 1 − pi. Our null hypothesis is p1 − p2 = 0. If the null hypothesis is true and there is no difference, then our best estimate of p = p1 = p2 is (18 + 22)/100 = 0.4, and the standard error is

se(p̂) = √(0.4 × 0.6/50 + 0.4 × 0.6/50) ≈ 0.1

Our test statistic is then

(p̂1 − p̂2)/se(p̂) = (0.36 − 0.44)/0.1 = −0.8

This is less than two in absolute value, so we would not reject the null hypothesis that the proportion who died was the same in the treatment and control group. The differences could easily have arisen by chance. To check this we would need to do a larger trial. Barrow chapter 7 discusses these issues.
It should not make a difference, but in practice how you frame the probabilities, e.g. in terms of the proportion who die or the proportion who survive, can influence how people respond.
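The same pooled-proportion test in code, using the trial numbers above (the function name is our own; the text rounds the standard error to 0.1, so it reports −0.8):

```python
from math import sqrt

def two_prop_z(n1_events, n1, n2_events, n2):
    """Test H0: p1 = p2 using the pooled proportion."""
    p1, p2 = n1_events / n1, n2_events / n2
    p_pool = (n1_events + n2_events) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 18 of 50 died in the treatment group, 22 of 50 in the control group
z = two_prop_z(18, 50, 22, 50)
print(round(z, 2))   # about -0.82 with the unrounded standard error
```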
disturbance, which reflects the factors that shift the return on stock i other than movements of the whole market. The riskiness of a stock is measured by β: if β = 1 it is as volatile as the market; if β > 1, it is more volatile than the market; if β < 1, it is less volatile than the market. The riskier the stock, the higher the return required relative to the market return. Given data on the stock return Rit and the market return Rmt for time periods t = 1, 2, ..., T, we want to estimate α and β for the stock and determine how much of the variation of the stock's returns can be explained by variation in the market. Verbeek, section 2.7 discusses this example in more detail.
rt = Rt − πt

Suppose the real interest rate is roughly constant, equal to a constant plus a random error:

rt = r + ut

then we can write

Rt = rt + πt = r + πt + ut.

Then if we ran a regression

Rt = α + βπt + ut
Yt = α + βXt + ut

and we will continue to assume that ut is a random variable with expected value zero and variance σ², and the observations are uncorrelated or independent through time; i.e. E(ut) = 0, E(u²t) = σ², E(ut ut−i) = 0. We will further assume that the independent variable varies, Var(Xt) ≠ 0, and is independent of the error, so that the covariance between them is zero: E{(Xt − E(Xt))ut} = 0. If we can estimate α and β, by α̂ and β̂, then we can predict Yt for any particular value of X:
Ŷt = α̂ + β̂Xt

these are called the fitted or predicted values of the dependent variable. We can also estimate the error:

ût = Yt − Ŷt = Yt − (α̂ + β̂Xt)

these are called the residuals. Notice we distinguish between the true unobserved errors, ut, and the residuals, ût, the estimates of the errors.
As with the expected value above, there are two procedures that we will use to derive the estimates: method of moments and least squares.

α̂ = Ȳ − β̂X̄

Yt = α̂ + β̂Xt + ût = Ȳ − β̂X̄ + β̂Xt + ût
Using lower case letters to denote deviations from the means, this is

yt = β̂xt + ût    (17.1)

This says that our estimate β̂ is the ratio of the (population) covariance of Xt and Yt to the variance of Xt (remember lower case letters denote deviations from the means).
The derivative of Σû²t with respect to β̂ is

∂Σû²t/∂β̂ = 2β̂Σx²t − 2Σxt yt = 0    (17.2)
since squares are positive, so this is a minimum.
Our estimates

α̂ = Ȳ − β̂X̄

β̂ = Σ(Xt − X̄)(Yt − Ȳ) / Σ(Xt − X̄)²

(i) make the sum of the estimated residuals zero and the estimated residuals uncorrelated with the explanatory variable and (ii) minimise the sum of squared residuals.
The standard error of β̂ can be derived using the same sort of argument as in deriving the standard error of the mean above.
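A minimal sketch of these two bivariate formulas; the x and y data are invented for illustration:

```python
def ols(x, y):
    """Least squares estimates:
    beta = sum (Xt - Xbar)(Yt - Ybar) / sum (Xt - Xbar)^2,
    alpha = Ybar - beta * Xbar."""
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    beta = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            / sum((xi - xbar) ** 2 for xi in x))
    alpha = ybar - beta * xbar
    return alpha, beta

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha, beta = ols(x, y)
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
print(round(beta, 2), round(alpha, 2))  # slope and intercept
print(round(sum(residuals), 10))        # residuals sum to zero
```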
Notice that covariance and correlation only measure linear relationships. You could have an exact non-linear relationship (e.g. a circle) and the correlation would be zero. In regression we use the square of the correlation coefficient, r², usually written R² and called the coefficient of determination. This gives you the proportion of the variation in Y that has been explained by the regression.
We measure the dispersion around the line in exactly the same way that we measured the dispersion around the mean, either using the biased estimator

σ̂² = Σû²t / T

Show this is the same as r² defined above. If β̂ = 0, then nothing has been explained: ût = Yt − α̂, where α̂ is just the mean, and R² = 0.
Computer packages also often calculate adjusted R², or R̄² (R bar squared). This corrects the numerator and denominator for degrees of freedom:

R̄² = 1 − [Σû²t/(T − k)] / [Σ(Yt − Ȳ)²/(T − 1)]

where k is the number of explanatory variables, including the intercept. This can be negative.
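Both measures can be computed directly; the fitted values below are invented for illustration:

```python
def r_squared(y, y_hat, k):
    """R2 = 1 - SSR/TSS and adjusted
    Rbar2 = 1 - [SSR/(T-k)] / [TSS/(T-1)]."""
    T = len(y)
    ybar = sum(y) / T
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - ssr / tss
    rbar2 = 1 - (ssr / (T - k)) / (tss / (T - 1))
    return r2, rbar2

y     = [2.0, 4.0, 6.0, 8.0, 10.0]
y_hat = [2.5, 3.5, 6.0, 8.5, 9.5]   # fitted values, k = 2 coefficients
r2, rbar2 = r_squared(y, y_hat, k=2)
print(round(r2, 3), round(rbar2, 3))  # adjusted R2 is always smaller here
```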
From our estimates we can also calculate the predicted value of Y for a particular X:

Ŷt = α̂ + β̂Xt

and the residual or unexplained part of Y for a particular observation is:

ût = Yt − Ŷt = Yt − (α̂ + β̂Xt)
If in addition we add another assumption to the model, that the errors are normally distributed, then our estimates will also be normally distributed and we can use this to construct test statistics to test hypotheses about the regression coefficients. Even if the errors are not normally distributed, by the central limit theorem our estimates will be normally distributed in large samples, in the same way that the mean is normally distributed whatever the distribution of the variable in large samples.
We need the assumption that X is exogenous to make causal statements about the effect of X on Y. When we are only interested in predicting Y, as in the height-weight example, we do not need the exogeneity assumption and have the result that the least squares prediction Ŷt is the Best (minimum variance) Linear Unbiased Predictor of Yt.
We can check whether the assumptions we made about the errors hold for the estimated residuals. We can test whether our assumption that E(u²t) = σ², a constant, holds in the data. We can also test whether E(ut ut−i) = 0, for i ≠ 0. This is the assumption of independence, or no serial correlation or no autocorrelation. We might also have assumed that ut is normally distributed, and we can test whether the skewness and kurtosis of the residuals are those of a normally distributed variable. We will discuss how we test these assumptions later, but in many cases the best way to check them is to look at the pictures of the actual and predicted values and the residuals. The residuals should look random, with no obvious pattern in them, and the histogram should look roughly normal.
What do you do if you have unhealthy residuals, that show serial correlation or heteroskedasticity? The textbooks tend to suggest that you model the disturbances, typically by a procedure called Generalised Least Squares. However, in most cases the problem is not that the true disturbances are heteroskedastic or serially correlated. The problem is that you have got the wrong model, and the error in the way you specified the model shows up in the estimated residuals. Modelling the disturbances is often just treating the symptoms; the solution is to cure the disease: get the model specified correctly.
ln Qt = β1 + β2 ln Yt + β3 ln Pt + β4 ln P*t + ut    (18.1)

where Qt is quantity demanded, Yt real income, Pt the price of the good, P*t a measure of the price of all other goods, and ln denotes natural logarithms. Given the log equation, β2 is the income elasticity of demand (the percentage change in demand in response to a one percent change in income), which we would expect to be positive; β3 is the own price elasticity, which we expect to be negative; and β4 is the cross-price elasticity, which for all other goods should be positive. It is standard to use logarithms of economic variables since (a) prices and quantities are non-negative so the logs are defined; (b) the coefficients can be interpreted as elasticities, so the units of measurement of the variables do not matter; (c) in many cases errors are proportional to the variable, so the variance is more likely to be constant in logs; (d) the logarithms of economic variables are often closer to being normally distributed; (e) the change in the logarithm is approximately equal to the growth rate; and (f) lots of interesting hypotheses can be tested in logarithmic models. For instance, in this case, if β3 = −β4 (homogeneity of degree zero) only relative prices matter. Notice the original model is non-linear:

Qt = B Yt^β2 Pt^β3 P*t^β4 exp(ut)
Notice output will be zero if either capital or labour is zero. We can make this linear by taking logarithms:

ln Qt = ln A + b ln Kt + c ln Lt + dt + ut    (18.2)

To test constant returns to scale, b + c = 1, rewrite this as

ln Qt − ln Lt = ln A + b[ln Kt − ln Lt] + (b + c − 1) ln Lt + dt + ut    (18.3)

and do a t test on the coefficient of ln Lt, which should be not significantly different from zero if there are constant returns to scale. Notice (18.2) and (18.3) are identical statistical equations, e.g. the estimates of the residuals would be identical.
X3t to ln Pt, X4t to ln P*t. The problem is the same as before. We want to find the estimates of βi, i = 1, 2, 3, 4 that minimise the sum of squared residuals, Σû²t:

Σû²t = Σ(Yt − β̂1 − β̂2X2t − β̂3X3t − β̂4X4t)²

We have to multiply out the terms in the brackets, take the summation inside, and derive the first order conditions, the derivatives with respect to the four parameters. These say that the residuals should sum to zero and be uncorrelated with all four Xit. The formulae, expressed as summations, are complicated. It is much easier to express them in matrix form. Verbeek Appendix A reviews matrix algebra.
We can write this in vector form:

Yt = X′t β + ut

  y   =    X      β   +   u
(T×1)    (T×4) (4×1)    (T×1)

This gives us a set of T equations. Notice, in writing Xit, we have departed from the usual matrix algebra convention of having the subscripts go row-column. This generalises to the case where X is a T×k matrix and β a k×1 vector, whatever k. Notice that for matrix products, the inside numbers have to match for them to be conformable, and the dimension of the product is given by the outside numbers.
18.3. Assumptions
We now want to express our assumptions about the errors in matrix form. The assumptions were: (a) that E(ut) = 0, on average the true errors are zero; (b) that E(u²t) = σ², errors have constant variance; and (c) E(ut ut−i) = 0, for i ≠ 0, different errors are independent. The first is just that the expected value of the random T×1 vector u is zero: E(u) = 0. To capture the second and third assumptions, we need to specify the variance covariance matrix of the errors, E(uu′), a T×T matrix. u′ is the transpose of u, a 1×T vector. The transpose operation turns columns into rows and vice versa. Note u′u is a scalar, 1×1, the sum of squared errors. Writing out E(uu′) and putting our assumptions in:

E(uu′) = [ E(u1²)   E(u1u2)  ...  E(u1uT)
           E(u1u2)  E(u2²)   ...  E(u2uT)
           ...      ...      ...  ...
           E(u1uT)  E(u2uT)  ...  E(uT²) ]

       = [ σ²  0   ...  0
           0   σ²  ...  0
           ...
           0   0   ...  σ² ]  = σ²I
18.4. Estimating β
As before we will consider two methods for deriving the estimators: method of moments and least squares. This time for the model y = Xβ + u, where y and u are T×1 vectors, β is a k×1 vector and X a T×k matrix.
18.4.1. Method of moments
Our exogeneity assumption is E(X′u) = 0; the sample equivalent is X′û = 0, a k×1 set of equations, which for the case k = 4 above gives

Σût = 0, ΣX2tût = 0, ΣX3tût = 0, ΣX4tût = 0.

So

X′û = X′(y − Xβ̂) = X′y − X′Xβ̂ = 0.

Since X is of rank k, (X′X)⁻¹ exists (X′X is non-singular, its determinant is non-zero), so

β̂ = (X′X)⁻¹X′y.
û′û = (y − Xβ̂)′(y − Xβ̂)
    = y′y + β̂′X′Xβ̂ − 2β̂′X′y

To derive the least squares estimator, we take derivatives and set them equal to zero. If β is a k×1 vector we get k derivatives; the first order conditions are the k×1 set of equations

∂û′û/∂β̂ = 2X′Xβ̂ − 2X′y = 0
19. Properties of Least Squares
We can derive the expected value of β̂:

β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ + u)
  = (X′X)⁻¹X′Xβ + (X′X)⁻¹X′u = β + (X′X)⁻¹X′u

So

β̂ = β + (X′X)⁻¹X′u    (19.1)

E(β̂) = β + E((X′X)⁻¹X′u)

since β is not a random variable, and if X and u are independent, E((X′X)⁻¹X′u) = E((X′X)⁻¹X′)E(u) = 0 since E(u) = 0. Thus E(β̂) = β and β̂ is an unbiased estimator of β.
From (19.1) we have

β̂ − β = (X′X)⁻¹X′u

so the variance-covariance matrix of β̂ is

E[(β̂ − β)(β̂ − β)′] = (X′X)⁻¹X′E(uu′)X(X′X)⁻¹ = σ²(X′X)⁻¹, using E(uu′) = σ²I.
Suppose a random variable Y is normally distributed with mean μ and variance σ². Then any linear function of Y is also normally distributed:

Y ∼ N(μ, σ²)
X = a + bY ∼ N(a + bμ, b²σ²)

where a and b are scalars. The matrix equivalent of this is that, for k×1 vectors Y and M and k×k matrix Σ (not the summation sign), if Y ∼ N(M, Σ), then for h×1 vectors X and A, and h×k matrix B,

X = A + BY ∼ N(A + BM, BΣB′)

Notice the variance covariance matrix of X, say V(X) = BΣB′, is (h×k)(k×k)(k×h) = h×h.
Since y is a linear function of u, it follows that y is also normally distributed: y ∼ N(Xβ, σ²I). In this case B is the identity matrix. Since β̂ = (X′X)⁻¹X′y is a linear function of y, it is also normally distributed.
β̂ ∼ N((X′X)⁻¹X′Xβ, (X′X)⁻¹X′(σ²I)X(X′X)⁻¹)
  ∼ N(β, σ²(X′X)⁻¹)

in this case A is zero and B = (X′X)⁻¹X′; (X′X)⁻¹ is equal to its transpose because it is a symmetric matrix, and (X′X)⁻¹X′X = I.
This says that β̂ is normally distributed with expected value β (and is therefore unbiased) and variance covariance matrix σ²(X′X)⁻¹.
The variance covariance matrix is a k×k matrix and we estimate it by

V̂(β̂) = s²(X′X)⁻¹

where s² = û′û/(T − k). The square roots of the diagonal elements of s²(X′X)⁻¹ give the standard errors of the estimates of the individual elements of β, which are reported by computer programs.
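A sketch of β̂ = (X′X)⁻¹X′y and the standard errors s²(X′X)⁻¹ using NumPy; the simulated data, seed, and true coefficients are our own:

```python
import numpy as np

def ols_matrix(X, y):
    """beta = (X'X)^-1 X'y, s2 = u'u/(T-k),
    standard errors = sqrt of the diagonal of s2 (X'X)^-1."""
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    s2 = (u @ u) / (T - k)
    se = np.sqrt(np.diag(s2 * XtX_inv))
    return beta, se

rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])      # intercept and one regressor
y = 1.0 + 2.0 * x + rng.normal(size=T)    # true alpha = 1, beta = 2
beta, se = ols_matrix(X, y)
print(beta.round(2), se.round(2))         # estimates near (1, 2)
```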
Then we run another regression which includes zt:

yt = βxt + γzt + ut    (19.3)

We get two estimates of the coefficient on xt: b̂ from (19.2) and β̂ from (19.3). What is the relation between them? To understand this we need to look at the relationship between xt and zt. We can summarise the relationship between xt and zt by another regression equation:

zt = dxt + wt    (19.4)

wt is just the bit of zt that is not correlated with xt; d may be zero, if there is no relationship. Put (19.4) into (19.3) and we get (19.2):

yt = βxt + γ(dxt + wt) + ut
yt = (β + γd)xt + (γwt + ut)

So b̂ = (β̂ + γ̂d̂): the coefficient of xt picks up the bit of zt that is correlated with xt. Bits of zt that are not correlated with xt end up in the error term (γwt + ut). This is why looking for patterns in the error term is important: it may suggest what variable you have left out. If you add a variable that is not correlated with xt, the coefficient of xt will not change. If you add a variable that is highly correlated with xt, the coefficient of xt will change a lot.
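This omitted-variable algebra can be checked by simulation; the coefficients β = 1, γ = 0.5 and d = 0.8 below are invented for illustration:

```python
import numpy as np

def slope(x, y):
    """Bivariate least squares slope: cov(x, y) / var(x)."""
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / (x @ x)

rng = np.random.default_rng(1)
T = 10_000
x = rng.normal(size=T)
z = 0.8 * x + rng.normal(size=T)             # z correlated with x, d = 0.8
y = 1.0 * x + 0.5 * z + rng.normal(size=T)   # beta = 1.0, gamma = 0.5

b_short = slope(x, y)   # short regression omits z
print(round(b_short, 2))  # close to beta + gamma*d = 1.0 + 0.5*0.8 = 1.4
```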
Yt = α + βXt + γZt + ut

we can test the significance of the individual coefficients using t ratios exactly as we did for the mean:

t(β = 0) = β̂ / se(β̂)

where se(β̂) is the estimated standard error of β̂. This tests the null hypothesis H0: β = 0. If this t ratio is greater than two in absolute value we conclude that β̂ is significant: significantly different from zero at the 5% level. Computers often print out this t ratio automatically. They usually give the coefficient, the standard error, the t ratio and the p value. The p value gives the probability of getting a t ratio this large in absolute value if the null hypothesis were true. If it is less than 0.05, we would reject the hypothesis at the 5% level.
We could test against other values than zero. Suppose economic theory suggested that β = 1; the t statistic for testing this would be

t(β = 1) = (β̂ − 1) / se(β̂)

and if this t statistic is greater than two in absolute value we conclude that β̂ is significantly different from unity at the 5% level.
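For example, with a hypothetical estimate β̂ = 0.9 and se(β̂) = 0.04, the same estimate rejects both β = 0 and β = 1:

```python
def t_ratio(beta_hat, beta_0, se):
    """t statistic for H0: beta = beta_0."""
    return (beta_hat - beta_0) / se

b, se = 0.9, 0.04   # hypothetical estimate and standard error
print(round(t_ratio(b, 0.0, se), 2))   # 22.5: reject beta = 0
print(round(t_ratio(b, 1.0, se), 2))   # -2.5: also reject beta = 1
```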
e.g. other variables should not be able to explain them. Our alternative hypothesis is that the model is misspecified in a particular way, and since there are lots of ways that the model could be misspecified (the errors could be serially correlated, heteroskedastic, non-normal, or the model could be non-linear) there are lots of these tests, each testing the same null, that the model is well specified, against a particular alternative that the misspecification takes a particular form. This is like the fact that there are lots of different diagnostic tests that doctors use. There are lots of different ways that a person, or a regression, can be sick.
The Durbin-Watson statistic is a diagnostic test for serial correlation. It is given by

DW = Σ_{t=2}^{T} (ût − ût−1)² / Σ_{t=1}^{T} û²t

It should be around 2, say 1.5 to 2.5. If it is below 1.5 there is positive serial correlation: residuals are positively correlated with their previous (lagged) values; above 2.5, negative serial correlation. It is only appropriate if (a) you are interested in first order serial correlation; (b) there is an intercept in the equation, so the residuals sum to zero; and (c) there is no lagged dependent variable in the equation.
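The statistic is easy to compute from the residuals; the two residual series below are invented to show the two cases:

```python
def durbin_watson(residuals):
    """DW = sum_{t=2}^{T} (u_t - u_{t-1})^2 / sum_{t=1}^{T} u_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(u ** 2 for u in residuals)
    return num / den

# hypothetical residuals with long runs of the same sign
u_pos = [1.0, 0.8, 0.9, 1.1, -0.9, -1.0, -1.1, -0.8]
# hypothetical residuals that alternate in sign
u_neg = [1.0, -1.0, 0.9, -0.9, 1.1, -1.1, 0.8, -0.8]
print(round(durbin_watson(u_pos), 2))  # well below 2: positive correlation
print(round(durbin_watson(u_neg), 2))  # well above 2: negative correlation
```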
First order (one lag) serial correlation assumes that errors are related to their values in the previous period:

ut = ρut−1 + εt

but there may be higher order serial correlation. For instance, in quarterly data, the errors may be related to errors up to a year ago: the size of the error in the alcohol equation at Christmas (Q4) is related not just to the previous quarter's error but to the size of the error last Christmas:

ut = ρ1ut−1 + ρ2ut−2 + ρ3ut−3 + ρ4ut−4 + εt

This is fourth order (four lags) serial correlation. Suppose you ran a regression

yt = β′xt + ut

The test involves running a regression of the residuals on the variables included in the original regression and the lagged residuals:

ût = b′xt + ρ1ût−1 + ρ2ût−2 + ρ3ût−3 + ρ4ût−4 + εt
then testing the joint hypothesis ρ1 = ρ2 = ρ3 = ρ4 = 0.
There are many diagnostic tests which involve regressing the estimated residuals or powers of the residuals on particular variables. Technically, most of these tests are known as Lagrange Multiplier Tests. It is important that you check your equation for various diseases before you regard it as healthy enough to be used. Statistical packages like EViews (section 22.2), Microfit, Stata, etc. have built-in tests of the assumptions that are required for Least Squares estimates to be reliable. If the assumptions do not hold, the estimated standard errors are likely to be wrong, and corrected standard errors that are 'robust' to the failure of the assumptions are available.
Ct = α1Dt + α2(1 − Dt) + βYt + ut    (21.1)
If we had also included a constant in (21.1), the computer would have refused to estimate it and told you that the data matrix was singular. This is known as 'the dummy variable trap'. A similar technique allows for seasonal effects. Suppose that we had quarterly data on consumption and income, and wanted to allow for consumption to differ by quarter (e.g. spending more at Christmas). Define Q1t as a dummy variable that is one in quarter one and zero otherwise; Q2t is one in quarter two, zero otherwise, etc. Then estimate
21.2. Non-linearities
21.2.1. Powers
We can easily allow for non-linearities by transformations of the data, as we saw with logarithms above. As another example, imagine y (say earnings) first rose with x (say age) then fell. We could model this by

yi = a + bxi + cx²i + ui

where we would expect b > 0, c < 0. Although the relationship between y and x is non-linear, the model is linear in parameters, so ordinary least squares can be used: we just include another variable which is the square of the first. Notice that the effect of x on y is given by

∂y/∂x = b + 2cxi

which is different at different values of xi, and has a maximum (or minimum) which can be calculated as the value of x that makes the first derivative zero. Earnings rise with age to a maximum then fall; self-reported happiness tends to fall with age to a minimum then rise. The middle-aged are wealthy but miserable. We can extend this approach by including the product of two variables as an additional regressor. There is an example of this in the specimen exam, section 3.2 question 5. Verbeek section 3.5 has an extensive discussion.
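Setting b + 2cx = 0 gives the turning point x = −b/2c. A small sketch with invented coefficients:

```python
def turning_point(b, c):
    """Solve b + 2*c*x = 0 for the peak (c < 0) or trough (c > 0)."""
    return -b / (2 * c)

# hypothetical earnings-age profile: y = 10 + 2.0*age - 0.02*age^2
b, c = 2.0, -0.02
peak_age = turning_point(b, c)
print(peak_age)   # earnings peak at age 50 under these made-up coefficients
```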
21.2.2. Background: Regressions using Proportions
Suppose our dependent variable is a proportion, pt = Nt/K, where Nt is a number affected and K is the population, or a maximum number or saturation level. Then pt lies between zero and one, and the logistic transformation ln(pt/(1 − pt)) is often used to ensure this. If the proportion is a function of time this gives

ln(pt/(1 − pt)) = a + bt + ut    (21.2)

which is an S shaped curve for pt over time. This often gives a good description of the spread of a new good (e.g. the proportion of the population that have a mobile phone) and can be estimated by least squares. Although this is a non-linear relationship in the variable pt, it is linear in parameters when transformed, so can be estimated by least squares. The form of the non-linear relationship is

pt = Nt/K = 1/(1 + exp(−(a + bt)))    (21.3)
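The transformation and its inverse, with an invented diffusion curve for illustration:

```python
from math import log, exp

def logit(p):
    """The logistic transformation ln(p / (1 - p))."""
    return log(p / (1 - p))

def inv_logit(x):
    """Back to a proportion: 1 / (1 + exp(-x))."""
    return 1 / (1 + exp(-x))

# hypothetical diffusion curve: logit(p) = -4 + 0.5*t
a, b = -4.0, 0.5
curve = [round(inv_logit(a + b * t), 3) for t in (0, 8, 16)]
print(curve)   # starts low, passes 0.5 at t = 8, approaches 1
```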
and R (interest rates) and PPI (producer price index). Note that there is no 2000 data on three of the variables. Figures are given on NSP and PPI which are January figures; the other three are averages for the year. Even if you are not using EViews, read the explanation and carry out the exercise on the software you are using. The example below regresses dividends on earnings for the period 1871-1986. First we describe some of the output the computer produces.
the estimates of the regression coefficients bi, i = 1, ..., k, including the constant;
the standard error of each coefficient, SE(bi), which measures how precisely it is estimated;
the t ratio t(βi = 0) = bi/SE(bi), which tests the null hypothesis that that particular coefficient is really zero (the variable should not appear in the regression). If the t ratio is greater than 2 in absolute value, we can reject the null hypothesis that βi = 0 at about the 5% level. In this case the coefficient is said to be significantly different from zero, or significant;
the p value for the hypothesis that βi = 0. This gives the probability of getting a t ratio this large in absolute value if the null hypothesis were true. If this is less than 0.05, again we can reject the hypothesis that βi = 0;
the Sum of Squared Residuals, Σû²t. This is what least squares minimises;
the standard error of regression, s = √(Σû²t/(T − k)), where k is the number of regression coefficients estimated and T the number of observations. This is an estimate of the square root of the error variance σ² and gives you an idea of the average size of the errors. If the dependent variable is a logarithm, multiply s by 100 and interpret it as the average percent error.
R squared, which tells you the proportion of the variation in the dependent variable that the equation explains:

R² = 1 − Σû²t/Σ(Yt − Ȳ)² = Σ(Ŷt − Ȳ)²/Σ(Yt − Ȳ)²

and adjusted R squared, R̄² = 1 − [Σû²t/(T − k)]/[Σ(Yt − Ȳ)²/(T − 1)], where k is the number of regression coefficients estimated and T is the number of observations. Whereas R² is always positive and increases when you add variables, R̄² can be negative and only increases if the added variables have t ratios greater than unity.
An F statistic, which tests the hypothesis that none of the slope variables (i.e. the right hand side variables other than the constant) is significant. Notice that in the case of a single slope variable, this will be the square of its t statistic. Usually it also gives the probability of getting that value of the F-statistic if the slope variables all had no effect.
22.2. Excel
Go into Excel and load the Shiller.xls file. Click Tools; Data Analysis; choose Regression from the list of techniques. You may have to add in the data-analysis module. Where it asks you Y range, enter C2:C117. Where it asks you X range, enter D2:D117. Click Output Range and enter G1 in the output range box. Alternatively you can leave it at the default, putting the results in a separate sheet. Click OK. It gives you in the first box Multiple R, which you can ignore, R squared and Adjusted R Squared and the Standard Error of the Regression. Then it gives you an ANOVA box, which you can ignore. Then it gives you estimates of the coefficients (intercept, X Variable 1, etc.), their standard errors, t statistics, and P values, etc., shown in the summary output.
In this case we have run a regression of dividends on earnings and the results for the sample 1871-1986 are:

NDt = 0.169 + 0.456 NEt + ût
     (0.036)  (0.007)
     [4.67]   [61.65]
     {0.000}  {0.000}
R² = 0.971, s = 0.31.

Standard errors of coefficients are given in parentheses, t statistics in brackets, and p values in braces. You would normally report only one of the three, usually just standard errors. The interpretation is that if earnings go up by $10, then dividends will go up by $4.56. If earnings were zero, dividends would be 16.9 cents. Earnings explain 97% of the variation in dividends over this period and the average error in predicting dividends is 0.31. We would expect our predictions to be within two standard errors of the true value 95% of the time. Both the intercept and the coefficient of earnings are significantly different from zero at the 5% level: their t statistics are greater than 2 in absolute value and their p values are less than 0.05.
In Excel, if you click the residuals box it will also give you the predicted values and residuals for every observation. If you use Excel you must graph these residuals to judge how well the least squares assumptions hold. You can have more right hand side, X, variables, but they must be contiguous in the spreadsheet, side by side. So for instance we could have estimated

NDt = α + βNEt + γNSPt + ut
22.3. EViews
22.3.1. Entering Data
Open the EViews program; different versions may differ slightly. Click on File,
New, Workfile, accept the default annual data and enter the length of the time
series, 1871 2000, in the box. OK. You will now get a box telling you that you have
a file with two variables: C (which takes the value unity for each observation) and
RESID, which is the variable where estimates of the residuals will be stored.
Click on File, Import, Read Text-Lotus-Excel, then click on the Shiller file. It
will open a box. Tell it, in the relevant box, that there are 5 variables. Note the
other options, but use the defaults; note that B2 is the right place to start reading
this data file. Click Read. You should now also have PPI, ND, NE, NSP and R
in your workfile. Double click on NSP and you will see the data and have various
other options, including graph.
Highlight NE and ND. Click on Quick, then Graph, OK line graph, and you
will see the graphs of these two series. Close the graph. Always graph your
data.
Use the Save As command to save the workfile under a new name and keep saving
it when you add new data, transformations, etc.
Sum squared resid 11.45203 Schwarz criterion 0.604413
Log likelihood -30.30234 F-statistic 3880.968
Durbin-Watson stat 0.659872 Prob(F-statistic) 0.000000
and click OK. You will see that a new variable, log of real dividends, has been
added to the workfile.
Do the same to generate log of real earnings: LRE=log(NE/PPI). Graph LRD
and LRE. Click on Quick, Estimate equation, and enter LRD C LRE LRD(-1) in
the box. This estimates an equation of the form

Yt = α + β Xt + λ Yt-1 + ut

where the dependent variable is influenced by its value in the previous period.
The estimates are:
LRDt = 0.517 + 0.248 LREt + 0.657 LRDt-1      R² = 0.94
      (0.107)  (0.034)      (0.046)           s = 0.108
Although this has an R² of 0.94, it does not mean that it is worse than the
previous equation, which had an R² of 0.97, because the two equations have
different dependent variables: above the dependent variable was nominal dividends,
here it is log real dividends. The Durbin-Watson statistic for this equation
is 1.72, which is much better. This is a dynamic equation, since it includes the
lagged dependent variable. The long-run elasticity of dividends to earnings is
0.248/(1 - 0.657) = 0.72: a 1% increase in earnings is associated with a 0.72%
increase in dividends in the long run.
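The long-run elasticity is simple arithmetic on the reported coefficients; a quick check in Python:

```python
# Long-run elasticity implied by the ARDL estimates reported in the text:
# LRDt = 0.517 + 0.248 LREt + 0.657 LRDt-1
beta = 0.248                 # short-run (impact) elasticity
lam = 0.657                  # coefficient on the lagged dependent variable
long_run = beta / (1 - lam)  # long-run elasticity
print(round(long_run, 2))    # 0.72
```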
23. Dynamics
With cross-section data a major issue tends to be getting the functional form
correct; with time-series data a major issue tends to be getting the dependence
over time, the dynamics, correct.
If -1 < λ1 < 1 the process is stable: it will converge back to a long-run equilibrium
after shocks. The long-run equilibrium can be got from assuming yt = yt-1 = y*
(as would be true in equilibrium with no shocks), so

y* = λ0 + λ1 y*
y* = λ0/(1 - λ1),

using the star to indicate the long-run equilibrium value. A random walk does
not have a long-run equilibrium; it can wander anywhere.
A second-order (two lags) autoregression takes the form

yt = λ0 + λ1 yt-1 + λ2 yt-2 + ut.

This is stable if -1 < λ1 + λ2 < 1, in which case its long-run expected value is

y* = λ0/(1 - λ1 - λ2).
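A stable autoregression does converge to this value: the sketch below iterates an AR(2) with made-up coefficients and no shocks and compares the result with λ0/(1 - λ1 - λ2):

```python
# Iterate a stable AR(2) with no shocks and check convergence to
# lam0 / (1 - lam1 - lam2). Coefficients are made up for illustration.
lam0, lam1, lam2 = 1.0, 0.5, 0.2       # satisfies -1 < lam1 + lam2 < 1
y_prev2, y_prev1 = 0.0, 0.0            # arbitrary starting values
for _ in range(200):
    y = lam0 + lam1 * y_prev1 + lam2 * y_prev2
    y_prev2, y_prev1 = y_prev1, y
long_run = lam0 / (1 - lam1 - lam2)
print(round(y_prev1, 4), round(long_run, 4))   # both 3.3333
```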
We may also get slow responses from the effects of the independent variables;
these are called distributed lags (DL). A first-order distributed lag takes the form

yt = α + β0 xt + β1 xt-1 + ut.

Combining the autoregressive and distributed-lag forms gives the ARDL(1,1),

yt = λ0 + λ1 yt-1 + β0 xt + β1 xt-1 + ut.

Again it can be estimated by least squares. There are often strong theoretical
reasons for such forms.
In many cases adjustment towards equilibrium is slow. This can be dealt with
by assuming a long-run equilibrium relationship, e.g. between consumption and
income,

Ct* = γ0 + γ1 Yt                                        (23.1)

and a partial adjustment model (PAM)

ΔCt = θ(Ct* - Ct-1) + ut.                               (23.2)
θ measures the proportion of the deviation from equilibrium made up in a period,
and we would expect 0 < θ ≤ 1, with θ = 1 indicating instantaneous adjustment
and θ = 0 no adjustment. We can write this

ΔCt = θγ0 + θγ1 Yt - θCt-1 + ut

or

Ct = θγ0 + θγ1 Yt + (1 - θ)Ct-1 + ut:

we would just run a regression of consumption on a constant, income and lagged
consumption,

Ct = α0 + α1 Yt + α2 Ct-1 + ut.

We can recover the theoretical parameters θ, γ0, γ1 from the estimated parameters
α0, α1, α2 given by the computer: we estimate the speed of adjustment as
θ = (1 - α2) and the long-run effect as γ1 = α1/θ, using the estimated α's.
This is a dynamic equation:
it includes lagged values of the dependent variable. Whether we estimate it using
the first difference ΔCt or the level Ct as the dependent variable does not matter:
we would get identical estimates of the intercept and the coefficient of income,
and the coefficient of lagged consumption in the levels equation will be exactly
equal to the coefficient in the first-difference equation plus one. The sums of squared
residuals will be identical, though R² will not be, because the dependent variable
is different. This is one reason R² is not a good measure of fit.
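The algebra for recovering the theoretical parameters can be sketched with hypothetical regression estimates (the α's below are invented purely for illustration):

```python
# Recovering the PAM parameters from the levels regression
# Ct = alpha0 + alpha1*Yt + alpha2*Ct-1 + ut (the alphas are made-up numbers).
alpha0, alpha1, alpha2 = 10.0, 0.45, 0.40
theta = 1 - alpha2            # speed of adjustment
gamma1 = alpha1 / theta       # long-run effect of income
gamma0 = alpha0 / theta       # long-run intercept
a_diff = alpha2 - 1           # coefficient on Ct-1 in the first-difference form
print(round(theta, 2), round(gamma1, 2), round(a_diff, 2))   # 0.6 0.75 -0.6
```

Note that the coefficient on Ct-1 in the difference form is exactly the levels coefficient minus one, as the text says.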
For more complex adjustment processes, we can keep the long-run relationship
given by (23.1) and replace the partial adjustment model (23.2) by the error
correction model (ECM), which assumes that people respond to both the change
in the target and the lagged error:

ΔCt = θ1 ΔCt* + θ2 (Ct-1* - Ct-1) + ut
ΔCt = θ1 γ1 ΔYt + θ2 (γ0 + γ1 Yt-1 - Ct-1) + ut
ΔCt = a0 + b0 ΔYt + b1 Yt-1 + a1 Ct-1 + ut.

We can estimate the last version, which gives us estimated parameters that are
functions of the theoretical parameters (a0 = θ2 γ0, b0 = θ1 γ1, b1 = θ2 γ1,
a1 = -θ2), so we can solve for the theoretical parameters from our estimates;
e.g. the long-run effect is γ1 = -b1/a1, using the estimated a1 and b1.
We can also rearrange the ECM to give a (reparameterised) estimating equation
of the ARDL(1,1) form

yt = λ0 + λ1 yt-1 + β0 xt + β1 xt-1 + ut,

where λ1 = 1 + a1 and β1 = b1 - b0. We can find the equilibrium solution if the
model is stable, -1 < λ1 < 1, by setting yt = yt-1 = y* and xt = xt-1 = x*, so
that in long-run equilibrium

y* = λ0 + λ1 y* + β0 x* + β1 x*
y* = λ0/(1 - λ1) + ((β0 + β1)/(1 - λ1)) x*
y* = θ0 + θ1 x*.

These long-run estimates, θ0 and θ1, can be calculated from the short-run estimates
of λ0, λ1, β0, β1, which the computer reports. Economic theory usually makes
predictions about the long-run relations rather than the short-run relations. Notice
that our estimate of θ1 will be identical whether we get it from the ECM or ARDL
equation or from estimating a non-linear version.
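Solving for the long-run parameters from the short-run ones is again just arithmetic; a sketch with invented short-run estimates:

```python
# Long-run solution of the ARDL(1,1): y* = theta0 + theta1*x*, with
# theta0 = lam0/(1-lam1) and theta1 = (beta0+beta1)/(1-lam1).
# The short-run estimates below are made up for illustration.
lam0, lam1 = 0.3, 0.6
beta0, beta1 = 0.5, 0.1
theta0 = lam0 / (1 - lam1)
theta1 = (beta0 + beta1) / (1 - lam1)
print(round(theta0, 3), round(theta1, 3))   # 0.75 1.5
```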
pt = pt-1 + εt
Δpt = εt
Above we could estimate the variance directly; usually we assume that the
error from an estimated regression equation exhibits ARCH. Suppose that we
estimate

yt = β'xt + εt

where E(εt) = 0 and E(εt²) = σt². The GARCH(1,1), first-order Generalised
ARCH, model is then

σt² = a0 + a1 εt-1² + b1 σt-1²;

more lags could be added. EViews can estimate GARCH models of various forms.
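The variance recursion is easy to simulate. A sketch with made-up parameters (a1 + b1 < 1, so the unconditional variance a0/(1 - a1 - b1) is finite):

```python
# Simulate the GARCH(1,1) recursion sigma2_t = a0 + a1*eps_{t-1}^2 + b1*sigma2_{t-1}
# with made-up parameters; a1 + b1 < 1 gives a finite unconditional variance.
import random

random.seed(0)
a0, a1, b1 = 0.1, 0.1, 0.8
sigma2 = a0 / (1 - a1 - b1)        # start from the unconditional variance (= 1.0)
eps = 0.0
path = []
for _ in range(1000):
    sigma2 = a0 + a1 * eps ** 2 + b1 * sigma2
    eps = random.gauss(0.0, sigma2 ** 0.5)   # shock with conditional variance sigma2
    path.append(sigma2)
print(round(sum(path) / len(path), 2))       # sample mean of sigma2_t, near 1.0
```

The conditional variance rises after large shocks and decays back, which is the volatility clustering GARCH is designed to capture.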
24. Additional matrix results

The bivariate regression

Yt = β1 + β2 Xt + ut

can be written in matrix form as y = Xβ + u:

[ Y1 ]   [ 1  X1 ]          [ u1 ]
[ Y2 ] = [ 1  X2 ] [ β1 ] + [ u2 ]
[ .. ]   [ .. .. ] [ β2 ]   [ .. ]
[ YT ]   [ 1  XT ]          [ uT ]
and we will use X'X, where X' is the 2×T transpose of X, so X'X is 2×2:

X'X = [ T     Σ Xt  ]
      [ Σ Xt  Σ Xt² ]
The residual sum of squares is

û'û = (y - Xb)'(y - Xb)
    = y'y + b'X'Xb - 2b'X'y

or, written out in scalar terms with the sums running from t = 1 to T,

Σ ût² = Σ Yt² + (b1² T + b2² Σ Xt² + 2 b1 b2 Σ Xt) - 2 (b1 Σ Yt + b2 Σ Xt Yt).
The scalar b'X'Xb is a quadratic form, i.e. of the form x'Ax, and the bi appear
in it squared. Quadratic forms play a big role in econometrics. A matrix A is
positive definite if for any a ≠ 0, a'Aa > 0. Matrices with the structure X'X
are always positive definite (as long as the columns of X are not linearly
dependent), since they can be written as a sum of squares: define z = Xa; then
z'z = a'X'Xa is the sum of the squared elements of z.
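The sum-of-squares argument can be checked numerically with plain lists (the data and the vector a below are made up):

```python
# Verify a'(X'X)a = z'z with z = Xa: a quadratic form in X'X is a sum of squares.
X = [[1.0, 2.0],
     [1.0, 3.5],
     [1.0, 5.0],
     [1.0, 7.5]]       # T x 2 data matrix with a constant column (made-up data)
a = [0.7, -1.3]        # arbitrary vector

z = [row[0] * a[0] + row[1] * a[1] for row in X]   # z = X a
quad = sum(v * v for v in z)                       # z'z = a'X'Xa
print(quad > 0)   # True
```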
Consider the linear relation

P = x'a

where x' is 1×n and a is n×1. Then the differential of P with respect to x or x'
is defined as

dP/dx = a  and  dP/dx' = a'.

In the case n = 2 we can write

P = [x1, x2] [ a1 ] = x1 a1 + x2 a2.
             [ a2 ]

Then

dP/dx1 = a1  and  dP/dx2 = a2.

So

dP/dx = [ dP/dx1 ] = [ a1 ] = a
        [ dP/dx2 ]   [ a2 ]

and

dP/dx' = [dP/dx1, dP/dx2] = [a1, a2] = a'.

Consider the quadratic form

Q = x'Ax

where x' is 1×n, A is n×n and x is n×1.
For n = 2, with A symmetric (a21 = a12), this is Q = a11 x1² + 2 a12 x1 x2 + a22 x2². So:

dQ/dx1 = 2 a11 x1 + 2 a12 x2  and  dQ/dx2 = 2 a12 x1 + 2 a22 x2.

Then

dQ/dx = [ dQ/dx1 ] = [ 2 a11 x1 + 2 a12 x2 ] = 2 [ a11 a12 ] [ x1 ] = 2Ax
        [ dQ/dx2 ]   [ 2 a12 x1 + 2 a22 x2 ]     [ a12 a22 ] [ x2 ]

and

dQ/dx' = [dQ/dx1, dQ/dx2] = [2 a11 x1 + 2 a12 x2, 2 a12 x1 + 2 a22 x2]

       = 2 [x1, x2] [ a11 a12 ] = 2x'A.
                    [ a12 a22 ]
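These derivative rules can be verified numerically with forward differences for the n = 2 case (the vectors and matrix below are made up):

```python
# Check dP/dx = a (for P = x'a) and dQ/dx = 2Ax (for Q = x'Ax, A symmetric)
# by forward differences, for the n = 2 case; numbers are made up.
h = 1e-6
a = [1.5, -0.4]
A = [[2.0, 0.5],
     [0.5, 3.0]]       # symmetric
x = [0.8, 1.2]

def P(v):
    return v[0] * a[0] + v[1] * a[1]

def Q(v):
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

gradP = [(P([x[0] + h, x[1]]) - P(x)) / h,
         (P([x[0], x[1] + h]) - P(x)) / h]
gradQ = [(Q([x[0] + h, x[1]]) - Q(x)) / h,
         (Q([x[0], x[1] + h]) - Q(x)) / h]
twoAx = [2 * (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
print([round(g, 4) for g in gradP], [round(g, 4) for g in gradQ])  # gradP ≈ a, gradQ ≈ 2Ax
```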
β̃ = Cy = C(Xβ + u) = CXβ + Cu
E(β̃) = CXβ + CE(u) = CXβ
since β̃ is unbiased by assumption, so CX = I. From above

β̃ = β + Cu = β + ((X'X)⁻¹X' + W)u
β̃ - β = (X'X)⁻¹X'u + Wu.

One of the resulting variance terms is

E(Wuu'W') = σ²WW',

the third is

E((X'X)⁻¹X'uu'W') = σ²(X'X)⁻¹X'W' = 0

since X'W' = 0. Similarly
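The step X'W' = 0 can be illustrated numerically: take any linear unbiased estimator β̃ = Cy (so CX = I), set W = C - (X'X)⁻¹X', and check that WX vanishes. The X and C below are invented for illustration:

```python
# Illustrate X'W' = 0: take a linear unbiased estimator btilde = C y (CX = I),
# set W = C - (X'X)^{-1}X', and check WX = 0. X and C are made up.
X = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]   # T = 3 observations, k = 2

# An unbiased (but inefficient) C: invert the first two rows of X, ignore obs 3
d = X[0][0] * X[1][1] - X[0][1] * X[1][0]
C = [[ X[1][1] / d, -X[0][1] / d, 0.0],
     [-X[1][0] / d,  X[0][0] / d, 0.0]]

# The OLS weights (X'X)^{-1} X'
XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
inv = [[ XtX[1][1] / det, -XtX[0][1] / det],
       [-XtX[1][0] / det,  XtX[0][0] / det]]
ols = [[sum(inv[i][k] * X[t][k] for k in range(2)) for t in range(3)]
       for i in range(2)]

W = [[C[i][t] - ols[i][t] for t in range(3)] for i in range(2)]
WX = [[sum(W[i][t] * X[t][j] for t in range(3)) for j in range(2)]
      for i in range(2)]
print(all(abs(WX[i][j]) < 1e-9 for i in range(2) for j in range(2)))   # True
```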
25. Index
AR (autoregressive) 138-141
ARDL (autoregressive distributed lag) 140-142
Asymptotic 104
Average (arithmetic mean) 41, 100-102
Bayes theorem/Bayesian statistics 17-18, 77-8, 100
CAPM (capital asset pricing model) 113-4
Central limit theorem, 85-6
Cobb Douglas production function, 121-2
Confidence intervals, 107-8
Consistent 104
Correlation coefficient 43, 117
Covariance, 42-3
Demand function 16, 121
Dummy variables, 131-2
Durbin Watson statistic
Dynamics 139-42
ECM (error correction model) 140-1
Efficient market hypothesis 37
Exchange rates 95-6
F distribution, tests, 92, 129
GARCH, 142
GDP/GNP, Gross Domestic/National Product 51-2, 55
Geometric mean 41, 66, 94
Graphs 40, 50
Growth rates, 55-9, 92-3
Hypothesis tests, 108-11
Index numbers, 66-70
Inflation rates, 55-9, 66-70
Interest rates, 92-4, 114
Kurtosis, 43, 65
Law of large numbers, 98
Least squares, 101, 116, 118
Logarithms, 37-8, 66-7, 93, 121-2
Logistic, 133
Mean, see average and geometric mean
Median 35
Mode 35
Moments, 43-44, Method of moments, 99, 101, 115-6, 125
Monte Carlo, 97-8
Neyman-Pearson, 99
Normal distribution, 86-8
Partial Adjustment 140-1
Phillips Curve, 14
Predicted values, 120
Probability 74-8
Production function, 121-2
Proportions, 105, 112-3, 133
Random variables, 81-4, 86
Regression, section 17 onwards
Residuals, 120
R squared and R bar squared 118,134
Seasonals, 132
Significance, 100, 107-10, 128
Skewness, 58, 65
Standard Deviation, 41-2, 103
Standard Error, of mean 102-4, of regression coefficients 107, 117, 128, of
regression 117-8
Standardised data 43
t distribution and test 91, 108
Time trend 18, 121-2
Unemployment, 52-3
Variance, 41-2