
Econ 184: Introduction to Econometrics

(Fall 2015)
Alexandre Poirier
Department of Economics
University of Iowa

8/25/2015

Why Study Econometrics?

- Econometrics: the use of statistical methods and economic theory to study economic problems, using data.
- We study causal relationships between economic variables.
- Economics is a quantitative and predictive science.
- Economists often want to determine whether a change in one variable causes a change in another.
  - For example, does another year of schooling increase earnings?
- Other examples:
  - Does patent protection help foster innovation?
  - Does a minimum wage lower employment?
  - Will universal coverage lower the quality of health care?

Why Study Econometrics?

More specifically, we might want to:
- Study whether the predictions of an economic model hold true in reality.
  - Does demand slope downward? Is the stock market efficient?
- Quantify the effect of an economic or social program on an outcome of interest (poverty, inequality, wages, fertility, educational achievement, innovation).
- Forecast economic variables of interest.
  - Next quarter's inflation, etc.

Why Study Econometrics?

This course provides an introduction to the methods and tools of econometrics.

Four main examples used in the textbook:
1. Does reducing class size improve elementary school education?
2. Is there racial discrimination in the market for home loans?
3. How much do cigarette taxes reduce smoking?
4. Will raising the beer tax reduce traffic fatalities?

We will analyze many more examples in class, sections, and problem sets.

Causation vs. Correlation

- Does a change in X really cause a change in Y? Or do they just co-vary?
- To evaluate policies and test theories, we need to establish causation.
- But in the real world, correlation and causation are often very difficult to separate.
  - Does drinking red wine reduce the risk of a heart attack?
  - Does watching Oprah cause stress?
  - Does smoking cigarettes cause cancer?

Causation vs. Correlation: More Examples

- We observe a positive relationship between crime and the number of police officers.
  - Is it because police officers create crime?
  - Or (more likely!) is it because more police officers are assigned to more troublesome neighborhoods?
- We observe that unemployed people who attend a job training program wait for shorter periods before finding a job.
  - Is it because the program helped them, or is it because those who joined the program are the most skilled/motivated ones, so that they would have waited less anyway?

So how do we establish causation?

Causation vs. Correlation


- Ideally, we would like to have experimental data: for example, two identical plots of land, where the same crop is cultivated using the same techniques, but where different fertilizers are used.
- Then if the outcome of interest (yield per acre, for example) differs between the two plots, we can safely infer that the different fertilizer causes the difference in average yield.
  - Classic example: clinical trials (compliance?).
- BUT, in most cases, all we have is observational data (you can't have the same person with and without college, or the same economy with and without a tax cut...).
- In general, to find answers, we need econometric skills and creativity.

Summary: Learning Goals

- Conduct statistical analysis.
- Understand the role of empirical evidence in evaluating economic problems.
- Understand the role of assumptions in the underlying models.

Probability Review: Introduction

- Let's begin by introducing the notions of probability, randomness, and random variables.
- The use of probability to measure uncertainty and variability began hundreds of years ago with the study of gambling.
- Generally speaking, probability is the chance that something (an event) will happen.
- The probability of an event or outcome is the proportion of the time it occurs in the long run; this is called the frequency interpretation of probability.

Probability Review: Definitions


- Sample space and events: the set of all possible outcomes is called the sample space. An event is a subset of the sample space.
  - We typically use Ω to denote the sample space.
  - Example: coin tossing. Ω = {Heads, Tails}.
  - Example: number of times a computer will crash. Ω = {0, 1, ...}, and event A = {0, 1} (i.e., the computer will crash no more than once).
- Random variable: a numerical summary of a random outcome.
  - Formally: a random variable is a real-valued function, defined on the set of possible outcomes (i.e., the sample space Ω), that assigns a real number to every possible outcome.
  - Example: coin tossing. Ω = {Heads, Tails}. A random variable could be X ∈ {0, 1} such that X = 0 if Heads occurs.

Probability Review: Definitions

There are two major classes of random variables: discrete and continuous.
- A discrete random variable takes on only a discrete set of values.
  - Example: the number of phone calls you will receive today.
- A continuous random variable takes on a continuum of values.
  - Example: the amount of time you will spend on the phone.

Probability Review: Definitions

- Probability distribution: assigns to each event a number between 0 and 1 that quantifies how likely the event is to occur. I.e., for an event A, the number Pr(A) indicates the probability that A will occur.

Probability Review: Discrete Random Variables

- We characterize or describe a discrete random variable X with a probability function (pf).
- A pf lists the probability of each possible discrete outcome.
- The pf of a discrete random variable is defined as the function f such that for every real number x,

  f(x) = Pr(X = x)

  where X represents a random variable and x represents a realization of that random variable.
- The following slide contains a specific example. Note that the same information is displayed in the table and graph.
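As a small sketch, a pf can be coded as a function returning Pr(X = x). The probabilities below are the increments of the crash-example cdf table shown on the later slides (0.8, 0.9, 0.96, 0.99, 1.0); the function name `pf` is just an illustrative choice:

```python
def pf(x):
    """Probability function f(x) = Pr(X = x) for the computer-crash example."""
    probs = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}
    return probs.get(x, 0.0)

# A valid pf is non-negative and its values sum to 1.
total = sum(pf(x) for x in range(5))
```

Outcomes outside the support simply get probability 0, which is why `get(x, 0.0)` is used.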

Probability Review: Probability Function

[Figure: example probability function shown as a table and bar graph]

Probability Review: Cumulative Distribution Function (CDF)

- Another way of characterizing the distribution of a random variable is with a cumulative distribution function (cdf).
- The cdf lists the probability that a random variable is less than or equal to a specific value:

  F(x) = Pr(X ≤ x)

  x          0     1     2     3     4
  Pr(X ≤ x)  0.8   0.9   0.96  0.99  1.0

- F is:
  - F(x) ∈ [0, 1] for all x.
  - F is non-decreasing.
  - F is right-continuous.

Probability Review: Cumulative Distribution Function (CDF)

- The cumulative distribution function may also be referred to as the distribution function.

  x          0     1     2     3     4
  Pr(X ≤ x)  0.8   0.9   0.96  0.99  1.0

[Figure: step plot of the cdf from the table above]

Probability Review: Continuous Random Variables

- A random variable Y that can take on any real value within some range is a continuous random variable.
  - Time, temperature, height...
- For continuous random variables, the probability of a particular value occurring is equal to zero:

  Pr(Y = y) = 0

- We typically speak of interval probabilities (i.e., the probability that Y will take on some subset of values):

  Pr(a ≤ Y ≤ b)

- Note that probability zero does not mean impossible.

Probability Review: Probability Density Function (pdf)

- The probabilities associated with a continuous random variable Y are determined by the pdf of Y.
- The pdf of Y, denoted f(y), has the following properties:
  1. f(y) ≥ 0 for all y.
  2. The probability that the uncertain quantity Y will fall in the interval (a, b) is equal to the area under f(y) between a and b:

     P(a < Y < b) = ∫_a^b f(y) dy.

  3. The total area under the entire curve of f(y) is equal to 1:

     ∫_{-∞}^{∞} f(y) dy = 1.
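These properties can be checked numerically. A sketch using the density f(y) = 2y on (0, 1) from a later example in these slides; the midpoint-rule integrator is a hand-rolled helper, not a library routine:

```python
def f(y):
    """Example pdf from the slides: f(y) = 2y on (0, 1), 0 elsewhere."""
    return 2 * y if 0 < y < 1 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, 0, 1)       # property 3: total area under f is 1
p = integrate(f, 0.25, 0.5)      # property 2: P(0.25 < Y < 0.5)
```

For this density the interval probability has a closed form, P(0.25 < Y < 0.5) = 0.5² - 0.25² = 0.1875, so the numerical answer can be checked exactly.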

Probability Review: Probability Density Function (pdf)

[Figure: a pdf curve with the area under f(y) between 0.8 and 1.6 shaded, representing P(0.8 < Y < 1.6)]

Probability Review: CDF of Continuous Random Variables

- The definition of a cdf for a continuous random variable is the same as that for a discrete random variable:

  F(y) = Pr(Y ≤ y)

- With a continuous random variable, the cdf is a continuous function over the entire real line, and we can write down a simple formula:

  F(y) = Pr(Y ≤ y) = ∫_{-∞}^{y} f(t) dt
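The integral definition can be sketched numerically. Using the density f(t) = 2t on (0, 1) from a later example, the approximation should track the exact cdf F(y) = y²:

```python
def f(t):
    """Example pdf: f(t) = 2t on (0, 1), 0 elsewhere."""
    return 2 * t if 0 < t < 1 else 0.0

def F(y, n=100_000):
    """F(y) = Pr(Y <= y), approximated by integrating f up to y.

    f is zero below 0 and above 1, so integration runs over (0, min(y, 1))."""
    if y <= 0:
        return 0.0
    b = min(y, 1.0)
    h = b / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h
```

Starting the integral at 0 rather than at minus infinity is valid only because this particular density vanishes below 0.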

Probability Review: CDF of Continuous Random Variables

- Furthermore, it follows that, at each point at which f(y) is continuous, the pdf can be calculated as

  F′(y) = dF(y)/dy = f(y)

- We can easily see that

  Pr(Y > y) = 1 - F(y)

  and

  Pr(y₁ < Y ≤ y₂) = F(y₂) - F(y₁)

- In practice, the CDF allows us to calculate the probability of any interval. For example, for a continuous random variable CT with cdf F:

  Pr(15 < CT ≤ 20) = Pr(CT ≤ 20) - Pr(CT ≤ 15) = F(20) - F(15) = .58

  Pr(CT > 20) = 1 - F(20) = 1 - .78 = .22

Probability Review: Measures of Central Tendency for Distributions

- Mode: the mode is the value that occurs with the greatest probability.
  - Example: What is the modal age of students in this class?
- Median: the median is the value such that the probability of the random variable being less than or equal to that value is at least 50% and the probability of the random variable being greater than or equal to that value is at least 50%.
  - Example: What is the median age of students in this class?

  Case 0:
  Age  19   20   21
  %    .4   .2   .4

  Case 1:
  Age  19   20   21   22
  %    .4   .2   .2   .2

  Case 2:
  Age  19   20   21   22
  %    .4   .1   .3   .2
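The median definition above can be applied mechanically to the three cases; a sketch (the small tolerance guards against floating-point rounding in the probability sums):

```python
def medians(dist, tol=1e-12):
    """Outcomes m with Pr(X <= m) >= 0.5 and Pr(X >= m) >= 0.5."""
    return sorted(
        m for m in dist
        if sum(p for x, p in dist.items() if x <= m) >= 0.5 - tol
        and sum(p for x, p in dist.items() if x >= m) >= 0.5 - tol
    )

case0 = {19: .4, 20: .2, 21: .4}
case1 = {19: .4, 20: .2, 21: .2, 22: .2}
case2 = {19: .4, 20: .1, 21: .3, 22: .2}
```

In Cases 0 and 1 only age 20 satisfies the definition; in Case 2 both 20 and 21 do, illustrating that the median need not be unique.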

Probability Review: Mean (Expected Value)

- Mean: the mean or expected value of a random variable X is the weighted average of all its possible outcomes, weighted by the probabilities of those outcomes.
- Unlike the mode, it's unique.
- For a discrete random variable X,

  E(X) = x₁ Pr(x₁) + ... + x_k Pr(x_k) = Σ_{i=1}^k xᵢ Pr(xᵢ) = Σ_{i=1}^k xᵢ f(xᵢ)

- Example: expected value of throwing a die:

  E(X) = (1/6)·1 + (1/6)·2 + (1/6)·3 + (1/6)·4 + (1/6)·5 + (1/6)·6 = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5
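The die calculation can be reproduced exactly with Python's `fractions` module, avoiding any floating-point rounding:

```python
from fractions import Fraction

# E(X) = sum of x * Pr(x) over the six equally likely faces.
E_X = sum(Fraction(1, 6) * x for x in range(1, 7))
print(E_X)   # prints 7/2, i.e. 3.5
```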

Probability Review: Mean (Expected Value)

- However, one drawback of the mean (relative to the median) is its sensitivity to outliers.
  - When Bill Gates walks into a bar...
- For a continuous random variable X with probability density function f(x), the mean of X (assuming it exists) is defined as

  E(X) = ∫ x f(x) dx

- The expected value or mean of X is typically denoted by E(X) or μ_X.

Example
f(x) = 2x for 0 < x < 1 (and 0 otherwise). Then

  E(X) = ∫ x f(x) dx = ∫_0^1 x(2x) dx = ∫_0^1 2x² dx = (2/3)x³ |_0^1 = 2/3

Question: what's F(x)?

  F(x) = ∫_{-∞}^x f(t) dt = ∫_0^x 2t dt = t² |_0^x = x² for 0 ≤ x ≤ 1.

What about the median?
- Is it 1/2? F(1/2) = 1/4 < 1/2. Nope.
- Is it 2/3? F(2/3) = 4/9 < 1/2. Nope.
- The median m solves F(m) = m² = 1/2, so m = 1/√2 ≈ 0.71.

[Figure: plot of the pdf f(x) = 2x on (0, 1)]
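A quick numerical check of this example, using the closed-form cdf F(x) = x² derived above:

```python
import math

def F(x):
    """cdf of the density f(x) = 2x on (0, 1): F(x) = x^2 on [0, 1]."""
    return min(max(x, 0.0), 1.0) ** 2

# Candidate medians from the slide: both fall short of 1/2.
low1 = F(0.5)       # 1/4
low2 = F(2 / 3)     # 4/9

# The median m solves m^2 = 1/2.
m = 1 / math.sqrt(2)
```

`F(m)` equals 1/2 up to rounding, confirming that the median is 1/√2 ≈ 0.707.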

Probability Review: Expected Value of a Function

- We know that if X is a random variable with pf f(xᵢ) (discrete case) or pdf f(x) (continuous case), we can calculate E(X) as either

  E(X) = Σ_{i=1}^k xᵢ f(xᵢ)   or   E(X) = ∫ x f(x) dx

- But what if we want to calculate E(X²) or E(ln(X))? In general, what's the expected value of a function g(·) of X?
- It can be shown that (assuming it exists)

  E(g(X)) = Σ_{i=1}^k g(xᵢ) f(xᵢ)   or   E(g(X)) = ∫ g(x) f(x) dx

- For example, for a continuous distribution,

  E(X²) = ∫ x² f(x) dx
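For a discrete illustration, E(X²) for a fair die follows from the first formula, with g(x) = x² and f(xᵢ) = 1/6 (exact arithmetic via `fractions`):

```python
from fractions import Fraction

# E(g(X)) = sum of g(x_i) * f(x_i), here with g(x) = x^2.
E_X2 = sum(Fraction(1, 6) * x**2 for x in range(1, 7))
print(E_X2)   # prints 91/6, i.e. 15 1/6
```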

Probability Review: Measures of Dispersion for Distributions

- The variance of a random variable measures the spread or dispersion of the variable around its mean:

  Var(X) = E[(X - μ_X)²]

- In the discrete case, this is simply the weighted average of the squared deviations of X from its mean:

  Var(X) = Σ_{i=1}^k [xᵢ - E(X)]² Pr(xᵢ)

- For a continuous random variable X with probability density function f(x), the variance of X is

  Var(X) = ∫ [x - E(X)]² f(x) dx.

Probability Review: Variance & Moments

- Another (equivalent) formula that can be used in either case is:

  Var(X) = E(X²) - (E(X))²

  where the second moment E(X²) is defined as

  E(X²) = (x₁)² Pr(x₁) + (x₂)² Pr(x₂) + ... + (x_k)² Pr(x_k)

  or

  E(X²) = ∫ x² f(x) dx

- Other (higher-order) moments are defined similarly.

Probability Review: Standard Deviation

- The variance of X is denoted Var(X) or σ²_X.
- The standard deviation of X, denoted σ_X, is the square root of Var(X).
- A large standard deviation and variance mean that the probability distribution is quite spread out: a large difference between the outcome and the expected value is anticipated.

Probability Review: Example


  x   Pr(x)   x·Pr(x)   x²·Pr(x)
  1   1/6     1/6       1/6
  2   1/6     1/3       2/3
  3   1/6     1/2       3/2
  4   1/6     2/3       8/3
  5   1/6     5/6       25/6
  6   1/6     1         6

  (k = 6)

- E(X) = Σ_{i=1}^k xᵢ Pr(xᵢ) = 3.5
- E(X²) = Σ_{i=1}^k (xᵢ)² Pr(xᵢ) = 15 1/6
- Var(X) = E(X²) - (E(X))² = 15 1/6 - 3.5² = 2.92
- σ_X = 1.71
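The table's calculation can be reproduced with exact fractions, and both variance formulas agree:

```python
from fractions import Fraction

p = Fraction(1, 6)
faces = range(1, 7)

E_X = sum(p * x for x in faces)           # 7/2 = 3.5
E_X2 = sum(p * x**2 for x in faces)       # 91/6 = 15 1/6

var_shortcut = E_X2 - E_X**2              # E(X^2) - (E(X))^2
var_direct = sum(p * (x - E_X)**2 for x in faces)
```

Both variance expressions equal 35/12 ≈ 2.92, and the standard deviation is √(35/12) ≈ 1.71, matching the slide.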

Probability Review: Example


f(x) = 2x for 0 < x < 1 (and 0 otherwise).
We already know E(X) = 2/3. What's Var(X)?

  Var(X) = ∫_0^1 [x - E(X)]² f(x) dx = ∫_0^1 (x - 2/3)² 2x dx
         = ∫_0^1 (2x³ - (8/3)x² + (8/9)x) dx
         = [(1/2)x⁴ - (8/9)x³ + (4/9)x²] |_0^1
         = 1/2 - 8/9 + 4/9 = 1/18

or (using the other formula)

  Var(X) = E(X²) - (E(X))² = ∫_0^1 x²(2x) dx - (2/3)² = ∫_0^1 2x³ dx - 4/9
         = (1/2)x⁴ |_0^1 - 4/9 = 1/2 - 4/9 = 1/18
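A numerical cross-check of the 1/18 result, using the same hand-rolled midpoint integrator idea as before:

```python
def f(x):
    """Example pdf: f(x) = 2x on (0, 1), 0 elsewhere."""
    return 2 * x if 0 < x < 1 else 0.0

def integrate(g, a=0.0, b=1.0, n=100_000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

mu = integrate(lambda x: x * f(x))                   # E(X) = 2/3
var_def = integrate(lambda x: (x - mu)**2 * f(x))    # definition
var_alt = integrate(lambda x: x**2 * f(x)) - mu**2   # shortcut formula
```

Both approximations come out near 1/18 ≈ 0.0556, matching the hand calculation.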

Probability Review: Expected Value and Variance of a Linear Function

- If Y = a + bX, then E(Y) = a + bE(X).
  - Example: suppose E(X) = 5, Y = 3X - 5, Z = -3X + 15.
    E(Y) = 3·E(X) - 5 = 3·5 - 5 = 10
    E(Z) = -3·E(X) + 15 = -3·5 + 15 = 0
- If Y = a + bX, then Var(Y) = b²Var(X).
  - Example: Var(X) = 5, Y = 3X - 5, Z = -3X + 15.
    Var(Y) = 9·Var(X) = 9·5 = 45
    Var(Z) = 9·Var(X) = 9·5 = 45
- These properties can be easily proven using the definitions of expectation and variance.
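These two properties can be verified directly on, say, the fair-die distribution (any discrete distribution works; exact arithmetic via `fractions`):

```python
from fractions import Fraction

p = Fraction(1, 6)
xs = range(1, 7)

E_X = sum(p * x for x in xs)
Var_X = sum(p * (x - E_X)**2 for x in xs)

a, b = -5, 3                                # Y = a + bX = 3X - 5
E_Y = sum(p * (a + b * x) for x in xs)
Var_Y = sum(p * (a + b * x - E_Y)**2 for x in xs)
```

Here `E_Y == a + b * E_X` and `Var_Y == b**2 * Var_X` hold exactly, as the slide states.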
