
ECON 309

Lecture 5: Causation and Correlation


I. Correlation
Correlation means the tendency of two variables to move together. A correlation can be
positive (meaning that the variables tend to move in the same direction) or negative (meaning the
variables tend to move in opposite directions). An example of a positive correlation is age and
income; people who are older tend to earn more money. An example of a negative correlation is
latitude and temperature; as latitude increases, temperature tends to decrease.
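More specifically, we have a statistical formula for the correlation coefficient:

$$ r = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{s_x \, s_y} $$

where \bar{x} and \bar{y} are the means of x and y, and s_x and s_y are their standard deviations.
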
Notice the numerator will tend to be larger in magnitude when larger values of x occur along
with larger values of y, and smaller values of x with smaller values of y. The denominator is the
product of the two variables' standard deviations; in other words, we are scaling for how much
these variables vary without reference to each other.
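The description of the numerator and denominator above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and all of the data values are invented for the example, not taken from the lecture. The code divides the summed products of deviations by the product of the square roots of the summed squared deviations, which gives the same number as covariance over the product of standard deviations because the common factor of n cancels.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how x and y move together around their means.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much each variable varies on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_movement / (spread_x * spread_y)

# Hypothetical data, invented for illustration only.
ages = [25, 32, 41, 48, 55, 63]
incomes = [31, 42, 50, 61, 58, 70]        # thousands of dollars
latitudes = [0, 15, 30, 45, 60, 75]       # degrees
temperatures = [27, 26, 20, 12, 3, -10]   # degrees Celsius

r_age_income = pearson_r(ages, incomes)        # positive: the two rise together
r_lat_temp = pearson_r(latitudes, temperatures)  # negative: one rises as the other falls
print(round(r_age_income, 3), round(r_lat_temp, 3))
```

With these made-up numbers the first coefficient comes out positive and the second negative, matching the age/income and latitude/temperature examples.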
The correlation coefficient always has a value between -1 and 1. When it's 1, we have a perfect
positive correlation; when it's -1, we have a perfect negative correlation. A perfect correlation
means that if you know one variable's value, you automatically know the other's as well. For
instance, the ages of any two people are perfectly correlated; if you know one person's age, then
as long as you know the difference in their birthdates, you know the other person's age, too.
When the correlation coefficient is zero, there is no (linear) correlation; knowing one variable's
value does not help you predict the other.
If you square the correlation coefficient, you get something called the coefficient of
determination, or r². It always lies between 0 and 1, and it has a nice interpretation: it tells you
the fraction of variation in one variable that can be explained, or predicted, by variation in the
other. For example, suppose r² = 0.4 for income and age. That would mean differences in age
explain 40% of differences in income; the remaining 60% would have to be explained by other
factors.
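To make the "fraction of variation explained" reading concrete, here is a minimal sketch that computes the squared correlation coefficient and, separately, the share of variation in y accounted for by a least-squares straight line through the data. For a single predictor the two numbers coincide. The function name and the data are hypothetical, chosen only for illustration.

```python
def r_squared_two_ways(xs, ys):
    """Return (squared correlation, share of variation in y explained by a fitted line)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)   # total variation in y

    r = sxy / (sxx * syy) ** 0.5

    # Least-squares line: y = intercept + slope * x.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation in y that the line fails to account for.
    unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

    explained_share = 1 - unexplained / syy
    return r ** 2, explained_share

# Hypothetical data, invented for illustration only.
ages = [25, 32, 41, 48, 55, 63]
incomes = [31, 42, 50, 61, 58, 70]   # thousands of dollars

r2, explained_share = r_squared_two_ways(ages, incomes)
print(round(r2, 3), round(explained_share, 3))   # the two values agree
```

Note that "explained" here means only "accounted for by a straight-line fit"; the calculation is symmetric in the two variables and carries no information about which one, if either, is the cause.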
However, we're using "explain" and "predict" in a very specific way here. This is only a property
of the variables' numerical values and how they tend to go together. That does not necessarily mean
that changes in one variable cause changes in the other. For example, age and income might be
correlated, but age may not cause higher income; it's just that older people tend to have more
experience. Correlation is only a statement of numerical facts; it says nothing about cause and
effect. As we will see, causation is a much more complex matter.
II. Correlation versus Causation

Causation = cause and effect; we are talking about one thing that will tend, other things equal, to
result in another thing. It is often very difficult to identify a true causal relationship.
Example: RadioLab podcast on "Secrets of Success." What causes success? Is it innate ability?
Timing/opportunity? Love of the activity (motivation)? Practice?
Things to consider: (1) Maybe there is not just one answer. Multiple factors contribute
to success. Some successes may have one cause, others a different cause. (2) Some factors
may only indirectly cause the outcome. E.g., motivation might matter only because it affects
practice. (3) Some factors may interact with each other. E.g., the effectiveness of
practice may depend on innate talent. E.g., the use of talent may depend on some
degree of luck. (4) Some things may have both a direct and an indirect effect. E.g., talent
has a direct effect by simply making you better, and an indirect effect by increasing
your motivation (people like to do what they're good at).
Example: Republicans report having more satisfying sex lives than Democrats. According to an
ABC News poll (http://abcnews.go.com/Primetime/News/story?id=180291), Republicans are
more likely to report being very satisfied with their sex life than Democrats (by a margin of 56%
to 47%). This is true even if you control for being in a committed relationship (87% versus
76%), so it's not just that Republicans are more likely to be in such relationships. What's going
on? Does this mean being a Republican causes better sex lives? [Good for demonstrating
C→A & B. Turns out men are both more likely to be Republican and more likely to be happy
with their sex lives.]
Example: People who have had more sex partners are more likely to get divorced. Does this
mean having more sex partners causes divorce?
(http://agoraphilia.blogspot.com/2007/02/sexual-correlation-and-causation.html)
[Good for demonstrating C→A & B and also B→A. One possibility is that possession of
conservative values results in both fewer divorces and fewer sex partners. Another possibility is
that getting married tends to cause fewer sex partners (because you stop adding more), while
getting divorced tends to cause more sex partners (because you start adding them again).]
Example: Is the President responsible for the economy's performance on his watch? Obviously,
the President has some influence on economic policy. But there are lots of confounding factors.
(a) Effects can be lagged in time, so that some economic effects are the responsibility of the
previous president. (b) Business cycles can be driven by non-political factors, such as changes in
underlying economic conditions. (c) A president might get voted out of office because people
think he's responsible for the recession, and as a result the new president comes in just as the
economy is recovering.
Simplest form of causation: A→B. That is, when A happens, that means B will also happen.
We say A causes B to happen. When people observe a correlation between A and B, they will
often reach the conclusion that A→B. But there are many other possibilities:
1. B→A; call this reverse causation.
2. C→A & B; call this external causation.
3. A→B & C→B; A and C each independently cause B; call this multiple causation.
4. (A&C)→B; A and C together cause B; call this joint causation.
5. A→C→B; call this indirect causation.
6. C→A→B; this is also indirect causation, but with a different order of events.
7. A unrelated to B; we call this a coincidence. B happened for unrelated reasons.
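
External causation (a common cause C driving both A and B) is easy to demonstrate with a small simulation. The sketch below is purely illustrative: the variable names, the noise model, and the sample size are assumptions made for the example. In the simulated world A never affects B, yet the two come out clearly correlated because both inherit variation from C.

```python
import math
import random

def pearson_r(xs, ys):
    """Correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_external_causation(n=5000, seed=0):
    """Generate A and B that share a hidden cause C but do not affect each other."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    a_values, b_values = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)        # hidden common cause C
        a = c + rng.gauss(0, 1)    # A depends on C plus its own noise
        b = c + rng.gauss(0, 1)    # B depends on C plus its own noise, never on A
        a_values.append(a)
        b_values.append(b)
    return a_values, b_values

a_values, b_values = simulate_external_causation()
r_ab = pearson_r(a_values, b_values)
print(round(r_ab, 2))   # clearly positive even though A has no effect on B
```

Under this particular noise model the population correlation between A and B is 0.5, so a large sample should land near that value. An observer who saw only A and B could easily, and wrongly, conclude that A→B.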

III. The Need for Controls


Because of all these other factors that can cause spurious correlations (especially C→A & B), we
need to use controls. That means trying to find data for which the other factor (C) is held
constant, so we can test to see whether we still have a relationship consistent with A→B.
Example: What are the causes of higher income? Education is an obvious answer. Another
obvious answer is IQ. But wait: IQ may also cause one to get more education. So maybe we
have external causation: IQ is the real cause, and it leads to higher income and also more
education as a side effect. To find out how much education really does, we need to control for
IQ by having at least some people in our data set who have similar IQs but different amounts of
education. And we'd also like some people with similar education but different IQs, to test to
see whether IQ has an effect independent of education.
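One simple way to "hold IQ constant" is to compare people only within groups that share the same IQ. The sketch below simulates an extreme, hypothetical world in which income depends on IQ alone and education has no effect at all; every number in it is invented for illustration. Across the whole sample, education and income are strongly correlated, but within each IQ group the correlation all but disappears.

```python
import math
import random

def pearson_r(xs, ys):
    """Correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_people(n=6000, seed=1):
    """Hypothetical world: IQ drives both education and income; education does nothing."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        iq = rng.choice([90, 100, 110])                      # three IQ groups
        education = 12 + (iq - 100) / 5 + rng.gauss(0, 1)    # years of schooling
        income = 40 + (iq - 100) + rng.gauss(0, 5)           # thousands; no education term
        people.append((iq, education, income))
    return people

people = simulate_people()

# Without a control: pool everyone together.
overall_r = pearson_r([p[1] for p in people], [p[2] for p in people])

# With a control: look only at people who share the same IQ.
within_r = {}
for iq in (90, 100, 110):
    group = [p for p in people if p[0] == iq]
    within_r[iq] = pearson_r([p[1] for p in group], [p[2] for p in group])

print(round(overall_r, 2), {iq: round(r, 2) for iq, r in within_r.items()})
```

Had education been given its own effect on income in the simulation, the within-group correlations would stay positive, which is the pattern we would look for in real data as support for education→income.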
Much of statistics is concerned with trying to control for confounding factors. There are various
methods for doing this. One is the experimental method, which directly holds constant some
factors while varying others. Another is multiple regression, which works by collecting a large
sample of data that happens to vary by multiple factors.
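As a rough illustration of how multiple regression separates the contributions of two factors, the sketch below fits income on education alone and then on education and IQ together, using the closed-form least-squares solution for two regressors. The data are simulated; the "true" effects (2.0 per year of education, 0.5 per IQ point), the helper names, and the noise levels are assumptions made for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cross(us, vs):
    """Sum of products of deviations from the means."""
    mean_u, mean_v = mean(us), mean(vs)
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))

def simple_slope(xs, ys):
    """Slope from regressing y on a single variable x."""
    return cross(xs, ys) / cross(xs, xs)

def two_regressor_slopes(x1, x2, y):
    """Least-squares slopes from regressing y on x1 and x2 together."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("regressors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Simulated data with known (assumed) effects.
rng = random.Random(2)
iq, education, income = [], [], []
for _ in range(5000):
    q = rng.gauss(100, 15)
    e = 2 + 0.1 * q + rng.gauss(0, 2)          # IQ raises education
    y = 2.0 * e + 0.5 * q + rng.gauss(0, 5)    # both raise income
    iq.append(q)
    education.append(e)
    income.append(y)

naive_slope = simple_slope(education, income)                   # ignores IQ
edu_slope, iq_slope = two_regressor_slopes(education, iq, income)
print(round(naive_slope, 2), round(edu_slope, 2), round(iq_slope, 2))
```

The one-variable slope overstates the effect of education because it silently absorbs part of the effect of IQ; including IQ as a control brings the education estimate back near the value built into the simulation.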
But even with good controls, we still cannot verify true causation. The best we can do is rule out
certain alternative hypotheses, thereby strengthening the case for causation. Say we have a theory
that education causes higher income. Someone might challenge that with the IQ hypothesis. To
rule that out, we use IQ as a control and show there is still a correlation between education and
income. That lends support to our theory, but it doesn't prove it, because it's still possible the
correlation is just a coincidence or caused by yet another factor we've failed to control for.
The ceteris paribus assumption. Most of the claims we make in economics have this form;
we're saying that a causal relationship holds as long as other things are equal. The law of
demand says a higher price will induce people to buy less, but that's assuming income and
preferences are constant. So we try to control for those other factors to isolate the effect of price.
IV. Necessary and Sufficient Conditions
These terms are often, but not always, related to causation. In some cases, they refer not to
causation but to strictly logical relationships.
We say A is a sufficient condition of B if having A guarantees having B. We can write this
A→B, or "If A then B." Note that this does not mean that B→A. For example, being a poodle is
a sufficient condition for being a dog. Thus, poodle→dog. But if you have a dog, it might not
be a poodle.

We say A is a necessary condition of B if B cannot happen without A. We can write this B→A or
A←B. (Think that through. If A is necessary for B, then if B is true, we know A must have
happened.) For example, being a dog is a necessary condition for being a poodle.
In the example of dogs and poodles, the necessary and sufficient conditions are opposite sides of
the coin. Poodle is sufficient for dog, and dog is necessary for poodle. But it doesn't have to be
that way. There are many cases where you may have sufficient conditions without necessary
ones, and vice versa. For example, more education might be sufficient for higher income (that is,
other things equal, more education leads to higher income). But even if that's definitely true,
you cannot conclude from someone having a higher income that they also have more education,
because other things (like athletic ability) might lead to higher income without an education.
How do necessary and sufficient conditions show up in statistics? Both can show up in
correlations. If A is necessary for B, then we will expect to see B only in cases where A occurs,
and not-B when A does not occur, resulting in a correlation. (However, if A really is strictly
necessary for B, then there should not be a single counterexample; that is, there should be no
data points with B and not A.) If A is sufficient for B, then we will expect to see A and B
frequently occurring together as well, since whenever A happens, B must also happen.
(However, if A really is strictly sufficient for B, there should be no data points with A and not B.)
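The counterexample test described above can be sketched as a simple count over the data. The function names and the small list of observations below are hypothetical. Each observation records whether A and B hold, and a single counterexample is enough to rule out a strict necessary or sufficient condition.

```python
def count_counterexamples(observations):
    """observations is a list of (a, b) booleans.

    Returns (cases with A and not B, cases with B and not A).
    """
    a_without_b = sum(1 for a, b in observations if a and not b)
    b_without_a = sum(1 for a, b in observations if b and not a)
    return a_without_b, b_without_a

def could_be_sufficient(observations):
    """A can be strictly sufficient for B only if no case has A without B."""
    return count_counterexamples(observations)[0] == 0

def could_be_necessary(observations):
    """A can be strictly necessary for B only if no case has B without A."""
    return count_counterexamples(observations)[1] == 0

# Hypothetical observations, each recorded as (is_poodle, is_dog).
animals = [
    (True, True), (True, True),                        # poodles
    (False, True), (False, True),                      # other dogs
    (False, False), (False, False), (False, False),    # cats
]

poodle_sufficient_for_dog = could_be_sufficient(animals)
poodle_necessary_for_dog = could_be_necessary(animals)
# Swap the roles: A = is_dog, B = is_poodle.
dog_necessary_for_poodle = could_be_necessary([(dog, poodle) for poodle, dog in animals])

print(poodle_sufficient_for_dog, poodle_necessary_for_dog, dog_necessary_for_poodle)
```

Finding no counterexamples does not prove that a condition is necessary or sufficient; it only means the data are consistent with it, in the same way that a surviving correlation supports but does not prove causation.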
Keep in mind that whenever other factors are also involved, as in the case of multiple causation,
there will not be perfect correlations. We may try to control for them all, but we're unlikely to
succeed because the world is complex.
