If correlation doesn't imply causation, then what does?
by Michael Nielsen on January 23, 2012
It is a commonplace of scientific discussion that correlation does not imply causation. Business Week recently
ran a spoof article pointing out some amusing examples of the dangers of inferring causation from
correlation. For example, the article points out that Facebook's growth has been strongly correlated with the
yield on Greek government bonds.
Despite this strong correlation, it would not be wise to conclude that the success of Facebook has
somehow caused the current (2009-2012) Greek debt crisis, nor that the Greek debt crisis has caused the
adoption of Facebook!
Of course, while it's all very well to piously state that correlation doesn't imply causation, it does leave us with
a conundrum: under what conditions, exactly, can we use experimental data to deduce a causal relationship
between two or more variables?
The standard scientific answer to this question is that (with some caveats) we can infer causality from a well
designed randomized controlled experiment. Unfortunately, while this answer is satisfying in principle and
sometimes useful in practice, it's often impractical or impossible to do a randomized controlled experiment.
And so we're left with the question of whether there are other procedures we can use to infer causality from
experimental data. And, given that we can find more general procedures for inferring causal relationships,
what does causality mean, anyway, for how we reason about a system?
It might seem that the answers to such fundamental questions would have been settled long ago. In fact, they
turn out to be surprisingly subtle questions. Over the past few decades, a group of scientists have developed
a theory of causal inference intended to address these and other related questions. This theory can be
thought of as an algebra or language for reasoning about cause and effect. Many elements of the theory have
been laid out in a famous book by one of the main contributors to the theory, Judea Pearl. Although the
theory of causal inference is not yet fully formed, and is still undergoing development, what has already been
accomplished is interesting and worth understanding.
In this post I will describe one small but important part of the theory of causal inference, a causal
calculus developed by Pearl. This causal calculus is a set of three simple but powerful algebraic rules which
can be used to make inferences about causal relationships. In particular, I'll explain how the causal calculus
can sometimes (but not always!) be used to infer causation from a set of data, even when a randomized
controlled experiment is not possible. Also in the post, I'll describe some of the limits of the causal calculus,
and some of my own speculations and questions.
The post is a little technically detailed at points. However, the first three sections of the post are non-technical,
and I hope will be of broad interest. Throughout the post I've included occasional "Problems for the
author", where I describe problems I'd like to solve, or things I'd like to understand better. Feel free to ignore
these if you find them distracting, but I hope they'll give you some sense of what I find interesting about the
subject. Incidentally, I'm sure many of these problems have already been solved by others; I'm not claiming
that these are all open research problems, although perhaps some are. They're simply things I'd like to
understand better. Also in the post I've included some exercises for the reader, and some slightly harder
problems for the reader. You may find it informative to work through these exercises and problems.
Before diving in, one final caveat: I am not an expert on causal inference, nor on statistics. The reason I wrote
this post was to help me internalize the ideas of the causal calculus. Occasionally, one finds a presentation of
a technical subject which is beautifully clear and illuminating, a presentation where the author has seen right
through the subject, and is able to convey that crystallized understanding to others. That's a great aspirational
goal, but I don't yet have that understanding of causal inference, and these notes don't meet that standard.
Nonetheless, I hope others will find my notes useful, and that experts will speak up to correct any errors or
misapprehensions on my part.
Simpson's paradox
Let me start by explaining two example problems to illustrate some of the difficulties we run into when making
inferences about causality. The first is known as Simpson's paradox. To explain Simpson's paradox I'll use a
concrete example based on the passage of the Civil Rights Act in the United States in 1964.
In the US House of Representatives, 61 percent of Democrats voted for the Civil Rights Act, while a much
higher percentage, 80 percent, of Republicans voted for the Act. You might think that we could conclude from
this that being Republican, rather than Democrat, was an important factor in causing someone to vote for the
Civil Rights Act. However, the picture changes if we include an additional factor in the analysis, namely,
whether a legislator came from a Northern or Southern state. If we include that extra factor, the
situation completely reverses, in both the North and the South. Here's how it breaks down:
North: Democrat (94 percent), Republican (85 percent)
South: Democrat (7 percent), Republican (0 percent)
Yes, you read that right: in both the North and the South, a larger fraction of Democrats than Republicans
voted for the Act, despite the fact that overall a larger fraction of Republicans than Democrats voted for the
Act.
You might wonder how this can possibly be true. I'll quickly state the raw voting numbers, so you can check
that the arithmetic works out, and then I'll explain why it's true. You can skip the numbers if you trust my
arithmetic.
North: Democrat (145/154, 94 percent), Republican (138/162, 85 percent)
South: Democrat (7/94, 7 percent), Republican (0/10, 0 percent)
Overall: Democrat (152/248, 61 percent), Republican (138/172, 80 percent)
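If you'd rather not check the arithmetic by hand, here is a minimal Python sketch that recomputes the percentages from the raw counts listed above. The dictionary layout and the helper name `rate` are illustrative choices, not anything from the original analysis.

```python
from fractions import Fraction

# (votes for the Act, total legislators), copied from the counts above
votes = {
    ("North", "Democrat"): (145, 154),
    ("North", "Republican"): (138, 162),
    ("South", "Democrat"): (7, 94),
    ("South", "Republican"): (0, 10),
}

def rate(party, region=None):
    """Fraction of a party voting for the Act, optionally within one region."""
    rows = [v for (r, p), v in votes.items()
            if p == party and (region is None or r == region)]
    yes = sum(y for y, _ in rows)
    total = sum(t for _, t in rows)
    return Fraction(yes, total)

for region in ("North", "South", None):
    d, r = rate("Democrat", region), rate("Republican", region)
    print(region or "Overall", f"Democrat {float(d):.0%}", f"Republican {float(r):.0%}")
```

Within each region the Democrat fraction comes out larger, while the pooled fraction comes out smaller, which is exactly the reversal described in the text.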
One way of understanding what's going on is to note that a far greater proportion of Democrat (as opposed to
Republican) legislators were from the South. In fact, at the time the House had 94 Southern Democrats, and only 10
Southern Republicans. Because of this enormous difference, the very low fraction (7 percent) of southern Democrats
voting for the Act dragged down the Democrats' overall percentage much more than did the even lower
fraction (0 percent) of southern Republicans who voted for the Act.
(The numbers above are for the House of Representatives. The numbers were different in the Senate, but the same
overall phenomenon occurred. I've taken the numbers from Wikipedia's article about Simpson's paradox,
and there are more details there.)
If we take a naive causal point of view, this result looks like a paradox. As I said above, the overall voting
pattern seems to suggest that being Republican, rather than Democrat, was an important causal factor in
voting for the Civil Rights Act. Yet if we look at the individual statistics in both the North and the South, then
we'd come to the exact opposite conclusion. To state the same result more abstractly, Simpson's paradox is
the fact that the correlation between two variables can actually be reversed when additional factors are
considered. So two variables which appear correlated can become anticorrelated when another factor is
taken into account.
You might wonder if results like those we saw in voting on the Civil Rights Act are simply an unusual fluke.
But, in fact, this is not that uncommon. Wikipedia's page on Simpson's paradox lists many important and
similar real-world examples ranging from understanding whether there is gender-bias in university admissions
to which treatment works best for kidney stones. In each case, understanding the causal relationships turns
out to be much more complex than one might at first think.
I'll now go through a second example of Simpson's paradox, the kidney stone treatment example just
mentioned, because it helps drive home just how bad our intuitions about statistics and causality are.
Imagine you suffer from kidney stones, and your doctor offers you two choices: treatment A or treatment B.
Your doctor tells you that the two treatments have been tested in a trial, and treatment A was effective for a
higher percentage of patients than treatment B. If you're like most people, at this point you'd say "Well, okay,
I'll go with treatment A."
Here's the gotcha. Keep in mind that this really happened. Suppose you divide patients in the trial up into
those with large kidney stones, and those with small kidney stones. Then even though treatment A was
effective for a higher overall percentage of patients than treatment B, treatment B was effective for a higher
percentage of patients in both groups, i.e., for both large and small kidney stones. So your doctor could just
as honestly have said "Well, you have large [or small] kidney stones, and treatment B worked for a higher
percentage of patients with large [or small] kidney stones than treatment A." If your doctor had made either
one of these statements, then if you're like most people you'd have decided to go with treatment B, i.e., the
exact opposite treatment.
The kidney stone example relies, of course, on the same kind of arithmetic as in the Civil Rights Act voting,
and it's worth stopping to figure out for yourself how the claims I made above could possibly be true. If you're
having trouble, you can click through to the Wikipedia page, which has all the details of the numbers.
Now, I'll confess that before learning about Simpson's paradox, I would have unhesitatingly done just as I
suggested a naive person would. Indeed, even though I've now spent quite a bit of time pondering Simpson's
paradox, I'm not entirely sure I wouldn't still sometimes make the same kind of mistake. I find it more than a
little mind-bending that my heuristics about how to behave on the basis of statistical evidence are obviously
not just a little wrong, but utterly, horribly wrong.
Perhaps I'm alone in having terrible intuition about how to interpret statistics. But frankly I wouldn't be
surprised if most people share my confusion. I often wonder how many people with real decision-making
power (politicians, judges, and so on) are making decisions based on statistical studies, and yet they don't
understand even basic things like Simpson's paradox. Or, to put it another way, they have not the first clue
about statistics. Partial evidence may be worse than no evidence if it leads to an illusion of knowledge, and
so to overconfidence and certainty where none is justified. It's better to know that you don't know.
Correlation, causation, smoking, and lung cancer
As a second example of the difficulties in establishing causality, consider the relationship between cigarette
smoking and lung cancer. In 1964 the United States Surgeon General issued a report claiming that cigarette
smoking causes lung cancer. Unfortunately, according to Pearl the evidence in the report was based primarily
on correlations between cigarette smoking and lung cancer. As a result the report came under attack not just
by tobacco companies, but also by some of the world's most prominent statisticians, including the
great Ronald Fisher. They claimed that there could be a hidden factor (maybe some kind of genetic factor)
which caused both lung cancer and people to want to smoke (i.e., nicotine craving). If that was true, then
while smoking and lung cancer would be correlated, the decision to smoke or not smoke would have no
impact on whether you got lung cancer.
Now, you might scoff at this notion. But derision isn't a principled argument. And, as the example of
Simpson's paradox showed, determining causality on the basis of correlations is tricky, at best, and can
potentially lead to contradictory conclusions. It'd be much better to have a principled way of using data to
conclude that the relationship between smoking and lung cancer is not just a correlation, but rather that there
truly is a causal relationship.
One way of demonstrating this kind of causal connection is to do a randomized, controlled experiment. We
suppose there is some experimenter who has the power to intervene with a person, literally forcing them to
either smoke (or not) according to the whim of the experimenter. The experimenter takes a large group of
people, and randomly divides them into two halves. One half are forced to smoke, while the other half are
forced not to smoke. By doing this the experimenter can break the relationship between smoking and any
hidden factor causing both smoking and lung cancer. By comparing the cancer rates in the group who were
forced to smoke to those who were forced not to smoke, it would then be possible to determine whether or not
there is truly a causal connection between smoking and lung cancer.
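To make the argument concrete, here is a small simulation sketch of the scenario the critics proposed: a hidden gene drives both the desire to smoke and cancer, while smoking itself has no effect at all. All of the probabilities (0.3, 0.8, 0.2, 0.5, 0.05) and the function name `simulate` are made-up assumptions for illustration, not real data.

```python
import random

def simulate(n, randomized, seed=0):
    """Return the cancer rate among smokers (True) and non-smokers (False).

    Hypothetical toy world: cancer depends only on a hidden gene, never on smoking.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # smokes -> [cancer cases, people]
    for _ in range(n):
        gene = rng.random() < 0.3
        if randomized:
            # The experimenter assigns smoking by coin flip, ignoring the gene.
            smokes = rng.random() < 0.5
        else:
            # Left alone, people with the gene are more likely to smoke.
            smokes = rng.random() < (0.8 if gene else 0.2)
        cancer = rng.random() < (0.5 if gene else 0.05)
        counts[smokes][0] += cancer
        counts[smokes][1] += 1
    return {s: cases / people for s, (cases, people) in counts.items()}

observational = simulate(200_000, randomized=False)
experiment = simulate(200_000, randomized=True)
print("observational:", observational)
print("randomized:   ", experiment)
```

In the observational run smokers show a much higher cancer rate, purely because of the gene. In the randomized run the two groups have nearly the same rate, since forcing the assignment breaks the link between the gene and smoking.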
This kind of randomized, controlled experiment is highly desirable when it can be done, but experimenters
often don't have this power. In the case of smoking, this kind of experiment would probably be illegal today,
and, I suspect, even decades into the past. And even when it's legal, in many cases it would be impractical,
as in the case of the Civil Rights Act, and for many other important political, legal, medical, and economic
questions.
Causal models
To help address problems like the two example problems just discussed, Pearl introduced a causal calculus.
In the remainder of this post, I will explain the rules of the causal calculus, and use them to analyse the
smoking-cancer connection. We'll see that even without doing a randomized controlled experiment it's
possible (with the aid of some reasonable assumptions) to infer what the outcome of a randomized controlled
experiment would have been, using only relatively easily accessible experimental data, data that doesn't
require experimental intervention to force people to smoke or not, but which can be obtained from purely
observational studies.
To state the rules of the causal calculus, we'll need several background ideas. I'll explain those ideas over the
next three sections of this post. The ideas are causal models (covered in this section), causal conditional
probabilities, and d-separation, respectively. It's a lot to swallow, but the ideas are powerful, and worth taking
the time to understand. With these notions under our belts, we'll be able to understand the rules of the causal
calculus.
To understand causal models, consider the following graph of possible causal relationships between
smoking, lung cancer, and some unknown hidden factor (say, a hidden genetic factor):
This is a quite general model of causal relationships, in the sense that it includes both the suggestion of the
US Surgeon General (smoking causes cancer) and also the suggestion of the tobacco companies (a hidden
factor causes both smoking and cancer). Indeed, it also allows a third possibility: that perhaps both smoking
and some hidden factor contribute to lung cancer. This combined relationship could potentially be quite
complex: it could be, for example, that smoking alone actually reduces the chance of lung cancer, but the
hidden factor increases the chance of lung cancer so much that someone who smokes would, on average,
see an increased probability of lung cancer. This sounds unlikely, but later we'll see some toy model data
which has exactly this property.
Of course, the model depicted in the graph above is not the most general possible model of causal
relationships in this system; it's easy to imagine much more complex causal models. But at the very least this
is an interesting causal model, since it encompasses both the US Surgeon General and the tobacco
company suggestions. I'll return later to the possibility of more general causal models, but for now we'll
simply keep this model in mind as a concrete example of a causal model.
Mathematically speaking, what do the arrows of causality in the diagram above mean? We'll develop an
answer to that question over the next few paragraphs. It helps to start by moving away from the specific
smoking-cancer model to allow a causal model to be based on a more general graph indicating possible
causal relationships between a number of variables:
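Each vertex in this causal model has an associated random variable, $X_1, X_2, \ldots$. For example, in the causal
model above $X_2$ could be a two-outcome random variable indicating the presence or absence of some gene
that exerts an influence on whether someone smokes or gets lung cancer, $X_3$ indicates "smokes" or "does not
smoke", and $X_4$ indicates "gets lung cancer" or "doesn't get lung cancer". The other variables $X_1$
and $X_5$
would refer to other potential dependencies in this (somewhat more complex) model of the smoking-cancer
connection.
A notational convention that we'll use often is to interchangeably use $X_j$ to refer to a random variable in the
causal model, and also as a way of labelling the corresponding vertex in the graph for the causal model. It
should be clear from context which is meant. We'll also sometimes refer interchangeably to the causal model
or to the associated graph.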
For the notion of causality to make sense we need to constrain the class of graphs that can be used in a
causal model. Obviously, it'd make no sense to have loops in the graph:
We can't have $X$ causing $Y$ causing $Z$ causing $X$! Because of this we
constrain the graph to be a directed acyclic graph, meaning a (directed) graph which has no loops in it.
By the way, I must admit that I'm not a fan of the term directed acyclic graph. It sounds like a very
complicated notion, at least to my ear, when what it means is very simple: a graph with no loops. I'd really
prefer to call it a "loop-free graph", or something like that. Unfortunately, the "directed acyclic graph"
nomenclature is pretty standard, so we'll go with it.
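Checking the "no loops" condition mechanically is straightforward. Here is a minimal Python sketch that repeatedly removes vertices with no incoming edges (Kahn's algorithm, a standard technique that the post itself does not discuss). The function name `is_acyclic`, the edge-list representation, and the vertex labels are all assumptions made for illustration.

```python
def is_acyclic(edges):
    """Return True if the directed graph, given as (parent, child) pairs, has no loops."""
    children = {}
    indegree = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        indegree[child] = indegree.get(child, 0) + 1
        indegree.setdefault(parent, 0)

    # Repeatedly remove vertices that have no remaining incoming edges.
    ready = [v for v, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for c in children.get(v, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

    # Any vertex left over must lie on, or downstream of, a loop.
    return removed == len(indegree)

smoking_model = [("hidden", "smoking"), ("hidden", "cancer"), ("smoking", "cancer")]
loop = [("X", "Y"), ("Y", "Z"), ("Z", "X")]
print(is_acyclic(smoking_model), is_acyclic(loop))
```

The smoking model passes the check, while a three-vertex loop does not, so only the former is allowed as the graph of a causal model.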
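Our picture so far is that a causal model consists of a directed acyclic graph, whose vertices are labelled by
random variables $X_1, X_2, \ldots$. To complete the definition of a causal model we need to capture the allowed
relationships between those random variables.
Intuitively, what causality means is that for any particular $X_j$ the only random variables which directly
influence the value of $X_j$ are the parents of $X_j$, i.e., the collection $X_{\mathrm{pa}(j)}$ of random variables which are
connected directly to $X_j$. For instance, in the graph shown below (which is the same as the complex graph
we saw a little earlier), we have $X_{\mathrm{pa}(4)} = (X_2, X_3)$:
Now, of course, vertices further back in the graph (say, the parents of the parents) could
influence the value of $X_4$. But that influence would be indirect, mediated through the parent vertices.
Note, by the way, that I've overloaded the $X$
notation, using $X_{\mathrm{pa}(4)}$ to denote a collection of random
variables. I'll use this kind of overloading quite a bit in the rest of this post. In particular, I'll often use the
notation $X$ (or $W$, $Y$ or $Z$) to denote a subset of random variables from the graph.
Motivated by the above discussion, one way we could define causal influence would be to require that $X_j$
be a function of its parents:
$$X_j = f_j(X_{\mathrm{pa}(j)}),$$
where $f_j(\cdot)$
is some function. In fact, we'll allow a slightly more general notion of causal influence,
allowing $X_j$
to not just be a deterministic function of the parents, but a random function. We do this by
requiring that $X_j$ be expressible in the form
$$X_j = f_j(X_{\mathrm{pa}(j)}, Y_{j,1}, Y_{j,2}, \ldots),$$
where $f_j$
is a function, and $Y_{j,\cdot}$ is a collection of random variables such that: (a) the $Y_{j,\cdot}$ are independent of one
another for different values of $j$; and (b) for each $j$, $Y_{j,\cdot}$ is independent of all variables $X_k$, except when $X_k$ is $X_j$
itself, or a descendant of $X_j$. The intuition is that the $Y_{j,\cdot}$ are a collection of auxiliary random variables which
inject some extra randomness into $X_j$ (and, through $X_j$, its descendants), but which are otherwise
independent of the variables in the causal model.
Summing up, a causal model consists of a directed acyclic graph whose vertices are labelled by random
variables $X_j$, and each $X_j$ is expressible in the form $X_j = f_j(X_{\mathrm{pa}(j)}, Y_{j,\cdot})$ for some function $f_j$. The $Y_{j,\cdot}$ are
independent of one another, and each $Y_{j,\cdot}$ is independent of all variables $X_k$, except when $X_k$
is $X_j$
or a descendant of $X_j$.
In practice, we will not work directly with the functions $f_j$ or the auxiliary random variables $Y_{j,\cdot}$. Instead, we'll
work with the following equation, which specifies the causal model's joint probability distribution as a product
of conditional probabilities:
$$p(x_1, \ldots, x_n) = \prod_j p(x_j \mid \mathrm{pa}(x_j)).$$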
I won't prove this equation, but the expression should be plausible, and is pretty easy to prove; I've asked you
to prove it as an optional exercise below.
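As an illustration of the product form $p(x_1, \ldots, x_n) = \prod_j p(x_j \mid \mathrm{pa}(x_j))$, here is a minimal Python sketch for the three-vertex smoking model (hidden factor, smoking, cancer). Every number in the conditional probability tables is invented for the example, and the names `p_hidden`, `p_smoke`, `p_cancer` and `joint` are hypothetical.

```python
from itertools import product

# Made-up conditional probability tables; 1 means "on"/"yes", 0 means "off"/"no".
p_hidden = {1: 0.3, 0: 0.7}                      # p(hidden); the hidden factor has no parents
p_smoke = {1: 0.8, 0: 0.2}                       # p(smoke = 1 | hidden)
p_cancer = {(1, 1): 0.5, (1, 0): 0.6,
            (0, 1): 0.05, (0, 0): 0.1}           # p(cancer = 1 | hidden, smoke)

def bernoulli(p_one, value):
    """Probability that a binary variable takes `value`, given p(value = 1)."""
    return p_one if value == 1 else 1 - p_one

def joint(hidden, smoke, cancer):
    """One factor per vertex, each conditioned only on that vertex's parents."""
    return (p_hidden[hidden]
            * bernoulli(p_smoke[hidden], smoke)
            * bernoulli(p_cancer[(hidden, smoke)], cancer))

total = sum(joint(h, s, c) for h, s, c in product((0, 1), repeat=3))
print("sum over all outcomes:", total)
```

Because each factor is a properly normalized conditional distribution, the eight joint probabilities sum to one (up to floating-point rounding), which is a quick sanity check on the factorization.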
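Exercises
Prove the above equation for the joint probability distribution.
Problems
(Simpson's paradox in causal models) Consider the causal model of smoking
introduced above. Suppose that the hidden factor is a gene which is either switched
on or off. If on, it tends to make people both smoke and get lung cancer. Find explicit
values for conditional probabilities in the causal model such
that $p(\mathrm{cancer} \mid \mathrm{smokes}) > p(\mathrm{cancer} \mid \mathrm{doesn't\ smoke})$, and yet if the additional
genetic factor is taken into account this relationship is reversed. That is, we have both
$p(\mathrm{cancer} \mid \mathrm{smokes}, \mathrm{gene\ on}) < p(\mathrm{cancer} \mid \mathrm{doesn't\ smoke}, \mathrm{gene\ on})$
and
$p(\mathrm{cancer} \mid \mathrm{smokes}, \mathrm{gene\ off}) < p(\mathrm{cancer} \mid \mathrm{doesn't\ smoke}, \mathrm{gene\ off})$.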
Problems for the author
An alternate, equivalent approach to defining causal models is as follows: (1) all root
vertices (i.e., vertices with no parents) in the graph are labelled by independent
random variables. (2) augment the graph by introducing new vertices corresponding
to the
. (3)
is, admittedly, a rather special case, but is perfectly consistent with the definition. For example, in a causal
model like
it is possible that the outcome of cancer might be independent of the hidden causal factor or, for that matter,
that it might be independent of whether someone smokes or not. (Indeed, logically, at least, it may be
independent of both, although of course that's not what we'll find in the real world.) The second caveat in how
we think about the arrows and causality is that the arrows only capture the direct causal influences in the
model. It is possible that in a causal model like
and
influence, mediated by other random variables, but it would still be a causal influence. In the next section I'll
give a more formal definition of causal influence that can be used to make these ideas precise.
Causal conditional probabilities
In this section I'll explain what I think is the most imaginative leap underlying the causal calculus. It's the
introduction of the concept of causal conditional probabilities.
The notion of ordinary conditional probabilities is no doubt familiar to you. It's pretty straightforward to do
experiments to estimate conditional probabilities such as $p(\mathrm{cancer} \mid \mathrm{smoking})$, simply by looking at the
population of people who smoke, and figuring out what fraction of those people develop cancer.
Unfortunately, for the purpose of understanding the causal relationship between smoking and
cancer,
$p(\mathrm{cancer} \mid \mathrm{smoking})$ isn't the quantity we want. As the tobacco companies pointed out, there might
well be a hidden genetic factor that makes it very likely that you'll see cancer in anyone who smokes, but that
wouldn't therefore mean that smoking causes cancer.
As we discussed earlier, what you'd really like to do in this circumstance is a randomized controlled
experiment in which it's possible for the experimenter to force someone to smoke (or not smoke), breaking
the causal connection between the hidden factor and smoking. In such an experiment you really could see if
there was a causal influence by looking at what fraction of people who smoked got cancer. In particular, if that
fraction was higher than in the overall population then you'd be justified in concluding that smoking helped
cause cancer. In practice, it's probably not practical to do this kind of randomized controlled experiment. But
Pearl had what turns out to be a very clever idea: to imagine a hypothetical world in which it really is possible
to force someone to (for example) smoke, or not smoke. In particular, he introduced a conditional causal
probability
$p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$.
This should be read as the (causal conditional) probability of cancer given that we "do" smoking, i.e.,
someone has been forced to smoke in a (hypothetical) randomized experiment.
Now, at first sight this appears a rather useless thing to do. But what makes it a clever imaginative leap is that
although it may be impossible or impractical to do a controlled experiment to
determine
$p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$, Pearl was able to establish a set of rules (a causal calculus) that
such causal conditional probabilities should obey. And, by making use of this causal calculus, it turns out to
sometimes be possible to infer the value of probabilities such as
$p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$, even when a
controlled, randomized experiment is impossible. And that's a very remarkable thing to be able to do, and why
I say it was so clever to have introduced the notion of causal conditional probabilities.
We'll discuss the rules of the causal calculus later in this post. For now, though, let's develop the notion of
causal conditional probabilities. Suppose we have a causal model of some phenomenon:
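Now suppose we introduce an external experimenter who is able to intervene to deliberately set the value of
a particular variable $X_j$
to $x_j$. In other words, the experimenter can override the other causal influences on
that variable. This is equivalent to having a new causal model:
In this new causal model, we've represented the experimenter by a new vertex, which has as a child the
vertex $X_j$. All other parents of $X_j$ are cut off, i.e., the edges from those parents to $X_j$ are deleted from
the graph. This represents the fact that the
experimenter's intervention overrides the other causal influences. (Note that the edges to the children of $X_j$
are left undisturbed.) In fact, it's even simpler (and equivalent) to consider a causal model where the parents
have been cut off from $X_j$, and no extra vertex is added:
This model has no vertex explicitly representing the experimenter, but rather the
relation $X_j = f_j(X_{\mathrm{pa}(j)}, Y_{j,\cdot})$ is replaced by the relation $X_j = x_j$. We'll call this graph the perturbed graph,
and the corresponding causal model a perturbed causal model. In the perturbed causal model the only
change is to delete the edges to $X_j$, and to replace the
relation $X_j = f_j(X_{\mathrm{pa}(j)}, Y_{j,\cdot})$
by the relation $X_j = x_j$.
Our aim is to use this perturbed causal model to compute the conditional causal
probability
$p(x_1, \ldots, \hat{x}_j, \ldots, x_n \mid \mathrm{do}(x_j))$. In this expression, $\hat{x}_j$ indicates that the $x_j$ term is omitted before
the $\mid$, since the value of $x_j$ is set on the right. By definition, this causal conditional
probability is just the value of the probability distribution in the perturbed
causal model. To compute the value of the probability in the perturbed causal model, note
that the probability distribution in the original causal model was given by
$$p(x_1, \ldots, x_n) = \prod_k p(x_k \mid \mathrm{pa}(x_k)),$$
where the product on the right is over all vertices in the causal model. This expression remains true for the
perturbed causal model, but a single term on the right-hand side changes: the conditional probability for
the $x_j$ term. In particular, this term gets changed from $p(x_j \mid \mathrm{pa}(x_j))$ to $1$, since we have fixed the value
of $X_j$
to be $x_j$. As a result we have:
$$p(x_1, \ldots, \hat{x}_j, \ldots, x_n \mid \mathrm{do}(x_j)) = \frac{p(x_1, \ldots, x_n)}{p(x_j \mid \mathrm{pa}(x_j))}. \qquad [1]$$
This equation is a fundamental expression, capturing what it means for an experimenter to intervene to set
the value of some particular variable in a causal model. It can easily be generalized to a situation where we
partition the variables into two sets, $X$
and $Y$, where $X$ are the variables we suppose have been set by intervention in a (possibly hypothetical)
randomized controlled experiment, and $Y$ are the remaining
variables:
$$p(Y = y \mid \mathrm{do}(X = x)) = \frac{p(X = x, Y = y)}{\prod_j p(X_j = x_j \mid \mathrm{pa}(X_j))}. \qquad [2]$$
Note that on the right-hand side the product is over the variables $X_j$ in $X$, and the values for $\mathrm{pa}(X_j)$ are
assumed to be given by the appropriate values from $x$
and $y$. The expression [1] can be viewed as a definition of causal conditional probabilities. But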
although this expression is fundamental to understanding the causal calculus, it is not always useful in
practice. The problem is that the values of some of the variables on the right-hand side may not be known,
and cannot be determined by experiment. Consider, for example, the case of smoking and cancer. Recall our
causal model:
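What we would like is to compute $p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$. Unfortunately, we immediately run into a problem if
we try to use the expression on the right of equation [1]: we've got no way of estimating the conditional
probabilities for smoking given the hidden common factor. So we can't obviously
compute
$p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$. And, as you can perhaps imagine, this is the kind of problem that will
come up a lot whenever we're worried about the possible influence of some hidden factor.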
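To see what equation [1] (divide the joint probability by the conditional probability of the intervened variable given its parents) does in a toy setting, here is a minimal Python sketch. It assumes a made-up version of the smoking model in which the hidden factor's probabilities are known, which is precisely what is not available in practice; all numbers and function names are illustrative assumptions.

```python
# Made-up conditional probability tables; 1 = yes/on, 0 = no/off.
p_hidden = {1: 0.3, 0: 0.7}                      # p(hidden)
p_smoke = {1: 0.8, 0: 0.2}                       # p(smoke = 1 | hidden)
p_cancer = {(1, 1): 0.5, (1, 0): 0.6,
            (0, 1): 0.05, (0, 0): 0.1}           # p(cancer = 1 | hidden, smoke)

def joint(h, s, c):
    """Joint probability as a product of one conditional per vertex."""
    ps = p_smoke[h] if s else 1 - p_smoke[h]
    pc = p_cancer[(h, s)] if c else 1 - p_cancer[(h, s)]
    return p_hidden[h] * ps * pc

def p_cancer_given_smoke(s):
    """Ordinary conditional probability p(cancer | smoke = s)."""
    numerator = sum(joint(h, s, 1) for h in (0, 1))
    denominator = sum(joint(h, s, c) for h in (0, 1) for c in (0, 1))
    return numerator / denominator

def p_cancer_do_smoke(s):
    """Causal conditional probability p(cancer | do(smoke = s)).

    Each joint term is divided by p(smoke = s | hidden), then the hidden factor is summed out.
    """
    total = 0.0
    for h in (0, 1):
        p_s_given_parents = p_smoke[h] if s else 1 - p_smoke[h]
        total += joint(h, s, 1) / p_s_given_parents
    return total

for s in (1, 0):
    print("smoke =", s, "observed:", p_cancer_given_smoke(s), "do:", p_cancer_do_smoke(s))
```

With these particular made-up numbers the ordinary conditional probability makes smoking look harmful, while the causal conditional probability says forced smoking would slightly lower the cancer rate. The point is only that the two quantities can disagree; the division step needs p(smoke | hidden), which is exactly the quantity the text says cannot be estimated.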
All is not lost, however. Just because we can't compute the expression on the right of [1] directly doesn't
mean we can't compute causal conditional probabilities in other ways, and we'll see below how the causal
calculus can help solve this kind of problem. It's not a complete solution: we shall see that it doesn't always
make it possible to compute causal conditional probabilities. But it does help. In particular, we'll see that
although it's not possible to compute $p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$ for this causal model, it is possible to
compute $p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$ in a very similar causal model, one that still has a hidden factor.
With causal conditional probabilities defined, were now in position to define more precisely what we mean by
causal influence. Suppose we have a causal model, and
subsets of random variables). Then we say
of
and
of
such that
and
and
. The
following exercise gives an information-theoretic justification for this definition of causal influence: it shows
(The causal capacity) This exercise is for people with some background in
information theory. Suppose we define the causal capacity between
be
, where
and
to
with distribution
theorem tells us that an external experimenter who can intervene to set the value
of
causal capacity. Show that the causal capacity is greater than zero if and only if
has a causal influence over
We've just defined a notion of causal influence between two random variables in a causal model. What about
when we say something like "Event A causes Event B"? What does this mean? Returning to the smoking-cancer
example, it seems that we would say that smoking causes cancer
provided $p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking})) > p(\mathrm{cancer})$, so that if someone makes the choice to smoke,
uninfluenced by other causal factors, then they would increase their chance of cancer. Intuitively, it seems to
me that this notion of events causing one another should be related to the notion of causal influence just
defined above. But I don't yet see quite how to do that. The first problem below suggests a conjecture in this
direction:
Problems for the author
Suppose
and
that
this imply that
and . Does
(Sum-over-paths for causal conditional probabilities?) I believe a kind of sum-over-paths
formulation of causal conditional probabilities is possible, but haven't
worked out details. The idea is as follows (the details may be quite wrong, but I
believe something along these lines should work). Suppose
and
are single
vertices (with corresponding random variables) in a causal model. Then I would like
to show first that if
is not an ancestor of
then
, i.e.,
is an ancestor of
to
then
in
may be
, and computing for
We used causal models in our definition of causal conditional probabilities. But our
informal definition (imagine a hypothetical world in which it's possible to force a
variable to take a particular value) didn't obviously require the use of a causal
Another way of framing the last problem is that I'm concerned about the empirical
basis for causal models. How should we go about constructing such models? Are
they fundamental, representing true facts about the world, or are they modelling
conveniences? (This is by no means a dichotomy.) It would be useful to work through
many more examples, considering carefully the origin of the functions
the auxiliary random variables
and of
d-separation
In this section we'll develop a criterion that Pearl calls directional separation (d-separation, for short). What
d-separation does is let us inspect the graph of a causal model and conclude that a random variable $X$ in the
model can't tell us anything about the value of another random variable $Y$
in the model, or vice versa.
To understand d-separation we'll start with a simple case, and then work through increasingly complex cases,
building up our intuition. I'll conclude by giving a precise definition of d-separation, and by explaining how
d-separation relates to the concept of conditional independence of random variables.
Here's the first simple causal model:
Clearly, knowing $X$ can in general tell us something about $Y$ in this kind of causal model, and so in this
case $X$
and $Y$
are not d-separated. We'll use the term d-connected as a synonym for "not d-separated", and
so in this causal model $X$
and $Y$
are d-connected.
By contrast, in the following causal model $X$
and $Y$ don't give us any information about each other, and so they are d-separated:
A useful piece of terminology is to say that a vertex like the middle vertex in this model is a collider for the
path from $X$
to $Y$, meaning a vertex at which both edges along the path are incoming.
What about a causal model in which $X$ and $Y$ share a common parent? In this case, it is possible that knowing
$X$ will tell us something about $Y$, because of their common ancestry.
It's like the way knowing the genome for one sibling can give us information about the genome of another
sibling, since similarities between the genomes can be inferred from the common ancestry. We'll call a vertex
like the middle vertex in this model a fork for the path from $X$
to $Y$, meaning a vertex at which both edges
are outgoing.
Exercises
Construct an explicit causal model demonstrating the assertion of the last paragraph.
For example, you may construct a causal model in which
fork, and where
is actually a function of
and
are joined by a
to
that
or
be the number of
to
that contains no colliders is called an unblocked path. (Note that by the above exercise, an unblocked path
must contain either one or no forks.) In general, we define $X$
and $Y$
to be d-connected if there is an
unblocked path between them. We define them to be d-separated if there is no such unblocked path.
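For small graphs, this unconditioned definition can be checked by brute force: list every path between the two vertices (ignoring edge direction), and look for one with no collider on it. The sketch below does exactly that; the function name `d_connected`, the edge-list representation, and the vertex labels are assumptions for illustration, and the approach is only sensible for graphs small enough to enumerate paths.

```python
def d_connected(edges, x, y):
    """True if some path between x and y (ignoring edge direction) contains no collider."""
    edges = set(edges)  # directed (parent, child) pairs
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def has_collider(path):
        # An interior vertex is a collider when both path edges point into it.
        return any((path[i - 1], path[i]) in edges and (path[i + 1], path[i]) in edges
                   for i in range(1, len(path) - 1))

    def search(path):
        if path[-1] == y:
            return not has_collider(path)
        return any(search(path + [n])
                   for n in neighbours.get(path[-1], ())
                   if n not in path)

    return search([x])

print(d_connected([("X", "Y")], "X", "Y"))              # direct edge
print(d_connected([("X", "Z"), ("Y", "Z")], "X", "Y"))  # collider at Z
print(d_connected([("Z", "X"), ("Z", "Y")], "X", "Y"))  # fork at Z
```

The direct edge and the fork give d-connected pairs, while the lone collider blocks the only path, matching the three simple models discussed above. Note that this sketch covers only the unconditioned case.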
It's worth noting that the concepts of d-separation and d-connectedness depend only on the graph topology
and on which vertices $X$
and $Y$
have been chosen. In particular, they don't depend on the nature of the
random variables $X$
and $Y$, merely on the identity of the corresponding vertices. As a result, you can
determine d-separation or d-connectedness simply by inspecting the graph. This fact (that d-separation and
d-connectedness are determined by the graph) also holds for the more sophisticated notions of d-separation
and d-connectedness we develop below.
With that said, it probably won't surprise you to learn that the concept of d-separation is closely related to
whether or not the random variables $X$
and $Y$ are independent of one another. This is a connection you can
(optionally) develop through the following exercises. I'll state a much more general connection below.
Exercises
Suppose that
and
and
are independent
. Explain how to
construct a causal model on that graph such that the random variables
corresponding to those two vertices are not independent.
and
The last two exercises almost but don't quite claim that random variables $X$
and $Y$
in
a causal model are independent if and only if they are d-separated. Why does this
statement fail to be true? How can you modify the statement to make it true?
So far, this is pretty simple stuff. It gets more complicated, however, when we extend the notion of
d-separation to cases where we are conditioning on already knowing the value of one or more random
variables in the causal model. Consider, for example, the graph:
(Figure A.)
Now, if we know
, then knowing
is already a function of
to
unconditioned case this path would not have been considered blocked. We'll also say that
separated, given
and
are d-
It is helpful to give a name to vertices like the middle vertex in Figure A, i.e., to vertices with one ingoing and
one outgoing edge. We'll call such vertices a traverse along the path from
lesson of the above discussion is that if
to
and
to
.
, even if we know
to
. This is
. And so we say
Again, if we know
, then knowing
to
path would not have been considered blocked. Again, in this example
The lesson of this model is that if
and
to
(Figure B.)
In the unconditioned case this would have been considered a blocked path. And, naively, it seems as though this should still be the case: at first sight (at least according to my intuition) it doesn't seem very likely that $X$ can give us any additional information about $Y$ (or vice versa), even given that $Z$ is known. Yet we should be cautious, because the argument we made for the graph in Figure A breaks down: we can't say, as we did for Figure A, that $Y$ is a function of $Z$ and some auxiliary independent random variables.
In fact, we're wise to be cautious, because $X$ and $Y$ really can tell us something extra about one another, given a knowledge of $Z$. This is a phenomenon which Pearl calls Berkson's paradox. He gives the example of a graduate school in music which will admit a student (a possibility encoded in the value of $Z$) if either they have high undergraduate grades (encoded in $X$) or some other evidence that they are exceptionally gifted at music (encoded in $Y$). It would not be surprising if these two attributes were anticorrelated amongst students in the program, e.g., students who were admitted on the basis of exceptional gifts would be more likely than otherwise to have low grades. And so in this case knowledge of $Y$ (exceptional gifts) would give us knowledge of $X$ (likely to have low grades), conditioned on knowledge of $Z$ (they were accepted into the program).
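Another way of seeing Berkson's paradox is to construct an explicit causal model for the graph in Figure B. Consider, for example, a causal model in which $X$ and $Y$ are independent random bits, $0$ or $1$, chosen with equal probabilities $1/2$. We suppose that $Z = X \oplus Y$, where $\oplus$ is addition modulo $2$. This causal model does, indeed, have the structure of Figure B. But given that we know the value $Z$, knowing the value of $X$ tells us everything about $Y$, since $Y = Z \oplus X$.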
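The collider example can be checked by brute force. The sketch below is an illustration under stated assumptions, not code from Pearl or from this post: it assumes $X$ and $Y$ are independent fair bits and $Z$ is their sum modulo 2, enumerates the four equally likely outcomes, and compares what $X$ tells us about $Y$ with and without knowledge of $Z$. The helper name `prob` is made up for this example.

```python
from fractions import Fraction
from itertools import product

# Assumed model: X, Y independent fair bits, Z = X XOR Y (a collider at Z).
# Each (x, y) pair has probability 1/4, so counting outcomes gives exact answers.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]


def prob(event, given=lambda x, y, z: True):
    """Exact conditional probability P(event | given) by enumeration."""
    conditioned = [o for o in outcomes if given(*o)]
    hits = sum(1 for o in conditioned if event(*o))
    return Fraction(hits, len(conditioned))


y_is_1 = lambda x, y, z: y == 1

# Without knowledge of Z, learning X does not change our belief about Y.
print(prob(y_is_1))                                  # 1/2
print(prob(y_is_1, lambda x, y, z: x == 1))          # 1/2

# Once Z is known, X pins Y down completely.
print(prob(y_is_1, lambda x, y, z: z == 1))               # 1/2
print(prob(y_is_1, lambda x, y, z: z == 1 and x == 1))    # 0
```

The first two lines agree, which is the unconditional independence of $X$ and $Y$; the last two differ, which is the dependence that appears once the collider is conditioned on.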
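As a result of this discussion, in the causal graph of Figure B we'll say that $Z$ unblocks the path from $X$ to $Y$, even though in the unconditioned case the path would have been considered blocked. And we'll also say that in this causal graph $X$ and $Y$ are d-connected, conditional on $Z$.
The immediate lesson from the graph of Figure B is that $X$ and $Y$ can tell us something about one another, given $Z$, if there is a path between $X$ and $Y$ where the only collider is at $Z$. In fact, the same phenomenon can occur when $Z$ is not itself the collider but a descendant of it, as in this graph: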
(Figure C.)
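To see this, suppose we choose $X$ and $Y$ as in the example just described above, i.e., independent random bits, $0$ or $1$, chosen with equal probabilities $1/2$. We let the middle (collider) vertex be $W = X \oplus Y$. And, finally, we choose $Z = W$. Then we see as before that $X$ can tell us something about $Y$, given that we know $Z$, because $X = Y \oplus Z$.
The general intuition about graphs like that in Figure C is that knowing $Z$ allows us to infer something about the ancestors of $Z$, and so we must act as though those ancestors are known, too. As a result, in this case we say that $Z$ unblocks the path from $X$ to $Y$, since $Z$ has an ancestor which is a collider on the path from $X$ to $Y$. And so in this case $X$ is d-connected to $Y$, given $Z$.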
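Given the discussion of Figure C that we've just had, you might wonder why forks or traverses which are ancestors of $Z$ can't block a path, for similar reasons? For instance, why don't we consider $X$ and $Y$ to be d-separated, given $Z$, in a graph where $Z$ is a descendant of a fork or traverse on the path from $X$ to $Y$?
The reason is that it's easy to construct examples where $X$ tells us something about $Y$ in addition to what we already know from $Z$. And so we can't consider $X$ and $Y$ to be d-separated, given $Z$, in this example.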
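These examples motivate the following definition:
Definition: Let $G$ be a graph, and let $X$, $Y$ and $Z$ be disjoint subsets of vertices in $G$. Consider a path from a vertex in $X$ to a vertex in $Y$. We say the path is blocked by $Z$ if the path contains either: (a) a collider which is neither in $Z$ nor an ancestor of a vertex in $Z$; or (b) a fork which is in $Z$; or (c) a traverse which is in $Z$. We say the path is unblocked if it is not blocked. We say that $X$ and $Y$ are d-connected, given $Z$, if there is an unblocked path between some vertex in $X$ and some vertex in $Y$. $X$ and $Y$ are d-separated, given $Z$, if they are not d-connected.
Saying "$X$ and $Y$ are d-separated, given $Z$" is a bit of a mouthful, and so it's helpful to have an abbreviated notation. We'll use the abbreviation $(X \perp Y \mid Z)_G$. Note that this notation includes the graph $G$; we'll sometimes omit the graph when the context is clear. We'll write $(X \perp Y)_G$ to denote unconditional d-separation.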
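For small graphs the definition of d-separation can be turned into code almost word for word. What follows is a minimal sketch, not an efficient algorithm: it assumes the graph is a DAG given as a list of `(parent, child)` pairs, enumerates every simple path while ignoring edge direction, and applies the collider / fork / traverse test to each interior vertex. The function names and the example vertex labels (`X`, `Y`, `Z`, `W`) are choices made for this illustration.

```python
def descendants(edges, v):
    """All vertices reachable from v along directed edges (v itself excluded)."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        for c in children.get(u, ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def simple_paths(edges, start, goal):
    """Yield every simple path from start to goal, ignoring edge direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)

    def extend(path):
        if path[-1] == goal:
            yield path
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from extend(path + [n])

    yield from extend([start])


def path_blocked(edges, path, given):
    """Apply the blocking rules to each interior vertex of the path."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is known.
            if mid not in given and not (descendants(edges, mid) & given):
                return True
        elif mid in given:
            # Forks and traverses block exactly when they are conditioned on.
            return True
    return False


def d_separated(edges, xs, ys, given=()):
    """True if every path between xs and ys is blocked by the set `given`."""
    given = set(given)
    return all(
        path_blocked(edges, path, given)
        for x in xs for y in ys
        for path in simple_paths(edges, x, y)
    )


chain = [("X", "Z"), ("Z", "Y")]                  # a traverse at Z
collider = [("X", "Z"), ("Y", "Z")]               # a collider at Z
collider_child = [("X", "W"), ("Y", "W"), ("W", "Z")]  # Z is a child of the collider W

print(d_separated(chain, {"X"}, {"Y"}), d_separated(chain, {"X"}, {"Y"}, {"Z"}))
print(d_separated(collider, {"X"}, {"Y"}), d_separated(collider, {"X"}, {"Y"}, {"Z"}))
print(d_separated(collider_child, {"X"}, {"Y"}, {"Z"}))
```

Path enumeration is exponential in the worst case, so this is only meant for hand-sized graphs like the ones discussed here.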
As an aside, Pearl uses a similar but slightly different notation for d-separation, namely $(X \perp\!\!\!\perp Y \mid Z)_G$. Unfortunately, while the symbol $\perp\!\!\!\perp$ looks like a LaTeX symbol, it's not, but is most easily produced using a rather dodgy LaTeX hack. Instead of using that hack over and over again, I've adopted a more standard LaTeX notation.
While I'm making asides, let me make a second: when I was first learning this material, I found the "d" for "directional" in d-separation and d-connected rather confusing. It suggested to me that the key thing was having a directed path from one vertex to the other, and that the complexities of colliders, forks, and so on were a sideshow. Of course, they're not, they're central to the whole discussion. For this reason, when I was writing these notes I considered changing the terminology to i-separated and i-connected, for informationally-separated and informationally-connected. Ultimately I decided not to do this, but I thought mentioning the issue might be helpful, in part to reassure readers (like me) who thought the "d" seemed a little mysterious.
Okay, that's enough asides, let's get back to the main track of discussion.
We saw earlier that (unconditional) d-separation is closely connected to the independence of random variables. It probably won't surprise you to learn that conditional d-separation is closely connected to conditional independence of random variables. Recall that two sets of random variables $X$ and $Y$ are conditionally independent, given a third set of random variables $Z$, if $p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$. The following theorem shows that d-separation gives a criterion for when conditional independence occurs in a causal model:
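Theorem (graphical criterion for conditional independence): Let $G$ be a graph, and let $X$, $Y$ and $Z$ be disjoint subsets of vertices in that graph. Then $X$ and $Y$ are d-separated, given $Z$, if and only if for all causal models on $G$ the random variables corresponding to $X$ and $Y$ are conditionally independent, given $Z$.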
(Update: Thanks to Rob Spekkens for pointing out an error in my original statement of this theorem.)
I won't prove the theorem here. However, it's not especially difficult if you've followed the discussion above, and is a good problem to work through:
Problems
Prove the above theorem.
Problems for the author
The concept of d-separation plays a central role in the causal calculus. My sense is that it should be possible to find a cleaner and more intuitive definition that substantially simplifies many proofs. It'd be good to spend some time trying to find such a definition.
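Suppose we have a causal model on a graph $G$, and that $X$, $Y$, $Z$ and $W$ are disjoint subsets of the variables in the causal model. We'll use $G_{\overline{X}}$ to denote the perturbed graph in which all edges pointing to $X$ from the parents of $X$ have been deleted. This is the graph which results when an experimenter intervenes to set the value of $X$, overriding other causal influences on $X$.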
Rule 1: When can we ignore observations: I'll begin by stating the first rule in all its glory, but don't worry if you don't immediately grok the whole rule. Instead, just take a look, and try to start getting your head around it. What we'll do then is look at some simple special cases, which are easily understood, and gradually build up to an understanding of what the full rule is saying.
Okay, so here's the first rule of the causal calculus. What it tells us is that when $(Y \perp Z \mid W, X)_{G_{\overline{X}}}$, then we can ignore the observation of $Z$ in computing the probability of $Y$, conditional on both $W$ and an intervention to set $X$:
$$p(y \mid w, \mathrm{do}(x), z) = p(y \mid w, \mathrm{do}(x)).$$
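To understand why this rule is true, and what it means, let's start with a much simpler case. Let's look at what happens to the rule when there are no $X$ or $W$ variables in the mix. In this case, our presumption simply becomes that $Y$ is d-separated from $Z$ in the original (unperturbed) graph $G$. There's no need to worry about $G_{\overline{X}}$ because there's no $X$ variable whose value is being set by intervention. In this circumstance we have $(Y \perp Z)_G$, so $Y$ is independent of $Z$. But the statement of the rule in this case is merely that $p(y \mid z) = p(y)$, which is, indeed, equivalent to the standard definition of $Y$ and $Z$ being independent.
In other words, the first rule is simply a generalization of what it means for $Y$ and $Z$ to be independent. The full rule generalizes the notion of independence in two ways: (1) by adding in an extra variable $W$ whose value has been determined by passive observation; and (2) by adding in an extra variable $X$ whose value has been set by intervention. We'll consider these two ways of generalizing separately in the next two paragraphs.
We begin with generalization (1), i.e., there is no $X$ variable in the mix. In this case, our presumption becomes that $Y$ is d-separated from $Z$, given $W$, in the graph $G$. By the graphical criterion for conditional independence this means that $Y$ is conditionally independent of $Z$, given $W$, and so $p(y \mid w, z) = p(y \mid w)$, which is exactly the statement of the rule. And so the first rule can be viewed as a generalization of what it means for $Y$ and $Z$ to be independent, conditional on $W$.
Now let's look at the other generalization, (2), in which we've added an extra variable $X$ whose value has been set by intervention, and where there is no $W$ variable in the mix. In this case, our presumption becomes that $Y$ is d-separated from $Z$, given $X$, in the perturbed graph $G_{\overline{X}}$. The graphical criterion for conditional independence then tells us that $Y$ is independent of $Z$, conditional on the value of $X$ being set by experimental intervention, and so $p(y \mid \mathrm{do}(x), z) = p(y \mid \mathrm{do}(x))$. Again, this is exactly the statement of the rule.
The full rule simply combines both these generalizations.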
The rules of the causal calculus: All three rules of the causal calculus follow a similar template to the first rule: they provide ways of using facts about the causal structure (notably, d-separation) to make inferences about conditional causal probabilities. I'll now state all three rules. The intuition behind rules 2 and 3 won't necessarily be entirely obvious, but after our discussion of rule 1 the remaining rules should at least appear plausible and comprehensible. I'll have a bit more to say about intuition below.
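As above, we have a causal model on a graph $G$, and $X$, $Y$, $Z$, $W$ are disjoint subsets of the variables in the causal model. $G_{\overline{X}}$ denotes the perturbed graph in which all edges pointing to $X$ from the parents of $X$ have been deleted. $G_{\underline{X}}$ denotes the graph in which all edges pointing out from $X$ to the children of $X$ have been deleted. Notations like $G_{\overline{X}, \underline{Z}}$ denote combinations of these operations.
Rule 1: When can we ignore observations: Suppose $(Y \perp Z \mid W, X)_{G_{\overline{X}}}$. Then:
$$p(y \mid w, \mathrm{do}(x), z) = p(y \mid w, \mathrm{do}(x)).$$
Rule 2: When can we ignore the act of intervention: Suppose $(Y \perp Z \mid W, X)_{G_{\overline{X}, \underline{Z}}}$. Then:
$$p(y \mid w, \mathrm{do}(x), \mathrm{do}(z)) = p(y \mid w, \mathrm{do}(x), z).$$
Rule 3: When can we ignore an intervention variable entirely: Let $Z(W)$ denote the set of vertices in $Z$ which are not ancestors of $W$ in $G_{\overline{X}}$. Suppose $(Y \perp Z \mid W, X)_{G_{\overline{X}, \overline{Z(W)}}}$. Then:
$$p(y \mid w, \mathrm{do}(x), \mathrm{do}(z)) = p(y \mid w, \mathrm{do}(x)).$$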
In a sense, all three rules are statements of conditional independence. The first rule tells us when we can ignore an observation. The second rule tells us when we can ignore the act of intervention (although that doesn't necessarily mean we can ignore the value of the variable being intervened with). And the third rule tells us when we can ignore an intervention entirely, both the act of intervention, and the value of the variable being intervened with.
I won't prove rule 2 or rule 3; this post is already quite long enough. (If I ever significantly revise the post I may include the proofs.) The important thing to take away from these rules is that they give us conditions on the structure of causal models so that we know when we can ignore observations, acts of intervention, or even entire variables that have been intervened with. This is obviously a powerful set of tools to be working with in manipulating conditional causal probabilities!
Indeed, according to Pearl there's even a sense in which this set of rules is complete, meaning that using these rules you can identify all causal effects in a causal model. I haven't yet understood the proof of this result, or even exactly what it means, but thought I'd mention it. The proof is in papers by Shpitser and Pearl, and by Huang and Valtorta. If you'd like to see the proofs of the rules of the calculus, you can either have a go at proving them yourself, or you can read the proof.
Problems for the author
Suppose the conditions of rules 1 and 2 hold. Can we deduce that the conditions of
rule 3 also hold?
The great benefit of this model was that it included as special cases both the hypothesis that smoking causes
cancer and the hypothesis that some hidden causal factor was responsible for both smoking and cancer.
It turns out, unfortunately, that the causal calculus doesn't help us analyse this model. I'll explain why that's the case below. However, rather than worrying about this, at this stage it's more instructive to work through an example showing how the causal calculus can be helpful in analysing a similar but slightly modified causal model. So although this modification looks a little mysterious at first, for now I hope you'll be willing to accept it as given.
The way I'm going to modify the causal model is by introducing an extra variable, namely, whether someone has appreciable amounts of tar in their lungs or not:
(By "tar", I don't mean "tar" literally, but rather all the material deposits found as a result of smoking.)
This causal model is a plausible modification of the original causal model. It is at least plausible to suppose
that smoking causes tar in the lungs and that those deposits in turn cause cancer. But if the hidden causal
factor is genetic, as the tobacco companies argued was the case, then it seems highly unlikely that the
genetic factor caused tar in the lungs, except by the indirect route of causing those people to smoke. (I'll
come back to what happens if you refuse to accept this line of reasoning. For now, just go with it.)
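Our goal in this modified causal model is to compute probabilities like $p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking})) = p(y \mid \mathrm{do}(x))$, where $X$ denotes smoking, $Y$ cancer and $Z$ tar. What we'll show is that the causal calculus lets us compute this probability entirely in terms of probabilities like $p(z \mid x)$ and other probabilities that don't involve an intervention.
This means that we can determine $p(y \mid \mathrm{do}(x))$ without needing to know anything about the hidden factor. We won't even need to know the nature of the hidden factor. It also means that we can determine $p(y \mid \mathrm{do}(x))$ without needing to intervene to force someone to smoke or not smoke, i.e., to set the value for $X$.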
In other words, the causal calculus lets us do something that seems almost miraculous: we can figure out the
probability that someone would get cancer given that they are in the smoking group in a randomized
controlled experiment, without needing to do the randomized controlled experiment. And this is true even
though there may be a hidden causal factor underlying both smoking and cancer.
The obvious first question to ask is whether we can apply rule 2 or rule 3 directly to the conditional causal probability $p(y \mid \mathrm{do}(x))$.
If rule 2 applies, for example, it would say that intervention doesn't matter, and so $p(y \mid \mathrm{do}(x)) = p(y \mid x)$. Intuitively, this seems unlikely. We'd expect that intervention really can change the probability of cancer given smoking, because intervention would override the hidden causal factor.
If rule 3 applies, it would say that $p(y \mid \mathrm{do}(x)) = p(y)$, i.e., that an intervention to force someone to smoke has no impact on whether they get cancer. This seems even more unlikely than rule 2 applying.
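However, as practice and a warm up, let's work through the details of seeing whether rule 2 or rule 3 can be applied directly to $p(y \mid \mathrm{do}(x))$.
For rule 2 to apply we need $(Y \perp X)_{G_{\underline{X}}}$. To check whether this is true, recall that $G_{\underline{X}}$ is the graph with the edges pointing out from $X$ deleted. Obviously, $Y$ is not d-separated from $X$ in this graph, since $X$ and $Y$ have a common ancestor. This reflects the fact that the hidden causal factor indeed does influence both $X$ and $Y$. So we can't apply rule 2.
What about rule 3? For this to apply we'd need $(Y \perp X)_{G_{\overline{X}}}$. Recall that $G_{\overline{X}}$ is the graph with the edges pointing toward $X$ deleted. Again, $Y$ is not d-separated from $X$, in this case because there is an unblocked path from $X$ to $Z$ to $Y$. This reflects our intuition that the value of $X$ can influence $Y$, even when the value of $X$ has been set by intervention. So we can't apply rule 3.
Okay, so we can't apply the rules of the causal calculus directly to determine $p(y \mid \mathrm{do}(x))$. Is there some indirect way we can determine this probability? An experienced probabilist would at this point instinctively wonder whether it would help to condition on the value of $z$, writing:
$$p(y \mid \mathrm{do}(x)) = \sum_z p(y \mid z, \mathrm{do}(x)) \, p(z \mid \mathrm{do}(x)). \qquad [2]$$
Of course, saying an experienced probabilist would instinctively do this isn't quite the same as explaining why one should do this! However, it is at least a moderately obvious thing to do: the only extra information we potentially have in the problem is $z$, and so it's certainly somewhat natural to try to introduce that variable into the problem. As we shall see, this turns out to be a wise thing to do.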
Exercises
I used equation [2] above without proof. This should be intuitively plausible, but really requires proof. Prove that the equation is correct.
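To simplify the right-hand side of equation [2], we first note that we can apply rule 2 to the second term on the right-hand side, obtaining $p(z \mid \mathrm{do}(x)) = p(z \mid x)$. To check this explicitly, note that the condition for rule 2 to apply is that $(Z \perp X)_{G_{\underline{X}}}$. In the graph $G_{\underline{X}}$ the only path from $Z$ to $X$ is blocked at $Y$, and so $Z$ is d-separated from $X$ in $G_{\underline{X}}$. As a result, we have:
$$p(y \mid \mathrm{do}(x)) = \sum_z p(y \mid z, \mathrm{do}(x)) \, p(z \mid x). \qquad [3]$$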
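The three d-separation checks made so far can be reproduced mechanically. The sketch below is an illustration under assumptions: it takes the smoking model to have edges $U \to X$, $U \to Y$, $X \to Z$, $Z \to Y$, where `U` is a label chosen here for the hidden factor, and it tests d-separation with the moralized ancestral graph criterion, a standard equivalent test swapped in for the path-based definition because it is shorter to code. The helper names are made up for this example.

```python
def ancestral_set(edges, nodes):
    """The given nodes together with all of their ancestors."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(edges, xs, ys, given=()):
    """Moralization test: restrict to ancestors, marry parents, delete `given`."""
    xs, ys, given = set(xs), set(ys), set(given)
    keep = ancestral_set(edges, xs | ys | given)
    nbrs = {v: set() for v in keep}
    parents = {}
    for a, b in edges:
        if a in keep and b in keep:
            nbrs[a].add(b)
            nbrs[b].add(a)
            parents.setdefault(b, set()).add(a)
    for ps in parents.values():          # connect parents of a common child
        for p in ps:
            nbrs[p] |= ps - {p}
    seen, stack = set(xs), list(xs)      # undirected search that avoids `given`
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for n in nbrs[v]:
            if n not in seen and n not in given:
                seen.add(n)
                stack.append(n)
    return True


def cut_incoming(edges, nodes):
    """Delete all edges pointing into `nodes` (the overline operation)."""
    return [(a, b) for a, b in edges if b not in nodes]


def cut_outgoing(edges, nodes):
    """Delete all edges pointing out of `nodes` (the underline operation)."""
    return [(a, b) for a, b in edges if a not in nodes]


# Assumed structure of the modified smoking model; "U" is the hidden factor.
G = [("U", "X"), ("U", "Y"), ("X", "Z"), ("Z", "Y")]

rule2_on_y = d_separated(cut_outgoing(G, {"X"}), {"Y"}, {"X"})
rule3_on_y = d_separated(cut_incoming(G, {"X"}), {"Y"}, {"X"})
rule2_on_z = d_separated(cut_outgoing(G, {"X"}), {"Z"}, {"X"})
print(rule2_on_y, rule3_on_y, rule2_on_z)
```

Under these assumptions the first two checks fail and the third succeeds, matching the argument in the text: neither rule applies directly to $p(y \mid \mathrm{do}(x))$, but rule 2 does apply to $p(z \mid \mathrm{do}(x))$.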
At this point in the presentation, I'm going to speed the discussion up, telling you what rule of the calculus to apply at each step, but not going through the process of explicitly checking that the conditions of the rule hold. (If you're doing a close read, you may wish to check the conditions, however.)
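The next thing we do is to apply rule 2 to the first term on the right-hand side of equation [3], obtaining $p(y \mid z, \mathrm{do}(x)) = p(y \mid \mathrm{do}(z), \mathrm{do}(x))$. We then apply rule 3 to remove the $\mathrm{do}(x)$, obtaining $p(y \mid \mathrm{do}(z))$. Substituting back in gives us:
$$p(y \mid \mathrm{do}(x)) = \sum_z p(y \mid \mathrm{do}(z)) \, p(z \mid x). \qquad [4]$$
So we've reduced the computation of $p(y \mid \mathrm{do}(x))$ to the computation of $p(y \mid \mathrm{do}(z))$. This doesn't seem terribly encouraging: we've merely substituted the computation of one causal conditional probability for another. Still, let us continue plugging away, and see if we can make progress. The obvious first thing to try is to apply rule 2 or rule 3 to simplify $p(y \mid \mathrm{do}(z))$, but it turns out that, perhaps surprisingly, neither rule applies. So what do we do? Well, in a repeat of our strategy above, we again condition on the other variable we have available to us, in this case $x$:
$$p(y \mid \mathrm{do}(z)) = \sum_x p(y \mid x, \mathrm{do}(z)) \, p(x \mid \mathrm{do}(z)).$$
Rule 2 lets us simplify the first term to $p(y \mid x, z)$, while rule 3 lets us simplify the second term to $p(x)$, and so we have
$$p(y \mid \mathrm{do}(z)) = \sum_x p(y \mid x, z) \, p(x).$$
To substitute this expression back into equation [4] it helps to change the summation index from $x$ to $x'$, since otherwise the duplicate use of $x$ will cause confusion. We obtain:
$$p(y \mid \mathrm{do}(x)) = \sum_{x', z} p(y \mid x', z) \, p(z \mid x) \, p(x'). \qquad [5]$$
This is the promised expression for $p(y \mid \mathrm{do}(x))$ (i.e., for probabilities like $p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$, assuming the causal model above) in terms of quantities which may be observed directly from experimental data, and which don't require intervention to do a randomized, controlled experiment.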
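One way to build confidence in the final formula, $p(y \mid \mathrm{do}(x)) = \sum_{x', z} p(y \mid x', z)\, p(z \mid x)\, p(x')$, is to test it on a fully specified model where the true interventional probability can be computed directly. The sketch below assumes binary variables and a hidden factor `U` with edges $U \to X$, $U \to Y$, $X \to Z$, $Z \to Y$; every numerical value is invented purely for illustration. The formula only sees the observable joint distribution over $(x, z, y)$, while `truth` is allowed to use the hidden factor.

```python
from itertools import product

# Hypothetical model parameters (all numbers are made up for illustration).
p_u = {0: 0.3, 1: 0.7}
p_x_given_u = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}}    # p_x_given_u[u][x]
p_z_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # p_z_given_x[x][z]
p_y1_given_zu = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}


def p_y_given_zu(y, z, u):
    p1 = p_y1_given_zu[(z, u)]
    return p1 if y == 1 else 1 - p1


# Observable joint p(x, z, y): the hidden factor U is summed out.
joint = {}
for u, x, z, y in product((0, 1), repeat=4):
    w = p_u[u] * p_x_given_u[u][x] * p_z_given_x[x][z] * p_y_given_zu(y, z, u)
    joint[(x, z, y)] = joint.get((x, z, y), 0.0) + w


def prob(pred):
    """Probability of an event over the observable variables only."""
    return sum(w for (x, z, y), w in joint.items() if pred(x, z, y))


def front_door(y, x):
    """p(y | do(x)) from observational quantities alone."""
    total = 0.0
    for z in (0, 1):
        p_z_x = prob(lambda a, b, c: a == x and b == z) / prob(lambda a, b, c: a == x)
        inner = 0.0
        for x2 in (0, 1):
            p_y_x2z = (prob(lambda a, b, c: a == x2 and b == z and c == y)
                       / prob(lambda a, b, c: a == x2 and b == z))
            inner += p_y_x2z * prob(lambda a, b, c: a == x2)
        total += p_z_x * inner
    return total


def truth(y, x):
    """p(y | do(x)) computed from the full model, hidden factor included."""
    return sum(p_u[u] * p_z_given_x[x][z] * p_y_given_zu(y, z, u)
               for u in (0, 1) for z in (0, 1))


naive = prob(lambda a, b, c: a == 1 and c == 1) / prob(lambda a, b, c: a == 1)
print(front_door(1, 1), truth(1, 1), naive)
```

The first two printed numbers should agree up to floating-point rounding, while the naive conditional probability $p(y \mid x)$ differs from both because of the hidden common cause.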
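Once $p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$ is determined, we can compare it against $p(\mathrm{cancer})$. If $p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$ is larger than $p(\mathrm{cancer})$ then we can conclude that smoking does, in fact, play a causal role in cancer.
Problems for the author
Why is it so much easier to compute $p(y \mid \mathrm{do}(z))$ than $p(y \mid \mathrm{do}(x))$ in the model above? Is there some way we could have seen that this would be the case, without needing to go through a detailed computation?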
, with
and
using
Unfortunately, I don't know what the experimentally observed probabilities are in the smoking-tar-cancer case. If anyone does, I'd be interested to know. In lieu of actual data, I'll use some toy model data suggested by Pearl; the data is quite unrealistic, but nonetheless interesting as an illustration of the use of equation [5]. The toy model data is as follows:
(1) 47.5 percent of the population are nonsmokers with no tar in their lungs, and 10 percent of these get
cancer.
(2) 2.5 percent are smokers with no tar, and 90 percent get cancer.
(3) 2.5 percent are nonsmokers with tar, and 5 percent get cancer.
(4) 47.5 percent are smokers with tar, and 85 percent get cancer.
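In this case, we get:
$$p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking})) = 45.25 \text{ percent}.$$
By contrast, $p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{not\ smoking})) = 49.75$ percent, and so if this data was correct (obviously it's not even close) it would show that smoking actually somewhat reduces a person's chance of getting lung cancer. This is despite the fact that $p(\mathrm{cancer} \mid \mathrm{smoking}) = 85.25$ percent, and so a naive approach to causality based on correlations alone would suggest that smoking causes cancer. In fact, in this imagined world smoking might actually be usable as a preventative treatment for cancer! Obviously this isn't truly the case, but it does illustrate the power of this method of analysis.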
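The arithmetic for the toy data is easy to reproduce. The sketch below plugs the four population groups listed above into the formula $p(y \mid \mathrm{do}(x)) = \sum_{x', z} p(y \mid x', z)\, p(z \mid x)\, p(x')$, with $x$ standing for smoking, $z$ for tar and $y$ for cancer; the dictionary layout and function names are choices made for this illustration.

```python
# (smoker, tar) -> (share of population, cancer rate within the group)
groups = {
    (0, 0): (0.475, 0.10),   # nonsmokers, no tar
    (1, 0): (0.025, 0.90),   # smokers, no tar
    (0, 1): (0.025, 0.05),   # nonsmokers, tar
    (1, 1): (0.475, 0.85),   # smokers, tar
}


def p_x(x):
    """Fraction of the population with smoking status x."""
    return sum(share for (s, _), (share, _) in groups.items() if s == x)


def p_z_given_x(z, x):
    """Probability of tar status z among people with smoking status x."""
    return groups[(x, z)][0] / p_x(x)


def p_cancer_do(x):
    """p(cancer | do(x)) via the observational formula derived above."""
    return sum(
        p_z_given_x(z, x) * sum(groups[(x2, z)][1] * p_x(x2) for x2 in (0, 1))
        for z in (0, 1)
    )


def p_cancer_given(x):
    """Ordinary conditional probability p(cancer | x)."""
    return sum(groups[(x, z)][0] * groups[(x, z)][1] for z in (0, 1)) / p_x(x)


print(round(100 * p_cancer_do(1), 2))     # 45.25
print(round(100 * p_cancer_do(0), 2))     # 49.75
print(round(100 * p_cancer_given(1), 2))  # 85.25
```

With this data, tar is far more common among smokers, yet within each tar category smokers and nonsmokers contribute equally to the inner average, which is why the interventional figure ends up so far from the raw conditional one.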
Summing up the general lesson of the smoking-cancer example, suppose we have two competing
hypotheses for the causal origin of some effect in a system, "A causes C" or "B causes C", say. Then we should
try to construct a realistic causal model which includes both hypotheses, and then use the causal calculus to
attempt to distinguish the relative influence of the two causal factors, on the basis of experimentally
accessible data.
Incidentally, the kind of analysis of smoking we did above obviously wasn't done back in the 1960s. I don't
actually know how causality was established over the protestations that correlation doesn't imply causation.
But it's not difficult to think of ways you might have come up with truly convincing evidence that smoking was
a causal factor. One way would have been to look at the incidence of lung cancer in populations where
smoking had only recently been introduced. Suppose, for example, that cigarettes had just been introduced
into the (fictional) country of Nicotinia, and that this had been quickly followed by a rapid increase in rates of
lung cancer. If this pattern was seen across many new markets then it would be very difficult to argue that
lung cancer was being caused solely by some pre-existing factor in the population.
Exercises
Construct toy model data where smoking increases a person's chance of getting lung
cancer.
Let's leave this model of smoking and lung cancer, and come back to our original model of smoking and lung
cancer:
What would have happened if we'd tried to use the causal calculus to analyse this model? I won't go through
all the details, but you can easily check that whatever rule you try to apply you quickly run into a dead end.
And so the causal calculus doesn't seem to be any help in analysing this problem.
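This example illustrates some of the limitations of the causal calculus. In order to compute $p(\mathrm{cancer} \mid \mathrm{do}(\mathrm{smoking}))$ we needed to assume a causal model with a particular structure, namely the modified model that includes tar.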
While this model is plausible, it is not beyond reproach. You could, for example, criticise it by saying that it is
not the presence of tar deposits in the lungs that causes cancer, but maybe some other factor, perhaps
something that is currently unknown. This might lead us to consider a causal model with a revised structure:
So we could try instead to use the causal calculus to analyse this new model. I haven't gone through this
exercise, but I strongly suspect that doing so we wouldn't be able to use the rules of the causal calculus to
compute the relevant probabilities. The intuition behind this suspicion is that we can imagine a world in which
the tar may be a spurious side-effect of smoking that is in fact entirely unrelated to lung cancer. What causes
lung cancer is really an entirely different mechanism, but we couldn't distinguish the two from the statistics
alone.
The point of this isn't to say that the causal calculus is useless. It's remarkable that we can plausibly get
information about the outcome of a randomized controlled experiment without actually doing anything like that
experiment. But there are limitations. To get that information we needed to make some presumptions about
the causal structure in the system. Those presumptions are plausible, but not logically inevitable. If someone
questions the presumptions then it may be necessary to revise the model, perhaps adopting a more
sophisticated causal model. One can then use the causal calculus to attempt to analyse that more
sophisticated model, but we are not guaranteed success. It would be interesting to understand systematically
when this will be possible and when it will not be. The following problems start to get at some of the issues
involved.
Problems for the author
Is it possible to make a more precise statement than "the causal calculus doesn't
seem to be any help" for the original smoking-cancer model?
Extending the last problem, it'd be good to have an algorithm to answer questions
like: in the space of all possible causal models consistent with a given set of
observed probabilities, what can we say about the possible causal probabilities? It
would also be useful to be able to input to the algorithm some constraints on the
causal models, representing knowledge we're already sure of.
In real-world experiments there are many practical issues that must be addressed to
design a reliable randomized, controlled experiment. These issues
include selection bias, blinding, and many others. There is an entire field
of experimental design devoted to addressing such issues. By comparison, my
description of causal inference ignores many of these practical issues. Can we
integrate the best thinking on experimental design with ideas such as causal
conditional probabilities and the causal calculus?
From a pedagogical point of view, I wonder if it might have been better to work fully
through the smoking-cancer example before getting to the abstract statement of the
rules of the causal calculus. Those rules can all be explained and motivated quite
nicely in the context of the smoking-cancer example, and that may help in
understanding.
Conclusion
I've described just a tiny fraction of the work on causality that is now going on. My impression as an
admittedly non-expert outsider to the field is that this is an exceptionally fertile field which is developing
rapidly and giving rise to many fascinating applications. Over the next few decades I expect the theory of
causality will mature, and be integrated into the foundations of disciplines ranging from economics to
medicine to social policy.
Causal discovery: One question I'd like to understand better is how to discover causal structures inside
existing data sets. After all, human beings do a pretty good (though far from perfect) job at figuring out causal
models from their observation of the world. I'd like to better understand how to use computers to
automatically discover such causal models. I understand that there is already quite a literature on the
automated discovery of causal models, but I haven't yet looked in much depth at that literature. I may come
back to it in a future post.
I'm particularly fascinated by the idea of extracting causal models from very large unstructured data sets.
The KnowItAll group at the University of Washington (see Oren Etzioni on Google Plus) have done
fascinating work on a related but (probably) easier problem, the problem of open information extraction. This
means taking an unstructured information source (like the web), and using it to extract facts about the real
world. For instance, using the web one would like computers to be able to learn facts like "Barack Obama is
President of the United States", without needing a human to feed it that information. One of the things that
makes this task challenging is all the misleading and difficult-to-understand information out on the web. For
instance, there are also webpages saying "George Bush is President of the United States", which was
probably true at the time the pages were written, but which is now misleading. We can find webpages which
state things like "[Let's imagine] Steve Jobs is President of the United States"; it's a difficult task for an
unsupervised algorithm to figure out how to interpret that "Let's imagine". What the KnowItAll team have done
is make progress on figuring out how to learn facts in such a rich but uncontrolled environment.
What I'm wondering is whether such techniques can be adapted to extract causal models from data? It'd be
fascinating if so, because of course humans don't just reason with facts, they also reason with (informal)
causal models that relate those facts. Perhaps causal models or a similar concept may be a good way of
representing some crucial part of our knowledge of the world.
Problems for the author
What systematic causal fallacies do human beings suffer from? We certainly often
make mistakes in the causal models we extract from our observations of the world
(one example is that we often do assume that correlation implies causation, even
when that's not true), and it'd be nice to understand what systematic biases we have.
Humans aren't just good with facts and causal models. We're also really good at
juggling multiple causal models, testing them against one another, finding problems
and inconsistencies, and making adjustments and integrating the results of those
models, even when the results conflict. In essence, we have a (working, imperfect)
theory of how to deal with causal models. Can we teach machines to do this kind of
integration of causal models?
We know that in our world the sun rising causes the rooster to crow, but it's possible
to imagine a world in which it is the rooster crowing that causes the sun to rise. This
could be achieved in a suitably designed virtual world, for example. The reason we
believe the first model is correct in our world is not intrinsic to the data we have on
roosters and sunrise, but rather depends on a much more complex network of
background knowledge. For instance, given what we know about roosters and the
sun we can easily come up with plausible causal mechanisms (solar photons
impinging on the rooster's eye, say) by which the sun could cause the rooster to
crow. There do not seem to be any similarly plausible causal models in the other
direction. How do we determine what makes a particular causal model plausible or
not? How do we determine the class of plausible causal models for a given
phenomenon? Can we make this kind of judgement automatically? (This is all closely
related to the last problem).
Continuous-time causality: A peculiarity in my post is that even though we're talking about causality, and
time is presumably important, I've avoided any explicit mention of time. Of course, it's implicitly there: if I'd
been a little more precise in specifying my models they'd no doubt be conditioned on events like "smoked at
least a pack a day for 10 or more years". Of course, this way of putting time into the picture is rather coarse-grained.
In a lot of practical situations we're interested in understanding causality in a much more temporally
fine-grained way. To explain what I mean, consider a simple model of the relationship between what we eat
and our insulin levels:
This model represents the fact that what we eat determines our insulin levels, and our insulin levels in turn
play a part in determining how hungry we feel, and thus what we eat. But as a model, it's quite inadequate. In
fact, there's a much more complex feedback relationship going on, a constant back-and-forth between what
we eat at any given time, and our insulin levels. Ideally, this wouldn't be represented by a few discrete events,
but rather by a causal model that reflects the continual feedback between these possibilities. What I'd like to
see developed is a theory of continuous-time causal models, which can address this sort of issue. It would
also be useful to extend the calculus to continuous spaces of events. So far as I know, at present the causal
calculus doesn't work with these kinds of ideas.
Problems for the author
Other notions of causality: A point I've glossed over in the post is how the notion of causal influence we've
been studying relates to other notions of causality.
The notion we've been exploring is based on the notion of causality that is established by a (hopefully well-designed!) randomized controlled experiment. To understand what that means, think of what it would mean if
we used such an experiment to establish that smoking does, indeed, cause cancer. All this means is that in
the population being studied, forcing someone to smoke will increase their chance of getting cancer.
Now, for the practical matter of setting public health policy, that's obviously a pretty important notion of
causality. But nothing says that we won't tomorrow discover some population of people where no such causal
influence is found. Or perhaps we'll find a population where smoking actively helps prevent cancer. Both
these are entirely possible.
What's going on is that while our notion of causality is useful for some purposes, it doesn't necessarily say
anything about the details of an underlying causal mechanism, and it doesn't tell us how the results will apply
to other populations. In other words, while it's a useful and important notion of causality, it's not the only way
of thinking about causality. Something I'd like to do is to understand better what other notions of causality are
useful, and how the intervention-based approach we've been exploring relates to those other approaches.
Acknowledgments
Thanks to Jen Dodd, Rob Dodd, and Rob Spekkens for many discussions about causality. Especial thanks to
Rob Spekkens for pointing me toward the epilogue of Pearl's book, which is what got me hooked on
causality!
Principal sources and further reading
A readable and stimulating overview of causal inference is the epilogue to Judea Pearl's book. The
epilogue, in turn, is based on a survey lecture by Pearl on causal inference. I highly recommend getting a
hold of the book and reading the epilogue; if you cannot do that, I suggest looking over the survey lecture. A
draft copy of the first edition of the entire book is available on Pearl's website. Unfortunately, the draft does
not include the full text of the epilogue, only the survey lecture. The lecture is still good, though, so you should
look at it if you don't have access to the full text of the epilogue. I've also been told good things about the
book on causality by Spirtes, Glymour and Scheines, but haven't yet had a chance to have a close look at
it. An unfortunate aspect of the current post is that it gives the impression that the theory of causal inference
is entirely Judea Pearl's creation. Of course that's far from the case, a fact which is quite evident from both
Pearl's book, and the Spirtes-Glymour-Scheines book. However, the particular facets I've chosen to focus on
are due principally to Pearl and his collaborators: most of the current post is based on chapter 3 and chapter
1 of Pearl's book, as well as a 1994 paper by Pearl, which established many of the key ideas of the causal
calculus. Finally, for an enjoyable and informative discussion of some of the challenges involved in
understanding causal inference I recommend Jonah Lehrer's recent article in Wired.
59 Comments
1.
Suresh
Do you think there'd be a way to interpret causal structure via geometry, much like we use geometry to
express correlation and other patterns in data mining? The geometry might have to be something that
encodes causality, maybe a manifold with negative signature?
arthegall
There's been plenty of work on the geometry of curved exponential families, and their relation to inference in graphical models. See, as a start, e.g.
http://uai.sis.pitt.edu/papers/98/p472-settimi.pdf
http://projecteuclid.org/euclid.aos/1009210550
http://arxiv.org/abs/math/0301255
Bernd Sturmfels and Lior Pachter also have a pretty good book that touches on a lot of this
http://bio.math.berkeley.edu/ascb/
Suresh
Yes, I'm aware of that work. But the geometry there is a geometry in the parameter space. I don't think it can be used to capture this kind of causality (at least at first glance)
5.
Nice exposition! Perhaps some notion of "latent surprise" could be relevant. Adapting from the Wired article
you cite, imagine that a candidate drug's operation has two plausible causal models. The first and most
plausible model is simple. It is used during drug development. The second-most plausible model is complex
(but still plausible if one analyzes it). If that second-most plausible causal model is very different from the first,
that could be a latent surprise for researchers: a warning that, if their understanding of the drug's operation
changes somewhat, the clinical effects could be profound.
In general, if the most plausible few models are close (in the metric of plausibility) yet very different (in the
metric space of causal model similarity), this is a warning of big latent surprises if our understanding shifts a
bit. Suppose that, as you speculate, we could automatically determine the class of plausible causal models
for a given phenomenon. We might then also be able to scan automatically for latent surprises in important
systems: scientific, social, financial, policy, and so forth.
7.
physics). With that assumption, whatever an experimenter does is merely one more observable in a
stochastic network, randomized controlled trials disappear, and causal calculus disappears as well. We arrive
at the conclusion that the only scientific method to attribute causality relies on the existence of free will as a
source of obvious causality.
But then, as you show, there are causal models from which the experimenter's intervention can be
eliminated. We can thus draw conclusions about causality without assuming the obvious source of free will.
I wonder if it is possible to state under which conditions a causal model permits this elimination. Rules 2 and
3 are about individual variables, but is there a rule that applies to a complete graph?
8.
Naftali permalink
Thanks for this. I've been spending a lot of time thinking about Pearl's book lately and this is by far the most
accessible introduction to the material that I have come across.
One quick correction. Close to the end of your discussion of rule 1 (2 paragraphs before the heading "the
rules of the causal calculus"), you give the equation:
P(y|do(x),z) = P(y|do(x),z)
Presumably you mean:
P(y|do(x),z) = P(y|do(x))
9.
10.
11.
Nicolas permalink
I enjoyed reading this a lot. I am slightly confused about the wording of the following sentence:
"where f_j is a function, and Y_{j,.} is a collection of random variables such that: (a) the Y_{j,.} are independent of
one another for different values of j; and (b) for each j, Y_{j,.} is independent of all variables X_k, except when
X_k is X_j itself, or a descendant of X_j. The intuition is that the Y_{j,.} are a collection of auxiliary random variables
which inject some extra randomness into X_j (and, through X_j, its descendants), but which are otherwise
independent of the variables in the causal model."
Do you mean that, for instance, in the diagram above that paragraph, Y_{4,i} is not independent of
X_3 and X_2?
12.
13.
vzn permalink
one famous place & case study where hidden causality is notoriously, even fiendishly difficult to isolate and
shows the extreme subtlety involved: local hidden variable theories for quantum mechanics. which recently
have been brought back from the dead (or maybe semi zombie state) by anderson/brady in a soliton model.
more thoughts on that here. it has an aura of unorthodoxy but let's not forget that the greats have always
been enamored with the idea. einstein, schroedinger, 't hooft, etcetera.
part of the difficulty in QM is the idea of counterintuitive variables that might actually cause the experiment
apparatus to measure or not measure (or click vs not click/silent). this has been called a conspiracy
for decades. not sure who invented that description.
14.
15.
Josh permalink
Imply causation? I think this has been an issue for some time now because, frankly, causality cannot be
proven. What science engages in is probabilistic hypothetical inductive empiricism. In short, we can never
know causality, no matter how much some scientists would like you to believe otherwise. Science today is merely a
refined scholasticism, which just so happened to plague humanity for nearly 2000 years. Not a single person
can or has or will prove (analytically) universal causality of Being. To put it in easier terms: someone prove to
me gravity will exist next Tuesday.
16.
jenny permalink
Really interesting!
17.
just_hobyst permalink
Interesting article overall, but I disagree with this statement:
"We can't have X causing Y causing Z causing Y!"
In fact, this is called a positive feedback loop and is common in nature. You will find a lot of examples on
Wikipedia, none of which needs a time machine.
just_hobyst permalink
I noticed I incorrectly quoted you above, but the point is, loops in causal
diagrams are common.
18.
valtron permalink
In [1], I'm confused about how to expand the right side; I don't see where I can get the values for pa(Xj).
I'm trying to expand the basic cancer-smoking-hidden model in terms of basic probabilities, and I can only get
as far as P(gets cancer | do(smokes)) = P(gets cancer, smokes) / P(smokes | pa(Smoke)).
(My end goal is to see if I can use [1] to expand the cancer-smoking-tar-hidden model and obtain the same
result that you did, but without using the causal calculus.)
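One way to see where the pa(Xj) values enter, as a minimal sketch: assume [1] is the truncated factorization that the comment is applying, so that in the cancer-smoking-hidden model P(cancer, h | do(smokes)) = P(cancer, smokes, h) / P(smokes | h), where h is a value of the hidden parent of Smokes. The hidden value then still has to be summed out. All numbers and names below (p_h, p_s_given_h, p_c_given_sh) are made up for illustration and are not from the article.

```python
from fractions import Fraction as F

# Hypothetical probabilities, chosen only for illustration.
p_h = {0: F(1, 2), 1: F(1, 2)}              # P(H = h), the hidden factor
p_s_given_h = {0: F(1, 5), 1: F(4, 5)}      # P(smokes = 1 | H = h)
p_c_given_sh = {                             # P(cancer = 1 | smokes = s, H = h)
    (0, 0): F(1, 10), (0, 1): F(3, 10),
    (1, 0): F(2, 10), (1, 1): F(5, 10),
}

def joint(c, s, h):
    """P(cancer = c, smokes = s, H = h) from the factorization over the graph."""
    ps = p_s_given_h[h] if s == 1 else 1 - p_s_given_h[h]
    pc = p_c_given_sh[(s, h)] if c == 1 else 1 - p_c_given_sh[(s, h)]
    return p_h[h] * ps * pc

def p_cancer_do_smokes():
    # Truncated factorization: divide the joint by P(smokes | pa(Smokes)),
    # then sum over the hidden parent h.
    return sum(joint(1, 1, h) / p_s_given_h[h] for h in p_h)

def p_cancer_given_smokes():
    # Ordinary conditioning, for comparison.
    num = sum(joint(1, 1, h) for h in p_h)
    den = sum(joint(c, 1, h) for c in (0, 1) for h in p_h)
    return num / den

print(p_cancer_do_smokes(), p_cancer_given_smokes())
```

With these made-up numbers the interventional and the conditional probabilities differ, which is the confounding that the hidden factor introduces. The sketch also shows why the expansion stalls in practice: the sum needs P(h), which is not observable when H is hidden.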
19.
20.
robert permalink
Hello, thanks for this nice explanation of Pearl et al.'s theory.
But there is something I can't grasp in spite of reading Pearl's lecture slides and some parts of his papers.
When simplifying equation [2], you say (as Pearl does) that we can apply rule 2 to find: p(z|do(x)) = p(z|x)
But rule 2 is much more complex than this. It involves x, y, z and w.
How can you make y and w disappear in rule 2? Is it because w is unobserved? Is it because pa(y) = x and
we can use another relation?
Thanks for your help
robert permalink
Okay, after many readings, I guess I'm now able to answer my own question.
In the 1992 paper, Pearl derives three properties from formula [1].
The third is:
p(z|do(x)) = p(z|x) iff z _|_ pa(x) | x
which is the case in the example graph.
Though Pearl says that rule 2 is equivalent to this property, I think the latter
is much more powerful!
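The "if" direction of that property can be sketched in a few lines, assuming [1] is the truncated factorization and writing u for the values of pa(x) (the symbol u is introduced here only for illustration):

```latex
\begin{align*}
p(z \mid \mathrm{do}(x))
  &= \sum_{u} p(u)\, p(z \mid x, u)
     && \text{from [1], with } u \text{ ranging over values of } \mathrm{pa}(x) \\
  &= \sum_{u} p(u)\, p(z \mid x)
     && \text{since } z \perp \mathrm{pa}(x) \mid x \\
  &= p(z \mid x)
     && \text{since } \textstyle\sum_{u} p(u) = 1 .
\end{align*}
```

The first line comes from dividing the joint distribution by p(x | u) and summing out every variable other than z and u.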
21.
22.
John permalink
Regarding the application of Simpson's Paradox to the Civil Rights Act and your mention of application to
gender bias, I would ask: how far can one go in slicing and dicing? How often is this an exercise in merely
seeking an outcome that supports one's pre-existing bias? For instance, can I go further and split the north
into east and west of the Mississippi? Suppose this is how the votes came out with this further split (recall
we had DemNorth(145/154), RepNorth(138/162)):
North-East: Dem(129/134 < .966) Rep(29/30 > .966)
North-West: Dem(16/20 = .8) Rep(109/132 > .825)
Now we have three regions, NorthEast, NorthWest, and South, and the Republican % was higher in two out of
three. Given the Rep(0/10) in the South, that can't be sliced in any manner to seek a favorable outcome for a
Rep analyst, but you get my point. I just quickly jotted down a few trials to come up with this example, which is
not surprising given the initial split into north-south is just a first iteration that demonstrates this is possible.
But again I ask, where does the slicing and dicing stop in such an analysis? Usually with these sorts of
political and judicial analyses, those things that involve human motivations, it stops where the desired
outcome is achieved, and the best part is one can claim it was scientific and mathematical and so is
indisputable! The analyst can say under oath and with a straight face, "I lay the numbers before you and the
numbers don't lie." But just what do the numbers tell us?
Your threshold, "being Republican, rather than Democrat, was an important factor in causing someone to vote
for the Civil Rights Act," is also subjective, as it must be in dealing with human motivations, e.g. what is
"important"? What is "causing"? One could note the 94Dem/10Rep representation from the south, and,
analyzing the majority of southern voters' motivations at that time, conclude that the big
Dem majority in that region was in part caused by the voters' view that, based on platforms and reputation,
the losing challenger, being Rep, was most likely in favor of the Civil Rights Act.
23.
John permalink
I see that in my previous post on slicing and dicing somehow things got a bit garbled between what I typed
in and what displayed. One could derive the details given what did display, but here is what I intended
regarding the East-West split of the North in the Civil Rights vote split:
North-East: Dem(129/134) < .966, Rep(29/30) > .966
North-West: Dem(16/20) = .80, Rep(109/132) > .825
I've applied Simpson's Paradox to the North vote split. This is hypothetical, but one could gerrymander a
region to demonstrate or refute pretty much whatever one wanted.
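The arithmetic behind this hypothetical split can be checked with a short sketch. The tallies are the ones quoted in the comments above; the North-East Republican tally follows by subtraction from the North totals (138 - 109 = 29 yes votes out of 162 - 132 = 30). The variable and function names are illustrative only.

```python
from fractions import Fraction

# (yes votes, total votes), as quoted in the comments above.
north = {"Dem": (145, 154), "Rep": (138, 162)}
north_west = {"Dem": (16, 20), "Rep": (109, 132)}

# North-East Republicans are derived by subtracting North-West from North.
north_east = {
    "Dem": (129, 134),
    "Rep": (north["Rep"][0] - north_west["Rep"][0],
            north["Rep"][1] - north_west["Rep"][1]),
}

def rate(yes_total):
    """Exact fraction of yes votes."""
    yes, total = yes_total
    return Fraction(yes, total)

def rep_leads(region):
    """True if the Republican yes-rate exceeds the Democratic one."""
    return rate(region["Rep"]) > rate(region["Dem"])

for name, region in [("North", north),
                     ("North-East", north_east),
                     ("North-West", north_west)]:
    rates = {party: round(float(rate(v)), 3) for party, v in region.items()}
    print(name, rates, "Rep leads" if rep_leads(region) else "Dem leads")
```

Run as written, this reports that Democrats lead in the North as a whole while Republicans lead in both hypothetical halves, which is the reversal the comment describes.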
24.
This is important because, within the social sciences, our current theories fail far more often than they
succeed. Imagine what we might be able to accomplish if our economic policies worked twice as well as they
do? What about theories of management and psychology? Double the effectiveness and watch what happens
to organizational and mental health!
Thanks,
Steve
25.
Kaitlyn permalink
> The immediate lesson from the graph of Figure B is that X and Y can tell us something
> about one another, given Z, if there is a path between X and Y where the only collider
> is at Z. In fact, the same phenomenon can occur even in this graph:
In the example you gave about the music academy, and Berkson's paradox, there should be another node in
the graph: X gives information about Y if and only if X and Y have some other (external) connection. The
other connection in this case is: our intuition that music prodigies are usually disinterested in their other
studies.
So, you cannot proceed to the principle that when X -> Z <- Y, X gives information about Y, i.e. that the path
is unblocked. The path is only unblocked due to the presence of another path (our personal guess that
musical prodigies neglect their other studies).
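The quoted claim can be tested on a toy model, offered here only as a sketch under made-up assumptions: X and Y are independent fair coin flips (say, musical talent and academic talent), and Z = 1 whenever either is 1 (say, admission to the academy). There is no other connection between X and Y in this model. The names and the admission rule are hypothetical, not taken from the article.

```python
from fractions import Fraction
from itertools import product

# Every (x, y) pair is equally likely, so X and Y are independent by construction.
# Z is a collider: it depends on both X and Y, and on nothing else.
outcomes = [(x, y, int(x or y)) for x, y in product((0, 1), repeat=2)]
weight = Fraction(1, 4)

def prob(event, given=lambda o: True):
    """P(event | given), computed by exact enumeration over the four outcomes."""
    den = sum(weight for o in outcomes if given(o))
    num = sum(weight for o in outcomes if given(o) and event(o))
    return num / den

p_y = prob(lambda o: o[1] == 1)
p_y_given_x = prob(lambda o: o[1] == 1, lambda o: o[0] == 1)
p_y_given_z = prob(lambda o: o[1] == 1, lambda o: o[2] == 1)
p_y_given_xz = prob(lambda o: o[1] == 1, lambda o: o[0] == 1 and o[2] == 1)

print(p_y, p_y_given_x, p_y_given_z, p_y_given_xz)
```

In this toy model, learning X changes nothing about Y on its own, but among cases with Z = 1, learning X = 1 lowers the probability of Y = 1. So at least here the dependence given Z arises from the collider alone, without any extra path between X and Y.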
26.
Kaitlyn permalink
> The immediate lesson from the graph of Figure B is that X and Y can tell us something
> about one another, given Z, if there is a path between X and Y where the only collider
> is at Z. In fact, the same phenomenon can occur even in this graph:
In the example you gave about the music academy, and Berkson's paradox, there should be another node in
the graph: X gives information about Y if and only if X and Y have some other (external) connection. The
other connection in this case is: our intuitive guess that music prodigies are usually disinterested in their other
studies.
So, you cannot proceed to the principle that when X > Z Z < Y is blocked.