2. Phantom Populations
If there were to be a fatal flaw in an analysis, it would probably involve how well the samples
represent the population. Sometimes data analysts don’t give enough thought to the populations
they want to analyze. They use observations to make inferences about a population that doesn't
exist. Populations must be based on some identifiable commonalities that would meaningfully
affect some characteristic. A group of anomalies would not be a population. Opinion polls
sometimes suffer from phantom populations. Say you surveyed people wearing red shirts. Could
you then generalize to everyone who wears red shirts? Canadian researchers found one such
phantom population when they tried to create a control group of men who had not been exposed
to pornography (http://www.telegraph.co.uk/relationships/6709646/All-men-watch-porn-
scientists-find.html). Make sure the population being analyzed is more than an illusion.
5. Indulging Variance
Most people don’t appreciate variance. They don’t even know it’s there
(http://statswithcats.wordpress.com/2010/08/01/there%E2%80%99s-something-about-variance/).
If their candidate for office is up by two percentage points in a poll, they figure the election is in
the bag. Even professionals like scientists, engineers, and doctors don’t want to deal with it.
They ignore it whenever they can and just address the average or most common case. Business
people talk about variances all the time, only they mean differences rather than statistical
dispersion. Baseball players thrive on variance. Where else can you have two failures out of
every three chances and still be considered a star? Data analysts have to understand variance and
address it at every step of a project. Look for how variance will be controlled in study plans
(http://statswithcats.wordpress.com/2010/09/05/the-heart-and-soul-of-variance-control/
http://statswithcats.wordpress.com/2010/09/19/it%E2%80%99s-all-in-the-technique/). Look for
variance to be reported with results. And most importantly, look for some assessment of how
uncertainty affects any decisions made from the analysis.
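The two-point lead above can be checked against sampling variance in a few lines. This is a minimal sketch with hypothetical poll numbers (51% support in a sample of 1,000), using the standard normal-approximation margin of error for a proportion:

```python
import math

def poll_margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: candidate at 51% in a sample of 1,000 voters.
p, n = 0.51, 1000
moe = poll_margin_of_error(p, n)
print(f"Margin of error: +/- {moe * 100:.1f} points")  # about +/- 3.1 points
# A 2-point lead (51% vs. 49%) is smaller than the margin of error,
# so the race is a statistical tie -- not "in the bag".
```

The point is not the exact number but that the variance of the estimate is larger than the difference being celebrated.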
7. Torrents of Tests
If a statistical test is conducted in a study, false positives and false negatives can be controlled,
or at least evaluated. But if there are many tests, you can bet there will be false results just because
of Mother Nature’s sense of humor. In groundwater testing, for example, there may be a test for
every combination of well, analyte, and sampling round, resulting in literally hundreds of tests.
There are strategies for dealing with this type of situation, such as hierarchical testing and the use
of special tests (look for the term Bonferroni). Be careful of bad decisions based on a small
proportion of the tests being (apparently) significant.
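The arithmetic behind this flaw is easy to sketch. Assuming independent tests each run at a significance level of 0.05, the chance of at least one false positive across the whole family of tests grows quickly, and a Bonferroni adjustment (testing each at alpha divided by the number of tests) pulls it back down:

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

m = 100  # e.g., hundreds of well/analyte/round combinations
print(f"Uncorrected (alpha=0.05):    {familywise_error(0.05, m):.3f}")
print(f"Bonferroni (alpha=0.05/m):   {familywise_error(0.05 / m, m):.3f}")
# With 100 uncorrected tests, a false positive is nearly certain (~0.994);
# the Bonferroni-adjusted level keeps the familywise rate near 0.05 (~0.049).
```

Real test batteries are rarely fully independent, so this overstates the inflation somewhat, but the lesson holds: a few "significant" results out of hundreds of tests may be nothing but noise.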
9. Extrapolation Intoxication
Make sure the data spans the parts of the variable scales about which you want to make
predictions. If a study collects test data at ambient indoor temperature, beware of predictions
made under freezing conditions. Likewise, be careful of tests on rabbits that are extrapolated to
humans, maps showing information beyond the limits observed, surveys of one demographic
extrapolated to another, and the like. Perhaps the only example of extrapolation that is even
grudgingly accepted by statisticians is time-series analysis
(http://statswithcats.wordpress.com/2010/08/15/time-is-on-my-side/). You have to extrapolate to
predict the future. The issue is how far into the future is reasonable, which will depend on the
degree of autocorrelation, the stability of the data, and the model.
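The indoor-temperature example can be illustrated with a toy model. The curve below is entirely made up: a hypothetical quantity that varies nonlinearly with temperature. A straight line fit by ordinary least squares to data collected only at indoor temperatures looks fine in-range but falls apart when extrapolated to freezing:

```python
# Hypothetical nonlinear "truth" -- invented purely for illustration.
def truth(temp_c):
    return 100 - 0.05 * (temp_c - 20) ** 2

# Fit a straight line by ordinary least squares to indoor-temperature data only.
xs = [18, 19, 20, 21, 22, 23, 24, 25]
ys = [truth(x) for x in xs]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
predict = lambda x: intercept + slope * x

# In-range, the line is close to the truth; extrapolated to -5 C, it is not.
print(f"At 20 C: model {predict(20):.1f} vs. truth {truth(20):.1f}")
print(f"At -5 C: model {predict(-5):.1f} vs. truth {truth(-5):.1f}")
```

Nothing in the indoor data warns that the line is wrong below the observed range; the model is only as good as the span of the data behind it.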
http://statswithcats.wordpress.com/2010/11/07/ten-fatal-flaws-in-data-analysis/