
Inferential Statistics

By Ryan Gardiner
Calculicious Capstone
Ms. Carlee Hollenbeck
12th Grade Calculus

Definition

Inferential Statistics is the process of inferring a trend or pattern from a sample of data. Using the probability of certain things occurring in a sample, you can project that result onto an entire population. It also looks at the correlation between independent and dependent variables. A major component of this is hypothesis testing, along with identifying confidence intervals and the margin of error.

History

Statistics as a whole was founded in the 16th century in an attempt to justify state policy based on demographic and economic statistics. It wasn't until several hundred years later, during the 19th century, that the field was broadened to include the entirety of data collection and analysis. Modern statistics, including inferential statistics, came in three different waves. The first wave, which came around 1900, developed the ideas of standard deviation and regression analysis. The second wave, which came from 1910 to 1920, focused on variance and the development of previously established principles. The third and final wave introduced "type 2" error and confidence intervals. Today, much of statistics is dominated by the use of modern computers.

[Figure: Population and Sample]

Reasons for Study


Inferential Statistics is used in a wide variety of research applications. Any time you have a population and a mathematically sound sample, you can use this concept to draw conclusions. One example is political polling, in which a sample of a population is polled and a result is then inferred for the whole population. Another potential use is determining the probability of success of an individual within a larger group, for example a student in after-school care.

Background Information
Null Hypothesis: Represented as H₀, is the hypothesis that a study is attempting to disprove or nullify. Typically, H₀ represents a lack of a relationship or pattern in the data.
Alternative Hypothesis: The contrasting hypothesis to H₀ (often written H₁ or Hₐ). Typically represents a change or relationship in the data.
Normal Distribution: Also known as the Bell Curve, this is the standard and commonly occurring statistical pattern in data.

Standard Deviation: Represented as σ, is the square root of the average of the squared differences of each data point from the mean. This is a measurement of how spread out the data is. To find it, subtract the mean from each data point, square each result, find the average of those squared differences, and then take the square root.
Variance: The square of the standard deviation, σ². This is also a measurement of how spread out the data is. When σ² is close to zero all of the data is very similar; when σ² is larger there is much more variation.
Empirical Rule: States that for normally distributed data, about 68% of the data will fall within one standard deviation of the mean, about 95% will fall within two standard deviations, and about 99.7% will fall within three standard deviations.
Z-Score: A measurement that represents how many standard deviations a value is from the mean. Nearly all z-scores fall between -3 and 3 standard deviations (based on the empirical rule), although more extreme values are possible. To calculate a z-score you must select a value, subtract the mean from it, and then divide by the standard deviation.
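
The standard deviation, variance, and z-score definitions above can be sketched in a few lines of Python. This is only an illustration and not part of the original poster: the function names and the small data set are made up for the example, and the population form (dividing by the number of data points) is assumed.

```python
import math


def population_variance(data):
    """Average of the squared differences from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


def population_std_dev(data):
    """Square root of the variance."""
    return math.sqrt(population_variance(data))


def z_score(value, mean, std_dev):
    """How many standard deviations a value is from the mean."""
    return (value - mean) / std_dev


# Hypothetical data set chosen so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(scores) / len(scores)
variance = population_variance(scores)
sigma = population_std_dev(scores)
z_for_nine = z_score(9, mean, sigma)

print(f"mean={mean}, variance={variance}, sigma={sigma}, z(9)={z_for_nine}")
```

For this data set the mean is 5, the variance is 4, the standard deviation is 2, and the value 9 sits two standard deviations above the mean.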

Central Limit Theorem

The Central Limit Theorem states that when enough independent random samples are drawn from a population, the means of those samples will form an approximately normal distribution, whatever the shape of the original data. The approximation becomes better the larger the sample size is. Note that the theorem applies to sample means (or sums): the individual data points keep the distribution of the population they came from.

Standard Error

The standard error of the mean is the standard deviation of the sampling distribution of the sample mean. So let's break that down. The sample mean is the average of all the values you have sampled. The sampling distribution is the distribution that all of the sample means fall under: you conduct different samples, calculate the mean of each, and plot the distribution of those means. The standard deviation of the means you plotted is the standard error. To calculate it directly, divide the standard deviation of the population by the square root of the sample size.
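
A small simulation can make both ideas concrete. The sketch below is an illustration under stated assumptions, not the poster's own method: it draws samples from a uniform population (which is flat, not bell-shaped), takes the mean of each sample, and compares the spread of those means with the population standard deviation divided by the square root of the sample size. The sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

SAMPLE_SIZE = 50     # data points per sample
NUM_SAMPLES = 5000   # number of samples drawn

# Population assumed for the demo: uniform on [0, 1),
# which has mean 0.5 and standard deviation sqrt(1/12).
population_mean = 0.5
population_sigma = math.sqrt(1 / 12)

sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.random() for _ in range(SAMPLE_SIZE)]
    sample_means.append(sum(sample) / SAMPLE_SIZE)

# Standard error measured from the simulated sample means...
observed_se = statistics.pstdev(sample_means)
# ...and predicted by sigma / sqrt(n).
predicted_se = population_sigma / math.sqrt(SAMPLE_SIZE)

# If the sample means are roughly normal, about 68% of them should
# land within one standard error of the population mean.
share_within_one_se = sum(
    abs(m - population_mean) <= predicted_se for m in sample_means
) / NUM_SAMPLES

print(f"observed SE:  {observed_se:.4f}")
print(f"predicted SE: {predicted_se:.4f}")
print(f"share within one SE: {share_within_one_se:.1%}")
```

The observed and predicted standard errors should agree closely, and the share of sample means within one standard error should be near the 68% given by the empirical rule, even though the individual data points are uniform rather than normal.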

Skew
Skew is a measurement of how asymmetric a distribution is. When looking at the shape of a bell curve, a positive skew means the tail on the right end of the curve is longer. A negative skew means the tail of the curve is longer on the left. A perfectly symmetrical distribution, such as the normal distribution, has a skew of zero.

[Figure: Positive Skew, Perfect Distribution, Negative Skew]
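
The poster does not give a formula for skew. One common choice, assumed here purely for illustration, is the moment coefficient of skewness: the average cubed deviation from the mean divided by the standard deviation cubed. The function name and the three data sets are hypothetical.

```python
def skewness(data):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]        # long tail on the right
left_tailed = [-10, -3, -2, -2, -1, -1]   # long tail on the left

for name, data in [
    ("symmetric", symmetric),
    ("right-tailed", right_tailed),
    ("left-tailed", left_tailed),
]:
    print(f"{name}: skew = {skewness(data):.3f}")
```

The symmetric data gives a skew of zero, the right-tailed data a positive skew, and the left-tailed data a negative skew, matching the three shapes described above.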

Kurtosis
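Kurtosis is also a measurement of how normal a distribution is. However, instead of looking at how the distribution leans along the x-axis, it looks at its shape along the y-axis: how sharply peaked and heavy-tailed the curve is compared to a normal distribution. Positive kurtosis is usually drawn as a taller, sharper peak with heavier tails, and negative kurtosis as a flatter curve with lighter tails.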

[Figure: Negative Kurtosis, Perfect Distribution, Positive Kurtosis]
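
As with skew, the poster gives no formula for kurtosis. A common convention, assumed here for illustration, is excess kurtosis: the average fourth-power deviation from the mean divided by the variance squared, minus 3, so that a normal distribution scores zero. The function name and data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Excess kurtosis: m4 / m2 ** 2 - 3 (zero for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]                       # evenly spread, no peak
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]     # tall center, a few far-out values

print(f"flat:   {excess_kurtosis(flat):.2f}")
print(f"peaked: {excess_kurtosis(peaked):.2f}")
```

The flat data set produces a negative value and the peaked data set a positive one.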

Margin of Error
The margin of error represents the possible discrepancy between the results of a sample and the population as a whole. In order to calculate the margin of error you must know the standard error and the critical value (either a z-score or a t-score). All you need to do is multiply the critical value by the standard error.
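
The calculation is short enough to write as two small functions. This is a sketch with made-up numbers: the function names, the standard deviation of 12, and the sample size of 36 are assumptions for the example, and 1.96 is the usual z-score for a 95% confidence level.

```python
import math


def standard_error(std_dev, sample_size):
    """Standard deviation divided by the square root of the sample size."""
    return std_dev / math.sqrt(sample_size)


def margin_of_error(critical_value, std_error):
    """Critical value (z-score or t-score) times the standard error."""
    return critical_value * std_error


se = standard_error(std_dev=12.0, sample_size=36)
moe = margin_of_error(critical_value=1.96, std_error=se)

print(f"standard error = {se}, margin of error = {moe:.2f}")
```

Here the standard error is 12 / 6 = 2, so the margin of error is 1.96 * 2 = 3.92.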

T-Scores
There are some cases in hypothesis testing in which you will not be able to use a z-score. If your sample size is small (generally less than 30) or you do not know the population standard deviation, then you should use a t-score. Both t-scores and z-scores are types of critical values. They play the same role but come from different standard distributions. To find a t-score you must calculate alpha = 1 - (confidence level / 100), critical probability = 1 - (alpha / 2), and degrees of freedom = sample size - 1. Using that information you can reference a t-distribution chart or a t-distribution calculator to find the result. T-scores and z-scores are used interchangeably in later calculations: whichever one applies serves as the critical value.
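
The three lookup values can be computed directly, as in the sketch below. The function name and the sample size of 21 are hypothetical. The small table holds a few standard published t-values for a critical probability of 0.975 and stands in for a full t-distribution chart. The z-score for comparison comes from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def lookup_parameters(confidence_level_percent, sample_size):
    """Return (alpha, critical probability, degrees of freedom)."""
    alpha = 1 - confidence_level_percent / 100
    critical_probability = 1 - alpha / 2
    degrees_of_freedom = sample_size - 1
    return alpha, critical_probability, degrees_of_freedom


# Excerpt of a t-table for critical probability 0.975 (95% confidence).
# Keys are degrees of freedom; this is not a complete table.
T_TABLE_975 = {5: 2.571, 10: 2.228, 20: 2.086, 29: 2.045}

alpha, critical_probability, df = lookup_parameters(95, sample_size=21)
t_critical = T_TABLE_975[df]
z_critical = NormalDist().inv_cdf(critical_probability)

print(f"alpha={alpha:.3f}, critical probability={critical_probability:.3f}, df={df}")
print(f"t critical={t_critical}, z critical={z_critical:.3f}")
```

With a small sample the t-score is larger than the z-score of about 1.96, which widens the margin of error to reflect the extra uncertainty.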

Confidence Intervals
Confidence intervals are used to describe the uncertainty of a statistical parameter (mean, standard deviation, variance, or other data measurement) estimated from a given sample. It is worth noting that this is not the confidence in the parameter itself: statistical parameters are constant (there is either a 0% or a 100% chance that a given value is the true mean). Confidence intervals describe the uncertainty of inferring a conclusion based on a sample statistic. In order to calculate a confidence interval you must know the confidence level, the statistic, and the margin of error. The confidence level describes how often the sampling method would produce an interval that contains the true value (usually 90%, 95%, or 99%); you choose which one you want to calculate with. If you have all of that information, you can calculate the confidence interval by adding and subtracting the margin of error from the sample statistic.
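
To see how the choice of confidence level changes the interval, the sketch below builds intervals for the same hypothetical sample at 90%, 95%, and 99%. The function name and the sample values (mean 50, standard deviation 12, 36 observations) are assumptions for illustration, and a z-score is used as the critical value because the sample is larger than 30.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, std_dev, sample_size, confidence_level):
    """Return (low, high): the sample mean minus and plus the margin of error."""
    alpha = 1 - confidence_level
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)  # z-score
    margin = critical_value * std_dev / math.sqrt(sample_size)
    return sample_mean - margin, sample_mean + margin


intervals = {
    level: mean_confidence_interval(50.0, 12.0, 36, level)
    for level in (0.90, 0.95, 0.99)
}

for level, (low, high) in intervals.items():
    print(f"{level:.0%} confidence: {low:.2f} to {high:.2f}")
```

A higher confidence level gives a wider interval: being more certain that the interval contains the true value means accepting a less precise estimate.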

Example: Margin of Error and Confidence Interval


A study looks at the average hair length among adult women living in Statsville, California. The sample size is 1,000 women from a population of 1,000,000 women. The mean length is 300 mm and the standard deviation in the study is 45 mm.

1. First we must identify the statistic we are calculating for. In this study we are finding the margin of error and confidence interval of the mean, 300 mm. We also need to choose our confidence level. In this case we will choose 95%.
2. Next we will calculate the standard error. To do this we divide the standard deviation by the square root of the sample size. Standard Error = 45 / sqrt(1000) = 45 / 31.62 = 1.42 mm
3. After calculating the standard error, we must find the critical value. Because this is a larger study we can reference a z-score chart to find the z-score. For a 95% confidence level, the result is 1.96.
4. Then we can multiply the standard error by the critical value (1.42 * 1.96) to give us the margin of error: 2.78 mm.
5. With a margin of error, a confidence level, and a statistic we can express the result of this survey as a confidence interval: with 95% confidence we can say the mean hair length in the population is 300 mm plus or minus 2.78 mm.
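
The same steps can be checked in Python. The variable names are made up for this sketch; the numbers come from the example. Note that the worked example rounds the standard error to 1.42 before multiplying, while the code carries full precision, so the margin of error comes out a hair larger (about 2.79 mm instead of 2.78 mm).

```python
import math

sample_mean = 300.0    # mm
std_dev = 45.0         # mm
sample_size = 1000
z_critical = 1.96      # z-score for a 95% confidence level

# Step 2: standard error = standard deviation / sqrt(sample size)
std_error = std_dev / math.sqrt(sample_size)

# Step 4: margin of error = critical value * standard error
margin = z_critical * std_error

# Step 5: confidence interval = mean plus or minus the margin of error
low, high = sample_mean - margin, sample_mean + margin

print(f"standard error:  {std_error:.2f} mm")
print(f"margin of error: {margin:.2f} mm")
print(f"95% confidence interval: {low:.2f} mm to {high:.2f} mm")
```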

Example: Calculating a t-score


A poll is taken trying to identify the approval of the mayor of Mathland, Arizona. The sample size is 30 registered voters from a
population of 200 total registered voters. (Mathland is not a very popular place). 21 respondents said they support the job the mayor is
doing.

1. Because this is a smaller study and there is no standard deviation, we must calculate a t-score. Using a 95% confidence level, we calculate alpha = 1 - confidence level = 1 - 0.95 = 0.05. Then we calculate the critical probability. Critical probability = 1 - alpha/2 = 1 - 0.05/2 = 0.975. Then we must find the degrees of freedom by calculating the sample size - 1 = 30 - 1 = 29. You can use a t-distribution table to determine the t-value with the information we just calculated. In this case the result is 2.045. That result can later be used to find a confidence interval and margin of error for this poll.
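
As a rough sketch of how that last step might go, the code below carries the poll through to a margin of error. Two things here are assumptions that do not come from the example: the standard error of a proportion is taken as sqrt(p * (1 - p) / n), and no adjustment is made for the sample being a large share of the 200 voters. The t-value of 2.045 is the standard table value for 29 degrees of freedom at a critical probability of 0.975.

```python
import math

sample_size = 30
supporters = 21
confidence_level = 0.95

alpha = 1 - confidence_level
critical_probability = 1 - alpha / 2
degrees_of_freedom = sample_size - 1

# Standard t-table value for df = 29 and critical probability 0.975.
t_critical = 2.045

# Sample proportion and its standard error (assumed formula, see above).
p_hat = supporters / sample_size
std_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)

margin = t_critical * std_error
low, high = p_hat - margin, p_hat + margin

print(f"alpha={alpha:.2f}, critical probability={critical_probability:.3f}, "
      f"df={degrees_of_freedom}")
print(f"approval: {p_hat:.0%} plus or minus {margin:.1%}")
print(f"95% confidence interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the poll would report 70% approval with a margin of error of roughly 17 percentage points, which shows how little a sample of 30 can pin down.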

Citations
All of the written explanations and examples are the original work of Ryan Gardiner at High Tech High North County, 2015, under the
supervision of Carlee Hollenbeck.
All graphics are also the original work of Ryan Gardiner (High Tech High North County, 2015)
Bibliography
The information presented in this article was a result of research conducted in large part from:
Khan Academy - Khan Academy, Inc. www.khanacademy.org
StatTrek - Authored primarily by Harvey Berman at the Georgia Institute of Technology. stattrek.com
Research Methods Knowledge Base - Authored primarily by William M.K. Trochim. socialresearchmethods.net
