
Descriptive Statistics

Week 1
Measures of Central Tendency

Measures of central tendency are measures of the location of the middle or the center of a distribution. The definition of "middle" or "center" is purposely left somewhat vague so that the term "central tendency" can refer to a wide variety of measures. The mean is the most commonly used measure of central tendency.
Measures of Central Tendency
(Mean)
Mean
Arithmetic average = (sum of all values)/(number of values)
Population: µ = (∑xi)/N
Sample: x̄ = (∑xi)/n
Be sure you know how to get these values easily from your calculator and statistical software.
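As a quick software check, here is a minimal Python sketch using only the standard library. The data values are made up for illustration; the same arithmetic gives µ when the list is a whole population and x̄ when it is a sample.

```python
from statistics import mean

# Illustrative data (hypothetical values)
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Arithmetic average: sum of all values divided by the number of values
manual_mean = sum(values) / len(values)
print(manual_mean)  # 5.0

# The standard-library function agrees with the hand computation
assert manual_mean == mean(values)
```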
Measures of Central Tendency
(Weighted Mean)
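When the data are grouped, or when values carry different weights, compute the weighted mean using µ = (∑wi xi)/(∑wi), where wi is the weight (for grouped data, the frequency) of value xi.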

Problem: Calculate the average profit from truck shipments from the United States to Canada for the following data, given in thousands of bags shipped and profit per thousand bags:

City        Shipments (thousands of bags)   Profit per thousand bags
Montreal     64.0                           $15.00
Ottawa       15.0                           $13.50
Toronto     285.0                           $15.50
Vancouver   228.0                           $12.00
Winnipeg     45.0                           $14.00

(Answer: $14.04 per thousand bags)
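The weighted-mean formula can be checked against the truck-shipment problem with a short Python sketch. The shipments are the weights wi and the profits are the values xi; the helper name weighted_mean is just an illustrative choice.

```python
# Shipments in thousands of bags (weights) and profit per thousand bags (values)
shipments = {"Montreal": 64.0, "Ottawa": 15.0, "Toronto": 285.0,
             "Vancouver": 228.0, "Winnipeg": 45.0}
profit = {"Montreal": 15.00, "Ottawa": 13.50, "Toronto": 15.50,
          "Vancouver": 12.00, "Winnipeg": 14.00}

def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

cities = list(shipments)
avg = weighted_mean([profit[c] for c in cities], [shipments[c] for c in cities])
print(f"${avg:.2f} per thousand bags")  # $14.04 per thousand bags
```

Here the numerator is 8946 and the total weight is 637 thousand bags, so the weighted mean is 8946/637 ≈ 14.04.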
Measures of Central Tendency
(Median)
To find the median:
1. Sort the data values in ascending order (an ordered array).
2A. If the data set has an ODD number of values, the median is the middle value.
2B. If the data set has an EVEN number of values, the median is the AVERAGE of the two middle values.
(Note that the median of an even-sized data set is not necessarily a member of the set of values.)

The median is particularly useful when the data set contains outliers, which tend to pull the arithmetic mean toward them but have little effect on the median.
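The median steps above translate directly into a small Python function. The sketch below follows steps 1, 2A and 2B, then uses made-up numbers to show how a single outlier moves the mean far more than the median.

```python
def median(data):
    xs = sorted(data)              # step 1: put the data in an ordered array
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:                 # step 2A: odd count -> middle value
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2   # step 2B: even count -> average of middle two

print(median([3, 1, 2]))       # 2
print(median([4, 1, 3, 2]))    # 2.5 (not a member of the data set)

# Hypothetical data: one outlier changes the mean much more than the median
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]
print(sum(incomes) / len(incomes), median(incomes))                 # 35.0 35
print(sum(with_outlier) / len(with_outlier), median(with_outlier))  # mean about 95.8, median 36.5
```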
Measures of Central Tendency
(Mode)
The mode is the most frequent value.
While there is just one value for the mean
and one value for the median, there may
be more than one value for the mode of a
data set. The mode tends to be less
frequently used than the mean or the
median.
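Because a data set can have more than one mode, software needs a way to report ties. Python's standard library offers statistics.multimode (Python 3.8 and later), which returns every value tied for the highest frequency; the data below are made up.

```python
from statistics import multimode

# Every value that occurs with the highest frequency is returned
print(multimode([1, 2, 2, 3, 4]))     # [2]
print(multimode([1, 1, 2, 3, 3, 4]))  # [1, 3]  (two modes: bimodal)
```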
Comparing Measures of Central Tendency

If Mean = Median = Mode, the distribution is symmetric.
If Mode < Median < Mean, the distribution trails off to the right: it is positively (right) skewed.
If Mode > Median > Mean, the distribution trails off to the left: it is negatively (left) skewed.
These orderings are a rule of thumb; not every skewed distribution follows them.
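The rule of thumb can be applied mechanically in a few lines of Python. This is a sketch on a small made-up data set with one large value; it takes the first mode from multimode, which is only meaningful when there is a single mode.

```python
from statistics import mean, median, multimode

# Hypothetical data with a long right tail (one large value)
data = [1, 2, 2, 2, 3, 3, 4, 5, 9]

m = mean(data)            # 31/9, about 3.44
md = median(data)         # 3
mo = multimode(data)[0]   # 2 (this data set has a single mode)

if mo < md < m:
    shape = "positively (right) skewed"
elif mo > md > m:
    shape = "negatively (left) skewed"
elif mo == md == m:
    shape = "symmetric"
else:
    shape = "no clear pattern from this rule of thumb"
print(m, md, mo, shape)
```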
Range

The range is the distance between the smallest and the largest data values in the set.
Range = largest value – smallest value
Sometimes the range is reported as an interval anchored at the smallest and largest data values, rather than as the actual width of that interval.
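Both ways of reporting the range can be shown with a tiny Python sketch on made-up data: the width (largest minus smallest) and the interval form (smallest, largest).

```python
data = [12, 7, 3, 18, 9]   # hypothetical values

smallest, largest = min(data), max(data)
data_range = largest - smallest   # width of the interval: 18 - 3 = 15
interval = (smallest, largest)    # interval form: (3, 18)
print(data_range, interval)       # 15 (3, 18)
```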
Key Concept - Residuals

Residuals are the differences between each data value in the set and the group mean:
for a population, xi – µ
for a sample, xi – x̄
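Computing residuals in Python makes one of their key properties visible: residuals taken about the mean always sum to zero (up to floating-point rounding). The data set below is made up for illustration.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical values
mu = sum(data) / len(data)          # group mean: 5.0

residuals = [x - mu for x in data]  # x_i - mean for each value
print(residuals)       # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
print(sum(residuals))  # 0.0
```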
Mean Absolute Deviation

The Mean Absolute Deviation (MAD) is found by summing the absolute values of all residuals and dividing by the number of values in the set:
for a population, MAD = (∑|xi – µ|)/N
for a sample, MAD = (∑|xi – x̄|)/n
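The MAD formula written out by hand in Python, as a sketch; the function name is a hypothetical choice and the data are made up. As on the slide, it divides by the number of values in both the population and sample cases.

```python
def mean_absolute_deviation(data):
    center = sum(data) / len(data)   # the group mean
    # average of the absolute residuals |x_i - mean|
    return sum(abs(x - center) for x in data) / len(data)

print(mean_absolute_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 1.5
```

For these values the absolute residuals are 3, 1, 1, 1, 0, 0, 2, 4, which sum to 12, and 12/8 = 1.5.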
Variance

Variance is one of the most frequently used measures of spread:
for a population, $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - N\mu^2}{N}$
for a sample, $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$

The right-hand side of each equation is often used as a computational shortcut.
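The definitional formula and the computational shortcut can be checked against each other, and against Python's statistics.pvariance (divides by N) and statistics.variance (divides by n − 1). This is a sketch on made-up data.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values
n = len(data)
xbar = sum(data) / n

ss_def = sum((x - xbar) ** 2 for x in data)          # sum of squared residuals
ss_short = sum(x * x for x in data) - n * xbar ** 2  # shortcut: sum of x^2 minus n * mean^2

pop_var = ss_def / n          # sigma^2: data treated as the whole population
samp_var = ss_def / (n - 1)   # s^2: data treated as a sample

print(ss_def, ss_short)       # 32.0 32.0
print(pop_var, samp_var)      # 4.0 and about 4.571
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))
```

The shortcut is convenient by hand, but in floating point it can lose precision when the values are large relative to their spread, so software usually sums squared residuals instead.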
Standard Deviation

Since variance is expressed in squared units, we often use the standard deviation instead, which is the square root of the variance:
for a population, $\sigma = \sqrt{\sigma^2}$
for a sample, $s = \sqrt{s^2}$

Be sure you know how to get these values easily from your calculator and statistical software.
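In software the standard deviation is just the square root of the matching variance. The sketch below (made-up data) computes both versions by hand and compares them with statistics.pstdev and statistics.stdev.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values
n = len(data)
xbar = sum(data) / n
ss = sum((x - xbar) ** 2 for x in data)   # sum of squared residuals

sigma = math.sqrt(ss / n)      # population standard deviation
s = math.sqrt(ss / (n - 1))    # sample standard deviation

print(sigma)  # 2.0
assert math.isclose(sigma, statistics.pstdev(data))
assert math.isclose(s, statistics.stdev(data))
```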
Coefficient of Variation

The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean, indicating the relative amount of dispersion in the data.
$CV = \frac{\sigma}{\mu} \times 100\%$
Quartiles

One of the most frequently used quantiles is the quartile.
Quartiles divide the values of a data set into four subsets of equal size, each comprising 25% of the observations.
To find the first, second, and third quartiles:
1. Sort the N data values in ascending order.
2. First quartile, Q1 = data value at position (N + 1)/4
3. Second quartile, Q2 = data value at position 2(N + 1)/4 (this is the median)
4. Third quartile, Q3 = data value at position 3(N + 1)/4
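The position rule above can be sketched in Python. The slide does not say what to do when a position is not a whole number, so this sketch assumes linear interpolation between the neighboring values and clamps positions outside 1..N to the minimum or maximum. On this made-up data set it agrees with statistics.quantiles, whose default "exclusive" method is also based on (N + 1) positions.

```python
import math
import statistics

def quartile(data, k):
    """k-th quartile (k = 1, 2, 3) using the (N + 1) position rule.

    Assumption: non-integer positions are linearly interpolated, and
    positions outside 1..N are clamped to the smallest or largest value.
    """
    xs = sorted(data)              # step 1: ordered array
    n = len(xs)
    pos = k * (n + 1) / 4          # 1-based position in the ordered array
    lo = math.floor(pos)
    if lo < 1:
        return xs[0]
    if lo >= n:
        return xs[-1]
    frac = pos - lo
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values
print([quartile(data, k) for k in (1, 2, 3)])  # [4.0, 4.5, 6.5]
print(statistics.quantiles(data, n=4))         # [4.0, 4.5, 6.5]
```

Different textbooks and packages use other quartile conventions, so results may differ slightly from a calculator or SPSS.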
Shape of a Distribution

Measures of Shape: symmetric or skewed

Left-skewed:   Mean < Median < Mode
Symmetric:     Mean = Median = Mode
Right-skewed:  Mode < Median < Mean

Interpreting the Standard Deviation

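How many observations fall within ±k standard deviations of the mean?

Interval             Chebyshev's Rule   Empirical Rule
x̄ ± 1s or µ ± 1σ     No useful info     Approximately 68%
x̄ ± 2s or µ ± 2σ     At least 75%       Approximately 95%
x̄ ± 3s or µ ± 3σ     At least 8/9       Approximately 99.7%

Chebyshev's Rule (at least 1 − 1/k² of the observations lie within k standard deviations) holds for any distribution; the Empirical Rule applies to mound-shaped, approximately normal distributions.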
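Both columns of the table can be reproduced numerically. The Chebyshev column is the bound 1 − 1/k², and the Empirical Rule percentages are rounded normal-distribution probabilities, which this Python sketch computes with math.erf.

```python
import math

def chebyshev_min_fraction(k):
    """Chebyshev: at least 1 - 1/k^2 of any distribution lies within k SDs."""
    return max(0.0, 1 - 1 / k**2)

def normal_fraction_within(k):
    """Fraction of a normal distribution within k SDs of the mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(chebyshev_min_fraction(k), 4), round(normal_fraction_within(k), 4))
# 1 0.0 0.6827
# 2 0.75 0.9545
# 3 0.8889 0.9973
```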

Interpreting the Standard Deviation

[Figure: Empirical Rule]

Interpreting the Standard Deviation

You have purchased compact fluorescent light bulbs for your home. The average life length is 500 hours, the standard deviation is 24 hours, and the frequency distribution of life lengths is mound-shaped. One of your bulbs burns out at 450 hours. Would you send the bulb back for a refund?

Interval   Range (hours)   % of observations included   % of observations excluded
x̄ ± 1s     476 – 524       Approximately 68%            Approximately 32%
x̄ ± 2s     452 – 548       Approximately 95%            Approximately 5%
x̄ ± 3s     428 – 572       Approximately 99.7%          Approximately 0.3%
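The intervals in the table, and the position of the 450-hour bulb relative to them, can be computed with a short Python sketch using the mean and standard deviation from the problem. The z-score (distance from the mean in standard deviations) shows the bulb lies just outside the ±2s interval.

```python
mean_life = 500   # hours (from the problem)
sd = 24           # hours (from the problem)
burnout = 450     # hours

# mean ± k standard deviations for k = 1, 2, 3
intervals = [(mean_life - k * sd, mean_life + k * sd) for k in (1, 2, 3)]
for k, (lo, hi) in zip((1, 2, 3), intervals):
    print(f"mean ± {k}s: {lo} to {hi} hours")

z = (burnout - mean_life) / sd   # standard deviations from the mean
print(f"z = {z:.2f}")            # z = -2.08
```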

Kurtosis

Kurtosis definition: In probability theory and statistics, kurtosis (from the Greek word kurtos, meaning bulging) is traditionally described as a measure of the "peakedness" of the probability distribution of a real-valued random variable, although it is more accurately a measure of how heavy the tails are. Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly sized deviations.
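One common way to compute kurtosis is the moment ratio m4/m2², where mr is the average r-th power of the residuals; a normal distribution has kurtosis 3, so k − 3 is called excess kurtosis. The Python sketch below uses this simple moment form on made-up data. Statistical packages such as SPSS may report a sample-size-adjusted excess kurtosis, so their numbers can differ somewhat.

```python
def kurtosis(data):
    """Moment-based kurtosis m4 / m2**2 (no small-sample correction)."""
    n = len(data)
    center = sum(data) / n
    m2 = sum((x - center) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - center) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values
k = kurtosis(data)
print(k, k - 3)   # 2.78125 -0.21875  (kurtosis, excess kurtosis)
```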

SPSS Workshop

Part 3: Summarizing Data (Descriptive Statistics)
