
Chapter 3

Numerical Descriptive Measures


Last (Family) Name: __________________________________.
First (Given) Name: __________________________________
INTRODUCTION

FROM PREVIOUS CHAPTERS:


A parameter is a numerical measure that describes a characteristic of a population.
A statistic is a numerical measure that describes a characteristic of a sample drawn from a population.
What are the following?
According to a Pew survey, twenty-one percent of Americans believe that
President Obama is a Muslim.
The average age of children in our family is 17.
First time unemployment claims were down by 5% in the last quarter.
WHY NUMERICAL DESCRIPTIVE MEASURES?
Charts, frequency distributions, and cross-tabs reduce the detail in the data in ways that make it
easier to see key patterns in the data.
Numerical descriptive measures reduce detail even further; they say something important about a
variable with a single number.
The average age of our employees is 34.
Nearly everyone has more than the average number of legs.
More than 80% of our customers live within three miles of our store.


TYPES OF DESCRIPTIVE MEASURES:

Central tendency describes the extent to which all of the data values group around a
typical center value.

Variation describes the amount of dispersion, or scattering, of values away from a central
value.

Shape is the pattern of the distribution of values from the lowest value to the highest
value.


3.1 MEASURES OF CENTRAL TENDENCY

Where is the center of the distribution?
Is the measure of central tendency a meaningful typical value?

THE MEDIAN

The middle value in ordered data.
Ranked values (odd number)
Data: 12, 18, 27, 42, 150
Median = 27
Ranked values (even number)
Data: 4, 8, 15, 16, 23, 42
Tie for middle value; take the average of 15 and 16.
Median = 15.5
Works with numerical data.
Not sensitive to outliers.
Also works with ordered categorical data.
Rates self as above average: 42
Rates self as average: 19
Rates self as below average:
Median is "Rates self as above average."

THE MEAN

The mean is what most people call the average.
Mean = X̄ = sum of the values / number of values
Values: 1, 3, 5, 7, 9
Sum: 25
Number: 5
X̄ = 25/5 = 5
Only works with numerical data.
Data do not have to be ordered:
Values: 1, 9, 5, 7, 3
Sum: 25
Number: 5
X̄ = 25/5 = 5
A few outliers can throw off the mean.
Values: 1, 3, 5, 7, 30
Sum: 46
Number: 5
X̄ = 46/5 = 9.2; yet no value is even close to this.

THE MODE

The category with the largest frequency:
California: 35
New York:
Nebraska: 72
The mode is Nebraska.
Works with numerical and categorical data.
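
The median, mean, and mode examples above can be checked with a short script. This is a minimal sketch that assumes Python and its standard statistics module rather than Excel. The list of state labels is a hypothetical sample made up for illustration; it is not the frequency table from the notes.

```python
from statistics import mean, median, mode

# Median examples from the notes (odd and even number of values).
odd_data = [12, 18, 27, 42, 150]
even_data = [4, 8, 15, 16, 23, 42]
print(median(odd_data))    # the middle value
print(median(even_data))   # the average of the two middle values

# One outlier moves the mean but leaves the median where it was.
typical = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 30]
print(mean(typical), median(typical))
print(mean(with_outlier), median(with_outlier))

# The mode also works for categorical data.
# Hypothetical labels, invented for this sketch.
states = ["Nebraska", "California", "Nebraska", "New York", "Nebraska"]
print(mode(states))
```

Replacing 9 with 30 changes the mean from 5 to 9.2, while the median stays at 5.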


3.2: VARIATION AND SHAPE


THE RANGE

The range is the largest value minus the smallest value.


THE VARIANCE AND THE STANDARD DEVIATION

Measure the average scatter around the mean.


You do not have to know how to compute them.
However, you do need to know:
That the standard deviation (S) and the variance (S²) both measure scatter around the mean.
That the variance is the square of the standard deviation.
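
The relationship between the two measures can be seen in a small example. This is a sketch, assuming Python's standard statistics module and treating the values 1, 3, 5, 7, 9 from the mean example as a sample; the library does the computation, so no formula needs to be memorized.

```python
import math
from statistics import stdev, variance

values = [1, 3, 5, 7, 9]  # treated here as a sample (an assumption)

value_range = max(values) - min(values)  # largest value minus smallest value
s = stdev(values)                        # sample standard deviation, S
s_squared = variance(values)             # sample variance, S squared

print(value_range, s, s_squared)
# The variance is the square of the standard deviation (up to rounding).
print(math.isclose(s ** 2, s_squared))
```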
THE COEFFICIENT OF VARIATION

A relative measure of scatter compared to the size of the mean.


CV = (S / X̄) x 100%
Good for comparing things that have different units of measurement, like the weight and volume
differences among packing crates.
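
To illustrate the packing-crate comparison, here is a minimal sketch in Python. The crate weights and volumes are hypothetical numbers invented for the example, and coefficient_of_variation is a helper name chosen here, not a library function.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the sample CV as a percentage: (S / mean) * 100."""
    return stdev(values) / mean(values) * 100

# Hypothetical crate measurements in different units.
weights_kg = [10, 12, 14, 16, 18]
volumes_liters = [100, 101, 102, 103, 104]

cv_weight = coefficient_of_variation(weights_kg)
cv_volume = coefficient_of_variation(volumes_liters)
print(round(cv_weight, 2), round(cv_volume, 2))
```

Because the CV is a percentage of the mean, the two results can be compared directly even though one series is in kilograms and the other in liters. In this made-up data the weights vary far more, relative to their mean, than the volumes do.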
Z SCORES

The Z score measures how many standard deviations a value lies from the mean; it is used to spot outliers.
Z = (X - X̄) / S
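
A short sketch of the Z score formula, assuming Python's standard statistics module and treating the outlier example from the mean section (1, 3, 5, 7, 30) as a sample. The function name z_score is chosen here for illustration.

```python
from statistics import mean, stdev

def z_score(x, values):
    """Z = (X - mean) / S, using the sample standard deviation."""
    return (x - mean(values)) / stdev(values)

values = [1, 3, 5, 7, 30]
z_for_30 = z_score(30, values)
z_for_1 = z_score(1, values)
print(round(z_for_30, 2), round(z_for_1, 2))
```

A positive Z score means the value is above the mean and a negative one means it is below.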


3.3 NUMERICAL DESCRIPTIVE MEASURES OF A POPULATION

Means, standard deviations, and variances for the entire population, not just a sample
Measure               Sample    Population
Mean                  X̄         μ (mu)
Standard Deviation    S         σ (lower-case sigma)
Variance              S²        σ²

Note: There are slight differences in how sample and population standard deviations and
variances are measured, but you can ignore them.
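
For readers curious about the slight difference mentioned in the note: the sample versions divide the squared deviations by n - 1, while the population versions divide by N. A minimal sketch, assuming Python's standard statistics module, which provides both forms:

```python
from statistics import pstdev, pvariance, stdev, variance

values = [1, 3, 5, 7, 9]

sample_variance = variance(values)        # divides by n - 1
population_variance = pvariance(values)   # divides by N
sample_sd = stdev(values)
population_sd = pstdev(values)

print(sample_variance, population_variance)
print(round(sample_sd, 3), round(population_sd, 3))
```

The sample figures are always a little larger than the population figures for the same values, and the gap shrinks as the number of values grows.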
THE EMPIRICAL RULE AND THE CHEBYSHEV RULE
Percentage of Values Within the Interval

Interval                              Normal Bell-Shaped Distribution    Any Distribution
                                      (Empirical Rule)                   (Chebyshev Rule)
Mean +/- one standard deviation       Approximately 68%                  NA
Mean +/- two standard deviations      Approximately 95%                  At least 75%
Mean +/- three standard deviations    Approximately 99.7%                At least 88.89%

Typically, a value more than two standard deviations from the mean is viewed as an outlier.
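
The Chebyshev percentages in the table come from the bound 1 - 1/k², where k is the number of standard deviations on either side of the mean. A minimal sketch in Python; the function name is chosen here for illustration.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of values within k standard deviations of the mean.

    The bound 1 - 1/k**2 holds for any distribution but is only
    informative for k greater than 1.
    """
    if k <= 1:
        return 0.0
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(k, round(chebyshev_min_fraction(k) * 100, 2))
```

For k = 1 the bound is zero, which says nothing useful; that is why the table shows NA for one standard deviation.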


COMPUTING KEY STATISTICS IN EXCEL


3.4 QUARTILES AND THE BOX PLOT

QUARTILES

A crude but common way of expressing ranges


Process
Order the data
2, 3, 4, 5, 6, 7, 8, 12
Divide it into four parts
2, 3 | 4, 5 | 6, 7 | 8, 12
The lowest 25% of the data values form the first quartile, Q1 (2, 3)
The next 25% of the data values form the second quartile, Q2 (4, 5)
The next 25% of the data values form the third quartile, Q3 (6, 7)
The highest 25% of the data values form the fourth quartile, Q4 (8, 12)
EXAMPLES

In General
Q1 = the (N+1)/4 ranked value
Q2 = the 2(N+1)/4 ranked value
Q3 = the 3(N+1)/4 ranked value
Example
Times in Minutes:
29, 31, 35, 39, 39, 40, 43, 44, 46, 52
10 Values
Q1 = (10+1)/4 = 2.75th value
Round to the nearest rank: this is 35 (the third value)
Q2 = 2*(10+1)/4 = 5.5th value
The 5th value is 39
The 6th value is 40
Halfway between two ranks, so average them: 39.5
Q3 = 3*(10+1)/4 = 8.25th value

This is 44 (the 8th value)
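
The ranked-value rule used in this example can be sketched in Python as follows. The rounding behavior is inferred from the worked example (a position ending in .5 averages the two neighboring values, anything else rounds to the nearest rank), and ranked_quartile is a helper name chosen here, so treat this as an illustration under those assumptions.

```python
import math

def ranked_quartile(sorted_values, k):
    """Quartile k (1, 2, or 3) by the k(N+1)/4 ranked-value rule."""
    n = len(sorted_values)
    position = k * (n + 1) / 4       # 1-based rank, possibly fractional
    whole = math.floor(position)
    fraction = position - whole
    if fraction == 0:
        return sorted_values[whole - 1]
    if fraction == 0.5:
        # Halfway between two ranks: average the neighboring values.
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Otherwise round to the nearest rank.
    nearest = whole + 1 if fraction > 0.5 else whole
    return sorted_values[nearest - 1]

times = [29, 31, 35, 39, 39, 40, 43, 44, 46, 52]
q1, q2, q3 = (ranked_quartile(times, k) for k in (1, 2, 3))
print(q1, q2, q3)
```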


THE INTERQUARTILE RANGE

The difference between the third and first quartiles.
It spans the middle 50 percent of the data.
Times in Minutes:
29, 31, 35, 39, 39, 40, 43, 44, 44, 52
10 Values
Q1 = (10+1)/4 = 2.75th value
This is 35 (the third value)
Q3 = 3*(10+1)/4 = 8.25th value
This is 44 (the 8th value)
Interquartile range = Q3 - Q1 = 44 - 35 = 9 minutes
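
Software does not always round to the nearest rank. As a hedged illustration, Python's standard statistics.quantiles function (default "exclusive" method) uses the same (N+1)-based positions but interpolates between neighboring values instead of rounding, so its quartiles can differ slightly from the hand calculation.

```python
from statistics import quantiles

times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(times, n=4)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

With this data the interpolated Q1 is 34 rather than 35, so the interquartile range comes out as 10 minutes instead of 9. Neither answer is wrong; they follow different conventions, so it is worth stating which one a report uses.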
THE FIVE-NUMBER SUMMARY
THE BOXPLOT


3.5 CORRELATION

RELATIONSHIPS

Relationships describe how two variables behave relative to one another.


POSITIVE RELATIONSHIP
As one variable increases, the other variable
also increases or
As one variable decreases, the other variable
also decreases.

NEGATIVE RELATIONSHIP
As one variable increases, the other variable
decreases or
As one variable decreases, the other variable
increases.

NO RELATIONSHIP
As one variable changes, the other variable shows no consistent
increase or decrease.


THE COVARIANCE
THE COEFFICIENT OF CORRELATION

The correlation coefficient r (Pearson's r)
Is a number that describes the strength and direction of the linear relationship between two variables.
The Excel function is CORREL(data series 1, data series 2)
Interpreting the correlation coefficient
1.0 is perfect positive correlation (rare)
0 is no linear correlation
-1.0 is perfect negative correlation (rare)
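
To show what a function such as CORREL computes, here is a minimal from-scratch sketch of Pearson's r in Python. The data series are hypothetical and were invented to show the three kinds of relationship; pearson_r is a helper name chosen here.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations; dividing this by n - 1
    # would give the sample covariance.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical series, invented for illustration.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves with x
falling = [10, 8, 6, 4, 2]   # moves against x
mixed = [3, 1, 4, 1, 5]      # no clear pattern

print(pearson_r(x, rising), pearson_r(x, falling), pearson_r(x, mixed))
```

The first two series give the perfect positive and perfect negative cases; real data almost always lands somewhere in between.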


3.6 PRESENTING DESCRIPTIVE STATISTICS: PITFALLS AND ETHICAL ISSUES

You need to summarize data to understand it.


Single numerical measures can be very powerful.
However, they may summarize too much and lose important specifics.
Often, when you read a report, you are only given one or two measures.
This may leave you unable to interpret the results meaningfully.
Example: You cannot tell whether the mean and median differ, which would indicate a
skewed distribution.
Example: You cannot tell whether the distribution is bimodal, in which case the mean and median
may be values that no items in the distribution actually take.
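
The bimodal case can be made concrete with a small sketch in Python. The ratings below are hypothetical values invented for illustration, clustered at the low and high ends of a scale.

```python
from statistics import mean, median

# Hypothetical bimodal data: one cluster near 1, another near 9.
ratings = [1, 1, 1, 2, 9, 9, 9, 10]

center_mean = mean(ratings)
center_median = median(ratings)
print(center_mean, center_median)

# Neither summary value actually occurs in the data.
print(center_mean in ratings, center_median in ratings)
```

A report that gave only the mean or the median here would suggest a typical rating near the middle of the scale, where no item actually sits.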

You must not be deceptive in what you report.


Do not give a single number that is not characteristic of the distribution.
Do not give single numbers that rely on characteristics your distribution does not have.
You must report the results fairly, not withholding information that detracts from the point you
are trying to make.


HOMEWORK

1. CREATE A SCATTER PLOT FOR ADVERTISING PER CAPITA AND SALES PER CAPITA USING THE FOLLOWING DATA.

To do this, select the two data series and not the header row.
Then go to the Insert tab and select Scatter Chart.
Make Sales Per Capita the Vertical Title
Make Advertising Per Capita the Horizontal Title

Region           Advertising pc    Sales pc
Honolulu         $10               $153
Orlando          $6                $135
Omaha            $3                $116
Ventura          $2                $115
Dallas           $4                $126
Chicago          $1                $114
Nashville        $7                $140
Boston           $9                $153
San Francisco    $5                $127
Boise            $8                $143

Paste your chart onto the following line:

*
Compute Pearson's r
The value you computed: ______________________

2. CREATE A SCATTER CHART FOR PRICE AND UNITS SOLD USING THE FOLLOWING DATA.

Use directions from Problem 1 but with these column names.
The table is for a sample.

Region           Price    Units Sold
San Francisco    5        21.7319557
Dallas                    32.1765348
Omaha                     35.6313442
Orlando                   21.6527418
Chicago                   29.7849633
Ventura                   36.4823921
Boston                    25.1103128
Boise                     25.6841931
Honolulu         10       12.0050118
Nashville                 30.7900904

Chart:
*

Correlation Coefficient
Answer:

3. CREATE A SCATTER CHART FOR SALES PER CAPITA AND INCOME PER CAPITA USING THE FOLLOWING DATA.

Use directions from Problem 1 but with these column names.
The table is for a population.

Region           Sales pc    Income pc
Honolulu         $56         $46,079
Orlando          $48         $26,355
Omaha            $3          $83,108
Ventura          $54         $23,164
Dallas           $34         $74,177
Chicago          $34         $34,859
Nashville        $12         $16,414
Boston           $74         $92,937
San Francisco    $44         $31,164
Boise            $100        $89,303

Chart:
*

Correlation Coefficient
Answer:

4. FOR SALES PER CAPITA, COMPUTE THE FOLLOWING STATISTICS.

The table is for a population.


City             Sales per Capita
Chicago          $145.66
Ventura          $127.27
Omaha            $145.41
Dallas           $120.69
San Francisco    $151.42
Orlando          $120.92
Nashville        $149.06
Boise            $156.05
Boston           $125.91
Honolulu         $121.10
Cheyenne         $165.61
Raleigh          $138.08
Detroit          $112.99
Denver           $140.55
Seattle          $157.04
Bismarck         $140.40
New York         $161.95
Trenton          $109.60
Topeka           $139.17
New Orleans      $163.87

Mean: ______
Median: _____
Standard Deviation: _______
Variance: __________
Z value for Trenton: ________
Z value for Bismarck: _________

