
Numerical Descriptive Measures

Measures of Central Tendency: Mean, Median, Mode, Geometric Mean
Quartiles
Measures of Variation: Range, Interquartile Range, Variance and Standard Deviation, Coefficient of Variation
Shape: Symmetric, Skewed, Using Box-and-Whisker Plots
Coefficient of Correlation
Pitfalls in Numerical Descriptive Measures and Ethical Issues

Summary Measures

Central Tendency: Mean, Median, Mode, Geometric Mean
Quartiles
Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Introduction
Think of a sample portfolio composed of three stocks:

100 shares, ARR = 10%
200 shares, ARR = 15%
100 shares, ARR = 20%

A central measure of this portfolio's ARR is 15%.

Now observe the following portfolio:

100 shares, ARR = 5%
200 shares, ARR = 15%
100 shares, ARR = 25%

A central measure of this portfolio's ARR is 15% too.

Considering the average ARR only, the two portfolios are equal. But are they really? Is the dispersion of ARR the same for the two portfolios?

The dispersion (variability) is an important property when describing a set of numbers, at least as important as the central location.
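The comparison of the two portfolios can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pairing of share counts with ARR values is read off the slide layout, the 15% figure is assumed to be a mean weighted by number of shares, and the weighted (population-style) standard deviation is just one possible way to express dispersion.

```python
from math import sqrt

def weighted_mean(values, weights):
    """Mean of values, each weighted by its share count."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_std(values, weights):
    """Weighted spread around the weighted mean (population-style)."""
    m = weighted_mean(values, weights)
    squared = sum(w * (v - m) ** 2 for v, w in zip(values, weights))
    return sqrt(squared / sum(weights))

# Assumed reading of the slide: share counts and ARR (in percent).
shares = [100, 200, 100]
portfolio_1 = [10.0, 15.0, 20.0]
portfolio_2 = [5.0, 15.0, 25.0]

mean_1 = weighted_mean(portfolio_1, shares)
mean_2 = weighted_mean(portfolio_2, shares)
spread_1 = weighted_std(portfolio_1, shares)
spread_2 = weighted_std(portfolio_2, shares)

print(f"Portfolio 1: mean ARR = {mean_1:.1f}%, spread = {spread_1:.2f}")
print(f"Portfolio 2: mean ARR = {mean_2:.1f}%, spread = {spread_2:.2f}")
```

Both portfolios share the same central value, yet the second one is twice as spread out, which is the point of the question above.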

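Measures of Central Tendency

Central Tendency: Mean, Median, Mode, Geometric Mean

Mean: \( \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \) (sample), \( \mu = \frac{\sum_{i=1}^{N} X_i}{N} \) (population)

Geometric Mean: \( \bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \)

2004 Prentice-Hall, Inc.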

Measures of Central Tendency

The central data point reflects the locations of all the actual data points. How?

With one data point, the central location is clearly at the point itself.
With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
If a third data point appears in the center, the measure of central location will remain in the center.
But if the third data point appears on the left-hand side of the midrange, it should pull the central location to the left.
As more and more data points are added, the central location moves (left and right) as required in order to reflect the effects of all the points.

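Mean (Arithmetic Mean)

Mean (Arithmetic Mean) of Data Values

Sample mean (n = sample size):
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

Population mean (N = population size):
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} \]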

Mean (Arithmetic Mean) (continued)

The Most Common Measure of Central Tendency
Affected by Extreme Values (Outliers)

[Figure: data plotted on an axis from 0 to 10, Mean = 5]
[Figure: data plotted on an axis from 0 to 14, Mean = 6]

Median
Robust Measure of Central Tendency
Not Affected by Extreme Values

[Figure: data plotted on an axis from 0 to 10, Median = 5]
[Figure: data plotted on an axis from 0 to 14, Median = 5]

In an Ordered Array, the Median is the Middle Number
If n or N is odd, the median is the middle number
If n or N is even, the median is the average of the 2 middle numbers
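A short Python sketch of the contrast between mean and median. The data values are hypothetical: the plotted points did not survive on the slides, so the lists below were chosen only because they give the same summary values (mean 5 and 6, median 5 in both cases).

```python
from statistics import mean, median

# Hypothetical data sets (not taken from the slides).
base = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]   # largest value replaced by an extreme one
even_sized = [1, 3, 5, 7]         # even n: median averages the 2 middle numbers

mean_base, median_base = mean(base), median(base)
mean_out, median_out = mean(with_outlier), median(with_outlier)
median_even = median(even_sized)

print("base:        ", mean_base, median_base)
print("with outlier:", mean_out, median_out)
print("even-sized median:", median_even)
```

The extreme value moves the mean from 5 to 6 while the median stays at 5.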

Mode
A Measure of Central Tendency
Value that Occurs Most Often
Not Affected by Extreme Values
There May Not Be a Mode
There May Be Several Modes
Used for Either Numerical or Categorical Data

[Figure: data plotted on an axis from 0 to 14, Mode = 9]
[Figure: data plotted on an axis from 0 to 6, No Mode]
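The three cases (one mode, several modes, no mode) can be sketched as follows. The helper name `modes` and the convention "no mode when every value occurs exactly once" are assumptions made for this illustration, and the sample lists are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Assumed convention: every value occurs once, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 9, 9, 3, 9, 12]))        # one mode
print(modes([1, 1, 2, 2, 3]))            # several modes
print(modes([0, 1, 2, 3, 4, 5, 6]))      # no mode
print(modes(["red", "blue", "red"]))     # categorical data
```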

Geometric Mean
Useful in the Measure of Rate of Change of a Variable Over Time

\[ \bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n} \]

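Geometric Mean Rate of Return

Measures the status of an investment over time

\[ \bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1 \]

Example
An investment of $100,000 declined to $50,000 at the end of year one and rebounded back to $100,000 at the end of year two:

\( R_1 = -0.5 \) (or -50%), \( R_2 = 1 \) (or 100%)

Average rate of return:
\[ \bar{R} = \frac{(-0.5) + (1)}{2} = 0.25 \text{ (or 25\%)} \]

Geometric rate of return:
\[ \bar{R}_G = [(1 + (-0.5)) \times (1 + 1)]^{1/2} - 1 = [(0.5) \times (2)]^{1/2} - 1 = 1^{1/2} - 1 = 0 \text{ (or 0\%)} \]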
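The example can be checked with a minimal Python sketch. The function names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
from math import prod

def arithmetic_mean_return(returns):
    """Plain average of the yearly rates of return."""
    return sum(returns) / len(returns)

def geometric_mean_return(returns):
    """[(1 + R1) x (1 + R2) x ... x (1 + Rn)]^(1/n) - 1."""
    growth = prod(1.0 + r for r in returns)
    return growth ** (1.0 / len(returns)) - 1.0

# Year one: -50%, year two: +100%, as in the example above.
returns = [-0.5, 1.0]
average = arithmetic_mean_return(returns)
geometric = geometric_mean_return(returns)
final_value = 100_000 * prod(1.0 + r for r in returns)

print(f"average rate of return:   {average:.2%}")
print(f"geometric rate of return: {geometric:.2%}")
print(f"final value: ${final_value:,.0f}")
```

The arithmetic average suggests a 25% yearly gain, although the investment ends where it started; the geometric rate of 0% matches the actual outcome.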

Quartiles
Split Ordered Data into 4 Quarters

25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

Position of i-th Quartile: \( Q_i = \frac{i(n+1)}{4} \)

Q1 and Q3 are Measures of Non-central Location
Q2 = Median, a Measure of Central Tendency

Quartiles
The lower half of a data set is the set of all values that are to the left of the median value when the data has been put into increasing order.
The upper half of a data set is the set of all values that are to the right of the median value when the data has been put into increasing order.
The first quartile, denoted by Q1, is the median of the lower half of the data set. This means that about 25% of the numbers in the data set lie below Q1 and about 75% lie above Q1.
The third quartile, denoted by Q3, is the median of the upper half of the data set. This means that about 75% of the numbers in the data set lie below Q3 and about 25% lie above Q3.

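Quartiles
Data in Ordered Array: 11 12 13 16 16 17 17 18 21 (n = 9, Median = 16)

Position of \( Q_1 = \frac{1(9+1)}{4} = 2.5 \), so \( Q_1 = \frac{12 + 13}{2} = 12.5 \)

Position of \( Q_3 = \frac{3(9+1)}{4} = 7.5 \), so \( Q_3 = \frac{17 + 18}{2} = 17.5 \)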
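A possible Python sketch of the position rule i(n+1)/4. The helper `quartile` is hypothetical, and the use of linear interpolation for fractional positions is an assumption: the slides only show positions ending in .5, where interpolation equals the average of the two neighbours. The standard library's `statistics.quantiles` with its "exclusive" method uses the same positions and is shown as a cross-check.

```python
from statistics import quantiles

def quartile(sorted_data, i):
    """Value at 1-based position i*(n+1)/4 in an ordered array."""
    n = len(sorted_data)
    if n < 3:
        raise ValueError("this sketch assumes at least 3 observations")
    position = i * (n + 1) / 4
    lower = int(position)          # whole part of the position
    fraction = position - lower    # 0.5 means halfway between neighbours
    if fraction == 0:
        return float(sorted_data[lower - 1])
    left, right = sorted_data[lower - 1], sorted_data[lower]
    return left + fraction * (right - left)

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(q1, q2, q3)
print(quantiles(data, n=4, method="exclusive"))
```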

Measures of Variation
Measures of central location fail to tell the whole story about the distribution.
A question of interest still remains unanswered:
How much are the values of a given set spread out around the mean value?

Measures of Variation

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation

Range
Measure of Variation
Difference between the Largest and the Smallest Observations:

\[ \text{Range} = X_{\text{Largest}} - X_{\text{Smallest}} \]

Ignores How Data are Distributed

[Figure: two data sets plotted on the same axis, each with Range = 12 - 7 = 5]

Interquartile Range
Measure of Variation
Also Known as Midspread
Spread in the middle 50%
Difference between the First and Third Quartiles

Data in Ordered Array: 11 12 13 16 16 17 17 18 21

\[ \text{Interquartile Range} = Q_3 - Q_1 = 17.5 - 12.5 = 5 \]

Not Affected by Extreme Values
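The following sketch contrasts the range with the interquartile range on the ordered array above. It assumes `statistics.quantiles` with the "exclusive" method as the quartile rule (it reproduces Q1 = 12.5 and Q3 = 17.5 here); the second data set, with an artificially extreme largest value, is hypothetical.

```python
from statistics import quantiles

def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

def interquartile_range(values):
    """Q3 - Q1, the spread of the middle 50% of the data."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
# Hypothetical variant: the largest value replaced by an extreme one.
data_extreme = [11, 12, 13, 16, 16, 17, 17, 18, 210]

print(value_range(data), interquartile_range(data))
print(value_range(data_extreme), interquartile_range(data_extreme))
```

The extreme value inflates the range but leaves the interquartile range unchanged.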

Variance
Important Measure of Variation
Shows Variation about the Mean

Sample Variance:
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]

Population Variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]

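The Variance
Example
Find the variance of the following set of numbers, representing annual rates of return for a group of mutual funds. Assume the set is (i) a sample, (ii) a population: -2, 4, 5, 6.9, 10

Solution:
The mean is \( (-2 + 4 + 5 + 6.9 + 10)/5 = 4.78 \), and the sum of squared deviations from the mean is 78.368.
Assuming a sample: \( S^2 = 78.368/(5 - 1) = 19.592 \)
Assuming a population: \( \sigma^2 = 78.368/5 = 15.6736 \)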
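The two variances for this example can be computed with a short Python sketch. The explicit function follows the definitions (divide by n - 1 for a sample, by N for a population); the standard library functions `statistics.variance` and `statistics.pvariance` are used only as a cross-check.

```python
from statistics import pvariance, variance

def sum_of_squared_deviations(values):
    """Sum of (x - mean)^2 over all values."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

returns = [-2, 4, 5, 6.9, 10]
ss = sum_of_squared_deviations(returns)

sample_variance = ss / (len(returns) - 1)   # treat the set as a sample
population_variance = ss / len(returns)     # treat the set as a population

print(f"sample variance:     {sample_variance:.4f}")
print(f"population variance: {population_variance:.4f}")
print(f"library cross-check: {variance(returns):.4f}, {pvariance(returns):.4f}")
```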

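Standard Deviation
Most Important Measure of Variation
Shows Variation about the Mean
Has the Same Units as the Original Data

Sample Standard Deviation:
\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]

Population Standard Deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]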

Standard Deviation
Example

The daily percentages of defective items in two weeks of production (10 working days) were calculated for two production lines. Which line provides good items more consistently?

Line 1: 8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05
Line 2: 12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4

Standard Deviation
Solution:
Line 1: mean = 17.735, sample standard deviation s ≈ 14.57
Line 2: mean = 12.82, sample standard deviation s ≈ 8.76

Line 1 should be considered less consistent because the standard deviation of its defective proportion is larger (and therefore the standard deviation of its good-item proportion is also larger).
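A minimal Python sketch of the comparison between the two production lines, treating each set of 10 days as a sample (an assumption, hence `statistics.stdev` with the n - 1 divisor). It also checks that subtracting each defective percentage from 100 to get the percentage of good items leaves the standard deviation unchanged.

```python
from statistics import stdev

line_1 = [8.3, 6.2, 20.9, 2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.05]
line_2 = [12.1, 2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, 1.3, 11.4]

s1 = stdev(line_1)
s2 = stdev(line_2)

# Percentage of good items per day; shifting the data does not change its spread.
good_1 = [100.0 - x for x in line_1]
s1_good = stdev(good_1)

more_consistent = "Line 2" if s2 < s1 else "Line 1"
print(f"Line 1: s = {s1:.2f} (good items: s = {s1_good:.2f})")
print(f"Line 2: s = {s2:.2f}")
print("More consistent:", more_consistent)
```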

Interpreting the Standard Deviation

The standard deviation can be used to:
compare the variability of several distributions
make a statement about the general shape of a distribution

When describing the shape of a distribution we refer to:
A distribution with any shape
A mound-shaped distribution

Standard Deviation From a Frequency Distribution

Approximating the Standard Deviation
Used when the raw data are not available and the only source of data is a frequency distribution

\[ S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}} \]

n = sample size
c = number of classes in the frequency distribution
m_j = midpoint of the jth class
f_j = frequency of the jth class
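The approximation can be sketched in Python as below. Two assumptions are made: the mean is itself approximated from the class midpoints and frequencies (the slide does not say how the mean is obtained when raw data are unavailable), and the three classes used here are hypothetical.

```python
from math import sqrt

def grouped_std(midpoints, frequencies):
    """Approximate sample standard deviation from a frequency distribution."""
    n = sum(frequencies)
    # Assumption: the mean is approximated from midpoints and frequencies.
    grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    squared = sum(f * (m - grouped_mean) ** 2
                  for m, f in zip(midpoints, frequencies))
    return sqrt(squared / (n - 1))

# Hypothetical classes 10-20, 20-30, 30-40 with their frequencies.
midpoints = [15, 25, 35]
frequencies = [2, 5, 3]

s_approx = grouped_std(midpoints, frequencies)
print(f"approximate standard deviation: {s_approx:.4f}")
```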

Comparing Standard Deviations

[Figure: three data sets, each plotted on an axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57

Coefficient of Variation
Measure of Relative Variation
Always in Percentage (%)
Shows Variation Relative to the Mean
Used to Compare Two or More Sets of Data Measured in Different Units

\[ CV = \left( \frac{S}{\bar{X}} \right) 100\% \]

Sensitive to Outliers

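Comparing Coefficient of Variation

Stock A:
Average price last year = $50
Standard deviation = $2

Stock B:
Average price last year = $100
Standard deviation = $5

Coefficient of Variation:
Stock A: \( CV = \left( \frac{S}{\bar{X}} \right) 100\% = \left( \frac{\$2}{\$50} \right) 100\% = 4\% \)
Stock B: \( CV = \left( \frac{S}{\bar{X}} \right) 100\% = \left( \frac{\$5}{\$100} \right) 100\% = 5\% \)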
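The stock comparison can be reproduced with a small Python sketch; the function name and the guard against a zero mean are illustrative choices, not part of the slides.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation relative to the mean, in percent."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100.0 * std_dev / mean

cv_a = coefficient_of_variation(std_dev=2, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B

print(f"Stock A: CV = {cv_a:.0f}%")
print(f"Stock B: CV = {cv_b:.0f}%")
```

Relative to its average price, Stock B varies more than Stock A.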

Shape of a Distribution
Describe How Data are Distributed
Measures of Shape: Symmetric or skewed

Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
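As a rough illustration, the position of the mean relative to the median can be turned into a quick check in Python. This is only a heuristic sketch: equal mean and median do not guarantee a symmetric distribution, and the function name, tolerance, and sample data are assumptions made for the example.

```python
from statistics import mean, median

def shape_from_mean_median(values, tolerance=1e-9):
    """Guess the shape by comparing mean and median (heuristic only)."""
    m, med = mean(values), median(values)
    if abs(m - med) <= tolerance:
        return "symmetric"
    return "right-skewed" if m > med else "left-skewed"

# Hypothetical data sets.
print(shape_from_mean_median([1, 2, 3, 4, 5]))     # mean equals median
print(shape_from_mean_median([1, 2, 2, 3, 10]))    # long tail to the right
print(shape_from_mean_median([1, 8, 9, 9, 10]))    # long tail to the left
```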

Exploratory Data Analysis

Box-and-Whisker Plot
Graphical display of data using 5-number summary:
\( X_{\text{smallest}} \), \( Q_1 \), Median (\( Q_2 \)), \( Q_3 \), \( X_{\text{largest}} \)
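The 5-number summary behind a box-and-whisker plot can be sketched in Python. The data are the ordered array from the quartile example; using `statistics.quantiles` with the "exclusive" method as the quartile rule is an assumption that happens to reproduce Q1 = 12.5 and Q3 = 17.5 for these values.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return (min(values), q1, q2, q3, max(values))

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
summary = five_number_summary(data)
print(summary)
```

The box of the plot spans Q1 to Q3 with a line at the median, and the whiskers extend to the smallest and largest values.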

Distribution Shape & Box-and-Whisker Plot

[Figure: three box-and-whisker plots, each marking Q1, Q2 and Q3, for the shapes Left-Skewed, Symmetric and Right-Skewed]

The Empirical Rule

For Data Sets That Are Approximately Bell-shaped:
Roughly 68% of the Observations Fall Within 1 Standard Deviation Around the Mean
Roughly 95% of the Observations Fall Within 2 Standard Deviations Around the Mean
Roughly 99.7% of the Observations Fall Within 3 Standard Deviations Around the Mean
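The rule can be checked empirically with a small simulation. This is a sketch under stated assumptions: the data are simulated from a normal distribution (mean 50, standard deviation 10, fixed seed), all chosen arbitrarily for the illustration, and the observed shares will only be close to 68%, 95% and 99.7%, not exactly equal.

```python
import random
from statistics import mean, stdev

def share_within(values, k):
    """Fraction of observations within k standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    inside = sum(1 for v in values if abs(v - m) <= k * s)
    return inside / len(values)

# Simulated bell-shaped data (hypothetical parameters, fixed seed).
rng = random.Random(0)
sample = [rng.gauss(50.0, 10.0) for _ in range(20_000)]

shares = {k: share_within(sample, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```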
