
Descriptive

Statistics
BMS1024 MANAGERIAL STATISTICS

Key Concepts
The central location is the extent to which all the data values group around a typical or central value.

The variation is the amount of dispersion, or scattering, of values.

The shape is the pattern of the distribution of values from the lowest value to the highest value.

Numerical Descriptive
Measures for a Sample
SUMMARY MEASURES
Central Location
Mean
Median
Mode

Variation
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation

Quartiles
The five-number summary consists of the minimum value, Q1, median, Q3 and maximum value.

Measures of Central
Location
SUMMARY
Central Location

Arithmetic Mean:
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]

Median: Middle value in the ordered array

Mode: Most frequently observed value

Measures of Central
Location
THE ARITHMETIC MEAN
The arithmetic mean (mean) is the most common measure of central location.

For a sample of size n:

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

where n is the sample size and X_1, X_2, ..., X_n are the observed values.

Measures of Central
Location
THE ARITHMETIC MEAN
The most common measure of central location
Mean = sum of values divided by the number of values
Affected by extreme values (outliers)


EXAMPLE 1
The data given below are the 2008 profits (rounded to
billions of dollars) of 12 companies selected from all over
the world.
8 12 7 17 14 45 10
13 17 13 9 11
Compute the mean profit for these companies.
Solution:

\[ \bar{X} = \frac{\sum_{i=1}^{12} X_i}{12} = \frac{8 + 12 + \cdots + 11}{12} = \frac{176}{12} = 14.67 \]

The mean profit is $14.67 billion.
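
The same calculation can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and not part of the course material.

```python
from statistics import mean

# 2008 profits (billions of dollars) of the 12 companies in Example 1
profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]

total = sum(profits)          # sum of the observed values
sample_mean = mean(profits)   # total divided by the sample size n = 12

print(total, round(sample_mean, 2))  # 176 14.67
```

Dropping the value 45 and recomputing shows how strongly a single extreme value pulls the mean.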



Measures of Central
Location
THE MEDIAN
In an ordered array, the median is the middle number (50% above, 50% below).

Not affected by extreme values

Measures of Central
Location
THE MEDIAN
The median of an ordered set of data is located at the (n+1)/2 ranked value.

If the number of values is odd, the median is the middle number.
If the number of values is even, the median is the average of the two middle numbers.

Note that (n+1)/2 is NOT the value of the median, only the position of the median in the ranked data.

EXAMPLE 2
Refer to the same data as in Example 1.
Compute the median profit for those companies.
Solution:

First, rank the values:

7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45

The median is at position (12+1)/2 = 6.5, that is, between the 6th and 7th ranked values.

Hence, the median = (12 + 13)/2 = 12.5

The median profit is $12.5 billion.
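
The (n+1)/2 position rule can be sketched in Python as follows. The helper name `median_by_position` is hypothetical and chosen only for this illustration; Python's own `statistics.median` gives the same result.

```python
def median_by_position(values):
    """Median located at the (n + 1) / 2 ranked position (1-based)."""
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) / 2
    if position.is_integer():
        # Odd n: the position is a whole number, take that ranked value
        return ranked[int(position) - 1]
    # Even n: the position ends in .5, average the two middle values
    lower = ranked[int(position) - 1]
    upper = ranked[int(position)]
    return (lower + upper) / 2

profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]
print(median_by_position(profits))  # 12.5
```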



Measures of Central
Location
THE MODE
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes


EXAMPLE 3
Refer to the same data as in Example 1.
Determine the mode for the data.
Solution:

First, rank the values:

7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45

13 occurs twice. 17 also occurs twice.

The modes are $13 billion and $17 billion.
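
A small Python sketch of mode finding that handles several modes and the no-mode case. The function name `modes` is hypothetical, and treating "no value repeats" as "no mode" is an assumption that follows the slide on the mode.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: every value occurs once, so no mode
    return sorted(v for v, c in counts.items() if c == top)

profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]
print(modes(profits))  # [13, 17]
```

Because it only counts occurrences, the same function also works for categorical data such as a list of strings.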


Measures of Central
Location
Review Example
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum: $3,000,000

Mean: $3,000,000/5 = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000

Learn how to compute the mean using the Mode SD function on your scientific calculator!

Measures of Central
Location
WHICH MEASURE TO CHOOSE?
The mean is generally used, unless extreme values (outliers) exist.

In that case the median is often used, since the median is not sensitive to extreme values. For example, median home prices may be reported for a region because the median is less sensitive to outliers.

Quartile Measures
Quartiles split the ranked data into 4 segments with an equal number of values per segment.

[Figure: ranked data divided into four 25% segments by Q1 (lower quartile), Q2 and Q3 (upper quartile)]

The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
Q2 is the same as the median (50% are smaller, 50% are larger).
Only 25% of the values are greater than the third quartile, Q3.

Quartile Measures
LOCATING QUARTILES
Find a quartile by determining the value in the appropriate position in the ranked data, where

First quartile position: Q1 = (n+1)/4 ranked value
Second quartile position: Q2 = (n+1)/2 ranked value
Third quartile position: Q3 = 3(n+1)/4 ranked value

and n is the number of observed values.

Quartile Measures
GUIDELINES
Rule 1: If the result is a whole number, then the quartile is equal to that ranked value.
Rule 2: If the result is a fractional half (2.5, 3.5, etc.), then the quartile is equal to the average of the two corresponding ranked values.
Rule 3: If the result is neither a whole number nor a fractional half, round the result to the nearest integer and select that ranked value.

Quartile Measures
LOCATING THE FIRST QUARTILE
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
First, note that n = 9.
Q1 is at the (9+1)/4 = 2.5 ranked position, so use the value halfway between the 2nd and 3rd ranked values:

Q1 = (12 + 13)/2 = 12.5

Q1 and Q3 are measures of non-central location.

Q2 = median, a measure of central location.

EXAMPLE 4
Refer to the company profit data from Example 1.
Find all quartiles and interpret their results.
Solution:

First, compute the position of each quartile value.

Then, find the quartile value in the ranked data:
7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45

Quartile   Position                           Value
Q1         (12+1)/4 = 3.25, rounded to 3      9
Q2         (12+1)/2 = 6.5                     12.5
Q3         3(12+1)/4 = 9.75, rounded to 10    17

EXAMPLE 4 Continued
Q1 = 9. This result indicates that 25% of the companies have profits equal to or less than $9 billion. Also, 75% of the companies have profits equal to or greater than $9 billion.
Q2 = 12.5. This result indicates that 50% of the companies have less than $12.5 billion profit. Also, 50% of the companies have more than $12.5 billion profit.
Q3 = 17. This result indicates that 75% of the companies have less than $17 billion profit. Also, 25% of the companies have profits equal to or greater than $17 billion.
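
The position formulas and the three rounding rules from the guidelines can be sketched in Python as below. The function names are hypothetical, and the sketch assumes the sample is large enough that every computed position falls inside the ranked data. Note that for n = 12 the positions are 3.25, 6.5 and 9.75, so Rule 3 selects the 3rd and 10th ranked values for Q1 and Q3.

```python
import math

def ranked_value(ranked, position):
    """Value at a 1-based position in ranked data, using Rules 1 to 3."""
    if position == int(position):            # Rule 1: whole number
        return ranked[int(position) - 1]
    if position * 2 == int(position * 2):    # Rule 2: fractional half
        low = int(position)
        return (ranked[low - 1] + ranked[low]) / 2
    nearest = math.floor(position + 0.5)     # Rule 3: round to nearest rank
    return ranked[nearest - 1]

def quartiles(values):
    """Return (Q1, Q2, Q3) using positions k(n + 1)/4 for k = 1, 2, 3."""
    ranked = sorted(values)
    n = len(ranked)
    return tuple(ranked_value(ranked, k * (n + 1) / 4) for k in (1, 2, 3))

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quartiles(values)
    return min(values), q1, q2, q3, max(values)

profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]
print(quartiles(profits))  # (9, 12.5, 17)
```

Other textbooks and software packages interpolate between ranked values instead of rounding, so their quartiles can differ slightly from these.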


Measures of Variation/
Dispersion/ Variability
Variation measures the spread, or dispersion, of
values in a data set:
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation

[Figure: small dispersion versus large dispersion]

Measures of Variation
RANGE

Simplest measure of variation

Difference between the largest and the smallest values:

Range = Xlargest - Xsmallest

Measures of Variation
RANGE

Disadvantages of the range:

Ignores the way in which data are distributed

Sensitive to outliers


Measures of Variation
INTERQUARTILE RANGE

Problems caused by outliers can be reduced by using the interquartile range (IQR).

The IQR sets aside the lowest 25% and the highest 25% of the values and measures the range of the remaining middle 50%.

Interquartile Range = 3rd quartile - 1st quartile = Q3 - Q1

Measures of Variation
INTERQUARTILE RANGE

[Figure: example of a five-number summary]

The IQR isn't affected by outliers because it shows only the dispersion in the middle 50% of the data!

EXAMPLE 5
Refer to the following data: (continuation of previous
examples)
8 12 7 17 14 45 10
13 17 13 9 11
Compute the range and interquartile range.
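Solution:

Range = 45 - 7 = 38

Q1 is at position (12+1)/4 = 3.25, which rounds to the 3rd ranked value, so Q1 = 9.
Q3 is at position 3(12+1)/4 = 9.75, which rounds to the 10th ranked value, so Q3 = 17.

IQR = Q3 - Q1 = 17 - 9 = 8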
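
As a rough check of this example, the sketch below recomputes the range and the IQR for this data set. It is specific to n = 12, where both quartile positions (3.25 and 9.75) are neither whole numbers nor halves, so the round-to-nearest-rank rule applies. The variable names are illustrative.

```python
profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]
ranked = sorted(profits)
n = len(ranked)

data_range = ranked[-1] - ranked[0]   # largest minus smallest

q1_pos = (n + 1) / 4                  # 3.25
q3_pos = 3 * (n + 1) / 4              # 9.75

# Round each position to the nearest rank (ranks are 1-based)
q1 = ranked[round(q1_pos) - 1]        # 3rd ranked value
q3 = ranked[round(q3_pos) - 1]        # 10th ranked value
iqr = q3 - q1

print(data_range, q1, q3, iqr)        # 38 9 17 8
```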


Outlier Detection
For the data 2, 5, 6, 9, 12, we have the following five-number summary:
Xminimum = 2; Q1 = 3.5; Median = 6; Q3 = 10.5; Xmaximum = 12

Range = 12 - 2 = 10
IQR = Q3 - Q1 = 10.5 - 3.5 = 7

To determine if there are outliers we must consider the numbers that are more than 1.5*IQR beyond the quartiles.

Below lower quartile: Q1 - 1.5*IQR = 3.5 - 10.5 = -7
Above upper quartile: Q3 + 1.5*IQR = 10.5 + 10.5 = 21

Since none of the data are outside the interval from -7 to 21, there are no outliers.
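
The 1.5*IQR check can be sketched in Python as follows. The function names `iqr_fences` and `find_outliers` are hypothetical, and the quartiles are passed in as arguments so that whichever quartile rule was used to compute them stays explicit.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3, k=1.5):
    """Return the values that lie outside the limits."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]

data = [2, 5, 6, 9, 12]
lower, upper = iqr_fences(3.5, 10.5)
print(lower, upper)                    # -7.0 21.0
print(find_outliers(data, 3.5, 10.5))  # []
```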


EXERCISE
Refer to the following data: (continuation of previous
exercises and examples)
8 12 7 17 14 45 10
13 17 13 9 11
Is there any outlier in the data? If yes, remove that outlier
and find the summary measures for the new data.


Measures of Variation
VARIANCE

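The variance is the average (approximately) of squared deviations of values from the mean.

Sample variance:

\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n} X_i^2}{n-1} - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)} \]

where
\(\bar{X}\) = arithmetic mean
n = sample size
\(X_i\) = ith value of the variable X
\(X_i^2\) = ith squared value of the variable X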

Measures of Variation
STANDARD DEVIATION

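Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data

Sample standard deviation:

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n-1} - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}} \]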

Measures of Variation
STANDARD DEVIATION

Sample Data (xi): 10 12 14 15 17 18 18 24

n = 8, Mean = \(\bar{X}\) = 16

\[ s = \sqrt{\frac{10^2 + 12^2 + \cdots + 24^2}{8-1} - \frac{(10 + 12 + \cdots + 24)^2}{8(8-1)}} = \sqrt{\frac{2178}{7} - \frac{128^2}{8 \times 7}} = 4.3095 \]

A measure of the average scatter around the mean

Learn how to compute standard deviation using the Mode SD function on your scientific calculator!
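
For readers without a calculator at hand, the sketch below computes the sample variance in two ways: from squared deviations about the mean, and from the shortcut form that uses the sum of the values and the sum of their squares. The function names are hypothetical. The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.

```python
import math
from statistics import stdev, variance

def sample_variance_definitional(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Shortcut form: sum(x^2)/(n - 1) - (sum x)^2 / (n(n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return sum_x2 / (n - 1) - sum_x ** 2 / (n * (n - 1))

data = [10, 12, 14, 15, 17, 18, 18, 24]
s2 = sample_variance_shortcut(data)
s = math.sqrt(s2)

print(round(s2, 4), round(s, 4))  # 18.5714 4.3095
print(round(stdev(data), 4))      # 4.3095
```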


EXERCISE
Refer to the following data: (continuation of previous
exercise and examples)
8 12 7 17 14 45 10
13 17 13 9 11
(i) Compute the variance and standard deviation.
(ii) Is there any difference in standard deviation when an outlier is removed?

Measures of Variation
STANDARD DEVIATION

Comparing standard deviation


Measures of Variation
COEFFICIENT OF VARIATION (CV)

The coefficient of variation is the standard deviation divided by the mean, multiplied by 100.

It is always expressed as a percentage (%).
It shows variation relative to the mean.
It is a useful measure when:
data are measured in different units.
data are in the same units but the means are far apart.

\[ CV = \left( \frac{s}{\bar{X}} \right) \times 100\% \]

Measures of Variation
COEFFICIENT OF VARIATION (CV)

STOCK A
Average price last year = $50
Standard deviation = $5

\[ CV_A = \left( \frac{5}{50} \right) \times 100\% = 10\% \]

STOCK B
Average price last year = $100
Standard deviation = $5

\[ CV_B = \left( \frac{5}{100} \right) \times 100\% = 5\% \]

Relative to the mean, prices of stock B are less variable than prices of stock A.
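
The stock comparison can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is hypothetical, and raising an error for a zero mean is an added assumption, since the ratio is undefined in that case.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # stock A
cv_b = coefficient_of_variation(5, 100)   # stock B

print(cv_a, cv_b)  # 10.0 5.0
```

Both stocks have the same standard deviation, so only the relative measure shows that stock B varies less in proportion to its price level.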


Exercise
The variation in the annual incomes of executives at Nash Rambler Products, Inc. is to be compared with the variation in incomes of unskilled employees.
For a sample of executives, \(\bar{X}\) = RM500,000 and s = RM50,000.
For a sample of unskilled employees, \(\bar{X}\) = RM32,000 and s = RM3,200.

Compare the relative dispersion in these two distributions using the coefficient of variation.

Measures of Variation
SUMMARY CHARACTERISTICS:
The more the data are spread out, the greater the range, interquartile range, variance, and standard deviation.

The more the data are concentrated, the smaller the range, interquartile range, variance, and standard deviation.

If the values are all the same (no variation), all these measures will be zero.

None of these measures are ever negative.

Shape of a Distribution
Describes how data are distributed
Measures of shape: Symmetric or skewed


Pearson's Coefficient of
Skewness
It is used to measure the degree of skewness of a distribution.

Sk = 0 indicates a symmetrical distribution
Sk = +ve indicates a right-skewed distribution
Sk = -ve indicates a left-skewed distribution

Approximation Formula

\[ Sk = \frac{3(\bar{X} - \text{Median})}{s} \]

EXAMPLE 6
Refer to the following data: (continuation of previous
examples)
8 12 7 17 14 45 10
13 17 13 9 11
Compute the Pearson's coefficient of skewness and describe the shape of the distribution.
Solution:

\[ Sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(14.67 - 12.5)}{10.066} = 0.6467 \]

The distribution of the data is slightly skewed to the right.
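
A minimal Python sketch of the approximation formula, using the standard library's `mean`, `median` and `stdev` (sample standard deviation). The function name `pearson_skewness` is hypothetical. Because the code uses the unrounded mean 14.666..., it gives about 0.646 rather than the 0.6467 obtained with the mean rounded to 14.67.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Approximate skewness: 3 * (mean - median) / sample std deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)

profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]
sk = pearson_skewness(profits)

# A positive value indicates a right-skewed distribution
print(round(stdev(profits), 3), sk > 0)  # 10.066 True
```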

General Descriptive Stats Using Microsoft Excel
Add the Data Analysis add-in to your Excel: go to Developer >> Add-ins >> select Analysis ToolPak >> click the OK button.

1. Enter data.
2. Select Data.
3. Select Data Analysis.
4. Select Descriptive Statistics and click OK.
5. Enter the cell range.
6. Check the Summary Statistics box.
7. Click OK.

Microsoft Excel descriptive statistics output, using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
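
The Excel output itself is not reproduced in this text. As a rough cross-check, the Python sketch below recomputes several of the summary statistics for the house price data with the standard library. The labels are illustrative and are not a copy of Excel's output layout.

```python
from statistics import mean, median, mode, stdev

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "Mean": mean(house_prices),
    "Median": median(house_prices),
    "Mode": mode(house_prices),
    "Standard Deviation": stdev(house_prices),  # sample (n - 1) version
    "Range": max(house_prices) - min(house_prices),
    "Minimum": min(house_prices),
    "Maximum": max(house_prices),
    "Sum": sum(house_prices),
    "Count": len(house_prices),
}

# Print each statistic with thousands separators
for label, value in summary.items():
    print(f"{label}: {value:,.0f}")
```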

Example: Simple
Interpretation
The average house price is $600,000.
50 percent of the house prices are $300,000 and below. 50 percent of the house prices are $300,000 and above.
The most common house price is $100,000.
The standard deviation is $800,000.
The range of house prices is $1,900,000.
The distribution of house prices is skewed to the right.

Numerical Descriptive
Measures for a Population
Descriptive statistics discussed previously described a sample, not the population.

Summary measures describing a population, called parameters, are denoted with Greek letters.

Important population parameters are the population mean, variance, and standard deviation.

Population Mean
The population mean is the sum of the values in the population divided by the population size, N.

\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N} \]

where
\(\mu\) = population mean
N = population size
\(X_i\) = ith value of the variable X

Population Variance
The population variance is the average of squared deviations of values from the mean.

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]

where
\(\mu\) = population mean
N = population size
\(X_i\) = ith value of the variable X

Population variance is seldom used because we usually rely on sample data, not population data!

Population Standard
Deviation
The population standard deviation is the most commonly used measure of variation.

It has the same units as the original data.

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]

where
\(\mu\) = population mean
N = population size
\(X_i\) = ith value of the variable X

Population standard deviation is seldom used because we usually rely on sample data, not population data!
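
The only computational difference between the population and sample formulas is the divisor (N versus n - 1). The sketch below shows this with Python's standard library, treating the eight values from the earlier standard deviation slide as a complete population purely for illustration. That treatment is an assumption made for this comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [10, 12, 14, 15, 17, 18, 18, 24]

pop_var = pvariance(data)    # divides by N
pop_sd = pstdev(data)        # square root of the population variance
sample_var = variance(data)  # divides by n - 1
sample_sd = stdev(data)      # square root of the sample variance

print(pop_var, round(pop_sd, 4))                 # 16.25 4.0311
print(round(sample_var, 4), round(sample_sd, 4)) # 18.5714 4.3095
```

The sample versions are always at least as large as the population versions for the same values, and the gap shrinks as the number of values grows.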



Sample Statistics
versus Population
Parameters


CLASS EXERCISE 1
The following data are the amounts ($) that a sample of 10 customers spent for lunch at a fast-food restaurant:
7.42  6.29  5.90  4.89  5.83  6.50  8.34  9.51  7.10  6.80

(i) List the five-number summary.
(ii) Compute the mean, mode, standard deviation, Q1 and Q3.
(iii) Is the data skewed? If so, how?
(iv) Based on your results of (i) to (iii), what conclusions can you reach concerning the amount that customers spent for lunch?

At the end of this lesson, you should be able to:

Compute the measures of central location and variation for ungrouped data
Interpret the measures of central location and variation
Describe the shape of a distribution
