
Central Tendency and Variability

The two most essential features of a distribution

Numerical Data Properties & Measures
Central Tendency: Mean, Median, Mode
Variation: Range, Variance, Standard Deviation
Shape: Skew

Variables have distributions

A variable is something that changes or has different values (e.g., anger).
A distribution is a collection of measures, usually across people.
Distributions of numbers can be summarized with numbers (called statistics for samples or parameters for populations).

Central Tendency refers to the Middle of the Distribution

Variability is about the Spread

Mean
Sum of scores divided by the number of people. Population mean is $\mu$ (mu) and sample mean is $\bar{X}$ (X-bar).
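We calculate the sample mean by:
Arithmetic: $\bar{X} = \frac{\sum X}{N}$
Arithmetic, frequency data: $\bar{X} = \frac{\sum fX}{n}$, where $n = \sum f$
Geometric: $\sqrt[n]{\prod X}$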

Ungrouped Data
Number of children per family in Sleman
[Table: No. of Children, Frequency; only the values 20 and 15 survive]

The height (to the nearest mm) of each of a number of seedlings:
31 36 40 46 33 33 31 17 20 46 39 29 38 34 37
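As a minimal sketch of the sample mean (sum of scores divided by the number of scores), the snippet below uses the fifteen seedling heights listed above, assuming that list is the complete data set. The function name `sample_mean` is only illustrative.

```python
# Seedling heights in mm, copied from the list above
# (assumed to be the complete data set).
heights = [31, 36, 40, 46, 33, 33, 31, 17, 20, 46, 39, 29, 38, 34, 37]


def sample_mean(scores):
    """Return the sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(scores) / len(scores)


mean_height = sample_mean(heights)
print(mean_height)  # 510 / 15 = 34.0
```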

Grouped Data
Example
The heights (in cm) of a group of students are summarized below. Draw a histogram and a frequency polygon to illustrate these data.

Mean

1. Measure of Central Tendency
2. Most Common Measure
3. Acts as Balance Point
4. Affected by Extreme Values (Outliers)
5. Formula (Sample Mean): $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Deviation from the mean

$x = X - \bar{X}$. Deviations sum to zero.
Deviation score = deviation from the mean.
[Figure: raw scores 7 through 11 aligned with their deviation scores around a mean of 9]
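A small sketch can make the "deviations sum to zero" property concrete. The scores below are hypothetical (chosen so the mean is exactly 9), not data taken from the slide.

```python
# Hypothetical raw scores with a mean of exactly 9.
raw_scores = [7, 8, 8, 9, 9, 9, 10, 10, 11]

mean_score = sum(raw_scores) / len(raw_scores)

# Deviation score: x = X - X-bar, one per raw score.
deviations = [score - mean_score for score in raw_scores]

print(deviations)
print(sum(deviations))  # the deviations cancel out
```

With data whose mean is not exactly representable in floating point, the sum may differ from zero by a tiny rounding error.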

Median
Score that separates top 50% from bottom 50%.
Ungrouped Data
Even number of scores: the median is halfway between the two middle scores.
Position of Med1 = n/2
Position of Med2 = (n+2)/2
Med = (Med1 + Med2)/2
1 4 6 8 9 10 17 18: Median is (8+9)/2 = 8.5

Odd number of scores: the median is the middle number.
Position of Med = (n+1)/2
1 4 6 8 9 10 17: Median is 8
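The position rules above can be sketched in Python as follows. The function name `median_by_position` is an assumption for illustration; positions are 1-based as on the slide, so the code subtracts 1 when indexing.

```python
def median_by_position(scores):
    """Median of ungrouped data using the 1-based position rules."""
    if not scores:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the middle score sits at position (n + 1) / 2.
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even n: average the scores at positions n / 2 and (n + 2) / 2.
    position_1 = n // 2
    position_2 = (n + 2) // 2
    return (ordered[position_1 - 1] + ordered[position_2 - 1]) / 2


print(median_by_position([1, 4, 6, 8, 9, 10, 17, 18]))  # 8.5
print(median_by_position([1, 4, 6, 8, 9, 10, 17]))      # 8
```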

Median
1. Measure of Central Tendency
2. Middle Value In Ordered Sequence
If Odd n, Middle Value of Sequence
If Even n, Average of 2 Middle Values
3. Position of Median in Sequence: Positioning Point = $\frac{n+1}{2}$
4. Not Affected by Extreme Values
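To illustrate point 4, the sketch below replaces the largest value of a small hypothetical data set with an extreme one and compares the mean and median before and after, using Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical scores, and the same scores with one extreme value.
scores = [21, 22, 23, 24, 25]
scores_with_outlier = [21, 22, 23, 24, 250]

print(mean(scores), median(scores))                            # 23 and 23
print(mean(scores_with_outlier), median(scores_with_outlier))  # 68 and 23
```

The mean is pulled toward the outlier, while the median stays at the middle value.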

Median Example
Odd-Sized Sample

Raw Data: 24.1 22.6 21.5 23.7 22.6
Ordered: 21.5 22.6 22.6 23.7 24.1
Position: 1 2 3 4 5
Positioning Point = $\frac{n+1}{2} = \frac{5+1}{2} = 3$
Median = 22.6

Median Example
Even-Sized Sample

Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
Positioning Point = $\frac{n+1}{2} = \frac{6+1}{2} = 3.5$
Median = $\frac{7.7 + 8.9}{2} = 8.3$

Mode
1. Measure of Central Tendency
2. Value That Occurs Most Often
3. Not Affected by Extreme Values
4. May Be No Mode or Several Modes
5. May Be Used for Numerical & Categorical Data

The mode is the most frequently occurring score. For grouped data, it is the midpoint of the most populous class interval. Distributions can be bimodal or multimodal.

Grouped/Classified Data

Mode Example
No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
More Than 1 Mode
Raw Data: 21 28 28 41 43 43
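The three cases above can be reproduced with a short sketch. The helper `modes` is hypothetical; it follows the slide's convention that a data set in which every value occurs once has no mode (Python's built-in `statistics.multimode` would instead return every value in that case).

```python
from collections import Counter


def modes(scores):
    """Return all most-frequent values, or [] if no value repeats."""
    if not scores:
        return []
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))  # []
print(modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9]))    # [4.9]
print(modes([21, 28, 28, 41, 43, 43]))          # [28, 43]
```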

Thinking Challenge
You're a financial analyst. You have collected the following closing stock prices of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
Describe the stock prices in terms of central tendency.
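One possible way to work the challenge is with Python's standard `statistics` module. This is a sketch of the calculation only; interpreting the three numbers is left to the reader.

```python
from statistics import mean, median, multimode

# Closing stock prices from the challenge above.
prices = [17, 16, 21, 18, 13, 16, 12, 11]

print(mean(prices))       # 124 / 8 = 15.5
print(median(prices))     # middle two ordered values are 16 and 16
print(multimode(prices))  # 16 is the only value that occurs twice
```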

ODD & EVEN DATA

Classified Data

Comparison of mean, median and mode
Mode
Good for nominal variables
Good if you need to know most frequent observation
Quick and easy

Median
Good for "bad" distributions (e.g., strongly skewed or with outliers)
Good for distributions with arbitrary ceiling or floor

Comparison of mean, median & mode
Mean
Used for inference as well as description; best estimator of the parameter
Based on all data in the distribution
Generally preferred except for "bad" distributions. Most commonly used statistic for central tendency.

"Best Guess" interpretations

Mean: the average of the signed errors will be zero.
Mode: will be exactly right with the greatest frequency.
Median: gives the smallest average absolute error.
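These three "best guess" properties can be checked numerically. The sketch below uses a small hypothetical data set and hypothetical helper names; it guesses a single value for every score and measures the error three ways.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed scores.
scores = [1, 2, 2, 3, 10]


def mean_signed_error(guess, data):
    """Average of (score - guess), keeping the sign."""
    return sum(x - guess for x in data) / len(data)


def mean_absolute_error(guess, data):
    """Average of |score - guess|."""
    return sum(abs(x - guess) for x in data) / len(data)


def exact_hits(guess, data):
    """Number of scores the guess matches exactly."""
    return sum(1 for x in data if x == guess)


print(mean_signed_error(mean(scores), scores))      # about 0 (rounding aside)
print(mean_absolute_error(median(scores), scores))  # 2.0
print(mean_absolute_error(mean(scores), scores))    # larger than for the median
print(exact_hits(mode(scores), scores))             # 2
```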

Shape of a Distribution
Describes how data are distributed
Measures of shape: symmetric or skewed
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
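The mean-versus-median comparison above can be turned into a rough check. This is only a heuristic sketch with a hypothetical function name and hypothetical data: equal mean and median do not guarantee a symmetric distribution, and the `tolerance` parameter is an assumption added to allow for near-equality.

```python
from statistics import mean, median


def shape_hint(scores, tolerance=0.0):
    """Rough shape label from comparing the mean with the median."""
    m, md = mean(scores), median(scores)
    if abs(m - md) <= tolerance:
        return "symmetric"
    return "left-skewed" if m < md else "right-skewed"


print(shape_hint([1, 8, 9, 9, 10]))      # mean 7.4 < median 9
print(shape_hint([5, 10, 15, 20, 25]))   # mean 15 = median 15
print(shape_hint([1, 2, 2, 3, 10]))      # median 2 < mean 3.6
```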

Statistics for Business and Economics, 6e, Chap 3-26

Influence of Distribution
Shape

Review

What is central tendency?


Mode
Median
Mean

Review

Range
Average deviation
Variance
Standard Deviation
Z score

Variation


4 Statistics: Range, Average Deviation, Variance, & Standard Deviation
Range = high score minus low score.
12 14 14 16 16 18 20: range = 20 - 12 = 8

Average Deviation: mean of absolute deviations from the median:
$AD = \frac{\sum |X - Md|}{N}$

Note the difference between Hays and the undergraduate text: deviation from the median vs. the mean.
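A minimal sketch of the range and the average deviation, using the seven scores from the range example above. The function names are illustrative; the average deviation here is taken around the median, as in the formula on this slide.

```python
from statistics import median

# Scores from the range example.
scores = [12, 14, 14, 16, 16, 18, 20]


def score_range(data):
    """High score minus low score."""
    return max(data) - min(data)


def average_deviation(data):
    """Mean of absolute deviations from the median."""
    md = median(data)
    return sum(abs(x - md) for x in data) / len(data)


print(score_range(scores))        # 20 - 12 = 8
print(average_deviation(scores))  # (4+2+2+0+0+2+4) / 7 = 2.0
```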

Variance

Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
where $\sigma^2$ means population variance, $\mu$ means population mean, and the other terms have their usual meaning.
The variance is equal to the average squared deviation from the mean.
To compute, take each score and subtract the mean. Square the result. Find the average over scores. Ta da! The variance.

Computing the Variance (N=5)

X        $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
5        15     -10     100
10       15     -5      25
15       15     0       0
20       15     5       25
25       15     10      100
Total:   75     0       250
Mean:    Variance is 50

Standard Deviation
Variance is average squared deviation from the mean.
To return to original, unsquared units, we just take the square root of the variance. This is the standard deviation.
Population formula: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

Standard Deviation
Sometimes called the root-mean-square deviation from the mean. This name says how to compute it from the inside out.
Find the deviation (difference between the score and the mean).
Find the deviations squared.
Find their mean.
Take the square root.
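The "inside out" steps above can be sketched directly in Python. The scores 5, 10, 15, 20, 25 are the ones used in this deck's worked tables; the function names are illustrative, and the code divides by N (the population formula), not N - 1.

```python
import math


def population_variance(scores):
    """Mean of the squared deviations from the mean (divides by N)."""
    n = len(scores)
    mu = sum(scores) / n
    deviations = [x - mu for x in scores]   # step 1: deviation
    squared = [d ** 2 for d in deviations]  # step 2: square
    return sum(squared) / n                 # step 3: mean


def population_sd(scores):
    """Root-mean-square deviation from the mean."""
    return math.sqrt(population_variance(scores))  # step 4: square root


scores = [5, 10, 15, 20, 25]
print(population_variance(scores))      # 250 / 5 = 50.0
print(round(population_sd(scores), 2))  # 7.07
```

Python's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.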

Computing the Standard Deviation (N=5)

X        $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
5        15     -10     100
10       15     -5      25
15       15     0       0
20       15     5       25
25       15     10      100
Total:   75     0       250
Mean:    Variance is 50
Sqrt:    SD is $\sqrt{50} = 7.07$

Example: Age Distribution

[Figure: histogram "Distribution of Age: Central Tendency, Variability, and Shape"; x-axis: age (10 to 50), y-axis: frequency. Annotations: Mode = 21, Median = 23, Mean = 25.73, SD = 6.47 (average distance from mean).]

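Standard or z score
A z score indicates distance from the mean in standard deviation units.
Formula:
Sample: $z = \frac{X - \bar{X}}{S}$
Population: $z = \frac{X - \mu}{\sigma}$

Converting to standard or z scores does not change the shape of the distribution. Z-scores are standardized, not normalized: a skewed distribution stays skewed after conversion.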
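A short sketch of the conversion, assuming the standard deviation is computed with the population formula (dividing by N) to match the rest of this deck; a text that defines S with N - 1 would give slightly different values. The function name `z_scores` is illustrative.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Convert each score to its distance from the mean in SD units."""
    m = mean(scores)
    s = pstdev(scores)  # population SD (divides by N); an assumption here
    if s == 0:
        raise ValueError("z scores are undefined when all scores are equal")
    return [(x - m) / s for x in scores]


scores = [5, 10, 15, 20, 25]
z = z_scores(scores)
print([round(value, 3) for value in z])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

The converted scores have mean 0 and standard deviation 1, and their order and relative spacing are unchanged, which is why the shape of the distribution is preserved.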

Skewness and Kurtosis

Skewness and kurtosis describe the shape of your data set's distribution. Skewness indicates how symmetrical the data set is, while kurtosis indicates how the weight of the data is split between the region around the mean and the tails.
Perfectly symmetrical data sets will have a skewness of zero (skewness = 0), and a normally distributed data set will have a kurtosis of approximately three (kurtosis = 3).

SKEWNESS

KURTOSIS

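EQUATION

skewness: $g_1 = m_3 / m_2^{3/2}$

kurtosis: $a_4 = m_4 / m_2^2$

where $m_k = \frac{\sum (X - \bar{X})^k}{n}$ is the k-th moment about the mean (so $m_2$ is the variance computed with divisor n).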
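The moment formulas can be sketched as below. The function names and the two small data sets are assumptions for illustration, with each moment taken about the mean and divided by n. The last lines only re-check the arithmetic of the skewness example in this deck from its stated moments, since the underlying data table is not available here.

```python
def central_moment(data, k):
    """k-th moment about the mean, dividing by n."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** k for x in data) / n


def skewness(data):
    """g1 = m3 / m2^(3/2)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """a4 = m4 / m2^2."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


symmetric = [5, 10, 15, 20, 25]  # hypothetical symmetric data
right_skewed = [1, 2, 2, 3, 10]  # hypothetical right-skewed data

print(skewness(symmetric))         # 0.0
print(kurtosis(symmetric))         # 4250 / 50**2 = 1.7
print(skewness(right_skewed) > 0)  # True

# Arithmetic check using the moment values quoted in the example.
g1_from_moments = 2.6933 / 8.5275 ** 1.5
print(round(g1_from_moments, 4))   # 0.1082
```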

Example

Calculation of Skewness on Classified Data

Finally, the skewness is
$g_1 = m_3 / m_2^{3/2} = 2.6933 / 8.5275^{3/2} = 0.1082$

Interpretation
If skewness = 0, the data are perfectly symmetrical. But a skewness of exactly zero is quite unlikely for real-world data, so how can you interpret the skewness number?
Bulmer, M. G., Principles of Statistics (Dover, 1979), a classic, suggests this rule of thumb:
If skewness is less than -1 or greater than +1, the distribution is highly skewed.
If skewness is between -1 and -1/2 or between +1/2 and +1, the distribution is moderately skewed.
If skewness is between -1/2 and +1/2, the distribution is approximately symmetric.
With a skewness of 0.1098, the sample data for student heights are approximately symmetric.
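The rule of thumb can be written as a small lookup. The function name is hypothetical, and the handling of values exactly at 1/2 or 1 (assigned to the milder category here) is an assumption, since the rule as quoted does not say which side the boundaries belong to.

```python
def describe_skewness(g1):
    """Label a skewness value using the rule of thumb quoted above."""
    size = abs(g1)
    if size > 1:
        return "highly skewed"
    if size > 0.5:
        return "moderately skewed"
    return "approximately symmetric"


print(describe_skewness(0.1098))  # approximately symmetric
print(describe_skewness(-0.75))   # moderately skewed
print(describe_skewness(1.3))     # highly skewed
```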

Calculation of Kurtosis

