
INTRODUCTION TO

STATISTICS AND
STATISTICAL
INFERENCE
Training on Teaching
Basic Statistics for
Tertiary Level Teachers
Summer 2008

Note: Most of the slides were taken from Elementary Statistics: A Handbook of Slide
Presentation, prepared by Z.V.J. Albacea, C.E. Reano, R.V. Collado, L.N. Comia and N.A.
Tandang in 2005 for the Institute of Statistics, CAS, UP Los Banos

TEACHING BASIC STATISTICS .


Areas of Statistics
Descriptive statistics
methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) about a large group

Inferential statistics
methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data


Example of Descriptive Statistics


Present the Philippine population by constructing a
graph indicating the total number of Filipinos counted
during the last census by age group and sex


Example of Inferential Statistics


A new milk formulation designed to improve the psychomotor
development of infants was tested on randomly selected infants.

Based on the results, it was concluded that the new milk formulation is
effective in improving the psychomotor development of infants.

Inferential Statistics
[Diagram: a Smaller Set (n units/observations) is taken from a Larger Set (N units/observations); Inferences and Generalizations run from the smaller set back to the larger set.]


Key Definitions

A universe is the collection of things or observational units under consideration.

A variable is a characteristic observed or measured on every unit of the universe.

A population is the set of all possible values of the variable.


Key Definitions

Parameters are numerical measures that describe the population or universe of interest. They are usually denoted by Greek letters: μ (mu), σ (sigma), ρ (rho), λ (lambda), τ (tau), θ (theta), α (alpha) and β (beta).

Statistics are numerical measures of a sample.

Types of Variables
Qualitative variable
non-numerical values

Quantitative variable
numerical values
a. Discrete: countable
b. Continuous: measurable
c. Constant

Levels of Measurement
1. Nominal scale
Numbers or symbols used to classify

2. Ordinal scale
Accounts for order; no indication of distance between positions
Used in ranking; no meaningful numerical statements can be made about differences between categories

3. Interval scale
Equal intervals; no absolute zero

4. Ratio scale
Has absolute zero


NOMINAL: Gender, Political Party, Religion, Automobile Ownership
ORDINAL: Teachers' Performance, Movie Classification, Faculty Rank, Hotel Ratings, Student Class Designation
INTERVAL: Temperature
RATIO: Weight, Age, Salary


Methods of Collecting Data

Objective Method

Subjective Method

Use of Existing Records


Methods of Presenting Data


Textual
Tabular
Graphical

Summary Measures

Location: Maximum, Minimum; Central Tendency (Mean, Median, Mode); Percentile, Quartile, Decile
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Skewness
Kurtosis


Measures of Location
A measure of location summarizes a
data set by giving a typical value, within
the range of the data values, that describes
its location relative to the entire data set.
Some Common Measures:
Minimum, Maximum
Central Tendency
Percentiles, Deciles, Quartiles


Maximum and Minimum

Minimum is the smallest value in the data set, denoted as MIN.

Maximum is the largest value in the data set, denoted as MAX.

Measure of Central Tendency

A single value that is used to identify the center of the data
it is thought of as a typical value of the distribution
precise yet simple
most representative value of the data

Mean

Most common measure of the center
Also known as arithmetic average

Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$

Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$


Properties of the Mean

may not be an actual observation in the data set
can be applied to data measured on at least the interval level
easy to compute
every observation contributes to the value of the mean

Properties of the Mean

subgroup means can be combined to come up with a group mean
easily affected by extreme values

[Figure: two dot plots. The first, on a 0 to 10 scale, has Mean = 5; the second, on a scale extending to 14, has Mean = 6.]
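
Both properties can be checked with a few lines of Python. This is only a minimal sketch: the data values and the helper name `combined_mean` are made up for illustration and are not taken from the original slides.

```python
from statistics import mean


def combined_mean(means, sizes):
    """Combine subgroup means into one group mean, weighting each by its subgroup size."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)


# Hypothetical data: the second set replaces 9 with the extreme value 14.
base = [1, 3, 5, 7, 9]
with_extreme = [1, 3, 5, 7, 14]

print(mean(base))          # 5
print(mean(with_extreme))  # 6

# Two hypothetical subgroups: mean 4 (2 observations) and mean 8 (6 observations).
print(combined_mean([4, 8], [2, 6]))  # 7.0
```

Changing a single observation moved the mean from 5 to 6, and the group mean is pulled toward the larger subgroup rather than sitting halfway between 4 and 8.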


Median

Divides the ordered observations into two equal parts

If the number of observations is odd, the median is the middle number.
If the number of observations is even, the median is the average of the 2 middle numbers.

The sample median is denoted as $\tilde{x}$, while the population median is denoted as $\tilde{\mu}$.

Properties of a Median

may not be an actual observation in the data set
can be applied to data measured on at least the ordinal level
a positional measure; not affected by extreme values

[Figure: two dot plots, one on a 0 to 10 scale and one on a scale extending to 14; Median = 5 in both.]
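
The odd/even rule for the median, and its resistance to extreme values, can be sketched as follows. The function name `median_by_rule` and all data values are hypothetical, chosen only to illustrate the rule.

```python
def median_by_rule(values):
    """Median of a data set: middle value if n is odd, average of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd = [7, 1, 5, 3, 9]        # 5 observations
even = [7, 1, 5, 3, 9, 11]   # 6 observations

print(median_by_rule(odd))   # 5
print(median_by_rule(even))  # 6.0

# Replacing the largest value with an extreme one leaves the median unchanged.
print(median_by_rule([1, 3, 5, 7, 900]))  # 5
```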


Mode

the value that occurs most frequently
nominal average
may or may not exist

[Figure: two dot plots, one on a 0 to 14 scale with Mode = 9 and one on a 0 to 6 scale with No Mode.]

Properties of a Mode

can be used for qualitative as well as quantitative data
may not be unique
not affected by extreme values
can be computed for ungrouped and grouped data
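
The properties above can be illustrated with a short Python sketch. The helper `modes` and its "no mode" convention (every value occurring exactly once) are assumptions made for this example, as are all data values.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode.

    Assumed convention: if every value occurs exactly once, no mode exists.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []
    return [v for v, c in counts.items() if c == top]


print(modes(["red", "blue", "red", "green"]))  # ['red']  qualitative data
print(modes([1, 2, 2, 3, 3, 8]))               # [2, 3]   not unique
print(modes([0, 1, 2, 3, 4, 5, 6]))            # []       no mode
print(modes([2, 2, 3, 9, 9, 9, 400]))          # [9]      the extreme value 400 has no effect
```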


Mean, Median & Mode


Use the mean when:
sampling stability is desired
other measures are to be computed


Mean, Median & Mode


Use the median when:
the exact midpoint of the distribution is desired
there are extreme observations


Mean, Median & Mode


Use the mode when:
the "typical" value is desired
the dataset is measured on a nominal scale


Percentiles

Numerical measures that give the position of a data value relative to the entire data set.
Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
The jth percentile, denoted as Pj, is the data value in the data set that separates the bottom j% of the data from the top (100-j)%.


EXAMPLE
Suppose LJ was told that, relative to the other scores on a certain test, his score was the 95th percentile.
This means that 95% of those who took the test had scores less than or equal to LJ's score, while 5% had scores higher than LJ's.
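
The interpretation in this example can be checked numerically. The sketch below uses 20 invented test scores and a hypothetical helper, `percent_at_or_below`; neither comes from the original slides.

```python
def percent_at_or_below(scores, value):
    """Percentage of scores that are less than or equal to `value`."""
    count = sum(1 for s in scores if s <= value)
    return 100 * count / len(scores)


# Hypothetical scores of 20 examinees; LJ's score is assumed to be 95.
scores = [52, 55, 58, 60, 61, 63, 65, 66, 68, 70,
          72, 74, 75, 78, 80, 82, 85, 88, 95, 98]

print(percent_at_or_below(scores, 95))        # 95.0
print(100 - percent_at_or_below(scores, 95))  # 5.0
```

With these scores, 19 of the 20 examinees scored at or below 95, so a score of 95 sits at the 95th percentile and only one examinee (5%) scored higher.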

Deciles

Divide an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by Dj.

The 1st decile is the 10th percentile; the 2nd decile is the 20th percentile, and so on.

Quartiles

Divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by Qj.
The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile, which is also the median; and the 3rd quartile is the 75th percentile.

Measures of Variation
A measure of variation is a single value that is used to describe the spread of the distribution
A measure of central tendency alone does not uniquely describe a distribution

A look at dispersion
[Figure: three dot plots on a common scale from 11 to 21.]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57


Two Types of Measures of Dispersion

Absolute Measures of Dispersion:
Range
Inter-quartile Range
Variance
Standard Deviation

Relative Measure of Dispersion:
Coefficient of Variation

Range (R)
The difference between the maximum and minimum value in a data set, i.e.
R = MAX - MIN
Example: Pulse rates of 15 male residents of a certain village
54 58 58 60 62 65 66 71
74 75 77 78 80 82 85

R = 85 - 54 = 31


Some Properties of the Range

The larger the value of the range, the more dispersed the observations are.
It is quick and easy to understand.
A rough measure of dispersion.

Inter-Quartile Range (IQR)


The difference between the third quartile and first quartile, i.e.
IQR = Q3 - Q1
Example: Pulse rates of 15 residents of a certain village
54 58 58 60 62 65 66 71
74 75 77 78 80 82 85

IQR = 78 - 60 = 18
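
The range and IQR for the pulse-rate data can be reproduced in Python. This sketch assumes the standard library's `statistics.quantiles` with its default `"exclusive"` method, which happens to give the same quartiles as the slide for this data set; other quartile conventions can give slightly different values.

```python
from statistics import quantiles

pulse = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]

# Range: difference between the maximum and the minimum.
data_range = max(pulse) - min(pulse)

# Quartiles: the three cut points that split the ordered data into four parts.
q1, q2, q3 = quantiles(pulse, n=4)
iqr = q3 - q1

print(data_range)   # 31
print(q1, q2, q3)   # 60.0 71.0 78.0
print(iqr)          # 18.0
```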

Some Properties of IQR

Reduces the influence of extreme values.
Not as easy to calculate as the Range.


Variance
important measure of variation
shows variation about the mean

Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
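
The difference between the two divisors (N for a population, n - 1 for a sample) is easy to see in code. The sketch below writes both definitions out by hand; the function names and the data are hypothetical, and the standard library's `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
from statistics import pvariance, variance


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; mean is 5

print(population_variance(data))       # 4.0
print(round(sample_variance(data), 4)) # 4.5714

# Cross-check against the standard library.
print(pvariance(data), round(variance(data), 4))
```

The sample variance is always a little larger than the population variance computed from the same numbers, because it divides the same sum of squares by n - 1 instead of N.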


Standard Deviation (SD)

most important measure of variation
square root of Variance
has the same units as the original data

Population SD: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Sample SD: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Computation of Standard Deviation

Data: 10 12 14 15 17 18 18 24
n = 8, Mean = 16

$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} \approx 4.3095$
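
The same computation can be followed step by step in Python. This is a minimal sketch using the sample formula (divisor n - 1) on the eight data values from this slide; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
xbar = sum(data) / n                             # sample mean
squared_devs = [(x - xbar) ** 2 for x in data]   # squared deviations from the mean
s = sqrt(sum(squared_devs) / (n - 1))            # sample standard deviation

print(n, xbar)             # 8 16.0
print(sum(squared_devs))   # 130.0
print(round(s, 4))         # 4.3095

# The standard library gives the same result.
print(round(stdev(data), 4))  # 4.3095
```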


Remarks on Standard Deviation

If there is a large amount of variation, then on average, the data values will be far from the mean. Hence, the SD will be large.
If there is only a small amount of variation, then on average, the data values will be close to the mean. Hence, the SD will be small.

Comparing Standard Deviation


[Figure: three dot plots on a common scale from 11 to 21.]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57


Comparing Standard Deviation


Example: Team A - Heights of five marathon players in inches

65 65 65 65 65

Mean = 65
s = 0


Comparing Standard Deviation


Example: Team B - Heights of five marathon players in inches

62 67 66 70 60

Mean = 65
s = 4.0


Properties of Standard Deviation

It is the most widely used measure of dispersion. (Chebyshev's Inequality)
It is based on all the items and is rigidly defined.
It is used to test the reliability of measures calculated from samples.
The standard deviation is sensitive to the presence of extreme values.
It is not easy to calculate by hand (unlike the range).

Chebyshev's Rule
It permits us to make statements about the percentage of observations that must be within a specified number of standard deviations from the mean.
The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any positive number larger than 1.
This rule applies to any distribution.

Chebyshev's Rule
For any data set with mean (μ) and standard deviation (SD), the following statements apply:
At least 75% of the observations are within 2SD of its mean.
At least 88.9% of the observations are within 3SD of its mean.

Illustration

[Figure: the interval within 2SD of the mean, labeled "At least 75%".]

At least 75% of the observations are within 2SD of its mean.

Example
The midterm exam scores of 100 STAT 1 students
last semester had a mean of 65 and a standard
deviation of 8 points.
Applying Chebyshev's Rule, we can say that:
1. At least 75% of the students had scores
between 49 and 81.
2. At least 88.9% of the students had scores
between 41 and 89.
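
The two statements in this example follow directly from the bound 1 - 1/k² with k = 2 and k = 3. A minimal sketch, with hypothetical function names `chebyshev_bound` and `chebyshev_interval`:

```python
def chebyshev_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is informative only for k > 1")
    return 1 - 1 / k ** 2


def chebyshev_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd


mean_score, sd_score = 65, 8  # midterm exam example

for k in (2, 3):
    low, high = chebyshev_interval(mean_score, sd_score, k)
    print(f"k={k}: at least {chebyshev_bound(k):.1%} of scores lie between {low} and {high}")
# k=2: at least 75.0% of scores lie between 49 and 81
# k=3: at least 88.9% of scores lie between 41 and 89
```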


Coefficient of Variation (CV)

measure of relative variation
usually expressed in percent
shows variation relative to mean
used to compare 2 or more groups
Formula: $CV = \frac{\text{SD}}{\text{Mean}} \times 100\%$

Comparing CVs
Stock A: Average Price = P50
SD = P5
CV = 10%
Stock B: Average Price = P100
SD = P5
CV = 5%
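
The comparison can be reproduced with a one-line function. The name `coefficient_of_variation` is an illustrative choice; the inputs are the stock figures from this slide.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / mean


cv_a = coefficient_of_variation(sd=5, mean=50)   # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)  # Stock B

print(cv_a)  # 10.0
print(cv_b)  # 5.0
```

Both stocks have the same SD of P5, but relative to its average price Stock A varies twice as much as Stock B.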


Measure of Skewness

Describes the degree of departure of the distribution of the data from symmetry.
The degree of skewness is measured by the coefficient of skewness, denoted as SK and computed as

$SK = \frac{3(\text{Mean} - \text{Median})}{\text{SD}}$


What is Symmetry?
A distribution is said to be symmetric about the mean if the distribution to the left of the mean is the mirror image of the distribution to the right of the mean. Likewise, a symmetric distribution has SK = 0 since its mean is equal to its median and its mode.

Measure of Skewness
SK > 0
positively skewed

SK < 0
negatively skewed
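
The sign of SK can be explored with a short sketch of the coefficient 3(Mean - Median)/SD. Two assumptions are made here: the sample standard deviation is used for SD (the slides do not say which one), and all data sets are invented for illustration.

```python
from statistics import mean, median, stdev


def skewness_coefficient(values):
    """SK = 3 * (mean - median) / SD, using the sample SD (an assumption)."""
    return 3 * (mean(values) - median(values)) / stdev(values)


symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
long_left_tail = [-x for x in long_right_tail]  # mirror image of the previous set

print(skewness_coefficient(symmetric))            # 0.0
print(skewness_coefficient(long_right_tail) > 0)  # True: positively skewed
print(skewness_coefficient(long_left_tail) < 0)   # True: negatively skewed
```

A long right tail pulls the mean above the median, giving SK > 0; mirroring the data flips the sign.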


Measure of Kurtosis

Describes the extent of peakedness or flatness of the distribution of the data.
Measured by the coefficient of kurtosis (K), computed as

$K = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N\sigma^4} - 3$


Measure of Kurtosis
K=0
mesokurtic

K>0
leptokurtic

K<0
platykurtic
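
The three cases can be illustrated with a small sketch. It assumes the moment-based coefficient with 3 subtracted (fourth central moment divided by the squared variance, minus 3), which is the convention under which K = 0 corresponds to a mesokurtic distribution; the function name and data are hypothetical.

```python
def excess_kurtosis(values):
    """Fourth central moment over squared population variance, minus 3 (assumed convention)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # population variance
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]     # most values at the center, two far out

print(excess_kurtosis(flat) < 0)    # True: platykurtic
print(excess_kurtosis(peaked) > 0)  # True: leptokurtic
```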

Box-and-Whiskers Plot

Concerned with the symmetry of the distribution and incorporates measures of location in order to study the variability of the observations.
Also called a box plot; it displays the 5-number summary (Min, Q1, Q2, Q3, and Max).
Suitable for identifying outliers.

Box-and-Whiskers Plot
The diagram is made up of a box which lies between the first and third quartiles.
The whiskers are the straight lines extending from the ends of the box to the smallest and largest values that are not outliers.


Steps to Construct a Box-and-Whiskers plot


Step 1: Draw a rectangular box whose left edge is at Q1 and whose right edge is at Q3, so the box width is the IQR. Then draw a vertical line segment inside the box where the median is found.

[Figure: box with Q1 = 75, Md = 78, Q3 = 85.]

Steps to Construct a Box-and-Whiskers plot


Step 2: Place marks at distances 1.5 IQR from either end of the box. (1.5 IQR = 15)

[Figure: box with Q1 = 75, Md = 78, Q3 = 85; marks at 60 and 100, each 1.5 IQR from an end of the box.]


Steps to Construct a Box-and-Whiskers plot

Step 3: Draw the horizontal line segments known as the whiskers from each end of the box to the largest and smallest values in the data set that are not outliers.
(An observation more than 1.5 IQR beyond either end of the box is an outlier.)
If the largest and smallest values in the data set are outliers, extend the whiskers until 1.5 IQR from either end of the box.

Steps to Construct a Box-and-Whiskers plot

Step 4: For every outlier, draw a dot. If two or more dots have the same value, draw the dots side by side.

[Figure: completed box plot with Q1 = 75, Md = 78, Q3 = 85, marks at 60 and 100, axis labels at 55 and 98, and outliers drawn as dots.]
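
The 1.5 IQR rule used in these steps can be sketched in Python. The quartiles Q1 = 75 and Q3 = 85 are the ones shown on the slides; the function names and the list of observations are hypothetical, since the underlying data set is not given.

```python
def fences(q1, q3, k=1.5):
    """Marks placed k * IQR beyond each end of the box."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, q1, q3):
    """Separate observations inside the marks from outliers beyond them."""
    low, high = fences(q1, q3)
    inside = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inside, outliers


q1, q3 = 75, 85
print(fences(q1, q3))  # (60.0, 100.0)

# Hypothetical observations, for illustration only.
values = [55, 55, 62, 75, 78, 85, 98, 104]
inside, outliers = split_outliers(values, q1, q3)

print(min(inside), max(inside))  # 62 98  -> whisker endpoints
print(outliers)                  # [55, 55, 104]  -> drawn as dots
```

With these invented observations the whiskers would stop at 62 and 98, and the two equal outliers at 55 would be drawn side by side.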