Stat Lecture

Elementary Statistics
(for Math 104 classes)
Dante V. Partosa
Mathematics Department
College of Science and Information Technology
Ateneo de Zamboanga University
Preliminaries
Statistics consists of conducting
studies to collect, organize,
summarize, analyze, and draw
conclusions.
Data are the values (measurements
or observations) that the variables
can assume.
Variables whose values are
determined by chance are called
random variables.
A collection of data values forms a
data set.
Each value in the data set is called
a data value or a datum.
Descriptive statistics consists of the
collection, organization, summarization,
and presentation of data.
A population consists of all subjects
(human or otherwise) that are being
studied.
A sample is a subgroup of the
population.
Inferential statistics consists of
generalizing from samples to
populations, performing hypothesis
testing, determining relationships
among variables, and making
predictions.
Variables and Types of Data
Qualitative variables are variables
that can be placed into distinct
categories, according to some
characteristic or attribute. For
example, gender (male or female).
Quantitative variables are numerical
in nature and can be ordered or
ranked. Example: age is numerical
and the values can be ranked.
Discrete variables assume values
that can be counted.
Continuous variables can assume all
values between any two specific
values. They are obtained by
measuring.
The nominal level of measurement
classifies data into mutually exclusive
(nonoverlapping), exhaustive categories
in which no order or ranking can be
imposed on the data.
The ordinal level of measurement
classifies data into categories that can
be ranked, but precise differences between
the ranks do not exist.
The interval level of measurement ranks
data, and precise differences between units
of measure do exist; however, there is no
meaningful zero.
The ratio level of measurement
possesses all the characteristics of
interval measurement, and there exists
a true zero. In addition, true ratios exist
for the same variable.
Data Collection and Sampling
Techniques
Data can be collected in a variety of ways.
One of the most common methods is through
the use of surveys.
Surveys can be done by using a variety of
methods. Examples are telephone surveys,
mail questionnaires, personal interviews,
surveys of records, and direct observation.
To obtain samples that are unbiased,
statisticians use four methods of
sampling.
Random samples are selected by using
chance methods or random numbers.
Systematic samples are obtained by
numbering each value in the population
and then selecting every kth value.
Stratified samples are selected by
dividing the population into groups
(strata) according to some characteristic
and then taking samples from each
group.
Cluster samples are selected by
dividing the population into groups and
then taking samples of the groups.
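The four sampling methods can be sketched in Python as follows. This is only an illustration under stated assumptions: the population of subjects numbered 1 to 100, the two strata, the ten clusters, the function names, and the random starting point used for the systematic sample are all hypothetical and not part of the lecture.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Pick n subjects purely by chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, seed=0):
    """Pick every k-th subject, starting at a random spot among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, seed=0):
    """Take a random sample from every stratum (group)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole clusters and keep every subject in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population: subjects numbered 1..100.
population = list(range(1, 101))
strata = {"first_half": population[:50], "second_half": population[50:]}
clusters = {f"group_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

print(simple_random_sample(population, 5))
print(systematic_sample(population, 20))
print(stratified_sample(strata, 3))
print(cluster_sample(clusters, 2))
```

The fixed seed only makes the sketch repeatable; a real study would not fix it.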
Computers and Calculators
Computers and calculators make
numerical computation easier.
Many statistical packages are available.
Examples are SPSS, MINITAB, PHStat,
and Excel. The TI-83 calculator can
also be used to do statistical calculations.
Data must still be understood and
interpreted.
Organizing Data
When data are collected in original
form, they are called raw data.
When the raw data are organized into a
frequency distribution, the frequency
is the number of values in a
specific class of the distribution.
A frequency distribution is the
organization of raw data in table form,
using classes and frequencies.
The following slide shows an example
of a frequency distribution.
Three Types of Frequency
Distributions
Categorical frequency distributions can
be used for data that can be placed in
specific categories, such as nominal- or
ordinal-level data.
Examples - political affiliation, religious
affiliation, blood type, etc.
Blood Type Frequency Distribution -
Example
Class   Frequency   Percent
A       5           20
B       7           28
O       9           36
AB      4           16
Ungrouped Frequency
Distributions
Ungrouped frequency distributions - can
be used for data that can be enumerated
and when the range of values in the data
set is not large.
Examples - number of miles your
instructors have to travel from home to
campus, number of girls in a 4-child family
etc.
Number of Miles Traveled -
Example
Class Frequency
5 24
10 16
15 10
Grouped Frequency Distributions
Grouped frequency distributions can be
used when the range of values in the data
set is very large. The data must be
grouped into classes that are more than
one unit in width.
Example - the life of boat batteries in
hours.
Lifetimes of Boat Batteries -
Example
Class limits   Class boundaries   Frequency   Cumulative frequency
24 - 37        23.5 - 37.5        4           4
38 - 51        37.5 - 51.5        14          18
52 - 65        51.5 - 65.5        7           25
Terms Associated with a Grouped
Frequency Distribution
Class limits represent the smallest and
largest data values that can be included in
a class.
In the lifetimes of boat batteries example,
the values 24 and 37 of the first class are
the class limits.
The lower class limit is 24 and the upper
class limit is 37.
The class boundaries are used to
separate the classes so that there are
no gaps in the frequency distribution.
The class width for a class in a
frequency distribution is found by
subtracting the lower (or upper) class
limit of one class from the lower (or
upper) class limit of the next class.
Guidelines for Constructing a
Frequency Distribution
There should be between 5 and 20
classes.
The class width should be an odd
number.
The classes must be mutually
exclusive.
The classes must be continuous.
The classes must be exhaustive.
The classes must be equal in width.
Procedure for Constructing a Grouped
Frequency Distribution
Find the highest and lowest values.
Find the range.
Select the number of classes desired.
Find the width by dividing the range by
the number of classes and rounding up.
Select a starting point (usually the lowest
value); add the width repeatedly to get the
lower limits.
Find the upper class limits.
Find the boundaries.
Tally the data, find the frequencies, and
find the cumulative frequencies.
Grouped Frequency Distribution -
Example
10 8 6 14
22 13 17 19
11 9 18 14
13 12 15 15
5 11 16 11
Step 1: Find the highest and lowest
values: H = 22 and L = 5.
Step 2: Find the range:
R = H - L = 22 - 5 = 17.
Step 3: Select the number of classes
desired. In this case it is
equal to 6.
Step 4: Find the class width by
dividing the range by the number of
classes. Width = 17/6 = 2.83. This
value is rounded up to 3.
Step 5: Select a starting point for the
lowest class limit. For convenience,
this value is chosen to be 5, the
smallest data value. The lower class
limits will be 5, 8, 11, 14, 17, and 20.
Step 6: The upper class limits will be
7, 10, 13, 16, 19, and 22. For
example, the upper limit for the first
class is computed as 8 - 1 = 7.
Step 7: Find the class boundaries by
subtracting 0.5 from each lower class
limit and adding 0.5 to each upper
class limit.
Step 8: Tally the data, write the
numerical values for the tallies in the
frequency column, and find the
cumulative frequencies.
The grouped frequency distribution is
shown on the next slide.
Note: The dash "-" represents "to".
Class Limits   Class Boundaries   Frequency   Cumulative Frequency
5 to 7         4.5 - 7.5          2           2
8 to 10        7.5 - 10.5         3           5
11 to 13       10.5 - 13.5        6           11
14 to 16       13.5 - 16.5        5           16
17 to 19       16.5 - 19.5        3           19
20 to 22       19.5 - 22.5        1           20
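The eight steps can be traced in a short Python sketch. It assumes whole-number data, so that each upper limit is the next lower limit minus 1 and each boundary lies 0.5 beyond its limit; the variable names are illustrative only.

```python
import math

data = [10, 8, 6, 14, 22, 13, 17, 19, 11, 9,
        18, 14, 13, 12, 15, 15, 5, 11, 16, 11]
num_classes = 6

# Steps 1-4: highest, lowest, range, and width (rounded up).
low, high = min(data), max(data)
width = math.ceil((high - low) / num_classes)   # 17 / 6 = 2.83 -> 3

# Steps 5-8: limits, boundaries, frequencies, cumulative frequencies.
rows = []
cumulative = 0
for i in range(num_classes):
    lower = low + i * width
    upper = lower + width - 1
    freq = sum(lower <= x <= upper for x in data)
    cumulative += freq
    rows.append((lower, upper, lower - 0.5, upper + 0.5, freq, cumulative))

for row in rows:
    print(row)
```

Each printed row holds the lower limit, upper limit, lower boundary, upper boundary, frequency, and cumulative frequency, in the same order as the table.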
Histograms, Frequency Polygons,
and Ogives
The three most commonly used
graphs in research are:
The histogram.
The frequency polygon.
The cumulative frequency graph, or
ogive (pronounced o-jive).
The histogram is a graph that
displays the data by using vertical
bars of various heights to represent
the frequencies.
Example of a Histogram
[Figure: histogram. Vertical axis: Frequency. Horizontal axis: Number of
Cigarettes Smoked per Day, with tick marks at 5, 8, 11, 14, 17, and 20.]
A frequency polygon is a graph that
displays the data by using lines that
connect points plotted for frequencies
at the midpoints of classes. The
frequencies are represented by the
heights of the points.
Example of a Frequency Polygon
[Figure: frequency polygon. Vertical axis: Frequency. Horizontal axis:
Number of Cigarettes Smoked per Day, with tick marks from 2 to 26 in steps of 3.]
A cumulative frequency graph or
ogive is a graph that represents the
cumulative frequencies for the
classes in a frequency distribution.
Example of an Ogive
[Figure: ogive. Vertical axis: Cumulative Frequency (up to 20). Horizontal
axis: Number of Cigarettes Smoked per Day, with tick marks from 2 to 26 in steps of 3.]
Other Types of Graphs
Pareto charts - A Pareto chart is
used to represent a frequency
distribution for a categorical variable.
Other Types of Graphs -
Pareto Chart
When constructing a Pareto chart:
Make the bars the same width.
Arrange the data from largest to
smallest according to frequencies.
Make the units that are used for the
frequency equal in size.
Example of a Pareto Chart
[Figure: Pareto chart for the number of crimes investigated by law
enforcement officers in U.S. national parks during 1995. Left axis: Count
(0 to 250). Right axis: Percent (0 to 100).]
Count     164    34     29     13
Percent   68.3   14.2   12.1   5.4
Cum %     68.3   82.5   94.6   100.0
Time series graph - A time series
graph represents data that occur over
a specific period of time.
Other Types of Graphs -
Time Series Graph
[Figure: time series graph titled "Port Authority Transit Ridership".
Vertical axis: Ridership (in millions), 75 to 89. Horizontal axis: Year,
1990 to 1994.]
Pie graph - A pie graph is a circle that
is divided into sections or wedges
according to the percentage of
frequencies in each category of the
distribution.
Other Types of Graphs -
Pie Graph
[Figure: pie chart of the number of crimes investigated by law
enforcement officers in U.S. national parks during 1995.]
Category   Count   Percent
Assaults   164     68.3%
Rape       34      14.2%
Robbery    29      12.1%
Homicide   13      5.4%
Organizing Data
Describing Data
Measures of Central Tendency
A statistic is a characteristic or
measure obtained by using the data
values from a sample.
A parameter is a characteristic or
measure obtained by using the data
values from a specific population.
The Mean (arithmetic average)
The mean is defined to be the sum
of the data values divided by the
total number of values.
We will compute two means: one
for the sample and one for a finite
population of values.
The mean, in most cases, is not an
actual data value.
The Sample Mean
The symbol $\bar{X}$ represents the sample mean.
$\bar{X}$ is read as "X-bar". The Greek symbol
$\Sigma$ is read as "sigma" and it means "to sum".
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum X}{n}.$$
The Sample Mean - Example
The ages in weeks of a random sample
of six kittens at an animal shelter are
3, 8, 5, 12, 14, and 12. Find the
average age of this sample.
The sample mean is
$$\bar{X} = \frac{\sum X}{n} = \frac{3 + 8 + 5 + 12 + 14 + 12}{6} = \frac{54}{6} = 9 \text{ weeks}.$$
The Population Mean
The Greek symbol $\mu$ represents the population
mean. The symbol $\mu$ is read as "mu".
N is the size of the finite population.
$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum X}{N}.$$
The Population Mean - Example
A small company consists of the owner, the manager,
the salesperson, and two technicians. The salaries are
listed as $50,000, 20,000, 12,000, 9,000 and 9,000
respectively. (Assume this is the population.)
Then the population mean will be
$$\mu = \frac{\sum X}{N} = \frac{50{,}000 + 20{,}000 + 12{,}000 + 9{,}000 + 9{,}000}{5} = \$20{,}000.$$
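Both mean examples can be checked with a few lines of Python. This is a minimal sketch; `statistics.mean` is part of the standard library, and the variable names are illustrative.

```python
from statistics import mean

# Sample: ages (in weeks) of six kittens.
kitten_ages = [3, 8, 5, 12, 14, 12]
sample_mean = mean(kitten_ages)                   # sum of values / n

# Population: the five salaries at the small company.
salaries = [50_000, 20_000, 12_000, 9_000, 9_000]
population_mean = sum(salaries) / len(salaries)   # sum of values / N

print(sample_mean, population_mean)
```

The arithmetic is identical in both cases; only the symbol differs, depending on whether the values form a sample or a population.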
The Sample Mean for an Ungrouped
Frequency Distribution
The mean for an ungrouped frequency
distribution is given by
$$\bar{X} = \frac{\sum (f \cdot X)}{n}.$$
Here f is the frequency for the
corresponding value of X, and $n = \sum f$.
The Sample Mean for an Ungrouped
Frequency Distribution - Example
The scores for 25 students on a 4-point quiz
are given in the table. Find the mean score.
Score, X   Frequency, f   f·X
0          2              0
1          4              4
2          12             24
3          4              12
4          3              12
Total      25             52
$$\bar{X} = \frac{\sum (f \cdot X)}{n} = \frac{52}{25} = 2.08.$$
The Sample Mean for a Grouped
Frequency Distribution
The mean for a grouped frequency
distribution is given by
$$\bar{X} = \frac{\sum (f \cdot X_m)}{n}.$$
Here $X_m$ is the corresponding
class midpoint.
Given the table below, find the mean.
Class          Frequency, f
15.5 - 20.5    3
20.5 - 25.5    5
25.5 - 30.5    4
30.5 - 35.5    3
35.5 - 40.5    2
Table with class midpoints, $X_m$.
Class          Frequency, f   X_m   f·X_m
15.5 - 20.5    3              18    54
20.5 - 25.5    5              23    115
25.5 - 30.5    4              28    112
30.5 - 35.5    3              33    99
35.5 - 40.5    2              38    76
$\sum f \cdot X_m = 54 + 115 + 112 + 99 + 76 = 456$
and n = 17. So
$$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{456}{17} = 26.82.$$
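The grouped-mean computation can be reproduced with a minimal Python sketch. The layout of `classes` as (lower boundary, upper boundary, frequency) tuples is an assumption made for illustration.

```python
# Each class: (lower boundary, upper boundary, frequency).
classes = [
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]

# The class midpoint X_m is halfway between the two boundaries.
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
n = sum(f for _, _, f in classes)
total = sum(f * xm for (_, _, f), xm in zip(classes, midpoints))
grouped_mean = total / n

print(midpoints, n, total, round(grouped_mean, 2))
```

The same code handles an ungrouped frequency distribution if each value X is used in place of the midpoint.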
The Median
When a data set is ordered, it is
called a data array.
The median is defined to be the
midpoint of the data array.
The symbol used to denote the
median is MD.
The Median - Example
The weights (in pounds) of seven
army recruits are 180, 201, 220,
191, 219, 209, and 186. Find the
median.
Arrange the data in order and
select the middle point.
Data array: 180, 186, 191, 201,
209, 219, 220.
The median, MD = 201.
In the previous example, there was
an odd number of values in the
data set. In this case it is easy to
select the middle number in the
data array.
When there is an even number of
values in the data set, the median
is obtained by taking the average of
the two middle numbers.
Six customers purchased the following
number of magazines: 1, 7, 3, 2, 3, 4.
Find the median.
Arrange the data in order and compute
the middle point.
Data array: 1, 2, 3, 3, 4, 7.
The median, MD = (3 + 3)/2 = 3.
The ages of 10 college students
are: 18, 24, 20, 35, 19, 23, 26, 23,
19, 20. Find the median.
Arrange the data in order and
compute the middle point.
Data array: 18, 19, 19, 20, 20, 23,
23, 24, 26, 35.
The median,
MD = (20 + 23)/2 = 21.5.
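The three median examples can be verified with the standard library. This is a small sketch; `statistics.median` sorts the data itself and averages the two middle values when the count is even.

```python
from statistics import median

recruit_weights = [180, 201, 220, 191, 219, 209, 186]   # odd count
magazines = [1, 7, 3, 2, 3, 4]                          # even count
student_ages = [18, 24, 20, 35, 19, 23, 26, 23, 19, 20] # even count

print(median(recruit_weights))
print(median(magazines))
print(median(student_ages))
```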
The Median - Ungrouped Frequency
Distribution
For an ungrouped frequency
distribution, find the median by
examining the cumulative
frequencies to locate the middle
value.
If n is the sample size, compute
n/2. Locate the data point where
n/2 values fall below and n/2
values fall above.
The Median - Ungrouped Frequency
Distribution - Example
LRJ Appliance recorded the number of
VCRs sold per week over a one-year
period. The data are given below.
No. Sets Sold   Frequency   Cumulative Frequency
1               4           4
2               9           13
3               6           19
4               2           21
5               3           24
To locate the middle point, divide n by 2;
24/2 = 12.
Locate the point where 12 values fall
below and 12 values fall above.
Consider the cumulative distribution.
The 12th and 13th values fall in class 2,
which contains the 5th through the
13th values.
Hence MD = 2.
The Median for a Grouped
Frequency Distribution
The median can be computed from:
$$MD = \frac{(n/2) - cf}{f}\,(w) + L_m$$
where
n = sum of the frequencies
cf = cumulative frequency of the class
immediately preceding the median class
f = frequency of the median class
w = width of the median class
$L_m$ = lower boundary of the median class
Given the table below, find the median.
Class          Frequency, f
15.5 - 20.5    3
20.5 - 25.5    5
25.5 - 30.5    4
30.5 - 35.5    3
35.5 - 40.5    2
Table with cumulative frequencies.
Class          Frequency, f   Cumulative Frequency
15.5 - 20.5    3              3
20.5 - 25.5    5              8
25.5 - 30.5    4              12
30.5 - 35.5    3              15
35.5 - 40.5    2              17
The Median for a Grouped Frequency
Distribution
To locate the halfway point, divide n by 2;
17/2 = 8.5, which rounds up to 9.
Find the class that contains the 9th value.
This will be the median class.
Consider the cumulative distribution.
The median class will then be 25.5 -
30.5.
n = 17
cf = 8
f = 4
w = 30.5 - 25.5 = 5
$L_m$ = 25.5
$$MD = \frac{(n/2) - cf}{f}\,(w) + L_m = \frac{(17/2) - 8}{4}\,(5) + 25.5 = 26.125.$$
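The grouped-median formula can be sketched as a small Python function. The function name and the tuple layout are hypothetical; the local names `cf`, `f`, `w`, and the lower boundary follow the symbols defined in the lecture. The median class is taken to be the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(classes):
    """classes: ordered list of (lower boundary, upper boundary, frequency)."""
    n = sum(f for _, _, f in classes)
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if f > 0 and cf + f >= n / 2:
            w = upper - lower                    # width of the median class
            return (n / 2 - cf) / f * w + lower  # lower = L_m
        cf += f
    raise ValueError("the distribution has no observations")

classes = [
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]
md = grouped_median(classes)
print(md)
```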
The Mode
The mode is defined to be the value
that occurs most often in a data set.
A data set can have more than one
mode.
A data set is said to have no mode if
all values occur with equal frequency.
The Mode - Examples
The following data represent the duration (in
days) of U.S. space shuttle voyages for the
years 1992-94. Find the mode.
Data set: 8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10,
14, 11, 8, 14, 11.
Ordered set: 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10,
10, 11, 11, 14, 14, 14. Mode = 8.
Six strains of bacteria were tested to see how
long they could remain alive outside their
normal environment. The time, in minutes, is
given below. Find the mode.
Data set: 2, 3, 5, 7, 8, 10.
There is no mode since each data value
occurs equally with a frequency of one.
Eleven different automobiles were tested at a
speed of 15 mph for stopping distances. The
distance, in feet, is given below. Find the
mode.
Data set: 15, 18, 18, 18, 20, 22, 24, 24, 24,
26, 26.
There are two modes (bimodal). The values
are 18 and 24. Why?
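The three mode examples can be reproduced with a short Python sketch. The helper `modes` is hypothetical; it follows the lecture's convention that a data set has no mode when all values occur equally often, which differs from `statistics.multimode`, where every value would be returned in that case.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(data)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # every value occurs equally often: no mode
    return sorted(value for value, c in counts.items() if c == top)

shuttle_days = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]
bacteria_minutes = [2, 3, 5, 7, 8, 10]
stopping_feet = [15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]

print(modes(shuttle_days))
print(modes(bacteria_minutes))
print(modes(stopping_feet))
```

For the stopping distances, 18 and 24 each occur three times, more often than any other value, which is why the set is bimodal.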
The Mode for an Ungrouped
Frequency Distribution
Given the table below, find the mode.
Values   Frequency, f
15       3
20       5
25       8    <- Mode
30       3
35       2
The mode is 25, the value with the largest frequency.
The Mode - Grouped Frequency
Distribution
The mode for grouped data is the
modal class.
The modal class is the class with the
largest frequency.
Sometimes the midpoint of the class
is used rather than the boundaries.
The Mode for a Grouped Frequency
Distribution
Given the table below, find the mode.
Class          Frequency, f
15.5 - 20.5    3
20.5 - 25.5    5
25.5 - 30.5    7    <- Modal class
30.5 - 35.5    3
35.5 - 40.5    2
The Midrange
The midrange is found by adding the
lowest and highest values in the data
set and dividing by 2.
The midrange is a rough estimate of the
middle value of the data.
The symbol that is used to represent the
midrange is MR.
The Midrange - Example
Last winter, the city of Brownsville,
Minnesota, reported the following number of
water-line breaks per month: 2, 3, 6, 8, 4, 1.
Find the midrange.
MR = (1 + 8)/2 = 4.5.
Note: Extreme values influence the midrange,
so it may not be a typical description of
the middle.
The Weighted Mean
The weighted mean is used when the
values in a data set are not all equally
represented.
The weighted mean of a variable X is
found by multiplying each value by its
corresponding weight and dividing the sum
of the products by the sum of the weights.
The weighted mean is
$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wX}{\sum w},$$
where $w_1, w_2, \ldots, w_n$ are the weights
for the values $X_1, X_2, \ldots, X_n$.
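A minimal Python sketch of the weighted mean follows. The lecture gives no numerical example here, so the grade points and credit hours below are hypothetical data made up for illustration, as is the function name.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical data: grade points earned in three courses,
# weighted by the number of credit hours of each course.
grade_points = [4, 2, 3]
credit_hours = [3, 3, 4]

print(weighted_mean(grade_points, credit_hours))
```

With equal weights the result reduces to the ordinary arithmetic mean.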

Distribution Shapes
Frequency distributions can assume
many shapes.
The three most important shapes are
positively skewed, symmetrical, and
negatively skewed.
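Positively Skewed
[Figure: positively skewed distribution curve, with the longer tail to the right.]
Mode < Median < Mean
Symmetrical
[Figure: symmetrical distribution curve.]
Mean = Median = Mode
Negatively Skewed
[Figure: negatively skewed distribution curve, with the longer tail to the left.]
Mean < Median < Mode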
Measures of Variation - Range
The range is defined to be the highest
value minus the lowest value. The
symbol R is used for the range.
R = highest value - lowest value.
Extremely large or extremely small data
values can drastically affect the range.
Measures of Variation - Population
Variance
The variance is the average of the squares of the
distance each value is from the mean.
The symbol for the population variance is
$\sigma^2$ ($\sigma$ is the Greek lowercase letter sigma).
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N},$$ where
X = individual value
$\mu$ = population mean
N = population size
Measures of Variation - Population
Standard Deviation
The standard deviation is the square
root of the variance.
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}.$$
Measures of Variation - Example
Consider the following data to constitute
the population: 10, 60, 50, 30, 40, 20.
Find the mean and variance.
The mean $\mu$ = (10 + 60 + 50 + 30 + 40 +
20)/6 = 210/6 = 35.
The variance $\sigma^2$ = 1750/6 = 291.67. See
next slide for computations.
Measures of Variation - Example
X      X - μ    (X - μ)^2
10     -25      625
60     +25      625
50     +15      225
30     -5       25
40     +5       25
20     -15      225
210             1750
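The table can be recomputed with a few lines of Python, following the definition directly. This is a sketch; the variable names are illustrative.

```python
from math import sqrt

population = [10, 60, 50, 30, 40, 20]
N = len(population)

mu = sum(population) / N                               # population mean
squared_deviations = [(x - mu) ** 2 for x in population]
variance = sum(squared_deviations) / N                 # divide by N, not N - 1
std_dev = sqrt(variance)

print(mu, sum(squared_deviations), round(variance, 2), round(std_dev, 2))
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` perform the same population calculations.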
Measures of Variation - Sample
Variance
The unbiased estimator of the population
variance, or the sample variance, is a
statistic whose value approximates the
value of the population variance.
It is denoted by $s^2$, where
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1},$$ and
$\bar{X}$ = sample mean
n = sample size
Measures of Variation - Sample
Standard Deviation
The sample standard deviation is the square
root of the sample variance.
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}.$$
Shortcut Formula for the Sample
Variance and the Standard Deviation
$$s^2 = \frac{\sum X^2 - (\sum X)^2 / n}{n - 1}$$
$$s = \sqrt{\frac{\sum X^2 - (\sum X)^2 / n}{n - 1}}$$
Sample Variance - Example
Find the variance and standard
deviation for the following sample: 16,
19, 15, 15, 14.
$\sum X$ = 16 + 19 + 15 + 15 + 14 = 79.
$\sum X^2$ = 16^2 + 19^2 + 15^2 + 15^2 + 14^2
= 1263.
Sample Variance - Example
$$s^2 = \frac{\sum X^2 - (\sum X)^2 / n}{n - 1} = \frac{1263 - (79)^2 / 5}{4} = 3.7$$
$$s = \sqrt{3.7} \approx 1.9.$$
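The shortcut formula can be checked against the definition with a short Python sketch. `statistics.variance` uses the n - 1 divisor, so it should agree with the shortcut up to floating-point rounding.

```python
from math import sqrt
from statistics import variance

sample = [16, 19, 15, 15, 14]
n = len(sample)

sum_x = sum(sample)                    # sum of X
sum_x2 = sum(x ** 2 for x in sample)   # sum of X squared

# Shortcut formula: (sum X^2 - (sum X)^2 / n) / (n - 1)
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = sqrt(s2)

print(sum_x, sum_x2, round(s2, 2), round(s, 1))
print(variance(sample))                # definition-based value for comparison
```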
Sample Variance for Grouped and
Ungrouped Data
For grouped data, use the class
midpoints for the observed values in the
different classes.
For ungrouped data, use the same
formula (see next slide) with the class
midpoints, $X_m$, replaced with the actual
observed X values.
Sample Variance for Grouped and
Ungrouped Data
The sample variance for grouped data:
$$s^2 = \frac{\sum f \cdot X_m^2 - [(\sum f \cdot X_m)^2 / n]}{n - 1}.$$
For ungrouped data, replace $X_m$
with the observed X value.
Sample Variance for Grouped Data
- Example
X       f     f·X    f·X^2
5       2     10     50
6       3     18     108
7       8     56     392
8       1     8      64
9       6     54     486
10      4     40     400
Total   24    186    1500
$n = \sum f = 24$, $\sum f \cdot X = 186$, $\sum f \cdot X^2 = 1500$
Sample Variance for Ungrouped
Data - Example
The sample variance and standard deviation:
$$s^2 = \frac{\sum f \cdot X^2 - [(\sum f \cdot X)^2 / n]}{n - 1} = \frac{1500 - [(186)^2 / 24]}{23} = 2.54.$$
$$s = \sqrt{2.54} \approx 1.6.$$
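The frequency-table version of the shortcut formula can be sketched in Python as follows; the variable names are illustrative.

```python
from math import sqrt

values = [5, 6, 7, 8, 9, 10]
freqs = [2, 3, 8, 1, 6, 4]

n = sum(freqs)                                            # n = sum of f
sum_fx = sum(f * x for f, x in zip(freqs, values))        # sum of f * X
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, values))  # sum of f * X^2

s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
s = sqrt(s2)

print(n, sum_fx, sum_fx2, round(s2, 2), round(s, 1))
```

For grouped data the same code applies with the class midpoints in place of `values`.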
Coefficient of Variation
The coefficient of variation is defined to
be the standard deviation divided by the
mean. The result is expressed as a
percentage.
$$CVar = \frac{s}{\bar{X}} \cdot 100\% \quad \text{or} \quad CVar = \frac{\sigma}{\mu} \cdot 100\%.$$
Chebyshev's Theorem
The proportion of values from a data set that
will fall within k standard deviations of the
mean will be at least $1 - 1/k^2$, where k is any
number greater than 1.
For k = 2, at least 75% of the values will lie within 2
standard deviations of the mean. For k = 3,
at least 89% (approximately) will lie within 3 standard
deviations.
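The bound in Chebyshev's theorem is easy to tabulate. The sketch below uses a hypothetical helper name and simply evaluates 1 - 1/k^2 for a few values of k.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(k, round(100 * chebyshev_bound(k), 1))
```

The bound holds for any data set regardless of its shape, which is why it is weaker than the percentages of the empirical rule for bell-shaped distributions.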
The Empirical (Normal) Rule
For any bell shaped distribution:

Approximately 68% of the data values will fall
within one standard deviation of the mean.
Approximately 95% will fall within two
standard deviations of the mean.
Approximately 99.7% will fall within three
standard deviations of the mean.
The Empirical (Normal) Rule
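[Figure: bell-shaped curve with the horizontal axis marked at $\mu - 3\sigma$,
$\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 2\sigma$, and $\mu + 3\sigma$; the region
within two standard deviations of the mean is labeled 95%.]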
Measures of Position - z score
The standard score or z score for a
value is obtained by subtracting the
mean from the value and dividing the
result by the standard deviation.
The symbol z is used for the z
score.
The z score represents the number of
standard deviations a data value falls above
or below the mean.
For samples:
$$z = \frac{X - \bar{X}}{s}.$$
For populations:
$$z = \frac{X - \mu}{\sigma}.$$
z-score - Example
A student scored 65 on a statistics exam that
had a mean of 50 and a standard deviation of
10. Compute the z-score.
z = (65 - 50)/10 = 1.5.
That is, the score of 65 is 1.5 standard
deviations above the mean.
Above - since the z-score is positive.
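The z-score calculation takes one line of code. The helper below is a sketch with a hypothetical name; the second call uses a made-up score of 40 to show a value below the mean.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

print(z_score(65, 50, 10))   # score above the mean: positive z
print(z_score(40, 50, 10))   # hypothetical score below the mean: negative z
```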
Measures of Position - Percentiles
Percentiles divide the distribution into 100
groups.
The kth percentile, $P_k$, is defined to be that
numerical value such that at most k% of
the values are smaller than $P_k$ and at most
(100 - k)% are larger than $P_k$ in an ordered
data set.
Percentile Formula
The percentile corresponding to a given
value (X) is computed by using the
formula:
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$
Percentiles - Example
A teacher gives a 20-point test to 10 students.
Find the percentile rank of a score of 12.
Scores: 18, 15, 12, 6, 8, 2, 3, 5, 20, 10.
Ordered set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20.
Percentile = [(6 + 0.5)/10](100%) = 65th
percentile. The student did better than 65% of the
class.
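The percentile formula can be sketched as a small Python function; the function name is hypothetical.

```python
def percentile_rank(data, x):
    """Percentile of x: (number of values below x + 0.5) / n * 100."""
    below = sum(value < x for value in data)
    return (below + 0.5) * 100 / len(data)

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))
```

The data do not need to be sorted first, since the formula only counts how many values fall below X.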
Percentiles - Finding the Value
Corresponding to a Given
Percentile
Procedure: Let p be the percentile and n the
sample size.
Step 1: Arrange the data in order.
Step 2: Compute c = (n × p)/100.
Step 3: If c is not a whole number, round up
to the next whole number. If c is a whole
number, use the value halfway between the
cth and (c+1)th values.
Step 4: The value of c is the position of
the required percentile in the ordered data.
Example: Find the value of the 25th
percentile for the following data set: 2, 3, 5, 6,
8, 10, 12, 15, 18, 20.
Note: the data set is already ordered.
n = 10, p = 25, so c = (10 × 25)/100 = 2.5.
Hence round up to c = 3.
Thus, the value of the 25th percentile is the
value X = 5.
Find the 80th percentile.
c = (10 × 80)/100 = 8. Thus the value of the
80th percentile is the average of the 8th and
9th values. Thus, the 80th percentile for the
data set is (15 + 18)/2 = 16.5.
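The four-step procedure can be sketched in Python. The function name is hypothetical, and the sketch assumes p is a whole number strictly between 0 and 100.

```python
def percentile_value(data, p):
    """Value at the p-th percentile, following the lecture's procedure."""
    ordered = sorted(data)                        # Step 1
    c, remainder = divmod(len(ordered) * p, 100)  # Step 2: c = n * p / 100
    if remainder == 0:
        # c is whole: average the c-th and (c + 1)-th values (1-based).
        return (ordered[c - 1] + ordered[c]) / 2
    # c is not whole: rounding up gives position c + 1, which is index c.
    return ordered[c]

data = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]
print(percentile_value(data, 25))
print(percentile_value(data, 80))
```

Statistical packages use several different percentile conventions, so their results can differ slightly from this procedure.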
Special Percentiles - Deciles and
Quartiles
Deciles divide the data set into 10
groups.
Deciles are denoted by D1, D2, ..., D9
with the corresponding percentiles
being P10, P20, ..., P90.
Quartiles divide the data set into 4
groups.
Quartiles are denoted by Q1, Q2, and
Q3 with the corresponding percentiles
being P25, P50, and P75.
The median is the same as P50 or Q2.
Outliers and the Interquartile
Range (IQR)
An outlier is an extremely high or an
extremely low data value when
compared with the rest of the data
values.
The Interquartile Range, IQR
= Q3 - Q1.
To determine whether a data value can be
considered as an outlier:
Step 1: Compute Q1 and Q3.
Step 2: Find the IQR = Q3 - Q1.
Step 3: Compute (1.5)(IQR).
Step 4: Compute Q1 - (1.5)(IQR) and
Q3 + (1.5)(IQR).
Step 5: Compare the data value (say X) with
Q1 - (1.5)(IQR) and Q3 + (1.5)(IQR).
If X < Q1 - (1.5)(IQR) or
if X > Q3 + (1.5)(IQR), then X is considered
an outlier.
Outliers and the Interquartile
Range (IQR) - Example
Given the data set 5, 6, 12, 13, 15, 18, 22, 50,
can the value of 50 be considered as an
outlier?
Q1 = 9, Q3 = 20, IQR = 11. Verify.
(1.5)(IQR) = (1.5)(11) = 16.5.
9 - 16.5 = -7.5 and 20 + 16.5 = 36.5.
The value of 50 is outside the range -7.5 to
36.5, hence 50 is an outlier.
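The "Verify" step can be done with a short Python sketch. It assumes the quartiles are found with the percentile procedure given earlier in the lecture (P25 and P75); the helper name is hypothetical.

```python
def percentile_value(ordered, p):
    """Percentile by the lecture's procedure; expects sorted data."""
    c, remainder = divmod(len(ordered) * p, 100)
    if remainder == 0:
        return (ordered[c - 1] + ordered[c]) / 2
    return ordered[c]

data = sorted([5, 6, 12, 13, 15, 18, 22, 50])

q1 = percentile_value(data, 25)
q3 = percentile_value(data, 75)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```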
Exploratory Data Analysis - Stem
and Leaf Plot
A stem and leaf plot is a data plot
that uses part of a data value as the
stem and part of the data value as
the leaf to form groups or classes.
Exploratory Data Analysis - Stem
and Leaf Plot - Example
At an outpatient testing center, a
sample of 20 days showed the following
number of cardiograms done each day:
25, 31, 20, 32, 13, 14, 43, 02, 57, 23,
36, 32, 33, 32, 44, 32, 52, 44, 51, 45.
Construct a stem and leaf plot for the
data.
Leading Digit (Stem) Trailing Digit (Leaf)
0 2
1 3 4
2 0 3 5
3 1 2 2 2 2 3 6
4 3 4 4 5
5 1 2 7
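The plot can be generated with a few lines of Python. This sketch assumes two-digit whole numbers, so the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
               36, 32, 33, 32, 44, 32, 52, 44, 51, 45]

stems = defaultdict(list)
for value in sorted(cardiograms):
    stems[value // 10].append(value % 10)   # stem = tens, leaf = units

for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Sorting the data first keeps the leaves in increasing order within each stem.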
Exploratory Data Analysis -
Box Plot
When the data set contains a small
number of values, a box plot is used to
graphically represent the data set.
These plots involve five values: the
minimum value, the lower hinge, the
median, the upper hinge, and the
maximum value.
The lower hinge is the median of all values
less than or equal to the median when the
data set has an odd number of values, or
the median of all values less than the
median when the data set has an even
number of values. The symbol for the
lower hinge is LH.
The upper hinge is the median of all
values greater than or equal to the
median when the data set has an odd
number of values, or the median of all
values greater than the median when the
data set has an even number of values.
The symbol for the upper hinge is UH.
Exploratory Data Analysis - Box
Plot - Example (Cardiograms data)
[Figure: box plot of the cardiograms data on a scale from 0 to 60, marking
the minimum, LH, median, UH, and maximum.]
Information Obtained from a
Box Plot
If the median is near the center of the box,
the distribution is approximately symmetric.
If the median falls to the left of the center of
the box, the distribution is positively skewed.
If the median falls to the right of the center of
the box, the distribution is negatively skewed.
If the lines are about the same length, the
distribution is approximately symmetric.
If the right line is longer than the left line, the
distribution is positively skewed.
If the left line is longer than the right line, the
distribution is negatively skewed.
