
Stat 252: Applied Statistics II

Terminology and Sampling Distributions

Brian Franczak

MacEwan University

Winter 2018

Introduction

- Welcome to Stat 252: Applied Statistics II!

- The topics covered in this set of lecture notes are:

  1. what to expect from Stat 252
  2. a review of the terminology introduced in Stat 151
  3. the definition of an unbiased estimator
  4. a review of random variables
  5. a review of sampling distributions

The Course Website

- All required course materials will be made available to you via Blackboard, which will act as our course website.

- The course website will provide you with:

  - the corresponding course outline
  - all needed materials (e.g., lecture notes and practice problems)
  - assignments (problem sets and solutions)
  - discussion boards
  - your grades

- It is your responsibility to check the discussion boards and news feed, and to consult the reminders.


Statistics

“a body of methods for making wise decisions in the face of uncertainty.” - W.A. Wallis

Statistics: Formal Definitions

- Depending on the context, statistics can be defined as:

  1. facts or data, either numerical or nonnumerical, organized and summarized so as to provide useful and accessible information about a particular subject, or
  2. the science of organizing and summarizing numerical or nonnumerical information.

- We can also think of statistics as a set of tools used to determine whether the results we see are likely due to some underlying phenomenon rather than to random chance.


Types of Statistical Tools

- Depending on the goal of a particular project or experiment, statistical tools can be either descriptive or inferential.

- Descriptive statistics are a suite of tools used to organize and summarize information (e.g., the mean and the standard deviation).

- Inferential statistics are ways to form and assess the reliability (or validity) of conclusions made about a population using a sample (e.g., a confidence interval or a hypothesis test).

- In Stat 252, we will focus on inferential statistics.

Population vs. Sample
- A population is the collection of all elements, or units (e.g., humans, animals, etc.), under consideration.

- A sample is a subset, or group of units, taken from the population under consideration.

- Consider the following simple illustration: a population of interest (e.g., the heights of all first-year MacEwan students) from which a sample (a randomly selected group of students) is drawn.

- Question: What are the ideal characteristics of a sample?

  - We desire a randomly selected sample. This gives the sample the best chance of being representative of the population.
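
As a small illustration (not part of the original slides; the population values and sizes are made up), drawing a simple random sample in Python:

```python
import numpy as np

# Hypothetical population: the heights (in cm) of all first-year students.
rng = np.random.default_rng(seed=1)
population = rng.normal(loc=170, scale=10, size=5000)

# A simple random sample of n = 25 units drawn without replacement.
sample = rng.choice(population, size=25, replace=False)

print(round(population.mean(), 1), round(sample.mean(), 1))
```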


Assumptions
- When we use inferential statistics, we start with a model that describes the population of interest.

  - Note (annotation): the model is most often a normal distribution with mean µ and standard deviation σ (so the variance is σ²), which describes the variable of interest.

- In general, we can think of this model as a framework of assumptions about a population.

- It is of the utmost importance that the selected model is valid! If we use an invalid model, the results will be meaningless.

- Note that we will discuss methods for verifying, or validating, assumptions in the upcoming lectures.

Parameters vs. Statistics

- Again, we use inferential statistics to assess the reliability of a conclusion made about a population.

- We can think of this statement in another way.

- First, note that a statistical model will usually depend on a set of parameters, where a parameter is essentially a value describing a characteristic of a population.

- When we collect a sample from a population, we can use the sample values to calculate statistics.

- Formally, a statistic is a descriptive value that gives precise information about the sample and estimates a population parameter.


Parameters vs. Statistics: Nomenclature Review


- In statistics, we use different symbols to differentiate between a statistic and a parameter (see the examples below). Each sample formula is an "estimator" of the corresponding population parameter. Sample sums run over i = 1, ..., n and population sums over i = 1, ..., N.

                         Sample (statistic)                    Population (parameter)
  Mean                   x̄ = (Σ xᵢ)/n                          µ = (Σ xᵢ)/N
  Standard deviation     s = √[Σ (xᵢ - x̄)²/(n - 1)]            σ = √[Σ (xᵢ - µ)²/N]
  Variance               s² = Σ (xᵢ - x̄)²/(n - 1)              σ² = Σ (xᵢ - µ)²/N
  Proportion             p̂ = (Σ xᵢ)/n                          p = (Σ xᵢ)/N
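
A minimal sketch (not from the slides; the data vector is made up) computing the sample versions of these estimators with numpy:

```python
import numpy as np

x = np.array([82, 74, 91, 66, 88, 79, 70, 85])   # a made-up sample of scores

xbar = x.mean()              # sample mean, estimates µ
s = x.std(ddof=1)            # sample standard deviation (n - 1 divisor), estimates σ
s2 = x.var(ddof=1)           # sample variance, estimates σ²
p_hat = np.mean(x >= 80)     # a sample proportion (scores of 80 or more), estimates p

print(xbar, s, s2, p_hat)
```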

Parameters vs. Statistics: Exercise
1. A randomly selected sample of final exam scores in a large introductory statistics course is as follows:

   88 67 64 76 86 85 82 39 75 34
   90 63 89 90 84 81 96 100 70 96

   a. Identify the population of interest.
   b. Estimate the center of the population of interest and interpret your estimate(s).
   c. Estimate the population standard deviation and interpret your estimate.

   Solution (from the worked annotations):

   a. The population of interest is the final exam scores for all students in this introductory statistics course.

   b. With n = 20, the sample mean is x̄ = Σ xᵢ/n = (88 + 67 + 64 + ... + 70 + 96)/20 = 77.75, and the sample median is Med(x) = (82 + 84)/2 = 83. The sample mean is the estimator; 77.75 is the resulting estimate. Interpretation: the average exam score of the 20 observed students is 77.75%.

   c. The sample variance is s² = [(88 - 77.75)² + (67 - 77.75)² + ... + (96 - 77.75)²]/(20 - 1) ≈ 310.0, so s ≈ 17.61. Interpretation: the final exam scores of these 20 students deviate from the mean by roughly 17.61 percentage points on average; if the population is well behaved, we expect a similar spread in the population.
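
A quick numerical check of these values (a sketch, not part of the original slides):

```python
import numpy as np

scores = np.array([88, 67, 64, 76, 86, 85, 82, 39, 75, 34,
                   90, 63, 89, 90, 84, 81, 96, 100, 70, 96])

print(scores.mean())          # 77.75
print(np.median(scores))      # 83.0
print(scores.var(ddof=1))     # about 310.0
print(scores.std(ddof=1))     # about 17.61
```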

Estimation

- What is the best way to estimate a population parameter? Or, equivalently: what are the properties of a good estimator?

- In general, we prefer to use unbiased estimators that have small variance.

- An unbiased estimator is an estimator whose expected value equals the population parameter of interest.

- Recall (for a discrete random variable X): E[X] = Σₓ x p(x).

- As such, the expected value of an unbiased estimator, θ̂, is E[θ̂] = θ.
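
A small simulation sketch (not from the slides; the seed, sample size, and replication count are arbitrary) illustrating unbiasedness: the sample mean and the sample variance with the n - 1 divisor average out to the true parameter values, while dividing by n does not:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
mu, sigma, n, reps = 15, 8, 10, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))

print(samples.mean(axis=1).mean())         # close to µ = 15 -> x̄ is unbiased
print(samples.var(axis=1, ddof=1).mean())  # close to σ² = 64 -> s² (n - 1 divisor) is unbiased
print(samples.var(axis=1, ddof=0).mean())  # close to 64(n - 1)/n ≈ 57.6 -> the n divisor is biased
```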

Random Variables: Review
- Consider a random variable X from a normally distributed population with µ = 15 and σ = 8, i.e., X ∼ N(15, 64): X is normally distributed with mean 15 and variance 64.

- In this illustration, X is a random variable from what is called an infinite population.

- We designate a realization of X as x, where a realization is simply a value arising from the population of interest.

- Therefore, a sample is a collection of realizations of X from the population of interest.

- Question: If we collect samples of size n from an infinite population, how many samples are possible? Also infinite!
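
A minimal sketch (not from the slides) of one realization and one sample of realizations of X ∼ N(15, 64):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

x_single = rng.normal(loc=15, scale=8)          # one realization of X
sample = rng.normal(loc=15, scale=8, size=25)   # a sample of n = 25 realizations

print(x_single, sample.mean())
```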


The Sample Mean

- From a population of size N, there are K possible samples of size n.

- Therefore, we can compute K statistics, one for each sample k = 1, . . . , K.

- Consider the sample mean, x̄ₖ, for k = 1, . . . , K.

- As such, we can consider a sample mean, x̄ₖ, to be a realization of X̄.

- Therefore, the sample mean is also a random variable.

- It follows that other statistics, e.g., s and s², are also random variables.

Sampling Error

- Recall: we are interested in making a statement about a population quantity using sampled values.

- However, when we collect a sample, we only have information about a portion of the population.

- Therefore, there will be an error associated with the estimate of the population quantity of interest.

- We call this error the sampling error.

- Question: Why would we be interested in describing the amount of sampling error? It will help us understand how precise our estimate is.


Sampling Distribution of the Sample Mean


- To describe the sampling error, we need to know the sampling distribution.

- The sampling distribution is the distribution of the statistic of interest.

- For example, the distribution of X̄ is the sampling distribution of the sample mean.

- Recall: for any sample of size n, the distribution of X̄ has

  - Mean: µ_X̄ = µ (the mean of the sampling distribution equals the population mean), and
  - Standard deviation: σ_X̄ = σ/√n.

- Question: As the sample size increases, what happens to the standard deviation of the sample mean, σ_X̄? It decreases. This reflects the fact that having more information gets us closer to the truth!
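
A simulation sketch (not part of the slides; the choice of X ∼ N(15, 64), n = 16, and the replication count are arbitrary) checking these two facts numerically:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
mu, sigma, n, reps = 15, 8, 16, 100_000

# Draw many samples of size n and record each sample mean.
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbars.mean())   # close to µ = 15
print(xbars.std())    # close to σ/√n = 8/4 = 2
```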

Normally Distributed Random Variables

- If a random variable X is normally distributed, then X̄ will also be normally distributed.

- This is because any linear combination of n independent and identically distributed normal random variables is normally distributed.

- Therefore, if X ∼ N(µ, σ²), then X̄ ∼ N(µ, σ²/n), i.e., σ_X̄ = σ/√n.

- Suppose X ∼ N(70, 100). Consider the histograms on the next slide (see the simulation sketch after this list for how such a figure could be produced):

  P1 The distribution of 10000 realizations of X.
  P2 The distribution of x̄ when n = 2 (µ̂_X̄ = 69.89, σ̂_X̄ = 7.16).
  P3 The distribution of x̄ when n = 5 (µ̂_X̄ = 69.90, σ̂_X̄ = 4.47).
  P4 The distribution of x̄ when n = 10 (µ̂_X̄ = 69.91, σ̂_X̄ = 3.18).

  *Note: for all figures, µ is represented by the solid black line and µ̂_X̄ is represented by the dashed red line.
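
A matplotlib sketch (not the original code used to build the slide figures; seeds and bin counts are arbitrary) showing how panels like P1-P4 could be generated:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=5)
mu, sigma, reps = 70, 10, 10_000

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, n in zip(axes.ravel(), [1, 2, 5, 10]):
    # Each panel: the distribution of the mean of n observations.
    xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    ax.hist(xbars, bins=40, density=True)
    ax.axvline(mu, color="black")                    # solid line at µ
    ax.axvline(xbars.mean(), color="red", ls="--")   # dashed line at the estimated mean
    ax.set_title(f"n = {n}")

plt.tight_layout()
plt.show()
```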


Normally Distributed Random Variables: Example


[Figure: four relative-frequency histograms over values 40 to 100; the panels show individual values and the average value of two, five, and ten observations. The histograms become narrower around 70 as the number of observations averaged increases.]

Making a Statement about the Sampling Error
- According to an article in the Journal of the American Geriatrics Society, the standard deviation of the lengths of hospital stays on the intervention ward is 8.3 days. (There is evidence that the variable, length of stay, is normally distributed.)

- For all samples of size 80, what is the probability that the sampling error made in estimating the population mean length of stay is at most 2 days, i.e., the probability of being within 2 days of the true µ?

  P(µ - 2 < X̄ < µ + 2) = P( -2/(8.3/√80) < Z < 2/(8.3/√80) )
                        = P(-2.16 < Z < 2.16)
                        = P(Z < 2.16) - P(Z < -2.16)
                        = 0.9846 - 0.0154
                        = 0.9692,

  where, recall, Z = (X̄ - µ)/(σ/√n).

- Standard Interpretation: Approximately 96.92% of all samples of size 80 will estimate the average length of stay to within 2 days of the population mean.

- Interpretation in terms of the Sampling Error: There is approximately a 96.92% chance that the sampling error made in estimating the average length of stay using samples of size 80 will be at most 2 days.
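
A quick numerical check of this probability (a sketch, not part of the slides) using scipy:

```python
import numpy as np
from scipy.stats import norm

sigma, n, error = 8.3, 80, 2
z = error / (sigma / np.sqrt(n))        # about 2.16

prob = norm.cdf(z) - norm.cdf(-z)
print(round(z, 2), round(prob, 4))      # 2.16, roughly 0.969
```
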
Discussion on Normally Distributed Random Variables

- The preceding examples illustrate why normality is so desirable.

- However, these examples also highlight the properties of the sampling distribution of X̄ when N is infinite.

- In practice, we typically work with populations of a finite size.

- When N is countable, the standard deviation of X̄ becomes (see the short sketch after this list)

  σ_X̄ = √[(N - n)/(N - 1)] · (σ/√n)   if 0.05 < n/N < 0.95,

  where the factor √[(N - n)/(N - 1)] is called the finite population correction.

- Another complication is that we typically won't work with normally distributed populations.
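
A small helper sketch (not from the slides; the function name and the example population size N = 400 are mine) for the standard deviation of X̄ with and without the finite population correction:

```python
import math

def se_xbar(sigma, n, N=None):
    """Standard deviation of the sample mean; applies the finite
    population correction when a finite population size N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(se_xbar(8.3, 80))           # infinite population: about 0.93
print(se_xbar(8.3, 80, N=400))    # finite population of 400: about 0.83
```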

Non-Normally Distributed Populations
- There are many examples of populations that are not normally distributed.

- For example, consider the following histogram constructed using data from the Australian Bureau of Statistics.

  [Figure: Histogram of Age at Death of Australian Males, 2012; density versus age at death, for ages 0 to 100.]


Non-Normally Distributed Populations: Sampling Distn.

- Regardless of the distribution, we may still be interested in conducting inferences about some parameter, e.g., µ.

- So, what happens to the sampling distribution of X̄ if we sample from a non-normal population?

- Consider the histograms on the next slide (a simulation sketch follows this list):

  P1 The distribution of 50000 realizations of X (µ = 7.59).
  P2 The distribution of X̄ when n = 2 (µ̂_X̄ = 7.59, σ̂_X̄ = 4.63).
  P3 The distribution of X̄ when n = 5 (µ̂_X̄ = 7.55, σ̂_X̄ = 2.06).
  P4 The distribution of X̄ when n = 10 (µ̂_X̄ = 7.54, σ̂_X̄ = 1.19).

  *Note: for all figures, µ is represented by the solid black line and µ̂_X̄ is represented by the dashed red line.
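
A simulation sketch (not the original code; the right-skewed exponential population is an arbitrary stand-in) showing the sampling distribution of X̄ becoming more symmetric as n grows:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(seed=6)

# A right-skewed stand-in population (exponential with mean 5).
population = rng.exponential(scale=5, size=50_000)

for n in [2, 5, 10, 30]:
    xbars = rng.choice(population, size=(20_000, n)).mean(axis=1)
    print(n, round(xbars.mean(), 2), round(skew(xbars), 2))  # skewness shrinks toward 0
```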

Non-Normally Distributed Populations: Example

[Figure: four relative-frequency histograms over values from -30 to 10; the panels show the population values and the sample means for n = 2, 5, and 10. The histograms become more symmetric and concentrated as n increases.]


The Central Limit Theorem

- As the sample size increases, the sampling distribution of X̄ becomes more and more normal.

- This phenomenon is described by the central limit theorem (CLT).

- Formally, the central limit theorem states that:

  For a relatively large sample size, the variable X̄ is approximately normally distributed, regardless of the distribution of the variable under consideration. The approximation becomes better with increasing sample size.

- In most cases, the rule of thumb is to regard n ≥ 30 as the lower bound on n for the CLT.

Conclusions

- The sample mean, X̄, is a random variable.

- Therefore, it has a distribution.

- The distribution of X̄ is called the sampling distribution of X̄.

- If X is normal, then X̄ is normal.

- If X is not normal, X̄ is approximately normal as long as n is large enough.

- X̄ is an unbiased estimator of µ.


Highlights

- Definition of Statistics

- Types of Statistics: Descriptive and Inferential

- Population vs. Sample

- Sampling Error and Sampling Distributions (Focus on the Sample Mean)

- Normally Distributed Random Variables

- Non-Normally Distributed Random Variables

- The Central Limit Theorem

Cumulative Exercise
1. Suppose the population of interest has the shape given in the plot on the next slide, with µ = 0.2 and σ² = 0.04.

   a. Find the parameters of the sampling distribution of X̄ if n = 5.
   b. What shape will the sampling distribution in a. have? Why?
   c. What is the sampling distribution of X̄ if n = 50? Explain.
   d. For all samples of size 50, what is the probability that the sampling error made in estimating µ is within 0.05?

   Solution (from the worked annotations):

   a. µ_X̄ = µ = 0.2 and σ_X̄ = σ/√n = √(0.04/5) = √0.008 ≈ 0.0894.

   b. The expected shape of the sampling distribution is unimodal and right-skewed, because the population is right-skewed and n = 5 is too small for the central limit theorem to apply.

   c. By the CLT (n = 50 ≥ 30), the sampling distribution of X̄ will be approximately normal with mean µ_X̄ = 0.2 and standard deviation σ_X̄ = √(0.04/50) ≈ 0.0283.

   d. P(µ - 0.05 < X̄ < µ + 0.05) = P( -0.05/0.0283 < Z < 0.05/0.0283 )
                                  = P(-1.77 < Z < 1.77)
                                  = P(Z < 1.77) - P(Z < -1.77)
                                  = 0.9616 - 0.0384
                                  ≈ 0.923.
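
A quick check of parts c. and d. (a sketch, not part of the slides):

```python
import math
from scipy.stats import norm

mu, var, n = 0.2, 0.04, 50
se = math.sqrt(var / n)                # about 0.0283

z = 0.05 / se                          # about 1.77
prob = norm.cdf(z) - norm.cdf(-z)
print(round(se, 4), round(prob, 3))    # 0.0283, about 0.923
```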

Cumulative Exercise Figure


[Figure: density plot of the distribution of X, for values from 0.0 to 1.2 (density scale 0 to 4); the distribution is unimodal and right-skewed.]

