
Stat 252: Applied Statistics II

Terminology and Sampling Distributions

Brian Franczak

MacEwan University

Winter 2018

Introduction

- Welcome to Stat 252: Applied Statistics II!

- The topics covered in this set of lecture notes are:

  1. what to expect from Stat 252
  2. a review of the terminology introduced in Stat 151
  3. the definition of an unbiased estimator
  4. a review of random variables
  5. a review of sampling distributions

The Course Website

- All required course materials will be made available to you via Blackboard, which will act as our course website.

- The course website will provide you with:

  - the corresponding course outline
  - all needed materials (e.g., lecture notes and practice problems)
  - assignments (problem sets and solutions)
  - discussion boards
  - your grades

- It is your responsibility to check the discussion boards and news feed, and to consult the reminders.


Statistics

“a body of methods for making wise decisions in the face of uncertainty.” - W.A. Wallis

Statistics: Formal Definitions

- Depending on the context, statistics can be defined as:

  1. facts or data, either numerical or nonnumerical, organized and summarized so as to provide useful and accessible information about a particular subject, or
  2. the science of organizing and summarizing numerical or nonnumerical information.

- We can also think of statistics as a set of tools used to determine whether the results we see are likely due to some underlying phenomenon rather than to random chance.


Types of Statistical Tools

- Depending on the goal of a particular project or experiment, statistical tools can be either descriptive or inferential.

- Descriptive statistics are a suite of tools used to organize and summarize information (e.g., the mean and the standard deviation).

- Inferential statistics are ways to form and assess the reliability (or validity) of conclusions made about a population using a sample (e.g., a confidence interval or a hypothesis test).

- In Stat 252, we will focus on inferential statistics.

Population vs. Sample
- A population is the collection of all elements, or units (e.g., humans, animals, etc.), under consideration.

- A sample is a subset, or group of units, taken from the population under consideration.

- Consider the following simple illustration: a population of interest (e.g., the heights of all first-year MacEwan students) from which a sample (a randomly selected group of students) is drawn.

- Question: What are the ideal characteristics of a sample?

  - We desire a randomly selected sample. This gives the sample the best chance of being representative of the population.
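
As a small illustration (not part of the original slides; the population values and sizes are made up), drawing a simple random sample in Python:

```python
import numpy as np

# Hypothetical population: the heights (in cm) of all first-year students.
rng = np.random.default_rng(seed=1)
population = rng.normal(loc=170, scale=10, size=5000)

# A simple random sample of n = 25 units drawn without replacement.
sample = rng.choice(population, size=25, replace=False)

print(round(population.mean(), 1), round(sample.mean(), 1))
```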


Assumptions
- When we use inferential statistics, we start with a model that describes the population of interest.

  - Note (annotation): the model is most often a normal distribution with mean µ and standard deviation σ (so the variance is σ²), which describes the variable of interest.

- In general, we can think of this model as a framework of assumptions about a population.

- It is of the utmost importance that the selected model is valid! If we use an invalid model, the results will be meaningless.

- Note that we will discuss methods for verifying, or validating, assumptions in the upcoming lectures.

Parameters vs. Statistics

- Again, we use inferential statistics to assess the reliability of a conclusion made about a population.

- We can think of this statement in another way.

- First, note that a statistical model will usually depend on a set of parameters, where a parameter is essentially a value describing a characteristic of a population.

- When we collect a sample from a population, we can use the sample values to calculate statistics.

- Formally, a statistic is a descriptive value that gives precise information about the sample and estimates a population parameter.


Parameters vs. Statistics: Nomenclature Review


- In statistics, we use different symbols to differentiate between a statistic and a parameter (see the examples below). Each sample formula is an "estimator" of the corresponding population parameter. Sample sums run over i = 1, ..., n and population sums over i = 1, ..., N.

                         Sample (statistic)                    Population (parameter)
  Mean                   x̄ = (Σ xᵢ)/n                          µ = (Σ xᵢ)/N
  Standard deviation     s = √[Σ (xᵢ - x̄)²/(n - 1)]            σ = √[Σ (xᵢ - µ)²/N]
  Variance               s² = Σ (xᵢ - x̄)²/(n - 1)              σ² = Σ (xᵢ - µ)²/N
  Proportion             p̂ = (Σ xᵢ)/n                          p = (Σ xᵢ)/N
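
A minimal sketch (not from the slides; the data vector is made up) computing the sample versions of these estimators with numpy:

```python
import numpy as np

x = np.array([82, 74, 91, 66, 88, 79, 70, 85])   # a made-up sample of scores

xbar = x.mean()              # sample mean, estimates µ
s = x.std(ddof=1)            # sample standard deviation (n - 1 divisor), estimates σ
s2 = x.var(ddof=1)           # sample variance, estimates σ²
p_hat = np.mean(x >= 80)     # a sample proportion (scores of 80 or more), estimates p

print(xbar, s, s2, p_hat)
```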

Parameters vs. Statistics: Exercise
1. A randomly selected sample of final exam scores in a large introductory statistics course is as follows:

   88 67 64 76 86 85 82 39 75 34
   90 63 89 90 84 81 96 100 70 96

   a. Identify the population of interest.
   b. Estimate the center of the population of interest and interpret your estimate(s).
   c. Estimate the population standard deviation and interpret your estimate.

   Solution (from the worked annotations):

   a. The population of interest is the final exam scores for all students in this introductory statistics course.

   b. With n = 20, the sample mean is x̄ = Σ xᵢ/n = (88 + 67 + 64 + ... + 70 + 96)/20 = 77.75, and the sample median is Med(x) = (82 + 84)/2 = 83. The sample mean is the estimator; 77.75 is the resulting estimate. Interpretation: the average exam score of the 20 observed students is 77.75%.

   c. The sample variance is s² = [(88 - 77.75)² + (67 - 77.75)² + ... + (96 - 77.75)²]/(20 - 1) ≈ 310.0, so s ≈ 17.61. Interpretation: the final exam scores of these 20 students deviate from the mean by roughly 17.61 percentage points on average; if the population is well behaved, we expect a similar spread in the population.
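
A quick numerical check of these values (a sketch, not part of the original slides):

```python
import numpy as np

scores = np.array([88, 67, 64, 76, 86, 85, 82, 39, 75, 34,
                   90, 63, 89, 90, 84, 81, 96, 100, 70, 96])

print(scores.mean())          # 77.75
print(np.median(scores))      # 83.0
print(scores.var(ddof=1))     # about 310.0
print(scores.std(ddof=1))     # about 17.61
```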

Estimation

- What is the best way to estimate a population parameter? Or, equivalently: what are the properties of a good estimator?

- In general, we prefer to use unbiased estimators that have small variance.

- An unbiased estimator is an estimator whose expected value equals the population parameter of interest.

- Recall (for a discrete random variable X): E[X] = Σₓ x p(x).

- As such, the expected value of an unbiased estimator, θ̂, is E[θ̂] = θ.
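
A small simulation sketch (not from the slides; the seed, sample size, and replication count are arbitrary) illustrating unbiasedness: the sample mean and the sample variance with the n - 1 divisor average out to the true parameter values, while dividing by n does not:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
mu, sigma, n, reps = 15, 8, 10, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))

print(samples.mean(axis=1).mean())         # close to µ = 15 -> x̄ is unbiased
print(samples.var(axis=1, ddof=1).mean())  # close to σ² = 64 -> s² (n - 1 divisor) is unbiased
print(samples.var(axis=1, ddof=0).mean())  # close to 64(n - 1)/n ≈ 57.6 -> the n divisor is biased
```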

Random Variables: Review
- Consider a random variable X from a normally distributed population with µ = 15 and σ = 8, i.e., X ∼ N(15, 64): X is normally distributed with mean 15 and variance 64.

- In this illustration, X is a random variable from what is called an infinite population.

- We designate a realization of X as x, where a realization is simply a value arising from the population of interest.

- Therefore, a sample is a collection of realizations of X from the population of interest.

- Question: If we collect samples of size n from an infinite population, how many samples are possible? Also infinite!
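
A minimal sketch (not from the slides) of one realization and one sample of realizations of X ∼ N(15, 64):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

x_single = rng.normal(loc=15, scale=8)          # one realization of X
sample = rng.normal(loc=15, scale=8, size=25)   # a sample of n = 25 realizations

print(x_single, sample.mean())
```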


The Sample Mean

- From a population of size N, there are K possible samples of size n.

- Therefore, we can compute K statistics, one for each sample k = 1, . . . , K.

- Consider the sample mean, x̄ₖ, for k = 1, . . . , K.

- As such, we can consider a sample mean, x̄ₖ, to be a realization of X̄.

- Therefore, the sample mean is also a random variable.

- It follows that other statistics, e.g., s and s², are also random variables.

Sampling Error

- Recall: we are interested in making a statement about a population quantity using sampled values.

- However, when we collect a sample, we only have information about a portion of the population.

- Therefore, there will be an error associated with the estimate of the population quantity of interest.

- We call this error the sampling error.

- Question: Why would we be interested in describing the amount of sampling error? It will help us understand how precise our estimate is.


Sampling Distribution of the Sample Mean


- To describe the sampling error, we need to know the sampling distribution.

- The sampling distribution is the distribution of the statistic of interest.

- For example, the distribution of X̄ is the sampling distribution of the sample mean.

- Recall: for any sample of size n, the distribution of X̄ has

  - Mean: µ_X̄ = µ (the mean of the sampling distribution equals the population mean), and
  - Standard deviation: σ_X̄ = σ/√n.

- Question: As the sample size increases, what happens to the standard deviation of the sample mean, σ_X̄? It decreases. This reflects the fact that having more information gets us closer to the truth!
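
A simulation sketch (not part of the slides; the choice of X ∼ N(15, 64), n = 16, and the replication count are arbitrary) checking these two facts numerically:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
mu, sigma, n, reps = 15, 8, 16, 100_000

# Draw many samples of size n and record each sample mean.
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(xbars.mean())   # close to µ = 15
print(xbars.std())    # close to σ/√n = 8/4 = 2
```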

Normally Distributed Random Variables

- If a random variable X is normally distributed, then X̄ will also be normally distributed.

- This is because any linear combination of n independent and identically distributed normal random variables is normally distributed.

- Therefore, if X ∼ N(µ, σ²), then X̄ ∼ N(µ, σ²/n), i.e., σ_X̄ = σ/√n.

- Suppose X ∼ N(70, 100). Consider the histograms on the next slide (see the simulation sketch after this list for how such a figure could be produced):

  P1 The distribution of 10000 realizations of X.
  P2 The distribution of x̄ when n = 2 (µ̂_X̄ = 69.89, σ̂_X̄ = 7.16).
  P3 The distribution of x̄ when n = 5 (µ̂_X̄ = 69.90, σ̂_X̄ = 4.47).
  P4 The distribution of x̄ when n = 10 (µ̂_X̄ = 69.91, σ̂_X̄ = 3.18).

  *Note: for all figures, µ is represented by the solid black line and µ̂_X̄ is represented by the dashed red line.
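
A matplotlib sketch (not the original code used to build the slide figures; seeds and bin counts are arbitrary) showing how panels like P1-P4 could be generated:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=5)
mu, sigma, reps = 70, 10, 10_000

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, n in zip(axes.ravel(), [1, 2, 5, 10]):
    # Each panel: the distribution of the mean of n observations.
    xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    ax.hist(xbars, bins=40, density=True)
    ax.axvline(mu, color="black")                    # solid line at µ
    ax.axvline(xbars.mean(), color="red", ls="--")   # dashed line at the estimated mean
    ax.set_title(f"n = {n}")

plt.tight_layout()
plt.show()
```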


Normally Distributed Random Variables: Example


[Figure: four relative-frequency histograms over values 40 to 100; the panels show individual values and the average value of two, five, and ten observations. The histograms become narrower around 70 as the number of observations averaged increases.]

Making a Statement about the Sampling Error
- According to an article in the Journal of the American Geriatrics Society, the standard deviation of the lengths of hospital stays on the intervention ward is 8.3 days. (There is evidence that the variable, length of stay, is normally distributed.)

- For all samples of size 80, what is the probability that the sampling error made in estimating the population mean length of stay is at most 2 days, i.e., the probability of being within 2 days of the true µ?

  P(µ - 2 < X̄ < µ + 2) = P( -2/(8.3/√80) < Z < 2/(8.3/√80) )
                        = P(-2.16 < Z < 2.16)
                        = P(Z < 2.16) - P(Z < -2.16)
                        = 0.9846 - 0.0154
                        = 0.9692,

  where, recall, Z = (X̄ - µ)/(σ/√n).

- Standard Interpretation: Approximately 96.92% of all samples of size 80 will estimate the average length of stay to within 2 days of the population mean.

- Interpretation in terms of the Sampling Error: There is approximately a 96.92% chance that the sampling error made in estimating the average length of stay using samples of size 80 will be at most 2 days.
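
A quick numerical check of this probability (a sketch, not part of the slides) using scipy:

```python
import numpy as np
from scipy.stats import norm

sigma, n, error = 8.3, 80, 2
z = error / (sigma / np.sqrt(n))        # about 2.16

prob = norm.cdf(z) - norm.cdf(-z)
print(round(z, 2), round(prob, 4))      # 2.16, roughly 0.969
```
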
Discussion on Normally Distributed Random Variables

- The preceding examples illustrate why normality is so desirable.

- However, these examples also highlight the properties of the sampling distribution of X̄ when N is infinite.

- In practice, we typically work with populations of a finite size.

- When N is countable, the standard deviation of X̄ becomes (see the short sketch after this list)

  σ_X̄ = √[(N - n)/(N - 1)] · (σ/√n)   if 0.05 < n/N < 0.95,

  where the factor √[(N - n)/(N - 1)] is called the finite population correction.

- Another complication is that we typically won't work with normally distributed populations.
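
A small helper sketch (not from the slides; the function name and the example population size N = 400 are mine) for the standard deviation of X̄ with and without the finite population correction:

```python
import math

def se_xbar(sigma, n, N=None):
    """Standard deviation of the sample mean; applies the finite
    population correction when a finite population size N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(se_xbar(8.3, 80))           # infinite population: about 0.93
print(se_xbar(8.3, 80, N=400))    # finite population of 400: about 0.83
```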

Non-Normally Distributed Populations
- There are many examples of populations that are not normally distributed.

- For example, consider the following histogram constructed using data from the Australian Bureau of Statistics.

  [Figure: Histogram of Age at Death of Australian Males, 2012; density versus age at death, for ages 0 to 100.]


Non-Normally Distributed Populations: Sampling Distn.

- Regardless of the distribution, we may still be interested in conducting inferences about some parameter, e.g., µ.

- So, what happens to the sampling distribution of X̄ if we sample from a non-normal population?

- Consider the histograms on the next slide (a simulation sketch follows this list):

  P1 The distribution of 50000 realizations of X (µ = 7.59).
  P2 The distribution of X̄ when n = 2 (µ̂_X̄ = 7.59, σ̂_X̄ = 4.63).
  P3 The distribution of X̄ when n = 5 (µ̂_X̄ = 7.55, σ̂_X̄ = 2.06).
  P4 The distribution of X̄ when n = 10 (µ̂_X̄ = 7.54, σ̂_X̄ = 1.19).

  *Note: for all figures, µ is represented by the solid black line and µ̂_X̄ is represented by the dashed red line.
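
A simulation sketch (not the original code; the right-skewed exponential population is an arbitrary stand-in) showing the sampling distribution of X̄ becoming more symmetric as n grows:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(seed=6)

# A right-skewed stand-in population (exponential with mean 5).
population = rng.exponential(scale=5, size=50_000)

for n in [2, 5, 10, 30]:
    xbars = rng.choice(population, size=(20_000, n)).mean(axis=1)
    print(n, round(xbars.mean(), 2), round(skew(xbars), 2))  # skewness shrinks toward 0
```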

Non-Normally Distributed Populations: Example

[Figure: four relative-frequency histograms over values from -30 to 10; the panels show the population values and the sample means for n = 2, 5, and 10. The histograms become more symmetric and concentrated as n increases.]


The Central Limit Theorem

- As the sample size increases, the sampling distribution of X̄ becomes more and more normal.

- This phenomenon is described by the central limit theorem (CLT).

- Formally, the central limit theorem states that:

  For a relatively large sample size, the variable X̄ is approximately normally distributed, regardless of the distribution of the variable under consideration. The approximation becomes better with increasing sample size.

- In most cases, the rule of thumb is to regard n ≥ 30 as the lower bound on n for the CLT.

Conclusions

- The sample mean, X̄, is a random variable.

- Therefore, it has a distribution.

- The distribution of X̄ is called the sampling distribution of X̄.

- If X is normal, then X̄ is normal.

- If X is not normal, X̄ is approximately normal as long as n is large enough.

- X̄ is an unbiased estimator of µ.


Highlights

- Definition of Statistics

- Types of Statistics: Descriptive and Inferential

- Population vs. Sample

- Sampling Error and Sampling Distributions (Focus on the Sample Mean)

- Normally Distributed Random Variables

- Non-Normally Distributed Random Variables

- The Central Limit Theorem

Cumulative Exercise
1. Suppose the population of interest has the shape given in the plot on the next slide, with µ = 0.2 and σ² = 0.04.

   a. Find the parameters of the sampling distribution of X̄ if n = 5.
   b. What shape will the sampling distribution in a. have? Why?
   c. What is the sampling distribution of X̄ if n = 50? Explain.
   d. For all samples of size 50, what is the probability that the sampling error made in estimating µ is within 0.05?

   Solution (from the worked annotations):

   a. µ_X̄ = µ = 0.2 and σ_X̄ = σ/√n = √(0.04/5) = √0.008 ≈ 0.0894.

   b. The expected shape of the sampling distribution is unimodal and right-skewed, because the population is right-skewed and n = 5 is too small for the central limit theorem to apply.

   c. By the CLT (n = 50 ≥ 30), the sampling distribution of X̄ will be approximately normal with mean µ_X̄ = 0.2 and standard deviation σ_X̄ = √(0.04/50) ≈ 0.0283.

   d. P(µ - 0.05 < X̄ < µ + 0.05) = P( -0.05/0.0283 < Z < 0.05/0.0283 )
                                  = P(-1.77 < Z < 1.77)
                                  = P(Z < 1.77) - P(Z < -1.77)
                                  = 0.9616 - 0.0384
                                  ≈ 0.923.
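
A quick check of parts c. and d. (a sketch, not part of the slides):

```python
import math
from scipy.stats import norm

mu, var, n = 0.2, 0.04, 50
se = math.sqrt(var / n)                # about 0.0283

z = 0.05 / se                          # about 1.77
prob = norm.cdf(z) - norm.cdf(-z)
print(round(se, 4), round(prob, 3))    # 0.0283, about 0.923
```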

Cumulative Exercise Figure


[Figure: density plot of the distribution of X, for values from 0.0 to 1.2 (density scale 0 to 4); the distribution is unimodal and right-skewed.]

