5/13/17
Week 1, Part 3:
Sampling Data to
get information.
Gather information from the whole
population or a part of it
When we gather information from the
entire population, we perform a census.
A census requires that we know how to
access every item in the population.
It requires that we are able to list
every item in the population.
Unfortunately, this is most often
very hard to do.
Why do we sample?
Sampling
  Probability samples
    Simple random sample
    Pseudo-random samples
  Non-probability samples
    Convenience, judgment
Non-Probability Samples
Such samples are characterized by having
certain population items that have a zero
probability of being included in the
sample.
Convenience sample
Judgment sample
Probability Samples
These sampling methods are
characterized by every item in the
population having a non-zero
chance of being included in the sample.
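The contrast between the two definitions can be illustrated with a minimal sketch. The population of 150 numbered items and the sample size of 10 are hypothetical values chosen for illustration, and "take the first 10 items" is only one possible way a convenience sample might arise.

```python
import random

# Hypothetical population: item numbers 1..150.
population = list(range(1, 151))
n = 10

# Simple random sample: every item has the same non-zero chance (n/N)
# of being included.
simple_random = random.sample(population, n)

# Convenience sample (assumed here to be "whatever is at the top of the list"):
# items 11..150 have zero probability of being included.
convenience = population[:n]

print("simple random:", sorted(simple_random))
print("convenience:  ", convenience)
```

Running this several times changes the simple random sample on each run, while the convenience sample never changes.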
Pseudo-Random Samples
Systematic random samples:
The first observation is chosen at random;
remaining observations are drawn at
intervals determined by the size of the
sample desired.
The process: 1) select a sample size, 2)
calculate N/n to get an interval, 3)
randomly select a starting point and
choose the observation at the starting
point, 4) select the next observation that
is N/n observations away, and 5) continue
until you have selected n observations.
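The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name `systematic_sample` and its parameters are hypothetical, the interval is taken as the integer part of N/n, and counting is assumed to wrap around to the top of the list when it runs past the end.

```python
import random

def systematic_sample(items, n, start=None, rng=random):
    """Select n items at a fixed interval of N // n, wrapping past the end."""
    N = len(items)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    interval = N // n                  # step 2: the sampling interval N/n
    if start is None:
        start = rng.randrange(N)       # step 3: random 0-based starting index
    # Steps 4-5: take every interval-th item until n items are selected.
    # Assumption: positions wrap around to the start of the list.
    positions = [(start + k * interval) % N for k in range(n)]
    return [items[p] for p in positions]

# Hypothetical population of 150 numbered items, sample of 10.
population = list(range(1, 151))
sample = systematic_sample(population, 10, start=91)  # index 91 is item 92
print(sample)
```

With the starting point fixed at item 92, the sketch selects items 92, 107, 122, 137, 2, 17, 32, 47, 62 and 77. Omitting `start` draws the starting point at random.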
Pseudo-Random Samples
I have a list of 150 items that are in numerical order.
I decided I would like a sample of 10.
N/n = 150/10 = 15, so the interval is 15.
Randomly selected item #92.
Go to the item 15 places later (92 + 15 = 107) and select that number.
After item 137 the count passes the end of the list
and wraps to item 2 (137 + 15 = 152; 152 - 150 = 2).
Here is the complete sample.

A systematic example
Location   Value
92         28.75517
107        28.9428
122        28.96165
137        28.97136
2          24.6281
17         27.08843
32         27.26721
...        27.4108
Using Systematic Samples
Use with data that has an inherent order:
items moving on an assembly line
numbered invoices
customers at checkout
If the population is in random order, this
method results in a simple random sample.
If the population is ordered by magnitude, this
method results in a sample that is more
informative than a simple random sample.
If the population has cyclical variation, this
method results in a sample that is less
informative than a simple random sample. It is
not appropriate for situations where there is a
cycle in the population.
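The problem with cyclical populations can be seen in a small sketch. The population below is hypothetical: 70 daily observations whose value simply repeats every 7 days. The helper `every_kth` is also an illustrative name, not part of any library.

```python
# Hypothetical population: 10 weeks of daily values with a 7-day cycle.
# Each value is the day of the week (0..6), so the true mean is 3.0.
population = [day % 7 for day in range(70)]

def every_kth(items, interval, start):
    """Take the item at `start`, then every `interval`-th item after it."""
    return items[start::interval]

# Interval equal to the cycle length: every sampled item falls on the
# same point of the cycle, so the sample contains a single repeated value.
aligned = every_kth(population, 7, start=2)

# Interval that does not match the cycle: the sample spreads over the cycle.
offset = every_kth(population, 10, start=2)

print("population mean:", sum(population) / len(population))
print("aligned sample: ", aligned, "mean", sum(aligned) / len(aligned))
print("offset sample:  ", offset, "mean", sum(offset) / len(offset))
```

When the interval lines up with the cycle, the sample carries no information about the variation within a cycle, which is why the method is not recommended in that situation.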
Pseudo-Random Samples
Stratified random samples
Use when a population can be divided into
mutually exclusive (non-overlapping) and
collectively exhaustive (including all
population items) subgroups, such as gender
or political preference.
Each subgroup (stratum) is randomly sampled.
Often the goal is to select a sample that
represents every stratum in the population
while keeping the total sample size small.
Often the number sampled from each stratum
is proportional to that stratum's share of the
population.
Pseudo-Random Samples
Each item in the list used earlier was a bird
found on the shore after the Deepwater Horizon Oil Spill.
The oiled condition of the bird was listed:
visibly oiled (20%), not visibly oiled (67%) and unknown (13%).
Sorting the observations into these strata and
selecting a random sample from each stratum
generates a stratified random sample.

Oil Condition       Obs   Value
Not Visibly Oiled   60    27.58037
Not Visibly Oiled   49    27.4124
Not Visibly Oiled   48    27.4124
Not Visibly Oiled   130   28.96679
Not Visibly Oiled   69    27.59439
Not Visibly Oiled   133   28.96775
Not Visibly Oiled   66    27.58514
...                 ...   28.942
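A proportional stratified sample along these lines can be sketched as follows. The function names, the largest-remainder rounding rule and the total sample size of 10 are assumptions made for illustration; the slides give only the stratum percentages (20%, 67%, 13%).

```python
import random
from collections import defaultdict

def proportional_allocation(stratum_sizes, n):
    """Split n across strata in proportion to size (largest-remainder rounding)."""
    total = sum(stratum_sizes.values())
    exact = {s: n * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}   # round each share down
    leftover = n - sum(alloc.values())
    # Hand the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

def stratified_sample(records, stratum_of, n, rng=random):
    """Group records into strata, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    alloc = proportional_allocation({s: len(v) for s, v in strata.items()}, n)
    return {s: rng.sample(strata[s], alloc[s]) for s in strata}

# Stratum shares from the slide, expressed as counts per 100 birds.
shares = {"visibly oiled": 20, "not visibly oiled": 67, "unknown": 13}
allocation = proportional_allocation(shares, 10)
print(allocation)

# Hypothetical records: (bird id, oil condition) pairs built from those shares.
labels = [cond for cond, count in shares.items() for _ in range(count)]
birds = list(enumerate(labels, start=1))
picked = stratified_sample(birds, stratum_of=lambda bird: bird[1], n=10)
```

For a sample of 10, the allocation works out to 2 visibly oiled, 7 not visibly oiled and 1 unknown.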
Pseudo-Random Samples
A cluster example
Week from spill   Value
20                28.96411
20                27.78272
22                28.42994
22                28.96064
22                28.55
27                28.97873
27                28.593
27                28.59478
33                28.95648
33                28.96634

Clusters can be defined by location in time.
For example, the number of weeks after
the oil spill a bird was found is recorded.
The items were sorted by week found, then a
random sample of 4 of the weeks was selected.
In an ideal world, the number of items in the
weeks selected would give the sample size
we wanted; however, it often doesn't. A
second sampling method might be used
to reduce the sample size after using the