
E370

5/13/17
Week 1, Part 3:
Sampling Data to
get information.
Gather information from the whole
population or a part of it
When we gather information from the
entire population, we perform a census.
A census requires that we know how to
access every item in the population.
It requires that we are able to list
every item in the population.
Unfortunately, this is most often
very hard to do.

How do we get data?


We use samples because:
We don't have access to the whole population
because it is an ongoing, recurring process.
New items are being generated constantly.
The cost of contacting the entire population
would be too great in terms of time and money.
The items being selected will be destroyed in
the process of getting their information.
(Think of testing length of light bulb life.)
A census may require repetitive record-keeping
tasks, making it prone to errors resulting from
boredom or fatigue. Samples can be more
accurate than a census in such circumstances.

Why do we sample?
Sampling
  Probability sampling
    Simple random sample
    Pseudo-random samples
  Non-probability sampling
    Convenience, judgment samples
Such samples are characterized by having
certain population items that have a zero
probability of being included in the
sample.
Convenience sample
Judgment sample

Such samples introduce bias into the
data and are not appropriate as a base for
statements about the population at large.
Bias means that as the sample gets
larger, it does not start to resemble the
population more and more.

Non-Probability Samples
These sampling methods are
characterized by all items in a
population having a non-zero
chance of being in a sample.

A very special probability sampling
method results in a simple
random sample. All items in the
population from which the sample
was drawn have the SAME non-zero
chance of being selected.
Probability Samples
Characteristics
Unbiased: as the sample gets larger,
the sample gets to resemble the
population more and more closely.
Independent: selecting any item from
the population has no effect on the
selection of any other item in the
sample.
The gold standard of statistics: all
inferential techniques learned depend
on the sample being a simple random
sample.
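
As a rough illustration of the idea above, a simple random sample can be drawn with Python's standard `random` module. This is only a sketch: the population of 150 numbered items, the sample size of 10, the seed, and the helper name `simple_random_sample` are all assumptions made for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every item has the same chance of selection."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(population, n)


# Hypothetical population: items numbered 1 to 150.
population = list(range(1, 151))
sample = simple_random_sample(population, 10, seed=370)
print(sorted(sample))
```

`random.sample` selects without replacement, so no item can appear twice in the sample.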

Simple Random Sample


Use these methods when simple random samples
are too complicated or too costly, or when a
particular situation calls for a special method.
The systematic random sample
The stratified random sample
The random cluster sample

These methods are not considered
as pure as the simple random
sample, but they have many good
qualities.

Pseudo-Random Samples
Systematic random samples:
The first observation is chosen at random;
remaining observations are drawn at
intervals determined by the size of the
sample desired.
The process: 1) select a sample size, 2)
calculate N/n to get an interval, 3)
randomly select a starting point and
choose the observation at the starting
point, 4) select the next observation that
is N/n observations away, and 5) continue
until you have selected n observations.
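
The five steps above can be sketched in Python as follows. The function name `systematic_positions` is hypothetical, positions are 1-based, and two choices are assumptions rather than part of the stated process: the interval is rounded down when N/n is not a whole number, and the count wraps around to the start of the list when it runs past item N.

```python
import random


def systematic_positions(N, n, start=None, seed=None):
    """Return n 1-based positions spaced N // n apart, wrapping past N."""
    k = N // n  # step 2: the sampling interval
    if start is None:
        # step 3: pick the starting point at random
        start = random.Random(seed).randint(1, N)
    # steps 4 and 5: keep moving k items along until n positions are chosen
    return [((start - 1 + i * k) % N) + 1 for i in range(n)]


positions = systematic_positions(150, 10, start=92)
print(positions)
# [92, 107, 122, 137, 2, 17, 32, 47, 62, 77]
```

With N = 150, n = 10 and a starting point of 92, the interval is 15 and the positions run 92, 107, 122, 137, then wrap to 2 and continue.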

Pseudo-Random Samples
I have a list of 150 items that are in numerical order.
I decided I would like a sample of 10.
N/n = 150/10 = 15 interval
Randomly selected item # 92.
Go to the item 92+15 later = 107 & select that number.
Here is the complete sample:

Location   Value
92         28.75517
107        28.9428
122        28.96165
137        28.97136
2          24.6281
17         27.08843
32         27.26721
...        27.4108

A systematic example
Using Systematic Samples
With data that has an inherent order
items moving on an assembly line
numbered invoices
customers at checkout
If the population is in random order, this
method is equivalent to a simple random sample.
If the population is ordered by magnitude, this
method results in a sample that is more
informative than a simple random sample.
If the population has cyclical variation, this
method results in a sample that is less
informative than a simple random sample. It is
not appropriate for situations where there is a
cycle in the population.
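
To see why a cycle is a problem, consider a deliberately extreme, made-up case in which the population repeats with a period equal to the sampling interval. The data below are synthetic and chosen only for illustration; the starting index is an arbitrary assumption.

```python
from statistics import mean

N, n = 150, 10
k = N // n  # sampling interval, 15

# Hypothetical population whose values repeat every k items (cycle of length 15).
population = [i % k for i in range(N)]

# Systematic sample: start at 0-based index 3, then take every k-th item.
start = 3
sample = population[start::k]

print(mean(population), mean(sample))
```

Every sampled item falls at the same point in the cycle, so all 10 values equal 3 while the population mean is 7. The sample carries no information about the variation within the cycle.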

Pseudo-Random Samples
Stratified random samples
Use when a population can be divided into
mutually exclusive (non-overlapping) and
collectively exhaustive (including all
population items) subgroups, such as gender
or political preference.
Each sub-group is randomly sampled.
Often the goal is to select a sample
representative of all the strata in a population
but minimize the size of the whole sample.
Often the number sampled from each stratum
represents the proportion of that stratum in the
population.
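
A minimal sketch of proportional stratified sampling is shown below. The helper `stratified_sample`, the stratum labels "a", "b" and "c", and the 20% / 67% / 13% split are assumptions chosen for illustration. Because each stratum's share is rounded, the stratum sizes are not guaranteed to add up to exactly n in general.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, n, seed=None):
    """Draw a proportionally allocated random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding means sizes may not sum to n.
        size = round(n * len(members) / len(items))
        sample[name] = rng.sample(members, min(size, len(members)))
    return sample


# Hypothetical population: 100 labelled items in three strata.
population = (
    [("a", i) for i in range(20)]
    + [("b", i) for i in range(67)]
    + [("c", i) for i in range(13)]
)
sample = stratified_sample(population, stratum_of=lambda item: item[0], n=10, seed=1)
sizes = {name: len(chosen) for name, chosen in sample.items()}
print(sizes)
# {'a': 2, 'b': 7, 'c': 1}
```

Each stratum is sampled independently, so every subgroup is represented even when the overall sample is small.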

Pseudo-Random Samples
Each item in the list used earlier was a bird found on the
shore after the Deepwater Horizon oil spill.
The oiled condition of each bird was listed as visibly oiled
(20%), not visibly oiled (67%), or unknown (13%).
Sorting the observations into these strata and selecting a
random sample from each stratum generates a stratified
random sample.

Oil Condition       Obs   Value
Not Visibly Oiled   60    27.58037
Not Visibly Oiled   49    27.4124
Not Visibly Oiled   48    27.4124
Not Visibly Oiled   130   28.96679
Not Visibly Oiled   69    27.59439
Not Visibly Oiled   133   28.96775
Not Visibly Oiled   66    27.58514
Unknown             106   28.9428
Visibly Oiled       71    27.80305
...                 ...   27.978

A stratified example
Cluster random samples
Cluster sampling is used when a list of
all the individuals in a population
doesn't exist, but a list of the locations
of groups of individuals does exist.
The population is divided into units,
often geographical.
A group of units is selected randomly.
All items of interest in those selected
units are measured or surveyed.
Multi-stage cluster sampling involves
several iterations of this process.
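
A single-stage version of this process might look like the sketch below. The helper `cluster_sample` and the synthetic records (eight units of unequal size) are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def cluster_sample(items, cluster_of, n_clusters, seed=None):
    """Randomly pick whole units and keep every item inside them."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for item in items:
        clusters[cluster_of(item)].append(item)
    # Randomness applies to the units, not to the individual items.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: clusters[c] for c in chosen}


# Hypothetical records (unit, item id); unit u holds u items, for units 1 to 8.
records = [(unit, i) for unit in range(1, 9) for i in range(unit)]
sample = cluster_sample(records, cluster_of=lambda r: r[0], n_clusters=3, seed=2)
sample_size = sum(len(members) for members in sample.values())
print(sorted(sample), sample_size)
```

Because whole units are taken, the final sample size depends on which units happen to be drawn and cannot be fixed in advance.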

Pseudo-Random Samples
Clusters can be defined by location in time.
For example, the number of weeks after the oil spill a bird
was found is recorded.
The items were sorted by week found, then a random sample
of 4 of the weeks was selected.
In an ideal world, the number of items in the weeks selected
would give the sample size we wanted; however, it often
doesn't. A second sampling method might be used to reduce
the sample size after using the

Week from spill   Value
20                28.96411
20                27.78272
22                28.42994
22                28.96064
22                28.55
27                28.97873
27                28.593
27                28.59478
33                28.95648
33                28.96634

A cluster example
