Learning goals
Importance of data
How do we obtain data?
Randomization
Sampling
Data
Recorded, relevant information
Not necessarily just numbers
Any relevant facts, figures, observations or descriptions of things
But why do we need it?
Why data?
Basis / raw material for analysis
Helps provide answers to questions we cannot answer right away
Empirical support for (or against) theories
How do we answer this question: Do individuals with larger waist circumferences have larger intra-abdominal adipose tissue (AT) area?
Good idea to gather relevant information
Study a few individuals; not possible to study all individuals
Data: Age, Waist circumference, AT area
Sources: Primary
Original sources from which researchers directly collect first-hand information
Collected through:
Observation (AT-WC)
Interviewing (Stores-Customers)
Classified as:
Complete enumeration
Sampling
Designed experiment
Complete enumeration
One way to gather useful information
Collect data from each and every unit in the population
Applied where information on all units under study is needed:
Population census (population: all individuals in the country)
Preparation of voters list (population: all individuals with voter IDs)
Selection from many applicants for a job, etc.
Social networking sites trace all your activities, online shopping, etc.
Sources: Primary
The idea is that if the sample considered is representative of the population it is taken from, the results derived from it can be generalized to the population.
Applied Statistics and Computing Lab
Sampling: Example
The cars data
Not possible to study every used car in the US
Sample of used cars
Cars of all makes, models, engine sizes, etc. included
AT-WC example:
Data only on men
Not representative if interest is in measuring CVD risk for women
Representative Samples
Which samples are representative?
Samples are biased if some characteristics of the population are over- or under-represented
Results from these samples cannot be adjusted
Randomization
To overcome bias
Shuffling a deck of cards before dealing
Stirring a pot of soup before tasting
Random selection is fair
Random sampling means sampling based on each population unit having a particular chance of being selected into the sample
In a population with 60% men and 40% women, a sample drawn with women having a greater chance of selection
Sampling: Advantages
Time and money saved
More accurate/reliable information
Operational flexibility
Not a new idea! Applied in day-to-day life:
Taste only a spoonful of soup
Only a handful of grains is examined before buying the entire sack
Sampling: Disadvantages
Sampling errors:
Bound to occur
Due to the fact that only a part of the population is considered
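Sampling error can be seen directly by comparing sample means with the mean of the full population. The following is a minimal sketch on simulated values; the population (10,000 draws from a normal distribution with mean 100 and standard deviation 15) and the sample size of 50 are assumptions made only for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 measurements (simulated, not real data).
population = [rng.gauss(100, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw 5 simple random samples of 50 units each and record how far
# each sample mean falls from the population mean.
errors = []
for _ in range(5):
    sample = rng.sample(population, 50)
    errors.append(statistics.mean(sample) - population_mean)

for i, err in enumerate(errors, start=1):
    print(f"sample {i}: sample mean - population mean = {err:+.2f}")
```

Each sample gives a slightly different error, because each looks at only a part of the population.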
Sampling: Trivia
More popularly used than complete enumeration
Sampling method depends on the question at hand
Source: Primary
To conduct an experiment on eye focus time
Effect of distance of object from eye on focus time
4 different distances and 5 subjects
Differences between individuals (more on this in future sessions)
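One possible way to lay out such an experiment is sketched below. The slides give only the counts (4 distances, 5 subjects), so the labels `d1`..`d4` and `s1`..`s5` are hypothetical. Two further assumptions are made purely for illustration: every subject is measured at every distance, and the order of distances is shuffled separately for each subject.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical labels; the actual distances and subjects are not given.
distances = ["d1", "d2", "d3", "d4"]
subjects = ["s1", "s2", "s3", "s4", "s5"]

# Assumed layout: each subject is measured at all 4 distances,
# in an independently shuffled order.
plan = {}
for subject in subjects:
    order = distances[:]   # copy so the original list is untouched
    rng.shuffle(order)
    plan[subject] = order

for subject, order in plan.items():
    print(subject, "->", ", ".join(order))
```

Measuring every subject at every distance allows differences between individuals to be separated from the effect of distance.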
Source: Primary
Experimental Design
Disadvantages:
Experimenter bias
Cannot design experiments for all studies (cars example)
Source: Secondary
Have already been collected and compiled
Readily available
E.g.: scores data! The school already has a mark sheet for each student!
Different sources:
Want to study poverty rates, but the definition of the poverty line has changed over the years
Data available from many sources have to be merged
E.g.: we have prices from one secondary source in dollars and other data in rupees from another source. Merging them is not easy.
Need to consider exchange rates at different times and purchasing power parity differences
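The dollar/rupee merge can be sketched as follows. All numbers are made up for illustration; they are not real prices, incomes or exchange rates. The names `prices_usd`, `incomes_inr` and `inr_per_usd` are hypothetical. The sketch handles only year-specific exchange rates and makes no purchasing power parity adjustment.

```python
# Made-up illustrative numbers, not real data.
prices_usd = {2010: 12.0, 2015: 15.0}         # source A: price in dollars, by year
incomes_inr = {2010: 9000.0, 2015: 14000.0}   # source B: income in rupees, by year
inr_per_usd = {2010: 45.0, 2015: 65.0}        # hypothetical rate for each year

def merge_in_rupees(prices_usd, incomes_inr, inr_per_usd):
    """Convert each dollar price with that year's rate and pair it with the rupee data."""
    merged = {}
    # Keep only years present in all three inputs.
    common_years = prices_usd.keys() & incomes_inr.keys() & inr_per_usd.keys()
    for year in sorted(common_years):
        merged[year] = {
            "price_inr": prices_usd[year] * inr_per_usd[year],
            "income_inr": incomes_inr[year],
        }
    return merged

merged = merge_in_rupees(prices_usd, incomes_inr, inr_per_usd)
for year, row in merged.items():
    print(year, row)
```

Using a single exchange rate for all years would distort the comparison, which is why the rate is looked up per year.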
Thank you