Dec 01, 2014

© All Rights Reserved

EPIDEMIOLOGY AND
BIOSTATISTICS
REVIEW, PART I

Tommy Byrd MSII

http://www.usmle.org/pdfs/step-1/2013midMay2014_Step1.pdf

Nominal

Ordinal

Interval

Ratio

Nominal: qualitative categories or groups

Male / Female

Black / White

Suburban / Rural

Ordinal: class rankings data (1st / 2nd / 3rd)

Answers to these types of questions:

The order is meaningful, but the size of the gaps is not (we cannot tell by how many percentage points Tommy is ahead just because he is ranked 1st in his class)

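Interval: a scale with equal intervals between values but no absolute zero

Celsius temperatures

Anno Domini years (1990,

**But ratios of this kind of data are not meaningful

100 °C is not twice as hot as 50 °C because 0 °C does not indicate a complete absence of heat

Ratio: a scale that has equal intervals and is based on an absolute zero

Kelvin temperatures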

MOST BIOMEDICAL VARIABLES

Weight (grams, pounds)

Time (seconds, days; zero is the starting point of measurement)

Age (years)

Blood pressure (mmHg)

Pulse (beats per minute)

With these types of data, ratios are valid:

300 K is twice as hot as 150 K

A pulse rate of 120 beats/min is twice as fast as a pulse rate of 60 beats/min
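
The interval-versus-ratio point can be checked numerically. The snippet below is only an illustrative sketch: the helper name `celsius_to_kelvin` is made up for this example, and the only outside fact it relies on is the standard 273.15 offset between Celsius and Kelvin.

```python
# Celsius is an interval scale: 0 °C is an arbitrary point, not "no heat".
# Kelvin is a ratio scale: 0 K is an absolute zero, so ratios are meaningful.

def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to Kelvin (hypothetical helper)."""
    return celsius + 273.15

# Dividing the Celsius readings gives 2.0, but that ratio is not meaningful.
naive_ratio = 100 / 50

# Converting to the ratio scale first gives the physically meaningful ratio.
kelvin_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)

print(f"Naive Celsius ratio: {naive_ratio:.2f}")
print(f"Ratio on the Kelvin scale: {kelvin_ratio:.3f}")
print(f"300 K vs 150 K: {300 / 150:.1f}")
```

On the Kelvin scale, 100 °C turns out to be only about 15% hotter than 50 °C, which is why the "twice as hot" reading of the Celsius numbers fails.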

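Many biomedical variables (blood pressure, cholesterol, etc.) are distributed in the bell-shaped normal or Gaussian distribution

[Figure: normal distribution curve; x-axis: Score]

Skewed distributions are named for the location of the tail of the curve, not the location of the hump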

[Figure: skewed distribution with the mode, median, and mean marked; x-axis: Score]

Mode: the value that occurs with the greatest frequency

2 4 5 7 4 2 3 6 8 9 7 5 4 4 2 4 6 7 7 7

Bimodal distribution! (2 modes: 4 and 7)
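
As a quick check of the bimodal example, here is a minimal sketch using Python's standard library. The variable names are chosen for this illustration; the data are the 20 values from the slide.

```python
from collections import Counter
from statistics import multimode

# The 20 scores listed on the slide.
scores = [2, 4, 5, 7, 4, 2, 3, 6, 8, 9, 7, 5, 4, 4, 2, 4, 6, 7, 7, 7]

# Count how often each value occurs.
counts = Counter(scores)

# multimode returns every value tied for the greatest frequency,
# in the order each was first seen in the data.
modes = multimode(scores)

print(modes)                 # [4, 7]
print(counts[4], counts[7])  # 5 5
```

Both 4 and 7 occur five times, so the distribution has two modes.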

Median: the value that divides the distribution in half

Odd # total elements: the median is the middle one

Even # total elements: the median is the average of the two middle ones

Mean: the sum of all values divided by the total # of values

The mean is sensitive to extreme scores

Therefore NOT good for measuring skewed distributions
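
A short sketch of the median rules and of how one extreme score drags the mean. All three score lists are hypothetical numbers invented for this illustration, not data from the lecture.

```python
from statistics import mean, median

# Hypothetical scores, made up for illustration.
odd_scores = [60, 70, 80]            # odd count: median is the middle value
even_scores = [60, 70, 80, 90]       # even count: median averages the two middle values
skewed_scores = [60, 70, 80, 90, 400]  # one extreme score in the tail

median_odd = median(odd_scores)      # 70
median_even = median(even_scores)    # 75.0

# The extreme score pulls the mean far more than the median.
skewed_mean = mean(skewed_scores)      # 140
skewed_median = median(skewed_scores)  # 80

print(median_odd, median_even)
print(skewed_mean, skewed_median)
```

With the outlier present, the mean (140) no longer describes a typical score, while the median (80) still does.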

Therefore the mean is the measure of central tendency that BEST

with its corresponding hash mark

& Public Health. N.p.: n.p., n.d. 9. Print.

Distributions with the same measures of central tendency can have different variabilities

Variability = the extent to which their scores are clustered together or spread out

Standard deviation (σ) = how far away, on average, values lie from the mean of the population

Remember the last infectious disease quiz?

Let's assume the mean (average) grade was a 70% with a normal distribution

If the σ was really HIGH, there was probably a bunch of As and a bunch of Fs in addition to Bs and Cs and Ds

If the σ was really LOW, most people probably got a high D or low C

So, since we gun hard, how can we use standard deviation to tell exactly how we did in comparison to everybody else?

Approx. 68% of the distribution falls within ±1 standard deviation of the mean

Approx. 95% of the distribution falls within ±2 standard deviations of the mean

Approx. 99.7% of the distribution falls within ±3 standard deviations of the mean
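
These three percentages can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `fraction_within` is an assumption made for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within ±k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"Within ±{k} SD: {fraction_within(k):.1%}")
```

The computed fractions round to 68.3%, 95.4%, and 99.7%, so the "68 / 95 / 99.7" figures are slight roundings.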

So, out of a class of 100, about how many people got an A? (assume extra credit was possible)

A) 9-11

B) 2-3

C) 14-16

D) 4-6

E) 19-21

Therefore, assuming the σ of the test scores was 10 points, we can assume the following:

[Figure: normal curve of test scores; x-axis: Grade (%)]
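
The quiz question can be checked under the stated assumptions (mean 70%, σ = 10 points, class of 100). One extra assumption is needed that the slides do not spell out: that an A starts at 90%, i.e. 2 standard deviations above the mean. The sketch below is only an illustration of that reading.

```python
from statistics import NormalDist

# From the slides: mean grade 70%, standard deviation 10 points.
grades = NormalDist(mu=70, sigma=10)

# Assumption for this example: an A means a grade of 90% or higher.
a_cutoff = 90
class_size = 100

# Area of the curve above the cutoff (no upper cap, since extra credit was possible).
fraction_with_a = 1 - grades.cdf(a_cutoff)
expected_a_students = class_size * fraction_with_a

print(f"Fraction above {a_cutoff}%: {fraction_with_a:.4f}")
print(f"Expected number of As: {expected_a_students:.1f}")
```

About 2.3% of the curve lies more than 2 standard deviations above the mean, which matches answer B (2-3 people).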

z score = the number of standard deviations the element lies above or below the mean

A table of z scores compares the z score to the area beyond z

[Figure: normal curve; x-axis: Grade (%); a grade of 65 is marked at z = −0.5 and a grade of 85 at z = +1.5]

6.7% got beyond an 85% on our startlingly realistic, made-up test (~7 people out of the class of 100)

z scores can also be used to specify probability

We know that 6.7% of the class has a grade above 85%, so the probability of one randomly selected person from this population having a grade above 85% is 6.7%, or 0.067
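
Instead of a printed z table, the area beyond z can be computed directly. This is a minimal sketch using the lecture's made-up test (mean 70%, σ = 10 points); the function name `z_score` is chosen for this example.

```python
from statistics import NormalDist

def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# The made-up test from the slides: mean 70%, standard deviation 10 points.
z_85 = z_score(85, mean=70, sd=10)   # +1.5
z_65 = z_score(65, mean=70, sd=10)   # -0.5

# "Area beyond z": the upper-tail probability of the standard normal curve.
area_beyond_85 = 1 - NormalDist().cdf(z_85)

print(z_85, z_65)
print(f"Area beyond z = {z_85}: {area_beyond_85:.3f}")
```

The upper-tail area for z = +1.5 comes out to roughly 0.067, the same 6.7% read from the table.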

person's score on the test?

But, through some stealthy looking-over-shoulders while

sample of random scores

How close to the actual class average will our sample be?

One sample representing one score

The average of a sample of 4 scores is ~80%

n = the size of each sample

[Figure: distribution of sample averages; x-axis from 0% to 100%, centered on 70%]

The standard error of the mean (SEM) is the standard deviation over the square root of the sample size

SEM = σ/√n

For n = 1: SEM = 10/√1 = 10 (the standard deviation (σ) of this test was 10 percentage points)

For n = 4: SEM = 10/√4 = 5

SEM is interpreted in the same way as standard deviation

But remember that SEM decreases as n increases

Now we have gathered a sample of 10 random scores

SEM = σ/√n

SEM = 10/√10 = 3.2
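
The three SEM values worked out on these slides can be reproduced with a few lines. This is a sketch only; the function name `standard_error` is an assumption for the example, and σ = 10 is the value given for the made-up test.

```python
from math import sqrt

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma divided by the square root of the sample size."""
    return sigma / sqrt(n)

sigma = 10  # standard deviation of the test, in percentage points

# Sample sizes used on the slides.
sem_by_n = {n: standard_error(sigma, n) for n in (1, 4, 10)}

for n, sem in sem_by_n.items():
    print(f"n = {n:>2}: SEM = {sem:.1f}")
```

The output shows the SEM shrinking from 10 to 5 to about 3.2 as the sample grows from 1 to 4 to 10 scores.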

**Do you remember how much of the population falls within 2 standard

deviations (or SEMs) of the mean?

The 95% confidence interval is approximately equal to the sample mean plus or minus 2 standard errors

Practically, the 95% confidence interval is the range in which the means of 95% of samples would be expected to fall

In other words, there is a 95% chance that the average of our random sample would be in this range

Remember, the σ on our test was 10%, and the mean was 70%

So the standard error (SEM) = σ/√n = 10/√10 = 3.2%

So our 95% confidence interval is 70% ± 2(SEM)

= 70% ± 2(3.2%) = 70% ± 6.4%

= 63.6% - 76.4%

A random sample of 10 people's scores on this test has a 95% chance of averaging between 63.6% and 76.4%
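
The interval arithmetic can be sketched as follows. The function name `confidence_interval_95` and its `multiplier` parameter are assumptions for this example; the multiplier of 2 follows the slides (the more precise value for a normal distribution is 1.96).

```python
from math import sqrt

def confidence_interval_95(center: float, sigma: float, n: int,
                           multiplier: float = 2.0) -> tuple[float, float]:
    """Approximate 95% interval: center plus or minus `multiplier` standard errors."""
    sem = sigma / sqrt(n)
    margin = multiplier * sem
    return center - margin, center + margin

# Values from the slides: mean 70%, sigma 10 points, sample of 10 scores.
low, high = confidence_interval_95(center=70, sigma=10, n=10)

print(f"95% interval: {low:.1f}% to {high:.1f}%")
```

Without rounding the SEM to 3.2 first, the interval comes out as about 63.7% to 76.3%, a hair narrower than the hand calculation.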

The width of the confidence interval reflects precision

estimate?

Double the sample size?

We need to quadruple the sample size!

SEM = σ/√n
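
Because SEM = σ/√n, the sample size sits under a square root, so doubling n does not halve the SEM. A small sketch, using σ = 10 and a starting sample of 10 as on the earlier slides (the function name is made up for the example):

```python
from math import sqrt

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean for a sample of size n."""
    return sigma / sqrt(n)

sigma, n = 10, 10

# How much the SEM shrinks when the sample is doubled versus quadrupled.
shrink_if_doubled = standard_error(sigma, n) / standard_error(sigma, 2 * n)
shrink_if_quadrupled = standard_error(sigma, n) / standard_error(sigma, 4 * n)

print(f"Doubling n shrinks SEM by a factor of {shrink_if_doubled:.2f}")
print(f"Quadrupling n shrinks SEM by a factor of {shrink_if_quadrupled:.2f}")
```

Doubling the sample narrows the SEM only by a factor of about 1.41; it takes four times as many scores to cut it in half.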

can we still calculate SEM?

Pretend we don't have any fancy ExamSoft statistics from

We can calculate the standard deviation of the 10 scores in our sample (S), and substitute it in for σ in the SEM equation to come up with the estimated standard error of the mean

estimated standard error is to the

Similar to P values !

For USMLE purposes, consider degrees of freedom (df) to equal n-1

So what do we do with all this?

errors away from the sample mean

1) State the null and alternative hypothesis, H0 and HA

H0 = no difference

HA = there is a difference

2) Select the decision criterion (level of significance)

3) Establish the critical values of t

4) Draw a random sample, find its mean

5) Calculate the standard deviation of the sample (S) and the estimated standard error of the sample

6) Calculate the value of the test statistic t that

corresponds to the mean of the sample (tcalc)

7) Compare the calculated value of t with the critical

values of t, then accept or reject the null hypothesis

hypotheses

We want to test Julia Silva's claim: Because of Tommy

Step 1 score of our class will be 260

Null hypothesis = The mean score is 260

Alternative hypothesis = The mean score is not 260

save time

Again, our sample size will be 10 randomly selected students

Random sampling error (this is normal) will always cause

We have to decide what an acceptable level of this chance

deviation is

If the probability of obtaining the sample mean is greater than 0.05, H0 is accepted:

The class indeed scored an average of 260

If that probability is 0.05 or less, H0 is rejected

α = 0.05

Sample size (n) = 10 students, so df = 9

So tcrit = 2.262

calculate the mean of the sample

284 234 268 254 246 264 266 265 245 244

Average = 257

estimated standard error of the sample

In our sample, standard deviation (S) = 15

(You don't have to know the equation for standard deviation on the USMLE)

Estimated standard error = S/√n = 15/√10 = 4.743

Remember, similar to a z-value, the t-score represents the number of estimated standard errors away from the hypothesized mean

Our average score was 257, which is 3 points away from the hypothesized mean of 260

Therefore, our t-value is the # of estimated standard errors contained in 3 points

Our estimated standard error from the last slide is 4.743

This gives a t-score (tcalc) of: 3 / 4.743 = 0.632

concerned that Julia Silva is a psychic

Our calculated t-value (same thing as t-score) is 0.632

Our critical t-value is 2.262

Clearly our calculated t lies between +2.2 and −2.2, therefore:

H0 is accepted and reported as follows: The hypothesis that the

t = 0.632, df = 9, p > 0.5

[Figure: t distribution centered at t = 0, with the critical values −2.262 and +2.262 marked]
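
The whole worked example can be retraced in a few lines. This is a sketch using only the Python standard library, with the 10 sample scores, the hypothesized mean of 260, and the critical value 2.262 all taken from the slides; the variable names are made up. It uses the unrounded sample standard deviation (about 14.97 rather than 15), so the t value differs slightly in the third decimal, and it keeps the sign of the difference, so t comes out negative.

```python
from math import sqrt
from statistics import mean, stdev

# Sample of 10 scores from the slides.
scores = [284, 234, 268, 254, 246, 264, 266, 265, 245, 244]

hypothesized_mean = 260   # H0: the mean score is 260
t_critical = 2.262        # from the slides: df = 9, alpha = 0.05

n = len(scores)
sample_mean = mean(scores)            # 257
sample_sd = stdev(scores)             # sample standard deviation S (n - 1 denominator)
estimated_sem = sample_sd / sqrt(n)   # estimated standard error of the mean

# t = number of estimated standard errors between the sample mean and the hypothesized mean.
t_calc = (sample_mean - hypothesized_mean) / estimated_sem

# Reject H0 only if t falls outside the critical values on either side.
reject_null = abs(t_calc) > t_critical

print(f"mean = {sample_mean}, S = {sample_sd:.2f}, SEM = {estimated_sem:.3f}")
print(f"t = {t_calc:.3f}, df = {n - 1}, reject H0: {reject_null}")
```

The calculated t is well inside ±2.262, so the sample gives no reason to reject the claim that the class mean is 260.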

wrong hypothesis

Type I Error

False-positive error

You accept the alternative hypothesis when there is no difference

Also known as alpha (α) error, referring to the α we just talked about

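The p-value is the probability of getting a result at least this extreme if H0 is true; rejecting H0 only when p < α holds the probability of a type I error at α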

Type II Error

False-negative error

You fail to reject the null hypothesis when there actually is a difference

Also known as beta (β) error

β is the probability of a type II error

The power of a statistical test = 1 − β

The power represents the probability of rejecting the null hypothesis when it is in fact false (i.e., of avoiding a type II error); we want this to happen!

Conventionally, a study is required to have a power of 0.8 (or a β of 0.2) to be acceptable

Power increases as α increases (a trade-off)

High-yield point: Increasing the sample size is the

most practical and important way of increasing the

power of a statistical test
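
To see how sample size and α drive power, here is a rough sketch. It deliberately swaps in a two-sided z-test with a known σ rather than the t-test from the worked example, because that version can be computed with the standard library alone. The function name and the numbers (a true difference of 5 points, σ = 10) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(true_difference: float, sigma: float, n: int,
                 alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided one-sample z-test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How many standard errors the true mean sits away from the hypothesized mean.
    shift = true_difference * sqrt(n) / sigma
    # Probability that the test statistic lands beyond either critical value.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail

# Hypothetical scenario: the true mean differs by 5 points, sigma = 10.
for n in (10, 20, 40, 80):
    print(f"n = {n:>2}: power = {z_test_power(5, 10, n):.2f}")

# The alpha trade-off: a looser alpha buys more power at the same sample size.
print(f"n = 10, alpha = 0.10: power = {z_test_power(5, 10, 10, alpha=0.10):.2f}")
```

In this scenario power climbs steadily as n grows, and loosening α raises power only at the cost of more type I errors, which is the trade-off noted above.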

Study designs: Cohort studies

A group without the disease is selected and followed for an extended period

Some members may have already been exposed to the risk factor

Exception: Inception Cohorts follow those recently diagnosed to track progression

Can estimate incidence

Not good for rare diseases

Historical cohort study = retrospective cohort study

Study designs: Case-control studies

All are retrospective

Compare people who do have the disease (the cases) w/ otherwise similar people who do not (the controls)

Start w/ outcome then LOOK BACK into the past for possible independent variables that may have caused the disease

Cheap; good for diseases that are rare or that take a long time to develop

Study designs: Case-series studies

Essentially a series of case reports that may link disease

group w/o the disease compared to)

E.g., Kaposi's sarcoma

Study designs: Prevalence survey

Survey (snap shot) of a whole population, also asks

Prevalence ratio = the prevalence of a disease in people who have been exposed to a risk factor divided by the prevalence in people who have not been exposed

Likely to overrepresent chronic diseases and underrepresent acute diseases
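
A prevalence ratio is simple arithmetic on survey counts. The sketch below uses entirely hypothetical counts invented for illustration (they do not come from the lecture), and the function names are assumptions for this example.

```python
def prevalence(cases: int, surveyed: int) -> float:
    """Proportion of surveyed people who have the disease at the time of the survey."""
    return cases / surveyed

def prevalence_ratio(exposed_cases: int, exposed_total: int,
                     unexposed_cases: int, unexposed_total: int) -> float:
    """Prevalence among the exposed divided by prevalence among the unexposed."""
    return (prevalence(exposed_cases, exposed_total)
            / prevalence(unexposed_cases, unexposed_total))

# Hypothetical survey: 30 of 200 exposed people and 20 of 400 unexposed people have the disease.
ratio = prevalence_ratio(exposed_cases=30, exposed_total=200,
                         unexposed_cases=20, unexposed_total=400)

print(f"Prevalence in exposed:   {prevalence(30, 200):.2%}")
print(f"Prevalence in unexposed: {prevalence(20, 400):.2%}")
print(f"Prevalence ratio: {ratio:.2f}")
```

In this made-up survey the disease is three times as prevalent among exposed people, which is an association at one point in time rather than evidence of cause.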

Study designs: Ecological studies

Check non-individual info (e.g., study of the rate of

ownership)

May be experimental:

Community intervention trials

Experimental group consists of an entire community, while the control

of intervention

Bias: (non-random) errors when one outcome is systematically favored over another

What is the difference between selection bias and sampling bias?

(Magazine subscribers in the Great Depression)

(Referral bias)

group and blacks in control group for treating a racially selective disease)

Race = confounding variable
