
Biomedical Presentation

Name:
Teach Professor:

Outline
1. Symmetry, Skewness and Kurtosis
a. Symmetry and Skewness
b. Kurtosis
2. Resampling
a. One sample case
b. Two independent samples
c. Two matched samples

Skewness and Kurtosis

We consider a random variable x and a data set S = {x1, x2, ..., xn} of size n
which contains possible values of x.
Looking at S as representing a distribution, the skewness of S is a measure of
symmetry and the kurtosis is a measure of the peakedness of the data in S.

Symmetry and Skewness

We use skewness as a measure of symmetry. If the skewness of S is 0, then the
distribution represented by S is perfectly symmetric.
If the skewness is negative, the distribution is skewed to the left; if it is
positive, the distribution is skewed to the right.

Consistent with Excel, we calculate the skewness of S as follows:

\[
\text{skewness} = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^3}{(n-1)(n-2)\, s^3}
\]

where \bar{x} is the mean and s is the standard deviation of S.

Observation: When a distribution is symmetric, the mean = median. When the
distribution is positively skewed, typically the mean > median, and when the
distribution is negatively skewed, typically the mean < median.

Example: Suppose S = {2, 5, -1, 3, 4, 5, 0, 2}. The skewness of S is -0.43,
i.e. SKEW(R) = -0.43, where R is a range in an Excel worksheet containing the
data in S. Since this value is negative, the curve representing the
distribution is skewed to the left (i.e. the fatter part of the curve is on
the right). Also SKEW.P(R) = -0.34.
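The example can be checked outside Excel. The following is a minimal Python sketch that implements the sample skewness formula above together with the population version that SKEW.P reports; the function names are illustrative and are not part of the original presentation.

```python
import math


def sample_skewness(data):
    """Sample skewness, using the same formula as Excel's SKEW."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    cubed = sum((x - mean) ** 3 for x in data)
    return n * cubed / ((n - 1) * (n - 2) * s ** 3)


def population_skewness(data):
    """Population skewness, as reported by Excel's SKEW.P."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


S = [2, 5, -1, 3, 4, 5, 0, 2]
print(round(sample_skewness(S), 2))      # -0.43
print(round(population_skewness(S), 2))  # -0.34
```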

Kurtosis

We use kurtosis as a measure of peakedness (or flatness). Positive kurtosis
indicates a relatively peaked distribution.
Consistent with Excel, we calculate the kurtosis of S as follows:

\[
\text{kurtosis} = \frac{n(n+1) \sum_{i=1}^{n} (x_i - \bar{x})^4}{(n-1)(n-2)(n-3)\, s^4} - \frac{3(n-1)^2}{(n-2)(n-3)}
\]

where \bar{x} is the mean and s is the standard deviation of S.

Example: Suppose S = {2, 5, -1, 3, 4, 5, 0, 2}. The kurtosis of S is -0.94,
i.e. KURT(R) = -0.94, where R is a range in an Excel worksheet containing the
data in S. Since this value is negative, the curve representing the
distribution is relatively flat.
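As a cross-check of this value, here is a small Python sketch of the Excel-style kurtosis formula (the sample excess kurtosis). The function name is an assumption made for illustration.

```python
def sample_kurtosis(data):
    """Sample excess kurtosis, using the same formula as Excel's KURT."""
    n = len(data)
    mean = sum(data) / n
    # Sample variance (divides by n - 1); s^4 is its square.
    s2 = sum((x - mean) ** 2 for x in data) / (n - 1)
    fourth = sum((x - mean) ** 4 for x in data)
    main_term = n * (n + 1) * fourth / ((n - 1) * (n - 2) * (n - 3) * s2 ** 2)
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return main_term - correction


S = [2, 5, -1, 3, 4, 5, 0, 2]
print(round(sample_kurtosis(S), 2))  # -0.94
```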

Resampling

Resampling procedures are based on the assumption that the underlying
population distribution is the same as that of a given sample.
Resampling is useful when the population distribution is unknown or other
techniques are not available.

We consider two types of resampling procedures: bootstrapping, where sampling
is done with replacement, and permutation tests (also known as randomization
tests), where permutations of the data are generated (all possible
permutations or, in practice, a large random sample of them).

One sample case

Example 1.
Calculate a 95% confidence interval around the median for the memory loss
program described in Example 1 of the Sign Test, but with the data given in
columns A and B of Figure 1.

Figure 1. Resampling: one sample case

We treat the sample as the population and draw 2,000 samples of size 20 (the
same size as the original sample) with replacement.

Referring to Figure 1, each element in each sample is selected using the
following formula:
=INDEX(B4:B23,RANDBETWEEN(1,20))
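The same selection step can be sketched in Python. The data below are hypothetical stand-ins for column B of Figure 1 (the original values are not reproduced in the text), and `bootstrap_sample` is an illustrative name.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) values from data with replacement."""
    n = len(data)
    # rng.randint(1, n) plays the role of RANDBETWEEN(1, n);
    # subtract 1 because Python lists are 0-based.
    return [data[rng.randint(1, n) - 1] for _ in range(n)]


# Hypothetical scores (20 values), NOT the data from Figure 1.
scores = [2, 4, 4, 5, 6, 7, 8, 8, 9, 10,
          10, 11, 11, 12, 13, 13, 14, 15, 16, 18]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = bootstrap_sample(scores, rng)
print(sample)
```

Because sampling is with replacement, some values may appear more often in the resample than in the original data and others may not appear at all.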

We now take the median of each of the 2,000 samples (only the first 21 samples
are shown in Figure 1) and plot their distribution in a histogram. The results
are displayed in Figure 2.

Figure 2. Analysis for Example 1

The value at the 2.5th percentile is 3 and the value at the 97.5th percentile
is 13. Thus we can take the 95% confidence interval to be [3, 13], which
contains the sample median of 9.5.
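The whole procedure (resample, take medians, read off the 2.5th and 97.5th percentiles) might be sketched as follows. This is a simple percentile bootstrap under assumed names and hypothetical data, so its output will not match the [3, 13] interval reported above.

```python
import random
import statistics


def bootstrap_median_ci(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(data)
    # Median of each resample, sorted so percentiles can be read by position.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1 - level
    lo_idx = round(alpha / 2 * n_resamples)            # e.g. 50 of 2,000
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1  # e.g. 1949 of 2,000
    return medians[lo_idx], medians[hi_idx]


# Hypothetical scores (20 values), NOT the data from Figure 1.
scores = [2, 4, 4, 5, 6, 7, 8, 8, 9, 10,
          10, 11, 11, 12, 13, 13, 14, 15, 16, 18]

low, high = bootstrap_median_ci(scores)
print(f"95% CI for the median: [{low}, {high}]")
```

Different percentile conventions (for example, interpolating between neighbouring values) give slightly different endpoints; the positional rule used here is only one reasonable choice.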

Two independent samples

We now consider the case where we have two independent samples. When the
data is normally distributed, we would use the t-test.
We can also use the Wilcoxon Rank Sum or Mann-Whitney non-parametric test.
We now show how to address such problems using the permutation version of
resampling.

Example 2.
Using resampling, determine whether there is a significant difference between
the median life expectancy of smokers and non-smokers using the data
described in Figure 3.

Figure 3. Data for Example 2

Note that the median score of the non-smokers is 76.5 while the median score
of the smokers is 70.5, a difference of 6.
The null hypothesis is that there is no difference between the two groups, i.e.
H0: the median scores for the populations of smokers and non-smokers are the
same.

Based on the null hypothesis, we can assume that we have a single population
of 78. To test the hypothesis we take 2,000 random samples of size 78 from
this population without replacement (i.e. random reorderings of the 78
values) and assume that for each sample the first 40 scores come from the
non-smokers and the remaining 38 come from the smokers.

We use formulas of the form

=INDEX(J4:CI4,1,RANK(DC6,DC6:GB6))

where the range J4:CI4 contains all 78 data elements in the population and
DC6:GB6 contains 78 random numbers, generated using RAND().
For each of the 2,000 samples we calculate the median of the non-smokers and
smokers and record the difference.
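The RANK-over-RAND() construction is a way of producing a random reordering of the population. A rough Python equivalent is sketched below, with illustrative function names and a small hypothetical population instead of the 78 values in Figure 3.

```python
import random
import statistics


def permute_by_rank(values, rng):
    """Reorder values by ranking one random number per position."""
    keys = [rng.random() for _ in values]  # like a row of RAND() cells
    # 0-based descending rank of each key (Excel's RANK is 1-based and
    # descending by default). Ties are essentially impossible for random floats.
    ranks = [sum(1 for other in keys if other > k) for k in keys]
    # Position i receives the element whose index equals the rank of key i,
    # mirroring INDEX(population, 1, RANK(key_i, keys)).
    return [values[r] for r in ranks]


def median_difference(sample, n_first):
    """Median of the first n_first values minus median of the rest."""
    return statistics.median(sample[:n_first]) - statistics.median(sample[n_first:])


# Hypothetical pooled population: pretend the first 4 values are non-smokers.
population = [78, 81, 75, 77, 70, 72, 69]
rng = random.Random(0)
shuffled = permute_by_rank(population, rng)
print(shuffled, median_difference(shuffled, n_first=4))
```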

Figure 4. Resampling for two independent samples

Now we need to check whether the median difference of the original sample
falls in the extreme 2.5% of either tail of the resampled differences
(two-tailed test). From Figure 4, we see that 1.60% of the samples have a
median difference of -6 or less and 2.35% of the samples have a median
difference of 6 or more, for a total of 3.95%.

This means that the probability of getting a sample in either tail based on
the null hypothesis is .0395 < .05 = α, and so we reject the null hypothesis
and conclude with 95% confidence that there is a significant difference
between the life expectancy of smokers and non-smokers.
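Putting the steps together, a two-tailed permutation test on the difference of medians could look like the sketch below. The function name, seed, and data are assumptions for illustration; the data are not those of Figure 3, so the p-value will differ from .0395.

```python
import random
import statistics


def permutation_test_median(group_a, group_b, n_resamples=2000, seed=0):
    """Two-tailed permutation test for a difference in medians.

    Returns the observed difference and the share of resampled
    differences that are at least as extreme in absolute value.
    """
    rng = random.Random(seed)
    observed = statistics.median(group_a) - statistics.median(group_b)
    pooled = list(group_a) + list(group_b)  # single population under H0
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # sampling without replacement
        diff = statistics.median(pooled[:n_a]) - statistics.median(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_resamples


# Hypothetical life expectancies, NOT the data from Figure 3.
non_smokers = [78, 81, 75, 77, 84, 79]
smokers = [70, 72, 69, 74, 71, 68]

observed, p_value = permutation_test_median(non_smokers, smokers)
print(observed, p_value)
```

Counting both tails through the absolute value corresponds to adding the two tail percentages, as in the 1.60% + 2.35% calculation above.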

Two matched samples

We now consider the case where we have two matched samples.
When the data is normally distributed, we would use the Paired Sample t-test.
Even for non-normal data we can use the Wilcoxon Signed-Ranks non-parametric
test.

Example 3: Using resampling, determine whether there is a significant
difference between the right and left eyes' ability to recognize objects,
based on 15 matched pairs of scores.
The null hypothesis is that there is no difference between the right and left
eyes' ability to recognize objects, i.e. the median difference is zero.

If the null hypothesis is true, then each of the 15 scores for the right eye
is just as likely to be larger as smaller than the corresponding score for
the left eye.
This is a form of sampling without replacement. The absolute values of the
differences in each sample are the same as in the population; only the signs
vary.

Figure 5. Resampling for paired samples

Figure 5 shows the first 16 samples (out of 2,000); the remaining samples are
generated in the same way. For each sample we calculate the median and create
a histogram of the 2,000 median values, as in Figure 6.

Figure 6. Analysis for Example 3

The median of the original sample (i.e. the resampling population) is 3. From
Figure 6 we see that 10.00% of all the samples have a median ≤ -3 and 12.30%
have a median ≥ 3. Since 10.00% + 12.30% = 22.30% > 5% = α, we cannot reject
the null hypothesis, and so conclude there is no significant difference
between the right and left eye of the population.
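The sign-randomization idea for matched samples can be sketched in Python as follows: keep the absolute value of each paired difference and assign each sign at random. The function name and the differences below are hypothetical; they are not the eye data behind Figures 5 and 6.

```python
import random
import statistics


def sign_flip_test_median(differences, n_resamples=2000, seed=0):
    """Two-tailed sign-randomization test for a median paired difference."""
    rng = random.Random(seed)
    observed = statistics.median(differences)
    magnitudes = [abs(d) for d in differences]
    extreme = 0
    for _ in range(n_resamples):
        # Under H0 each difference is equally likely to be positive or negative.
        resample = [m if rng.random() < 0.5 else -m for m in magnitudes]
        if abs(statistics.median(resample)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_resamples


# Hypothetical right-minus-left differences for 15 subjects.
diffs = [3, -1, 4, 6, -2, 5, 3, -4, 1, 7, -3, 2, 5, -1, 3]

observed, p_value = sign_flip_test_median(diffs)
print(observed, p_value)
```

The null hypothesis is rejected at the 5% level only if the returned proportion is below .05.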
