
USING R FOR NONPARAMETRIC ANALYSIS - PART ONE

Binomial Probabilities
>pbinom(b,n,p)

will yield the value for P( B ≤ b ) when B is a binomial with n trials and P( S ) = p

>pbinom(b-1,n,p,lower.tail=F) or >1 - pbinom(b-1,n,p) will yield P( B ≥ b )


>pbinom(b,n,p) - pbinom(b-1,n,p) will yield P( B = b )
Example: Hypothesis Testing Example 1, For a binomial with n = 20 testing H0: p = .5 versus H1: p > .5
Find α for RR: B ≥ 14:

>1 - pbinom(13,20,.5) or >pbinom(13,20,.5,lower.tail=F)


[1] 0.05765915
( this was .058 from table )

Find the P-value for TS: B = 12:

>1 - pbinom(11,20,.5) or >pbinom(11,20,.5,lower.tail=F)


[1] 0.2517223
( this was .252 from table )

Find power of test when p = .8:

>1 - pbinom(13,20,.8) or >pbinom(13,20,.8,lower.tail=F)


[1] 0.9133075
( this was .913 from table )
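
The three calculations above are the same upper-tail probability evaluated at different cutoffs and values of p. A minimal sketch in base R; the helper name upper_tail is our own label for illustration, not an R function.

```r
# Hypothetical helper (not part of R): P( B >= b ) for B binomial with n trials
upper_tail <- function(b, n, p) pbinom(b - 1, n, p, lower.tail = FALSE)

alpha  <- upper_tail(14, 20, 0.5)  # size of the test with RR: B >= 14
pvalue <- upper_tail(12, 20, 0.5)  # P-value when the observed TS is B = 12
power  <- upper_tail(14, 20, 0.8)  # power of the same RR when p = .8

print(round(c(alpha = alpha, pvalue = pvalue, power = power), 3))
```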

Normal Probabilities
>qnorm(p,μ,σ) will yield the value y0 such that P( Y < y0 ) = p when Y is normal, mean = μ, std.dev. = σ
>pnorm(y,μ,σ) will yield the value of P( Y < y ) when Y is normal, mean = μ, std.dev. = σ
Note: if you leave out the values for μ and σ, R will assume you want to use the standard normal Z
Example: Hypothesis Testing Example 2, Testing H0: μ = 5 versus H1: μ < 5 with n = 100 and s = 3.1

Find the RR for the large sample test at α = .05:

qnorm(.05) or qnorm(.05,0,1)
[1] -1.644854 ( we used Z < - 1.645 )

Find the RR in terms of x̄:

qnorm(.05,5,0.31)
[1] 4.490095 ( we used x̄ < 4.49 )

Find the P-value when x̄ = 4.4:

pnorm(4.4,5,.31) or pnorm(-1.94)
[1] 0.02646547 or [1] 0.02618984 ( we got .0262 )

Find the power when μ = 4:

pnorm(4.49,4,.31) or pnorm(1.58)
[1] 0.9430204 or [1] 0.9429466 ( we got .9429 )
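
The value 0.31 used above is the standard error s / sqrt(n) = 3.1 / 10. The sketch below strings the three steps together; the variable names are ours, and the power uses the unrounded cutoff, so it differs slightly in the later decimals from the value obtained with 4.49.

```r
mu0 <- 5     # value of the mean under H0
n   <- 100   # sample size
s   <- 3.1   # sample standard deviation
se  <- s / sqrt(n)                # standard error of x-bar: 0.31

cutoff <- qnorm(0.05, mu0, se)    # reject H0 when x-bar < cutoff
pvalue <- pnorm(4.4, mu0, se)     # P-value for observed x-bar = 4.4
power  <- pnorm(cutoff, 4, se)    # P( x-bar < cutoff ) when the true mean is 4

print(c(cutoff = cutoff, pvalue = pvalue, power = power))
```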

Binomial Test for p


>binom.test(b,n,p0,g or l or t) will test H0: p = p0 versus H1: p > or < or ≠ p0
>qbinom(prob,n,p) will return the smallest value of b such that P( B ≤ b ) ≥ prob (closest w/o going under)
Example: Forty percent of the assembly line workers at a large corporation are from minority groups. A
committee of 15 assembly line workers is to be selected at random to look into job-related complaints. The
committee that is chosen has 3 minority persons on it. Using α = .05, test to see if random selection of
the committee can be doubted.

Find RR for each of the three possible alternatives using α = .05

>qbinom(.05,15,.4) [1] 3 means we should use B ≤ 2 >pbinom(2,15,.4) [1] 0.027114 (α = .027)
>qbinom(.95,15,.4) [1] 9 means we should use B ≥ 10 >1-pbinom(9,15,.4) [1] 0.0338333 (α = .034)
>qbinom(.025,15,.4) [1] 2 and >qbinom(.975,15,.4) [1] 10 means use B ≤ 1 or B ≥ 11
>pbinom(1,15,.4) [1] 0.005172035 >1-pbinom(10,15,.4) [1] 0.009347661 (α = .005+.009 = .014)
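
The "qbinom, then step back one" reasoning above can be wrapped in two small functions. This is only a sketch: lower_cutoff and upper_cutoff are hypothetical names of ours, not R functions.

```r
# Hypothetical helper: largest b with P( B <= b ) <= alpha (lower-tail RR is B <= b).
# A result of -1 would mean that no lower-tail RR of that size exists.
lower_cutoff <- function(alpha, n, p) {
  b <- qbinom(alpha, n, p)                 # smallest b with P( B <= b ) >= alpha
  if (pbinom(b, n, p) > alpha) b <- b - 1  # step back if that b overshoots alpha
  b
}

# Hypothetical helper: smallest b with P( B >= b ) <= alpha (upper-tail RR is B >= b)
upper_cutoff <- function(alpha, n, p) qbinom(1 - alpha, n, p) + 1

lo1 <- lower_cutoff(0.05, 15, 0.4)    # one-sided, H1: p < .4
hi1 <- upper_cutoff(0.05, 15, 0.4)    # one-sided, H1: p > .4
lo2 <- lower_cutoff(0.025, 15, 0.4)   # two-sided, lower piece
hi2 <- upper_cutoff(0.025, 15, 0.4)   # two-sided, upper piece
print(c(lo1, hi1, lo2, hi2))
```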

Obtain the P-values for each of the three possible alternatives

>binom.test(b,n,p=p0,g or l or t)

> binom.test(3,15,p=.4,"g")
Exact binomial test
data: 3 and 15
number of successes = 3, number of trials = 15, p-value = 0.9729 (.973)
alternative hypothesis: true probability of success is greater than 0.4
95 percent confidence interval:
0.05684687 1.00000000
sample estimates:
probability of success
0.2
> binom.test(3,15,p=.4,"l")
Exact binomial test
data: 3 and 15
number of successes = 3, number of trials = 15, p-value = 0.0905 (.091)
alternative hypothesis: true probability of success is less than 0.4
95 percent confidence interval:
0.0000000 0.4397844
sample estimates:
probability of success
0.2
>binom.test(3,15,p=.4,"t")
Exact binomial test
data: 3 and 15
number of successes = 3, number of trials = 15, p-value = 0.1855 (.182)
alternative hypothesis: true probability of success is not equal to 0.4
95 percent confidence interval:
0.04331201 0.48089113
sample estimates:
probability of success
0.2
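
The three P-values reported by binom.test can be rebuilt from pbinom and dbinom. The two-sided rule sketched here (add up every outcome that is no more likely than the observed one) is our understanding of how binom.test arrives at its answer, so treat it as an assumption and compare against binom.test itself.

```r
b <- 3; n <- 15; p0 <- 0.4

p_greater <- pbinom(b - 1, n, p0, lower.tail = FALSE)  # P( B >= 3 )
p_less    <- pbinom(b, n, p0)                          # P( B <= 3 )

# Two-sided: total probability of outcomes whose probability under H0
# does not exceed that of the observed b (small tolerance for rounding error)
d     <- dbinom(0:n, n, p0)
p_two <- sum(d[d <= dbinom(b, n, p0) * (1 + 1e-7)])

print(round(c(greater = p_greater, less = p_less, two.sided = p_two), 4))
```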
Example: A random sample of 200 registered voters yields 95 that say they would like to see the Health
Care Bill repealed. Using α = .01, test to see if the true proportion of registered voters who would like to see
the Health Care Bill repealed is greater than, less than or different from .50.
Obtain the P-values for each of the three possible alternatives
> binom.test(95,200,.5,"g")
Exact binomial test
data: 95 and 200
number of successes = 95, number of trials = 200, p-value = 0.7816
alternative hypothesis: true probability of success is greater than 0.5
> binom.test(95,200,.5,"l")
Exact binomial test
data: 95 and 200
number of successes = 95, number of trials = 200, p-value = 0.2623
alternative hypothesis: true probability of success is less than 0.5
> binom.test(95,200,.5,"t")
Exact binomial test
data: 95 and 200
number of successes = 95, number of trials = 200, p-value = 0.5246
alternative hypothesis: true probability of success is not equal to 0.5
Find the rejection region for the exact two-tailed test at α = .01
> qbinom(.005,200,.5) [1] 82 and > qbinom(.995,200,.5) [1] 118 means use B ≤ 81 or B ≥ 119
> pbinom(81,200,.5) [1] 0.00436 > 1-pbinom(118,200,.5) [1] 0.00436 [α = 2(.00436) = .00872]

Find the RR for the large sample two-tailed test at α = .01
> qnorm(.995)
[1] 2.575829 ( use |B*| > 2.576 )

Find the RR in terms of B using large sample:
> qnorm(.995,100,7.07107)
[1] 118.2139 ( use 119 )

Find the P-value for large sample when B = 95:
> pnorm(95,100,7.07107)
[1] 0.2397501
or
> pnorm(-0.71)
[1] 0.2388521
double this for P-value [we used 2(.2389)]
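
The mean 100 and standard deviation 7.07107 in the large sample commands are n p0 and sqrt( n p0 (1 - p0) ). A short sketch with our own variable names, using no continuity correction, as in the commands above:

```r
n <- 200; p0 <- 0.5; b <- 95

mu    <- n * p0                     # mean of B under H0: 100
sigma <- sqrt(n * p0 * (1 - p0))    # std.dev. of B under H0: 7.07107

upper <- qnorm(0.995, mu, sigma)    # reject when B is above this
lower <- qnorm(0.005, mu, sigma)    # or below this
p_two <- 2 * pnorm(b, mu, sigma)    # two-tailed P-value (b is below the mean)

print(c(lower = lower, upper = upper, p.value = p_two))
```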

Sign and Signed Rank Tests for the Median Using Paired Data


Example: A manufacturing company was disturbed about its safety record, so it made every employee take
an industrial safety training program. The data below represent the number of work-hours lost due to accidents
at each of the company's 8 locations in the month before the mandatory training program and in the month
after. Use α = .05 to test to see if the training program was effective. (Source: Example 3.1, pages 170-171, Nonparametric Statistical Inference by Gibbons & Chakraborti)
Plant   X = Before   Y = After   Z = Y - X   |Z|     Ri   ψi (1 if Z > 0)
1       51.2         45.8         -5.4        5.4    4    0
2       46.5         41.3         -5.2        5.2    3    0
3       24.1         15.8         -8.3        8.3    6    0
4       10.2         11.1          0.9        0.9    1    1
5       65.3         58.5         -6.8        6.8    5    0
6       92.1         70.3        -21.8       21.8    8    0
7       30.3         31.6          1.3        1.3    2    1
8       49.2         35.4        -13.8       13.8    7    0

Do the exact signed rank test.
>wilcox.test(y,x,paired=T,g or l or t)

> x <- c(51.2,46.5,24.1,10.2,65.3,92.1,30.3,49.2)


> y <- c(45.8,41.3,15.8,11.1,58.5,70.3,31.6,35.4)
> wilcox.test(y,x,paired=T,"l")
Wilcoxon signed rank test
data: y and x
V = 3, p-value = 0.01953 ( .020 )
alternative hypothesis: true location shift is less than 0
How to find a RR for the exact signed rank
> psignrank(4,8) [1] 0.027343 > psignrank(5,8) [1] 0.03906 > psignrank(6,8) [1] 0.05468
Do the exact sign test

>SIGN.test(y,x,0,g or l or t)

You must first install package BSDA


>library(BSDA)
> SIGN.test(y,x,0,"l")
Dependent-samples Sign-Test
data: y and x
S = 2, p-value = 0.1445 (same)
alternative hypothesis: true median difference is less than 0
95 percent confidence interval:
-Inf 0.07214286
sample estimates:
median of x-y
-6.1
                  Conf.Level L.E.pt  U.E.pt
Lower Achieved CI     0.8555   -Inf -5.2000
Interpolated CI       0.9500   -Inf  0.0721
Upper Achieved CI     0.9648   -Inf  0.9000

You could use the qbinom or pbinom commands to find RR for the sign test
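
As a sketch of that remark for the safety program data: under H0 the number of positive differences is binomial with p = .5, so the sign test P-value and RR come straight from pbinom. Variable names are ours; only base R is used, so BSDA is not needed for this part.

```r
x <- c(51.2, 46.5, 24.1, 10.2, 65.3, 92.1, 30.3, 49.2)   # before
y <- c(45.8, 41.3, 15.8, 11.1, 58.5, 70.3, 31.6, 35.4)   # after

d <- y - x
d <- d[d != 0]          # the sign test discards zero differences
n <- length(d)
S <- sum(d > 0)         # number of positive differences

pvalue <- pbinom(S, n, 0.5)                      # lower-tail P-value, P( B <= S )

s      <- 0:n
cutoff <- max(s[pbinom(s, n, 0.5) <= 0.05])      # RR at alpha = .05 is B <= cutoff

print(c(S = S, p.value = pvalue, cutoff = cutoff))
```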
Example: Does a certain prescription drug affect heart rate? The resting heart rates of a sample of 10 patients were
recorded ( X ). All 10 patients were given a dose of the drug in question and, after thirty minutes, their resting
heart rate was again recorded ( Y ). Use the data below to test, at α = .10, to see if this drug has any effect on
resting heart rate.

Patient   X    Y    Z = Y - X   Ri
1         68   70
2         73   72
3         75   80
4         77   80
5         78   79
6         78   78
7         80   79
8         81   83
9         84   81
10        87   89

Exact Signed Rank Test


> x <- c(68,73,75,77,78,78,80,81,84,87)
> y <- c(70,72,80,80,79,78,79,83,81,89)
> wilcox.test(y,x,paired=T,"t")
Wilcoxon signed rank test with continuity correction
data: y and x
V = 33.5, p-value = 0.2099 ( .204 < P < .250 )
alternative hypothesis: true location shift is not equal to 0
Warning messages:
1: In wilcox.test.default(y, x, paired = T, "t") : cannot compute exact p-value with ties
2: In wilcox.test.default(y, x, paired = T, "t") : cannot compute exact p-value with zeroes
Exact Sign Test
> SIGN.test(y,x,0,"t")
Dependent-samples Sign-Test
data: y and x
S = 6, p-value = 0.5078 ( same as we got )
alternative hypothesis: true median difference is not equal to 0
sample estimates:
median of x-y
1.5
                  Conf.Level L.E.pt U.E.pt
Lower Achieved CI     0.8906     -1 2.0000
Interpolated CI       0.9500     -1 2.6756
Upper Achieved CI     0.9785     -1 3.0000
Large sample sign and signed rank tests can be done using the qnorm and pnorm commands once you have
calculated the mean and standard deviations of the statistics.
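
A minimal sketch of that calculation, using the standard null moments: for the sign statistic B the mean is n/2 and the std.dev. is sqrt(n)/2; for the signed rank statistic T+ the mean is n(n+1)/4 and the std.dev. is sqrt( n(n+1)(2n+1)/24 ). The numbers plugged in (n = 8, T+ = 3, B = 2) are the safety program values from earlier, used only for illustration since n = 8 is hardly a large sample; no continuity or tie correction is applied.

```r
n <- 8         # number of nonzero differences
T_plus <- 3    # observed signed rank statistic
B <- 2         # observed sign statistic

mean_T <- n * (n + 1) / 4                       # 18
sd_T   <- sqrt(n * (n + 1) * (2 * n + 1) / 24)  # sqrt(51)
mean_B <- n / 2                                 # 4
sd_B   <- sqrt(n) / 2                           # sqrt(2)

# Lower-tail approximate P-values
p_T <- pnorm(T_plus, mean_T, sd_T)
p_B <- pnorm(B, mean_B, sd_B)

# Approximate lower-tail RR cutoffs at alpha = .05
cut_T <- qnorm(0.05, mean_T, sd_T)
cut_B <- qnorm(0.05, mean_B, sd_B)

print(c(p_T = p_T, p_B = p_B, cut_T = cut_T, cut_B = cut_B))
```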
Testing a value for the median difference other than 0
Example

Student   X = Before   Y = After
1         20           20
2         21           22
3         25           10
4         26           16
5         32           11
6         27           20
7         38           20
8         34           19
9         28           13
10        20           21
11        29           12

Using α = .05, test to try to show that anxiety is reduced by more than 3 points by taking the course.
Exact Signed Rank Test
> x<- c(20,21,25,26,32,27,38,34,28,20,29)
> y<- c(20,22,10,16,11,20,20,19,13,21,12)
> wilcox.test(y,x,paired=T,mu=-3,"l")
Wilcoxon signed rank test with continuity correction
data: y and x
V = 7, p-value = 0.01142
(.009)
alternative hypothesis: true location shift is less than -3
Warning message:In wilcox.test.default(y, x, paired = T, mu = -3, "l") :
cannot compute exact p-value with ties
Exact Sign Test
> SIGN.test(y,x,-3,"l")
Dependent-samples Sign-Test
data: y and x
S = 3,
p-value = 0.1133
alternative hypothesis: true median difference is less than -3
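
Testing a value other than 0 amounts to subtracting the hypothesized value from each difference and then testing against 0. The sketch below rebuilds the two statistics reported above by hand in base R; variable names are ours.

```r
x <- c(20, 21, 25, 26, 32, 27, 38, 34, 28, 20, 29)   # before
y <- c(20, 22, 10, 16, 11, 20, 20, 19, 13, 21, 12)   # after

d <- (y - x) - (-3)    # shifted differences: H0 becomes "median of d = 0"
d <- d[d != 0]         # drop zeros (there are none for these data)

V <- sum(rank(abs(d))[d > 0])   # signed rank statistic; ties get average ranks
S <- sum(d > 0)                 # sign statistic

p_sign <- pbinom(S, length(d), 0.5)   # exact lower-tail sign test P-value

print(c(V = V, S = S, p.sign = p_sign))
```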
Estimation of the median using the signed rank and the sign procedures
CI and Point estimate for the safety program data
> wilcox.test(y-x,conf.int=T,conf.level=.9)
Wilcoxon signed rank test
data: y - x
V = 3,
p-value = 0.03906
alternative hypothesis: true location is not equal to 0
90 percent confidence interval:
-13.60 -2.15
sample estimates:
(pseudo)median
-6.6
> wilcox.test(y-x,conf.int=T,conf.level=.95)
95 percent confidence interval:
-14.30 -1.95
> wilcox.test(y-x,conf.int=T,conf.level=.99)
99 percent confidence interval:
-21.8 1.3
> SIGN.test(y,x,0,"t",conf.level=.9)
Dependent-samples Sign-Test
data: y and x
S = 2,
p-value = 0.2891
alternative hypothesis: true median difference is not equal to 0
90 percent confidence interval:
-13.05357143 0.07214286
sample estimates:
median of x-y
-6.1
                  Conf.Level   L.E.pt  U.E.pt
Lower Achieved CI     0.7109  -8.3000 -5.2000
Interpolated CI       0.9000 -13.0536  0.0721
Upper Achieved CI     0.9297 -13.8000  0.9000
> SIGN.test(y,x,0,"t",conf.level=.95)
( you get same output for 99 % in this case )
                  Conf.Level L.E.pt U.E.pt
Lower Achieved CI     0.9297  -13.8   0.90
Interpolated CI       0.9500  -16.4   1.03
Upper Achieved CI     0.9922  -21.8   1.30

Ordered Walsh Averages


You must first install package NSM3 ( this is a very large package )
> library(NSM3)
> owa(x,y)
These are for safety program data
$owa
[1] -21.80 -17.80 -15.05 -14.30 -13.80 -13.60 -13.50 -11.05 -10.45 -10.30
[11] -10.25 -9.60 -9.50 -8.30 -7.55 -6.85 -6.80 -6.75 -6.45 -6.25
[21] -6.10 -6.00 -5.40 -5.30 -5.20 -3.70 -3.50 -2.95 -2.75 -2.25
[31] -2.15 -2.05 -1.95 0.90 1.10 1.30
$h.l [1] -6.6
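
If installing NSM3 is not an option, the ordered Walsh averages for the safety program data can be built in base R. This sketch assumes the Walsh averages are the pairwise averages ( Zi + Zj ) / 2 for i ≤ j of the differences Z = Y - X, with their median as the point estimate; variable names are ours.

```r
x <- c(51.2, 46.5, 24.1, 10.2, 65.3, 92.1, 30.3, 49.2)   # before
y <- c(45.8, 41.3, 15.8, 11.1, 58.5, 70.3, 31.6, 35.4)   # after
d <- y - x

w <- outer(d, d, "+") / 2                 # all pairwise averages (symmetric matrix)
w <- sort(w[upper.tri(w, diag = TRUE)])   # keep i <= j only, then order them

n_walsh <- length(w)    # n(n+1)/2 averages
hl      <- median(w)    # point estimate (median of the Walsh averages)

print(w)
print(hl)
```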
CI and Point estimate for the drug effect data
> owa(x,y)
$owa
[1] -3.0 -2.0 -2.0 -1.5 -1.0 -1.0 -1.0 -1.0 -0.5 -0.5 -0.5 -0.5 -0.5 0.0 0.0
[16] 0.0 0.0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0
[31] 1.0 1.5 1.5 1.5 1.5 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.5
[46] 2.5 2.5 2.5 3.0 3.0 3.5 3.5 3.5 4.0 5.0
$h.l [1] 1
> wilcox.test(y-x,conf.int=T,conf.level=.9)
Wilcoxon signed rank test with continuity correction
data: y - x
V = 33.5, p-value = 0.2099
alternative hypothesis: true location is not equal to 0
90 percent confidence interval:
-0.500093 2.500018
sample estimates:
(pseudo)median
1.000058
Warning messages:
1: In wilcox.test.default(y - x, conf.int = T, conf.level = 0.9) :
cannot compute exact p-value with ties
2: In wilcox.test.default(y - x, conf.int = T, conf.level = 0.9) :
cannot compute exact confidence interval with ties
3: In wilcox.test.default(y - x, conf.int = T, conf.level = 0.9) :
cannot compute exact p-value with zeroes
4: In wilcox.test.default(y - x, conf.int = T, conf.level = 0.9) :
cannot compute exact confidence interval with zeroes
> wilcox.test(y-x,conf.int=T,conf.level=.95)
95 percent confidence interval:
-0.9999859 3.0000166
> wilcox.test(y-x,conf.int=T,conf.level=.99)
98 percent confidence interval:
-1.999972 3.500034
> SIGN.test(y,x,conf.level=.9)
( you get same output for 95 % )
sample estimates:
median of x-y
1.5
                  Conf.Level L.E.pt U.E.pt
Lower Achieved CI     0.8906     -1 2.0000
Interpolated CI       0.9000     -1 2.1067
Upper Achieved CI     0.9785     -1 3.0000
> SIGN.test(y,x,conf.level=.99)
                  Conf.Level L.E.pt U.E.pt
Lower Achieved CI     0.9785 -1.000  3.000
Interpolated CI       0.9900 -2.176  4.176
Upper Achieved CI     0.9980 -3.000  5.000

One Sample Signed Rank and Sign


Example 1: A large bank wishes to test to see if the median time its customers spend in line waiting for a teller
is less than 3 minutes. A random sample of 11 customers was taken and their waiting times ( to the nearest
second ) were recorded. Use the data below to test the hypotheses of interest to the bank using α = .05.

Zi     Zi - 3   |Zi - 3|   Rank
2:45   -0:15    0:15       2
3:12    0:12    0:12       1
0:21   -2:39    2:39       7
0:00   -3:00    3:00       9.5
1:34   -1:26    1:26       5
6:54    3:54    3:54       11
0:12   -2:48    2:48       8
5:32    2:32    2:32       6
0:00   -3:00    3:00       9.5
3:56    0:56    0:56       4
3:31    0:31    0:31       3

H0: median = 3
H1: median < 3
TS: T+ = 25 = 1 + 11 + 6 + 4 + 3
RR ( α = .042 ): T+ ≤ 13 = 66 - 53
P-Value: P( T+ ≤ 25 ) = .260, same as P( T+ ≥ 41 )
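
The table lookups above can be checked with psignrank. A sketch with our own variable names; note that psignrank gives the null distribution without ties, so with the tied ranks in these data the values are approximate in the same way the table values are.

```r
n <- 11
tvals <- 0:(n * (n + 1) / 2)         # possible values of T+: 0, 1, ..., 66

# Largest t with P( T+ <= t ) <= .05 gives the lower-tail RR
cutoff <- max(tvals[psignrank(tvals, n) <= 0.05])
alpha  <- psignrank(cutoff, n)       # actual size of that RR

pvalue <- psignrank(25, n)                        # P( T+ <= 25 )
mirror <- psignrank(40, n, lower.tail = FALSE)    # P( T+ >= 41 ), equal by symmetry

print(c(cutoff = cutoff, alpha = alpha, p.value = pvalue, mirror = mirror))
```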

> z <- c(165,192,21,0,94,414,12,332,0,236,211)


> wilcox.test(z,mu=180,alternative="l")
Wilcoxon signed rank test with continuity correction
data: z
V = 25,
p-value = 0.2523
alternative hypothesis: true location is less than 180
Warning message: In wilcox.test.default(z, mu = 180, alternative = "l") :
cannot compute exact p-value with ties
> SIGN.test(z,md=180,alternative="l")
One-sample Sign-Test
data: z
s = 5,
p-value = 0.5
alternative hypothesis: true median is less than 180
> SIGN.test(z,conf.level=.9)
                  Conf.Level  L.E.pt   U.E.pt
Lower Achieved CI     0.7734 21.0000 211.0000
Interpolated CI       0.9000 13.9309 230.6364
Upper Achieved CI     0.9346 12.0000 236.0000
> SIGN.test(z,conf.level=.95)
                  Conf.Level  L.E.pt   U.E.pt
Lower Achieved CI     0.9346 12.0000 236.0000
Interpolated CI       0.9500  8.5527 263.5782
Upper Achieved CI     0.9883  0.0000 332.0000
> SIGN.test(z,conf.level=.99)
                  Conf.Level L.E.pt U.E.pt
Lower Achieved CI     0.9883      0 332.00
Interpolated CI       0.9900      0 345.12
Upper Achieved CI     0.9990      0 414.00


Example 3: A random sample of 5 healthy males between the ages of 19 - 30 was taken. These males
were all non-smokers and either doctors or medical research workers. For each male in the sample their forced
vital capacity ( a measure of aerobic health ) was measured. Using the 5 values given below, find a 90 %
confidence interval for the true median forced vital capacity of males in this group. Also, find a point estimate
for the median.
n = 5, so there are 5(6) / 2 = 15 Walsh averages
P( T+ ≥ 15 ) = .031
Thus tα/2 = 15 and 15 + 1 - 15 = 1
A 93.8 % C.I. for the median would be: [ W(1) , W(15) ] and the point estimate for the median would be W(8) .

Zi     4290     5280     5280     5555     5610
4290   4290.0   4785.0   4785.0   4922.5   4950.0
5280            5280.0   5280.0   5417.5   5445.0
5280                     5280.0   5417.5   5445.0
5555                              5555.0   5582.5
5610                                       5610.0


The 93.8 % C.I. is ( 4290, 5610 ) and the point estimate for the median is 5280 .
> z <- c(4290,5280,5280,5555,5610)
> wilcox.test(z,conf.int=T,conf.level=.9)
Wilcoxon signed rank test with continuity correction
90 percent confidence interval:
4290 5610
sample estimates:
(pseudo)median
5280
Warning messages:
1: In wilcox.test.default(z, conf.int = T, conf.level = 0.9) :
cannot compute exact p-value with ties
2: In wilcox.test.default(z, conf.int = T, conf.level = 0.9) :
cannot compute exact confidence interval with ties
Example 4: The values below are the effective doses of a drug for 9 different patients. Use this data to find
a 90 % confidence interval for the true median effective dose and also find a point estimate for the true median
effective dose.
n = 9, so there are 9(10) / 2 = 45 Walsh averages
P( T+ ≥ 37 ) = .049
Thus tα/2 = 37 and 45 + 1 - 37 = 9
A 90.2 % C.I. for the median would be: [ W(9) , W(37) ] and the point estimate for the median would be W(23) .

Zi     .41    .45    .52    .68    .75    .78    .82    .91    1.06
.41    .410   .430   .465   .545   .580   .595   .615   .660   .735
.45           .450   .485   .565   .600   .615   .635   .680   .755
.52                  .520   .600   .635   .650   .670   .715   .790
.68                         .680   .715   .730   .750   .795   .870
.75                                .750   .765   .785   .830   .905
.78                                       .780   .800   .845   .920
.82                                              .820   .865   .940
.91                                                     .910   .985
1.06                                                           1.060


The 90.2 % C.I. is ( .580, .845 ) and the point estimate for the median is .715 .
> z <- c(.41,.45,.52,.68,.75,.78,.82,.91,1.06)
> wilcox.test(z,conf.int=T,conf.level=.9)

Wilcoxon signed rank test


90 percent confidence interval:
0.580 0.845
sample estimates:
(pseudo)median

0.715
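
The hand method used in Examples 3 and 4 (count in from each end of the ordered Walsh averages) can be sketched as one base R function. The name walsh_ci is hypothetical, and taking the counting index from qsignrank is our assumption about how to mirror the table lookup; it reproduces the intervals and achieved levels worked out by hand in both examples.

```r
# Hypothetical helper: signed rank C.I. and point estimate from ordered Walsh averages
walsh_ci <- function(z, conf.level = 0.90) {
  n <- length(z)
  w <- outer(z, z, "+") / 2
  w <- sort(w[upper.tri(w, diag = TRUE)])   # ordered Walsh averages W(1) <= ... <= W(M)
  m <- length(w)                            # M = n(n+1)/2
  alpha <- 1 - conf.level
  k <- max(qsignrank(alpha / 2, n), 1)      # count k in from each end
  list(lower    = w[k],
       upper    = w[m + 1 - k],
       estimate = median(w),
       achieved = 1 - 2 * psignrank(k - 1, n))   # actual confidence level
}

ex3 <- walsh_ci(c(4290, 5280, 5280, 5555, 5610))
ex4 <- walsh_ci(c(.41, .45, .52, .68, .75, .78, .82, .91, 1.06))
print(unlist(ex3))
print(unlist(ex4))
```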
