Using R For Nonparametric Analysis

USING R FOR NONPARAMETRIC ANALYSIS - PART ONE
Binomial Probabilities
>pbinom(b,n,p)
will yield the value for P( B ≤ b ) when B is a binomial with n trials and P( S ) = p
>pbinom(b-1,n,p,lower.tail=F) or >1 - pbinom(b-1,n,p) will yield P( B ≥ b )

>pbinom(b,n,p) - pbinom(b-1,n,p) will yield P( B = b )
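The three identities above can be checked directly in R. The following is a small sketch using n = 20, p = .5 and b = 14 as illustrative values; dbinom is not used elsewhere in this handout and is brought in here only as a cross-check for P( B = b ).

```r
# Binomial tail probabilities for B ~ Binomial(n, p); values are illustrative
n <- 20; p <- 0.5; b <- 14

p_le <- pbinom(b, n, p)                          # P(B <= b)
p_ge <- pbinom(b - 1, n, p, lower.tail = FALSE)  # P(B >= b)
p_eq <- pbinom(b, n, p) - pbinom(b - 1, n, p)    # P(B = b)

# dbinom gives the point probability directly, so it should agree with p_eq
print(all.equal(p_eq, dbinom(b, n, p)))
# the two ways of writing the upper tail should agree as well
print(all.equal(p_ge, 1 - pbinom(b - 1, n, p)))
```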
Example: Hypothesis Testing Example 1, For a binomial with n = 20 testing H0: p = .5 versus H1: p > .5
Find α for RR: B ≥ 14:
>1 - pbinom(13,20,.5) or >pbinom(13,20,.5,lower.tail=F)
[1] 0.05765915 ( this was .058 from table )
Find the P-value for TS: B = 12:
>1 - pbinom(11,20,.5) or >pbinom(11,20,.5,lower.tail=F)
[1] 0.2517223
Find power of test when p = .8:
>1 - pbinom(13,20,.8) or >pbinom(13,20,.8,lower.tail=F)
[1] 0.9133075
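The three calculations in this example differ only in which cutoff and which p are plugged into the upper tail. As a sketch, they can be wrapped in one helper; the function name binom_upper_test and its argument names are made up for this illustration and are not part of R.

```r
# Hypothetical helper: size, P-value and power of an upper-tail binomial test
# with rejection region B >= crit
binom_upper_test <- function(n, p0, crit, b_obs, p_alt) {
  list(
    alpha   = pbinom(crit - 1,  n, p0,    lower.tail = FALSE),  # P(B >= crit  | p0)
    p_value = pbinom(b_obs - 1, n, p0,    lower.tail = FALSE),  # P(B >= b_obs | p0)
    power   = pbinom(crit - 1,  n, p_alt, lower.tail = FALSE)   # P(B >= crit  | p_alt)
  )
}

res <- binom_upper_test(n = 20, p0 = 0.5, crit = 14, b_obs = 12, p_alt = 0.8)
print(res)
```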
Normal Probabilities
>qnorm(p,μ,σ) will yield the value y0 such that P( Y < y0 ) = p when Y is normal, mean = μ, std.dev. = σ
>pnorm(y,μ,σ) will yield the value of P( Y < y ) when Y is normal, mean = μ, std.dev. = σ
Note: if you leave out the values for μ and σ, R will assume you want to use the standard normal Z
Example: Hypothesis Testing Example 2, Testing H0: μ = 5 versus H1: μ < 5, with n = 100 and s = 3.1
Find the RR for the large sample test at α = .05
qnorm(.05) or qnorm(.05,0,1)
[1] -1.644854 ( we used Z < -1.645 )
Find the RR in terms of x̄: qnorm(.05,5,0.31)
[1] 4.490095 ( we used x̄ < 4.49 )
Find the P-value when x̄ = 4.4:
pnorm(4.4,5,.31) or pnorm(-1.94)
[1] 0.02646547 or [1] 0.02618984 ( we got .0262 )
Find the power when μ = 4:
pnorm(4.49,4,.31) or pnorm(1.58)
[1] 0.9430204 or [1] 0.9429466 ( we got .9429 )
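The value 0.31 used above is the standard error s / sqrt(n) = 3.1 / 10. The sketch below repeats the example starting from n and s, so that the standard error is computed rather than typed in. The variable names are chosen for this illustration only. The power here uses the unrounded cutoff, so it differs slightly from the value obtained with 4.49.

```r
# Large-sample lower-tail test of H0: mu = 5 versus H1: mu < 5
mu0 <- 5; s <- 3.1; n <- 100; alpha <- 0.05

se      <- s / sqrt(n)             # standard error of the sample mean (0.31)
crit    <- qnorm(alpha, mu0, se)   # reject H0 when the sample mean is below this
p_value <- pnorm(4.4, mu0, se)     # P-value for an observed mean of 4.4
power   <- pnorm(crit, 4, se)      # P(reject H0) when the true mean is 4

print(c(se = se, crit = crit, p_value = p_value, power = power))
```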
Binomial Test for p

>binom.test(b,n,p0,"g" or "l" or "t") will test H0: p = p0 versus H1: p > or < or ≠ p0
>qbinom(prob,n,p) will return the smallest value of b such that P( B ≤ b ) ≥ prob (closest w/o going under)
Example: Forty percent of the assembly line workers at a large corporation are from minority groups. A
committee of 15 assembly line workers is to be selected at random to look into job-related complaints. The
committee that is chosen has 3 minority persons on it. Using α = .05, test to see if random selection of
the committee can be doubted.
Find RR for each of the three possible alternatives using α = .05

>qbinom(.05,15,.4) [1] 3 means we should use B ≤ 2 >pbinom(2,15,.4) [1] 0.027114 (α = .027)
>qbinom(.95,15,.4) [1] 9 means we should use B ≥ 10 >1-pbinom(9,15,.4) [1] 0.0338333 (α = .034)
>qbinom(.025,15,.4) [1] 2 and >qbinom(.975,15,.4) [1] 10 means use B ≤ 1 or B ≥ 11
>pbinom(1,15,.4) [1] 0.005172035 >1-pbinom(10,15,.4) [1] 0.009347661 (α = .005+.009 = .014)
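The step from the qbinom output to the rejection region follows a fixed rule: step down one from the lower quantile unless its tail probability is already within α, and step up one from the upper quantile. The sketch below encodes that rule. The function exact_rr is a hypothetical helper written for this illustration, not an R built-in.

```r
# Hypothetical helper: exact one-sided rejection regions for a binomial test
# at nominal level alpha, with the attained size of each region
exact_rr <- function(n, p0, alpha) {
  lo <- qbinom(alpha, n, p0)                      # smallest b with P(B <= b) >= alpha
  if (pbinom(lo, n, p0) > alpha) lo <- lo - 1     # step down so that P(B <= lo) <= alpha
  up <- qbinom(1 - alpha, n, p0) + 1              # smallest b with P(B >= b) <= alpha
  list(lower_crit  = lo,
       lower_alpha = pbinom(lo, n, p0),
       upper_crit  = up,
       upper_alpha = pbinom(up - 1, n, p0, lower.tail = FALSE))
}

one_sided <- exact_rr(15, 0.4, 0.05)    # use lower_* for H1: p < .4, upper_* for H1: p > .4
two_sided <- exact_rr(15, 0.4, 0.025)   # alpha/2 in each tail for H1: p not equal to .4
print(one_sided)
print(two_sided)
```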
Obtain the P-values for each of the three possible alternatives
>binom.test(b,n,p=p0,"g" or "l" or "t")
> binom.test(3,15,p=.4,"g")
Exact binomial test
data: 3 and 15
number of successes = 3, number of trials = 15, p-value = 0.9729 (.973)
alternative hypothesis: true probability of success is greater than 0.4
95 percent confidence interval:
0.05684687 1.00000000
sample estimates:
probability of success
0.2
> binom.test(3,15,p=.4,"l")
Exact binomial test
alternative hypothesis: true probability of success is less than 0.4
95 percent confidence interval:
0.0000000 0.4397844
sample estimates:
probability of success
0.2
> binom.test(3,15,p=.4,"t")
Exact binomial test
alternative hypothesis: true probability of success is not equal to 0.4
95 percent confidence interval:
0.04331201 0.48089113
sample estimates:
probability of success
0.2
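When only the P-values are needed, they can be pulled out of the object that binom.test returns instead of being read off the printed output. A minimal sketch for the committee data follows; the alternatives are spelled out in full here, whereas the handout abbreviates them to "g", "l" and "t".

```r
# P-values for the committee example (3 successes in 15 trials, p0 = .4)
alts <- c("greater", "less", "two.sided")
p_values <- sapply(alts, function(a) {
  binom.test(3, 15, p = 0.4, alternative = a)$p.value
})
print(round(p_values, 4))

# The one-sided values are just binomial tail probabilities
print(c(upper_tail = 1 - pbinom(2, 15, 0.4), lower_tail = pbinom(3, 15, 0.4)))
```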
Example: A random sample of 200 registered voters yields 95 who say they would like to see the Health
Care Bill repealed. Using α = .01, test to see if the true proportion of registered voters who would like to see
the Health Care Bill repealed is greater than, less than or different from .50.
Obtain the P-values for each of the three possible alternatives
> binom.test(95,200,.5,"g")
Exact binomial test
data: 95 and 200
number of successes = 95, number of trials = 200, p-value = 0.7816
alternative hypothesis: true probability of success is greater than 0.5
> binom.test(95,200,.5,"l")
Exact binomial test
data: 95 and 200
alternative hypothesis: true probability of success is less than 0.5
> binom.test(95,200,.5,"t")
Exact binomial test
data: 95 and 200
alternative hypothesis: true probability of success is not equal to 0.5
Find the rejection region for the exact two-tailed test at α = .01
> qbinom(.005,200,.5) [1] 82 and > qbinom(.995,200,.5) [1] 118 means use B ≤ 81 or B ≥ 119
> pbinom(81,200,.5) [1] 0.00436 > 1-pbinom(118,200,.5) [1] 0.00436 [α = 2(.00436) = .00872]
Find the RR for the large sample two-tailed test at α = .01
> qnorm(.995) [1] 2.575829 |B*| > 2.576
Find the RR in terms of B using large sample:
> qnorm(.995,100,7.07107)
[1] 118.2139 use 119
Find the P-value for large sample when B = 95:
> pnorm(95,100,7.07107) or > pnorm(-0.71)
[1] 0.2397501 or [1] 0.2388521
double this for P-value [we used 2(.2389)]
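The values 100 and 7.07107 used above are the null mean np0 and standard deviation sqrt( np0(1 - p0) ) of B. The sketch below computes them from n and p0 and then repeats the large-sample calculations. It assumes, as the handout does, that no continuity correction is applied.

```r
# Large-sample (normal approximation) two-tailed test for the voter example
n <- 200; p0 <- 0.5; b <- 95

mu    <- n * p0                    # null mean of B (100)
sigma <- sqrt(n * p0 * (1 - p0))   # null standard deviation of B (about 7.07107)

z          <- (b - mu) / sigma                  # standardized test statistic B*
p_two      <- 2 * pnorm(-abs(z))                # two-tailed P-value
crit_upper <- ceiling(qnorm(0.995, mu, sigma))  # upper cutoff in terms of B at alpha = .01

print(c(z = z, p_two = p_two, crit_upper = crit_upper))
```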
Sign and Signed Rank Tests Using Paired Data

Example: A manufacturing company was disturbed about its safety record, so it made every employee take
an industrial safety training program. The data below represent the number of work-hours lost due to accidents
at each of the company's 8 locations in the month before the mandatory training program and in the month
after. Use α = .05 to test whether the training program was effective. (Source: Example 3.1, pages 170-171, Nonparametric Statistical Inference by Gibbons & Chakraborti)
Plant            1      2      3      4      5      6      7      8
X = Before    51.2   46.5   24.1   10.2   65.3   92.1   30.3   49.2
Y = After     45.8   41.3   15.8   11.1   58.5   70.3   31.6   35.4
Z = Y - X     -5.4   -5.2   -8.3    0.9   -6.8  -21.8    1.3  -13.8
Do the exact signed rank test.
|Z|            5.4    5.2    8.3    0.9    6.8   21.8    1.3   13.8
Ri               4      3      6      1      5      8      2      7
ψi               0      0      0      1      0      0      1      0
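The hand calculation in the table (differences, ranks of the absolute differences, indicators of positive differences) can be reproduced step by step in R. The sketch below does so for the safety data and then uses psignrank for the exact lower-tail P-value. The variable names r, psi and t_plus are chosen here to mirror the table rows.

```r
# Safety program data
x <- c(51.2, 46.5, 24.1, 10.2, 65.3, 92.1, 30.3, 49.2)  # before
y <- c(45.8, 41.3, 15.8, 11.1, 58.5, 70.3, 31.6, 35.4)  # after

z      <- y - x                # paired differences
r      <- rank(abs(z))         # ranks of the absolute differences
psi    <- as.integer(z > 0)    # 1 when the difference is positive, 0 otherwise
t_plus <- sum(r * psi)         # signed rank statistic T+

# exact lower-tail P-value (valid here because there are no ties or zeros)
p_lower <- psignrank(t_plus, length(z))

print(rbind(r = r, psi = psi))
print(c(t_plus = t_plus, p_lower = p_lower))
```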
>wilcox.test(y,x,paired=T,"g" or "l" or "t")
> x <- c(51.2,46.5,24.1,10.2,65.3,92.1,30.3,49.2)
> y <- c(45.8,41.3,15.8,11.1,58.5,70.3,31.6,35.4)
> wilcox.test(y,x,paired=T,"l")
Wilcoxon signed rank test
data: y and x
V = 3, p-value = 0.01953 ( .020 )
alternative hypothesis: true location shift is less than 0
How to find a RR for the exact signed rank
> psignrank(4,8) [1] 0.027343 > psignrank(5,8) [1] 0.03906 > psignrank(6,8) [1] 0.05468
Do the exact sign test
>SIGN.test(y,x,0,"g" or "l" or "t")
You must first install package BSDA
>library(BSDA)
> SIGN.test(y,x,0,"l")
Dependent-samples Sign-Test
data: y and x
S = 2, p-value = 0.1445 (same)
alternative hypothesis: true median difference is less than 0
95 percent confidence interval:
-Inf 0.07214286
sample estimates:
median of x-y
-6.1
                   Conf.Level  L.E.pt   U.E.pt
Lower Achieved CI      0.8555    -Inf  -5.2000
Interpolated CI        0.9500    -Inf   0.0721
Upper Achieved CI      0.9648    -Inf   0.9000
You could use the qbinom or pbinom commands to find RR for the sign test
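Under H0 the number of positive differences S is binomial with p = .5 and n equal to the number of nonzero differences, which is why pbinom and qbinom are enough for the sign test. A sketch for the safety data with the lower-tail alternative follows. The cutoff rule is the same step-down rule used for the binomial rejection regions earlier.

```r
# Sign test for the safety program data using only base R
x <- c(51.2, 46.5, 24.1, 10.2, 65.3, 92.1, 30.3, 49.2)
y <- c(45.8, 41.3, 15.8, 11.1, 58.5, 70.3, 31.6, 35.4)

z <- y - x
z <- z[z != 0]        # zero differences are dropped before counting
n <- length(z)
s <- sum(z > 0)       # sign statistic: number of positive differences

p_lower <- pbinom(s, n, 0.5)    # P-value for the lower-tail alternative

# Lower-tail rejection region S <= c_low at nominal level alpha
alpha <- 0.05
c_low <- qbinom(alpha, n, 0.5)
if (pbinom(c_low, n, 0.5) > alpha) c_low <- c_low - 1

print(c(s = s, n = n, p_lower = p_lower,
        c_low = c_low, attained = pbinom(c_low, n, 0.5)))
```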
Example: Does a certain prescription drug affect heart rate? The resting heart rates of a sample of 10 patients were
recorded ( X ). All 10 patients were given a dose of the drug in question and, after thirty minutes, their resting
heart rate was again recorded ( Y ). Use the data below to test, at α = .10, whether this drug has any effect on
resting heart rate.
Patient       1    2    3    4    5    6    7    8    9   10
X            68   73   75   77   78   78   80   81   84   87
Y            70   72   80   80   79   78   79   83   81   89
Z = Y - X
Ri
Exact Signed Rank Test

> x <- c(68,73,75,77,78,78,80,81,84,87)
> y <- c(70,72,80,80,79,78,79,83,81,89)
> wilcox.test(y,x,paired=T,"t")
Wilcoxon signed rank test with continuity correction
data: y and x
V = 33.5, p-value = 0.2099 ( .204 ≤ P ≤ .250 )
alternative hypothesis: true location shift is not equal to 0
Warning messages:
1: In wilcox.test.default(y, x, paired = T, "t") : cannot compute exact p-value with ties
2: In wilcox.test.default(y, x, paired = T, "t") : cannot compute exact p-value with zeroes
Exact Sign Test
> SIGN.test(y,x,0,"t")
data: y and x
S = 6, p-value = 0.5078 ( same as we got )
alternative hypothesis: true median difference is not equal to 0
sample estimates:
median of x-y
1.5
                   Conf.Level  L.E.pt  U.E.pt
Lower Achieved CI      0.8906      -1  2.0000
Interpolated CI        0.9500      -1  2.6756
Upper Achieved CI      0.9785      -1  3.0000
Large sample sign and signed rank tests can be done using the qnorm and pnorm commands once you have
calculated the mean and standard deviations of the statistics.
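As a sketch of what that looks like, the block below applies the usual null moments to the safety program data: T+ has mean n(n+1)/4 and variance n(n+1)(2n+1)/24 when there are no ties or zeros, and S has mean n/2 and variance n/4. With n = 8 the approximation is rough and is shown only to illustrate the commands. No continuity correction is applied, which is an assumption of this sketch.

```r
# Large-sample versions of the signed rank and sign tests (lower-tail alternative)
n      <- 8    # number of nonzero differences in the safety data
t_plus <- 3    # observed signed rank statistic
s      <- 2    # observed sign statistic

# Signed rank statistic: null mean and standard deviation
mean_t <- n * (n + 1) / 4
sd_t   <- sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_t    <- pnorm(t_plus, mean_t, sd_t)

# Sign statistic: null mean and standard deviation
mean_s <- n / 2
sd_s   <- sqrt(n / 4)
p_s    <- pnorm(s, mean_s, sd_s)

print(c(mean_t = mean_t, sd_t = sd_t, p_signed_rank = p_t))
print(c(mean_s = mean_s, sd_s = sd_s, p_sign = p_s))
```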
Testing a hypothesized value other than 0
Example
Student        1   2   3   4   5   6   7   8   9  10  11
X = Before    20  21  25  26  32  27  38  34  28  20  29
Y = After     20  22  10  16  11  20  20  19  13  21  12
Using α = .05, test to try to show that anxiety is reduced by more than 3 points by taking the course.
Exact Signed Rank Test
> x<- c(20,21,25,26,32,27,38,34,28,20,29)
> y<- c(20,22,10,16,11,20,20,19,13,21,12)
> wilcox.test(y,x,paired=T,mu=-3,"l")
data: y and x
V = 7, p-value = 0.01142 (.009)
alternative hypothesis: true location shift is less than -3
Warning message: In wilcox.test.default(y, x, paired = T, mu = -3, "l") :
cannot compute exact p-value with ties
Exact Sign Test
> SIGN.test(y,x,-3,"l")
data: y and x
S = 3, p-value = 0.1133
alternative hypothesis: true median difference is less than -3
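Testing a hypothesized value of -3 amounts to subtracting -3 from every difference and then testing against 0. The sketch below does this by hand for the anxiety data to show where V = 7 and S = 3 come from. The variable names are chosen for this illustration.

```r
# Anxiety scores before and after the course
x <- c(20, 21, 25, 26, 32, 27, 38, 34, 28, 20, 29)
y <- c(20, 22, 10, 16, 11, 20, 20, 19, 13, 21, 12)

d <- (y - x) - (-3)    # shift the differences by the hypothesized value
d <- d[d != 0]         # drop zeros (none occur here)

# Signed rank statistic: sum of the (mid)ranks of |d| over the positive d
t_plus <- sum(rank(abs(d))[d > 0])

# Sign statistic and its exact lower-tail P-value
s      <- sum(d > 0)
p_sign <- pbinom(s, length(d), 0.5)

print(c(t_plus = t_plus, s = s, p_sign = p_sign))
```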
Estimation using the signed rank and the sign procedures
CI and Point estimate for the safety program data
> wilcox.test(y-x,conf.int=T,conf.level=.9)
data: y - x
V = 3, p-value = 0.03906
alternative hypothesis: true location is not equal to 0
90 percent confidence interval:
-13.60 -2.15
sample estimates:
(pseudo)median
-6.6
( with conf.level=.95 the interval is -14.30 -1.95; with conf.level=.99 it is -21.8 1.3 )
> SIGN.test(y,x,0,"t",conf.level=.9)
data: y and x
S = 2, p-value = 0.2891
alternative hypothesis: true median difference is not equal to 0
90 percent confidence interval:
-13.05357143 0.07214286
sample estimates:
median of x-y
-6.1
                   Conf.Level    L.E.pt   U.E.pt
Lower Achieved CI      0.7109   -8.3000  -5.2000
Interpolated CI        0.9000  -13.0536   0.0721
Upper Achieved CI      0.9297  -13.8000   0.9000
> SIGN.test(y,x,0,"t",conf.level=.95)
                   Conf.Level  L.E.pt  U.E.pt
Lower Achieved CI      0.9297   -13.8    0.90
Interpolated CI        0.9500   -16.4    1.03
Upper Achieved CI      0.9922   -21.8    1.30
( you get same output for 99 % in this case )
Ordered Walsh Averages

You must first install package NSM3 ( this is a very large package )
> library(NSM3)
> owa(x,y)
These are for the safety program data
$owa
[1] -21.80 -17.80 -15.05 -14.30 -13.80 -13.60 -13.50 -11.05 -10.45 -10.30
[11] -10.25 -9.60 -9.50 -8.30 -7.55 -6.85 -6.80 -6.75 -6.45 -6.25
[21] -6.10 -6.00 -5.40 -5.30 -5.20 -3.70 -3.50 -2.95 -2.75 -2.25
[31] -2.15 -2.05 -1.95 0.90 1.10 1.30
$h.l [1] -6.6
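If installing NSM3 is not convenient, the ordered Walsh averages can also be built in base R. This is a sketch of an alternative route, not the package's own code: outer forms all pairwise sums of the differences, and the upper triangle including the diagonal keeps each pair once.

```r
# Ordered Walsh averages for the safety program data, base R only
x <- c(51.2, 46.5, 24.1, 10.2, 65.3, 92.1, 30.3, 49.2)
y <- c(45.8, 41.3, 15.8, 11.1, 58.5, 70.3, 31.6, 35.4)
z <- y - x

w <- outer(z, z, "+") / 2                   # all pairwise averages (z_i + z_j) / 2
w <- sort(w[upper.tri(w, diag = TRUE)])     # keep i <= j, then order them

print(length(w))          # n(n+1)/2 Walsh averages
print(round(w, 2))
print(median(w))          # point estimate: median of the Walsh averages
print(c(w[6], w[31]))     # 6th smallest and 6th largest: the 90 percent interval
```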
CI and Point estimate for the drug effect data
> owa(x,y)
$owa
[1] -3.0 -2.0 -2.0 -1.5 -1.0 -1.0 -1.0 -1.0 -0.5 -0.5 -0.5 -0.5 -0.5 0.0 0.0
[16] 0.0 0.0 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0
[31] 1.0 1.5 1.5 1.5 1.5 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.5
[46] 2.5 2.5 2.5 3.0 3.0 3.5 3.5 3.5 4.0 5.0
$h.l [1] 1
> wilcox.test(y-x,conf.int=T,conf.level=.9)
data: y - x
V = 33.5, p-value = 0.2099
alternative hypothesis: true location is not equal to 0
90 percent confidence interval:
-0.500093 2.500018
sample estimates:
(pseudo)median
1.000058
Warning messages:
1: In wilcox.test.default(y - x, conf.int = T, conf.level = 0.9) :
cannot compute exact confidence interval with ties
cannot compute exact p-value with zeroes
cannot compute exact confidence interval with zeroes
( with conf.level=.95 the interval is -0.9999859 3.0000166; with conf.level=.99 it is -1.999972 3.500034 )
> SIGN.test(y,x,conf.level=.9)
( you get same output for 95 % )
sample estimates:
median of x-y
1.5
                   Conf.Level  L.E.pt  U.E.pt
Lower Achieved CI      0.8906      -1  2.0000
Interpolated CI        0.9000      -1  2.1067
Upper Achieved CI      0.9785      -1  3.0000
> SIGN.test(y,x,conf.level=.99)
                   Conf.Level  L.E.pt  U.E.pt
Lower Achieved CI      0.9785  -1.000   3.000
Interpolated CI        0.9900  -2.176   4.176
Upper Achieved CI      0.9980  -3.000   5.000
One Sample Signed Rank and Sign
Example 1: A large bank wishes to test whether the median time its customers spend in line waiting for a teller
is less than 3 minutes. A random sample of 11 customers was taken and their waiting times ( to the nearest
second ) were recorded. Use the data below to test the hypotheses of interest to the bank using α = .05.
Zi           2:45   3:12   0:21   0:00   1:34   6:54   0:12   5:32   0:00   3:56   3:31
Zi - 3      -0:15   0:12  -2:39  -3:00  -1:26   3:54  -2:48   2:32  -3:00   0:56   0:31
|Zi - 3|     0:15   0:12   2:39   3:00   1:26   3:54   2:48   2:32   3:00   0:56   0:31
Rank            2      1      7    9.5      5     11      8      6    9.5      4      3
H0: median = 3
H1: median < 3
TS: T+ = 25 = 1 + 11 + 6 + 4 + 3
RR ( α = .042 ): T+ ≤ 13 = 66 - 53
P-Value: P( T+ ≤ 25 ) = .260, same as P( T+ ≥ 41 )
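R has no direct use for times written as minutes:seconds, so the data are entered in seconds and the hypothesized median of 3 minutes becomes 180. The sketch below shows one way to do the conversion and to recompute T+ from the converted values. The helper to_seconds is written for this illustration and is not a built-in.

```r
# Hypothetical helper: convert "m:ss" strings to seconds
to_seconds <- function(times) {
  parts <- strsplit(times, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

times <- c("2:45", "3:12", "0:21", "0:00", "1:34", "6:54",
           "0:12", "5:32", "0:00", "3:56", "3:31")
z <- to_seconds(times)
print(z)

# Signed rank statistic for H0: median = 180 seconds (midranks handle the tie at 3:00)
d      <- z - 180
t_plus <- sum(rank(abs(d))[d > 0])
print(t_plus)
```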
> z <- c(165,192,21,0,94,414,12,332,0,236,211)

> wilcox.test(z,mu=180,alternative="l")
data: z
V = 25, p-value = 0.2523
alternative hypothesis: true location is less than 180
Warning message: In wilcox.test.default(z, mu = 180, alternative = "l") :
> SIGN.test(z,md=180,alternative="l")
One-sample Sign-Test
data: z
s = 5, p-value = 0.5
alternative hypothesis: true median is less than 180
> SIGN.test(z,conf.level=.9)
                   Conf.Level   L.E.pt    U.E.pt
Lower Achieved CI      0.7734  21.0000  211.0000
Interpolated CI        0.9000  13.9309  230.6364
Upper Achieved CI      0.9346  12.0000  236.0000
With conf.level=.95:
                   Conf.Level   L.E.pt    U.E.pt
Lower Achieved CI      0.9346  12.0000  236.0000
Interpolated CI        0.9500   8.5527  263.5782
Upper Achieved CI      0.9883   0.0000  332.0000
With conf.level=.99:
                   Conf.Level   L.E.pt    U.E.pt
Lower Achieved CI      0.9883        0    332.00
Interpolated CI        0.9900        0    345.12
Upper Achieved CI      0.9990        0    414.00
Example 3: A random sample of 5 healthy males between the ages of 19 - 30 was taken. These males
were all non-smokers and either doctors or medical research workers. For each male in the sample, forced
vital capacity ( a measure of aerobic health ) was measured. Using the 5 values given below, find a 90 %
confidence interval for the true median forced vital capacity of males in this group. Also, find a point estimate
for the median.
n = 5, so there are 5(6) / 2 = 15 Walsh averages
P( T+ ≥ 15 ) = .031
Thus t(α/2) = 15 and 15 + 1 - 15 = 1
A 93.8 % C.I. for the median would be [ W(1), W(15) ] and the point estimate for the median would be W(8).
Zi        4290    5280    5280    5555    5610
4290    4290.0  4785.0  4785.0  4922.5  4950.0
5280            5280.0  5280.0  5417.5  5445.0
5280                    5280.0  5417.5  5445.0
5555                            5555.0  5582.5
5610                                    5610.0
The 93.8 % C.I. is ( 4290, 5610 ) and the point estimate for the median is 5280.
> z <- c(4290,5280,5280,5555,5610)
> wilcox.test(z,conf.int=T,conf.level=.9)
4290 5610
sample estimates:
(pseudo)median
5280
Warning messages:
1: In wilcox.test.default(z, conf.int = T, conf.level = 0.9) :
2: In wilcox.test.default(z, conf.int = T, conf.level = 0.9) :
cannot compute exact confidence interval with ties
Example 4: The values below are the effective doses of a drug for 9 different patients. Use these data to find
a 90 % confidence interval for the true median effective dose and also find a point estimate for the true median
effective dose.
n = 9, so there are 9(10) / 2 = 45 Walsh averages
P( T+ ≥ 37 ) = .049
Thus t(α/2) = 37 and 45 + 1 - 37 = 9
A 90.2 % C.I. for the median would be [ W(9), W(37) ] and the point estimate for the median would be W(23).
Zi      .41    .45    .52    .68    .75    .78    .82    .91   1.06
.41    .410   .430   .465   .545   .580   .595   .615   .660   .735
.45           .450   .485   .565   .600   .615   .635   .680   .755
.52                  .520   .600   .635   .650   .670   .715   .790
.68                         .680   .715   .730   .750   .795   .870
.75                                .750   .765   .785   .830   .905
.78                                       .780   .800   .845   .920
.82                                              .820   .865   .940
.91                                                     .910   .985
1.06                                                          1.060
The 90.2 % C.I. is ( .580, .845 ) and the point estimate for the median is .715.
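The hand method used in Examples 3 and 4 (pick the largest k whose tail probability P( T+ ≤ k - 1 ) does not exceed α/2, then take the k-th smallest and k-th largest Walsh averages) can be sketched as a small function. The name walsh_ci is hypothetical. It follows the handout's hand rule rather than reproducing the internals of wilcox.test, so the two may differ when the data contain ties.

```r
# Hypothetical helper: signed rank confidence interval from ordered Walsh averages
walsh_ci <- function(z, conf.level = 0.90) {
  n <- length(z)
  w <- outer(z, z, "+") / 2
  w <- sort(w[upper.tri(w, diag = TRUE)])      # ordered Walsh averages
  M <- length(w)                               # n(n+1)/2
  alpha  <- 1 - conf.level
  t_vals <- 0:M
  ok <- t_vals[psignrank(t_vals, n) <= alpha / 2]
  stopifnot(length(ok) > 0)                    # n too small for this confidence level
  k <- max(ok) + 1                             # interval is [ W(k), W(M + 1 - k) ]
  list(estimate = median(w),
       lower    = w[k],
       upper    = w[M + 1 - k],
       achieved = 1 - 2 * psignrank(k - 1, n)) # actual confidence level
}

ex3 <- walsh_ci(c(4290, 5280, 5280, 5555, 5610), conf.level = 0.90)
ex4 <- walsh_ci(c(.41, .45, .52, .68, .75, .78, .82, .91, 1.06), conf.level = 0.90)
print(unlist(ex3))
print(unlist(ex4))
```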
> z <- c(.41,.45,.52,.68,.75,.78,.82,.91,1.06)
> wilcox.test(z,conf.int=T,conf.level=.9)

0.580 0.845
sample estimates:
(pseudo)median
0.715
