Data              Nonparametric method              Parametric method
One sample        Sign Test                         One-sample t test (paired t-test)
                  Wilcoxon Signed Ranks Test        One-sample t test (paired t-test)
Two samples       Wilcoxon Mann-Whitney Test        Two-sample t test
One-way ANOVA     Kruskal-Wallis Test               One-way ANOVA
Two-way ANOVA     Friedman Test                     Two-way ANOVA
Correlation       Correlation Test                  Correlation
Goodness-of-Fit   Kolmogorov-Smirnov Test
Regression        Nonparametric linear regression
Nonparametric statistics are used when the data are not compatible with the assumptions of parametric methods, such as normality or homogeneity of variance.
Advantages of nonparametric methods
1. They are easy to apply.
2. Assumptions, such as normality, can be relaxed.
3. When observations are drawn from non-normal populations, nonparametric methods are more reliable.
4. They can be used for rank scores, which are not exact in a numerical sense.
Disadvantages of nonparametric methods
1. When observations are drawn from normal populations, nonparametric tests are not as powerful as parametric tests.
One Sample
Sign Test
The sign test is designed to test a hypothesis about the location of a population
distribution. It is most often used to test the hypothesis about a population median, and
often involves the use of matched pairs, for example, before and after data, in which case
it tests for a median difference of zero.
Example
A sample of 10 mentally retarded boys received general appearance scores as follows: 4, 5, 8, 8, 9, 6, 10, 7, 6, 6.
1. H0: median = 5 vs HA: Not H0.
2. Transform the data into signs: assign + if the observed value > 5, - if the observed value < 5, and 0 if the observed value = 5.
obs    4   5   8   8   9   6   10   7   6   6
sign   -   0   +   +   +   +   +    +   +   +
Zeros are eliminated from the analysis. Since there is 1 zero, the number of
observations is reduced from 10 to 9.
3. Thus we observed 8 +s out of 9 trials. The probability of observing 8 or more +s is, in EXCEL, (1-BINOMDIST(7,9,0.5,TRUE)) = .0195. Since we perform a two-sided test, the p-value = 2 * .0195 = .0391.
Large Sample Approximation for n > 20:

Z = (T - n/2) / sqrt(n/4) ~ N(0,1)

where T is the number of +s. In this example, Z = (8 - 9/2) / sqrt(9/4) = 2.33.
4. Since the p-value is smaller than 0.05, we conclude that the median score is
not equal to 5.
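The exact binomial calculation can also be checked outside EXCEL. A small Python sketch (Python is not part of these SAS-based notes, and the function name is ours):

```python
from math import comb

def sign_test_two_sided_p(n_plus, n):
    """Exact two-sided sign test p-value: 2 * P(X >= n_plus), X ~ Binomial(n, 1/2)."""
    upper_tail = sum(comb(n, k) for k in range(n_plus, n + 1)) / 2**n
    return min(1.0, 2 * upper_tail)

# 8 plus signs out of 9 nonzero observations
p = sign_test_two_sided_p(8, 9)   # 0.0390625, i.e. .0391
```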
Wilcoxon Signed Ranks Test
For the same data, rank the absolute differences from 5 (dropping the zero, and averaging ranks among ties) and sum the ranks of the positive differences: T = 42.5. The large-sample approximation is

Z = (T - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24) ~ N(0,1)

For this example, Z = (42.5 - 9*10/4) / sqrt(9*10*19/24) = 2.37, and its p-value is .018. Since n < 20, the exact p-value from the table should be used; it leads to the same conclusion that the median is not equal to 5.
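The ranking and the normal approximation can be verified with a short Python sketch (not part of the original SAS-based notes; the helper names are ours):

```python
from math import sqrt

def average_ranks(values):
    """Rank the values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1               # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank(data, m0):
    """Return (T_plus, Z) for the one-sample Wilcoxon signed ranks test."""
    diffs = [x - m0 for x in data if x != m0]        # zeros are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    n = len(diffs)
    z = (t_plus - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, z

t_plus, z = signed_rank([4, 5, 8, 8, 9, 6, 10, 7, 6, 6], 5)   # T+ = 42.5, Z = 2.37
```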
SAS program for the sign test and the Wilcoxon ranked sign test for 1 sample data
DATA IN;
INPUT X @@;
diff=X-5;
CARDS;
4 5 8 8 9 6 10 7 6 6
run;
PROC UNIVARIATE;
VAR diff;
run;
Output

Univariate Procedure
Variable=DIFF

Moments
N           10          Sum Wgts   10
Mean         1.9        Sum        19
Std Dev      1.852926   Variance    3.433333
Skewness     0.180769   Kurtosis   -0.62777
USS         67          CSS        30.9
CV          97.5224     Std Mean    0.585947
T:Mean=0     3.242617   Pr>|T|      0.0101
Num ^= 0     9          Num > 0     8
M(Sign)      3.5        Pr>=|M|     0.0391
Sgn Rank    20          Pr>=|S|     0.0195
M(Sign) = # of +s - n/2 = 8 - 9/2 = 3.5
Sgn Rank = T - n(n+1)/4 = 42.5 - 22.5 = 20
Sign Test for paired data
Example: treatment and control measurements on 15 pairs.

Pair   Treatment   Control   Sign
 1        10          6        +
 2        12          5        +
 3         8          7        +
 4         8          9        -
 5        13         10        +
 6        11         12        -
 7        15          9        +
 8        16          8        +
 9         4          3        +
10        13         14        -
11         2          6        -
12        15         10        +
13         5          1        +
14         6          2        +
15         8          1        +

Number of +s: T = 11 out of n = 15 nonzero differences.
Z = (T - n/2) / sqrt(n/4) ~ N(0,1)

In this example, Z = (11 - 15/2) / sqrt(15/4) = 1.81.
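The same calculation in a short Python sketch (not part of the original SAS-based notes); the exact two-sided binomial p-value, 0.1185, matches the Pr>=|M| line in the SAS output further below:

```python
from math import comb, sqrt

treatment = [10, 12, 8, 8, 13, 11, 15, 16, 4, 13, 2, 15, 5, 6, 8]
control   = [6, 5, 7, 9, 10, 12, 9, 8, 3, 14, 6, 10, 1, 2, 1]

n_plus = sum(t > c for t, c in zip(treatment, control))   # 11 positive differences
n = sum(t != c for t, c in zip(treatment, control))       # 15 nonzero differences

z = (n_plus - n / 2) / sqrt(n / 4)                        # normal approximation, 1.81
p_exact = 2 * sum(comb(n, k) for k in range(n_plus, n + 1)) / 2**n   # 0.1185
```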
Wilcoxon Signed Ranks Test for paired data
The same pairs, with the absolute differences ranked (average ranks for ties):

Pair   Treatment   Control   Diff   |Diff|   Rank-   Rank+
 1        10          6        4      4               8.5
 2        12          5        7      7              13.5
 3         8          7        1      1               3
 4         8          9       -1      1       3
 5        13         10        3      3               6
 6        11         12       -1      1       3
 7        15          9        6      6              12
 8        16          8        8      8              15
 9         4          3        1      1               3
10        13         14       -1      1       3
11         2          6       -4      4       8.5
12        15         10        5      5              11
13         5          1        4      4               8.5
14         6          2        4      4               8.5
15         8          1        7      7              13.5

T+ = 102.5
1. H0: median Treatment = median Control vs HA: Not H0.
2. Find the difference between treatment and control.
3. Rank the absolute differences, assigning average ranks among ties.
4. Add all the ranks for the positive differences: T+ = 102.5.
From the statistical table for the Wilcoxon Signed-Rank Test, when n = 15: T+ = 102.5, T- = 17.5, p-value = 2 * .007 = .014.
For the large-sample approximation, if n > 20,

Z = (T - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24) ~ N(0,1)

For this example, Z = (102.5 - 15*16/4) / sqrt(15*16*31/24) = 2.41 and its p-value = .016.
5. Since the p-value < .05, we conclude that the medians are different.
Note: the results from the sign test and the Wilcoxon signed ranks test on the paired
sample are different. The Wilcoxon signed ranks test is more powerful.
SAS program for the sign test for paired data
data hyper;
input treat control @@;
diff = treat-control;
cards;
10 6 12 5 8 7 8 9 13 10 11 12 15 9 16 8
4 3 13 14 2 6 15 10 5 1 6 2 8 1
run;
proc univariate;
var diff;
run;
Output

Univariate Procedure
Variable=DIFF

Moments
N           15          Sum Wgts   15
Mean         2.866667   Sum        43
Std Dev      3.563038   Variance   12.69524
Skewness    -0.34413    Kurtosis   -0.82487
USS        301          CSS       177.7333
CV         124.292      Std Mean    0.919972
T:Mean=0     3.116036   Pr>|T|      0.0076   <-- parametric paired t-test
Num ^= 0    15          Num > 0    11
M(Sign)      3.5        Pr>=|M|     0.1185
Sgn Rank    42.5        Pr>=|S|     0.0139   <-- Wilcoxon signed ranks test (T+ = 102.5)

M(Sign) = # of +s - n/2 = 11 - 15/2 = 3.5
Sgn Rank = T - n(n+1)/4 = 102.5 - 60 = 42.5
Two samples
Wilcoxon Mann-Whitney Test
Example: hemoglobin (hemo) levels from two treatment groups, Group 1 (n = 15) and Group 2 (n = 10), sorted and assigned joint ranks (average ranks for ties):

Group 1   Rank      Group 2   Rank
13.7       1        15.0       8.5
13.8       2        15.0       8.5
14.0       3        16.0      15
14.1       4.5      16.2      16
14.1       4.5      16.3      17
14.2       6        16.8      21
14.4       7        16.9      22
15.3      10.5      17.1      23
15.3      10.5      17.4      24
15.6      12        17.5      25
15.7      13
15.9      14
16.5      18
16.6      19
16.7      20

S1 = 145            S2 = 180
1. Hypotheses: H0: median X = median Y vs HA: Not H0.
2. Sort the values of the two variables and assign joint ranks. Find S1 and S2, the sums of the ranks assigned to each group. Let S = max(S1, S2).
3. The test statistic is T = S - n(n+1)/2, where n is the size of the group attaining S.
S = 180.000
Exact P-Values
(One-sided) Prob >= S = 0.0021
(Two-sided) Prob >= |S - Mean| = 0.0042
Normal Approximation (with Continuity Correction of .5)
Z = 2.74735   Prob > |Z| = 0.0060
T-Test Approx. Significance = 0.0112

GROUP   N    Median
1       15   15.3000000
2       10   16.5500000
Conclusion: Reject H0: median X = median Y and conclude that the median hemo levels of Groups 1 and 2 are different; the median hemo for Group 2 is greater than that for Group 1.
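The joint ranking can be reproduced with a short Python sketch (not part of the original SAS-based notes):

```python
group1 = [13.7, 13.8, 14.0, 14.1, 14.1, 14.2, 14.4, 15.3, 15.3,
          15.6, 15.7, 15.9, 16.5, 16.6, 16.7]
group2 = [17.4, 16.2, 17.1, 17.5, 15.0, 16.0, 16.9, 15.0, 16.3, 16.8]

pooled = sorted(group1 + group2)

def joint_rank(v):
    """1-based rank of v in the pooled sample; ties get their average rank."""
    first = pooled.index(v) + 1
    last = len(pooled) - pooled[::-1].index(v)
    return (first + last) / 2

s1 = sum(joint_rank(v) for v in group1)     # S1 = 145.0
s2 = sum(joint_rank(v) for v in group2)     # S2 = 180.0
s = max(s1, s2)                             # S  = 180.0
```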
One-way ANOVA
Kruskal-Wallis Test
The Kruskal-Wallis test is a nonparametric test used to compare three or more samples.
Data

Levels   Observations               Sum i   Mean i
1        Y11 Y12 Y13 ... Y1n1       Y1.     Ybar1.
2        Y21 Y22 Y23 ... Y2n2       Y2.     Ybar2.
:
a        Ya1 Ya2 Ya3 ... Yana       Ya.     Ybara.
__________________________________________
Total                               Y..     Ybar..

N = n1 + n2 + ... + na
Converting the original data into the ranks, we get

Levels   Observations               Sum i   Mean i
1        R11 R12 R13 ... R1n1       R1.     Rbar1.
2        R21 R22 R23 ... R2n2       R2.     Rbar2.
:
a        Ra1 Ra2 Ra3 ... Rana       Ra.     Rbara.
__________________________________________
Total                               R..     Rbar..
The test statistic is

T = [12 / (N(N+1))] * sum_{i=1}^{a} n_i * (Rbar_i. - (N+1)/2)^2
  = [12 / (N(N+1))] * sum_{i=1}^{a} (R_i.^2 / n_i) - 3(N+1),

which is compared to the chi-square distribution with a-1 degrees of freedom.
proc npar1way wilcoxon;
class treat;
var time;
exact;
run;
proc means median;
class treat;
var time;
run;
NPAR1WAY PROCEDURE

Wilcoxon Scores (Rank Sums) for Variable TIME
Classified by Variable TREAT

              Sum of   Expected   Std Dev      Mean
TREAT    N    Scores   Under H0   Under H0     Score
A        5    55.0     35.0       6.82191040   11.0000000
B        4    26.0     28.0       6.47183246    6.5000000
C        4    10.0     28.0       6.47183246    2.5000000
Average Scores Were Used for Ties

Kruskal-Wallis Test
S = 10.711
Exact P-Value
Prob >= S = 6.66E-05
Chi-Square Approximation
DF = 2   Prob > S = 0.0047

TREAT   N   Median
A       5   31.0000000
B       4    8.0000000
C       4    3.5000000
Conclusion: Reject H0: median A = median B = median C and conclude that the median times of Treatment groups A, B, and C are not all equal. It seems that the median time for Treatment A is greater than the medians for Treatments B and C. This should be followed up by a nonparametric multiple comparison procedure.
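Using the rank sums from the NPAR1WAY output above, the Kruskal-Wallis statistic can be checked with a short Python sketch (not part of the original SAS-based notes). The uncorrected formula gives 10.68; SAS reports 10.711, most likely because it also applies a correction for the ties:

```python
# rank sums and group sizes for treatments A, B, C
rank_sums = [55.0, 26.0, 10.0]
sizes = [5, 4, 4]
N = sum(sizes)   # 13

T = 12 / (N * (N + 1)) * sum(r**2 / n for r, n in zip(rank_sums, sizes)) - 3 * (N + 1)
# T = 10.68 (without the tie correction)
```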
Randomized Block Design
Friedman Test
Data

                  Blocks
Levels    1   2   3  ...  b        Sum i   Mean i
1         Y11 Y12 Y13 ... Y1b      Y1.     Ybar1.
2         Y21 Y22 Y23 ... Y2b      Y2.     Ybar2.
:
a         Ya1 Ya2 Ya3 ... Yab      Ya.     Ybara.
Sum j     Y.1 Y.2 Y.3 ... Y.b      Y..
Mean j                                     Ybar..
where N = ab
Converting the data into the separate ranks within each block, we get

                  Blocks
Levels    1   2   3  ...  b        Sum i   Mean i
1         R11 R12 R13 ... R1b      R1.     Rbar1.
2         R21 R22 R23 ... R2b      R2.     Rbar2.
:
a         Ra1 Ra2 Ra3 ... Rab      Ra.     Rbara.

The test statistic is

T = [12 / (ab(a+1))] * sum_{i=1}^{a} R_i.^2 - 3b(a+1),

which is compared to the chi-square distribution with a-1 degrees of freedom.
PROC RANK;
BY BLOCK;
VAR YIELD;
RANKS RYIELD;
RUN;
proc freq;
tables block*trtment*ryield / noprint cmh;
title 'Friedman''s Chi-Square';
run;
proc means median;
class trtment;
var yield;
run;
Output
Friedman's Chi-Square
The FREQ Procedure
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic   Alternative Hypothesis     DF   Value     Prob
----------------------------------------------------------
1           Nonzero Correlation         1    0.7448   0.3881
2           Row Mean Scores Differ      3   12.6207   0.0055
3           General Association        12   27.7500   0.0060

Total Sample Size = 24

The Friedman chi-square is the Row Mean Scores Differ statistic: 12.6207 with 3 df, p = 0.0055.
The MEANS Procedure
Analysis Variable : YIELD

TRTMENT   N Obs   Median
A         6       34.2000000
B         6       38.2500000
C         6       33.1500000
D         6       34.5500000
Conclusion: We reject the null hypothesis and conclude that the median yields of Treatments A, B, C, and D are not all equal. It seems that the median yield for Treatment B is greater than the medians for Treatments A, C, and D. This should be followed up by a nonparametric multiple comparison procedure.
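The Friedman statistic itself is easy to compute from the within-block rank sums; a short Python sketch (not part of the original SAS-based notes), applied to hypothetical rank sums:

```python
def friedman_T(rank_sums, a, b):
    """T = 12/(a*b*(a+1)) * sum(R_i.^2) - 3*b*(a+1), compared to chi-square with a-1 df."""
    return 12 / (a * b * (a + 1)) * sum(r**2 for r in rank_sums) - 3 * b * (a + 1)

# hypothetical example: a = 3 treatments in b = 4 blocks,
# with within-block rank sums 12, 8, 4 (one treatment always ranked highest)
T = friedman_T([12, 8, 4], a=3, b=4)   # 8.0, the maximum possible for a = 3, b = 4
```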
Correlation
The Spearman Rank Correlation Coefficient
The Spearman rank correlation test uses the ranks (rather than the actual values) of the two variables to calculate the correlation coefficient r_s.
data rankcorr;
input age EEG @@;
lines;
20 98 21 75 22 95 24 100 27 99 30 65 31 64 33 70 35 85
38 74 40 68 42 66 46 48 51 54 53 63 55 52 58 67 60 55
run;
proc rank;
var age;
ranks rage;
run;
proc rank;
var EEG;
ranks rEEG;
run;
Proc corr;
var rage rEEG;
run;
Output
Correlation Analysis
2 'VAR' Variables: RAGE REEG

                      Simple Statistics
Variable   N    Mean     Std Dev   Sum     Minimum   Maximum   Label
RAGE       18   9.5000   5.3385    171.0   1.0000    18.0000   RANK FOR VARIABLE AGE
REEG       18   9.5000   5.3385    171.0   1.0000    18.0000   RANK FOR VARIABLE EEG
Conclusion: The Spearman rank correlation between age and EEG is -0.76264. We reject H0: rho = 0 and conclude that the correlation is different from 0, based on the p-value of 0.0002.
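Since there are no ties in either variable here, r_s can be verified with the shortcut formula r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)); a short Python sketch (not part of the original SAS-based notes):

```python
age = [20, 21, 22, 24, 27, 30, 31, 33, 35, 38, 40, 42, 46, 51, 53, 55, 58, 60]
eeg = [98, 75, 95, 100, 99, 65, 64, 70, 85, 74, 68, 66, 48, 54, 63, 52, 67, 55]

def ranks(values):
    s = sorted(values)
    return [s.index(v) + 1 for v in values]   # valid because there are no ties

n = len(age)
d2 = sum((ra - re) ** 2 for ra, re in zip(ranks(age), ranks(eeg)))   # 1708
rs = 1 - 6 * d2 / (n * (n * n - 1))   # -0.76264: EEG output declines with age
```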
Goodness-of-Fit
Kolmogorov-Smirnov Test
The asymptotic Kolmogorov-Smirnov statistic is computed as

KS_a = KS * sqrt(n)

If there are only two class levels, PROC NPAR1WAY computes the two-sample Kolmogorov statistic as

D = max_j | F1(x_j) - F2(x_j) |

where F1 and F2 are the empirical distribution functions of the two samples.
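A short Python sketch of the two-sample statistic D (not part of the original SAS-based notes; the two samples are hypothetical):

```python
def ecdf(sample):
    """Empirical distribution function of a sample."""
    s = sorted(sample)
    n = len(s)
    return lambda x: sum(v <= x for v in s) / n

def ks_two_sample(x1, x2):
    """D = max_j |F1(x_j) - F2(x_j)|, evaluated over the pooled sample points."""
    f1, f2 = ecdf(x1), ecdf(x2)
    return max(abs(f1(x) - f2(x)) for x in x1 + x2)

D = ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6])   # 0.5
```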
SAS output (edited)

Tests for Normality

Test                  --Statistic---      -----p Value------
Shapiro-Wilk          W      0.943237     Pr < W       0.1925
Kolmogorov-Smirnov    D      0.123822     Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.053008     Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.403511     Pr > A-Sq   >0.2500
Regression
Nonparametric linear regression (Theil's method)
The data consist of pairs (X1, Y1), (X2, Y2), ..., (Xn, Yn). Compute the slope of every pair of points:

S_ij = (Yj - Yi) / (Xj - Xi),  i < j.

The Theil estimate of the slope is median(S_ij) over all i < j. Note that this causes a big problem if some of the x values are the same! In that case one must use only the finite slopes (i.e., only the values S_ij with x_i not equal to x_j) and take the median of this reduced set.
When one cannot assume that the error terms are symmetric about 0, find the n terms Y_i - b*X_i, i = 1,...,n, where b is the slope estimate. The median of these n terms is the estimate of the intercept.
In the following example, Y = acid levels and X = exercise times (in minutes). We want to establish the relationship between these two variables.
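Theil's estimates for the example below can be computed with a short Python sketch (not part of the original SAS-based notes; x holds the exercise times and y the acid levels, as used in the SAS steps that follow):

```python
from statistics import median

x = [421, 278, 618, 482, 465, 105, 550, 750]   # exercise times (X)
y = [230, 175, 315, 290, 275, 150, 360, 425]   # acid levels (Y)

# all pairwise slopes, keeping only the finite ones (x_i != x_j)
slopes = [(y[j] - y[i]) / (x[j] - x[i])
          for i in range(len(x)) for j in range(i + 1, len(x))
          if x[i] != x[j]]

b = median(slopes)                                  # slope estimate, 0.4878
a = median([yi - b * xi for xi, yi in zip(x, y)])   # intercept estimate, about 51.52
```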
In the following example, Y = acid levels and X = exercise times (in minutes). We want
to establish the relationship between these two variables.
Parametric linear regression
DATA npreg1;
input dep indep @@; /* dep = acid level (Y), indep = exercise time (X) */
CARDS;
230 421 175 278 315 618 290 482
275 465 150 105 360 550 425 750
run;
proc reg;
model dep = indep;
run;
Model: MODEL1
Dependent Variable: DEP

Analysis of Variance

                      Sum of         Mean
Source      DF        Squares        Square        F Value   Prob>F
Model        1    53614.66151    53614.66151       58.115    0.0003
Error        6     5535.33849      922.55642
C Total      7    59150.00000

Root MSE    30.37361    R-square   0.9064
Dep Mean   277.50000    Adj R-sq   0.8908
C.V.        10.94545

Parameter Estimates

                 Parameter    Standard       T for H0:
Variable   DF    Estimate     Error          Parameter=0   Prob > |T|
INTERCEP    1    76.210483    28.50456809    2.674         0.0368
INDEP       1     0.438898     0.05757290    7.623         0.0003

DATA npreg2;
/* enter the eight (x, y) pairs directly into arrays */
ARRAY X(8) X1-X8 (421 278 618 482 465 105 550 750);
ARRAY Y(8) Y1-Y8 (230 175 315 290 275 150 360 425);
DO I=1 TO 7;
DO J=I+1 TO 8;
SLOPE = (Y(J)-Y(I))/(X(J)-X(I));
OUTPUT;
END;
END;
KEEP SLOPE;
run;
PROC SORT;
BY SLOPE;
run;
PROC PRINT;
TITLE 'THEIL SLOPE ESTIMATE EXAMPLE';
run;
proc means median;
var slope;
run;
Data npreg3;
/* enter the eight (x, y) pairs directly into arrays */
ARRAY X(8) X1-X8 (421 278 618 482 465 105 550 750);
ARRAY Y(8) Y1-Y8 (230 175 315 290 275 150 360 425);
/* when one assumes that the error terms are not symmetric about 0 */
DO I = 1 to 8;
inter1 = Y(I) - 0.4878*X(I); /* .4878 is the median of the slopes */
output;
END;
keep inter1;
run;
Proc print;
var inter1;
run;
Proc means median;
var inter1;
run;
THEIL SLOPE ESTIMATE EXAMPLE

Obs    SLOPE        Obs    SLOPE
  1   -0.66176       15    0.50373
  2    0.14451       16    0.52632
  3    0.18382       17    0.52966
  4    0.25316       18    0.53476
  5    0.26144       19    0.56373
  6    0.32164       20    0.59271
  7    0.32500       21    0.68015
  8    0.34722       22    0.83333
  9    0.37135       23    0.88235
 10    0.38462       24    0.98361
 11    0.41176       25    1.00000
 12    0.42636       26    1.00775
 13    0.43147       27    1.02273
 14    0.47191       28    1.02941

The median of the 28 slopes is (0.47191 + 0.50373)/2 = 0.4878.
Obs   inter1
1     24.6362
2     39.3916
3     13.5396
4     54.8804
5     48.1730
6     98.7810
7     91.7100
8     59.1500

The median of inter1 is 51.5267. The two fitted lines are therefore

Y = 76.2105 + 0.4389 * X   (parametric)
Y = 51.5267 + 0.4878 * X   (nonparametric)
To compare the performance of the predicted values, let's compare the means of their residuals as follows:
DATA npreg1;
input dep indep @@; /* dep = acid level (Y), indep = exercise time (X) */
pred1 = 76.2105 + 0.4389*indep;
resid1 = dep - pred1;
np_pred2 = 51.5267 + 0.4878*indep;
resid2 = dep - np_pred2;
CARDS;
230 421 175 278 315 618 290 482
275 465 150 105 360 550 425 750
run;
proc means;
var resid1 resid2;
run;

Variable   N   Mean         Std Dev      Minimum       Maximum
resid1     8   -0.0010125   28.1205022   -32.4507000   42.3945000
resid2     8    2.2560250   29.7632035   -37.9871000   47.2543000
In this case, it seems that the parametric regression equation predicts the values of Y more closely than the nonparametric one does. Still, the choice between the methods depends on the distributional assumptions the data can support.
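The residual comparison can be reproduced with a short Python sketch (not part of the original SAS-based notes; each pair is (acid level, exercise time), matching how the SAS step computes the residuals):

```python
pairs = [(230, 421), (175, 278), (315, 618), (290, 482),
         (275, 465), (150, 105), (360, 550), (425, 750)]   # (Y, X)

res1 = [y - (76.2105 + 0.4389 * x) for y, x in pairs]   # parametric fit
res2 = [y - (51.5267 + 0.4878 * x) for y, x in pairs]   # Theil fit

mean1 = sum(res1) / len(res1)   # -0.0010125
mean2 = sum(res2) / len(res2)   #  2.2560250
```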
Homework problems
1. A sample of 15 patients suffering from asthma participated in an experiment to study
the effect of a new treatment on pulmonary function. The dependent variable is FEV
(forced expiratory volume, liters, in 1 second) before and after application of the
treatment.
Subject   Before   After        Subject   Before   After
1         1.69     1.69         9         2.58     2.44
2         2.77     2.22         10        1.84     4.17
3         1.00     3.07         11        1.89     2.42
4         1.66     3.35         12        1.91     2.94
5         3.00     3.00         13        1.75     3.04
6          .85     2.74         14        2.46     4.62
7         1.42     3.61         15        2.35     4.42
8         2.82     5.14
On the basis of these data, can one conclude that the treatment is effective in increasing the FEV level? Let α = 0.05 and find the p-value.
a) Perform the sign test.
2. From the same context as Problem 1, subjects 1-8 came from Clinic A and subjects 9-15 from Clinic B. Can one conclude that the FEV levels from these two groups are different? Let α = 0.05 and find the p-value. Perform the Wilcoxon Mann-Whitney test.

Subject   Clinic A       Subject   Clinic B
1         1.69           9         2.44
2         2.22           10        4.17
3         3.07           11        2.42
4         3.35           12        2.94
5         3.00           13        3.04
6         2.74           14        4.62
7         3.61           15        4.42
8         5.14
3. From the same context as Problems 1 and 2, subjects 1-8 came from Clinic A, subjects 9-15 from Clinic B, and subjects 16-20 from Clinic C were added later. Can one conclude that the FEV levels from these three groups are different? Let α = 0.05 and find the p-value. Perform the Kruskal-Wallis test.

Subject   Clinic A       Subject   Clinic B       Subject   Clinic C
1         1.69           9         2.44           16        2.34
2         2.22           10        4.17           17        3.17
3         3.07           11        2.42           18        4.42
4         3.35           12        2.94           19        4.94
5         3.00           13        3.04           20        5.04
6         2.74           14        4.62
7         3.61           15        4.42
8         5.14
4. The following table shows the scores made by nine randomly selected student nurses on a final examination in three subject areas:

                        Student number
Subject Area      1   2   3   4   5   6   7   8   9
Fundamentals     98  95  76  95  83  99  82  75  88
Physiology       95  71  80  81  77  70  80  72  81
Anatomy          77  79  91  84  80  93  87  81  83

Test the null hypothesis that the student nurses from whom the above sample was drawn perform equally well in all three subject areas, against the alternative hypothesis that they perform better in at least one area. Let α = 0.05 and find the p-value. Perform the Friedman test.
5. From Problem 4, find the Spearman rank correlation between the Physiology scores
and the Anatomy scores.