
CHI-SQUARED TEST FOR NORMALITY (pp. 562-564)
Please refer to Example 11.1 in the Text and in your notes. We
computed the mean (460.38) and standard deviation (38.827) for
50 workers.
The 50 observations:

505  400  499  415  418
467  551  444  481  429
480  466  477  445  413
537  484  418  465  496
487  373  416  424  471
427  509  410  515  435
482  442  465  449  523
488  508  432  405  440
409  501  440  444  485
475  470  485  469  450

Find the probabilities associated with a set of class intervals: here 6 intervals, given by the 2^k ≥ n rule, where k is the number of classes (2^6 = 64 ≥ 50).

Max = 551; Min = 373; Range = 178; Class width = Range/k = 178/6 = 29.667

Frequencies per class:

Class                    Frequency
X ≤ 402.67                    2
402.67 < X ≤ 432.33          12
432.33 < X ≤ 462.00           9
462.00 < X ≤ 491.66          17
491.66 < X ≤ 521.33           7
X > 521.33                    3
                     Total:  50
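As a cross-check, the class width and the frequencies above can be reproduced in a few lines of code. This is only a minimal sketch (it is not part of the Text's procedure), assuming NumPy is available, and it uses the same (lower, upper] class convention as the table:

```python
import numpy as np

# The 50 observations from Example 11.1, entered row by row from the table above.
data = np.array([
    505, 400, 499, 415, 418, 467, 551, 444, 481, 429,
    480, 466, 477, 445, 413, 537, 484, 418, 465, 496,
    487, 373, 416, 424, 471, 427, 509, 410, 515, 435,
    482, 442, 465, 449, 523, 488, 508, 432, 405, 440,
    409, 501, 440, 444, 485, 475, 470, 485, 469, 450,
])

k = 6                                          # smallest k with 2**k >= n = 50
width = (data.max() - data.min()) / k          # 178 / 6 = 29.667
edges = data.min() + width * np.arange(1, k)   # 402.67, 432.33, ..., 521.33

# Count each class with the (lower, upper] convention; the first and last
# classes are open-ended.
lows, highs = np.r_[-np.inf, edges], np.r_[edges, np.inf]
freq = np.array([np.sum((data > lo) & (data <= hi)) for lo, hi in zip(lows, highs)])
print(freq)   # [ 2 12  9 17  7  3]
```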

[Histogram of the observed class frequencies]

How closely does this frequency distribution resemble the normal curve? What probabilities would we expect for each class if the data really were normally distributed?
P(X ≤ 402.67) = P[(X − μ)/σ ≤ (402.67 − 460.38)/38.827] = 0.0681
    z1 = −∞; z2 = −1.486; 0.5 − 0.4319 = 0.0681

P(402.67 < X ≤ 432.33) = P[(402.67 − 460.38)/38.827 < (X − μ)/σ ≤ (432.33 − 460.38)/38.827] = 0.1677
    z1 = −1.486; z2 = −0.722; 0.4319 − 0.2642 = 0.1677

P(432.33 < X ≤ 462.00) = P[(432.33 − 460.38)/38.827 < (X − μ)/σ ≤ (462.00 − 460.38)/38.827] = 0.2802
    z1 = −0.722; z2 = 0.042; 0.2642 + 0.0160 = 0.2802

P(462.00 < X ≤ 491.66) = P[(462.00 − 460.38)/38.827 < (X − μ)/σ ≤ (491.66 − 460.38)/38.827] = 0.2750
    z1 = 0.042; z2 = 0.806; 0.2910 − 0.0160 = 0.2750

P(491.66 < X ≤ 521.33) = P[(491.66 − 460.38)/38.827 < (X − μ)/σ ≤ (521.33 − 460.38)/38.827] = 0.1508
    z1 = 0.806; z2 = 1.570; 0.4418 − 0.2910 = 0.1508

P(X > 521.33) = P[(X − μ)/σ > (521.33 − 460.38)/38.827] = 0.0582
    z1 = 1.570; z2 = +∞; 0.5 − 0.4418 = 0.0582
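The six probabilities above can also be read directly off the normal cdf instead of the printed table. A minimal sketch, assuming SciPy is available; the small differences from the values above arise because the table lookups round each z-score to the printed table:

```python
import numpy as np
from scipy import stats

mean, sd = 460.38, 38.827
edges = np.array([402.67, 432.33, 462.00, 491.66, 521.33])

# Cumulative probabilities at the class boundaries; successive differences
# give the probability of each of the six classes (outer classes open-ended).
cdf = stats.norm.cdf(edges, loc=mean, scale=sd)
probs = np.diff(np.r_[0.0, cdf, 1.0])
print(probs.round(4))   # roughly [0.0686 0.1664 0.2817 0.2731 0.1520 0.0582]
print(probs.sum())      # 1.0
```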
Class                  Probability   Expected      Observed
                       per class     frequency     frequency
X ≤ 402.67               0.0681        3.405            2
402.67 < X ≤ 432.33      0.1677        8.385           12
432.33 < X ≤ 462.00      0.2802       14.01             9
462.00 < X ≤ 491.66      0.2750       13.75            17
491.66 < X ≤ 521.33      0.1508        7.54             7
X > 521.33               0.0582        2.91             3
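The expected frequencies in the table are simply n × (class probability). A small sketch of that step, using the numbers from the tables above:

```python
import numpy as np

n = 50
# Class probabilities and observed counts from the tables above.
probs = np.array([0.0681, 0.1677, 0.2802, 0.2750, 0.1508, 0.0582])
observed = np.array([2, 12, 9, 17, 7, 3])

expected = n * probs
print(expected)                          # 3.405, 8.385, 14.01, 13.75, 7.54, 2.91
print(expected.sum(), observed.sum())    # both sum to 50
```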

How different are the expected and the observed frequencies?

[Bar chart comparing expected and observed frequencies per class]

The Chi-Square Statistic:

χ² = Σ (fi − ei)² / ei, summed over the i = 1, …, 6 classes.

Class   (fi − ei)² / ei
1       0.579743025
2       1.55852415
3       1.791584582
4       0.768181818
5       0.03867374
6       0.002783505
        χ² = 4.739490821

The rejection region:

χ² > χ²(α, k−3) = χ²(0.05, 3) = 7.81473   (from Table 5, p. B-10)

Since χ² = 4.739 < χ²critical = 7.815, there is no evidence to conclude that these data are not normally distributed.
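For reference, the whole test can be reproduced with SciPy; this is a sketch rather than the Text's procedure, with the critical value taken from the chi-squared quantile function instead of Table 5:

```python
import numpy as np
from scipy import stats

observed = np.array([2, 12, 9, 17, 7, 3])
expected = np.array([3.405, 8.385, 14.01, 13.75, 7.54, 2.91])

chi2_stat = np.sum((observed - expected) ** 2 / expected)
df = len(observed) - 3                    # k - 1, minus 2 for the estimated mean and s.d.
critical = stats.chi2.ppf(1 - 0.05, df)
print(round(chi2_stat, 3), round(critical, 3))   # 4.739  7.815
print(chi2_stat > critical)                      # False -> do not reject H0
```

Equivalently, stats.chisquare(observed, f_exp=expected, ddof=2) returns the same statistic together with a p-value based on k − 3 degrees of freedom.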

HOWEVER, we did violate one of the classic requirements for the Chi-Square test, the Rule of Five. The test statistic has only an approximate chi-squared distribution: its actual distribution is discrete, but it can be approximated conveniently by the continuous chi-squared distribution when the sample size is large, just as we approximated the discrete binomial distribution by the normal distribution. This approximation may be poor, however, if the expected cell frequencies are small. As a consequence, we commonly require the expected frequency of each cell to be at least 5. Here two of the expected frequencies (3.405 and 2.91) fall below that threshold.
So, the four classes used in your Text (pp. 562-564) are safer! The authors use Mean − One Standard Deviation and Mean + One Standard Deviation as the boundaries of the two central categories, with two open-ended classes below (less than 421.55) and above (more than 499.21) those two classes. Study the procedure in the Text (a sketch of that four-class version follows below). There is no evidence here either to conclude that these data are not normally distributed.
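A sketch of that four-class variant, with the class counts derived from the same 50 observations; the mean ± one standard deviation boundaries follow the Text, but the code is only an illustration and the Text's own table may be laid out differently:

```python
import numpy as np
from scipy import stats

data = np.array([
    505, 400, 499, 415, 418, 467, 551, 444, 481, 429,
    480, 466, 477, 445, 413, 537, 484, 418, 465, 496,
    487, 373, 416, 424, 471, 427, 509, 410, 515, 435,
    482, 442, 465, 449, 523, 488, 508, 432, 405, 440,
    409, 501, 440, 444, 485, 475, 470, 485, 469, 450,
])
mean, sd = 460.38, 38.827

# Four classes: below mean - 1 s.d., two central classes, above mean + 1 s.d.
edges = np.array([mean - sd, mean, mean + sd])          # 421.55, 460.38, 499.21
lows, highs = np.r_[-np.inf, edges], np.r_[edges, np.inf]
observed = np.array([np.sum((data > lo) & (data <= hi)) for lo, hi in zip(lows, highs)])
probs = np.diff(np.r_[0.0, stats.norm.cdf(edges, mean, sd), 1.0])
expected = 50 * probs                                   # every class comfortably above 5

chi2_stat = np.sum((observed - expected) ** 2 / expected)
critical = stats.chi2.ppf(0.95, df=4 - 3)
print(chi2_stat, critical)   # roughly 1.7 versus 3.841 -> again, do not reject H0
```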
THE LILLIEFORS TEST FOR NORMALITY (16.6 in Text: pp. 613-616)
The Lilliefors test is a nonparametric test for normality that can be applied to any number of observations.
An alternative is the Kolmogorov-Smirnov test, which, however, assumes that the mean and the standard deviation of the population are known. That is frequently unrealistic.

The example in your book goes as follows. The ten observations are:

x: 110, 89, 102, 80, 93, 121, 108, 97, 105, 103

Sorted in ascending order:

x (sorted)   S(x)   z = (x − mean)/st.dev.   F(x)                     |F(x) − S(x)|
 80          0.1        −1.79                0.5 − 0.4633 = 0.0367       0.0633
 89          0.2        −1.02                0.5 − 0.3461 = 0.1539       0.0461
 93          0.3        −0.67                0.5 − 0.2486 = 0.2514       0.0486
 97          0.4        −0.33                0.5 − 0.1293 = 0.3707       0.0293
102          0.5         0.10                0.5 + 0.0398 = 0.5398       0.0398
103          0.6         0.19                0.5 + 0.0753 = 0.5753       0.0247
105          0.7         0.36                0.5 + 0.1406 = 0.6406       0.0594
108          0.8         0.62                0.5 + 0.2324 = 0.7324       0.0676
110          0.9         0.79                0.5 + 0.2852 = 0.7852       0.1148
121          1.0         1.74                0.5 + 0.4591 = 0.9591       0.0409

S(x) is the proportion of values of x that are less than or equal to 80, 89, 93, …: that is, 1/10 = 0.1, 2/10 = 0.2, 3/10 = 0.3, and so on.
H0: The data are normally distributed
HA: The data are not normally distributed

We estimate μ and σ from the sample statistics:

Mean = 100.8
Standard Deviation = 11.62181856

F(x) is the normal cumulative distribution function evaluated at the standardized value: F(x) = P[Z ≤ (x − μ)/σ].

If the null hypothesis is true, the sample and normal cumulative probabilities should be similar. If the null hypothesis is false, we expect S(x) and F(x) to differ for at least some of the values of x.
We define the test statistic D as the largest absolute difference
between S(x) and F(x):
D = max|F(x)-S(x)|

In our case this value is: 0.1148
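A compact sketch of the D calculation, assuming SciPy is available; it reproduces the table above up to rounding (the hand calculation rounds each z-score to two decimals):

```python
import numpy as np
from scipy import stats

x = np.sort(np.array([110, 89, 102, 80, 93, 121, 108, 97, 105, 103]))
n = len(x)

s = np.arange(1, n + 1) / n               # S(x): 0.1, 0.2, ..., 1.0
z = (x - x.mean()) / x.std(ddof=1)        # standardize with the sample mean and s.d.
f = stats.norm.cdf(z)                     # F(x)
d = np.max(np.abs(f - s))
print(round(d, 4))                        # roughly 0.114
```

(statsmodels also ships a ready-made statsmodels.stats.diagnostic.lilliefors function, but it uses the standard two-sided Kolmogorov-Smirnov distance, so its statistic can differ somewhat from this simplified hand calculation.)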

D is then compared with the critical values in the Lilliefors table: Appendix B, Table 10 (p. B-21). Look up the sample size (in our case 10) and an appropriate α, here 0.05; the critical value is 0.258.
Since Dmax = 0.1148 < 0.258, we do not reject H0 and conclude that there is not enough evidence to infer that the data are not normally distributed, so parametric tests may be used.
