
GROUP COMPARISON AND TEST STATISTICS

INTRODUCTION: According to the Coffee Research Organization, the typical American coffee drinker consumes an average of 3.1 cups per day. A sample of 12 senior citizens reported the amounts of coffee, in cups, consumed on a particular day. At the 0.05 significance level, does the sample data suggest a difference between the national average and the sample mean for senior citizens?

Sample data: 3.1, 3.3, 3.5, 2.6, 2.5, 4.3, 4.4, 3.8, 3.1, 4.1, 3.1, 3.2

To answer this question, an understanding of hypotheses and hypothesis testing is required.

WHAT IS A HYPOTHESIS? A hypothesis is a statement about a population that is subject to verification. Data are then used to check the reasonableness of the statement. In statistical analysis we make a claim, that is, state a hypothesis, collect data, and then use the data to test the assertion. In most cases the population is so large that it is not feasible to study all the items, objects, or persons in it. For example, it would not be possible to contact every systems analyst or engineer in Bangalore to find out his or her monthly income.

WHAT IS HYPOTHESIS TESTING? Hypothesis testing starts with a statement, or assumption, about a population parameter, such as the population mean. This statement is referred to as a hypothesis. A hypothesis might be that the mean monthly commission of salespersons in retail consumer stores, such as Kannan Store, is Rs. 3000. Locating and interviewing every retail salesperson would be very expensive. To test the validity of the assumption (mean = Rs. 3000), we select a sample from the population of all such salespersons, calculate sample statistics, and, based on certain decision rules, accept or reject the hypothesis. A sample mean of Rs. 2000 would certainly cause rejection of the hypothesis. However, suppose the sample mean is Rs. 2995. Is that close enough to Rs. 3000 for us to accept the assumption that the population mean is Rs. 3000? Can we attribute the difference of Rs. 5 between the two means to sampling error, or is that difference statistically significant? Hypothesis testing, then, is a procedure based on sample evidence and probability theory for determining whether a hypothesis is a reasonable statement.
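Once the one-sample t-test covered later in this document is understood, the opening coffee question can be answered in a few lines. A sketch in Python using scipy (not part of the original exercise):

```python
from scipy import stats

# Cups of coffee reported by the 12 senior citizens
cups = [3.1, 3.3, 3.5, 2.6, 2.5, 4.3, 4.4, 3.8, 3.1, 4.1, 3.1, 3.2]

# Two-sided one-sample t-test against the national average of 3.1 cups
t_stat, p_value = stats.ttest_1samp(cups, popmean=3.1)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# p is above 0.05, so at the 0.05 level the sample does not show a
# significant difference from the national average of 3.1 cups per day.
```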

A FLOW CHART FOR TESTING UNIVARIATE DATA

UNIVARIATE TECHNIQUES

Non-parametric statistics:
- One sample: Chi-square, Kolmogorov-Smirnov, Runs
- Two or more samples:
  - Independent samples: Chi-square, Kolmogorov-Smirnov, Runs, Wilcoxon-Mann-Whitney test, Kruskal-Wallis H test
  - Dependent samples: Wilcoxon signed-rank sum test, McNemar test

Parametric statistics:
- One sample: Z test, T test
- Two or more samples:
  - Independent samples: T test, Z test, ANOVA
  - Dependent samples: Paired-sample T test

PARAMETRIC STATISTICS

ONE SAMPLE - Z TEST

This can also be referred to as testing for a population mean when the population standard deviation is known.

Stage 1: Objective
To find out whether the sample mean is different from the proposed value for the population mean. We have historical information about the population and, in fact, we know its standard deviation.

Example case: The scores on an aptitude test required for entry into a certain job position have a mean of 500 and a standard deviation of 120. If a random sample of 36 applicants has a mean of 546, is there evidence that their mean score is different from the mean that is expected from all applicants?

Stage 2: Designing
Null and alternative hypotheses:
H0: μ = 500
Ha: μ ≠ 500
Convert 546 to a z-score to compare it to the assumed population mean.
z = (x̄ − μ) / (σ / √n) = (546 − 500) / (120 / √36) = 46 / 20 = 2.3

Stage 3: Checking for Assumptions
- The underlying distribution is normal, or the Central Limit Theorem can be assumed to hold
- The sample has been randomly selected
- The population standard deviation is known, or the sample size is at least 25

Stage 4: Output
This means that 546 is 2.3 standard deviations from the hypothesized mean. Using the z-table, we find that the probability that a value is to the right of 2.3 or to the left of -2.3 is 2*(.0107) = 0.0214. This value is called the p-value: p = 0.0214. This probability is considered very small (values less than 0.05 are typically considered small). Thus, if the mean is really 500, it is unlikely that we would get a sample mean that is 2.3 standard deviations from it. We therefore reject the null hypothesis and accept the alternative, concluding that the mean is not 500. The probability that we are rejecting a true null hypothesis is 0.0214 (the value of p). Let's construct a 95% confidence interval estimate of the population mean.
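The z-statistic, p-value, and the 95% confidence interval that follows can all be reproduced in a few lines. A sketch in Python using scipy, with the values from the example above:

```python
import math
from scipy.stats import norm

mu0, sigma = 500, 120   # hypothesized mean and known population SD
xbar, n = 546, 36       # sample mean and sample size

se = sigma / math.sqrt(n)      # standard error = 120/6 = 20
z = (xbar - mu0) / se          # z = 46/20 = 2.3
p = 2 * norm.sf(abs(z))        # two-tailed p-value

# 95% confidence interval for the population mean
lo = xbar - 1.96 * se
hi = xbar + 1.96 * se

print(f"z = {z:.1f}, p = {p:.4f}, CI = ({lo:.1f}, {hi:.1f})")
# z = 2.3, p = 0.0214, CI = (506.8, 585.2)
```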

546 ± 1.96 × (120 / √36) = 546 ± 39.2

The lower limit of the interval is 546 - 39.2 = 506.8. The upper limit of the interval is 546 + 39.2 = 585.2. Thus, we conclude that the actual mean score for the population from which this sample was drawn falls between 507 and 585.

ONE SAMPLE - T TEST

Stage 1: Objective
To test whether the average writing score (write) differs significantly from 50.

Example case: We use a dataset called HSB (High School and Beyond). This data file contains 200 observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race). It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst).

Stage 2: Designing
The single-sample t-test tests the null hypothesis that the population mean is equal to the number specified by the user. Thus our null and alternative hypotheses are:
H0: μ = 50
H1: μ ≠ 50
SPSS calculates the t-statistic and its p-value under the assumption that the sample comes from an approximately normal distribution. If the p-value associated with the t-test is small (0.05 is often used as the threshold), there is evidence that the mean is different from the hypothesized value. If the p-value associated with the t-test is not small (p > 0.05), then the null hypothesis is not rejected, and you can conclude that the mean is not different from the hypothesized value.

Stage 3: Checking for Assumptions

- The underlying distribution is normal
- The observations have been randomly and independently selected from the population
- The variability of the measurements in the population is the same and can be measured by a common variance

Stage 4: Output
Go to Analyze - Compare Means - One-Sample T Test - select write as the test variable and 50 as the test value - OK.
One-Sample Statistics
                N     Mean      Std. Deviation   Std. Error Mean
writing score   200   52.7750   9.47859          .67024

One-Sample Test (Test Value = 50)
                t       df    Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
writing score   4.140   199   .000              2.77500           1.4533         4.0967
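The raw HSB file is not reproduced here, but the t-statistic and confidence interval in this table can be recovered from the summary statistics alone. A sketch in Python (scipy assumed):

```python
import math
from scipy.stats import t as t_dist

mean, sd, n = 52.7750, 9.47859, 200   # from the One-Sample Statistics table
mu0 = 50                              # hypothesized test value

se = sd / math.sqrt(n)                     # standard error of the mean
t_stat = (mean - mu0) / se                 # t = 2.775 / 0.67024
p = 2 * t_dist.sf(abs(t_stat), df=n - 1)   # two-tailed p-value

# 95% confidence interval of the mean difference
t_crit = t_dist.ppf(0.975, df=n - 1)
lower = (mean - mu0) - t_crit * se
upper = (mean - mu0) + t_crit * se

print(f"t = {t_stat:.3f}, df = {n - 1}")
print(f"95% CI of the difference: ({lower:.4f}, {upper:.4f})")
# t = 4.140, df = 199; CI (1.4533, 4.0967), matching the table
```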

In this example, the t-statistic is 4.140 with 199 degrees of freedom. The corresponding two-tailed p-value is .000, which is less than 0.05. We conclude that the mean of the variable write is different from 50.

Stage 5: Reporting the results
The mean of the variable write for this particular sample of students is 52.775, which is statistically significantly different from the test value of 50. We would conclude that this group of students has a significantly higher mean on the writing test than 50.

TWO OR MORE SAMPLES

INDEPENDENT SAMPLES - T TEST

Stage 1: Objective
To test whether the mean for write is the same for males and females.

Example case: We use the HSB (High School and Beyond) dataset described earlier: 200 observations on high school students with demographic information (female, ses, race) and standardized test scores (read, write, math, socst).

Stage 2: Designing
This t-test is designed to compare the means of the same variable between two groups. In our example, we compare the mean writing score between the group of female students and the group of male students. Ideally, these subjects are randomly selected from a larger population of subjects. The test assumes that the variances of the two populations are the same. The interpretation of the p-value is the same as in other types of t-test. Thus our null and alternative hypotheses are:
H0: μmale = μfemale
H1: μmale ≠ μfemale

Stage 3: Checking for Assumptions
- The underlying distribution is normal
- The samples have been randomly and independently selected from the two populations
- The variability of the measurements in the two populations is the same and can be measured by a common variance

Stage 4: Output
Go to Analyze - Compare Means - Independent-Samples T Test - select write as the test variable and female as the grouping variable - OK.

Group Statistics (writing score)
         N     Mean      Std. Deviation   Std. Error Mean
male     91    50.1209   10.30516         1.08027
female   109   54.9908   8.13372          .77907

Independent Samples Test (writing score)

Levene's Test for Equality of Variances: F = 11.133, Sig. = .001

t-test for Equality of Means:
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -3.734   198       .000              -4.86995          1.30419                 -7.44183       -2.29806
Equal variances not assumed   -3.656   169.707   .000              -4.86995          1.33189                 -7.49916       -2.24073
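Both t-test rows can be reproduced from the Group Statistics table alone. A sketch using scipy's ttest_ind_from_stats, with the group summaries given above:

```python
from scipy.stats import ttest_ind_from_stats

# Group summaries from the Group Statistics table
male = dict(mean1=50.1209, std1=10.30516, nobs1=91)
female = dict(mean2=54.9908, std2=8.13372, nobs2=109)

# Equal variances assumed (pooled-variance t-test)
t_pooled, p_pooled = ttest_ind_from_stats(**male, **female, equal_var=True)

# Equal variances not assumed (Welch / Satterthwaite)
t_welch, p_welch = ttest_ind_from_stats(**male, **female, equal_var=False)

print(f"pooled: t = {t_pooled:.3f}")   # -3.734 (df = 198 in the table)
print(f"welch:  t = {t_welch:.3f}")    # -3.656 (df = 169.707 in the table)
```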

In this example, the t-statistic is -3.734 with 198 degrees of freedom. The corresponding two-tailed p-value is 0.0002, which is less than 0.05. We conclude that the difference in mean write between males and females is different from 0.

Stage 5: Reporting the results

f. This column lists the dependent variable(s). In our example, the dependent variable is write (labeled "writing score").
g. This column specifies the method for computing the standard error of the difference of the means. The method of computing this value is based on the assumption made regarding the variances of the two groups. If we assume that the two populations have the same variance, the first method, called the pooled variance estimator, is used.
h. F - The test statistic of the two-sample F test is a ratio of sample variances, F = s1^2/s2^2, where it is completely arbitrary which sample is labeled sample 1 and which is labeled sample 2.
i. Sig. - This is the two-tailed p-value associated with the null hypothesis that the two groups have the same variance. In our example, the probability is less than 0.05, so there is evidence that the variances for the two groups, female students and male students, are different. Therefore, we may want to use the second method (the Satterthwaite variance estimator) for our t-test.
j. t - These are the t-statistics under the two different assumptions: equal variances and unequal variances. They are the ratios of the mean difference to the standard error of the difference under the two assumptions: (-4.86995 / 1.30419) = -3.734 and (-4.86995 / 1.33189) = -3.656.

k. df - The degrees of freedom when we assume equal variances is simply the sum of the two sample sizes (109 and 91) minus 2. The degrees of freedom when we assume unequal variances is calculated using the Satterthwaite formula.
l. Sig. (2-tailed) - The p-value is the two-tailed probability computed using the t distribution. It is the probability of observing a t-value of equal or greater absolute value under the null hypothesis. For a one-tailed test, halve this probability. If the p-value is less than our pre-specified alpha level, usually 0.05, we conclude that the difference is significantly different from zero. For example, the p-value for the difference between females and males is less than 0.05 in both cases, so we conclude that the difference in means is statistically significantly different from 0.
m. Mean Difference - This is the difference between the means.
n. Std. Error Difference - The standard error difference is the estimated standard deviation of the difference between the sample means. If we drew repeated samples of size 200, we would expect the standard deviation of the sample means to be close to the standard error. This provides a measure of the variability of the sample mean. The Central Limit Theorem tells us that the sample means are approximately normally distributed when the sample size is 30 or greater. Note that the standard error difference is calculated differently under the two different assumptions.
o. 95% Confidence Interval of the Difference - These are the lower and upper bounds of the confidence interval for the mean difference. A confidence interval for the mean specifies a range of values within which the unknown population parameter, in this case the mean, may lie. It is given by

where s is the sample standard deviation of the observations and N is the number of valid observations. The t-value in the formula can be computed or found in any statistics book, with N - 1 degrees of freedom and a cumulative probability of 1 - alpha/2, where 1 - alpha is the confidence level (by default .95).

INDEPENDENT SAMPLES - Z TEST

In the previous section we selected a single random sample from a population and conducted a test of whether the proposed population value was reasonable. Now we expand the idea of hypothesis testing to two samples. That is, we select random samples from two different populations to determine whether the population means or proportions are equal. We may want to ask, for example: Is there an increase in the production rate if music is piped into the production area?

Stage 1: Objective
To test the likelihood that the population means of concentrations of the element are the same for men and women.

Example case: The amount of a certain trace element in blood is known to vary with a standard deviation of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors. Random samples of 75 male and 50 female donors yield concentration means of 28 and 33 ppm, respectively. What is the likelihood that the population means of concentrations of the element are the same for men and women?

Stage 2: Designing
Formula:

where

and

are the means of the two samples, is the hypothesized difference

between the population means (0 if testing for equal means), 1 and 2 are the standard deviations of the two populations, and n1and n2are the sizes of the two samples. Null hypothesis: H0: 1 = 2 or H0: 1 2= 0 alternative hypothesis: Ha : 1 2

10

or: Ha : 1 2 0 Stage3: Checking for Assumptions Two normally distributed but independent populations, is known Stage4: Output
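A sketch of the computation in Python (scipy assumed; values from the example):

```python
import math
from scipy.stats import norm

# Male donors: n = 75, mean = 28 ppm, sigma = 14.1 ppm
# Female donors: n = 50, mean = 33 ppm, sigma = 9.5 ppm
n1, xbar1, sigma1 = 75, 28, 14.1
n2, xbar2, sigma2 = 50, 33, 9.5

se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = (xbar1 - xbar2) / se          # hypothesized difference is 0
p = 2 * norm.sf(abs(z))           # two-tailed p-value

print(f"z = {z:.2f}")
# z = -2.37 (two-tailed p ≈ 0.018)
```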

The computed z-value is negative because the (larger) mean for females was subtracted from the (smaller) mean for males. Because the hypothesized difference between the populations is 0, however, the order of the samples in this computation is arbitrary: the male sample mean could just as well have been subtracted from the female sample mean, in which case z would be +2.37 instead of -2.37. An extreme z-score in either tail of the distribution (plus or minus) will lead to rejection of the null hypothesis of no difference. The area of the standard normal curve beyond a z-score of 2.37 is 0.0089. Because this test is two-tailed, that figure is doubled to yield a probability of 0.0178 of observing such a difference if the population means are the same. If the test had been conducted at a pre-specified significance level of 0.05, the null hypothesis of equal means could be rejected. If the specified significance level had been the more conservative (more stringent) 0.01, however, the null hypothesis could not be rejected. In practice, the two-sample z-test is not used often, because the two population standard deviations σ1 and σ2 are usually unknown. Instead, sample standard deviations and the t-distribution are used.

INDEPENDENT SAMPLES - ANOVA

Stage 1: Objective


To test for differences in the means of the dependent variable broken down by the levels of the independent variable. For example, using the HSB data we wish to test whether the mean of write differs between the three program types (prog).

Example case: We use the HSB (High School and Beyond) dataset described earlier: 200 observations on high school students with demographic information (female, ses, race) and standardized test scores (read, write, math, socst).

Stage 2: Designing
We need to examine whether there are any significant differences in write (writing score) across prog (program type: general, academic and vocational). Ideally, we assume the populations follow the normal distribution, have equal standard deviations (variances), and are independent. Thus our null and alternative hypotheses are:
H0: μgeneral = μacademic = μvocation
H1: The mean scores are not all equal
Dependent variable: write
Independent variable: prog

Stage 3: Checking for Assumptions

- The independent variable consists of two or more categorical, independent groups
- The dependent variable is measured at the interval or ratio (continuous) level
- The dependent variable is approximately normally distributed for each category of the independent variable
- Equality of variances between the independent groups (homogeneity of variances)
- Independence of cases

Stage 4: Output
Click Analyze - Compare Means - One-Way ANOVA - transfer the dependent variable (write) into the Dependent List box and the independent variable (prog) into the Factor box - click the Post Hoc button and tick "Tukey" - click the Options button and tick the "Descriptive", "Homogeneity of variance test" and "Welch" checkboxes in the Statistics area - OK.


Descriptives (writing score)
                                                         95% CI for Mean
           N     Mean      Std. Dev.   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
general    45    51.3333   9.39778     1.40094      48.5099       54.1567       31.00     67.00
academic   105   56.2571   7.94334     .77519       54.7199       57.7944       33.00     67.00
vocation   50    46.7600   9.31875     1.31787      44.1116       49.4084       31.00     67.00
Total      200   52.7750   9.47859     .67024       51.4533       54.0967       31.00     67.00

The descriptives table (see above) provides some very useful descriptive statistics including the mean, standard deviation and 95% confidence intervals for the dependent variable (write) for each separate group (general, academic and vocation) as well as when all groups are combined (Total). These figures are useful when you need to describe your data. From the above output it is obvious that the students in the academic program have the highest mean writing score, while students in the vocational program have the lowest. Homogeneity of Variances Table
Test of Homogeneity of Variances (writing score)
Levene Statistic   df1   df2   Sig.
1.726              2     197   .181

One of the assumptions of the one-way ANOVA is that the variances of the groups you are comparing are similar. The table Test of Homogeneity of Variances (see above) shows the result of Levene's Test of Homogeneity of Variance, which tests for similar variances. If the significance value is greater than 0.05 (found in the Sig. column) then you have homogeneity of variances. We can see from this example that Levene's F Statistic has a significance value of 0.181 and, therefore, the assumption of homogeneity of variance is met. What if the Levene's F statistic was significant? This would mean that you do not have similar variances and you will need to refer to the Robust Tests of Equality of Means Table instead of the ANOVA Table.


ANOVA Table
ANOVA (writing score)
                 Sum of Squares   df    Mean Square   F        Sig.
Between Groups   3175.698         2     1587.849      21.275   .000
Within Groups    14703.177        197   74.635
Total            17878.875        199
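The F-ratio and its p-value follow directly from the sums of squares in this table. A sketch in Python (scipy assumed):

```python
from scipy.stats import f as f_dist

# Values from the ANOVA table
ss_between, df_between = 3175.698, 2
ss_within, df_within = 14703.177, 197

ms_between = ss_between / df_between   # 1587.849
ms_within = ss_within / df_within      # 74.635
F = ms_between / ms_within             # 21.275
p = f_dist.sf(F, df_between, df_within)

print(f"F = {F:.3f}, p = {p:.3g}")
# F = 21.275; p is far below 0.05 (SPSS displays it as .000)
```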

This is the table that shows the output of the ANOVA analysis and whether we have a statistically significant difference between our group means. We can see that in this example the significance level is 0.000 (p = .000), which is below 0.05; therefore, there is a statistically significant difference in mean writing score between the general, academic and vocational programs. This is great to know, but we do not yet know which of the specific groups differed. Luckily, we can find this out in the Multiple Comparisons table, which contains the results of post-hoc tests.

Robust Tests of Equality of Means Table
Robust Tests of Equality of Means (writing score)
        Statistic(a)   df1   df2      Sig.
Welch   20.421         2     90.895   .000

a. Asymptotically F distributed.

We discussed earlier that even if there was a violation of the assumption of homogeneity of variances we could still determine whether there were significant differences between the groups by not using the traditional ANOVA but using the Welch test. Like the ANOVA test, if the significance value is less than 0.05 then there are statistically significant differences between groups. As we did have similar variances we do not need to consult this table for our example.

Post Hoc Tests


Multiple Comparisons (writing score, Tukey HSD)
(I) program   (J) program   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
general       academic      -4.92381*               1.53928      .005   -8.5589        -1.2887
general       vocation      4.57333*                1.77518      .029   .3811          8.7655
academic      general       4.92381*                1.53928      .005   1.2887         8.5589
academic      vocation      9.49714*                1.48443      .000   5.9916         13.0027
vocation      general       -4.57333*               1.77518      .029   -8.7655        -.3811
vocation      academic      -9.49714*               1.48443      .000   -13.0027       -5.9916

*. The mean difference is significant at the 0.05 level.

From the results so far we know that there are significant differences between the groups as a whole. The table above, Multiple Comparisons, shows which groups differed from each other. The Tukey post-hoc test is generally the preferred test for conducting post-hoc tests on a one-way ANOVA, but there are many others. We can see from the table above that there is a significant difference in writing score between each pair of the general, academic and vocational groups, as the p-value for every comparison is less than 0.05.

Homogeneous Subsets

writing score (Tukey HSD(a,b))
                        Subset for alpha = 0.05
type of program   N     1         2         3
vocation          50    46.7600
general           45              51.3333
academic          105                       56.2571
Sig.                    1.000     1.000     1.000

Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 57.975.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.


The next part of the SPSS output (shown above) summarizes the results of the multiple comparisons procedure. Often there are several subset columns in this section of the output. The means listed within the same subset column are not statistically reliably different from each other. In this example, each of the three means is listed in its own subset column, so all the means are reliably different from one another. This is consistent with the fact that we rejected the null hypothesis of the ANOVA.

DEPENDENT SAMPLES

PAIRED SAMPLE T TEST

The dependent t-test can also look for "changes" between means when the subjects are measured on the same dependent variable at two time points. A common use of this is in a pre-post study design. In this type of experiment we measure subjects at the beginning and at the end of some intervention, e.g. an exercise-training programme or a business-skills course. For example, you might want to investigate whether a course of diet counseling can help people lose weight. To study this you could simply measure subjects' weight before and after the diet counseling course and test for any changes in weight using a dependent t-test. However, to improve the study design you might also want to include a control trial, during which the subjects could either receive "normal" counseling, do nothing at all, or something else you deem appropriate.

Stage 1: Objectives
The dependent t-test (also called the paired t-test or paired-samples t-test) compares the means of two related groups to detect whether there are any statistically significant differences between these means. Alternatively:

A paired (samples) t-test is used when you have two related observations (i.e., two observations per subject) and you want to see if the means of these two normally distributed interval variables differ from one another. For example, using the HSB data file we will test whether the mean of read is equal to the mean of write.

Example case: We use the HSB (High School and Beyond) dataset described earlier: 200 observations on high school students with demographic information (female, ses, race) and standardized test scores (read, write, math, socst).

Stage 2: Designing
You need one dependent variable that is measured on an interval or ratio scale. You also need one categorical variable that has only two related groups. A dependent t-test is an example of a "within-subjects" or "repeated-measures" statistical test, which indicates that the same subjects are tested more than once. Thus, in the dependent t-test, "related groups" indicates that the same subjects are present in both groups. It is possible to have the same subjects in each group because each subject has been measured on two occasions on the same dependent variable. For example, you might have measured 10 individuals' (subjects') performance in a spelling test (the dependent variable) before and after they underwent a new form of computerised teaching method to improve spelling. You would like to know if the computer training improved their spelling performance. Here, we can use a dependent t-test, as we have two related groups: the first consists of the subjects prior to the computerised spelling training, and the second consists of the same subjects at the end of the training.

Research Hypothesis
Null hypothesis (H0): no significant difference in the means of the two groups; H0: μ1 = μ2
Alternative hypothesis (H1): significant difference in the means of the two groups; H1: μ1 ≠ μ2
If the Sig./p-value is less than .05, reject H0 in favour of H1.

Stage 3: Checking for Assumptions


The distribution of the differences between the scores of the two related groups needs to be normally distributed. We check this by simply subtracting each individual's score in one group from their score in the other related group and then testing for normality in the normal way. It is important to note that the two related groups do not need to be normally distributed themselves - just the differences between the groups.

Stage 4: Output
Click Analyze - Compare Means - Paired-Samples T Test - drag write and read to the Paired Variables table as Variable 1 and Variable 2 - OK.

Paired Samples Statistics
                Mean      N     Std. Deviation   Std. Error Mean
writing score   52.7750   200   9.47859          .67024
reading score   52.2300   200   10.25294         .72499

Note that there is not much difference between the two means.


Paired Samples Correlations
                                N     Correlation   Sig.
writing score & reading score   200   .597          .000


Paired Samples Test (writing score - reading score)
Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t      df    Sig. (2-tailed)
.54500   8.88667          .62838            -.69414        1.78414        .867   199   .387
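The t-statistic in this table can be recovered from the paired-difference summary alone. A sketch in Python (scipy assumed):

```python
import math
from scipy.stats import t as t_dist

# Paired-difference summary (write - read) from the table above
mean_diff, sd_diff, n = 0.54500, 8.88667, 200

se = sd_diff / math.sqrt(n)                 # 0.62838
t_stat = mean_diff / se                     # 0.867
p = 2 * t_dist.sf(abs(t_stat), df=n - 1)    # 0.387

print(f"t = {t_stat:.3f}, p = {p:.3f}")
# t = 0.867, p = 0.387
```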

These results indicate that the mean of read is not statistically significantly different from the mean of write (t = 0.867, p = 0.387).

Stage 5: Reporting the results
We might report the statistics in the following format: t(degrees of freedom [df]) = t-value, p = significance level. In our case this would be: t(199) = 0.867, p = .387, which is not less than 0.05. We can conclude that there is no statistically significant difference between the reading and writing scores of the students. N.B. SPSS will output many results to many decimal places, but you should understand your measuring scale to know whether it is appropriate to report your results with such accuracy.

Example 2
A fairness cream manufacturing company has measured the attitudes of a sample of 18 respondents before and after the brand was advertised. The table below contains the resultant computer output for a paired sample t-test. The before- and after-ad-campaign variables are on a 10-point scale, with a rating of 1 representing "brand is highly disliked" and a rating of 10 representing "brand is highly liked", with other ratings having appropriate meanings. Assume that we had set the significance level at 0.05, and that the null hypothesis is that there is no difference in the ratings given by respondents before and after they saw the ad campaign.
Respondent   Before Ad Campaign   After Ad Campaign
1            3                    5
2            4                    6
3            2                    6
4            5                    7
5            3                    8
6            4                    4
7            5                    6
8            3                    7
9            4                    5
10           2                    4
11           2                    6
12           4                    7
13           1                    4
14           3                    6
15           6                    8
16           3                    4
17           2                    5
18           3                    6

SPSS commands: As in the previous example, click Compare Means - Paired-Samples T Test - select the two variables from the variable list on the left - transfer them to the Paired Variables box - OK.


Paired Samples Statistics
                     Mean   N    Std. Deviation   Std. Error Mean
Before Ad Campaign   3.28   18   1.274            .300
After Ad Campaign    5.78   18   1.309            .308


Paired Samples Test (Before Ad Campaign - After Ad Campaign)
Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
-2.500   1.295            .305              -3.144         -1.856         -8.192   17   .000
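Because the raw ratings are given in the table above, this test can be run directly. A sketch using scipy's ttest_rel:

```python
from scipy.stats import ttest_rel

before = [3, 4, 2, 5, 3, 4, 5, 3, 4, 2, 2, 4, 1, 3, 6, 3, 2, 3]
after = [5, 6, 6, 7, 8, 4, 6, 7, 5, 4, 6, 7, 4, 6, 8, 4, 5, 6]

# Paired t-test on the 18 before/after ratings
t_stat, p_value = ttest_rel(before, after)

print(f"t = {t_stat:.3f}, df = {len(before) - 1}")
# t = -8.192, df = 17; p is well below 0.05
```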

Reporting the SPSS output: The output table above shows that the 2-tailed significance of the test is .000. This is the p-value, and it is less than the 0.05 level we had set. Therefore, as per the decision rule specified in the example, we reject the null hypothesis at a significance level of 0.05 and conclude that there is a significant difference in the ratings given by respondents before and after their exposure to the ad campaign. The mean rating after the ad campaign is 5.78 and before the campaign it is 3.28, and the difference of 2.5 is statistically significant. If we have a large sample (greater than 30), we can use a Z test with the same hypotheses.

NON-PARAMETRIC STATISTICS

ONE SAMPLE - CHI-SQUARE

A chi-square goodness-of-fit test allows us to test whether the observed proportions for a categorical variable differ from hypothesized proportions. For example, let's suppose that we believe that the general population consists of 10% Hispanic, 10% Asian, 10% African American and 70% White folks. We want to test whether the observed proportions from our sample differ significantly from these hypothesized proportions.

Stage 1: Objectives
To test whether the observed proportions for a categorical variable differ from hypothesized proportions.

Example case: We use the HSB (High School and Beyond) dataset described earlier: 200 observations on high school students with demographic information (female, ses, race) and standardized test scores (read, write, math, socst).

Stage 2: Designing
Hypotheses:
Null hypothesis (H0): no significant difference between the hypothesized and observed proportions
Alternative hypothesis (H1): significant difference between the hypothesized and observed proportions
If the Sig./p-value is less than .05, reject H0 in favour of H1.

Stage 3: Checking for Assumptions

- One categorical variable, with two or more categories
- A hypothesized proportion for each category (equal or unequal)
- No more than 20% of expected frequencies have counts less than 5

Stage 4: Output
Click Analyze - Nonparametric Tests - Chi-square Test - drag race to the Test Variable List - under Expected Values enter the expected proportions (10, 10, 10, 70) - OK.


race
               Observed N   Expected N   Residual
hispanic       24           20.0         4.0
asian          11           20.0         -9.0
african-amer   20           20.0         .0
white          145          140.0        5.0
Total          200

Test Statistics

              race
Chi-Square    5.029a
df            3
Asymp. Sig.   .170
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 20.0.
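As a cross-check, the goodness-of-fit statistic can be reproduced by hand from the counts in the output table above. A minimal Python sketch using only the standard library (the helper function name is ours):

```python
import math

# Observed counts for race from the SPSS output table
observed = [24, 11, 20, 145]
# Expected counts implied by the hypothesized proportions 10/10/10/70% of N = 200
expected = [20.0, 20.0, 20.0, 140.0]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_df3(x):
    """Survival function P(X > x) of the chi-square distribution with 3 df
    (closed form for odd df): erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    phi = 0.5 * (1.0 + math.erf(math.sqrt(x) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

p = chi2_sf_df3(stat)
print(round(stat, 3), round(p, 3))  # 5.029, 0.17 - matching the SPSS table
```

The hand computation recovers exactly the chi-square of 5.029 and p = .170 reported by SPSS.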


Stage 5: Reporting the results
These results show that the racial composition of our sample does not differ significantly from the hypothesized values that we supplied (chi-square with three degrees of freedom = 5.029, p = .170).

ONE SAMPLE- KOLMOGOROV-SMIRNOV

Stage 1: Objectives
To test whether the variable write is normally distributed.

Example case: We use the dataset HSB (High School and Beyond). This data file contains 200 observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race). It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst).

Stage 2: Designing
Since we are checking a sample mean against a hypothesized value, we should be using a t test. But before using the t test shown in the previous part of the document, we need to check for normality both visually and numerically. We shall first check normality with a histogram and then proceed with the Kolmogorov-Smirnov (K-S) test.

Hypothesis
Null Hypothesis (H0): The distribution of write is normal
Alternative Hypothesis (H1): The distribution of write is not normal.
If the Sig. / p-value is less than .05, reject H0 and accept H1.

Stage 3: Checking for Assumptions

Nil

Stage 4: Output
Click Graphs - Legacy Dialogs - Histogram - choose write as the variable - tick the box next to Display normal curve - OK


A truly normal curve is shaped like a bell that peaks in the middle and is perfectly symmetrical. This histogram does not appear to have a perfectly bell-shaped curve. We cannot expect a perfect bell shape, as we are looking only at a sample and not the population. Since we are not very sure, we conduct the Kolmogorov-Smirnov test to evaluate the normality assumption.

Click Analyze - Nonparametric Tests - 1-Sample K-S - drag write to the Test Variable List - click Options - check Descriptive - tick Normal in the Test Distribution box - OK


Descriptive Statistics

               N    Mean     Std. Deviation   Minimum   Maximum
writing score  200  52.7750  9.47859          31.00     67.00


One-Sample Kolmogorov-Smirnov Test

                                         writing score
N                                        200
Normal Parameters(a,b)      Mean         52.7750
                            Std. Dev.    9.47859
Most Extreme Differences    Absolute     .134
                            Positive     .068
                            Negative     -.134
Kolmogorov-Smirnov Z                     1.900
Asymp. Sig. (2-tailed)                   .001
a. Test distribution is Normal.
b. Calculated from data.
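The mechanics of the statistic itself are simple. The sketch below (an illustrative Python function of our own, run on synthetic data rather than the HSB scores) computes D as the largest gap between the empirical CDF and a normal CDF whose mean and SD are estimated from the sample; the SPSS Kolmogorov-Smirnov Z is then sqrt(n) * D (here sqrt(200) x .134 = 1.90, matching the 1.900 in the table). Note that when the parameters are estimated from the data, the asymptotic p-value is only approximate; the Lilliefors correction addresses this.

```python
import math
import random

def ks_stat_normal(xs):
    """KS distance between the empirical CDF and a normal CDF fitted to the data."""
    n = len(xs)
    xs = sorted(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    norm_cdf = lambda x: 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
    d = 0.0
    for i, x in enumerate(xs):
        f = norm_cdf(x)
        # compare the fitted CDF against the ECDF just before and just after x
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

random.seed(1)
sample = [random.gauss(50, 10) for _ in range(200)]  # synthetic, not the HSB data
d = ks_stat_normal(sample)
z = math.sqrt(len(sample)) * d  # SPSS-style Kolmogorov-Smirnov Z
```

For a genuinely normal sample of this size, D stays small; for the write scores SPSS found D = .134, which is large enough to reject normality.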

Stage 5: Reporting the results
Recall our null hypothesis that the distribution of write is normal. Since the p-value (0.001) is less than 0.05, we reject H0 and conclude that write is not normally distributed. Thus, we need to transform the write variable to run the parametric test meaningfully.

ONE SAMPLE- RUNS

Stage 1: Objective:
The Runs test procedure tests whether the order of occurrence of two values of a variable is random. A run is a sequence of like observations. It is a statistical test used to assess the randomness of data. The run test of randomness is sometimes called the Geary test, and it is a non-parametric test. It is also an alternative test for autocorrelation in the data: autocorrelation means that the data are correlated with their lagged values, and the run test of randomness can be applied to check whether this is the case. A sample with too many or too few runs suggests that the sample is not random. For example, the test can be used in the stock market: if we want to test whether the prices of a particular company behave randomly or follow a pattern, we can use the run test of randomness. Likewise, if we want to test whether the observations in a sample are independent of each other or follow a pattern, we can use the run test of randomness. In


SPSS, the run test of randomness can test many variables at a time, but the values must be numeric (or be converted into numeric form).

Example case: We use the dataset HSB (High School and Beyond). This data file contains 200 observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race). It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst).

Stage 2: Designing
Hypothesis
Null Hypothesis (H0): The distribution of write is random
Alternative Hypothesis (H1): The distribution of write is not random.
If the Sig. / p-value is less than .05, reject H0 and accept H1.

Stage 3: Checking for Assumptions
Data order: the run test of randomness assumes that the data are entered in order (not grouped).
Numeric data: the run test assumes that the data are in numeric form. This is a compulsory condition, because runs are assigned to numeric values.
Data level: if the data are not already dichotomized, the researcher has to assign a cut value - the mean, median, mode or a custom cut point - so that each observation falls on one side of it.
Distribution: the run test of randomness is a non-parametric test, so it does not assume any distribution, unlike parametric tests.

Stage 4: Output
Click Analyze - Nonparametric Tests - Runs - drag write to the Test Variable List - click Options - check Descriptive - tick Median in the Cut Point box - OK


Descriptive Statistics

               N    Mean     Std. Deviation   Minimum   Maximum
writing score  200  52.7750  9.47859          31.00     67.00


Runs Test

                         writing score
Test Value(a)            54.00
Cases < Test Value       90
Cases >= Test Value      110
Total Cases              200
Number of Runs           96
Z                        -.573
Asymp. Sig. (2-tailed)   .567
a. Median
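The Z value in the table can be reproduced from the counts alone, using the standard large-sample runs-test formulas. A small Python sketch (the counts are copied from the output above):

```python
import math

# Counts from the Runs Test table (median cut point = 54.00)
n1, n2, runs = 90, 110, 96   # cases below / at-or-above the median, observed runs
n = n1 + n2

# Large-sample mean and variance of the number of runs under randomness
expected_runs = 2.0 * n1 * n2 / n + 1.0
variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))

z = (runs - expected_runs) / math.sqrt(variance)
p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed normal p-value
print(round(z, 3), round(p, 3))  # -0.573 0.567 - matching the SPSS table
```

With 96 observed runs against an expected 100, the Z of -.573 and p of .567 match the SPSS output exactly.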

Stage 5: Reporting the results
For the test above, we used the median as the cut point. The significance value is two-tailed, meaning that it tests whether there are too few or too many runs compared to the number expected under random conditions. A significant result would mean that the series differs significantly from random; a negative Z value means there are fewer runs than would be expected. A nonsignificant result upholds the assumption of randomness. Here, the Sig. / p-value (0.567) is greater than 0.05, so we retain H0: the data on writing score appear to have been entered into the dataset in random order.

NON-PARAMETRIC STATISTICS

TWO OR MORE INDEPENDENT SAMPLES- CHI-SQUARE

Stage 1: Objective:
The chi-square test for independence, also called Pearson's chi-square test or the chi-square test of association, is used to discover whether there is a relationship between two categorical variables. Here we test whether there is a relationship between the type of school attended (schtyp) and students' gender (female).

Example case: We use the dataset HSB (High School and Beyond). This data file contains 200 observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic

background (race). It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst).

Stage 2: Designing
Hypothesis
Null Hypothesis (H0): There is no relation between school attended and gender
Alternative Hypothesis (H1): There is a significant relation between school attended and gender
If the Sig. / p-value is less than .05, reject H0 and accept H1.

Stage 3: Checking for Assumptions

Two variables that are ordinal or nominal (categorical data)
Two or more groups in each variable
An expected value of five or higher in each cell.

Stage 4: Output
Click Analyze - Descriptive Statistics - Crosstabs - transfer one of the variables into the "Row(s):" box and the other variable into the "Column(s):" box. In our example we transfer the schtyp variable into the "Row(s):" box and female into the "Column(s):" box. If you want to display clustered bar charts (recommended), make sure the "Display clustered bar charts" checkbox is ticked. Click the Statistics button - select the "Chi-square" and "Phi and Cramer's V" options - Continue - click the Cells button - select "Observed" from the "Counts" area and "Row", "Column" and "Total" from the "Percentages" area - Continue - click the Format button (this option is only really useful if you have more than two categories in one of your variables, but we show it here in case you do) - Continue - OK


Case Processing Summary - type of school * female

Valid: N = 200 (100.0%)   Missing: N = 0 (.0%)   Total: N = 200 (100.0%)

type of school * female Crosstabulation

                                      male     female   Total
public   Count                        77       91       168
         % within type of school      45.8%    54.2%    100.0%
         % within female              84.6%    83.5%    84.0%
         % of Total                   38.5%    45.5%    84.0%
private  Count                        14       18       32
         % within type of school      43.8%    56.2%    100.0%
         % within female              15.4%    16.5%    16.0%
         % of Total                   7.0%     9.0%     16.0%
Total    Count                        91       109      200
         % within type of school      45.5%    54.5%    100.0%
         % within female              100.0%   100.0%   100.0%
         % of Total                   45.5%    54.5%    100.0%


Chi-Square Tests

                              Value   df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                           (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square            .047a   1    .828
Continuity Correction(b)      .001    1    .981
Likelihood Ratio              .047    1    .828
Fisher's Exact Test                                      .849         .492
Linear-by-Linear Association  .047    1    .829
N of Valid Cases              200
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 14.56.
b. Computed only for a 2x2 table.
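The Pearson chi-square value can be verified directly from the crosstabulation counts. A minimal Python sketch using only the standard library:

```python
import math

# Observed counts from the crosstabulation (rows: public/private, cols: male/female)
table = [[77, 91],
         [14, 18]]

row = [sum(r) for r in table]
col = [sum(c) for c in zip(*table)]
n = sum(row)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row[i] * col[j] / n
        chi2 += (table[i][j] - expected) ** 2 / expected

# For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2))
p = math.erfc(math.sqrt(chi2 / 2.0))
print(round(chi2, 3), round(p, 3))  # 0.047 0.828 - matching the Pearson row above
```

The minimum expected count of 14.56 in footnote a is the public/private-by-gender cell with row total 32 and column total 91: 32 x 91 / 200 = 14.56.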

Symmetric Measures

                                 Value   Approx. Sig.
Nominal by Nominal  Phi          .015    .828
                    Cramer's V   .015    .828
N of Valid Cases                 200


Stage 5: Reporting the results
These results indicate that there is no statistically significant relationship between the type of school attended and gender (chi-square with one degree of freedom = 0.047, p = 0.828). Phi and Cramer's V are both measures of the strength of association; here the association between the variables is very weak. It can be easier to visualize data than to read tables: the clustered bar chart option produces a graph that highlights the group categories and the frequency of counts in each group.

TWO OR MORE INDEPENDENT SAMPLES- KOLMOGOROV-SMIRNOV
The procedure and interpretation remain the same for two or more samples.

TWO OR MORE INDEPENDENT SAMPLES- RUNS
The procedure and interpretation remain the same for two or more samples.


TWO OR MORE INDEPENDENT SAMPLES- WILCOXON-MANN-WHITNEY TEST

Stage 1: Objective:
The Wilcoxon-Mann-Whitney test is a non-parametric analog of the independent-samples t-test and can be used when you do not assume that the dependent variable is a normally distributed interval variable (you only assume that the variable is at least ordinal). You will notice that the SPSS procedure for the Wilcoxon-Mann-Whitney test is almost identical to that of the independent-samples t-test. We will not assume that write, our dependent variable, is normally distributed.

Example case: We use the dataset HSB (High School and Beyond). This data file contains 200 observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race). It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst).

Stage 2: Designing
Hypothesis
Null Hypothesis (H0): There is no significant difference between the underlying distributions of the write scores of males and females
Alternative Hypothesis (H1): There is a significant difference between the underlying distributions of the write scores of males and females.
If the Sig. / p-value is less than .05, reject H0 and accept H1.

Stage 3: Checking for Assumptions

Random samples from populations
A dependent variable that is ordinal, interval or ratio
Samples do NOT need to be normally distributed.

Stage 4: Output
Click Analyze - Nonparametric Tests - 2 Independent Samples - move write into the Test Variable List - move female into the Grouping Variable box - Define Groups: 0 (male) and 1 (female) - tick Mann-Whitney U - OK


Ranks

writing score   female   N     Mean Rank   Sum of Ranks
                male     91    85.63       7792.00
                female   109   112.92      12308.00
                Total    200

Test Statistics(a)

                         writing score
Mann-Whitney U           3606.000
Wilcoxon W               7792.000
Z                        -3.329
Asymp. Sig. (2-tailed)   .001
a. Grouping Variable: female
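The U statistic in the table follows directly from the Ranks table: U = R1 - n1(n1+1)/2 for one group, and SPSS reports the smaller of the two possible U values. A Python sketch (this omits the tie correction SPSS applies, so the Z differs slightly from the -3.329 above):

```python
import math

# From the Ranks table: group sizes and the male rank sum
n1, n2 = 91, 109   # males, females
r1 = 7792.0        # sum of ranks for males

u1 = r1 - n1 * (n1 + 1) / 2.0   # U for the male group
u = min(u1, n1 * n2 - u1)       # SPSS reports the smaller U

# Normal approximation, without the tie correction
mean_u = n1 * n2 / 2.0
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
z = (u - mean_u) / sd_u
print(u, round(z, 2))  # 3606.0 -3.32 (SPSS reports -3.329 with its tie correction)
```

Note also that the Wilcoxon W of 7792.000 in the table is simply the rank sum of the group with the smaller U (here, the males).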

Stage 5: Reporting the results
The results suggest that there is a statistically significant difference between the underlying distributions of the write scores of males and females (z = -3.329, p = 0.001).

TWO OR MORE INDEPENDENT SAMPLES- KRUSKAL-WALLIS H TEST

Stage 1: Objective:
The Kruskal-Wallis test is the non-parametric equivalent of the one-way ANOVA and an extension of the Mann-Whitney test that allows the comparison of more than two independent groups. It is used when we wish to compare three or more sets of scores that come from different groups. Since the Kruskal-Wallis test does not assume normality in the data and is much less sensitive to outliers, it can be used when these assumptions have been violated and the one-way ANOVA is inappropriate. In addition, if your data are ordinal you cannot use a one-way ANOVA, but you can use this test. Here we test whether there is a statistically significant difference among the three types of program (general, academic and vocational) with respect to writing scores.


Example case: We use the dataset HSB (High School and Beyond). This data file contains 200 observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race). It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst).

Stage 2: Designing
Hypothesis
Null Hypothesis (H0): There is no significant difference between the categories of program
Alternative Hypothesis (H1): There is a significant difference between the categories of program.
If the Sig. / p-value is less than .05, reject H0 and accept H1.

Stage 3: Checking for Assumptions

One dependent variable that is ordinal, interval or ratio
One independent variable that consists of three or more independent groups.

Stage 4: Output
Click Analyze - Nonparametric Tests - K Independent Samples (this opens Tests for Several Independent Samples) - move write into the Test Variable List - move program into the Grouping Variable box - Define Range (program has 3 categories, so enter 1 and 3; if you do not want the first category to appear, you can start with 2 instead) - Options - Descriptive - tick Kruskal-Wallis H in the Test Type box - OK


Ranks

writing score   type of program   N     Mean Rank
                general           45    90.64
                academic          105   121.56
                vocation          50    65.14
                Total             200


Test Statistics(a,b)

              writing score
Chi-Square    34.045
df            2
Asymp. Sig.   .000
a. Kruskal Wallis Test
b. Grouping Variable: type of program
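The H statistic can be approximated from the Ranks table alone, since H depends only on the group sizes and mean ranks: H = 12/(n(n+1)) * sum(n_i * rbar_i^2) - 3(n+1). A Python sketch (this omits the tie correction, so it gives roughly 33.8 rather than the 34.045 SPSS reports):

```python
# Group sizes and mean ranks from the Ranks table
groups = [(45, 90.64), (105, 121.56), (50, 65.14)]
n = sum(size for size, _ in groups)

# Kruskal-Wallis H without the tie correction
h = 12.0 / (n * (n + 1)) * sum(size * rbar ** 2 for size, rbar in groups) - 3 * (n + 1)
print(round(h, 1))  # about 33.8 (SPSS reports 34.045 after correcting for ties)
```

Either way, H is compared against a chi-square distribution with k - 1 = 2 degrees of freedom, and a value this large gives p < .001.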

Stage 5: Reporting the results
In our example, we can report that there was a statistically significant difference between the different levels of program (general, academic and vocational) with respect to writing scores (chi-square with two degrees of freedom = 34.045, p < .001).

TWO OR MORE DEPENDENT SAMPLES- WILCOXON SIGNED-RANK TEST

Stage 1: Objective:
The Wilcoxon signed-rank test is the non-parametric equivalent of the dependent t-test. It is used when we wish to compare two sets of scores that come from the same participants. This can occur when we wish to investigate a change in scores from one time point to another, or when individuals are subjected to more than one condition. Since the Wilcoxon signed-rank test does not assume normality in the data, it can be used when this assumption has been violated and the dependent t-test is inappropriate. Here we test whether read and write are statistically different.

Example case: We use the dataset HSB (High School and Beyond). This data file contains 200 observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race). It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst).

Stage 2: Designing
Hypothesis
Null Hypothesis (H0): There is no significant difference between read and write
Alternative Hypothesis (H1): There is a significant difference between read and write.

If the Sig. / p-value is less than .05, reject H0 and accept H1.

Stage 3: Checking for Assumptions

One dependent variable that is ordinal, interval or ratio
One independent variable that consists of one group measured twice or two "matched-pairs" groups.

Stage 4: Output
Click Analyze - Nonparametric Tests - 2 Related Samples - tick Wilcoxon - Options - Descriptive - Continue - OK


Descriptive Statistics

               N    Mean     Std. Deviation   Minimum   Maximum
reading score  200  52.2300  10.25294         28.00     76.00
writing score  200  52.7750  9.47859          31.00     67.00

The table titled Descriptive Statistics is where SPSS reports descriptives for your variables, if you selected that option. If you did not, this table will not appear in your results. You can use the results from this table to describe the reading and writing scores.
Ranks

writing score - reading score   N      Mean Rank   Sum of Ranks
Negative Ranks                  88a    90.27       7944.00
Positive Ranks                  97b    95.47       9261.00
Ties                            15c
Total                           200
a. writing score < reading score
b. writing score > reading score
c. writing score = reading score

The Ranks table provides some interesting data on the comparison of participants' reading and writing scores. We can see from the table's legend that 88 students had a higher reading score than writing score, 97 students had a higher writing score, and 15 were tied.
Test Statistics(b)

                         writing score - reading score
Z                        -.903a
Asymp. Sig. (2-tailed)   .366
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test
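The Z value can be reproduced from the Ranks table: with the 15 ties dropped, n = 185, the smaller rank sum is T = 7944, and Z = (T - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24). A Python sketch (without SPSS's tie correction, so the p-value differs in the third decimal):

```python
import math

# From the Ranks table: 88 negative ranks (sum 7944), 97 positive (sum 9261), 15 ties
t = 7944.0        # the smaller of the two rank sums
n = 88 + 97       # ties are dropped

# Normal approximation for the signed-rank statistic, without the tie correction
mean_t = n * (n + 1) / 4.0
sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
z = (t - mean_t) / sd_t
p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed normal p-value
print(round(z, 3))  # -0.903; p is about 0.37 (SPSS reports .366 with its tie correction)
```

The Z of -.903 matches the table; the small p-value discrepancy comes from the tie correction SPSS applies to the variance.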

By examining the final Test Statistics table, we can say that there is no statistically significant difference between read and write. We are looking at the Asymp. Sig. (2-tailed) value, which in this case is 0.366. This is the p-value for the test, and it is greater than 0.05.

Stage 5: Reporting the results
A Wilcoxon signed-rank test showed that there is no significant difference between reading and writing scores (Z = -.903, p = .366).

TWO OR MORE DEPENDENT SAMPLES- McNEMAR

Stage 1: Objective:
You would perform McNemar's test if you were interested in the marginal frequencies of two binary outcomes. These binary outcomes may be the same outcome variable on matched pairs (as in a case-control study) or two outcome variables from a single group. Continuing with the HSB dataset used in several examples above, we create two binary outcomes in our dataset: himath and hiread. These outcomes can be arranged in a two-way contingency table. The aim is to find out whether there is a significant difference in the proportions of the two groups.

Example case: We use the dataset HSB (High School and Beyond). This data file contains 200 observations from a sample of high school students with demographic information about the students, such as their gender (female), socio-economic status (ses) and ethnic background (race). It also contains a number of scores on standardized tests, including tests of reading (read), writing (write), mathematics (math) and social studies (socst).

Stage 2: Designing
Hypothesis
Null Hypothesis (H0): The proportion of students in the himath group is the same as the proportion of students in the hiread group (i.e., the contingency table is symmetric).
Alternative Hypothesis (H1): There is a significant difference between the proportions of students in the himath and hiread groups.
If the Sig. / p-value is less than .05, reject H0 and accept H1.

Stage 3: Checking for Assumptions

Two binary variables

Stage 4: Output

Click Transform - Compute Variable - himath = (math > 60) - OK - Compute Variable - hiread = (read > 60) - OK - Analyze - Descriptive Statistics - Crosstabs - himath in rows - hiread in columns - Statistics - tick McNemar - Continue - OK

Case Processing Summary - HiMath * HiRead

Valid: N = 200 (100.0%)   Missing: N = 0 (.0%)   Total: N = 200 (100.0%)


HiMath * HiRead Crosstabulation (Count)

                   HiRead
HiMath             Less than 60   Greater than 60   Total
Less than 60       135            21                156
Greater than 60    18             26                44
Total              153            47                200

Chi-Square Tests

                   Value   Exact Sig. (2-sided)
McNemar Test               .749a
N of Valid Cases   200
a. Binomial distribution used.
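McNemar's test looks only at the discordant pairs: 21 students were high on reading but not math, and 18 the reverse. With 21 + 18 = 39 discordant pairs, the exact p-value in the table (footnote a: "Binomial distribution used") is a two-tailed binomial probability. A Python sketch using only the standard library:

```python
import math

# Discordant cells from the crosstabulation: 21 (himath<60, hiread>60) and 18 (reverse)
b, c = 21, 18
n = b + c
k = min(b, c)

# Exact two-sided McNemar p-value: twice the binomial(n, 0.5) lower tail at min(b, c)
p = 2.0 * sum(math.comb(n, i) for i in range(k + 1)) / 2.0 ** n
p = min(p, 1.0)  # the doubled tail can exceed 1 when b and c are nearly equal
print(round(p, 3))  # 0.749 - matching the SPSS table
```

The concordant cells (135 and 26) do not enter the calculation at all; only the disagreements between himath and hiread carry information about marginal asymmetry.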

Stage 5: Reporting the results
McNemar's test suggests that there is not a statistically significant difference between the proportion of students in the himath group and the proportion of students in the hiread group (p = 0.749).

