Variables:
A quantity that changes its value from time to time, place to place, or person to person is called a variable; if probabilities are attached to the values of the variable, it is called a random variable. For example, if we write x = 1, x = 7, or x = -6, then x is a variable, but if a variable appears as below, with a probability attached to each value, it is a random variable:

x      1     2     3     4
P(x)   0.2   0.3   0.1   0.4
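As a quick illustration, the mean (expected value) of the random variable tabulated above can be computed from its probability distribution; this is a minimal plain-Python sketch:

```python
# Values and probabilities of the random variable x from the table above.
values = [1, 2, 3, 4]
probs = [0.2, 0.3, 0.1, 0.4]

# The probabilities attached to a random variable must sum to 1.
total = sum(probs)

# Expected value E(x) = sum of x * P(x).
expected = sum(x * p for x, p in zip(values, probs))
print(round(expected, 2))  # 2.7
```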
Population:
A large count, or the whole count, of the objects under study is called a population. A population may be finite or infinite: if its elements are countable it is a finite population, and if they are uncountable it is an infinite population. For example:
Population of MBA students at IUGC (finite population)
Population of university teachers in Pakistan (finite population)
Population of trees (infinite population)
Population of sea life (infinite population)
A population is also categorized in two ways:
1. Homogeneous population
2. Heterogeneous population
Homogeneous Population:
If all the population elements have the same properties, the population is known as a homogeneous population. For example: a population of shops, a population of houses, a population of boys, a population of rice grains in a box, etc.
Heterogeneous Population:
If all the population elements do not have the same properties, the population is known as a heterogeneous population.
Statistical Applications through SPSS
For example: a population of MBA students (male and female), a population of plants, etc.
Parameter:
A constant computed from the population, or a population characteristic, is known as a parameter. For example: the population mean μ, the population standard deviation σ, and the coefficients of skewness and kurtosis for the population.
Statistic:
A constant computed from the sample, or a sample characteristic, is known as a statistic. For example: the sample mean x̄, the sample standard deviation s, and the coefficients of skewness and kurtosis for the sample.
Estimator:
A sample statistic used to estimate a population parameter is known as an estimator. For example: the sample mean is used to estimate the population mean, so the sample mean is called an estimator of the population mean; likewise, the sample variance is used to estimate the population variance, so the sample variance is called an estimator of the population variance.
Hypothesis:
An assumption about a population parameter that is tested on the basis of sample information is called a hypothesis, and the procedure is called hypothesis testing. The assumptions are established as two complementary statements, the null and the alternative hypothesis, framed in such a manner that if one statement is found wrong, the other is automatically selected as the correct statement.
Null Hypothesis:
A statement, or the first thought about the parameter value, is called a null hypothesis. Statistically, a null hypothesis is a statement that contains an equality sign, such as:
H0: μ = μ0
H0: μ ≤ μ0
H0: μ ≥ μ0
As is clear from the above statements, there are two types of null hypothesis:
1. Simple null hypothesis
2. Composite null hypothesis
Alternative Hypothesis:
An automatically generated statement against the established null hypothesis is called an alternative hypothesis.
Quantitative Techniques in Analysis
For example:
Null Hypothesis     Alternative Hypothesis
H0: μ = μ0          H1: μ ≠ μ0
H0: μ ≤ μ0          H1: μ > μ0
H0: μ ≥ μ0          H1: μ < μ0
It is clear from the above-stated alternatives that there are two different types:
1. One-tailed (one-sided) alternative hypothesis
2. Two-tailed (two-sided) alternative hypothesis
[Figures: acceptance region ("Accept H0") under the null population and under the other (alternative) population, illustrating the type I and type II errors]
It is clear from the above figures that both errors cannot be minimized at the same time: when the type I error is minimized, an increase is observed in the type II error.
P- Value:
It is the minimum value of α needed to reject the null hypothesis. Since it is a value of α, it can be explained as the minimum type I error associated with a hypothesis while it is being tested. It is therefore used in two ways: in decision making, and to determine the probability of a type I error associated with the test.
T-test:
A t-test is a statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true. It is applied when the population is assumed to be normally distributed but the sample size is small enough that the statistic on which inference is based is not normally distributed, because it relies on an uncertain estimate of the standard deviation rather than on a precisely known value.
A test of whether the mean of a normally distributed population has a value specified in a null hypothesis. A test of the null hypothesis that the means of two normally distributed populations are equal: given two data sets, each characterized by its mean, standard deviation, and number of data points, we can use some kind of t-test to determine whether the means are distinct, provided that the underlying distributions can be assumed to be normal. There are different versions of the t-test depending on whether the two samples are:
Unpaired, independent of each other (e.g., individuals randomly assigned into two groups, measured after an intervention and compared with the other group), or
Paired, so that each member of one sample has a unique relationship with a particular member of the other sample (e.g., the same people measured before and after an intervention).
Interpretation of the results:
If the calculated p-value is below the threshold chosen for statistical significance (usually the 0.10, 0.05, or 0.01 level), then the null hypothesis, which usually states that the two groups do not differ, is rejected in favor of an alternative hypothesis, which typically states that the groups do differ.
A test of whether the slope of a regression line differs significantly from 0.
The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages; the bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is a measure of variability that is essentially noise that may make it harder to see the group difference. The figure shows the formula for the t-test and how the numerator and denominator are related to the distributions.
The top part of the formula is easy to compute: just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, take the variance for each group, divide it by the number of people in that group, add these two values, and then take the square root. The t-value will be positive if the first mean is larger than the second and negative if it is smaller. Once you compute the t-value, you have to look it up in a table of significance to test whether the ratio is large enough to say that the difference between the groups is not likely to have been a chance finding. To test the significance, you need to set a risk level (called the alpha level). In most social research, the rule of thumb is to set the alpha level at .05. This means that five times out of a hundred we would find a statistically significant difference between the means even if there was none (i.e., by chance). We also need to determine the degrees of freedom (df) for the test. In the t-test, the degrees of freedom is the sum of the persons in both groups minus 2. Given the alpha level, the df, and the t-value, we can look the t-value up in a standard table of significance (available as an appendix in the back of most statistics texts) to determine whether the t-value is large enough to be significant. If it is, we can conclude that the difference between the means of the two groups is significant (even given the variability).
Calculations:
a) One-sample t-test:
In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic
t = (x̄ - μ0) / (s / √n)
where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size. The degrees of freedom used in this test is n - 1.
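The one-sample statistic above can be sketched in plain Python; the data below are hypothetical, chosen only to exercise the formula:

```python
import math

def one_sample_t(xs, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    # Sample standard deviation (n - 1 in the denominator).
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical data, testing H0: mu = 5.0
t, df = one_sample_t([4.8, 5.1, 5.4, 4.9, 5.3, 5.2], mu0=5.0)
print(round(t, 3), df)  # 1.234 5
```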
b) Independent two-sample t-test:
A) Equal sample sizes, equal variance
This test is used only when both:
the two sample sizes (that is, the number of participants in each group) are equal; and
the two distributions can be assumed to have the same variance.
Violations of these assumptions are discussed below. The t statistic to test whether the means are different can be calculated as follows:
t = (x̄1 - x̄2) / (sp · √(2/n))
where
sp = √((s1² + s2²) / 2)
is the grand standard deviation (or pooled standard deviation); 1 = group one, 2 = group two. The denominator of t is the standard error of the difference between the two means. For significance testing, the degrees of freedom for this test is n1 + n2 - 2, where n1 is the number of participants in group 1 and n2 is the number of participants in group 2.
B) Unequal sample sizes, unequal variance
This test is used only when the two sample sizes are unequal and the variances are assumed to be different. See also Welch's t-test. The t statistic to test whether the means are different can be calculated as follows:
t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2)
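A minimal plain-Python sketch of the pooled (equal-variance) statistic; it uses the general pooled form, which reduces to the equal-n formula when the groups are the same size, and the data are hypothetical:

```python
import math

def pooled_t(x, y):
    """Pooled two-sample t, equal variances assumed.
    sp^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)
    t    = (mean1 - mean2) / (sp * sqrt(1/n1 + 1/n2)), df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical groups.
t, df = pooled_t([1, 2, 3, 4], [2, 3, 4, 5])
print(round(t, 3), df)  # -1.095 6
```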
where n1 is the number of participants in group 1 and n2 is the number of participants in group 2. In this case, the variance is not pooled. For use in significance testing, the distribution of the test statistic is approximated as an ordinary Student's t distribution with the degrees of freedom calculated using
df ≈ (s1²/n1 + s2²/n2)² / [ (s1²/n1)² / (n1 - 1) + (s2²/n2)² / (n2 - 1) ]
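Welch's statistic and the Welch–Satterthwaite degrees of freedom can be sketched in plain Python (hypothetical data):

```python
import math

def welch_t(x, y):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), with the
    Welch-Satterthwaite degrees of freedom; no pooled variance is used."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the difference
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups of unequal size.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6])
print(round(t, 3), round(df, 2))  # -0.739 3.53
```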
This is called the Welch–Satterthwaite equation. Note that the true distribution of the test statistic actually depends (slightly) on the two unknown variances. This test can be used as either a one-tailed or a two-tailed test.
c) Dependent t-test for paired samples:
This test is used when the samples are dependent; that is, when there is only one sample that has been tested twice (repeated measures) or when there are two samples that have been matched or "paired". The statistic is
t = (X̄D - μ0) / (sD / √N)
For this equation, the differences between all pairs must be calculated. The pairs are either one person's pre-test and post-test scores or pairs of persons matched into meaningful groups (for instance, drawn from the same family or age group: see table). The average (X̄D) and standard deviation (sD) of those differences are used in the equation. The constant μ0 is non-zero if you want to test whether the average difference is significantly different from some value μ0 other than 0. The degrees of freedom used is N - 1.
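A plain-Python sketch of the paired statistic, applied to hypothetical before/after scores:

```python
import math

def paired_t(before, after):
    """t = mean(D) / (sD / sqrt(N)) on the pairwise differences, df = N - 1."""
    d = [b - a for a, b in zip(before, after)]  # one difference per pair
    n = len(d)
    mean_d = sum(d) / n
    sd = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    return mean_d / (sd / math.sqrt(n)), n - 1

# Hypothetical pre-test / post-test scores for four people.
t, df = paired_t([10, 12, 9, 11], [12, 13, 9, 14])
print(round(t, 3), df)  # 2.324 3
```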
Example # 01
A) One-sample t-test:
SPSS need:
1) The data should be numerical (i.e., a numerical variable).
2) A test value, which is the hypothetical value against which we are going to test.
To analyze the one-sample t-test, I have used the salaries of the employees of an organization. For this purpose, I selected a sample of 474 employees of the company. The hypotheses are:
a) The null hypothesis states that the average salary of the employees is equal to 30,000.
H0: μ = 30,000
b) The alternative hypothesis states that the average salary of the employees is not equal to 30,000.
HA: μ ≠ 30,000
Method:
Enter the data in the data editor; the variable is labeled as employee's current salary. Now click on Analyze, which produces a drop-down menu; choose Compare Means from it and click on One-Sample T Test. A dialogue box appears, with all the input variables on its left-hand side. From this box we select the variable to be analyzed, in our case the current salaries of the employees, and transfer it to the Test Variable box by clicking the arrow. Next, change the value in the Test Value box, which originally appears as 0, to the one against which you are testing the sample mean; in this case, this value is 30000. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → One-Sample T Test → Drag Test Variable (Scale) → Give Test Value → OK
SPSS output:
One-Sample Statistics
                 N     Mean         Std. Deviation   Std. Error Mean
Current Salary   474   $34,419.57   $17,075.661      $784.311
Interpretation:
In the above table, N shows the total number of observations. The average salary of the employees is $34,419.57, the standard deviation is $17,075.661, and the standard error of the mean is $784.311.
One-Sample Test (Test Value = 30000)
                 t       df    Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
Current Salary   5.635   473   .000              $4,419.568        $2,878.40      $5,960.73
i) The t value is positive, which shows that the estimated mean is greater than the hypothesized mean.
ii) The degrees of freedom is (N - 1) = 473.
iii) The p-value is 0.000, which is less than 0.05.
iv) The difference between the estimated and hypothesized mean is $4,419.568.
v) The confidence interval has lower and upper limits of $2,878.40 and $5,960.73 respectively, and does not contain zero.
Decision:-
On the basis of the following observations, I reject the null hypothesis and accept the alternative hypothesis. I am almost 100% sure of my decision.
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
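The t value in the SPSS output above can be reproduced directly from the reported mean and standard error; a quick arithmetic check in plain Python:

```python
# Values taken from the One-Sample Statistics table above.
mean = 34419.57      # sample mean of current salary
test_value = 30000   # hypothesized population mean
se = 784.311         # standard error of the mean

# t = (sample mean - test value) / standard error of the mean.
t = (mean - test_value) / se
print(round(t, 3))  # 5.635, matching the SPSS output
```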
Example # 02
A) Independent t-test:
SPSS need:
1) Two variables are required: one numerical, and one categorical with two levels.
To analyze the independent t-test, I have used the salaries of the employees of an organization. For this purpose, I selected a sample of 474 employees of the company, containing both males and females. In my analysis I coded males as m and females as f. The hypotheses are:
a) The null hypothesis states that the average salary of the male employees is equal to the average salary of the female employees.
H0: μ1 = μ2
b) The alternative hypothesis states that the average salary of the male employees is not equal to the average salary of the female employees.
HA: μ1 ≠ μ2
Method:
Enter the data in the data editor; the variables are labeled as employee's current salary and gender respectively. Click on Analyze, which produces a drop-down menu; choose Compare Means from it and click on Independent-Samples T Test. A dialogue box appears, with all the input variables on its left-hand side. To perform the independent-samples t-test, transfer the dependent variable into the Test Variable box and transfer the variable that identifies the groups into the Grouping Variable box. In this case, the current salary of the employees is the dependent variable to be analyzed, and gender is the variable that identifies the groups of employees. Once the grouping variable is transferred, the Define Groups button, which was earlier inactive, becomes active. Click on it to define the two groups: put m in the box against group 1 and f in the box against group 2, then click Continue. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → Independent-Samples T Test → Drag Test & Grouping Variable → Define Groups → OK
SPSS output:
Group Statistics (Current Salary)
Gender   N     Mean         Std. Deviation   Std. Error Mean
Male     258   $41,441.78   $19,499.214      $1,213.968
Female   216   $26,031.92   $7,558.021       $514.258
The total number of males is 258 and of females 216. The mean salary of the male employees is $41,441.78 and of the female employees $26,031.92. The standard deviation of the male employees' salaries is $19,499.214 and of the female employees' $7,558.021. The standard error of the mean is $1,213.968 for the male employees and $514.258 for the female employees.
Independent Samples Test (Current Salary)
Levene's Test for Equality of Variances: F = 119.669, Sig. = .000
t-test for Equality of Means:
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       10.945   472       .000              $15,409.862       $1,407.906              $12,643.322    $18,176.401
Equal variances not assumed   11.688   344.262   .000              $15,409.862       $1,318.400              $12,816.728    $18,002.996
Interpretation:
The above table has two parts, (a) the F-test (Levene's test for equality of variances) and (b) the t-test, through which we can observe that:
i) The F value is 119.669 with a significance value of 0.00, which is less than 0.05.
ii) On the basis of the p-value of the F-test part, we conclude that the variances of the two populations are not equal.
iii) The t value is positive, which shows that the mean salary of the male employees is greater than the mean salary of the female employees.
iv) The degrees of freedom is 344.262.
v) The p-value is 0.000, which is less than 0.05.
vi) The difference between the two population means is $15,409.862.
vii) The standard error of the difference is $1,318.400.
viii) The confidence interval has lower and upper limits of $12,816.728 and $18,002.996 respectively, and does not contain zero.
Decision:-
On the basis of the following observations, I reject the null hypothesis and accept the alternative hypothesis. I am almost 100% sure of my decision.
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments:-
The average salaries of male and female employees are not equal.
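The unequal-variance (Welch) results reported above can be cross-checked from the group statistics; a quick arithmetic check in plain Python:

```python
import math

# Values taken from the Group Statistics table above.
mean_male, se_male = 41441.78, 1213.968
mean_female, se_female = 26031.92, 514.258

diff = mean_male - mean_female                      # mean difference
se_diff = math.sqrt(se_male ** 2 + se_female ** 2)  # SE of the difference
print(round(diff, 2))     # 15409.86
print(round(se_diff, 1))  # 1318.4, matching the Welch row
```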
Example # 03
A) Paired t-test:
SPSS need:
1) Two related numerical variables (two measurements on the same cases).
To analyze the paired t-test, I used the beginning and current salaries of the employees of an organization. For this purpose, I selected a sample of 474 employees of the organization. The hypotheses are:
a) The null hypothesis states that the average current salary of the employees is equal to their average beginning salary.
H0: μ1 = μ2
b) The alternative hypothesis states that the average current salary of the employees is not equal to their average beginning salary.
HA: μ1 ≠ μ2
Method:
Enter the data in the data editor and the variables are labeled as employee's current and beginning salary respectively. Click on Analyze which will produce a drop down menu, choose Compare means from that and click on Paired-samples t test, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. From this box we have to select variables, which are to be computed. The two variables computed in our case are Current and Beginning salaries. Select these together and they will immediately appear in the box at the bottom labeled current selection. They are simultaneously highlighted in the box in which they originally appeared. Once the variables are selected the arrow at the center becomes active. The variables can be transferred to the Paired-Variables box by clicking on this arrow. They will appear in the box as Current-Beginning. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Compare Means → Paired-Samples T Test → Drag Paired Variables (Scale) → OK
Paired Samples Statistics
Pair 1             Mean         N     Std. Deviation   Std. Error Mean
Current Salary     $34,419.57   474   $17,075.661      $784.311
Beginning Salary   $17,016.09   474   $7,870.638       $361.510
The mean values of the current and beginning salary are $34,419.57 and $17,016.09 respectively. The total number in each group is 474. The standard deviations of the current and beginning salary are $17,075.661 and $7,870.638 respectively, and the standard errors of the mean are $784.311 and $361.510 respectively.
Paired Samples Correlations
                                            N     Correlation   Sig.
Pair 1   Current Salary & Beginning Salary   474   .880          .000
The total number of pairs is 474. The correlation of 0.880 shows that the two variables are highly correlated, which indicates that employees with a higher beginning salary also have a higher current salary. The p-value is 0.00, which is less than 0.05.
Paired Samples Test (Current Salary - Beginning Salary)
Mean          Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df    Sig. (2-tailed)
$17,403.481   $10,814.620      $496.732          $16,427.407    $18,379.555    35.036   473   .000
Interpretation:
From the above table we observe that:
i) The mean of the paired differences is $17,403.481.
ii) The standard deviation of the differences is $10,814.620.
iii) The standard error of the mean difference is $496.732.
iv) The confidence interval has lower and upper limits of $16,427.407 and $18,379.555 respectively, and does not contain zero.
v) The t value is 35.036, the degrees of freedom is (N - 1) = 473, and the p-value is 0.00, which is less than 0.05.
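The paired-test t value above can be reproduced from the reported mean difference and its standard deviation; a quick arithmetic check in plain Python:

```python
import math

# Values taken from the Paired Samples Test table above.
mean_diff = 17403.481  # mean of the 474 salary differences
sd_diff = 10814.620    # standard deviation of the differences
n = 474

se_diff = sd_diff / math.sqrt(n)  # standard error of the mean difference
t = mean_diff / se_diff
print(round(se_diff, 1))  # 496.7
print(round(t, 2))        # 35.04, matching the SPSS output
```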
Decision:-
On the basis of the following observations, I reject the null hypothesis and accept the alternative hypothesis. I am almost 100% sure of my decision.
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.
Comments:-
The mean difference between the two paired variables, i.e. the current and beginning salary, is significant: the two means are not the same.
One-Way ANOVA
ANOVA is a commonly used statistical method for making simultaneous comparisons between two or more population means, yielding values that can be tested to determine whether a significant relation exists between the variables. Its simplest form is one-way ANOVA, which involves one dependent variable and one independent (factor) variable.
Data Source:
Hypothesis:
H0: μ1 = μ2 = μ3
HA: at least one mean is not equal.
SPSS Need:
SPSS needs two types of variables for analyzing one-way ANOVA:
1. A numerical variable (scale).
2. A categorical variable (with more than two categories).
Method:
First of all, enter the data in the data editor; the variables are labeled as employee's current salary and employment category respectively. Click on Analyze, which produces a drop-down menu; choose Compare Means from it and click on One-Way ANOVA. A dialogue box appears, with all the input variables on its left-hand side. To perform one-way ANOVA, transfer the dependent variable into the box labeled Dependent List and the factoring variable into the box labeled Factor. In our case, current salary is the dependent variable and should be transferred to the Dependent List box by clicking on the first arrow in the middle of the two boxes. Employment category is the factoring variable and should be transferred to the Factor box by clicking on the second arrow; then click OK to run the analysis. If the null hypothesis is rejected, ANOVA only tells us that the population means are not all equal. Multiple comparisons are then used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; LSD (Least Significant Difference) and the Tukey test are among the most commonly used.
Pictorial Representation
Analyze → Compare Means → One-Way ANOVA → Drag Dependent List & Factor → Post Hoc (Optional) → OK
Output:
ANOVA: Current Salary
                 Sum of Squares     df    Mean Square       F         Sig.
Between Groups   89438483925.943    2     44719241962.972   434.481   .000
Within Groups    48478011510.397    471   102925714.459
Total            137916495436.340   473
The above table gives the test results for the analysis of one-way ANOVA. The results are given in three rows. The first row labeled between groups gives the variability due to the different designations of the employees (known reasons). The second row labeled within groups gives the variability due to random error (unknown reasons), and the third row gives the total variability. In this case, F-value is 434.481, and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis and conclude that the average salary of the employees is not the same in all three categories.
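The mean squares and F value in the ANOVA table above follow directly from the sums of squares and degrees of freedom; a quick arithmetic check in plain Python:

```python
# Values taken from the ANOVA table above.
ss_between, df_between = 89438483925.943, 2
ss_within, df_within = 48478011510.397, 471

ms_between = ss_between / df_between  # mean square between groups
ms_within = ss_within / df_within     # mean square within groups
f = ms_between / ms_within
print(round(f, 3))  # 434.481, matching the table
```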
Multiple Comparisons (LSD), Dependent Variable: Current Salary
[Pairwise mean differences in current salary between the Clerical, Custodial, and Manager employment categories]
*. The mean difference is significant at the .05 level.
The post-hoc test presents the results of the comparisons between all possible pairs. Since we have three groups, a total of six pairs is possible, of which three are mirror images. The results are shown in three rows. The p-value for the Clerical-Manager and Custodial-Manager comparisons is shown as 0.000, whereas it is 0.126 for the Clerical-Custodial comparison. This means that the average current salary of the employees is significantly different between Clerical and Manager as well as between Manager and Custodial, whereas it is not significantly different between Clerical and Custodial.
Conclusion: As our null hypothesis is rejected, we conclude that the three means are not all the same. To identify which mean differs from the others, we used the LSD test and conclude that the mean for managers is significantly different from the other two means, whereas the difference between the other two means is not significant.
Two-Way ANOVA
In two-way analysis, we have two independent variables, or known factors, and we are interested in knowing their effect on the same dependent variable.
Data Source:
C:\SPSSEVAL\Carpet
Variables: Here we analyze two different categorical variables together with a numerical variable by two-way ANOVA, i.e.:
A) Preference (numerical)
B) Package design (categorical)
C) Brand (categorical)
Hypothesis:
For Package:  H0: μi = μj for all i & j;   HA: μi ≠ μj for some i & j
For Brand:    H0': μi = μj for all i & j;  HA': μi ≠ μj for some i & j
SPSS Need:
SPSS needs two types of variables for analyzing two-way ANOVA:
1. A numerical variable (scale).
2. Two categorical variables (with more than two levels).
Method:
First of all enter the data in the data editor and the variables are labeled as Preference, brand, package design respectively. Click on Analyze which will produce a drop down menu, choose General Linear model from that and click on Univariate, a dialogue box appears, in which all the input variables appear in the left-hand side of that box. To perform two-way ANOVA, transfer the dependent variable (Preference) into the box labeled Dependent variable and factor variable (Brand & Package) into the box labeled Fixed Factor. After defining all variables, now click on OK to run the analysis.
Page 22
If the null hypothesis is rejected, multiple comparisons are used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; LSD (Least Significant Difference) and the Tukey test are among the most commonly used.
Pictorial Representation
Analyze → General Linear Model → Univariate → Drag Dependent Variable & Fixed Factors → Post Hoc → OK
Output:
Between-Subjects Factors
                        Value Label   N
Package design   1.00   A*            9
                 2.00   B*            6
                 3.00   C*            7
Brand name       1.00   K2R           7
                 2.00   Glory         7
                 3.00   Bissell       8
This table shows the value label under each category and the frequency of each value label. We have a total of 6 value labels under package design and brand name.
Tests of Between-Subjects Effects
Dependent Variable: Preference
Source    df   F
package   2    16.883
brand     2    1.135
Error     13
Total     22
The above table gives the test results for the analysis of two-way ANOVA. The results are given in four rows. The first row labeled package gives the variability due to the different package design of the carpets, which may affect the customer's preferences (known reason). The second row labeled brand gives the variability due to the different brand names (known reason). The third row labeled error gives the variability due to random error, which also affects the customer's preferences (unknown reasons). The fourth row gives the total variability in the customer's preferences due to both known and unknown reasons.
In this case, the F-value for package design is 16.883, and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis for package design and conclude that the average preference is not the same for all packages. The F-value for brand name is 1.135, and the corresponding p-value is greater than 0.05, so we accept the null hypothesis for brand and conclude that the average brand preferences are approximately the same.
Multiple Comparisons (LSD), Dependent Variable: Preference
(I)    (J)    Sig.
A*     B*     .000
A*     C*     .000
B*     A*     .000
B*     C*     .322
C*     A*     .000
C*     B*     .322
*. The mean difference is significant at the .05 level.
As our null hypothesis for package design is rejected, multiple comparisons are used to assess which group means differ from the others. The above table gives the results for multiple comparisons between each value label under the package design category. The post-hoc test presents the results of the comparisons between all possible pairs. Since we have three groups, a total of six pairs is possible, of which three are mirror images. The results are shown in three rows. The p-value for the A*-B* and A*-C* comparisons is shown as 0.000, whereas it is 0.322 for the B*-C* comparison. This means that the average preference is significantly different between package designs A* and B* as well as A* and C*, whereas it is not significantly different between B* and C*.
Conclusion: As our null hypothesis for package design is rejected, we conclude that the mean preferences for the package designs are not all the same. To identify which mean differs from the others, we used the LSD test and conclude that the mean for A* is significantly different from the other two means, whereas the difference between the other two means is not significant. In the case of brand name, our null hypothesis is accepted, and we conclude that all mean brand preferences are the same, so there is no need for multiple comparisons for brand.
Chi-Square Test
The chi-square test is commonly used to test hypotheses regarding the goodness of fit of a distribution and the independence of two categorical variables.
It is a positively skewed distribution, so it has a one-tailed critical region on the right tail of the curve, and the value of χ² is always positive.
Data Source:
C:\SPSSEVAL\Carpet
Price (Numerical)
SPSS Need:
SPSS needs a categorical variable or a numerical variable for analyzing the chi-square goodness-of-fit test.
Graphical Representation:
[Histogram of Price: frequency on the y-axis, price on the x-axis; Mean = 2.00, Std. Dev. = 0.87287, N = 22]
Explanation of Graph
From the above graph we see that our numerical variable (price) is on the x-axis and its frequency on the y-axis. The mean and standard deviation of the 22 observations are 2.00 and 0.87287 respectively. The graph clearly shows that the selected numerical variable, price, does not follow a normal distribution, so we use the chi-square goodness-of-fit test to determine whether the sample under investigation has been drawn from a population which follows some specified distribution.
Method:
First of all, enter the data in the data editor; the variable is labeled as price. Click on Analyze, which produces a drop-down menu; choose Nonparametric Tests from it and click on Chi-Square. A dialogue box appears, with all the input variables on its left-hand side. Select the variable you want to analyze; when you select the test variable, the arrow between the two boxes becomes active and you can transfer the variable to the box labeled Test Variable List by clicking on the arrow. In this case our test variable is price, and it should be transferred to the Test Variable List box. You can also click on the Options button if you are interested in the descriptive statistics of the tested variable. Now click on OK to run the analysis.
Pictorial Representation
Analyze → Nonparametric Tests → Chi-Square → Define Test Variable List → OK
Output
Price
         Observed N   Expected N   Residual
$1.19    8            7.3          .7
$1.39    6            7.3          -1.3
$1.59    8            7.3          .7
Total    22
First column of the above table shows the three categories in price variable. The column labeled Observed N gives the actual number of cases falling in different categories of test variable, which is directly obtained from the data given. The column labeled Expected N gives the expected number of cases that should fall in each category of the test variable. The column labeled Residual gives the difference between observed and expected frequencies of each category, and it is commonly known as Error.
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 7.3.
The above table gives the test results for the chi-square goodness-of-fit test. In this case the chi-square value is 0.364 with 2 degrees of freedom. The p-value for the test is shown as 0.834, which is greater than 0.05, so we fail to reject our null hypothesis that the fit is good. Conclusion: The test results are not statistically significant at the 5% level of significance; the data do not provide sufficient evidence to reject the null hypothesis, so we conclude that the test variable (price) is consistent with a uniform distribution.
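The SPSS output above can be checked by hand. The sketch below is a minimal pure-Python reproduction of the goodness-of-fit calculation (not the method SPSS documents internally, just the standard formula); it uses the fact that for 2 degrees of freedom the chi-square survival function reduces to exp(-x/2), so no statistics library is needed.

```python
import math

# Observed counts for the three price categories ($1.19, $1.39, $1.59)
observed = [8, 6, 8]
n = sum(observed)                                # 22 observations
expected = [n / len(observed)] * len(observed)   # about 7.3 each under uniformity

# Chi-square statistic: sum of (O - E)^2 / E over the categories
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1                           # 3 categories - 1 = 2
# For df = 2 the chi-square p-value is exactly exp(-x / 2); this shortcut
# is valid only because df == 2 here.
p_value = math.exp(-chi2 / 2)

print(round(chi2, 3), round(p_value, 3))         # 0.364 0.834, matching SPSS
```

Since 0.834 is well above 0.05, the hand calculation leads to the same decision as the SPSS output: fail to reject the hypothesis of a uniform fit.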
Data Source:
C:\SPSSEVAL\Employee Data
Hypothesis:
H0: Designation is independent of Sex.
HA: Designation is not independent of Sex.
SPSS needs two categorical variables for the chi-square test of independence.
First of all, enter the data in the data editor and label the variables Gender and Designation. Click on Analyze, which will produce a drop-down menu; choose Descriptive Statistics and click on Crosstabs. A dialogue box appears in which all the input variables are listed on the left-hand side. Select the variable you want to form the rows of your contingency table and transfer it to the box labeled Row(s); transfer the other variable to the box labeled Column(s). In this case we transfer gender to the Row(s) box and designation to the Column(s) box. Next, click on the Statistics button, which brings up a dialogue box; tick the first box, labeled Chi-Square, and click Continue to return to the previous screen. Click on OK to run the analysis.
Pictorial Representation
Analyze → Descriptive Statistics → Crosstabs → Drag Row and Column Variables → Statistics → Tick Chi-Square → OK
Output
Quantitative Techniques in Analysis
Gender * Employment Category Crosstabulation (Count)

Gender    Clerical    Custodial    Manager    Total
Female    206         0            10         216
Male      157         27           74         258
Total     363         27           84         474
Cross tabulation is used to examine variation in categorical data; it is a cross-measuring analysis. Above we cross-examine the gender and designation of the employees. We place the designation of the employees in the columns and the gender of the employees in the rows, with 474 observations in total. The results are given in two rows: the first row shows the number of female employees in each employment category, and the second row shows the number of male employees in each employment category.
Chi-Square Tests
                        Value      df    Asymp. Sig. (2-sided)
Pearson Chi-Square      79.277a    2     .000
N of Valid Cases        474
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.30.
The above table gives the test results for the chi-square test of independence. The first row, labeled Pearson Chi-Square, shows that the value of χ² is 79.277 with 2 degrees of freedom. The two-tailed p-value is shown as .000, which is less than 0.05, so we reject our null hypothesis and conclude that designation is not independent of sex. Conclusion: The test results are statistically significant at the 5% level of significance, and the data provide sufficient evidence to conclude that the designation of the employees is not independent of their sex; with a p-value this small, we can be almost certain of the decision to reject the null hypothesis.
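As with the goodness-of-fit test, the Pearson chi-square value can be verified by hand from the crosstab alone. The sketch below is a pure-Python reconstruction of the standard formula (expected count = row total × column total / grand total), not SPSS's internal code; again the df = 2 shortcut exp(-x/2) gives the p-value without a statistics library.

```python
import math

# Observed counts from the Gender * Employment Category crosstab
observed = [
    [206, 0, 10],    # Female: Clerical, Custodial, Manager
    [157, 27, 74],   # Male
]
row_totals = [sum(row) for row in observed]          # 216, 258
col_totals = [sum(col) for col in zip(*observed)]    # 363, 27, 84
grand = sum(row_totals)                              # 474

# Pearson chi-square: sum of (O - E)^2 / E, where under independence
# E = row total * column total / grand total
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)    # (2-1)*(3-1) = 2
p_value = math.exp(-chi2 / 2)                        # valid only for df == 2

print(round(chi2, 3), df)                            # 79.277 2
```

The p-value comes out around 1e-17, which SPSS reports as .000, so the hand calculation supports the same rejection of the null hypothesis.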
Second Approach
Consider a case in which the raw data are not available and only the table labeled Gender * Employment Category Crosstabulation in the above output is given. On the basis of that output table you can easily obtain the same result as above by using the SPSS Weight Cases option. Below we briefly explain how to enter the data on the basis of the table and obtain the desired results.
Method
First of all, in the Variable View of SPSS define three variables and label them Gender, Employment Category, and Value. Now in the Data View of SPSS, enter the data in a different manner. We see that the table contains two rows and three columns. In the rows we have two categories, Female and Male; similarly, in the columns we have three categories: Clerical, Custodial, and Manager. The female and male employees both fall into the three employment categories. So in the Data View we simply enter the row data (Gender), opposite it the column data (Employment Category), and the corresponding frequencies in the Value column. The resulting Data View is shown in the picture below.
After defining the data, click on Data, which will produce a drop-down menu; choose Weight Cases, and a dialogue box appears in which all the variables are on the left-hand side. Tick Weight cases by and drag Value into the box labeled Frequency Variable by clicking on the arrow between the two boxes. Now click OK to return to the previous window.
The further process is the same as described above: define Gender in the rows and Employment Category in the columns, and tick Chi-Square under the Statistics button. Now click OK to run the analysis. When the output appears, you will see that SPSS gives the same result as we found earlier from the raw data.
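Conceptually, Weight Cases tells SPSS that each row of the long-format data stands for Value-many identical cases. The sketch below illustrates that idea in plain Python (the variable names mirror the second-approach data entry described above; this is an analogy, not SPSS code): summing the weights per (gender, category) pair rebuilds the original crosstab.

```python
# Long-format data as entered in the second approach:
# one row per (Gender, Employment Category) cell, with its frequency in Value
weighted_rows = [
    ("Female", "Clerical", 206), ("Female", "Custodial", 0),
    ("Female", "Manager", 10),
    ("Male", "Clerical", 157), ("Male", "Custodial", 27),
    ("Male", "Manager", 74),
]

# Rebuild the Gender * Employment Category crosstab by summing the weights,
# which is what weighting cases by Value achieves inside SPSS
crosstab = {}
for gender, category, value in weighted_rows:
    crosstab[(gender, category)] = crosstab.get((gender, category), 0) + value

total = sum(crosstab.values())
print(total)   # 474, the same grand total as the raw-data analysis
```

Because the rebuilt table is identical to the original crosstab, any test computed from it, including the Pearson chi-square, gives the same result as the raw-data analysis.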