
Statistical Applications through SPSS

S. Ali Raza Naqvi

Variables:
A quantity that changes its value from time to time, place to place, and person to person is called a variable; if probabilities are attached to the values of the variable, it is called a random variable. For example, if we say x = 1, x = 7, or x = -6, then x is a variable, but if a variable appears with attached probabilities, as below, then it is a random variable.

x      1     2     3     4
P(x)   0.2   0.3   0.1   0.4

Population:
A large count, or the whole count, of the objects under study is called a population. A population may be finite or infinite: if the population elements are countable, it is known as a finite population, but if the population elements are uncountable, it is called an infinite population. For example:

- Population of MBA students at IUGC (finite population)
- Population of university teachers in Pakistan (finite population)
- Population of trees (infinite population)
- Population of sea life (infinite population)

The population is also categorized in two ways:
1. Homogeneous population
2. Heterogeneous population

Homogeneous Population:
If all the population elements have the same properties, then the population is known as a homogeneous population. For example: a population of shops, a population of houses, a population of boys, the population of rice grains in a box, etc.

Heterogeneous Population:
If the population elements do not all have the same properties, then the population is known as a heterogeneous population. For example: a population of MBA students (male and female), a population of plants, etc.

Parameter:
A constant computed from the population, or a population characteristic, is known as a parameter. For example: the population mean μ, the population standard deviation σ, and the coefficients of skewness and kurtosis for the population.

Statistic:
A constant computed from the sample, or a sample characteristic, is known as a statistic. For example: the sample mean x̄, the sample standard deviation s, and the coefficients of skewness and kurtosis for the sample.

Estimator:
A sample statistic used to estimate a population parameter is known as an estimator. For example: the sample mean is used to estimate the population mean, so the sample mean is called an estimator of the population mean. Likewise, the sample variance is used to estimate the population variance, so the sample variance is called an estimator of the population variance.

Hypothesis:
An assumption about a population parameter that is tested on the basis of sample information is called a hypothesis, and the procedure is called hypothesis testing. These assumptions are established as two complementary statements, the null and the alternative hypothesis, framed in such a manner that if one statement is found wrong, the other is automatically selected as correct.

Types of Hypothesis:

1) Null Hypothesis:


A statement, or the first thought about the parameter value, is called a null hypothesis. Statistically, a null hypothesis is a statement that must contain an equality sign, such as:

H0: μ = μ0
H0: μ ≤ μ0
H0: μ ≥ μ0

As is clear from the above statements, there are two types of null hypothesis:
1- Simple null hypothesis
2- Composite null hypothesis

1-Simple Null Hypothesis:


If a null hypothesis is based on a single value (i.e., it consists of only an equality sign), H0: μ = μ0, then it is called a simple null hypothesis. Example phrases:

- Average rainfall in the United States of America during 1999 was 200 mm.
- The average concentrations of two substances are the same.
- The IQ levels of MBA and BBA students are the same.
- IQ level is independent of education level.

2-Composite Null Hypothesis:


If a null hypothesis is based on an interval of parameter values (i.e., it contains a less-than or greater-than sign together with the equality sign), H0: μ ≤ μ0 or H0: μ ≥ μ0, then it is called a composite null hypothesis. Example phrases:

- The mean height of BBA students is at most 70 inches.
- The performance of PhD students is at most the same as that of MBA students.
- Variability in a data set must be non-negative (greater than or equal to zero).

2) Alternative Hypothesis:
An automatically generated statement against the established null hypothesis is called an alternative hypothesis. For example:

Null Hypothesis    Alternative Hypothesis
H0: μ = μ0         H1: μ ≠ μ0, H1: μ > μ0, or H1: μ < μ0
H0: μ ≤ μ0         H1: μ > μ0
H0: μ ≥ μ0         H1: μ < μ0

It is clear from the above alternatives that there are two different types:
1- One-tailed or one-sided alternative hypothesis
2- Two-tailed or two-sided alternative hypothesis

1-One tailed Alternative Hypothesis:


If an alternative is based on either a greater-than (>) or a less-than (<) sign, H1: μ > μ0 or H1: μ < μ0, then it is known as a one-tailed alternative hypothesis. In this type of alternative, the total chance of a type I error remains on only one side of the normal curve. Example phrases:

- Average rainfall in Pakistan is higher than average rainfall in Jakarta.
- Inzamam is a more consistent player than Shahid Afridi.
- Wasim Akram is a better bowler than McGrath.
- Gold prices are dependent on oil prices.

2-Two tailed Alternative Hypothesis:


If an alternative is based on only an unequal (≠) sign, H1: μ ≠ μ0, then it is known as a two-tailed alternative hypothesis. In this type of alternative, the total chance of a type I error is divided between the two sides of the normal curve. Example phrases:

- The concentrations of two substances are not the same.
- There is a significant difference between the wheat production of Sindh and Punjab.
- The consistency of KSE and SSE is not the same.

Probabilities Associated with Decisions:


               True Population (H0 is true)          Other Population (H0 is false)
Accept H0      Correct decision (1 − α)              False decision: Type II error (β)
Reject H0      False decision: Type I error (α)      Correct decision (1 − β)

It is clear from the above table that both errors cannot be minimized at the same time: an increase is observed in the type II error when the type I error is minimized.

P-Value:
The p-value is the minimum value of α at which the null hypothesis can be rejected. As it is a value of α, it can be explained as the minimum probability of a type I error associated with a hypothesis test. It is therefore used in two ways: in decision making, and to determine the probability of the type I error associated with the test.


Decision Rule on the basis of p - value:


Reject H0 if p-value < 0.05; accept H0 if p-value ≥ 0.05.

For example, if the p-value for a test appears as 0.01, it indicates that the null hypothesis is to be rejected and that there is only a 1% chance of rejecting a true null hypothesis. Put another way, we are 99% confident in rejecting the null hypothesis: it can be rejected at α = 1%, i.e. at the 99% confidence level.

If the p-value for a test appears as 0.21, it indicates that the null hypothesis is to be accepted, since rejecting it would carry a 21% chance of rejecting a true null hypothesis. In other words, we would be only 79% confident in a rejection, so this null hypothesis could only be rejected at α = 21%.
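As a quick illustration of this rule, the following minimal Python sketch turns a p-value into a decision; the function name and the default α = 0.05 are our own illustrative choices, not part of any standard library.

```python
# Minimal sketch of the decision rule above (alpha = 0.05 assumed).
def decide(p_value, alpha=0.05):
    """Return the decision and the confidence we could claim in a rejection."""
    decision = "Reject H0" if p_value < alpha else "Accept H0"
    confidence_in_rejection = 1 - p_value  # e.g. p = 0.01 -> 99% confident
    return decision, confidence_in_rejection

print(decide(0.01))  # ('Reject H0', 0.99)
print(decide(0.21))  # ('Accept H0', 0.79)
```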

T-test: A t-test is a statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true. It is applied when the population is assumed to be normally distributed but the sample size is small enough that the statistic on which inference is based is not normally distributed, because it relies on an uncertain estimate of the standard deviation rather than on a precisely known value.

Uses of T-test: Among the most frequently used t-tests are:

- A test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
- A test of the null hypothesis that the means of two normally distributed populations are equal.
- A test of whether the slope of a regression line differs significantly from 0.

Given two data sets, each characterized by its mean, standard deviation, and number of data points, we can use some kind of t-test to determine whether the means are distinct, provided that the underlying distributions can be assumed to be normal. There are different versions of the t-test depending on whether the two samples are:

- Unpaired, independent of each other (e.g., individuals randomly assigned into two groups, measured after an intervention and compared with the other group), or
- Paired, so that each member of one sample has a unique relationship with a particular member of the other sample (e.g., the same people measured before and after an intervention).

Interpretation of the results: If the calculated p-value is below the threshold chosen for statistical significance (usually the 0.10, 0.05, or 0.01 level), then the null hypothesis, which usually states that the two groups do not differ, is rejected in favor of an alternative hypothesis, which typically states that the groups do differ.

The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages; the bottom part is a measure of the variability or dispersion of the scores. This formula is essentially another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data, while the bottom part of the formula is a measure of variability that is essentially noise and may make it harder to see the group difference. In symbols,

t = (x̄1 − x̄2) / SE(x̄1 − x̄2)

where the numerator is the difference between the group means and the denominator is the standard error of that difference.

Statistical Analysis of the t-test:-

The top part of the formula is easy to compute: just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, take the variance for each group, divide it by the number of people in that group, add these two values, and then take the square root:

SE(x̄1 − x̄2) = √(s1²/n1 + s2²/n2)

The t-value will be positive if the first mean is larger than the second and negative if it is smaller. Once you compute the t-value, you have to look it up in a table of significance to test whether the ratio is large enough to say that the difference between the groups is not likely to have been a chance finding. To test the significance, you need to set a risk level (called the alpha level). In most social research, the rule of thumb is to set the alpha level at .05; this means that five times out of a hundred we would find a statistically significant difference between the means even if there was none (i.e., by chance). You also need to determine the degrees of freedom (df) for the test: in the t-test, the degrees of freedom are the sum of the persons in both groups minus 2. Given the alpha level, the df, and the t-value, you can look the t-value up in a standard table of significance (available as an appendix in the back of most statistics texts) to determine whether the t-value is large enough to be significant. If it is, you can conclude that the difference between the means of the two groups is significant (even given the variability).
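To make the signal-to-noise idea concrete, here is a small Python sketch of the computation just described; the group means, variances, and sizes are made-up numbers for illustration only.

```python
import math

# Signal: the difference between the two group means.
mean1, var1, n1 = 52.0, 16.0, 30   # group 1: mean, variance, size (hypothetical)
mean2, var2, n2 = 48.0, 25.0, 30   # group 2: mean, variance, size (hypothetical)
signal = mean1 - mean2

# Noise: the standard error of the difference (variance/n for each group,
# summed, then square-rooted, exactly as described above).
noise = math.sqrt(var1 / n1 + var2 / n2)

t = signal / noise
df = n1 + n2 - 2  # degrees of freedom: persons in both groups minus 2
print(f"t = {t:.3f} with df = {df}")
```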

Calculations:

a) Independent one-sample t-test

In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic

t = (x̄ − μ0) / (s / √n)

where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size. The degrees of freedom used in this test are n − 1.
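As a cross-check outside SPSS, the same one-sample test can be run in Python with scipy; the simulated salaries below are only a hypothetical stand-in for real data.

```python
import numpy as np
from scipy import stats

# Hypothetical sample standing in for 474 observed salaries.
rng = np.random.default_rng(seed=1)
salaries = rng.normal(loc=34000, scale=17000, size=474)

# One-sample t-test of H0: mu = 30000 against HA: mu != 30000.
t_stat, p_value = stats.ttest_1samp(salaries, popmean=30000)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, df = {len(salaries) - 1}")
```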
b) Independent two-sample t-test

A) Equal sample sizes, equal variance

This test is used only when both:

- the two sample sizes (that is, the n or number of participants of each group) are equal; and
- it can be assumed that the two distributions have the same variance.

Violations of these assumptions are discussed below. The t statistic to test whether the means are different can be calculated as follows:

t = (x̄1 − x̄2) / (S · √(2/n)),  where  S = √((s1² + s2²) / 2)

Here S is the grand standard deviation (or pooled standard deviation), subscript 1 = group one, and subscript 2 = group two. The denominator of t is the standard error of the difference between the two means. For significance testing, the degrees of freedom for this test are n1 + n2 − 2, where n1 is the number of participants in group 1 and n2 is the number of participants in group 2.

B) Unequal sample sizes, unequal variance

This test is used only when the two sample sizes are unequal and the variance is assumed to be different. See also Welch's t-test. The t statistic to test whether the means are different can be calculated as follows:

t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)


where n1 is the number of participants in group 1 and n2 is the number of participants in group 2. In this case, the variance is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as an ordinary Student's t distribution with the degrees of freedom calculated using

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]

This is called the Welch–Satterthwaite equation. Note that the true distribution of the test statistic actually depends (slightly) on the two unknown variances. This test can be used as either a one-tailed or a two-tailed test.

c) Dependent t-test for paired samples

This test is used when the samples are dependent; that is, when there is only one sample that has been tested twice (repeated measures) or when there are two samples that have been matched or "paired". The test statistic is

t = (X̄D − μ0) / (sD / √N)

For this equation, the differences between all pairs must be calculated. The pairs are either one person's pre-test and post-test scores, or pairs of persons matched into meaningful groups (for instance, drawn from the same family or age group). The average (X̄D) and standard deviation (sD) of those differences are used in the equation. The constant μ0 is non-zero if you want to test whether the average difference is significantly different from 0. The degrees of freedom used are N − 1.
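The three remaining variants can likewise be sketched in Python with scipy; all of the data below are simulated placeholders, not the employee data used in the examples that follow.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
group1 = rng.normal(50, 10, size=40)    # hypothetical independent groups
group2 = rng.normal(45, 14, size=55)
before = rng.normal(100, 15, size=30)   # hypothetical paired measurements
after = before + rng.normal(5, 8, size=30)

# Pooled (equal-variance) two-sample t-test, df = n1 + n2 - 2.
t_pooled, p_pooled = stats.ttest_ind(group1, group2, equal_var=True)

# Welch's t-test; scipy applies the Welch-Satterthwaite df internally.
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)

# Dependent (paired) t-test on the differences, df = N - 1.
t_paired, p_paired = stats.ttest_rel(before, after)

print(p_pooled, p_welch, p_paired)
```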

Example # 01: Analysis through SPSS

A) One-sample t-test

SPSS needs:
1) The data should be in the form of a numerical variable.
2) A test value, which is the hypothetical value against which we are going to test.

To analyze the one-sample t-test I have used the employees' salaries of an organization. For this purpose, I selected a sample of 474 employees of the company. The hypotheses are:

a) The null hypothesis states that the average salary of the employees is equal to 30,000.
H0: μ = 30,000
b) The alternative hypothesis states that the average salary of the employees is not equal to 30,000.
HA: μ ≠ 30,000

Method:
Enter the data in the data editor; the variable is labeled as employee's current salary. Now click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on One-Sample T Test. A dialogue box appears, in which all the input variables appear on the left-hand side. From this box we have to select the variable to be computed, which in our case is the current salaries of the employees. Variables can be selected for analysis by transferring them to the Test Variable(s) box. Next, change the value in the Test Value box, which originally appears as 0, to the one against which you are testing the sample mean; in this case, this value is 30,000. Now click on OK to run the analysis.

Pictorial Representation
Analyze → Compare Means → One-Sample T Test → Drag Test Variable (Scale) → Give Test Value → OK


SPSS output:

One-Sample Statistics
                 N     Mean         Std. Deviation   Std. Error Mean
Current Salary   474   $34,419.57   $17,075.661      $784.311

Interpretation: In the above table, N shows the total number of observations. The average salary of the employees is 34,419.57, the standard deviation of the data is 17,075.661, and the standard error of the mean is 784.311.
One-Sample Test (Test Value = 30000)
                 t       df    Sig. (2-tailed)   Mean Difference   95% CI of the Difference: Lower   Upper
Current Salary   5.635   473   .000              $4,419.568        $2,878.40                         $5,960.73

Interpretation: From the above table we can observe that:

i) The t value is positive, which shows that our estimated mean value is greater than the hypothesized value of the mean.
ii) The degrees of freedom are (N − 1) = 473.
iii) The p-value is 0.000, which is less than 0.05.
iv) The difference between the estimated and hypothesized mean is 4,419.568.
v) The confidence interval has lower and upper limits of 2,878.40 and 5,960.73 respectively, and does not contain zero.

Decision:


On the basis of the following observations I reject the null hypothesis and accept the alternative hypothesis, and I am almost 100% sure of this decision:
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.

Comments: The average salary of the employees is not equal to 30,000.

Example # 02
A) Independent t-test

SPSS needs:
1) Two variables are required: one should be numerical and the other categorical with two levels.

To analyze the independent t-test I have used the employees' salaries of an organization. For this purpose, I selected a sample of 474 employees of the company, containing both males and females. In my analysis I assigned males as m and females as f. The hypotheses are:

a) The null hypothesis states that the average salary of male employees is equal to the average salary of female employees.
H0: μm = μf

b) The alternative hypothesis states that the average salary of male employees is not equal to the average salary of female employees.
HA: μm ≠ μf

Method:
Enter the data in the data editor; the variables are labeled as employee's current salary and gender respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on Independent-Samples T Test. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform the independent-samples t-test, transfer the dependent variable into the Test Variable(s) box and transfer the variable that identifies the groups into the Grouping Variable box. In this case, the current salary of the employees is the dependent variable to be analyzed and should be transferred into the Test Variable(s) box by clicking on the first arrow between the two boxes; gender is the variable that identifies the groups and should be transferred into the Grouping Variable box. Once the grouping variable is transferred, the Define Groups button, which was earlier inactive, turns active. Click on it to define the two groups: here group 1 represents the male employees (m) and group 2 represents the female employees (f). Enter these values in the boxes against group 1 and group 2 and click Continue. Now click on OK to run the analysis.


Pictorial Representation
Analyze → Compare Means → Independent-Samples T Test → Drag Test & Grouping Variable → Define Groups → OK


SPSS output:

Group Statistics
                 Gender   N     Mean         Std. Deviation   Std. Error Mean
Current Salary   Male     258   $41,441.78   $19,499.214      $1,213.968
                 Female   216   $26,031.92   $7,558.021       $514.258

Interpretation: From the above table we can observe that:

i) The total number of male employees is 258 and of female employees is 216.
ii) The mean salary of male employees is 41,441.78 and of female employees is 26,031.92.
iii) The standard deviation of male employees' salaries is 19,499.214 and of female employees' salaries is 7,558.021.
iv) The standard error of the mean of male employees' salaries is 1,213.968 and of female employees' salaries is 514.258.

Independent Samples Test (Current Salary)

Levene's Test for Equality of Variances: F = 119.669, Sig. = .000

t-test for Equality of Means:
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       10.945   472       .000              $15,409.862       $1,407.906              $12,643.322    $18,176.401
Equal variances not assumed   11.688   344.262   .000              $15,409.862       $1,318.400              $12,816.728    $18,002.996

Interpretation: The above table has two parts, (a) Levene's F-test and (b) the t-test, through which we can observe that:

i) The F value is 119.669, with a significance value of 0.000, which is less than 0.05.
ii) On the basis of the p-value of the F-test we conclude that the variances of the two populations are not equal, so we read the "equal variances not assumed" row.
iii) The t value is positive, which shows that the mean salary of male employees is greater than the mean salary of female employees.
iv) The degrees of freedom are 344.262.
v) The p-value is 0.000, which is less than 0.05.
vi) The difference between the two population means is 15,409.862, with a standard error of 1,318.400.
vii) The confidence interval has lower and upper limits of 12,816.728 and 18,002.996 respectively.
viii) The confidence interval does not contain zero.

Decision: On the basis of the following observations I reject the null hypothesis and accept the alternative hypothesis, and I am almost 100% sure of this decision:
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.

Comments: The average salaries of male and female employees are not equal.
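For comparison, the SPSS workflow above (Levene's test first, then the appropriate t-test row) can be mirrored in Python; the simulated male and female salaries here are hypothetical stand-ins for the real data set.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
male = rng.normal(41000, 19500, size=258)    # hypothetical salaries
female = rng.normal(26000, 7500, size=216)

# Levene's test plays the role of the first part of the SPSS output:
# a small p-value casts doubt on the equal-variance assumption.
lev_stat, lev_p = stats.levene(male, female)
equal_var = lev_p >= 0.05

# Choose the pooled or Welch variant accordingly, as SPSS's two rows do.
t_stat, p_value = stats.ttest_ind(male, female, equal_var=equal_var)
print(f"Levene p = {lev_p:.4f}, equal_var = {equal_var}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```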

Example # 03
A) Paired t-test

SPSS needs:
1) Two numerical variables are required, with an equal number of observations.

To analyze the paired t-test I used the beginning and current salaries of the employees of an organization. For this purpose, I selected a sample of 474 employees of the organization. The hypotheses are:

a) The null hypothesis states that the average current salary of the employees is equal to their average beginning salary.
H0: μcurrent = μbeginning

b) The alternative hypothesis states that the average current salary of the employees is not equal to their average beginning salary.
HA: μcurrent ≠ μbeginning

Method:


Enter the data in the data editor; the variables are labeled as employee's current and beginning salary respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on Paired-Samples T Test. A dialogue box appears, in which all the input variables appear on the left-hand side. From this box we have to select the variables to be computed; the two variables in our case are the current and beginning salaries. Select these together and they will immediately appear in the box at the bottom labeled Current Selection, while being simultaneously highlighted in the box in which they originally appeared. Once the variables are selected, the arrow at the center becomes active; the variables can be transferred to the Paired Variables box by clicking on it. They will appear in the box as Current–Beginning. Now click on OK to run the analysis.

Pictorial Representation
Analyze → Compare Means → Paired-Samples T Test → Drag Paired Variables (Scale) → OK


SPSS output:

Paired Samples Statistics
Pair 1             Mean         N     Std. Deviation   Std. Error Mean
Current Salary     $34,419.57   474   $17,075.661      $784.311
Beginning Salary   $17,016.09   474   $7,870.638       $361.510

Interpretation: From the above table we can observe that:

i) The mean values of the current and beginning salaries are 34,419.57 and 17,016.09 respectively.
ii) The total number in each group is 474.
iii) The standard deviations of the current and beginning salaries are 17,075.661 and 7,870.638 respectively.
iv) The standard errors of the mean of the current and beginning salaries are 784.311 and 361.510 respectively.

Paired Samples Correlations
                                             N     Correlation   Sig.
Pair 1   Current Salary & Beginning Salary   474   .880          .000

Interpretation: From the above table we can observe that:

i) The total number of pairs is 474.
ii) The correlation of 0.88 shows that the two variables are highly correlated, which indicates that employees with a greater beginning salary also have a greater current salary.
iii) The p-value is 0.000, which is less than 0.05.
Paired Samples Test
Pair 1: Current Salary − Beginning Salary
Mean                              $17,403.481
Std. Deviation                    $10,814.620
Std. Error Mean                   $496.732
95% CI of the Difference: Lower   $16,427.407
95% CI of the Difference: Upper   $18,379.555
t                                 35.036
df                                473
Sig. (2-tailed)                   .000

Interpretation: From the above table we can observe that:

i) The mean of the paired differences is 17,403.481.
ii) The standard deviation of the differences is 10,814.620.
iii) The standard error of the mean difference is 496.732.
iv) The confidence interval has lower and upper limits of 16,427.407 and 18,379.555 respectively, and does not contain zero.
v) The t value is 35.036.
vi) The degrees of freedom are (N − 1) = 473.
vii) The p-value is 0.000, which is less than 0.05.

Decision:
On the basis of the following observations I reject the null hypothesis and accept the alternative hypothesis, and I am almost 100% sure of this decision:
i) The p-value is 0.000, which is less than 0.05.
ii) The confidence interval does not contain zero.

Comments:
The mean difference between the two paired variables, i.e. current and beginning salary, is significant; the average current and beginning salaries are not the same.

One-Way ANOVA
ANOVA is a commonly used statistical method for making simultaneous comparisons between two or more population means, yielding values that can be tested to determine whether a significant relation exists between variables. Its simplest form is one-way ANOVA, which involves one numerical dependent variable and one categorical independent variable (factor).

Data Source:

C:\SPSSEVAL\Employee Data

Statistical Applications through SPSS

S. Ali Raza Naqvi

Variables: Here we analyze two different variables by One-Way ANOVA, i.e.


A) Current salary of the employees. B) Employment Category.

Hypothesis:
H0: μ1 = μ2 = μ3
HA: At least one mean is not equal.

SPSS Need:

SPSS needs two types of variables for analyzing one-way ANOVA:
- a numerical variable (scale), and
- a categorical variable (with more than two categories).

Method:

First of all, enter the data in the data editor; the variables are labeled as employee's current salary and employment category respectively. Click on Analyze, which will produce a drop-down menu; choose Compare Means from it and click on One-Way ANOVA. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform one-way ANOVA, transfer the dependent variable into the box labeled Dependent List and the factoring variable into the box labeled Factor. In our case, current salary is the dependent variable and should be transferred to the Dependent List box by clicking on the first arrow between the two boxes; employment category is the factoring variable and should be transferred to the Factor box by clicking on the second arrow. Then click OK to run the analysis. If the null hypothesis is rejected, ANOVA only tells us that not all population means are equal; multiple comparisons are then used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; the LSD (Least Significant Difference) and Tukey tests are among the most conservative and commonly used.

Pictorial Representation
Analyze → Compare Means → One-Way ANOVA → Drag Dependent List & Factor → Post Hoc (Optional) → OK


Output:

ANOVA: Current Salary
                 Sum of Squares     df    Mean Square       F         Sig.
Between Groups   89438483925.943    2     44719241962.972   434.481   .000
Within Groups    48478011510.397    471   102925714.459
Total            137916495436.340   473

The above table gives the test results for the one-way ANOVA. The results are given in three rows. The first row, labeled Between Groups, gives the variability due to the different designations of the employees (known reasons). The second row, labeled Within Groups, gives the variability due to random error (unknown reasons), and the third row gives the total variability. In this case, the F-value is 434.481 and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis and conclude that the average salary of the employees is not the same in all three categories.
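The same one-way ANOVA can be reproduced in Python; the three simulated salary groups below are hypothetical stand-ins for the clerical, custodial, and manager categories.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
clerical = rng.normal(28000, 8000, size=363)   # hypothetical group data
custodial = rng.normal(31000, 5000, size=27)
manager = rng.normal(64000, 18000, size=84)

# One-way ANOVA of H0: mu1 = mu2 = mu3.
f_stat, p_value = stats.f_oneway(clerical, custodial, manager)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```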

Post Hoc Tests

Multiple Comparisons. Dependent Variable: Current Salary (LSD)
(I) Employment Category   (J) Employment Category   Mean Difference (I−J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Clerical                  Custodial                 -$3,100.349             $2,023.760   .126   -$7,077.06     $876.37
Clerical                  Manager                   -$36,139.258*           $1,228.352   .000   -$38,552.99    -$33,725.53
Custodial                 Clerical                  $3,100.349              $2,023.760   .126   -$876.37       $7,077.06
Custodial                 Manager                   -$33,038.909*           $2,244.409   .000   -$37,449.20    -$28,628.62
Manager                   Clerical                  $36,139.258*            $1,228.352   .000   $33,725.53     $38,552.99
Manager                   Custodial                 $33,038.909*            $2,244.409   .000   $28,628.62     $37,449.20

*. The mean difference is significant at the .05 level.

The Post Hoc test presents the results of the comparisons between all possible pairs. Since we have three groups, a total of six pairs are possible, of which three are mirror images. The p-value for the Clerical–Manager and Custodial–Manager comparisons is shown as 0.000, whereas it is 0.126 for the Clerical–Custodial comparison. This means that the average current salary of the employees differs significantly between Clerical and Manager as well as between Manager and Custodial, whereas it does not differ significantly between Clerical and Custodial.

Conclusion: As our null hypothesis is rejected, we conclude that the three means are not all the same. To identify which mean differs from the others we used the LSD test and conclude that the mean salary of managers is significantly different from the other two means, whereas the other two means are not significantly different from each other.
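A comparable post hoc analysis can be sketched in Python with statsmodels. Note that statsmodels offers no built-in LSD test, so the sketch uses Tukey's HSD, the other test named above; the group data are again hypothetical.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(seed=5)
salaries = np.concatenate([
    rng.normal(28000, 8000, size=363),   # hypothetical clerical salaries
    rng.normal(31000, 5000, size=27),    # hypothetical custodial salaries
    rng.normal(64000, 18000, size=84),   # hypothetical manager salaries
])
groups = np.repeat(["Clerical", "Custodial", "Manager"], [363, 27, 84])

# Tukey's HSD compares every pair of group means, analogous to the
# SPSS Multiple Comparisons table above.
print(pairwise_tukeyhsd(endog=salaries, groups=groups, alpha=0.05))
```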

Two-Way ANOVA
In two-way analysis, we have two independent variables (known factors) and we are interested in knowing their effect on the same dependent variable.

Data Source:

C:\SPSSEVAL\Carpet

Variables: Here we analyze two different categorical variables together with a numerical variable by two-way ANOVA, i.e.

A) Preference (Numerical)
B) Package design (Categorical)
C) Brand (Categorical)

Hypothesis:
For Package: H0: μi = μj for all i and j;  HA: μi ≠ μj for at least one pair i, j
For Brand:   H0′: μi = μj for all i and j;  HA′: μi ≠ μj for at least one pair i, j

SPSS Need:

SPSS needs two types of variables for analyzing two-way ANOVA:
- a numerical variable (scale), and
- two categorical variables (each with two or more levels).

Method:

First of all, enter the data in the data editor; the variables are labeled as preference, brand, and package design respectively. Click on Analyze, which will produce a drop-down menu; choose General Linear Model from it and click on Univariate. A dialogue box appears, in which all the input variables appear on the left-hand side. To perform two-way ANOVA, transfer the dependent variable (preference) into the box labeled Dependent Variable and the factor variables (brand and package) into the box labeled Fixed Factor(s). After defining all variables, click on OK to run the analysis.


If the null hypothesis is rejected, multiple comparisons are used to assess which group means differ from which others, once the overall F-test shows that at least one difference exists. Many tests are listed under Post Hoc in SPSS; the LSD (Least Significant Difference) and Tukey tests are among the most conservative and commonly used.

Pictorial Representation
Analyze → General Linear Model → Univariate → Drag Dependent Variable & Fixed Factors → Post Hoc → OK


Output:

Between-Subjects Factors
                        Value Label   N
Package design   1.00   A*            9
                 2.00   B*            6
                 3.00   C*            7
Brand name       1.00   K2R           7
                 2.00   Glory         7
                 3.00   Bissell       8

This table shows the value label under each category and the frequency of each value label. We have a total of six value labels under package design and brand name.
Tests of Between-Subjects Effects. Dependent Variable: Preference
Source    Type III Sum of Squares   df   Mean Square   F        Sig.
package   537.231                   2    268.616       16.883   .000
brand     36.108                    2    18.054        1.135    .351
Error     206.833                   13   15.910
Total     3758.000                  22

a. R Squared = .763 (Adjusted R Squared = .617)

The above table gives the test results for the two-way ANOVA. The results are given in four rows. The first row, labeled package, gives the variability due to the different package designs of the carpets, which may affect the customers' preferences (known reason). The second row, labeled brand, gives the variability due to the different brand names (known reason). The third row, labeled Error, gives the variability due to random error, which also affects the customers' preferences (unknown reasons). The fourth row gives the total variability in the customers' preferences due to both known and unknown reasons.


In this case, the F-value for package design is 16.883 and the corresponding p-value is less than 0.05. Therefore we can safely reject the null hypothesis for package design and conclude that the average preference is not the same for all packages. The F-value for brand name is 1.135 and the corresponding p-value is greater than 0.05, so we accept the null hypothesis for brand and conclude that the average preferences for all brands are approximately the same.

Post Hoc Tests: Package design

Multiple Comparisons. Dependent Variable: Preference (LSD)
(I) Package design   (J) Package design   Sig.
A*                   B*                   .000
A*                   C*                   .000
B*                   A*                   .000
B*                   C*                   .322
C*                   A*                   .000
C*                   B*                   .322

Based on observed means. *. The mean difference is significant at the .05 level.

As our null hypothesis for package design is rejected, multiple comparisons are used to assess which group means differ from the others. The above table gives the results of the multiple comparisons between the value labels under the package design category. The Post Hoc test presents the comparison between all possible pairs; since we have three groups, six pairs are possible, of which three are mirror images. The p-value for the A*–B* and A*–C* comparisons is shown as 0.000, whereas it is 0.322 for the B*–C* comparison. This means that the average preference differs significantly between package designs A* and B* as well as A* and C*, whereas it does not differ significantly between B* and C*.

Conclusion: As our null hypothesis for package design is rejected, we conclude that the mean preferences for the package designs are not all the same. To identify which mean differs from the others we used the LSD test and conclude that the mean preference for A* is significantly different from the other two, whereas the other two means are not significantly different from each other. In the case of brand name, our null hypothesis is accepted and we conclude that all mean brand preferences are the same, so there is no need for multiple comparisons for brand.
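An equivalent additive two-way model can be fitted in Python with statsmodels; the preference scores and factor levels below are randomly generated placeholders for the carpet data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical stand-in for the carpet data: 22 preference scores with
# two categorical factors, package design and brand.
rng = np.random.default_rng(seed=6)
df = pd.DataFrame({
    "preference": rng.normal(10, 4, size=22),
    "package": rng.choice(["A", "B", "C"], size=22),
    "brand": rng.choice(["K2R", "Glory", "Bissell"], size=22),
})

# Additive two-way ANOVA (no interaction term), mirroring the SPSS model.
model = ols("preference ~ C(package) + C(brand)", data=df).fit()
# SPSS reports Type III sums of squares; for an additive model with no
# interaction, Type II gives the same decomposition.
print(sm.stats.anova_lm(model, typ=2))
```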

Chi-Square Test
The chi-square test is commonly used to test hypotheses regarding:

- Goodness of fit
- Association / independence / attributes

It is denoted by χ2 and its degrees of freedom are n − 1, where n = number of categories. It is a positively skewed distribution, so it has a one-tailed critical region on the right tail of the curve, and the value of χ2 is always positive.

Chi-Square Goodness of Fit Test


The chi-square goodness-of-fit test is used when the distribution is non-normal and the sample size is small (less than 30); here it determines whether the data follow a uniform distribution or not.

Data Source: C:\SPSSEVAL\Carpet

Variables: Here we are interested in analyzing a numerical variable, i.e. Price (Numerical).

Hypothesis:
H0: Fit is good. (The data follow a uniform distribution / prices are uniform.)
HA: Fit is not good. (The data do not follow a uniform distribution / prices are not uniform.)

SPSS Need: SPSS needs a categorical variable or a numerical variable for analyzing the chi-square goodness-of-fit test.

Graphical Representation:

[Histogram of Price: price on the x-axis (0.50 to 3.50), frequency on the y-axis; Mean = 2.00, Std. Dev. = 0.87287, N = 22.]


Explanation of Graph
From the above graph we see that our numerical variable (price) is on the x-axis and its frequency is on the y-axis. The mean and standard deviation of the 22 observations are 2.00 and 0.87287 respectively. The graph clearly shows that the selected numerical variable, price, does not follow a normal distribution, so we use the chi-square goodness-of-fit test to determine whether the sample under investigation has been drawn from a population that follows some specified distribution.

Method:
First of all, enter the data in the data editor; the variable is labeled as price. Click on Analyze, which will produce a drop-down menu; choose Nonparametric Tests from it and click on Chi-Square. A dialogue box appears, in which all the input variables appear on the left-hand side. Select the variable you want to analyze; when you do, the arrow between the two boxes becomes active and you can transfer the variable to the box labeled Test Variable List by clicking on it. In this case our test variable is price, and it should be transferred to the Test Variable List box. You can also click on the Options button if you are interested in the descriptive statistics of the tested variable. Now click on OK to run the analysis.

Pictorial Representation
Analyze → Nonparametric Tests → Chi-Square → Define Test Variable List → OK


Output

Price
        Observed N   Expected N   Residual
$1.19   8            7.3          .7
$1.39   6            7.3          -1.3
$1.59   8            7.3          .7
Total   22

The first column of the above table shows the three categories of the price variable. The column labeled Observed N gives the actual number of cases falling in each category of the test variable, obtained directly from the given data. The column labeled Expected N gives the expected number of cases that should fall in each category. The column labeled Residual gives the difference between the observed and expected frequencies of each category, commonly known as the error.

Test Statistics
              Price
Chi-Square a  .364
df            2
Asymp. Sig.   .834

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 7.3.


The above table gives the test results for the chi-square goodness-of-fit test. In this case the chi-square value is 0.364 with 2 degrees of freedom. The p-value for the test is shown as 0.834, which is greater than 0.05, so we accept our null hypothesis that the fit is good.

Conclusion: The test results are not statistically significant at the 5% level of significance; the data provide sufficient evidence to conclude that our null hypothesis is correct and that our test variable (price) follows a uniform distribution. We could be at most 16.6% confident in a rejection of this null hypothesis.
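The same goodness-of-fit test can be checked in Python with scipy, using the observed frequencies from the table above; with no expected frequencies supplied, scipy.stats.chisquare tests against a uniform distribution, which matches the hypothesis here.

```python
from scipy import stats

observed = [8, 6, 8]  # observed price frequencies from the SPSS table

chi2, p_value = stats.chisquare(observed)  # uniform expected counts by default
print(f"chi2 = {chi2:.3f}, df = {len(observed) - 1}, p = {p_value:.3f}")
# chi2 = 0.364, df = 2, p = 0.834, matching the SPSS output
```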

Chi-Square Test for Independence


The chi-square test for independence is used to test the hypothesis that two categorical variables are independent of each other. A small chi-square statistic indicates that the null hypothesis is correct and that the two variables are independent of each other.

Data Source:

C:\SPSSEVAL\Employee Data

Variables: Here we analyze two different categorical variables i.e.


A) Gender of the employees B) Designation of the employees (Categorical) (Categorical)

Hypothesis:
H0: HA: Designation is independent of Sex. Designation is not independent of Sex.

SPSS Need: SPSS needs two categorical variables for analyzing the chi-square test for independence.

Method:
First of all, enter the data in the data editor; the variables are labeled as gender and designation respectively. Click on Analyze, which will produce a drop-down menu; choose Descriptive Statistics from it and click on Crosstabs. A dialogue box appears, in which all the input variables appear on the left-hand side. Select the variable you want to form the rows of your contingency table and transfer it to the box labeled Row(s); transfer the other variable to the box labeled Column(s). In this case we transfer gender to the Row(s) box and designation to the Column(s) box. Next, click on the Statistics button, which brings up a dialogue box; tick the first box, labeled Chi-Square, and click Continue to return to the previous screen. Click on OK to run the analysis.

Pictorial Representation
Analyze → Descriptive Statistics → Crosstabs → Drag Row and Column Variables → Tick Chi-Square → OK


Output


Gender * Employment Category Crosstabulation (Count)
         Clerical   Custodial   Manager   Total
Female   206        0           10        216
Male     157        27          74        258
Total    363        27          84        474

Cross tabulation is used to examine the variation in categorical data; it is a cross-measuring analysis. Above, we cross-examine the gender and designation of the employees: the designation of the employees is in the columns and the gender of the employees is in the rows, and we have a total of 474 observations. The results are given in two rows; the first row shows the number of female employees in each employment category, and the second row shows the number of male employees in each employment category.
Chi-Square Tests
                     Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square   79.277a   2    .000
N of Valid Cases     474

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.30.

The above table gives the test results for the chi-square test for independence. The first row, labeled Pearson Chi-Square, shows that the value of χ2 is 79.277 with 2 degrees of freedom. The two-tailed p-value is shown as 0.000, which is less than 0.05, so we can reject our null hypothesis and conclude that designation is not independent of sex.

Conclusion: The test results are statistically significant at the 5% level of significance and the data provide sufficient evidence to conclude that the designation of the employees is not independent of their sex; we are almost 100% confident in this decision and in the rejection of the null hypothesis.
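The same test can be reproduced in Python directly from the crosstabulation, since scipy.stats.chi2_contingency needs only the table of counts.

```python
import numpy as np
from scipy import stats

# Gender * Employment Category counts from the SPSS crosstabulation.
table = np.array([
    [206, 0, 10],    # Female: Clerical, Custodial, Manager
    [157, 27, 74],   # Male:   Clerical, Custodial, Manager
])

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p_value:.4f}")
print("minimum expected count:", round(expected.min(), 2))  # 12.30, as in SPSS
```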

Second Approach
Consider a case in which the raw data are not available and only the table labeled Gender * Employment Category Crosstabulation in the above output is given. On the basis of the output table you can easily obtain the same result as above by using the SPSS Weight Cases option. Below we briefly explain how to enter the data on the basis of the table and find the desired results.

Method
First of all, in the Variable View of SPSS, define three variables and label them Gender, Employment Category, and Value. Now, in the Data View of SPSS, enter the data in a different manner. We see that the table contains two rows and three columns: in the rows we have two categories, i.e. Female and Male, and in the columns we have three categories, i.e. Clerical, Custodial, and Manager, with both female and male employees falling into the three employment categories. So in the Data View we simply enter the row data (Gender), alongside it the corresponding column data (Employment Category), and the corresponding frequencies in the Value column.

After defining the data, click on Data, which will produce a drop-down menu, and choose Weight Cases; a dialogue box appears in which all the variables are on the left-hand side. Tick "Weight cases by" and drag Value into the box labeled Frequency Variable by clicking on the arrow between the two boxes. Now click OK to return to the previous window.

The further process is the same as described above: define Gender in Row(s) and Employment Category in Column(s), tick Chi-Square via the Statistics button, and click OK to run the analysis. When the output appears, you will see that SPSS gives the same result as we found earlier from the raw data.
